| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Inline writes skip transfer map/unamp at the cost of an extra copy
on the data during execbuffer. That is generally a win for small
transfers. But the heuristic to use inline writes based on buffer
sizes rather than transfer sizes makes little sense. More
importantly, inline writes miss optimizations that are done for
buffer transfers.
Let's just use transfers.
Signed-off-by: Chia-I Wu <[email protected]>
Reviewed-By: Gert Wollny <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
virglrender has been changed such that
- VIRGL_CCMD_GET_QUERY_RESULT is fenced
- query buffers (PIPE_BIND_CUSTOM) are coherent
We can check if a query is ready using DRM_IOCTL_VIRTGPU_WAIT, and also
avoid a synchronized transfer to retrieve the query result. When
running against an older virglrenderer, it falls back to the old
behavior automatically.
TF2 @ 640x480 for pts4.dem went from 17fps to 40fps on my testing
machine.
Signed-off-by: Chia-I Wu <[email protected]>
Reviewed-by: Gurchetan Singh <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Chia-I Wu <[email protected]>
Reviewed-by: Gurchetan Singh <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The pitch is actually the number of components per row. We found
the problem when we implemented some meta operations for these
formats and the wrong pitch has been confirmed with a small test case.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108325
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This makes CompressedTexSubImage from a PBO source do proper GPU
rendering to upload instead of stalling to map the PBO source on
the CPU (then copying it on the CPU).
Thanks Bas Nieuwenhuizen for pointing out that Vulkan includes this
functionality, and to Jason Ekstrand for writing the code I adapted.
Vulkan only supports a single layer, however, and this code tries to
support multiple layers as long as it's miplevel 0.
Improves performance in Sid Meier's Civilization VI:
Average frame time (ms): -3.67423% +/- 1.46201% (n=5)
99th percentile frame time (ms): -5.09910% +/- 3.87874% (n=5)
|
|
|
|
|
|
| |
Handled similarly as radeonsi. I checked the offsets are actually used.
Acked-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently ppir continues compilation when there is an unsupported
intrinsic, resulting in a shader that will surely not work as intended.
This is a problem during piglit runs as some tests don't compile
properly due to this but actually still get submitted to the gpu and
leave the system in an unstable state after executing, causing further
tests to fail.
Signed-off-by: Erico Nunes <[email protected]>
Reviewed-by: Qiang Yu <[email protected]>
|
|
|
|
|
|
|
|
|
| |
While lima still doesn't support some kinds of intrinsics, it is more
helpful to display the name of the unsupported instr->intrinsic to make
debugging easier.
Signed-off-by: Erico Nunes <[email protected]>
Reviewed-by: Qiang Yu <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In commit a99c360a4630 (nir: add pass to lower fb reads), a new
file was added that needs to also be added to the
Makefile.sources list used by the Android and SCons build system.
Cc: Rob Clark <[email protected]>
Cc: Emil Velikov <[email protected]>
Cc: Amit Pundir <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: Alistair Strachan <[email protected]>
Cc: Greg Hartman <[email protected]>
Cc: Tapani Pälli <[email protected]>
Cc: Jason Ekstrand <[email protected]>
Fixes: a99c360a463 ("nir: add pass to lower fb reads")
Reviewed-by: Tapani Pälli <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
Signed-off-by: John Stultz <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
to Makefile.sources
In commit 2f0b9d22495 ("freedreno/ir3: lower
load_barycentric_at_offset") a new file was added that needs to
also be added to the Makefile.sources list used by Android and
SCons build system.
Cc: Rob Clark <[email protected]>
Cc: Emil Velikov <[email protected]>
Cc: Amit Pundir <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: Alistair Strachan <[email protected]>
Cc: Greg Hartman <[email protected]>
Cc: Tapani Pälli <[email protected]>
Cc: Jason Ekstrand <[email protected]>
Fixes: 2f0b9d22495 ("freedreno/ir3: lower load_barycentric_at_offset")
Reviewed-by: Emil Velikov <[email protected]>
Signed-off-by: John Stultz <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The ir3_nir_trig.py file was moved in a previous commit,
aa0fed10d3574 (freedreno: move ir3 to common location),
so update the Android.gen.mk file to match.
Cc: Rob Clark <[email protected]>
Cc: Emil Velikov <[email protected]>
Cc: Amit Pundir <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: Alistair Strachan <[email protected]>
Cc: Greg Hartman <[email protected]>
Cc: Tapani Pälli <[email protected]>
Cc: Jason Ekstrand <[email protected]>
Fixes: aa0fed10d35 ("freedreno: move ir3 to common location")
Reviewed-by: Tapani Pälli <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
Signed-off-by: John Stultz <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add libfreedreno_drm/ir3 to the build
Cc: Rob Clark <[email protected]>
Cc: Emil Velikov <[email protected]>
Cc: Amit Pundir <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: Alistair Strachan <[email protected]>
Cc: Greg Hartman <[email protected]>
Cc: Tapani Pälli <[email protected]>
Cc: Jason Ekstrand <[email protected]>
Fixes: b4476138d5a ("freedreno: move drm to common location")
Fixes: aa0fed10d35 ("freedreno: move ir3 to common location")
Reviewed-by: Tapani Pälli <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
Signed-off-by: Amit Pundir <[email protected]>
[jstultz: Tweaked to add extra ir3 files from master]
Signed-off-by: John Stultz <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current AOSP master build system breaks building mesa due to the
following error:
external/mesa3d/src/compiler/Android.glsl.gen.mk:94: error:
writing to readonly directory: "external/mesa3d/src/compiler/glsl/ir.h"
This error is bogus -- nothing "writes" to ir.h -- but the rule is
unnecessary because the generated header that is a dependency of the
non-generated header should be added to LOCAL_GENERATED_SOURCES and this
will track if the dependency needs to be regenerated.
(This change fixes a similar problem affecting nir.h too.)
Cc: Rob Clark <[email protected]>
Cc: Emil Velikov <[email protected]>
Cc: Amit Pundir <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: Alistair Strachan <[email protected]>
Cc: Greg Hartman <[email protected]>
Cc: Tapani Pälli <[email protected]>
Cc: Jason Ekstrand <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
Signed-off-by: Alistair Strachan <[email protected]>
[jstultz: Forward ported and tweaked commit subject]
Signed-off-by: John Stultz <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Apparently cosited_even was the required one instead of midpoint.
This adds slight offset of 0.5 pixels to the coordinates (+ we need
the image size to convert to normalized coords)
Fixes: 91702374d5d "radv: Add ycbcr lowering pass."
Acked-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
| |
It was removed in "delete autotools .gitignore files", but the build
directory is created by scons.
[Skip CI]
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Broken on Polaris and since I discovered NV12 is not subsampled, but
a 2-plane format I decided I don't really care.
Work to do to re-enable:
1) Figure out which devices support it natively.
2) Write some software emulation for the others.
Fixes: 52c1adda21b "radv: Add ycbcr format features."
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
| |
This makes the game playable on radeonsi.
Cc: "19.0" "19.1" <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110143
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Basically, when the conditions of a csel diverge, we scalarize to avoid
going into weird code paths during emit. We could be doing better, but
this case can't occur organically from GLSL as far as I can, though it
does fix lowered atan2.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
| |
A previous commit by Tomeu aborted RA early, which solves the memory
corruption issue, but then generates an incorrect compile. This fixes
that.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This handles the usual case. 8-bit register access parallels 16-bit
access, but with one major caveat: in 8-bit mode, only half of the
register file is actually (directly) accessible as sources. In
particular, for each 16-bit integer register (hrN), we can only index a
*single* 8-bit integer (qrN), corresponding to the lower 8-bits. To get
the upper 8-bits, it is required to do an explicit shift. For example,
to add the bytes of a 16-bit integer hr0.x and get the result as an
8-bit qr0, you'd need to do something like:
ilsr hr1.x, hr0.x, #8
iadd qr0.x, qr0.x, qr1.x
This scheme diverges from 32-bit registers, in that both the upper and
lower halves of a 32-bit register are individually accessible as a pair
of half registers. For contrast, to add the lower and upper 16-bits of a
32-bit integer r0.x, you can just:
iadd hr0.x, hr0.x, hr1.x
Since hr1.x = upper 16-bit of r0.x.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
| |
Meanwhile, we're forced to disable dest_override, since it's not yet
clear how this interacts with other bitnesses (it'll likely need to be
overhauled in any case).
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
| |
Per OpenCL.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
| |
We silently ignored certain bits of the mask, which causes issues when
disassembly 8/64-bit ops.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
| |
In preparation for 8-bit and 64-bit operands, let's not reinforce the
32-bit-centric biases in the ISA.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Since it is dependent on the tile mode (ie. disabled for smaller mipmap
levels), we should handle it a similar way to fd_resource_level_linear().
The code previously mostly did the right thing because the old helper
took the tile mode.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Best to keep it encapsulated in the helper which returns layer/level
offset (and actually use that helper everywhere) rather than spreading
the logic around the code.
Also add a helper to find UBWC offset, to complete the encapsulation.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Small cleanup. They are just an array of data and only ever linear/
uncompressed.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
If someone is importing a buffer, we can't really know the state of it's
contents, so assume it is valid.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are still some fallbacks we'll need to handle before we can enable
UBWC by default. I think we may need to fallback to uncompressed if
image atomic operations are used. And we still need to sort out how to
handle image and sampler views of compressed resources if the image/
sampler view is using a format that does not support compression. (I
think the latter should hopefully be uncommon outside of deqp/piglit.)
But at least this gets us to the point where supertuxkart works properly
with UBWC enabled ;-)
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A few fixes that get UBWC working for the games/benchmarks where I
noticed problems before (in particular and manhattan, and stk (modulo
image support for UBWC when compute shaders are used for post-process
effects):
+ fix the size of the UBWC meta buffer (ie, the offset to color
pixel data) that is returned by ->fill_ubwc_buffer_sizes()
+ correct size/layout for 8 and 16 byte per pixel formats
+ limit the supported formats.. Note all formats that can be
tiled can be compressed.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
Corrects tex state ubwc pitch/size
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes dEQP-GLES31.functional.ubo.random.all_per_block_buffers.13 and .20
ca3eb5db665cbcc2de5a5d3158e3dc68f86e5822 went from silently truncating
the constant state, which was also the wrong thing to do, to an assert.
Which then showed up in a couple of dEQPs. Actually there is nothing
wrong with larger constant file so just drop the assert.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
with that we can simplify code where nir vectors are created
v2: merge both lines in nir_vec
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
v2: rename to nir_build_alu_src_arr
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
v2: use vtn_push_ssa and vtn_ssa_value
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Now that dlist compilation again knows if it is inside glBegin/glEnd,
we can leave the decision if aliasing should occur to the vertex attribute
setter functions instead of doing that at glArrayElement time.
Reviewed-by: Brian Paul <[email protected]>
Signed-off-by: Mathias Fröhlich <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
We have to use _mesa_inside_dlist_begin_end instead of
_mesa_inside_begin_end to see if we are inside a glBegin/glEnd block in
case of display lists.
So split the is_vertex_position function used in vertex attribute processing
into a imm and dlist variant and use the appropriate _mesa_inside_begin_end
variant.
Reviewed-by: Brian Paul <[email protected]>
Signed-off-by: Mathias Fröhlich <[email protected]>
|
|
|
|
|
|
|
|
|
| |
That seems to be lost somewhere. Is needed for correct outside begin/end
detection in display list compilation. And is needed for correct aliasing
in dlists restablished in the next changes.
Reviewed-by: Brian Paul <[email protected]>
Signed-off-by: Mathias Fröhlich <[email protected]>
|