| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
| |
Diff from shader-db:
Scratch: 3221504 -> 17408 (-99.46 %) bytes per wave
v2: add "break;"
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
This reverts commit ffd54d1936fcd07424265b780e1d049222a01e94.
No, it doesn't work. The test case is "glxgears -samples 2".
|
|
|
|
|
|
| |
disabled by 4f1cccf570112f93265a4cace504eb763fa8f73e
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
| |
Since DCC is enabled almost everywhere now, it's important not to disable
this fast path.
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
| |
No idea why this was disabled, but it works fine.
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
| |
Allocating it has no effect, but it adds overhead (useless DCC clear).
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
| |
If first_level > 0 and DCC is disabled for that level, let's skip DCC
reads entirely.
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
| |
Also add dcc_fast_clear_size for clearing only the necessary subset
of DCC. For no AA, it's equal to the size of the whole DCC level.
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
| |
We want to keep DCC enabled to save bandwidth. It was a bad idea to disable
it here.
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
| |
This will get more complicated with mipmapped DCC or when DCC is enabled
after allocation.
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
R9G9B9E5 is the only uncompressed one hopefully.
This fixes incorrect rendering not discovered (due to a lack of tests)
until DCC mipmapping was enabled.
Cc: 11.1 11.2 12.0 <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
WQM is needed when the PS prolog computes a VGPR that is consumed by a shader
with (implicit or explicit) derivatives.
Depends on http://reviews.llvm.org/D20839 / LLVM r272063 for this to be
effective (otherwise it's just a no-op).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95130
Cc: 12.0 <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Use rasterizer provoking vertex API.
Fix rasterizer provoking vertex for tristrips and quad list/strips.
v2: make provoking vertex tables static const
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Sometimes a register source can actually be double- or even quad-wide.
We must make sure that the inserted texbars take that width into
account.
Based on an earlier patch by Samuel Pitoiset.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
Cc: "12.0 11.2" <[email protected]>
|
|
|
|
|
|
|
| |
Reduces CPU load for draw calls that change none or few of the descriptors.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
This will simplify moving them to a per-context array.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
So that callers outside of si_descriptors.c need to worry less about the
details of descriptor handling.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
This mask is irrelevant for the generic descriptor set handling, and having it
outside simplifies subsequent changes slightly.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
v2: use a function for calculating WORD1 of bo metadata
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We don't import textures with DCC now, but soon we will.
v2: if we can't disable DCC for image writes, at least decompress DCC
at bind time
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Like floats, we should use the round toward 0 mode instead of the
nearest one (which is the default) for doubles to integers.
This fixes all arb_gpu_shader_fp64 piglits which convert doubles to
integers (16 tests).
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Cc: "11.2 12.0" <[email protected]>
|
|
|
|
|
|
| |
CMASK has no effect on metadata, because it's not sharable.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
Ported from Vulkan.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
| |
Ported from Vulkan.
v2: keep the comment
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
This improves MSAA resolve performance.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
This improves MSAA resolve performance.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
| |
Found randomly while skimming the code. This might have caused VM faults in
robustness tests.
Cc: 12.0 <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
When TGSI debug flag is enabled, print the shader linkage info as well.
Tested with mesa demos with SVGA_DEBUG=tgsi
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
| |
This is mostly academic as this is not reachable from GL, which only has
the packed RGB10_A2UI vertex format.
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: "12.0" <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We should not call nouveau_bufctx_reset() inside a validate function.
This only affects Fermi where images are aliased between 3D and CP.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Cc: "12.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Images invalidation is a bit weird on Fermi and there is already a hack
which forces invalidating all images when launching a computer shader
to help in fixing 3D<->CP interaction.
However, we need to re-validate images for compute because
nvc0_compute_invalidate_surfaces() will destroy the previous binding.
This is not really good for performance purposes but this might be
improved later.
This fixes the following piglits:
- spec/arb_compute_shader/execution/basic-uniform-access
- spec/arb_compute_shader/execution/mutiple-texture-reading
- spec/arb_compute_shader/execution/multiple-workgroups
- spec/glsl-4.30/execution/built-in-functions/cs-* (207 tests)
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Cc: "12.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This should fix spec@arb_shader_image_load_store@level.
Broken by:
Commit: 95c5bbae66af3ca1f805d94f6fe8d8e4ba2c9c43
radeonsi: set some image descriptor fields at bind time
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
We would revalidate images when anything was touched at all. Which is
unfortunate, since the state tracker does not use CSO's to reduce the
workload. So instead implement a protocol to ensure that something has
changed before revalidating all the images.
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: "12.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
We would revalidate buffers when anything was touched at all. Which is
unfortunate, since the state tracker does not use CSO's to reduce the
workload. So instead implement a protocol to ensure that something has
changed before revalidating all the SSBOs.
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: "12.0" <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: "12.0" <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: "12.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Submitting a DMA IB flushes the GFX IB and all GPU caches.
Vedran Miletić said:
"On Tonga 380X, this improves The Talos Principle from 8.3 fps to 28.3 fps
(all graphics settings Ultra, 4xAA, 1080p resolution with downsampling
from 1200p)."
Some anonymous dude said:
R9 390 results:
Tomb Raider (normal settings): 80 -> 88 FPS
Talos Principle (custom settings): 23 -> 56 FPS
Metro Last Light Redux (default benchmark settings): 39 -> 40 FPS
Reviewed-by: Alex Deucher <[email protected]>
Tested-by: Vedran Miletić <[email protected]>
Tested-by: Grazvydas Ignotas <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Alex Deucher <[email protected]>
Tested-by: Grazvydas Ignotas <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
|