| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
This removes the barrier and LDS stores and loads for tess factors
when it's possible. The removal of the barrier seems more important
to me though.
In one shader, it removes 17 * 4 bytes from the shader binary.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
Same as before, writing TCS outputs to LDS is rare.
Tested-by: Dieter Nützel <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
TCS outputs are usually not written to LDS, so no stats here.
Tested-by: Dieter Nützel <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
-44 bytes in a monolithic LS-HS binary.
Tested-by: Dieter Nützel <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Now it's able to generate ds_write2_b64 instead of ds_write2_b32.
-20 bytes in one shader binary. (having only 1 output)
Tested-by: Dieter Nützel <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
-16 bytes in one shader binary.
Tested-by: Dieter Nützel <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
| |
It looks like commit 391673af7ad1565a5f6ac8fc2f8c9fcdd1fe9908 that should
have fixed the perf regression didn't really change much if anything.
Tested-by: Dieter Nützel <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Tested-by: Dieter Nützel <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When the HS wave is empty, the hardware writes the LS VGPRs starting at
v0 instead of v2. Workaround by shifting them back into place when
necessary. For simplicity, this is always done in the LS prolog.
According to the hardware team, this will be fixed in future chips,
so take that into account already.
Note that this is not a bug fix, as the bug was already worked
around by commit 166823bfd26 ("radeonsi/gfx9: add a temporary workaround
for a tessellation driver bug"). This change merely replaces the
workaround by one that should be better.
v2: add workaround code to shader only when necessary
v3: clarify the prefer_mono comment
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Acked-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
This fixes GL45-CTS.shader_image_load_store.basic-glsl-earlyFragTests.
Cc: [email protected]
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
This increases performance, but it was tuned for Raven, not Vega.
We don't know yet how Vega will perform, hopefully not worse.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
3 flags for primitive binning, 2 flags for out-of-order rasterization
(but that will be done some other time)
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
| |
The data is read when the render_cond_atom is emitted, so we must
delay emitting the atom until after the flush.
Fixes: 0fe0320dc074 ("radeonsi: use optimal packet order when doing a pipeline sync")
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
The result written by the shader workaround needs to be written back, or
the CP may read stale data.
Fixes: 78476cfe071a ("radeonsi: enable ARB_transform_feedback_overflow_query")
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Fixes: 420c438589c8 ("radeonsi: log draw and compute state into log context")
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
used
v2: remove some redundant checks
Acked-by: Roland Scheidegger <[email protected]> (v1)
Tested-by: Dieter Nützel <[email protected]> (v1)
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
| |
For radv, in order to report VM faults when detected.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This fixes a rendering issue with Hitman when bindless textures
are enabled.
Fixes: 2263610827 ("radeonsi: flush DB caches only when transitioning from DB to texturing")
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
This is still very simple, but it's better than before.
Loosely ported from Vulkan.
|
|
|
|
|
|
| |
v2: don't special-case Tonga and Iceland.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit fc99cb3c9edee3af773700cf7ebdc60dc02fcaba.
"The performance went down from 64.7 to 51.4 fps in Valley and from 30.8 to
25.1 fps in Heaven on Radeon HD 7970. Other games seem to have also a 10-25%
performance decrease."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102429
It looks like we can't use the raster config values from the kernel.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In 74e39de9324d it was set to 3 and it was reported that 4 caused
tesseract to start spilling VGPRs. This no longer seems to be the
case.
Totals:
SGPRS: 2787844 -> 2787764 (-0.00 %)
VGPRS: 1713121 -> 1712717 (-0.02 %)
Spilled SGPRs: 7532 -> 7532 (0.00 %)
Spilled VGPRs: 49 -> 33 (-32.65 %)
Private memory VGPRs: 2060 -> 2060 (0.00 %)
Scratch size: 2200 -> 2180 (-0.91 %) dwords per thread
Code Size: 79265520 -> 79248360 (-0.02 %) bytes
LDS: 436 -> 436 (0.00 %) blocks
Max Waves: 670535 -> 670608 (0.01 %)
Wait states: 0 -> 0 (0.00 %)
Before:
VGPR SPILLING APPS Shaders SpillVGPR PrivVGPR ScratchSize
EffectsCaveDemo 301 0 256 264
ReflectionsSubwayDemo 264 0 256 264
VehicleGame 295 0 128 132
bioshock-infinite 1140 0 448 516
dirt-showdown 453 33 0 28
gang-beasts 364 0 500 496
kerbal-space-program 1228 0 472 480
tomb-raider-ultra 1199 16 0 20
After:
VGPR SPILLING APPS Shaders SpillVGPR PrivVGPR ScratchSize
EffectsCaveDemo 301 0 256 264
ReflectionsSubwayDemo 264 0 256 264
VehicleGame 295 0 128 132
bioshock-infinite 1140 0 448 516
dirt-showdown 453 33 0 28
gang-beasts 364 0 500 496
kerbal-space-program 1228 0 472 480
The only change in VGPR spills is the elimination of all spills
in Tomb Raider at Ultra settings. Closer examination shows that
the shaders go over the limit because they contain three
expressions a mul, rcp and ubo load. The ubo load is actually
used elsewhere and is therefore stored in a temp already in IR
such as tgsi but glsl ir counts it agaist the if cost.
Acked-by: Nicolai Hähnle <[email protected]>
Acked-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
We don't actually write them to disk here. That will happen in the
following commit.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
Not sure yet if we wanna do this on CIK and VI too.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
Bad mistake, sorry.
Signed-off-by: Samuel Pitoiset <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
| |
When assertions were disabled, the compiler removed
the call to util_idalloc_alloc() and the first allocated
bindless slot was 0 which is invalid per the spec.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
| |
I think it makes more sense.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
clip_regs aren't marked dirty when writes_viewport_index is changed.
Cc: 17.2 <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
+ 4 piglit regressions, but it's correct accorcing to the GL spec and
performance is more important than piglit.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|