| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
| |
Drivers that support this benefit by saving one lowering pass in the
GLSL-to-TGSI conversion.
radeonsi already supports this because all outputs are stored in temporary
variables before the export (except for TCS outputs, which have always
been readable in TGSI anyway due to their special semantics).
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Instead of (incorrectly) biasing the snorm value to make it look like a
unorm, just use signed integer math.
This fixes arb_color_buffer_float-render GL_RGBA8_SNORM
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Tim Rowley <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Tim Rowley <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Tim Rowley <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Tim Rowley <[email protected]>
|
|
|
|
|
|
|
| |
Everything is in place for these.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Tim Rowley <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
I find this a lot more readable and compact - much easier to scan
through the list and see what's on and what's off.
No functional change intended.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Tim Rowley <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Tim Rowley <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The compiler doesn't shrink s_load_dwordx8, so we always wasted 4 SGPRs.
Also, the extraction of the descriptor created some really ugly asm code
with lots of VALU bitwise ops and v_readfirstlane.
Totals from *affected* shaders:
SGPRS: 13880 -> 13253 (-4.52 %)
VGPRS: 15200 -> 15088 (-0.74 %)
Code Size: 499864 -> 459816 (-8.01 %) bytes
Max Waves: 1554 -> 1564 (0.64 %)
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
My LLVM commit disables it for dGPUs, but not APUs.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
v2: only do this if debug output of shader dumping is enabled
Reviewed-by: Nicolai Hähnle <[email protected]> (v1)
|
|
|
|
|
|
| |
PS and CS don't have any param exports, so it's a no-op.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
| |
This fixes dual source blending on Stoney. The fix was copied from Vulkan.
The problem was discovered during internal testing.
Cc: 13.0 <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
copied from Vulkan
Cc: 13.0 <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
better safe than sorry
Cc: 13.0 <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
better safe than sorry; set_framebuffer_state always makes this dirty
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
Debugging a shader-db reported cycle count regression from the tex
coalescing, I eventually figured out that the texture latencies were
totally bogus. Really fixing it will probably involve mirroring
vc4_qir_schedule.c's texture fifo management here.
|
|
|
|
|
|
|
|
|
|
|
| |
This isn't as complete as I would like (can't merge interpolation because
of the implicit r5 dependency, doesn't work with control flow), but this
was cheap and easy.
Improves 3DMMES Taiji performance by 1.15353% +/- 0.299896% (n=29, 16)
total instructions in shared programs: 99810 -> 99059 (-0.75%)
instructions in affected programs: 10705 -> 9954 (-7.02%)
|
|
|
|
|
| |
For texturing, there won't be a fixed limit on how many writes there are,
so we need to compute uses up front.
|
|
|
|
|
| |
The dead code elimination wants it to be safe, and I actually got
segfaults due to it being unsafe with the new coalescing pass.
|
|
|
|
|
|
| |
The VPM write logic will be basically the same as the texture coordinate
write logic we need, and it's not really related to the VPM read logic
other than the reuse of the use_count array.
|
|
|
|
|
| |
For now we're still just generating MOVs, but this will let us fold into
other ops in the future. No difference on shader-db.
|
|
|
|
|
|
| |
Every caller was dereffing the qinst, and this will let us make the number
of sources vary depending on the destination of the qinst so that we can
have general ALU ops that store to tex_[strb] and get an implicit uniform.
|
|
|
|
|
|
| |
This may have made a tiny bit of sense when we had one 4-arg inst per
shader, but if we only ever put 2 things in, having a pointer to 2 things
almost every instruction is pointless indirection.
|
|
|
|
|
| |
This was used originally for unorm4x8 packs, but we now represent those as
a series of packed movs.
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Tim Rowley <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Tim Rowley <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Tim Rowley <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Among other things, blits would clear existing SO targets which would
cause a bunch of updates from u_blitter to be missed.
Fixes fbo-scissor-blit fbo, probably among many others.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Tim Rowley <[email protected]>
|
|
|
|
|
|
| |
picture order count can be a negative value
Reviewed-by: Christian König <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This is not allowed for indirect accesses because the source
GPR might be erased by a subsequent instruction (WaR hazard)
if we don't emit a read dep bar.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This will allow to use MOV instead of LD. The main advantage is
that MOV doesn't require a read dependency barrier while LD does,
and so this will both reduce barriers pressure and the number of
stall counts needed to read data from constant memory.
This is currently only for user uniform accesses. I should do
something similar when loading from the driver constant buffer
but it seems like a bit tricky to handle for now.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The commit 8e430ff8b060b4e8e922bae24b3c57837da6ea77 broke support for
LLVM 3.9 and older versions in Clover. This patch restores it and
refactors the support using Clover compatibility layer for LLVM.
v2: merged #ifdef blocks
v3: added support for LLVM 3.6-3.8
v4: add missing #ifdef around <memory>
v5: simplify using templates and lambda
Signed-off-by: Vedran Miletić <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98740
Tested-by[v4]: Pierre Moreau <[email protected]>
Tested-by: Vinson Lee <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
Reviewed-by: Jan Vesely <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Vinson Lee <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
| |
This patch deletes those fragment shaders in util_blitter_destroy().
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Tim Rowley <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Currently clears only operate on the 0th array index (ignoring surface
layout parameters). Instead normalize to take a RTAI like all the
load/store tile logic does, and use ComputeSurfaceAddress to properly
take the surface state's lod/array index into account.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Tim Rowley <[email protected]>
|
|
|
|
|
|
|
|
| |
The non-fast-clear path was never updated after clear colors were passed
in as floats. Remove the now-harmful conversion from unorm8.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Tim Rowley <[email protected]>
|
|
|
|
|
|
|
|
|
| |
When switching render target array indexes (as might happen in a GS, or
in a future change, with layered clears), if the previous state is
HOTTILE_CLEAR, we should actually clear the tile before saving it off.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Tim Rowley <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
Bad type cast for stencil clear value was picking up structure
padding bytes.
Reviewed-by: Ilia Mirkin <[email protected]>
|