| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
We can just support them the same way we do load_const's SSA values.
|
|
|
|
|
|
|
|
|
| |
Since we just pulled it out of the destination as 8-bit unorm, we know
it's in [0, 1] already.
shader-db:
total instructions in shared programs: 100040 -> 98208 (-1.83%)
instructions in affected programs: 14084 -> 12252 (-13.01%)
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, SFU values always moved to a temporary, and TLB color reads
and texture reads always lived in r4. Instead, we can have these results
just be normal temporaries, and the register allocator can leave the
values in r4 when they don't interfere with anything else using r4.
shader-db results:
total instructions in shared programs: 100809 -> 100040 (-0.76%)
instructions in affected programs: 42383 -> 41614 (-1.81%)
|
| |
|
|
|
|
|
|
| |
needed for MRT
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
Both a3xx and a4xx need the same logic to decide if half-precision can
be used for blit shaders. So move it to core and simplify things a bit
with a helper that considers all render targets.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Collapse dirty/reading bools into status bitmask (and drop writing which
should really be the same as dirty). And use 'used_resources' list for
all tracking, including zsbuf/cbufs, rather than special casing the
color and depth/stencil buffers.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
Should be in units of components, not vec4's
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
We hard-coded 4 or 8 as the max in various places. Switch it all to a
define since the limit will go up with a4xx (and maybe even again in the
future?)
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
To circumvent a problem occuring when LINEAR_ALIGNED array mode is
selected on a TEXTURE_2D RAT.
This configuration causes MEM_RAT STORE_TYPED to write to incorrect
locations.
|
|
|
|
|
|
|
| |
Fixed by the CB_SHADER_MASK fix.
Tested-by: Dieter Nützel <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
| |
This fixes the single-sample fast clear hang.
Cc: 10.6 <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
No effect, but this is what we should be doing.
Tested-by: Dieter Nützel <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
Picked from the amdgpu branch.
Reviewed-by: Michel Dänzer <[email protected]>
Reviewed-by: Christian König <[email protected]>
|
|
|
|
|
|
| |
This reverts commit 8559f6ce62a9d5b52fa8189ba2352cd48bdabccf.
It causes hangs in DOTA 2 Reborn.
|
|
|
|
|
|
|
|
|
|
| |
Disabling the FP16 mode didn't help.
If needed, we can use this trick for blits too, but not for scaled blits.
+ 4 piglits
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
| |
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
| |
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Broken since:
46b2b3b - radeonsi: don't change pipe_resource in resource_copy_region
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91444
Reviewed-and-Tested-by: Michel Dänzer <[email protected]>
|
|
|
|
| |
st/nine uses GENERIC slots greater than 60.
|
|
|
|
| |
Reviewed-by: Michel Dänzer <[email protected]>
|
| |
|
|
|
|
| |
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
| |
This is for shader-db and should reduce size of shader dumps.
|
|
|
|
|
| |
This will be used by the new ddebug pipe. I'm including it now to avoid
conflicts with other patches.
|
|
|
|
|
|
|
|
| |
There are 2 reasons for this:
- LLVM optimization passes can work with floor
- there are patterns to select v_fract from floor anyway
There is no change in the generated code.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 4db985a5fa9ea985616a726b1770727309502d81.
The grass no longer disappears, which was the reason the commit was reverted.
This might affect tessellation. We'll see.
Totals from affected shaders:
SGPRS: 151672 -> 150232 (-0.95 %)
VGPRS: 90620 -> 89776 (-0.93 %)
Code Size: 3980472 -> 3920836 (-1.50 %) bytes
LDS: 67 -> 67 (0.00 %) blocks
Scratch: 1357824 -> 1202176 (-11.46 %) bytes per wave
Reviewed-by: Tom Stellard <[email protected]>
|
|
|
|
| |
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
| |
Reviewed-by: Tom Stellard <[email protected]>
|
|
|
|
| |
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
| |
duplicated now
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
| |
This will help remove some duplicated code from radeon.
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The patch has a better explanation. Just a summary here:
- The CPU always uploads a whole descriptor array to previously-unused memory.
- CP DMA isn't used.
- No caches need to be flushed.
- All descriptors are always up-to-date in memory even after a hang, because
CP DMA doesn't serve as a middle man to update them.
This should bring:
- better hang recovery (descriptors are always up-to-date)
- better GPU performance (no KCACHE and TC flushes)
- worse CPU performance for partial updates (only whole arrays are uploaded)
- less used IB space (no CP_DMA and WRITE_DATA packets)
- simpler code
- hopefully, some of the corruption issues with SI cards will go away.
If not, we'll know the issue is not here.
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
| |
This also moves the vec4-to-byte-addressing math into NIR, so that
algebraic has a chance at it.
|
| |
|
|
|
|
|
| |
I want to be able to inspect them from other files for lowering passes in
NIR.
|
|
|
|
|
| |
For now this is just scalarizing, but it also means we'll get to dump a
bunch of QIR-based lowering in a moment.
|
|
|
|
|
|
| |
For now, this just splits up store_output intrinsics to be scalars, and
drops unused outputs in the coordinate shader. My goal is to be able to
drop a bunch of my VC4-specific optimization by letting NIR handle it.
|
|
|
|
|
| |
I had my understanding of this bit flipped. We're using the full register
space, so we need to say so.
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This enables GL4.1 for radeonsi, and updates the
docs in the correct places.
v2: enable only for llvm 3.7 which has fixes in place.
Reviewed-by: Marek Olšák <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is the final piece for ARB_gpu_shader5,
The code is based on the r600 code from Glenn Kennard,
and myself.
While developing this, I'm not 100% sure of all the calculations
made in the GS registers, this is why the max_stream is worked
out there and used to limit the changes in registers. Otherwise
my initial attempts either regressed GS texelFetch tests
or primitive-id-restart. The current code has no regressions
in piglit.
This commit doesn't enable ARB_gpu_shader5, since that just
bumps the glsl level to 4.00, so I'll just do a separate patch
for 4.10.
v1.1: fix bug introduced in rebase.
v2: Address Marek's review comments,
remove my llvm stream code for simpler C,
move gsvs_ring and gs_next_vertex to arrays.
Reviewed-by: Marek Olšák <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
| |
There isn't a single instance in mesa that
mentions HAVE_SYS_TYPES_H, other than this file.
Cc: Jose Fonseca <[email protected]>
Acked-by: Brian Paul <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
| |
The global CSE pass stinks and is unable to pull this out. Easy enough
to handle it here and avoid generating unnecessary special register
loads (which can allegedly be quite slow).
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
PFETCH retrieves the address for incoming vertices, not output vertices
in TCS. For output vertices, we must use the laneid as a base.
Fixes barrier piglit test, which was failing for entirely non-barrier
reasons, but rather that it was (a) trying to draw multiple patches and
(b) the incoming patch size was not the same as the outgoing patch size.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds to the common radeon streamout code, support
for multiple streams.
It updates radeonsi/r600 to set the enabled mask up.
v2: update for changes in previous patch.
Reviewed-by: Marek Olšák <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This will be used here later.
v2: update atom sizes
add check for old vs new enabled mask
Reviewed-by: Marek Olšák <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|