| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
We'll also want it in NIR f/e for implementing UBO support.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Basically just sync up the cmdstream emit parts to match the changes
already done on a3xx.
Also, fix scheduling for mem instructions. This is needed on a4xx, and
I am a bit surprised it isn't needed for a3xx.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Tested-by: Tom Stellard <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
As mentioned by Michel Dänzer for LLVM >= 3.6 we create the
LLVMTargetMachine (with triple amdgcn--), as we setup the radeonsi
context. For older LLVM or hardware (r600) the triple is always r600--
and is created at a later stage - radeon_llvm_compile()
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Fixes assert triggered by
ext_transform_feedback-intervening-read output use_gs
piglit test.
Signed-off-by: Glenn Kennard <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
| |
We could potentially support the right combination of 8888 to 565, but the
important thing for now is to not mix up our orderings of 8888. Fixes
fbo-copyteximage regressions.
|
|
|
|
|
| |
Fixes regressions in fbo-generatemipmap-formats on depth/stencil (which
does blits to work around baselevel/lastlevel).
|
|
|
|
|
| |
We don't know who else has written to it, so we'd better update it every
time. This makes the gears spin in X again.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Rather than emitting one kill instruction per component of KILL_IF's src
reg, we now or the components of the src register together and use the
result as a condition for just one kill instruction.
shader-db stats (bonaire):
979 shaders
Totals:
SGPRS: 34872 -> 34848 (-0.07 %)
VGPRS: 20696 -> 20676 (-0.10 %)
Code Size: 749032 -> 748452 (-0.08 %) bytes
LDS: 11 -> 11 (0.00 %) blocks
Scratch: 12288 -> 12288 (0.00 %) bytes per wave
Totals from affected shaders:
SGPRS: 1184 -> 1160 (-2.03 %)
VGPRS: 600 -> 580 (-3.33 %)
Code Size: 13200 -> 12620 (-4.39 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave
Increases:
SGPRS: 2 (0.00 %)
VGPRS: 0 (0.00 %)
Code Size: 0 (0.00 %)
LDS: 0 (0.00 %)
Scratch: 0 (0.00 %)
Decreases:
SGPRS: 5 (0.01 %)
VGPRS: 5 (0.01 %)
Code Size: 25 (0.03 %)
LDS: 0 (0.00 %)
Scratch: 0 (0.00 %)
*** BY PERCENTAGE ***
Max Increase:
SGPRS: 32 -> 40 (25.00 %)
VGPRS: 0 -> 0 (0.00 %)
Code Size: 0 -> 0 (0.00 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave
Max Decrease:
SGPRS: 32 -> 24 (-25.00 %)
VGPRS: 16 -> 12 (-25.00 %)
Code Size: 116 -> 96 (-17.24 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave
*** BY UNIT ***
Max Increase:
SGPRS: 64 -> 72 (12.50 %)
VGPRS: 0 -> 0 (0.00 %)
Code Size: 0 -> 0 (0.00 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave
Max Decrease:
SGPRS: 32 -> 24 (-25.00 %)
VGPRS: 16 -> 12 (-25.00 %)
Code Size: 424 -> 356 (-16.04 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This should improve code quality in general and will help with some
future changes to how we emit kill instructions.
shader-db shows a few regressions, but these don't seem to be the result
of deficiencies in instcombine. They're mostly caused by the scheduler
making different decisions than before.
shader-db stats (bonaire):
979 shaders
Totals:
SGPRS: 35056 -> 34872 (-0.52 %)
VGPRS: 20624 -> 20696 (0.35 %)
Code Size: 764372 -> 749032 (-2.01 %) bytes
LDS: 11 -> 11 (0.00 %) blocks
Scratch: 12288 -> 12288 (0.00 %) bytes per wave
Totals from affected shaders:
SGPRS: 13264 -> 13072 (-1.45 %)
VGPRS: 8248 -> 8316 (0.82 %)
Code Size: 486320 -> 470992 (-3.15 %) bytes
LDS: 11 -> 11 (0.00 %) blocks
Scratch: 11264 -> 11264 (0.00 %) bytes per wave
Increases:
SGPRS: 6 (0.01 %)
VGPRS: 20 (0.02 %)
Code Size: 14 (0.01 %)
LDS: 0 (0.00 %)
Scratch: 0 (0.00 %)
Decreases:
SGPRS: 32 (0.03 %)
VGPRS: 8 (0.01 %)
Code Size: 244 (0.25 %)
LDS: 0 (0.00 %)
Scratch: 0 (0.00 %)
*** BY PERCENTAGE ***
Max Increase:
SGPRS: 32 -> 48 (50.00 %)
VGPRS: 12 -> 20 (66.67 %)
Code Size: 216 -> 224 (3.70 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave
Max Decrease:
SGPRS: 40 -> 32 (-20.00 %)
VGPRS: 16 -> 12 (-25.00 %)
Code Size: 368 -> 280 (-23.91 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave
*** BY UNIT ***
Max Increase:
SGPRS: 32 -> 48 (50.00 %)
VGPRS: 28 -> 36 (28.57 %)
Code Size: 39320 -> 40132 (2.07 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave
Max Decrease:
SGPRS: 72 -> 64 (-11.11 %)
VGPRS: 48 -> 40 (-16.67 %)
Code Size: 6272 -> 5852 (-6.70 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
This makes it easier to parse.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
This accelerates the path for generating the shadow tiled texture when
asked to sample from a raster texture (typical in glamor).
|
|
|
|
|
|
| |
For blitting, we want to fire off an RCL-only job. This takes a bit of
tweaking in our validation and the simulator support (and corresponding
new code in the kernel).
|
|
|
|
|
| |
There will be other blit code showing up, and it seems like the place
you'd look.
|
|
|
|
|
|
| |
I want to be able to have multiple jobs being set up at the same time (for
example, a render job to do a little fixup blit in the course of doing a
render to the main FBO).
|
|
|
|
|
|
|
|
|
|
|
| |
So, it turns out my simulator doesn't *quite* match the hardware. And the
errata about raster textures tells you most of what's wrong, but there's
still stuff wrong after that. Instead, if we're asked to sample from
raster, we'll just blit it to a tiled temporary.
Raster textures should only be screen scanout, and word is that it's
faster to copy to tiled using the tiling engine first than to texture from
an entire raster texture, anyway.
|
| |
|
|
|
|
| |
This fixes the idiv tests in piglit.
|
|
|
|
|
|
| |
These are required to get piglit's idiv tests working. The
unsigned<->float conversions are wrong, but are good enough to get
piglit's small ranges of values working.
|
|
|
|
|
| |
This lets us plug in a better blit implementation and have it impact the
shadow update, too.
|
| |
|
| |
|
|
|
|
| |
There was no reason to tie the two packets' values together.
|
|
|
|
|
|
| |
We're over-allocating our BCL in vc4_draw.c, so this never mattered.
However, new RCL-only blit support might end up here without having set up
any BCL contents.
|
|
|
|
| |
This wouldn't have mattered except in the worst case scenario RCL setup.
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
These correspond to the tgsi TXQ opcode
(plus sneak in a fix for two-sided color)
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
We'll need these in one or two other spots.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
Just build up arrays for src0/src1, and use create_collect()..
Also add back missing .3d flag for 3d/cube textures.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
I noticed some cases where we where trying to copy-propagate indirect
src's into places they cannot go, like 2nd src for cat3 (mad, etc).
Expand out valid_flags() to be aware of relativ flag, and fix up a few
related spots.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When we get in a scenario where we cannot schedule any more instructions
due to address register conflict, clone the instruction that writes the
address register, and switch the remaining unscheduled users for the
current address register over to the new clone.
This is simpler and more robust than the previous attempt (which tried
and sometimes failed to ensure all other dependencies of users of the
address register were scheduled first).. hint it would try to schedule
instructions that were not actually needed for any output value.
We probably need to do the same with predicate register, although so far
it isn't so heavily used so we aren't running into problems with it
(yet).
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
A bit fugly.. try and make this cleaner.. note if we hoist all the
get_addr() out of the loop we can drop the hashtable and just use
create_addr()..
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
It probably *should* be an assert, but for now TGSI f/e isn't very good
about dealing w/ CONST vs ABS/NEG. So for debug builds, print a warning
instead of crashing with an assert for now.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Without this, a3xx breaks.. a4xx would too if it had already implemented
support for passing driver params.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For a normal MAD (ie. not MADSH), if first source is gpr and second
source is const, we can swap the first two sources to avoid needing a
mov instruction.
This gives back the biggest advantage TGSI f/e had over NIR f/e for
common shaders, since TGSI f/e had this logic in the f/e. Note that
doing this in copy-prop step has the advantage that it will also work
for cases like:
MOV TEMP[b], CONST[x]
MAD TEMP[d], TEMP[a], TEMP[b], TEMP[c]
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
llvm goes crazy when doing that, using way more memory and time, though there's
probably more to it - this points to a very much similar issue as fixed in
8a9f5ecdb116d0449d63f7b94efbfa8b205d826f. In any case I've seen a quite
plain looking vertex shader with just ~50 simple tgsi instructions (but with a
dozen or so such indirect constant buffer lookups) go from a terribly high
~440ms compile time (consuming 25MB of memory in the process) down to a still
awful ~230ms and 13MB with this fix (with llvm 3.3), so there's still obvious
improvements possible (but I have no clue why it's so slow...).
The resulting shader is most likely also faster (certainly seemed so though
I don't have any hard numbers as it may have been influenced by compile times)
since generally fetching constants outside the buffer range is most likely an
app error (that is we expect all indices to be valid).
It is possible this fixes some mysterious vertex shader slowdowns we've seen
ever since we are conforming to newer apis at least partially (the main draw
loop also has similar looking conditionals which we probably could do without -
if not for the fetch at least for the additional elts condition.)
v2: use static vars for the fake bufs, minor code cleanups
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add SV_GEOMETRY_EMIT special variable type to track the
implicit dependencies between CUT/EMIT_VERTEX/MEM_RING
instructions so GCM/scheduler doesn't reorder them.
Mark emit instructions as unkillable so DCE doesn't eat them.
Enable only for evergreen/cayman as there are a few
unexplained GS piglit regressions on R6xx/R7xx with SB
enabled otherwise.
Signed-off-by: Glenn Kennard <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
CF_END could end up emitted in the middle of a shader on cayman
when there was a loop at the very end.
Fixes glsl-1.50-geometry-end-primitive and
ext_transform_feedback-geometry-shaders-basic piglit tests.
Signed-off-by: Glenn Kennard <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We limit y-tiling to 0x20 when depth is involved. However the function is
run for each miplevel, and the hardware expects miplevel 0 to have the
highest tiling settings. Perform the y-tiling limit on all levels of a
3d texture, not just the ones that have depth.
Fixes:
texelFetch fs sampler3D 98x129x1-98x129x9
Signed-off-by: Ilia Mirkin <[email protected]>
Tested-by: Nick Tenney <[email protected]> # GT216
Cc: "10.4 10.5" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This code to handle absolute values on op3 srcs was a bit too simple,
it really needs a temp reg per src, not one per channel, make it
easier and let sb clean up the mess.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89831
Reviewed-by: Glenn Kennard <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The NIR compiler frontend is an alternative to the TGSI f/e, producing
the same ir3 IR and using the same backend passes for scheduling, etc.
It is not enabled by default yet, as there are still some regressions.
To enable, use 'FD_MESA_DEBUG=nir'. It is enough to use with, for
example, xonotic or supertuxkart.
With the NIR f/e, scalarizing and a number of other lowering steps
happen in NIR, so we don't have to do them in ir3. Which simplifies the
f/e and allows the lowered instructions to pass through other
optimization stages.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
| |
Use the correct sprite replacement depending on the flip of the coord
mode, using either T or 1-T depending on whether we have an upper-left or
lower-left coordinate origin. This fixes all the point sprite piglits.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|