| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
Needs to update max_half_reg, or be remapped to full reg and update
max_reg accordingly, depending on generation..
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
Because who are we kidding... it is a sysval.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The compiler support for:
OES_sample_shading
OES_sample_variables
OES_shader_multisample_interpolation
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
The so->inputs[] table is in units of vec4
Fixes: 7ff6705b8d8 freedreno/ir3: convert to "new style" frag inputs
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Since this is what the value actually is. Cleanup the name before
adding more different i,j related values for sample-shading.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
tex instruction can actually return 16b values.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Calculates i,j at specified offset within a pixel. A new load_size_ir3
intrinsic is used in conjunction with fddx/fddy to translate the offset
into primitive space and adjust the i,j from load_barycentric_pixel
accordingly.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
This lowers load_barycentric_at_sample to load_sample_pos_from_id plus
load_barycentric_at_offset.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
De-duplicate the "normal" and "flags" versions of the macros, and while
at it go ahead and add "flags" versions for all the remaining macros,
since we'll at least need INSTR1F in a following commit.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
Couple more opcodes which don't take a sampler id as first arg.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
It takes an argument.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
And add corresponding enums for different sorts of varying
interpolation.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
| |
Fixes: fe8c57e859d freedreno/ir3: use nir_src_as_uint in a few places
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
Fixes a few warnings.
Signed-off-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
v2: remove & operator in a couple of memsets
add some memsets
v3: fixup lima
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]> (v2)
|
|
|
|
|
|
|
|
| |
v2 (Jason Ekstrand):
- Add even more places
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
This required to calculate sizes correctly when we have bindless
samplers/images.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We have a pass to lower global registers to locals and many drivers
dutifully call it. However, no one ever creates a global register ever
so it's all dead code. It's time we bury it.
Acked-by: Karol Herbst <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When I implemented opt_if_loop_last_continue() I had restricted
this pass from moving other if-statements inside the branch opposite
the continue. At the time it was causing a bunch of spilling in
shader-db for i965.
However Samuel Pitoiset noticed that making this pass more aggressive
significantly improved the performance of Doom on RADV. Below are
the statistics he gathered.
28717 shaders in 14931 tests
Totals:
SGPRS: 1267317 -> 1267549 (0.02 %)
VGPRS: 896876 -> 895920 (-0.11 %)
Spilled SGPRs: 24701 -> 26367 (6.74 %)
Code Size: 48379452 -> 48507880 (0.27 %) bytes
Max Waves: 241159 -> 241190 (0.01 %)
Totals from affected shaders:
SGPRS: 23584 -> 23816 (0.98 %)
VGPRS: 25908 -> 24952 (-3.69 %)
Spilled SGPRs: 503 -> 2169 (331.21 %)
Code Size: 2471392 -> 2599820 (5.20 %) bytes
Max Waves: 586 -> 617 (5.29 %)
The codesize increases is related to Wolfenstein II it seems largely
due to an increase in phis rather than the existing jumps.
This gives +10% FPS with Doom on my Vega56.
Rhys Perry also benchmarked Doom on his VEGA64:
Before: 72.53 FPS
After: 80.77 FPS
v2: disable pass on non-AMD drivers
Reviewed-by: Ian Romanick <[email protected]> (v1)
Acked-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Add support for load_barycentric_pixel, load_interpolated_input, and
friends. For now, this retains support for old-style inputs, which can
probably be dropped with some ttn work.
Prep work for sample-shading support.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Originally we kept track of a table of inputs. But with new-style frag
inputs this becomes awkward. Re-work it so that initially we assigned
un-packed varying locations, and then after the shader is compiled scan
to find actual used inputs, and re-pack.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Make it more clear that it applies to the following 'case' statements,
rather than the previous one.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Not sure why new-style frag inputs start triggering this. But we
probably shouldn't consider src's from other blocks.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This instruction needs a workaround when used from vertex shaders.
Fixes:
dEQP-GLES3.functional.shaders.texture_functions.texturegradoffset.sampler2dshadow_vertex
dEQP-GLES3.functional.shaders.texture_functions.texturegradoffset.sampler3d_fixed_vertex
dEQP-GLES3.functional.shaders.texture_functions.texturegradoffset.sampler3d_float_vertex
dEQP-GLES3.functional.shaders.texture_functions.textureprojgradoffset.sampler2dshadow_vertex
dEQP-GLES3.functional.shaders.texture_functions.textureprojgradoffset.sampler3d_fixed_vertex
dEQP-GLES3.functional.shaders.texture_functions.textureprojgradoffset.sampler3d_float_vertex
dEQP-GLES3.functional.shaders.texture_functions.textureprojgrad.sampler2dshadow_vertex
Signed-off-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
emit_cat5() needs to check if the last optional reg is there before it
accesses it.
Signed-off-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
We have a rather big constant file and it seems that the best way to
use it is to upload all UBOs and lower UBO access the load_uniform.
Signed-off-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit turns on the gallium cap and adds a pass to lower the
load_ubo intrinsics for block 0 back to load_uniform intrinsics and
adjust the backend where the cap switches units from vec4s to dwords.
As we stop using ir3_glsl_type_size() for uniform layout, this also
corrects an issue where we would allocate a vec4 slot for samplers in
uniforms, fixing:
dEQP-GLES3.functional.shaders.struct.uniform.sampler_array_fragment
dEQP-GLES3.functional.shaders.struct.uniform.sampler_array_vertex
dEQP-GLES3.functional.shaders.struct.uniform.sampler_nested_fragment
dEQP-GLES2.functional.shaders.struct.uniform.sampler_nested_vertex
dEQP-GLES2.functional.shaders.struct.uniform.sampler_nested_fragment
Signed-off-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Most cat5 instructions are constructed using ir3_SAM, which uses
regs[1] for the (sampler, tex) src. Not DSX/DSY though, so we look up
src1 and src2 differently for those two.
Fixes: 1dffb089 ("freedreno/ir3: fix sam.s2en encoding")
Signed-off-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In 1088b788 ("freedreno/ir3: find # of samplers from uniform vars") we
started counting number of samplers based on the uniform vars instead
of number of cat5 instructions. We used the number of samplers to
determine whether to enable derivatives, but when we only use
derivatives and no samplers, that now breaks. Track whether we need
derivatives explicitly and use that to enable the state.
Fixes: 1088b788 ("freedreno/ir3: find # of samplers from uniform vars")
Signed-off-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes:
dEQP-GLES31.functional.image_load_store.early_fragment_tests.no_early_fragment_tests_depth
dEQP-GLES31.functional.image_load_store.early_fragment_tests.no_early_fragment_tests_stencil
dEQP-GLES31.functional.image_load_store.early_fragment_tests.no_early_fragment_tests_depth_fbo
dEQP-GLES31.functional.image_load_store.early_fragment_tests.no_early_fragment_tests_stencil_fbo
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
There are other cases where we need to disable early-z, like image
writes. So rename to something more generic.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Fixes dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.uniform_fragment
and similar things with multiple UBOs
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Seems like it can only work 16b at a time. Fixes
dEQP-GLES31.functional.shaders.builtin_functions.integer.bitcount.*
TODO need to check if this limitation applies to a3xx as well.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
For some things that show up when we expose higher glsl
TODO check blob traces to see if we have instructions for some of this?
I guess we don't but worth a check..
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Detect when sampler/texture idx are immediate and switch to non s2en
encoding.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
For now it uses indirect for everything. The next step is for the
ir3_cp pass to detect the case that tex and samp idx are immediate
and convert the sam instruction back to the non .s2en variant. But
doing that in a following patch so we can shake out the bugs with
.s2en more easily.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
When we have indirect samplers, we cannot tell the max sampler
referenced. Instead just refer to the number of sampler uniforms.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
On a6xx+ with half-regs conflicting with full-regs, the legalize pass
needs to set appropriate sync bits, such as (sy), on writes to full regs
that conflict with half regs, and visa-versa.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
On a6xx, half-regs conflict with full-regs. But we were only setting up
conflicts for the first class (ie. scalar, but not hvec2/hvec3/hvec4),
resulting in higher half-reg classes getting assigned to regs that
overwrite full-regs.
Noticed while trying to enable indirect-sampler (sam.s2en) which uses an
hvec2 argument to pass the sampler/tex index.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
These two bits seem to be a better way to detect which encoding we are
looking at.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
v2: turn on for turnip as well (Karol Herbst)
Reviewed-by: Karol Herbst <[email protected]>
|
|
|
|
|
|
|
| |
One line left out of the conversion to ir3 ssbo intrinsics on a6xx.
Fixes: 2e4525883f0 ir3/compiler: Enable lower_io_offsets pass and handle new SSBO intrinsics
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Something that we didn't hit earlier because of the extra shr.b
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
While we lack value range tracking, this patch tries to 'manually' propogate
the division by 4 to calculate SSBO element-offset, into a possible previous
shift operation (shift left or right); checking that it is safe to do so.
This should help in cases like ie. when accessing a field in an array of
structs, where the offset is likely defined as base plus a multiplication
by a struct or array element size.
See dEQP test 'dEQP-GLES31.functional.ssbo.atomic.xor.highp_uint'
for an example of a shader that benefits from this.
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These intrinsics have the offset in dwords already computed in the last
source, so the change here is basically using that instead of emitting
the ir3_SHR to divide the byte-offset by 4.
The improvement in shader stats is significant, of up to ~15% in
instruction count in some cases. Tested only on a5xx.
shader-db is unfortunately not very useful here because shaders that use
SSBO require GLSL versions that are not supported by freedreno yet.
For examples, most Khronos CTS tests under 'dEQP-GLES31.functional.ssbo.*'
are helped.
A random case:
dEQP-GLES31.functional.ssbo.layout.2_level_array.packed.row_major_mat3x2
with current master:
; CL prog 14/1: 1252 instructions, 0 half, 48 full
; 8 const, 8 constlen
; 61 (ss), 43 (sy)
with the SSBO dword-offset moved to NIR:
; CL prog 14/1: 1053 instructions, 0 half, 45 full
; 7 const, 7 constlen
; 34 (ss), 73 (sy)
The SHR previously emitted for every single SSBO instruction disappears
in most cases, and the dword-offset ends up embedded in the STGB
instruction as immediate in many cases as well.
There are also a few of those tests that are currently failing on register
allocation, that start to pass as a result of reducing the pressure. At least
these, probably more:
dEQP-GLES31.functional.ssbo.layout.random.unsized_arrays.24
dEQP-GLES31.functional.ssbo.layout.random.arrays_of_arrays.6
dEQP-GLES31.functional.ssbo.layout.random.arrays_of_arrays.17
dEQP-GLES31.functional.ssbo.layout.random.nested_structs_arrays.14
dEQP-GLES31.functional.ssbo.layout.random.nested_structs_arrays_instance_arrays.5
dEQP-GLES31.functional.ssbo.layout.random.nested_structs_arrays_instance_arrays.7
No regressions observed with relevant CTS and piglit tests.
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This NIR->NIR pass implements offset computations that are currently
done on the IR3 backend compiler, to give NIR a better chance of
optimizing them.
For now, it supports lowering the dword-offset computation for SSBO
instructions. It will take an SSBO intrinsic and replace it with the
new ir3-specific version that adds an extra source. That source will
hold the SSA value resulting from inserting a division by 4 (an SHR op)
of the original byte-offset source already provided by NIR in one of
the intrinsic sources.
Note that on a6xx the original byte-offset is not needed, so we could
potentially replace that source instead of adding a new one. But to
keep things simple and consistent we always add the new source and
a6xx will just ignore the original one.
Reviewed-by: Rob Clark <[email protected]>
|