| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
| |
v3: rebase
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Marek Olšák <[email protected]> (v2)
Signed-off-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
Seems it was missing the "/ ma + 0.5" and the order was swapped.
Fixes: a1a2a8dfda7b9cac7e ('nir: add AMD_gcn_shader extended instructions')
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When gathering info for unmovable types we need to handle arrays.
While we dont support packing/moving arrays we do support packing
scalar components with these arrays.
Fixes piglit:
tests/spec/arb_enhanced_layouts/execution/component-layout/vs-fs-array-interleave-range.shader_test
Fixes: 5eb17506e159 ("nir: do not pack varying with different types")
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
| |
Reviewed-by: Karol Herbst <[email protected]>
|
|
|
|
|
|
|
| |
Now that nir_const_value is a scalar, we don't need the switch on bit
size in order to pluck off components properly.
Reviewed-by: Karol Herbst <[email protected]>
|
|
|
|
|
|
|
| |
Now that nir_const_value is a scalar, we don't need the switch on bit
size in order copy components around properly.
Reviewed-by: Karol Herbst <[email protected]>
|
|
|
|
|
|
|
| |
Now that nir_const_value is a scalar, we don't need the switch on bit
size in order to swizzle them properly.
Reviewed-by: Karol Herbst <[email protected]>
|
|
|
|
|
|
|
|
|
| |
v2: remove & operator in a couple of memsets
add some memsets
v3: fixup lima
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]> (v2)
|
|
|
|
|
|
|
|
| |
we already assert above that there are no more than 3 sources, so it
doesn't make sense to use an array of 4 sources
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
Reviewed-by: Karol Herbst <[email protected]>
|
|
|
|
|
|
| |
v2: replace nir_zero_vec with nir_imm_zero (Karol Herbst)
Reviewed-by: Karol Herbst <[email protected]>
|
|
|
|
|
|
|
|
| |
While we're here, fix a typo which caused it to actually return a vec4
with the third and fourth components zero.
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On Mali hardware (supported by Panfrost and Lima), the fixed-function
transformation from world-space to screen-space coordinates is done in
the vertex shader prior to writing out the gl_Position varying, rather
than in dedicated hardware. This commit adds a shared NIR pass for
implementing coordinate transformation and lowering gl_Position writes
into screen-space gl_Position writes.
v2: Run directly on derefs before io/vars are lowered to cleanup the
code substantially. Thank you to Qiang for this suggestion!
v3: Bikeshed continues.
v4: Add to Makefile.sources (per Jason's comment). Bikeshed comment.
Ian and Qiang's reviews are from v3, but no real functional changes from
v4. Rob's review is from v4.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Suggested-by: Qiang Yu <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Qiang Yu <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Port TGSI TRUNC lowering to nir
Signed-off-by: Christian Gmeiner <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This commit adds new nir_load/store_scratch opcodes which read and write
a virtual scratch space. It's up to the back-end to figure out what to
do with it and where to put the actual scratch data.
v2: Drop const_index comments (by anholt)
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
| |
I was thinking about a refactor, and needed to read this first.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
| |
Please don't make me read a const_index[] expression ever again.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The constant_index slots are named right there in the intrinsic
definition, and the comment is just a chance to get out of sync. Noticed
while reviewing the lower_to_scratch changes that copy-and-pasted wrong
comments, and load_ubo and load_per_vertex_output had incorrect comments
currently.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
gl_nir_lower_samplers_as_deref splits structure uniform variables,
creating new variables for individual fields. As part of that, it
calculates a new location. It then never set this on the new variables.
Thanks to Michael Fiano for finding this bug. Fixes crashes on i965
with Piglit's new tests/spec/glsl-1.10/execution/samplers/uniform-struct
test, which was reduced from the failing case in Michael's app.
Fixes: f003859f97c nir: Make gl_nir_lower_samplers use gl_nir_lower_samplers_as_deref
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
| |
for Viewperf 13
Acked-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
v2: handle atomics as well
make use of nir_rewrite_image_intrinsic
v3: remove call to nir_remove_dead_derefs
v4: (Timothy Arceri) dont actually call lowering yet
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]> (v3)
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
fixes retrieving the sampler type for bindless images stored inside structs.
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
| |
v2: add support for AMD
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]> (v1)
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
This required to calculate sizes correctly when we have bindless
samplers/images.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
Fixes a couple of Coverity warnings CID 1444626.
Fixes: e30804c6024f ("nir/radv: remove restrictions on opt_if_loop_last_continue()")
Reviewed-by: Tapani Pälli <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In commit 3b3653c4cfb we decided not to use bare types; hence do not use
bare type when comparing with interface type to find out if the xfb
variable is an array block.
This fixes dEQP-VK.transform_feedback.* tests.
Fixes: 3b3653c4cfb ("nir/spirv: don't use bare types, remove assert in
split vars for testing")
CC: Dave Airlie <[email protected]>
CC: Jason Ekstrand <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
| |
This reverts commit b7ca074cc0df6101c428b2dfa53a59a0c6620af2.
It broke a lot of tests.
|
|
|
|
|
|
|
|
|
|
| |
also set some constants for SSBOs.
With that it can compile the shader from:
dEQP-GLES31.functional.ssbo.layout.random.all_per_block_buffers.18
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Otherwise nir_lower_non_uniform_access crashes when it tries
to get the access of a load_ubo.
Fixes: 8ed583fe523 "spirv: Handle the NonUniformEXT decoration"
Fixes: e50ab2c0f23 "nir: Add access flags to deref and SSBO atomics"
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
| |
CTS: GL45-CTS.compute_shader.resources-max
Fixes: 4e1e8f684bf "glsl: remember which SSBOs are not read-only and pass it to gallium"
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
From the OpenGL 4.60.5 spec, section 4.4.1 Input Layout Qualifiers,
Page 67, (Location aliasing):
" Further, when location aliasing, the aliases sharing the location
must have the same underlying numerical type and bit
width (floating-point or integer, 32-bit versus 64-bit, etc.) and
the same auxiliary storage and interpolation qualification."
Additionally, we have improved the linker error descriptions.
Specifically, when taking structs into account we were producing a
linker error because we assumed that all components in each location
were used and that would cause component aliasing. This is not
accurate of the actual problem. Now, the failure specifies that the
underlying numerical type incompatibility is the cause for the
failure.
Fixes the following piglit test:
tests/spec/arb_enhanced_layouts/linker/component-layout/vs-to-fs-width-mismatch-double-float.shader_test
v2:
- Do not assert if we see invalid numerical types. These come
straight from shader code, so we should produce linker errors if
shaders attempt to do location aliasing on variables that are not
numerical such as records.
- While we are at it, improve error reporting for the case of
numerical type mismatch to include the shader stage.
v3:
- Allow location aliasing of images and samplers. If we get these
it means bindless support is active and they should be handled
as 64-bit integers (Ilia)
- Make sure we produce link errors for any non-numerical type
for which we attempt location aliasing, not just structs.
v4:
- Rebased with minor fixes (Andres).
- Added fixing tag to the commit log (Andres).
v5:
- Remove the helper function and check individually for the
underlying numerical type and bit width (Timothy).
- Implicitly, assume that any non-treated type which is checked for
its underlying numerical type is either integer or
float and has a defined bit width (Timothy).
- Implicitly, assume that structs are the only non-treated
non-numerical type (Timothy).
- Improve the linker error descriptions and commit log (Andres).
Fixes: 13652e7516a ("glsl/linker: Fix type checks for location aliasing")
Cc: Ilia Mirkin <[email protected]>
Cc: Timothy Arceri <[email protected]>
Cc: Iago Toral Quiroga <[email protected]>
Signed-off-by: Andres Gomez <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We have a pass to lower global registers to locals and many drivers
dutifully call it. However, no one ever creates a global register ever
so it's all dead code. It's time we bury it.
Acked-by: Karol Herbst <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
All we ever do is initialize it to zero, clone it, print it, and
validate it. No one ever sets or uses it.
Acked-by: Karol Herbst <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
As defined in SPV_NV_compute_shader_derivatives. These control how the
invocations are arranged in a CS when doing derivative and related
operations (which are also enabled by the extension).
Since we expect valid SPIR-V, we don't need to do more work at SPIR-V
level to enable the derivative and related operations to be called.
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
When using NV_compute_shader_derivatives to set a derivative group,
a compute shader supports texture with implicit LOD calculation, so
don't set an explicit LOD.
Note if the extension is used but the derivative group is not
specified, it will default to LOD=0 as before.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
In compute shaders if no derivative group is defined, the derivatives
will always be zero. Specified in NV_compute_shader_derivatives.
To make the check more convenient, add a "info" local variable to the
generated code so we can refer to it in the Python rules. (Jason)
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
NV_compute_shader_derivatives allow selecting between two possible
arrangements (quads and linear) when calculating derivatives and
certain subgroup operations in case of Vulkan. So parse and propagate
those up to shader_info.h.
v2: Do not fail when ARB_compute_variable_group_size is being used,
since we are still clarifying what is the right thing to do here.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
| |
Renamed a few predicates from "fs_only" to be "derivative_only" (or
similar pairs).
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
| |
As the code evolved, we ended up with a redundant conditions. Clean
this up.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When I implemented opt_if_loop_last_continue() I had restricted
this pass from moving other if-statements inside the branch opposite
the continue. At the time it was causing a bunch of spilling in
shader-db for i965.
However Samuel Pitoiset noticed that making this pass more aggressive
significantly improved the performance of Doom on RADV. Below are
the statistics he gathered.
28717 shaders in 14931 tests
Totals:
SGPRS: 1267317 -> 1267549 (0.02 %)
VGPRS: 896876 -> 895920 (-0.11 %)
Spilled SGPRs: 24701 -> 26367 (6.74 %)
Code Size: 48379452 -> 48507880 (0.27 %) bytes
Max Waves: 241159 -> 241190 (0.01 %)
Totals from affected shaders:
SGPRS: 23584 -> 23816 (0.98 %)
VGPRS: 25908 -> 24952 (-3.69 %)
Spilled SGPRs: 503 -> 2169 (331.21 %)
Code Size: 2471392 -> 2599820 (5.20 %) bytes
Max Waves: 586 -> 617 (5.29 %)
The codesize increases is related to Wolfenstein II it seems largely
due to an increase in phis rather than the existing jumps.
This gives +10% FPS with Doom on my Vega56.
Rhys Perry also benchmarked Doom on his VEGA64:
Before: 72.53 FPS
After: 80.77 FPS
v2: disable pass on non-AMD drivers
Reviewed-by: Ian Romanick <[email protected]> (v1)
Acked-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Consider the following search expression and NIR sequence:
('iadd', ('imul', a, b), b)
ssa_2 = imul ssa_0, ssa_1
ssa_3 = iadd ssa_2, ssa_0
The current algorithm is greedy and, the moment the imul finds a match,
it commits those variable names and returns success. In the above
example, it maps a -> ssa_0 and b -> ssa_1. When we then try to match
the iadd, it sees that ssa_0 is not b and fails to match. The iadd
match will attempt to flip itself and try again (which won't work) but
it cannot ask the imul to try a flipped match.
This commit instead counts the number of commutative ops in each
expression and assigns an index to each. It then does a loop and loops
over the full combinatorial matrix of commutative operations. In order
to keep things sane, we limit it to at most 4 commutative operations (16
combinations). There is only one optimization in opt_algebraic that
goes over this limit and it's the bitfieldReverse detection for some UE4
demo.
Shader-db results on Kaby Lake:
total instructions in shared programs: 15310125 -> 15302469 (-0.05%)
instructions in affected programs: 1797123 -> 1789467 (-0.43%)
helped: 6751
HURT: 2264
total cycles in shared programs: 357346617 -> 357202526 (-0.04%)
cycles in affected programs: 15931005 -> 15786914 (-0.90%)
helped: 6024
HURT: 3436
total loops in shared programs: 4360 -> 4360 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0
total spills in shared programs: 23675 -> 23666 (-0.04%)
spills in affected programs: 235 -> 226 (-3.83%)
helped: 5
HURT: 1
total fills in shared programs: 32040 -> 32032 (-0.02%)
fills in affected programs: 190 -> 182 (-4.21%)
helped: 6
HURT: 2
LOST: 18
GAINED: 5
Reviewed-by: Thomas Helland <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The new OR pattern has been seen in the wild and can end up being
generated by GLSLang. Not sure about the other two new patterns but we
may as well throw them in for completeness. While we're here, we can
drop the '@bool' specifier from the one pattern because specifying True
already implies 1-bit which basically implies boolean.
Shader-db results on Kaby Lake:
total instructions in shared programs: 15321227 -> 15321129 (<.01%)
instructions in affected programs: 3594 -> 3496 (-2.73%)
helped: 6
HURT: 0
total cycles in shared programs: 357481321 -> 357479725 (<.01%)
cycles in affected programs: 44109 -> 42513 (-3.62%)
helped: 6
HURT: 0
VkPipeline-DB results on Kaby Lake:
total instructions in shared programs: 3770504 -> 3769734 (-0.02%)
instructions in affected programs: 19058 -> 18288 (-4.04%)
helped: 163
HURT: 0
total cycles in shared programs: 1417583701 -> 1417569727 (<.01%)
cycles in affected programs: 750958 -> 736984 (-1.86%)
helped: 158
HURT: 1
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that we have one-bit booleans, we don't need to rely on looking at
parent instructions in order to figure out if a value is a Boolean most
of the time. We can drop these specifiers and now the optimizations
will apply more generally.
Shader-DB results on Kaby Lake:
total instructions in shared programs: 15321168 -> 15321227 (<.01%)
instructions in affected programs: 8836 -> 8895 (0.67%)
helped: 1
HURT: 31
total cycles in shared programs: 357481781 -> 357481321 (<.01%)
cycles in affected programs: 146524 -> 146064 (-0.31%)
helped: 22
HURT: 10
total spills in shared programs: 23675 -> 23673 (<.01%)
spills in affected programs: 11 -> 9 (-18.18%)
helped: 1
HURT: 0
total fills in shared programs: 32040 -> 32036 (-0.01%)
fills in affected programs: 27 -> 23 (-14.81%)
helped: 1
HURT: 0
No change in VkPipeline-DB
Looking at the instructions hurt, a bunch of them seem to be a case
where doing exactly the right thing in NIR ends up doing the wrong-ish
thing in the back-end because flags are dumb. In particular, there's a
case where we have a MUL followed by a CMP followed by a SEL and when we
turn that SEL into an OR, it uses the GRF result of the CMP rather than
the flag result so the CMP can't be merged with the MUL. Those shaders
appear to schedule better according to the cycle estimates so I guess
it's a win? Also it helps spilling in one Car Chase compute shader.
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a def is used as an condition before its definition, we should also
consider this a case to repair. When repairing, make sure we rewrite
any if conditions too.
Found in while inspecting a SPIR-V conversion from a 'continue block'
that contains a conditional branch. We pull the continue block up to
the beggining of the loop, and the condition in the branch ends up
defined afterwards.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Fixes: 364212f1ede4b "nir: Add a pass to repair SSA form"
|
|
|
|
|
|
|
|
|
| |
The current algorithm only supports packing 32-bit types.
If a shader uses both 16-bit and 32-bit varyings, we shouldn't
compact them together.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|