| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
| |
We can have a access flag already set here so just augment the
existing ones.
Signed-off-by: Lionel Landwerlin <[email protected]>
Fixes: 0fb61dfdeb ("spirv: propagate access qualifiers through ssa & pointer")
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With the help of Sagar, Ian and Ivan.
v2: Fix dependencies (Ian Romanick)
v3: 1) fix function name (Marek Olsak)
2) Add check for extension enable (Marek Olsak)
Signed-off-by: Paulo Zanoni <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This will be used to support one of the function from
Ext_texture_shadow_lod specification.
With the help of Sagar, Ian and Ivan.
Signed-off-by: Paulo Zanoni <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
With the help of Sagar, Ian and Ivan.
Signed-off-by: Paulo Zanoni <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
instr->type is the type of the array element, not the type of the array
being dereferenced. Rather than fishing out the parent type, just use
parent->num_children which should be the length plus 1. While we're here
add another assert for the issue fixed by the previous commit.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111251
Fixes: 156306e5e62 ("nir/find_array_copies: Handle wildcards and overlapping copies")
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
There was an off-by-one error.
Fixes: 156306e5e62 ("nir/find_array_copies: Handle wildcards and overlapping copies")
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
It's kind-of an anomaly that the Intel drivers are still treating
gl_FragCoord as an input. It also makes zero sense because we have to
special-case it in the back-end.
Because ANV is the only user of nir_lower_wpos_center, we go ahead and
just update it to look for nir_intrinsic_load_frag_coord as part of this
patch.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
This fixes glsl-fcoord-invariant-pass.shader_test on drivers that set
GLSLFragCoordIsSysVal which includes radeonsi among others.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Even if the data race wasn't real (I'm not great at reasoning about
this), helgrind is a nice enough tool that keeping noise out of it is
probably worthwhile. Besides, typing out the numbers keeps the data
in the read-only data section instead of emitting code to initialize
it every time.
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit rewrites opt_find_array_copies to be able to handle
an array copy sequence with other intervening operations in between. In
particular, this handles the case where we OpLoad an array of structs
and then OpStore it, which generates code like:
foo[0].a = bar[0].a
foo[0].b = bar[0].b
foo[1].a = bar[1].a
foo[1].b = bar[1].b
...
that wasn't recognized by the previous pass.
In order to correctly handle copying arrays of arrays, and in particular
to correctly handle copies involving wildcards, we need to use a tree
structure similar to lower_vars_to_ssa so that we can walk all the
partial array copies invalidated by a particular write, including
ones where one of the common indices is a wildcard. I actually think
that when factoring in the needed hashing/comparing code, a hash table
based approach wouldn't be a lot smaller anyways.
All of the changes come from tessellation control shaders in Strange
Brigade, where we're able to remove the DXVK-inserted copy at the
beginning of the shader. These are the result for radv:
Totals from affected shaders:
SGPRS: 4576 -> 4576 (0.00 %)
VGPRS: 13784 -> 5560 (-59.66 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 8696 -> 6876 (-20.93 %) dwords per thread
Code Size: 329940 -> 263268 (-20.21 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 330 -> 898 (172.12 %)
Wait states: 0 -> 0 (0.00 %)
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
We print the size as decimal too, and using hex without a leading "0x"
was very confusing.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We don't have calculate final quotient in order to calculate unsigned
modulo result. Once we are done with error correction we have partial
result which can be used to find out modulo operation result
Signed-off-by: Sagar Ghuge <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Not only variables can be flagged as NonUniformEXT but also
expressions. We're currently ignoring it in an expression such as :
imageLoad(data[nonuniformEXT(rIndex)], 0)
The associated SPIRV :
OpDecorate %69 NonUniformEXT
...
%69 = OpLoad %61 %68
This changes propagates access qualifiers through ssa & pointers so
that when it hits a OpLoad/OpStore style instructions, qualifiers are
not forgotten.
Fixes failure the following tests :
dEQP-VK.descriptor_indexing.*
Signed-off-by: Lionel Landwerlin <[email protected]>
Fixes: 8ed583fe523703 ("spirv: Handle the NonUniformEXT decoration")
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This refactor allows for common code to apply decoration on all
ssa/pointer values. In particular this will allow to propagage access
qualifiers.
Signed-off-by: Lionel Landwerlin <[email protected]>
Suggested-by: Caio Marcelo de Oliveira Filho <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
SPIRV added the ability to access variables and have expressions non
dynamically uniform and because spirv_to_nir generates deref
instructions, we'll need to have that access there.
Signed-off-by: Lionel Landwerlin <[email protected]>
Cc: <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When generating the error message for a missing function error where
all available overloads were missing due to a too low GLSL version, we
used to report something like this:
---8<---
0:224(14): error: no matching function for call to
`textureCubeLod(samplerCube, vec3, float)'; candidates are:
0:224(14): error: type mismatch
---8<---
This is a pretty confusing error message, and can throw people off when
debugging. So let's instead check if any overload is available before we
decide what to print. This allow us to report something like this
instead:
---8<---
0:224(14): error: no function with name 'textureCubeLod'
0:224(14): error: type mismatch
---8<---
This is arguably easier to understand for programmers, and doesn't send
you on a wild goose chase to figure out what argument is wrong just
because you stopped reading the message prematurely. I'm of course
referring to a friend, not me. For sure. I would never do that.
Signed-off-by: Erik Faye-Lund <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
When 'x' is the result of a scmp op:
x != 0.0 or x == 1.0: passthrough
x == 0.0 or x != 1.0: invert
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Add generic lowerings for fall_equalN/fany_nequalN. These should be optimal
for vec4 backends that doesn't have any special instructions for it, as
long as they support saturate.
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Add simple fdot2 optimizations that are missing.
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Thomas Helland <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
For backends that don't have a 'fdph' instructions
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Thomas Helland <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
This version has less ops for the same precision.
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Vasily Khoruzhick <[email protected]>
Acked-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
This is to allow optimizations in nir_opt_algebraic not otherwise possible
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Acked-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
| |
This will effectively enable the optimization in anv.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of building a constant mask (which depends on knowing the
subgroup size), we build an expression. Because the pass uses the
nir_shader_lower_instructions helper, subgroup lowering will be run on
any newly emitted instructions as well as the previously existing
instructions. In particular, if the subgroup size is known, the newly
emitted subgroup_size intrinsic will get turned into a constant and a
later constant folding pass will clean it up.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Tested on Gen > 9.
v2: 1) Fix lowering
2) Keep a consistent i/u order (Matt Turner)
Signed-off-by: Sagar Ghuge <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
I can't find a single place where nir_lower_io is called after going out
of SSA which is the only real reason why you wouldn't do this. Returning
SSA defs is more idiomatic and is required for the next commit.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The one obvious omission here is gl_HelperInvocation itself. However,
the spec doesn't require that we generate then when gl_HelperInvocation
is used, it merely mandates that we report them if they are there.
Reviewed-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Otherwise, as we add things to the switch, we're going to forget and add
some 64-bit op at some point in the future and it'll stop getting
flagged. There's no reason why we can't do the check for derivatives.
Reviewed-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
warning: use of logical '||' with constant operand
note: use '|' for a bitwise operation
Fixes: 758fdce9fee ("nir: Add some generic helpers for writing lowering passes")
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
Signed-off-by: Andrii Simiklit <[email protected]>
|
|
|
|
|
|
| |
Fixes: 14531d676b11999123c0 ("nir: make nir_const_value scalar")
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Karol Herbst <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Otherwise, you may end up moving a register read and that could result
in an incorrect shader. This commit fixes a rendering issue in Elite:
Dangerous.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111152
Fixes: 3ee2e84c60 "nir: Rematerialize compare instructions"
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
| |
Semantically, the memory barrier has to come first to wait
for the completion of pending memory requests.
Afterwards, the workgroups can be synchronized.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
|
| |
No vkpipeline-db changes found.
Signed-off-by: Rhys Perry <[email protected]>
Reveiewed-by: Alyssa Rosenzweig [email protected]
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some of these were found in a few GTAV, Rise of the Tomb Raider and
Shadow of the Tomb Raider shaders.
Results from vkpipeline-db run with ACO:
Totals from affected shaders:
SGPRS: 376 -> 376 (0.00 %)
VGPRS: 220 -> 220 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 13492 -> 11560 (-14.32 %) bytes
LDS: 6 -> 6 (0.00 %) blocks
Max Waves: 69 -> 69 (0.00 %)
Wait states: 0 -> 0 (0.00 %)
v2: use False instead of 0
Signed-off-by: Rhys Perry <[email protected]>
Reveiewed-by: Alyssa Rosenzweig [email protected]
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
| |
This will be used to enabled compat profile support for geometry
shaders.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
This will be reused in the following patch to add support for clip
vertex lowering in geometry shaders.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
This will allow code sharing in a following patch that adds support
for lowering in geometry shaders. It also allows us to exit early
if there is no lowering to do which allows a small code tidy up.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a function has a constant and is called more than once, after
inlining we may end up with different variables representing the same
constant. This commit look into the data and de-duplicate them.
The first pass now will collect the constant data in a per variable
buffer, then de-duplication happens (by sorting then linear walk), and
the second pass will use the data in var->data.location.
One side-effect of the current implementation is that constants will
be reordered. If this turns out to be a problem is something that can
be fixed.
An alternative strategy considered was to perform this in a
per-function basis and then merge the results, the problem is that we
would have to fix up the offsets during the merge. Given the data we
have, the current patch is good enough.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
This will be used later on to allocate constant data for each
variable (and then deduplicate). Also drop initializing found_read,
as it is already implicitly false in the literal.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
v3d's NIR txf_ms lowering wants to swizzle around the input coordinates in
NIR, but doesn't generate a new txf_ms instructions as replacement. It's
pretty easy to allow that in nir_shader_lower_instructions, and it may be
common in lowering passes.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
gl_PointCoord handling needs some special bits set in lima/ppir code
generation. Treating gl_PointCoord as a system value makes it easier
to distinguish from a regular varying.
Signed-off-by: Andreas Baierl <[email protected]>
Reviewed-by: Qiang Yu <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Andreas Baierl <[email protected]>
Reviewed-by: Qiang Yu <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The location is unused for shader_temp and function_temp variables, and
due to the way we nir_lower_io_to_temproraries demotes shader_out
variables to shader_temp variables, it happened to equal
VARYING_SLOT_POS for the gl_Position temporary, which made this pass
fail with the offline compiler due to this coming before vars_to_ssa.
Reviewed-by: Qiang Yu <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
For per-sample color writes we need the output intrinsic to pack the
sample index, which is not provided with regular store_output intrinsics
unless we figured out a way to encode it into the base or the offset.
v2:
- Drop the writemask (Eric)
Reviewed-by: Eric Anholt <[email protected]>
|