| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
| |
Copy+paste error. It was supposed to test cull and not clip.
Fixes: 4e69fba534e "nir: Rewrite lower_clip_cull_distance_arrays..."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109717
Reviewed-by: Lionel Landwerlin <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On GLSL that info is set as a layout qualifier when redeclaring
gl_FragCoord, so somehow tied to a specific variable. But in practice,
they behave as a global of the shader. On ARB programs they are set
using a global OPTION (defined at ARB_fragment_coord_conventions), and
on SPIR-V using ExecutionModes, that are also not tied specifically to
the builtin.
This patch moves that info from nir variable and ir variable to nir
shader and gl_program shader_info respectively, so the map is more
similar to SPIR-V, and ARB programs, instead of more similar to GLSL.
FWIW, shader_info.fs already had pixel_center_integer, so this change
also removes some redundancy. Also, as struct gl_program also includes
a shader_info, we removed gl_program::OriginUpperLeft and
PixelCenterInteger, as it would be superfluous.
This change was needed because recently spirv_to_nir changed the order
in which execution modes and variables are handled, so the variables
didn't get the correct values. Now the info is set on the shader
itself, and we don't need to go back to the builtin variable to set
it.
Fixes: e68871f6a ("spirv: Handle constants and types before execution
modes")
v2: (Jason)
* glsl_to_nir: get the info before glsl_to_nir, while all the rest
of the info gathering is happening
* prog_to_nir: gather the info on a general info-gathering pass,
not on variable setup.
v3: (Jason)
* Squash with the patch that removes that info from ir variable
* anv: assert that OriginUpperLeft is true. It should be already
set by spirv_to_nir.
* blorp: set origin_upper_left on its core "compile fragment
shader", not just on some specific places (for this we added an
helper on a previous patch).
* prog_to_nir: no need to gather specifically this fragcoord modes
as the full gl_program shader_info is copied.
* spirv_to_nir: assert that we are a fragment shader when handling
this execution modes.
v4: (reported by failing gitlab pipeline #18750)
* state_tracker: update too due changes on ir.h/gl_program
v5:
* blorp: minor change after change on previous patch
* radeonsi: update due this change.
v6: (Timothy Arceri)
* prog_to_nir: remove extra whitespace
* shader_info: don't use :1 on origin_upper_left
* glsl: program.fs.origin_upper_left/pixel_center_integer can be
move out of the shader list loop
|
|
|
|
|
|
|
| |
This makes us properly handle gl_ClipDistance and gl_CullDistance.
Fixes: 19064b8c "nir: Add a pass for gathering transform feedback info"
Reviewed-by: Alejandro Piñeiro <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We needed to better handle cases where a chunk of a variable starts at
some non-zero location_frac and rolls over into the next slot but may
not be more than 4 dwords. For example, if gl_CullDistance is an array
of 3 things and has location_frac = 2, it will span across two vec4s but
is not, itself, bigger than a vec4. If you ignore the clip/cull special
case, it's not allowed to happen for anything else because the only
things that can span more than one slot is dvec3 and dvec4 and they're
both bigger than a vec4. The current code uses this attrib_slot thing
where we count attribute slots and iterate over them. However, that
doesn't work in the case above because gl_CullDistance will have an
attrib_slot count of 1 even though it does span two slots. We could fix
this by adjusting attrib_slot but we already have comp_mask and it's
easier to just handle it that way.
Reviewed-by: Alejandro Piñeiro <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Instead of going to all the work of to combine them into one array, just
make two arrays and use location_frac to colocate them within CLIP0.
Then the back-end can sort things out and stack them on top of each
other. Thanks to ef99f4c8, we also don't need to set compact anymore.
Reviewed-by: Alejandro Piñeiro <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Fixes: 19064b8c "nir: Add a pass for gathering transform feedback info"
Reviewed-by: Alejandro Piñeiro <[email protected]>
|
|
|
|
| |
Reviewed-by: Alejandro Piñeiro <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Even in a very basic shader this reduces the time spent in
nir_copy_prop() by ~17%.
No shader-db changes for radeonsi NIR or i965.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 08bfd710a25c14df5f690cce9604617536d7c560. (nir/dead_cf: Stop
relying on liveness analysis) introduced a new check that iterated
through a SSA def's uses, to see if it's used. But it only checked
normal uses, and not uses which are part of an 'if' condition. This
led to it thinking more nodes were dead than possible.
Fixes Piglit's variable-indexing/tcs-output-array-float-index-wr test
(and related tests) with the out-of-tree Iris driver.
Fixes: 08bfd710a25 nir/dead_cf: Stop relying on liveness analysis
Reviewed-by: Connor Abbott <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
I'd like to use this in the prog_parameter.c code, so I need to move it
into C, make it non-static, and so on. This probably isn't the ideal
place for it, but I couldn't think of a better one.
Acked-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The idea here is to reassociate a * (b * c) into (a * c) * b, when
b is a non-constant value, but a and c are constants, allowing them
to be combined.
But nothing was enforcing that 'b' must be non-constant, which meant
that running opt_algebraic in a loop would never terminate if the IR
contained non-folded constant expressions like 256 * 0.5 * 2. Normally,
we call constant folding in such a loop too, but IMO it's better for
nir_opt_algebraic to be robust and not rely on that.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109581
Fixes: 32e266a9a58 i965: Compile fp64 funcs only if we do not have 64-bit hardware support
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was probably useful when it was first written, however it
looks to be no longer necessary.
As far as I can tell these days dce is smart enough to remove useless
instructions from if branches. Once this is done
nir_opt_peephole_select() will end up removing the empty if.
Removing this support reduces the dolphin uber shader compilation
time spent in nir_opt_dead_cf() by a little over 7x.
No shader-db changes on i965 or radeonsi.
Tested-by: Dieter Nützel <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
All of the affected shaders are Unreal4 demos.
All Gen6+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 15437170 -> 15437001 (<.01%)
instructions in affected programs: 21536 -> 21367 (-0.78%)
helped: 43
HURT: 0
helped stats (abs) min: 1 max: 4 x̄: 3.93 x̃: 4
helped stats (rel) min: 0.68% max: 1.01% x̄: 0.80% x̃: 0.80%
95% mean confidence interval for instructions value: -4.07 -3.79
95% mean confidence interval for instructions %-change: -0.83% -0.77%
Instructions are helped.
total cycles in shared programs: 383007896 -> 383007378 (<.01%)
cycles in affected programs: 158640 -> 158122 (-0.33%)
helped: 38
HURT: 4
helped stats (abs) min: 1 max: 48 x̄: 13.89 x̃: 6
helped stats (rel) min: 0.03% max: 1.01% x̄: 0.33% x̃: 0.19%
HURT stats (abs) min: 2 max: 3 x̄: 2.50 x̃: 2
HURT stats (rel) min: 0.06% max: 0.09% x̄: 0.08% x̃: 0.08%
95% mean confidence interval for cycles value: -16.90 -7.77
95% mean confidence interval for cycles %-change: -0.39% -0.19%
Cycles are helped.
Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8213746 -> 8213745 (<.01%)
instructions in affected programs: 127 -> 126 (-0.79%)
helped: 1
HURT: 0
total cycles in shared programs: 187734146 -> 187734144 (<.01%)
cycles in affected programs: 2132 -> 2130 (-0.09%)
helped: 1
HURT: 0
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Section 5.4.1 (Conversion and Scalar Constructors) of the GLSL 4.60 spec
says:
It is undefined to convert a negative floating-point value to an
uint.
Assuming that (uint)some_float behaves like (uint)(int)some_float allows
some optimizations in the i965 backend to proceed.
This basically undoes the small amount of damage done by
"intel/compiler: Avoid propagating inequality cmods if types are
different".
v2: Replicate part of the commit message as a comment in the code.
Suggested by Jason.
shader-db results compairing *before* "intel/compiler: Avoid propagating
inequality cmods if types are different" and after this commit:
Skylake
total cycles in shared programs: 383007996 -> 383007896 (<.01%)
cycles in affected programs: 85208 -> 85108 (-0.12%)
helped: 13
HURT: 8
helped stats (abs) min: 2 max: 26 x̄: 10.77 x̃: 6
helped stats (rel) min: 0.09% max: 0.65% x̄: 0.28% x̃: 0.14%
HURT stats (abs) min: 2 max: 12 x̄: 5.00 x̃: 3
HURT stats (rel) min: 0.04% max: 0.32% x̄: 0.12% x̃: 0.07%
95% mean confidence interval for cycles value: -9.31 -0.21
95% mean confidence interval for cycles %-change: -0.24% <.01%
Cycles are helped.
Broadwell
total cycles in shared programs: 415251194 -> 415251370 (<.01%)
cycles in affected programs: 83750 -> 83926 (0.21%)
helped: 7
HURT: 13
helped stats (abs) min: 10 max: 12 x̄: 11.43 x̃: 12
helped stats (rel) min: 0.30% max: 0.30% x̄: 0.30% x̃: 0.30%
HURT stats (abs) min: 2 max: 36 x̄: 19.69 x̃: 22
HURT stats (rel) min: 0.05% max: 0.89% x̄: 0.44% x̃: 0.47%
95% mean confidence interval for cycles value: 0.76 16.84
95% mean confidence interval for cycles %-change: <.01% 0.37%
Inconclusive result (%-change mean confidence interval includes 0).
Haswell
total instructions in shared programs: 13823885 -> 13823886 (<.01%)
instructions in affected programs: 2249 -> 2250 (0.04%)
helped: 0
HURT: 1
total cycles in shared programs: 390094243 -> 390094001 (<.01%)
cycles in affected programs: 85640 -> 85398 (-0.28%)
helped: 15
HURT: 6
helped stats (abs) min: 4 max: 26 x̄: 18.53 x̃: 18
helped stats (rel) min: 0.09% max: 0.66% x̄: 0.47% x̃: 0.42%
HURT stats (abs) min: 2 max: 14 x̄: 6.00 x̃: 2
HURT stats (rel) min: 0.04% max: 0.37% x̄: 0.15% x̃: 0.04%
95% mean confidence interval for cycles value: -17.36 -5.69
95% mean confidence interval for cycles %-change: -0.44% -0.14%
Cycles are helped.
Ivy Bridge
total cycles in shared programs: 180986448 -> 180986552 (<.01%)
cycles in affected programs: 34835 -> 34939 (0.30%)
helped: 0
HURT: 10
HURT stats (abs) min: 2 max: 18 x̄: 10.40 x̃: 10
HURT stats (rel) min: 0.06% max: 0.36% x̄: 0.28% x̃: 0.30%
95% mean confidence interval for cycles value: 4.67 16.13
95% mean confidence interval for cycles %-change: 0.20% 0.35%
Cycles are HURT.
Sandy Bridge
total cycles in shared programs: 154603969 -> 154603970 (<.01%)
cycles in affected programs: 171514 -> 171515 (<.01%)
helped: 25
HURT: 14
helped stats (abs) min: 1 max: 4 x̄: 1.80 x̃: 1
helped stats (rel) min: 0.02% max: 0.10% x̄: 0.04% x̃: 0.04%
HURT stats (abs) min: 1 max: 8 x̄: 3.29 x̃: 3
HURT stats (rel) min: 0.03% max: 0.28% x̄: 0.10% x̃: 0.11%
95% mean confidence interval for cycles value: -0.91 0.96
95% mean confidence interval for cycles %-change: -0.02% 0.04%
Inconclusive result (value mean confidence interval includes 0).
No changes on Iron Lake or GM45.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In opt_peel_initial_if optimization, when moving the continue list to
end of the continue block, before the jump, could happen that the
continue list itself also ends with a jump.
This would mean that we would have two jump instructions in a row: the
first one from the continue list and the second one from the contine
block.
As inserting an instruction after a jump is not allowed (and it does not
make sense, as it will not be executed), remove the jump from the
continue block and keep the one from continue list, as it will be
executed first.
CC: Jason Ekstrand <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
opt_split_alu_of_phi moves ALU instruction to the end of continue block.
But if the continue block ends with a jump instruction (an explicit
"continue" instruction) then the ALU must be inserted before the jump,
as it is illegal to add instructions after the jump.
CC: Ian Romanick <[email protected]>
Fixes: 0881e90c099 ("nir: Split ALU instructions in loops that read phis")
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The liveness analysis pass is fairly expensive because it has to build
large bit-sets and run a fix-point algorithm on them. Instead of
requiring liveness for detecting if values escape a CF node, just take
advantage of the structured nature of NIR and use block indices instead.
This only requires the block index metadata which is the fastest we have
metadata to generate.
No shader-db changes on Kaby Lake
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
| |
We want to handle live SSA values differently and it's going to involve
walking the instructions. We can make it a single instruction walk if
we combine it with cf_node_has_side_effects.
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[28/716] Compiling C object 'src/compiler/nir/068b2c8@@nir@sta/nir_gather_xfb_info.c.o'.
../src/compiler/nir/nir_gather_xfb_info.c: In function ‘nir_gather_xfb_info’:
../src/compiler/nir/nir_gather_xfb_info.c:171:13: warning: variable ‘max_offset’ set but not used [-Wunused-but-set-variable]
unsigned max_offset[NIR_MAX_XFB_BUFFERS] = {0};
^~~~~~~~~~
[36/716] Compiling C object 'src/compiler/nir/068b2c8@@nir@sta/nir_instr_set.c.o'.
../src/compiler/nir/nir_instr_set.c:502:1: warning: ‘instr_each_src_and_dest_is_ssa’ defined but not used [-Wunused-function]
instr_each_src_and_dest_is_ssa(nir_instr *instr)
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
spirv_to_nir can generate input/output variables which are illegal
for the current shader stage, which would cause nir_validate_shader
to balk. After my recent commit to start decorating arrays as compact,
dEQP-VK.spirv_assembly.instruction.graphics.module.same_module started
hitting validation errors due to outputs in a TCS (not intended for the
TCS at all) not being per-vertex arrays.
Thanks to Jason Ekstrand for suggesting this approach.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109573
Fixes: ef99f4c8d17 compiler: Mark clip/cull distance arrays as compact before lowering.
Reviewed-by: Juan A. Suarez <[email protected]>
|
|
|
|
|
|
|
| |
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Fixes: c6465fec0c5 ("spirv: add SpvCapabilityInt64Atomics")
CID: 1442555
|
|
|
|
|
|
|
| |
I wanted to reuse this from v3d.
Reviewed-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
| |
Everything should be in ssa form when we call this. This is a
hotpath so replace the check with an assert.
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
| |
Everthing should be in ssa form when this is called. Checking
for it here is expensive so turn this into an assert instead.
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
|
| |
There is no need to hash the instruction twice, especially as we
end up adding it in the majority of cases.
Reviewed-by: Connor Abbott <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
if we have something like this:
loop {
...
if x {
break;
} else {
continue;
}
}
opt_if_loop_last_continue returns true marking progress allthough nothing
changes.
Fixes: 5921a19d4b0c6 "nir: add if opt opt_if_loop_last_continue()"
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Patch adds nir_lower_tex_options as parameter to sample_plane so that
we don't need to extend nir_tex_instr for this.
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Eric and I would like a bitmask of which samplers are used, similar to
prog->SamplersUsed, but available in NIR. The linker uses SamplersUsed
for resource limit checking, but later optimizations may eliminate more
samplers. So instead of propagating it through, we gather a new one.
While there, we also gather the existing textures_used_by_txf bitmask.
Gathering these bitfields in nir_shader_gather_info is awkward at best.
The main reason is that it introduces an ordering dependency between the
two passes. If gathering runs before lower_samplers_as_deref, it can't
look at var->data.binding. If the driver doesn't use the full lowering
to texture_index/texture_array_size (like radeonsi), then the gathering
can't use those fields. Gathering might be run early /and/ late, first
to get varying info, and later to update it after variant lowering. At
this point, should gathering work on pre-lowered or post-lowered code?
Pre-lowered is also harder due to the presence of structure types.
Just doing the gathering when we do the lowering alleviates these
ordering problems. This fixes ordering issues in i965 and makes the
txf info gathering work for radeonsi (though they don't use it).
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Passes like nir_lower_drawpixels add additional sampler variables,
and set an explicit binding which never changes. These extra samplers
don't have proper uniform storage associated with them, and there is no
way to update bindings via the API. So, for any 'hidden' variables,
just trust that there's an explicit binding set.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I would like to be able to run gl_nir_lower_samplers() to turn texture
and sampler variable dereferences into indexes and offsets, even for
ARB programs, and built-in shaders. This would make sampler handling
more consistent across the various types of shaders.
For GLSL programs, the gl_nir_lower_samplers_as_deref() pass looks up
the variable bindings in the shader program's uniform storage. But
ARB programs and built-in shaders don't have a gl_shader_program, and
uniform storage doesn't exist. In this case, we simply skip that
lookup, and trust var->data.binding to be set correctly by whoever
created the shader.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When nir_rematerialize_derefs_in_use_blocks_impl was first written, I
attempted to optimize things a bit by not bothering to re-materialize
the sources of deref instructions figuring that the final caller would
take care of that. However, in the case of more complex deref chains
where the first link or two lives in block A and then another link and
the load/store_deref intrinsic live in block B it doesn't work. The
code in rematerialize_deref_in_block looks at the tail of the chain,
sees that it's already in block B and skips it, not realizing that part
of the chain also lives in block A.
The easy solution here is to just rematerialize deref sources of deref
instructions as well. This may potentially lead to a few more deref
instructions being created by the conditions required for that to
actually happen are fairly unlikely and, thanks to the caching, it's all
linear time regardless.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109603
Fixes: 7d1d1208c2b "nir: Add a small pass to rematerialize derefs per-block"
Reviewed-by: Alejandro Piñeiro <[email protected]>
|
|
|
|
|
|
| |
The constructor should init this to NULL
Reviewed-by: Alejandro Piñeiro <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
v2: Remove the original ALU instruciton after all of its readers are
modified to read the new ALU instruction.
v3: Fix an issue where a bcsel that may not be executed on a loop
iteration due to a break statement is converted to a phi (and therefore
incorrectly "executed"). Noticed by Tim.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109216
Fixes: 8fb8ebfbb05 ("intel/compiler: More peephole select")
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A single shader in Unigine Superposition is affected by this change.
A single iadd is moved to the end of a loop. This iadd is involved in
a complex set of logic to terminate the loop, and an extra mov
instruction is inserted. This shader really needs the optimization
suggested by bugzilla #94747, and I expect that to make this tiny
regression go away.
All Gen7+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 15047543 -> 15047545 (<.01%)
instructions in affected programs: 565 -> 567 (0.35%)
helped: 0
HURT: 2
total cycles in shared programs: 369977253 -> 369978253 (<.01%)
cycles in affected programs: 127910 -> 128910 (0.78%)
helped: 0
HURT: 2
v2: Skip nir_op_vec{2,3,4} and nir_op_[fi]mov instructions to avoid
infinite optimization loops. Remove the original ALU instruciton after
all of its readers are modified to read the new ALU instruction.
v3: Extend to the more general case. The if the prev-block value from
the phi is not undef, this means the ALU instruction has to be
duplicated in both the prev-block and the continue-block.
Fixes: 8fb8ebfbb05 ("intel/compiler: More peephole select")
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
| |
This simplifies some changes coming later.
Fixes: 8fb8ebfbb05 ("intel/compiler: More peephole select")
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This will be used in a couple more places soon.
The function name is... horribly long. Neither Matt nor I could think
of any thing that was shorter and still more descriptive than
"is_phi_foo". I'm willing to entertain suggestions.
Fixes: 8fb8ebfbb05 ("intel/compiler: More peephole select")
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
| |
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
| |
Fixes: cd56d79b59f "nir: check NIR_SKIP to skip passes by name"
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are a number of reasons for the rewrite.
1. Adding support for packing tess patch varyings in a sane way.
2. Making use of qsort allowing the code to be much easier to
follow.
3. Fixes a bug where different interp types caused component
packing to be skipped for all varyings in some scenarios.
4. Allows us to add a crude live range analysis for deciding
which components should be packed together. This support can
optionally be added in a future patch.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
This will be used in the following patches to determine if we
support packing the components of a varying.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
This adds support needed for marking the varyings as used but we
don't actually support packing patches in this patch.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Compact arrays are used for special variables like clip and cull
distances, or tessellation levels. Drivers using compact arrays
assume that these values will always be actual arrays. We don't
want to turn a float[1] gl_CullDistance into a single float; that
would confuse drivers.
Today, i965 uses compact arrays, and Gallium drivers use
nir_lower_io_arrays_to_elements, so we haven't had any overlap
that would demonstrate the issue. Iris will use both.
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
| |
A couple places in st/nir assume that cull distances have been lowered
away, so it will need to call this lowering pass for drivers which opt
out of the GLSL IR lowering. The Intel backend also calls this pass,
for i965 and anv. We need to only do it once.
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We have a GLSL IR pass to convert clip/cull distance float[] arrays
into vec4[2] arrays. In ff281e6204, we attempted to skip this pass
if the GLSL IR lowering had already run. But, that code was not quite
right, as we forgot to strip away the per-vertex IO array layer for
geometry and tessellation shader varyings.
If the GLSL IR pass has run, the variables will not be marked as
"compact". So we can simply check that and bail.
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
nir_lower_clip_cull_distance_arrays() marks the combined clip/cull
distance array as compact. However, when translating in from GLSL
or SPIR-V, we were not marking the original float[] arrays as compact.
We should do so. That way, we can detect these corner cases properly.
Reviewed-by: Timothy Arceri <[email protected]>
|