| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Shader-db results Skylake:
total instructions in shared programs: 13109035 -> 13109024 (<.01%)
instructions in affected programs: 4777 -> 4766 (-0.23%)
helped: 11
HURT: 0
total cycles in shared programs: 332090418 -> 332090443 (<.01%)
cycles in affected programs: 19474 -> 19499 (0.13%)
helped: 6
HURT: 4
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
We need to update the cursor before we check if the alu use is
dominated by the if condition. Previously we were checking if
the current location of the alu instruction was dominated by
the if condition which would miss some optimisation opportunities.
Fixes: a3b4cb34589e ("nir/opt_if: Rework condition propagation")
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
This will be used on V3D to cut down the size of the VS inputs in the VPM
(memory area for sharing data between shader stages).
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
| |
Trivial.
|
|
|
|
|
|
|
| |
This is different from the GL_ARB_spirv pass because it generates a much
simpler data structure that isn't tied to OpenGL and mtypes.h.
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
| |
Empty initializer is not standard C. This fixes MSVC build.
Trivial.
|
|
|
|
|
|
|
|
|
| |
This isn't a great solution for bit-sizes but we don't have a
particularly convenient way to get a bit size from the system value enum
and this keeps the lowering pass from changing it.
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Instead of doing our own constant folding, we just emit instructions and
let constant folding happen. This is substantially simpler and lets us
use the nir_imm_bool helper instead of dealing with the const_value's
ourselves.
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
| |
This requires that we rework the interface a bit to use nir_builder but
that's a nice little modernization anyway.
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
Missed one while converting to the nir_src_as_* helpers.
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
| |
Empty initializer is not standard C. This fixes MSVC build.
Trivial.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use nir_src_comp_as_uint() to read the proper second component, as
nir_src_as_uint() returns the first one.
v2: Use nir_src_comp_as_uint() [Jason]
Fixes: 16870de8a0a ("nir: Use nir_src_is_const and nir_src_as_* in core
code")
Signed-off-by: Juan A. Suarez Romero <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108532
Tested-by: Michel Dänzer <[email protected]>
Tested-by: Vinson Lee <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
The linking opts shouldn't try removing or compacting XFB varyings
in the consumer. To avoid this we copy the always_active_io flag
from the producer.
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
| |
The conon_bit_class and canon_var_class variables got switched.
Fixes: 932c650e0b "nir/algebraic: Loosen a restriction on variables"
Reported-by: Ian Romanick <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
| |
This will hopefully make debugging opt_algebraic bit-size compile
failures easier.
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Previously, we would fail if a variable had an assigned but unknown bit
size X and we tried to assign it an actual bit size. However, this is
ok because, at the time we do the search, the variable does have an
actual bit size and it will match X because of the NIR rules.
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
|
|
|
|
|
|
|
| |
We rename some local variables in validate() to be more readable and
plumb the var through to get/set_var_bit_class instead of the var index.
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
|
|
|
|
| |
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
|
|
|
|
|
|
| |
There's nothing boolean about (a | ~a) ~> -1
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
|
|
|
|
| |
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
|
|
|
|
| |
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
| |
This not only makes them safe for more bit sizes but it also fixes a bug
in is_zero_to_one where it would return true for constant NaN.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
| |
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
| |
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
| |
If GLSL has already done the lowering, we'd rather not crash in this pass.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Extend the pass to propagate the copies information along the control
flow graph. It performs two walks, first it collects the vars
that were written inside each node. Then it walks applying the copy
propagation using a list of copies previously available. At each node
the list is invalidated according to results from the first walk.
This approach is simpler than a full data-flow analysis, but covers
various cases. If derefs are used for operating on more memory
resources (e.g. SSBOs), the difference from a regular pass is expected
to be more visible -- as the SSA copy propagation pass won't apply to
those.
A full data-flow analysis would handle more scenarios: conditional
breaks in the control flow and merge equivalent effects from multiple
branches (e.g. using a phi node to merge the source for writes to the
same deref). However, as previous commentary in the code stated, its
complexity 'rapidly get out of hand'. The current patch is a good
intermediate step towards more complex analysis.
The 'copies' linked list was modified to use util_dynarray to make it
more convenient to clone it (to handle ifs/loops).
Annotated shader-db results for Skylake:
total instructions in shared programs: 15105796 -> 15105451 (<.01%)
instructions in affected programs: 152293 -> 151948 (-0.23%)
helped: 96
HURT: 17
All the HURTs and many HELPs are one instruction. Looking
at pass by pass outputs, the copy prop kicks in removing a
bunch of loads correctly, which ends up altering what other
other optimizations kick. In those cases the copies would be
propagated after lowering to SSA.
In few HELPs we are actually helping doing more than was
possible previously, e.g. consolidating load_uniforms from
different blocks. Most of those are from
shaders/dolphin/ubershaders/.
total cycles in shared programs: 566048861 -> 565954876 (-0.02%)
cycles in affected programs: 151461830 -> 151367845 (-0.06%)
helped: 2933
HURT: 2950
A lot of noise on both sides.
total loops in shared programs: 4603 -> 4603 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0
total spills in shared programs: 11085 -> 11073 (-0.11%)
spills in affected programs: 23 -> 11 (-52.17%)
helped: 1
HURT: 0
The shaders/dolphin/ubershaders/12.shader_test was able to
pull a couple of loads from inside if statements and reuse
them.
total fills in shared programs: 23143 -> 23089 (-0.23%)
fills in affected programs: 2718 -> 2664 (-1.99%)
helped: 27
HURT: 0
All from shaders/dolphin/ubershaders/.
LOST: 0
GAINED: 0
The other generations follow the same overall shape. The spills and
fills HURTs are all from the same game.
shader-db results for Broadwell.
total instructions in shared programs: 15402037 -> 15401841 (<.01%)
instructions in affected programs: 144386 -> 144190 (-0.14%)
helped: 86
HURT: 9
total cycles in shared programs: 600912755 -> 600902486 (<.01%)
cycles in affected programs: 185662820 -> 185652551 (<.01%)
helped: 2598
HURT: 3053
total loops in shared programs: 4579 -> 4579 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0
total spills in shared programs: 80929 -> 80924 (<.01%)
spills in affected programs: 720 -> 715 (-0.69%)
helped: 1
HURT: 5
total fills in shared programs: 93057 -> 93013 (-0.05%)
fills in affected programs: 3398 -> 3354 (-1.29%)
helped: 27
HURT: 5
LOST: 0
GAINED: 2
shader-db results for Haswell:
total instructions in shared programs: 9231975 -> 9230357 (-0.02%)
instructions in affected programs: 44992 -> 43374 (-3.60%)
helped: 27
HURT: 69
total cycles in shared programs: 87760587 -> 87727502 (-0.04%)
cycles in affected programs: 7720673 -> 7687588 (-0.43%)
helped: 1609
HURT: 1416
total loops in shared programs: 1830 -> 1830 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0
total spills in shared programs: 1988 -> 1692 (-14.89%)
spills in affected programs: 296 -> 0
helped: 1
HURT: 0
total fills in shared programs: 2103 -> 1668 (-20.68%)
fills in affected programs: 438 -> 3 (-99.32%)
helped: 4
HURT: 0
LOST: 0
GAINED: 1
v2: Remove the DISABLE prefix from tests we now pass.
v3: Add comments about missing write_mask handling. (Caio)
Add unreachable when switching on cf_node type. (Jason)
Properly merge the component information in written map
instead of replacing. (Jason)
Explain how removal from written arrays works. (Jason)
Use mode directly from deref instead of getting the var. (Jason)
v4: Register the local written mode for calls. (Jason)
Prefer cf_node instead of node. (Jason)
Clarify that remove inside iteration only works in backward
iterations. (Jason)
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Calls are not used yet (functions are inlined), but since new code is
already taking them into account, do it here too. The convention here
and in other places is that no writable memory is assumed to remain
unchanged, as well as global variables.
Also, explicitly state the modes affected (instead of using the
reverse logic) in one of the apply_for_barrier_modes calls.
Suggested by Jason.
v2: Consider local vars used by a call to be conservative, SPIR-V has
such cases. (Jason)
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Also tests for removal of redundant loads, that we currently handle as
part of the copy propagation.
Note some tests involve multiple blocks and are currently DISABLED
because they (expectedly) fail.
v2: Add missing DISABLED prefix to "multi block" tests. (Jason)
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
| |
These are covered by another pass now.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of doing this as part of the existing copy_prop_vars pass.
Separation makes easier to expand the scope of both passes to be more
than per-block. For copy propagation, the information about valid
copies comes from previous instructions; while the dead write removal
depends on information from later instructions ("have any instruction
used this deref before overwrite it?").
Also change the tests to use this pass (instead of copy prop vars).
Note that the disabled tests continue to fail, since the standalone
pass is still per-block.
v2: Remove entries from dynarray instead of marking items as
deleted. Use foreach_reverse. (Caio)
(all from Jason)
Do not cache nir_deref_path. Not worthy for this patch.
Clear unused writes when hitting a call instruction.
Clean up enumeration of modes for barriers.
Move metadata calls to the inner function.
v3: For copies, use the vector length to calculate the mask.
(all from Jason)
Use nir_component_mask_t when applicable.
Rename functions for clarity.
Consider local vars used by a call to be conservative (SPIR-V has
such cases).
Comment and assert the assumption that stores and copies are
always to a deref that ends with a vector or scalar.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Note at the moment the pass called is nir_opt_copy_prop_vars, because
dead write elimination is implemented there.
Also added tests that involve identifying dead writes in multiple
blocks (e.g. the overwrite happens in another block). Those currently
fail as expected, so are marked to be skipped.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add basic helpers for doing tests on the vars related optimization
passes. The main goal is to lower the barrier to create tests during
development and debugging of the passes. Full coverage is not a
requirement.
v2: Make find_next_intrinsic() skip blocks before 'after'. (Jason)
Move nir_imm_ivec2() to nir_builder.h. (Jason)
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For gallium drivers where you want to do some linking at variant compile
time, you don't have the other producer/consumer shader on hand to modify.
By exposing the inner function, the driver can have the used varyings in
the compiled shader cache key and still do linking.
This is also useful for V3D, where the binning shader wants to only output
position and TF varyings. We've been removing those after nir_lower_io,
but this will be less driver-specific code and let more of the shader get
DCEed early in NIR.
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
| |
Fixes assertion failures when calling nir_remove_unused_varyings() or
nir_remove_unused_io_vars().
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
| |
This is needed for nir_gather_info to actually count the new textures,
since it operates solely on variables.
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
v2: - change how the access qualifiers are accumulated
v3: - duplicate members in struct_member_decoration_cb()
- handle access qualifiers on variables
- remove access qualifiers handling in _vtn_variable_load_store()
- fix setting access qualifiers on type->array_element
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The b2f and b2i conversions always produce zero or one which are both
representable in every type and size. Since b2i and b2f support all bit
sizes, we can just get rid of the conversion opcode.
total instructions in shared programs: 15089335 -> 15084368 (-0.03%)
instructions in affected programs: 212564 -> 207597 (-2.34%)
helped: 896
HURT: 0
total cycles in shared programs: 369831123 -> 369826267 (<.01%)
cycles in affected programs: 2008647 -> 2003791 (-0.24%)
helped: 693
HURT: 216
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
These allows us to not support fsign.sat in the Intel compiler backend,
and that will simplify some later changes.
No shader-db changes on any Intel platform.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Thomas Helland <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
shader-db results:
All Gen7+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 15106023 -> 15105981 (<.01%)
instructions in affected programs: 300 -> 258 (-14.00%)
helped: 6
HURT: 0
helped stats (abs) min: 7 max: 7 x̄: 7.00 x̃: 7
helped stats (rel) min: 14.00% max: 14.00% x̄: 14.00% x̃: 14.00%
95% mean confidence interval for instructions value: -7.00 -7.00
95% mean confidence interval for instructions %-change: -14.00% -14.00%
Instructions are helped.
total cycles in shared programs: 566050327 -> 566050075 (<.01%)
cycles in affected programs: 2826 -> 2574 (-8.92%)
helped: 6
HURT: 0
helped stats (abs) min: 40 max: 44 x̄: 42.00 x̃: 42
helped stats (rel) min: 8.89% max: 8.94% x̄: 8.92% x̃: 8.92%
95% mean confidence interval for cycles value: -44.30 -39.70
95% mean confidence interval for cycles %-change: -8.95% -8.88%
Cycles are helped.
No changes on Gen6 or earlier.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Thomas Helland <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Thomas Helland <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The ssa_for_alu_src helper will correctly handle swizzles and other
source modifiers for you. The expansions for unpack_half_2x16,
pack_uvec2_to_uint, and pack_uvec4_to_uint were all broken with regards
to swizzles. The brokenness of unpack_half_2x16 was causing rendering
errors in Rise of the Tomb Raider on Intel ever since c11833ab24dcba26
which added an extra copy propagation to the optimization pipeline and
caused us to start seeing swizzles where we hadn't seen any before.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107926
Fixes: 9ce901058f3d "nir: Add lowering of nir_op_unpack_half_2x16."
Fixes: 9b8786eba955 "nir: Add lowering support for packing opcodes."
Tested-by: Alex Smith <[email protected]>
Tested-by: Józef Kucia <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
We already call nir_rematerialize_derefs_in_use_blocks_impl prior to
calling nir_lower_ssa_defs_to_regs_block so the assertion that all deref
uses in the block should hold. This fixes the following CTS test when
SPIR-V optimization recipe 1:
dEQP-VK.glsl.struct.local.loop_nested_struct_array_vertex
Fixes: 606eb56ab9449b "intel/nir: Only lower load/store derefs"
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the block in which the jump is inserted is the predecessor of a phi
then we need to remove phi sources otherwise the phi may end up with
things improperly connected. This fixes the following CTS test when
dEQP is run with SPIR-V optimization recipe 1:
dEQP-VK.glsl.functions.control_flow.return_in_nested_loop_vertex
Cc: [email protected]
Reviewed-by: Iago Toral Quiroga <[email protected]>
|