aboutsummaryrefslogtreecommitdiffstats
path: root/src/compiler/nir/nir_opt_if.c
Commit message (Collapse)AuthorAgeFilesLines
* nir: Call nir_metadata_preserve on !progressJason Ekstrand2020-06-111-3/+1
| | | | | | Reviewed-by: Alyssa Rosenzweig <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5171>
* tree-wide: fix deprecated GitLab URLsEric Engestrom2020-05-231-1/+1
| | | | | | | | | | | | | They will stop working in the next GitLab release, so let's update them ASAP to make sure things are propagated to everyone by then. See: https://about.gitlab.com/releases/2020/05/06/gitlab-com-13-0-breaking-changes/#removal-of-deprecated-project-paths Cc: [email protected] Signed-off-by: Eric Engestrom <[email protected]> Acked-by: Alyssa Rosenzweig <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5111>
* nir/opt_if: use nir_src_as_bool in opt_peel_loop_initial_if helperRhys Perry2020-05-191-12/+10
| | | | | | Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4757>
* nir/opt_if: run opt_peel_loop_initial_if after all other optimizationsRhys Perry2020-05-191-5/+44
| | | | | | | | | | | | | Fixes dEQP-VK.graphicsfuzz.loops-ifs-continues-call with RADV. opt_if_loop_terminator can cause this optimization or opt_if_simplification to be run on the non-SSA code. Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Fixes: 52c8bc0130a ('nir: make opt_if_loop_terminator() less strict') Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2943 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4757>
* nir: make opt_if_loop_terminator() less strictTimothy Arceri2020-04-081-1/+1
| | | | | | | | | nir_cf_{extract,reinsert}() can't stitch a block together if the block we are extracting ends in a jump but other jumps nested in further ifs should be fine to move. Reviewed-by: Samuel Pitoiset <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4477>
* nir: Handle vec8/16 in opt_split_alu_of_phiJason Ekstrand2020-03-311-4/+1
| | | | | | Reviewed-by: Alyssa Rosenzweig <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4365>
* nir/opt_if: Fix undef handling in opt_split_alu_of_phi()Connor Abbott2019-09-181-55/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The pass assumed that "Most ALU ops produce an undefined result if any source is undef" which is completely untrue. Due to how we lower if statements to selects and then optimize on those selects later, we simply cannot make that assumption. In particular this pass tried to replace an ior of undef and true, which had been generated by optimizing a select which itself came from flattening an if statement, to undef causing a miscompilation for a CTS test with radeonsi NIR. We fix this by always doing what the non-undef path did, i.e. duplicate the instruction twice. If there are cases where the instruction before the loop can be folded away due to having an undef source, we should add these to opt_undef instead. The comment above the pass says that if the phi source from before the loop is undef, and we can fold the instruction before the loop to undef, then we can ignore sources of the original instruction that don't dominate the block before the loop because we don't need them to create the instruction before the loop. This is incorrect, because the instruction at the bottom of the loop would get those sources from the wrong loop iteration. The code never actually did what the comment said, so we only have to update the comment to match what the pass actually does. We also update the example to more closely match what most actual loops look like after vtn and peephole_select. There are no shader-db changes with i965, radeonsi NIR, or radv. With anv and my vkpipeline-db there's only one change: total instructions in shared programs: 14125290 -> 14125300 (<.01%) instructions in affected programs: 2598 -> 2608 (0.38%) helped: 0 HURT: 1 total cycles in shared programs: 2051473437 -> 2051473397 (<.01%) cycles in affected programs: 36697 -> 36657 (-0.11%) helped: 1 HURT: 0 Fixes KHR-GL45.shader_subroutine.control_flow_and_returned_subroutine_values_used_as_subroutine_input with radeonsi NIR.
* nir/opt_if: Clean up single-src phis in opt_if_loop_terminatorJason Ekstrand2019-07-151-0/+7
| | | | | | Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111071 Fixes: 2a74296f24ba "nir: add opt_if_loop_terminator()" Reviewed-by: Timothy Arceri <[email protected]>
* nir: Drop imov/fmov in favor of one mov instructionJason Ekstrand2019-05-241-4/+3
| | | | | | | | | | | | | | | | The difference between imov and fmov has been a constant source of confusion in NIR for years. No one really knows why we have two or when to use one vs. the other. The real reason is that they do different things in the presence of source and destination modifiers. However, without modifiers (which many back-ends don't have), they are identical. Now that we've reworked nir_lower_to_source_mods to leave one abs/neg instruction in place rather than replacing them with imov or fmov instructions, we don't need two different instructions at all anymore. Reviewed-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Alyssa Rosenzweig <[email protected]> Reviewed-by: Vasily Khoruzhick <[email protected]> Acked-by: Rob Clark <[email protected]>
* nir: make nir_const_value scalarKarol Herbst2019-04-141-2/+2
| | | | | | | | | v2: remove & operator in a couple of memsets add some memsets v3: fixup lima Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> (v2)
* nir: initialise some variables in opt_if_loop_last_continue()Timothy Arceri2019-04-111-2/+2
| | | | | | | | Fixes a couple of Coverity warnings CID 1444626. Fixes: e30804c6024f ("nir/radv: remove restrictions on opt_if_loop_last_continue()") Reviewed-by: Tapani Pälli <[email protected]>
* nir/radv: remove restrictions on opt_if_loop_last_continue()Timothy Arceri2019-04-091-34/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When I implemented opt_if_loop_last_continue() I had restricted this pass from moving other if-statements inside the branch opposite the continue. At the time it was causing a bunch of spilling in shader-db for i965. However Samuel Pitoiset noticed that making this pass more aggressive significantly improved the performance of Doom on RADV. Below are the statistics he gathered. 28717 shaders in 14931 tests Totals: SGPRS: 1267317 -> 1267549 (0.02 %) VGPRS: 896876 -> 895920 (-0.11 %) Spilled SGPRs: 24701 -> 26367 (6.74 %) Code Size: 48379452 -> 48507880 (0.27 %) bytes Max Waves: 241159 -> 241190 (0.01 %) Totals from affected shaders: SGPRS: 23584 -> 23816 (0.98 %) VGPRS: 25908 -> 24952 (-3.69 %) Spilled SGPRs: 503 -> 2169 (331.21 %) Code Size: 2471392 -> 2599820 (5.20 %) bytes Max Waves: 586 -> 617 (5.29 %) The codesize increases is related to Wolfenstein II it seems largely due to an increase in phis rather than the existing jumps. This gives +10% FPS with Doom on my Vega56. Rhys Perry also benchmarked Doom on his VEGA64: Before: 72.53 FPS After: 80.77 FPS v2: disable pass on non-AMD drivers Reviewed-by: Ian Romanick <[email protected]> (v1) Acked-by: Samuel Pitoiset <[email protected]>
* Revert "nir: propagate known constant values into the if-then branch"Timothy Arceri2019-04-031-60/+0
| | | | | | This reverts commit 4218b6422cf1ff70c7f0feeec699a35e88522ed7. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110311
* nir: propagate known constant values into the if-then branchTimothy Arceri2019-04-031-0/+60
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Helps Max Waves / VGPR use in a bunch of Unigine Heaven shaders. shader-db results radeonsi (VEGA): Totals from affected shaders: SGPRS: 5505440 -> 5505872 (0.01 %) VGPRS: 3077520 -> 3077296 (-0.01 %) Spilled SGPRs: 39032 -> 39030 (-0.01 %) Spilled VGPRs: 16326 -> 16326 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 744 -> 744 (0.00 %) dwords per thread Code Size: 123755028 -> 123753316 (-0.00 %) bytes Compile Time: 2751028 -> 2560786 (-6.92 %) milliseconds LDS: 1415 -> 1415 (0.00 %) blocks Max Waves: 972192 -> 972240 (0.00 %) Wait states: 0 -> 0 (0.00 %) vkpipeline-db results RADV (VEGA): Totals from affected shaders: SGPRS: 160 -> 160 (0.00 %) VGPRS: 88 -> 88 (0.00 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 18268 -> 18152 (-0.63 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 26 -> 26 (0.00 %) Wait states: 0 -> 0 (0.00 %) Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir: fix opt_if_loop_last_continue()Timothy Arceri2019-03-221-10/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rather than skipping code that looked like this: loop { ... if (cond) { do_work_1(); continue; } else { break; } do_work_2(); } Previously we would turn this into: loop { ... if (cond) { do_work_1(); continue; } else { do_work_2(); break; } } This was clearly wrong. This change checks for this case and makes sure we now leave it for nir_opt_dead_cf() to clean up. Reviewed-by: Ian Romanick <[email protected]>
* nir: remove jump from two merging jump-ending blocksJuan A. Suarez Romero2019-02-151-2/+19
| | | | | | | | | | | | | | | | | | In opt_peel_initial_if optimization, when moving the continue list to end of the continue block, before the jump, could happen that the continue list itself also ends with a jump. This would mean that we would have two jump instructions in a row: the first one from the continue list and the second one from the contine block. As inserting an instruction after a jump is not allowed (and it does not make sense, as it will not be executed), remove the jump from the continue block and keep the one from continue list, as it will be executed first. CC: Jason Ekstrand <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir: move ALU instruction before the jump instructionJuan A. Suarez Romero2019-02-151-1/+1
| | | | | | | | | | | | opt_split_alu_of_phi moves ALU instruction to the end of continue block. But if the continue block ends with a jump instruction (an explicit "continue" instruction) then the ALU must be inserted before the jump, as it is illegal to add instructions after the jump. CC: Ian Romanick <[email protected]> Fixes: 0881e90c099 ("nir: Split ALU instructions in loops that read phis") Reviewed-by: Ian Romanick <[email protected]>
* nir: fix example in opt_peel_loop_initial_if descriptionCaio Marcelo de Oliveira Filho2019-02-121-3/+3
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* nir/opt_if: don't mark progress if nothing changesKarol Herbst2019-02-131-0/+7
| | | | | | | | | | | | | | | | | | | | if we have something like this: loop { ... if x { break; } else { continue; } } opt_if_loop_last_continue returns true marking progress allthough nothing changes. Fixes: 5921a19d4b0c6 "nir: add if opt opt_if_loop_last_continue()" Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Convert a bcsel with only phi node sources to a phi nodeIan Romanick2019-02-081-0/+220
| | | | | | | | | | | | | v2: Remove the original ALU instruciton after all of its readers are modified to read the new ALU instruction. v3: Fix an issue where a bcsel that may not be executed on a loop iteration due to a break statement is converted to a phi (and therefore incorrectly "executed"). Noticed by Tim. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109216 Fixes: 8fb8ebfbb05 ("intel/compiler: More peephole select") Reviewed-by: Timothy Arceri <[email protected]>
* nir: Split ALU instructions in loops that read phisIan Romanick2019-02-081-0/+294
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A single shader in Unigine Superposition is affected by this change. A single iadd is moved to the end of a loop. This iadd is involved in a complex set of logic to terminate the loop, and an extra mov instruction is inserted. This shader really needs the optimization suggested by bugzilla #94747, and I expect that to make this tiny regression go away. All Gen7+ platforms had similar results. (Skylake shown) total instructions in shared programs: 15047543 -> 15047545 (<.01%) instructions in affected programs: 565 -> 567 (0.35%) helped: 0 HURT: 2 total cycles in shared programs: 369977253 -> 369978253 (<.01%) cycles in affected programs: 127910 -> 128910 (0.78%) helped: 0 HURT: 2 v2: Skip nir_op_vec{2,3,4} and nir_op_[fi]mov instructions to avoid infinite optimization loops. Remove the original ALU instruciton after all of its readers are modified to read the new ALU instruction. v3: Extend to the more general case. The if the prev-block value from the phi is not undef, this means the ALU instruction has to be duplicated in both the prev-block and the continue-block. Fixes: 8fb8ebfbb05 ("intel/compiler: More peephole select") Reviewed-by: Timothy Arceri <[email protected]>
* nir: Select phi nodes using prev_block instead of continue_blockIan Romanick2019-02-081-11/+10
| | | | | | | This simplifies some changes coming later. Fixes: 8fb8ebfbb05 ("intel/compiler: More peephole select") Reviewed-by: Timothy Arceri <[email protected]>
* nir: Refactor code that checks phi nodes in opt_peel_loop_initial_ifIan Romanick2019-02-081-16/+36
| | | | | | | | | | | This will be used in a couple more places soon. The function name is... horribly long. Neither Matt nor I could think of any thing that was shorter and still more descriptive than "is_phi_foo". I'm willing to entertain suggestions. Fixes: 8fb8ebfbb05 ("intel/compiler: More peephole select") Reviewed-by: Timothy Arceri <[email protected]>
* nir: Unset metadata debug bit if no progress madeMatt Turner2019-01-091-0/+4
| | | | | | | | | | | | | | | | | | | | | NIR metadata validation verifies that the debug bit was unset (by a call to nir_metadata_preserve) if a NIR optimization pass made progress on the shader. With the expectation that the NIR shader consists of only a single main function, it has been safe to call nir_metadata_preserve() iff progress was made. However, most optimization passes calculate progress per-function and then return the union of those calculations. In the case that an optimization pass makes progress only on a subset of the functions in the shader metadata validation will detect the debug bit is still set on any unchanged functions resulting in a failed assertion. This patch offers a quick solution (short of a larger scale refactoring which I do not wish to undertake as part of this series) that simply unsets the debug bit on unchanged functions. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir: merge some basic consecutive ifsTimothy Arceri2019-01-031-0/+93
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After trying multiple times to merge if-statements with phis between them I've come to the conclusion that it cannot be done without regressions. The problem is for some shaders we end up with a whole bunch of phis for the merged ifs resulting in increased register pressure. So this patch just merges ifs that have no phis between them. This seems to be consistent with what LLVM does so for radeonsi we only see a change (although its a large change) in a single shader. Shader-db results i965 (SKL): total instructions in shared programs: 13098176 -> 13098152 (<.01%) instructions in affected programs: 1326 -> 1302 (-1.81%) helped: 4 HURT: 0 total cycles in shared programs: 332032989 -> 332037583 (<.01%) cycles in affected programs: 60665 -> 65259 (7.57%) helped: 0 HURT: 4 The cycles estimates reported by shader-db for i965 seem inaccurate as the only difference in the final code is the removal of the redundent condition evaluations and jumps. Also the biggest code reduction (~7%) for radeonsi was in a tomb raider tressfx shader but for some reason this does not get merged for i965. Shader-db results radeonsi (VEGA): Totals from affected shaders: SGPRS: 232 -> 232 (0.00 %) VGPRS: 164 -> 164 (0.00 %) Spilled SGPRs: 59 -> 59 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 14584 -> 13520 (-7.30 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 13 -> 13 (0.00 %) Wait states: 0 -> 0 (0.00 %) Reviewed-by: Ian Romanick <[email protected]>
* nir: add rewrite_phi_predecessor_blocks() helperTimothy Arceri2019-01-031-20/+31
| | | | | | This will also be used by the if merge pass in the following commit. Reviewed-by: Ian Romanick <[email protected]>
* nir: Switch to using 1-bit Booleans for almost everythingJason Ekstrand2018-12-161-1/+1
| | | | | | | | | | | | | | | | This is a squash of a few distinct changes: glsl,spirv: Generate 1-bit Booleans Revert "Use 32-bit opcodes in the NIR producers and optimizations" Revert "nir/builder: Generate 32-bit bool opcodes transparently" nir/builder: Generate 1-bit Booleans in nir_build_imm_bool Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Tested-by: Bas Nieuwenhuizen <[email protected]>
* nir: Rename Boolean-related opcodes to include 32 in the nameJason Ekstrand2018-12-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a squash of a bunch of individual changes: nir/builder: Generate 32-bit bool opcodes transparently nir/algebraic: Remap Boolean opcodes to the 32-bit variant Use 32-bit opcodes in the NIR producers and optimizations Generated with a little hand-editing and the following sed commands: sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c Use 32-bit opcodes in the NIR back-ends Generated with a little hand-editing and the following sed commands: sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Tested-by: Bas Nieuwenhuizen <[email protected]>
* nir: fix opt_if_loop_last_continue()Timothy Arceri2018-12-141-2/+6
| | | | | | | | | | | | | | | | | | | | | | | The pass did not correctly handle loops ending in: if ssa_7 { block block_8: /* preds: block_7 */ continue /* succs: block_1 */ } else { block block_9: /* preds: block_7 */ break /* succs: block_11 */ } The break will get eliminated by another opt but if this pass gets called first (as it does on RADV) we ended up inserting instructions after the break. Fixes: 5921a19d4b0c ("nir: add if opt opt_if_loop_last_continue()") Reviewed-by: Dave Airlie <[email protected]>
* nir: add if opt opt_if_loop_last_continue()Danylo Piliaiev2018-12-131-0/+95
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Removing the last continue can allow more loops to unroll. Also inserting code into the if branch can allow the various if opts to progress further. The insertion of some loops into the if branch also reduces VGPR use in some shaders. vkpipeline-db results (VEGA): Totals from affected shaders: SGPRS: 6552 -> 6576 (0.37 %) VGPRS: 6544 -> 6532 (-0.18 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 481952 -> 478032 (-0.81 %) bytes LDS: 13 -> 13 (0.00 %) blocks Max Waves: 241 -> 242 (0.41 %) Wait states: 0 -> 0 (0.00 %) Shader-db results radeonsi (VEGA): Totals from affected shaders: SGPRS: 168 -> 168 (0.00 %) VGPRS: 144 -> 140 (-2.78 %) Spilled SGPRs: 157 -> 157 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 8524 -> 8488 (-0.42 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 7 -> 7 (0.00 %) Wait states: 0 -> 0 (0.00 %) v2: (Timothy Arceri): - allow for continues in either branch - move any trailing loops inside the if as well as blocks. - leave nir_opt_trivial_continues() to actually remove the continue. Reviewed-by: Thomas Helland <[email protected]> Signed-off-by: Timothy Arceri <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=32211
* nir: Make boolean conversions sized just like the othersJason Ekstrand2018-12-051-1/+1
| | | | | | | | | Instead of a single i2b and b2i, we now have i2b32 and b2iN where N is one if 8, 16, 32, or 64. This leads to having a few more opcodes but now everything is consistent and booleans aren't a weird special case anymore. Reviewed-by: Connor Abbott <[email protected]>
* nir: fix condition propagation when src has a swizzleTimothy Arceri2018-11-031-1/+30
| | | | | | | | | | | | | | | | | | We cannot use nir_build_alu() to create the new alu as it has no way to know how many components of the src we will use. This results in it guessing the max number of components from one of its inputs. Fixes the following CTS tests: dEQP-VK.spirv_assembly.instruction.graphics.selection_block_order.out_of_order_frag dEQP-VK.spirv_assembly.instruction.graphics.selection_block_order.out_of_order_geom dEQP-VK.spirv_assembly.instruction.graphics.selection_block_order.out_of_order_tessc dEQP-VK.spirv_assembly.instruction.graphics.selection_block_order.out_of_order_vert Fixes: 2975422ceb6c ("nir: propagates if condition evaluation down some alu chains") Reviewed-by: Jason Ekstrand <[email protected]>
* nir: allow propagation of if evaluation for bcselTimothy Arceri2018-11-021-9/+16
| | | | | | | | | | | | | | | | Shader-db results Skylake: total instructions in shared programs: 13109035 -> 13109024 (<.01%) instructions in affected programs: 4777 -> 4766 (-0.23%) helped: 11 HURT: 0 total cycles in shared programs: 332090418 -> 332090443 (<.01%) cycles in affected programs: 19474 -> 19499 (0.13%) helped: 6 HURT: 4 Reviewed-by: Jason Ekstrand <[email protected]>
* nir: fix if condition propagation for alu useTimothy Arceri2018-11-011-2/+1
| | | | | | | | | | | We need to update the cursor before we check if the alu use is dominated by the if condition. Previously we were checking if the current location of the alu instruction was dominated by the if condition which would miss some optimisation opportunities. Fixes: a3b4cb34589e ("nir/opt_if: Rework condition propagation") Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Fix array initializerBrian Paul2018-10-261-1/+1
| | | | | | Empty initializer is not standard C. This fixes MSVC build. Trivial.
* nir/opt_if: Rework condition propagationJason Ekstrand2018-10-261-61/+30
| | | | | | | | | | Instead of doing our own constant folding, we just emit instructions and let constant folding happen. This is substantially simpler and lets us use the nir_imm_bool helper instead of dealing with the const_value's ourselves. Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* util: use C99 declaration in the for-loop set_foreach() macroEric Engestrom2018-10-251-1/+0
| | | | | Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* nir/opt_if: Re-materialize derefs in use blocks before peeling loopsJason Ekstrand2018-09-191-6/+7
| | | | | | Reviewed-by: Iago Toral Quiroga <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107879 Cc: "18.2" <[email protected]>
* nir: propagates if condition evaluation down some alu chainsTimothy Arceri2018-09-141-0/+128
| | | | | | | | | | | | | | | | | | | | | | | | | v2: - only allow nir_op_inot or nir_op_b2i when alu input is 1. - use some helpers as suggested by Jason. v3: - evaluate alu op for single input alu ops - add helper function to decide if to propagate through alu - make use of nir_before_src in another spot shader-db IVB results: total instructions in shared programs: 9993483 -> 9993472 (-0.00%) instructions in affected programs: 1300 -> 1289 (-0.85%) helped: 11 HURT: 0 total cycles in shared programs: 219476091 -> 219476059 (-0.00%) cycles in affected programs: 7675 -> 7643 (-0.42%) helped: 10 HURT: 1 Reviewed-by: Jason Ekstrand <[email protected]>
* nir: evaluate if condition uses inside the if branchesTimothy Arceri2018-09-141-0/+108
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since we know what side of the branch we ended up on we can just replace the use with a constant. All the spill changes in shader-db are from Dolphin uber shaders, despite some small regressions the change is clearly positive. V2: insert new constant after any phis in the use->parent_instr->type == nir_instr_type_phi path. v3: - use nir_after_block_before_jump() for inserting const - check dominance of phi uses correctly v4: - create some helpers as suggested by Jason. v5 (Jason Ekstrand): - Use LIST_ENTRY to get the phi src shader-db results IVB: total instructions in shared programs: 9999201 -> 9993483 (-0.06%) instructions in affected programs: 163235 -> 157517 (-3.50%) helped: 132 HURT: 2 total cycles in shared programs: 231670754 -> 219476091 (-5.26%) cycles in affected programs: 143424120 -> 131229457 (-8.50%) helped: 115 HURT: 24 total spills in shared programs: 4383 -> 4370 (-0.30%) spills in affected programs: 1656 -> 1643 (-0.79%) helped: 9 HURT: 18 total fills in shared programs: 4610 -> 4581 (-0.63%) fills in affected programs: 374 -> 345 (-7.75%) helped: 6 HURT: 0 Reviewed-by: Jason Ekstrand <[email protected]>
* nir: mark *prev_block as MAYBE_UNUSED in opt_peel_loop_initial_ifKai Wasserbäch2018-08-181-1/+1
| | | | | | | | | | | | | Only used, when asserts are enabled. Fixes an unused-variable warning with gcc-8: ../../../src/compiler/nir/nir_opt_if.c: In function 'opt_peel_loop_initial_if': ../../../src/compiler/nir/nir_opt_if.c:109:15: warning: unused variable 'prev_block' [-Wunused-variable] nir_block *prev_block = ^~~~~~~~~~ Signed-off-by: Kai Wasserbäch <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* nir: Fix end of function without return warning/error.Bas Nieuwenhuizen2018-07-201-0/+2
| | | | | | | | | There always is a continue block, so let us just do unreachable. Reviewed-by: Jason Ekstrand <[email protected]> Fixes: 8cacf38f527 "nir: Do not use continue block after removing it." CC: 18.1 <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107312
* nir: Do not use continue block after removing it.Bas Nieuwenhuizen2018-07-201-6/+25
| | | | | | | | | | | | | | | | Reinserting code directly before a jump means the block gets split and merged, removing the original block and replacing it in the process. Hence keeping a pointer to the continue block over a reinsert causes issues. This code changes nir_opt_if to simply look for the new continue block. Reviewed-by: Jason Ekstrand <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107275 CC: 18.1 <[email protected]>
* nir: delete not needed for reinserted nir_cf_listCaio Marcelo de Oliveira Filho2018-07-121-2/+0
| | | | | | | It wasn't causing problems since there's nothing to delete, but better be consistent with the rest of existing codebase. Reviewed-by: Jason Ekstrand <[email protected]>
* nir/opt_if: Remove unneeded phis if we make progressJason Ekstrand2018-06-261-0/+7
| | | | | | | | | | Now that SSA values can be derefs and they have special rules, we have to be a bit more careful about our LCSSA phis. In particular, we need to clean up in case LCSSA ended up creating a phi node for a deref. This fixes validation issues with some Vulkan CTS tests with the new deref instructions. Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* nir: add opt_if_loop_terminator()Timothy Arceri2018-06-071-0/+68
| | | | | | | | | | | | | | | | | | | | | | | | This pass detects potential loop terminators and moves intructions from the non breaking branch after the if-statement. This enables both the new opt_if_simplification() pass and loop unrolling to potentially progress further. Unexpectedly this change speed up shader-db run times by ~3% Ivy Bridge shader-db results (all changes in dolphin/ubershaders): total instructions in shared programs: 9995662 -> 9995338 (-0.00%) instructions in affected programs: 87845 -> 87521 (-0.37%) helped: 27 HURT: 0 total cycles in shared programs: 230931495 -> 230925015 (-0.00%) cycles in affected programs: 56391385 -> 56384905 (-0.01%) helped: 27 HURT: 0 Reviewed-by: Ian Romanick <[email protected]>
* nir: implement the GLSL equivalent of if simplication in nir_opt_ifSamuel Pitoiset2018-06-041-5/+92
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This pass turns: if (cond) { } else { do_work(); } into: if (!cond) { do_work(); } else { } Here's the vkpipeline-db stats (from affected shaders) on Polaris10: Totals from affected shaders: SGPRS: 17272 -> 17296 (0.14 %) VGPRS: 18712 -> 18740 (0.15 %) Spilled SGPRs: 1179 -> 1142 (-3.14 %) Code Size: 1503364 -> 1515176 (0.79 %) bytes Max Waves: 916 -> 911 (-0.55 %) This pass only affects Serious Sam 2017 (Vulkan) on my side. The stats are not really good for now. Some shaders look quite dumb but this will be improved with further NIR passes, like ifs combination. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* nir: Rename convert_to_ssa lower_regs_to_ssaJason Ekstrand2016-12-291-1/+1
| | | | This matches the naming of nir_lower_vars_to_ssa, the other to-SSA pass.
* nir: stop gcc warning about uninitialised variablesTimothy Arceri2016-12-291-1/+1
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Add a pass for moving SPIR-V continue blocks to the ends of loopsJason Ekstrand2016-12-221-0/+256
When shaders come in from SPIR-V, we handle continue blocks by placing the contents of the continue inside of a "if (!first_iteration)". We do this so that we can properly handle the fact that continues in SPIR-V jump to the continue block at the end of the loop rather than jumping directly to the top of the loop like they do in NIR. In particular, the increment step of a simple for loop ends up in the continue block. This pass looks for this case in loops that don't actually have any continues and moves the continue contents to the end of the loop instead. We need this because loop unrolling doesn't work if the increment is inside of a condition. Reviewed-by: Timothy Arceri <[email protected]>