aboutsummaryrefslogtreecommitdiffstats
path: root/src/compiler/nir
Commit message (Collapse)AuthorAgeFilesLines
* nir: allow propagation of if evaluation for bcselTimothy Arceri2018-11-021-9/+16
| | | | | | | | | | | | | | | | Shader-db results Skylake: total instructions in shared programs: 13109035 -> 13109024 (<.01%) instructions in affected programs: 4777 -> 4766 (-0.23%) helped: 11 HURT: 0 total cycles in shared programs: 332090418 -> 332090443 (<.01%) cycles in affected programs: 19474 -> 19499 (0.13%) helped: 6 HURT: 4 Reviewed-by: Jason Ekstrand <[email protected]>
* nir: fix if condition propagation for alu useTimothy Arceri2018-11-011-2/+1
| | | | | | | | | | | We need to update the cursor before we check if the alu use is dominated by the if condition. Previously we were checking if the current location of the alu instruction was dominated by the if condition which would miss some optimisation opportunities. Fixes: a3b4cb34589e ("nir/opt_if: Rework condition propagation") Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Allow using nir_lower_io_to_scalar_early on VS input vars.Eric Anholt2018-10-301-1/+3
| | | | | | | This will be used on V3D to cut down the size of the VS inputs in the VPM (memory area for sharing data between shader stages). Reviewed-by: Timothy Arceri <[email protected]>
* nir: fix yet another MSVC build breakBrian Paul2018-10-291-1/+1
| | | | Trivial.
* nir: Add a pass for gathering transform feedback infoJason Ekstrand2018-10-293-0/+211
| | | | | | | This is different from the GL_ARB_spirv pass because it generates a much simpler data structure that isn't tied to OpenGL and mtypes.h. Reviewed-by: Samuel Pitoiset <[email protected]>
* nir: Fix array initializerBrian Paul2018-10-261-1/+1
| | | | | | Empty initializer is not standard C. This fixes MSVC build. Trivial.
* nir/system_values: Use the bit size from the load_derefJason Ekstrand2018-10-261-0/+1
| | | | | | | | | This isn't a great solution for bit-sizes but we don't have a particularly convenient way to get a bit size from the system value enum and this keeps the lowering pass from changing it. Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* nir/opt_if: Rework condition propagationJason Ekstrand2018-10-261-61/+30
| | | | | | | | | | Instead of doing our own constant folding, we just emit instructions and let constant folding happen. This is substantially simpler and lets us use the nir_imm_bool helper instead of dealing with the const_value's ourselves. Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* nir/search: Use the nir_imm_* helpers from nir_builderJason Ekstrand2018-10-263-91/+43
| | | | | | | | This requires that we rework the interface a bit to use nir_builder but that's a nice little modernization anyway. Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* nir/builder: Handle 16-bit floats in nir_imm_floatN_tJason Ekstrand2018-10-261-0/+14
| | | | | Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* nir/builder: Add a nir_imm_true/false helpersJason Ekstrand2018-10-265-6/+29
| | | | | Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* nir/constant_folding: Use nir_src_as_bool for discard_ifJason Ekstrand2018-10-261-6/+7
| | | | | | | Missed one while converting to the nir_src_as_* helpers. Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* nir/constant_folding: Add an unreachable to a switchJason Ekstrand2018-10-261-0/+2
| | | | | Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* nir/validate: Print when the validation failedJason Ekstrand2018-10-263-29/+35
| | | | | Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* util: use C99 declaration in the for-loop set_foreach() macroEric Engestrom2018-10-2513-18/+0
| | | | | Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* util: use C99 declaration in the for-loop hash_table_foreach() macroEric Engestrom2018-10-255-14/+0
| | | | | Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* nir: Fix array initializer.Jose Fonseca2018-10-241-1/+1
| | | | | | Empty initializer is not standard C. This fixes MSVC build. Trivial.
* nir: fix nir_copy_propagation testJuan A. Suarez Romero2018-10-241-2/+2
| | | | | | | | | | | | | | | Use nir_src_comp_as_uint() to read the proper second component, as nir_src_as_uint() returns the first one. v2: Use nir_src_comp_as_uint() [Jason] Fixes: 16870de8a0a ("nir: Use nir_src_is_const and nir_src_as_* in core code") Signed-off-by: Juan A. Suarez Romero <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108532 Tested-by: Michel Dänzer <[email protected]> Tested-by: Vinson Lee <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir: add linking helper nir_link_xfb_varyings()Samuel Pitoiset2018-10-242-0/+34
| | | | | | | | The linking opts shouldn't try removing or compacting XFB varyings in the consumer. To avoid this we copy the always_active_io flag from the producer. Reviewed-by: Timothy Arceri <[email protected]>
* nir/algebraic: Fix a typo in the bit size validation codeJason Ekstrand2018-10-231-2/+2
| | | | | | | | The conon_bit_class and canon_var_class variables got switched. Fixes: 932c650e0b "nir/algebraic: Loosen a restriction on variables" Reported-by: Ian Romanick <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* nir/algebraic: Provide descriptive asserts for bit size checksJason Ekstrand2018-10-221-9/+42
| | | | | | | This will hopefully make debugging opt_algebraic bit-size compile failures easier. Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* nir/algebraic: Loosen a restriction on variablesJason Ekstrand2018-10-221-2/+6
| | | | | | | | | Previously, we would fail if a variable had an assigned but unknown bit size X and we tried to assign it an actual bit size. However, this is ok because, at the time we do the search, the variable does have an actual bit size and it will match X because of the NIR rules. Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* nir/algebraic: A bit of validation refactoring'Jason Ekstrand2018-10-221-15/+15
| | | | | | | We rename some local variables in validate() to be more readable and plumb the var through to get/set_var_bit_class instead of the var index. Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* nir/algebraic: Make internal classes str-ableJason Ekstrand2018-10-221-4/+12
| | | | Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* nir/algebraic: Generalize an optimizationJason Ekstrand2018-10-221-1/+1
| | | | | | There's nothing boolean about (a | ~a) ~> -1 Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* nir/algebraic: Use bool internally instead of bool32Jason Ekstrand2018-10-222-8/+5
| | | | Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* nir: Use nir_src_is_const and nir_src_as_* in core codeJason Ekstrand2018-10-2216-71/+54
| | | | Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* nir/search_helpers: Use nir_src_is_const and friendsJason Ekstrand2018-10-221-27/+22
| | | | | | | This not only makes them safe for more bit sizes but it also fixes a bug in is_zero_to_one where it would return true for constant NaN. Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* nir/search: Use nir_src_is_const and friendsJason Ekstrand2018-10-221-57/+13
| | | | Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* nir: Add some new helpers for working with const sourcesJason Ekstrand2018-10-222-0/+108
| | | | Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* nir: fix clip cull lowering to not assert if GLSL already lowered.Dave Airlie2018-10-151-0/+6
| | | | | | If GLSL has already done the lowering, we'd rather not crash in this pass. Reviewed-by: Kenneth Graunke <[email protected]>
* nir: Copy propagation between blocksCaio Marcelo de Oliveira Filho2018-10-152-82/+351
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Extend the pass to propagate the copies information along the control flow graph. It performs two walks, first it collects the vars that were written inside each node. Then it walks applying the copy propagation using a list of copies previously available. At each node the list is invalidated according to results from the first walk. This approach is simpler than a full data-flow analysis, but covers various cases. If derefs are used for operating on more memory resources (e.g. SSBOs), the difference from a regular pass is expected to be more visible -- as the SSA copy propagation pass won't apply to those. A full data-flow analysis would handle more scenarios: conditional breaks in the control flow and merge equivalent effects from multiple branches (e.g. using a phi node to merge the source for writes to the same deref). However, as previous commentary in the code stated, its complexity 'rapidly get out of hand'. The current patch is a good intermediate step towards more complex analysis. The 'copies' linked list was modified to use util_dynarray to make it more convenient to clone it (to handle ifs/loops). Annotated shader-db results for Skylake: total instructions in shared programs: 15105796 -> 15105451 (<.01%) instructions in affected programs: 152293 -> 151948 (-0.23%) helped: 96 HURT: 17 All the HURTs and many HELPs are one instruction. Looking at pass by pass outputs, the copy prop kicks in removing a bunch of loads correctly, which ends up altering what other other optimizations kick. In those cases the copies would be propagated after lowering to SSA. In few HELPs we are actually helping doing more than was possible previously, e.g. consolidating load_uniforms from different blocks. Most of those are from shaders/dolphin/ubershaders/. total cycles in shared programs: 566048861 -> 565954876 (-0.02%) cycles in affected programs: 151461830 -> 151367845 (-0.06%) helped: 2933 HURT: 2950 A lot of noise on both sides. total loops in shared programs: 4603 -> 4603 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total spills in shared programs: 11085 -> 11073 (-0.11%) spills in affected programs: 23 -> 11 (-52.17%) helped: 1 HURT: 0 The shaders/dolphin/ubershaders/12.shader_test was able to pull a couple of loads from inside if statements and reuse them. total fills in shared programs: 23143 -> 23089 (-0.23%) fills in affected programs: 2718 -> 2664 (-1.99%) helped: 27 HURT: 0 All from shaders/dolphin/ubershaders/. LOST: 0 GAINED: 0 The other generations follow the same overall shape. The spills and fills HURTs are all from the same game. shader-db results for Broadwell. total instructions in shared programs: 15402037 -> 15401841 (<.01%) instructions in affected programs: 144386 -> 144190 (-0.14%) helped: 86 HURT: 9 total cycles in shared programs: 600912755 -> 600902486 (<.01%) cycles in affected programs: 185662820 -> 185652551 (<.01%) helped: 2598 HURT: 3053 total loops in shared programs: 4579 -> 4579 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total spills in shared programs: 80929 -> 80924 (<.01%) spills in affected programs: 720 -> 715 (-0.69%) helped: 1 HURT: 5 total fills in shared programs: 93057 -> 93013 (-0.05%) fills in affected programs: 3398 -> 3354 (-1.29%) helped: 27 HURT: 5 LOST: 0 GAINED: 2 shader-db results for Haswell: total instructions in shared programs: 9231975 -> 9230357 (-0.02%) instructions in affected programs: 44992 -> 43374 (-3.60%) helped: 27 HURT: 69 total cycles in shared programs: 87760587 -> 87727502 (-0.04%) cycles in affected programs: 7720673 -> 7687588 (-0.43%) helped: 1609 HURT: 1416 total loops in shared programs: 1830 -> 1830 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total spills in shared programs: 1988 -> 1692 (-14.89%) spills in affected programs: 296 -> 0 helped: 1 HURT: 0 total fills in shared programs: 2103 -> 1668 (-20.68%) fills in affected programs: 438 -> 3 (-99.32%) helped: 4 HURT: 0 LOST: 0 GAINED: 1 v2: Remove the DISABLE prefix from tests we now pass. v3: Add comments about missing write_mask handling. (Caio) Add unreachable when switching on cf_node type. (Jason) Properly merge the component information in written map instead of replacing. (Jason) Explain how removal from written arrays works. (Jason) Use mode directly from deref instead of getting the var. (Jason) v4: Register the local written mode for calls. (Jason) Prefer cf_node instead of node. (Jason) Clarify that remove inside iteration only works in backward iterations. (Jason) Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Take call instruction into account in copy_prop_varsCaio Marcelo de Oliveira Filho2018-10-151-6/+12
| | | | | | | | | | | | | | | | | Calls are not used yet (functions are inlined), but since new code is already taking them into account, do it here too. The convention here and in other places is that no writable memory is assumed to remain unchanged, as well as global variables. Also, explicitly state the modes affected (instead of using the reverse logic) in one of the apply_for_barrier_modes calls. Suggested by Jason. v2: Consider local vars used by a call to be conservative, SPIR-V has such cases. (Jason) Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Add tests for copy propagation of derefsCaio Marcelo de Oliveira Filho2018-10-151-0/+300
| | | | | | | | | | | | Also tests for removal of redundant loads, that we currently handle as part of the copy propagation. Note some tests involve multiple blocks and are currently DISABLED because they (expectedly) fail. v2: Add missing DISABLED prefix to "multi block" tests. (Jason) Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Remove handling of dead writes from copy_prop_varsCaio Marcelo de Oliveira Filho2018-10-151-76/+8
| | | | | | These are covered by another pass now. Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Separate dead write removal into its own passCaio Marcelo de Oliveira Filho2018-10-154-3/+226
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of doing this as part of the existing copy_prop_vars pass. Separation makes easier to expand the scope of both passes to be more than per-block. For copy propagation, the information about valid copies comes from previous instructions; while the dead write removal depends on information from later instructions ("have any instruction used this deref before overwrite it?"). Also change the tests to use this pass (instead of copy prop vars). Note that the disabled tests continue to fail, since the standalone pass is still per-block. v2: Remove entries from dynarray instead of marking items as deleted. Use foreach_reverse. (Caio) (all from Jason) Do not cache nir_deref_path. Not worthy for this patch. Clear unused writes when hitting a call instruction. Clean up enumeration of modes for barriers. Move metadata calls to the inner function. v3: For copies, use the vector length to calculate the mask. (all from Jason) Use nir_component_mask_t when applicable. Rename functions for clarity. Consider local vars used by a call to be conservative (SPIR-V has such cases). Comment and assert the assumption that stores and copies are always to a deref that ends with a vector or scalar. Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Add tests for dead write eliminationCaio Marcelo de Oliveira Filho2018-10-151-0/+241
| | | | | | | | | | | Note at the moment the pass called is nir_opt_copy_prop_vars, because dead write elimination is implemented there. Also added tests that involve identifying dead writes in multiple blocks (e.g. the overwrite happens in another block). Those currently fail as expected, so are marked to be skipped. Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Add test file for vars related passesCaio Marcelo de Oliveira Filho2018-10-152-0/+201
| | | | | | | | | | | | Add basic helpers for doing tests on the vars related optimization passes. The main goal is to lower the barrier to create tests during development and debugging of the passes. Full coverage is not a requirement. v2: Make find_next_intrinsic() skip blocks before 'after'. (Jason) Move nir_imm_ivec2() to nir_builder.h. (Jason) Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Add nir_imm_ivec2 helperCaio Marcelo de Oliveira Filho2018-10-151-0/+12
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Expose nir_remove_unused_io_vars().Eric Anholt2018-10-152-8/+27
| | | | | | | | | | | | | | For gallium drivers where you want to do some linking at variant compile time, you don't have the other producer/consumer shader on hand to modify. By exposing the inner function, the driver can have the used varyings in the compiled shader cache key and still do linking. This is also useful for V3D, where the binning shader wants to only output position and TF varyings. We've been removing those after nir_lower_io, but this will be less driver-specific code and let more of the shader get DCEed early in NIR. Reviewed-by: Timothy Arceri <[email protected]>
* nir: Be sure to fix deref modes after demoting shader i/o vars to global.Eric Anholt2018-10-151-0/+3
| | | | | | | Fixes assertion failures when calling nir_remove_unused_varyings() or nir_remove_unused_io_vars(). Reviewed-by: Timothy Arceri <[email protected]>
* nir: Create sampler2D variables in nir_lower_{bitmap,drawpixels}.Kenneth Graunke2018-10-142-1/+23
| | | | | | | | This is needed for nir_gather_info to actually count the new textures, since it operates solely on variables. Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* spirv/nir: handle memory access qualifiers for SSBO loads/storesSamuel Pitoiset2018-10-121-2/+2
| | | | | | | | | | | v2: - change how the access qualifiers are accumulated v3: - duplicate members in struct_member_decoration_cb() - handle access qualifiers on variables - remove access qualifiers handling in _vtn_variable_load_store() - fix setting access qualifiers on type->array_element Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]
* nir: Add a bunch of b2[if] optimizationsJason Ekstrand2018-10-111-0/+17
| | | | | | | | | | | | | | | | | | The b2f and b2i conversions always produce zero or one which are both representable in every type and size. Since b2i and b2f support all bit sizes, we can just get rid of the conversion opcode. total instructions in shared programs: 15089335 -> 15084368 (-0.03%) instructions in affected programs: 212564 -> 207597 (-2.34%) helped: 896 HURT: 0 total cycles in shared programs: 369831123 -> 369826267 (<.01%) cycles in affected programs: 2008647 -> 2003791 (-0.24%) helped: 693 HURT: 216 Reviewed-by: Ian Romanick <[email protected]>
* nir/algebraic: Simplify fsat of fsignIan Romanick2018-10-091-0/+1
| | | | | | | | | | These allows us to not support fsign.sat in the Intel compiler backend, and that will simplify some later changes. No shader-db changes on any Intel platform. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Thomas Helland <[email protected]>
* nir/algebraic: sign(x)*x*x is abs(x)*xIan Romanick2018-10-091-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | shader-db results: All Gen7+ platforms had similar results. (Skylake shown) total instructions in shared programs: 15106023 -> 15105981 (<.01%) instructions in affected programs: 300 -> 258 (-14.00%) helped: 6 HURT: 0 helped stats (abs) min: 7 max: 7 x̄: 7.00 x̃: 7 helped stats (rel) min: 14.00% max: 14.00% x̄: 14.00% x̃: 14.00% 95% mean confidence interval for instructions value: -7.00 -7.00 95% mean confidence interval for instructions %-change: -14.00% -14.00% Instructions are helped. total cycles in shared programs: 566050327 -> 566050075 (<.01%) cycles in affected programs: 2826 -> 2574 (-8.92%) helped: 6 HURT: 0 helped stats (abs) min: 40 max: 44 x̄: 42.00 x̃: 42 helped stats (rel) min: 8.89% max: 8.94% x̄: 8.92% x̃: 8.92% 95% mean confidence interval for cycles value: -44.30 -39.70 95% mean confidence interval for cycles %-change: -8.95% -8.88% Cycles are helped. No changes on Gen6 or earlier. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Thomas Helland <[email protected]>
* nir: Add helper functions to get the instruction that generated a nir_srcIan Romanick2018-10-091-0/+23
| | | | | Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Thomas Helland <[email protected]>
* nir/alu_to_scalar: Use ssa_for_alu_src in hand-rolled expansionsJason Ekstrand2018-10-041-15/+18
| | | | | | | | | | | | | | | | | The ssa_for_alu_src helper will correctly handle swizzles and other source modifiers for you. The expansions for unpack_half_2x16, pack_uvec2_to_uint, and pack_uvec4_to_uint were all broken with regards to swizzles. The brokenness of unpack_half_2x16 was causing rendering errors in Rise of the Tomb Raider on Intel ever since c11833ab24dcba26 which added an extra copy propagation to the optimization pipeline and caused us to start seeing swizzles where we hadn't seen any before. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107926 Fixes: 9ce901058f3d "nir: Add lowering of nir_op_unpack_half_2x16." Fixes: 9b8786eba955 "nir: Add lowering support for packing opcodes." Tested-by: Alex Smith <[email protected]> Tested-by: Józef Kucia <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* nir/from_ssa: Don't rewrite derefs destinations to registersJason Ekstrand2018-10-021-0/+6
| | | | | | | | | | | | We already call nir_rematerialize_derefs_in_use_blocks_impl prior to calling nir_lower_ssa_defs_to_regs_block so the assertion that all deref uses in the block should hold. This fixes the following CTS test when SPIR-V optimization recipe 1: dEQP-VK.glsl.struct.local.loop_nested_struct_array_vertex Fixes: 606eb56ab9449b "intel/nir: Only lower load/store derefs" Reviewed-by: Iago Toral Quiroga <[email protected]>
* nir/cf: Remove phi sources if needed in nir_handle_add_jumpJason Ekstrand2018-10-021-17/+21
| | | | | | | | | | | | If the block in which the jump is inserted is the predecessor of a phi then we need to remove phi sources otherwise the phi may end up with things improperly connected. This fixes the following CTS test when dEQP is run with SPIR-V optimization recipe 1: dEQP-VK.glsl.functions.control_flow.return_in_nested_loop_vertex Cc: [email protected] Reviewed-by: Iago Toral Quiroga <[email protected]>