path: root/src/compiler
Commit message | Author | Age | Files | Lines
* nir/lower_io_to_vector: add flat mode | Rhys Perry | 2019-09-06 | 1 | -47/+204
| This has lower_io_to_vector try to turn variables into arrays of 4-sized
| vectors when possible and fall back to the old approach when that isn't
| possible.
|
| This is so that lower_io_to_vector can guarantee that only one variable is
| used for each fragment shader output.
|
| v2: handle dual-source blending
| v3: don't try to merge structs and non-32-bit types in get_flat_type()
| v3: fix per-vertex inputs
| v3: fix and cleanup location advancement in get_flat_type() and its
|     calling code
| v4: prioritize the original mode over the flat mode
| v4: don't create flat variables to merge only one variable
| v5: don't skip an entire slot when encountering structs in the old mode
|
| Signed-off-by: Rhys Perry <[email protected]>
| Reviewed-by: Connor Abbott <[email protected]>
* nir/lower_io_to_vector: allow FS outputs to be vectorized | Rhys Perry | 2019-09-06 | 2 | -27/+33
| v2: handle dual-source blending
| v3: use a higher MAX_SLOTS
|
| Signed-off-by: Rhys Perry <[email protected]>
| Reviewed-by: Connor Abbott <[email protected]>
* glsl: Fix unroll of do{} while(false) like loops | Danylo Piliaiev | 2019-09-06 | 2 | -17/+41
| For loops whose condition is false on the first iteration, the iteration
| count was miscalculated under the assumption that the loop's condition is
| true until it becomes false, meaning it is true at least once.
| Now such loops are reported as having 0 iterations.
|
| Similar to the fix e71fc7f2 done in NIR.
|
| Fixes tests/shaders/glsl-fs-loop-while-false-02.shader_test
|
| Signed-off-by: Danylo Piliaiev <[email protected]>
| Reviewed-by: Timothy Arceri <[email protected]>
* nir: Carve out nir_lower_samplers from GLSL code. | Timur Kristóf | 2019-09-06 | 5 | -127/+159
| Lowering samplers is needed to produce NIR that can actually be consumed
| by some gallium drivers, so it doesn't make sense to keep it only in the
| GLSL code.
|
| This commit introduces nir_lower_samplers to compiler/nir, while
| maintaining the GL-specific function too.
|
| Signed-off-by: Timur Kristóf <[email protected]>
| Reviewed-by: Connor Abbott <[email protected]>
* nir/lower_explicit_io: Handle 1 bit loads and stores | Caio Marcelo de Oliveira Filho | 2019-09-05 | 1 | -9/+24
| Load a 32-bit value then convert to 1-bit. Convert 1-bit to 32-bit
| value, then store it. These cases started to appear when we changed
| Anvil to use derefs for shared memory.
|
| v2: Use `bit_size` in a couple of places we were missing. (Jason)
|     Reassign `value` instead of `src[0]`. (Jason)
|
| Fixes: 024a46a4079 ("anv: use derefs for shared memory access")
| Reviewed-by: Jason Ekstrand <[email protected]>
* nir: allow specifying filter callback in lower_alu_to_scalar | Vasily Khoruzhick | 2019-09-06 | 2 | -6/+16
| A set of opcodes doesn't have enough flexibility in certain cases. E.g.
| Utgard PP has a vector conditional select operation, but the condition is
| always scalar. Lowering all the vector selects to scalar increases the
| instruction count, so we need a way to filter only those ops that can't
| be handled in hardware.
|
| Reviewed-by: Qiang Yu <[email protected]>
| Reviewed-by: Eric Anholt <[email protected]>
| Reviewed-by: Jason Ekstrand <[email protected]>
| Signed-off-by: Vasily Khoruzhick <[email protected]>
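A rough sketch of the filtering idea, in Python rather than the C the pass is written in. The tuple-based instruction encoding and the names `lower_alu_to_scalar` and `needs_lowering` below are hypothetical stand-ins for illustration, not Mesa's API: a per-instruction predicate, instead of a fixed opcode set, decides which vector ops get split.

```python
def lower_alu_to_scalar(instrs, should_lower=None):
    """Split vector ALU ops into scalar ops (toy model).

    instrs: list of (opcode, cond_width, width) tuples; cond_width is 0 for
    ops without a condition source. This encoding is an assumption.
    should_lower: optional predicate; with None every vector op is split.
    """
    lowered = []
    for instr in instrs:
        opcode, cond_width, width = instr
        if width > 1 and (should_lower is None or should_lower(instr)):
            # One scalar op per component.
            lowered.extend((opcode, min(cond_width, 1), 1) for _ in range(width))
        else:
            lowered.append(instr)
    return lowered


def needs_lowering(instr):
    """Hypothetical filter: keep vector selects whose condition is scalar."""
    opcode, cond_width, _width = instr
    return not (opcode == "bcsel" and cond_width == 1)


shader = [("bcsel", 1, 4), ("bcsel", 4, 4), ("fadd", 0, 2)]
result = lower_alu_to_scalar(shader, needs_lowering)
```

With the filter, the select with a scalar condition stays a single vec4 op while the other two instructions are split; without it, all three are scalarized.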
* gallium: Plumb through a way to disable GLSL const lowering | Connor Abbott | 2019-09-05 | 1 | -1/+2
| For radeonsi, we will prefer the NIR pass as it'll generate better code
| (some index calculation and a single load vs. a load, then index
| calculation, then another load) and oftentimes NIR optimization can kick
| in and make all the access indices constant.
|
| Reviewed-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Timothy Arceri <[email protected]>
* glsl: Store the precision for a function return type | Neil Roberts | 2019-09-04 | 3 | -1/+30
| The precision for a function return type is now stored in
| ir_function_signature. This will later be useful to implement mediump
| to float16 lowering. In the meantime it is also useful to catch errors
| where a function is redeclared with a different precision.
|
| Reviewed-by: Timothy Arceri <[email protected]>
* nir: fix memleak in error path | Eric Engestrom | 2019-09-04 | 1 | -1/+3
| Fixes: 2cf59861a8128a91bfdd ("nir: Add partial redundancy elimination for compares")
| Signed-off-by: Eric Engestrom <[email protected]>
| Reviewed-by: Timothy Arceri <[email protected]>
| Reviewed-by: Ian Romanick <[email protected]>
* nir: remove unused constant_fold_state | Rob Clark | 2019-09-03 | 1 | -6/+0
| Signed-off-by: Rob Clark <[email protected]>
| Reviewed-by: Eric Anholt <[email protected]>
* nir: Fix num_ssbos when lowering atomic counters | Connor Abbott | 2019-09-03 | 1 | -0/+21
| Otherwise it's impossible to know the maximum SSBO index for both
| internal TGSI shaders from TTN (which don't have any notion of atomic
| counters and no offset) as well as shaders from GLSL.
|
| I fixed everything I could find while grepping for num_ssbos and
| num_abos, which hopefully is everything (iris was the only user I could
| find that uses it in a meaningful way).
|
| Reviewed-by: Marek Olšák <[email protected]>
* nir: do not assume that the result of fexp2(a) is always an integral | Samuel Pitoiset | 2019-09-02 | 1 | -0/+1
| It's only correct when 'a' is an integral greater than or equal to 0.
|
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111493
| Fixes: 5544b2cbbd2 ("nir/algebraic: Use value range analysis to eliminate useless unary ops")
| Signed-off-by: Samuel Pitoiset <[email protected]>
| Reviewed-by: Ian Romanick <[email protected]>
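A quick numeric illustration of the claim above, as a Python sketch (`exp2_is_integral` is a hypothetical helper name, and Python floats stand in for shader floats): 2^a is a whole number only for integral a >= 0.

```python
def exp2_is_integral(a: float) -> bool:
    """True if 2**a has no fractional part."""
    return (2.0 ** a).is_integer()


integral_nonneg = [0.0, 1.0, 3.0, 10.0]   # integral exponents >= 0
counterexamples = [-1.0, -3.0, 0.5]       # negative or non-integral exponents

ok = all(exp2_is_integral(a) for a in integral_nonneg)
bad = any(exp2_is_integral(a) for a in counterexamples)
```

Here `ok` is true and `bad` is false: 2^-1 = 0.5 and 2^0.5 are not whole numbers, so "integral in, integral out" does not hold for fexp2 in general.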
* glsl: replace 'x + (-x)' with constant 0 | Pierre-Eric Pelloux-Prayer | 2019-08-29 | 1 | -0/+12
| This fixes a hang in shadertoy for radeonsi where a buffer was
| initialized with:
|
|     value -= value
|
| with value being undefined.
| In this case LLVM replaces the operation with an assignment to NaN.
|
| Cc: 19.1 19.2 <[email protected]>
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111241
| Reviewed-by: Marek Olšák <[email protected]>
* nir/range-analysis: Add a lot more assertions about the contents of tables | Ian Romanick | 2019-08-29 | 1 | -6/+128
| v2: Update several of the comments. Drop some redundant uses of
| ASSERT_UNION_OF_OTHERS_MATCHES_UNKNOWN_*_SOURCE source. Suggested by
| Caio.
|
| Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
| Suggested-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir/range-analysis: Range tracking for fpow | Ian Romanick | 2019-08-29 | 1 | -0/+66
| One shader from Metro Last Light and the rest from Rochard. In the
| Rochard cases, something like:
|
|     min(1.0, max(pow(saturate(x), y), z))
|
| was transformed to
|
|     saturate(max(pow(saturate(x), y), z))
|
| because the result of the pow must be >= 0.
|
| The Metro Last Light case was similar. An instance of
|
|     min(pow(abs(x), y), 1.0)
|
| became
|
|     saturate(pow(abs(x), y))
|
| v2: Fix some comments. Suggested by Caio.
|
| v3: Fix setting is_integral when the exponent might be negative. See
| also Mesa MR !1778.
|
| Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
| All Intel platforms had similar results. (Ice Lake shown)
| total instructions in shared programs: 16280670 -> 16280659 (<.01%)
| instructions in affected programs: 1130 -> 1119 (-0.97%)
| helped: 11
| HURT: 0
| helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
| helped stats (rel) min: 0.72% max: 1.43% x̄: 1.03% x̃: 0.97%
| 95% mean confidence interval for instructions value: -1.00 -1.00
| 95% mean confidence interval for instructions %-change: -1.19% -0.86%
| Instructions are helped.
|
| total cycles in shared programs: 367168430 -> 367168270 (<.01%)
| cycles in affected programs: 10281 -> 10121 (-1.56%)
| helped: 10
| HURT: 1
| helped stats (abs) min: 16 max: 18 x̄: 17.00 x̃: 17
| helped stats (rel) min: 1.31% max: 2.43% x̄: 1.79% x̃: 1.70%
| HURT stats (abs) min: 10 max: 10 x̄: 10.00 x̃: 10
| HURT stats (rel) min: 3.10% max: 3.10% x̄: 3.10% x̃: 3.10%
| 95% mean confidence interval for cycles value: -20.06 -9.04
| 95% mean confidence interval for cycles %-change: -2.36% -0.32%
| Cycles are helped.
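The rewrite above can be sanity-checked numerically. The following is a small Python sketch, not the pass itself; `saturate`, `original` and `rewritten` are illustrative names, NaN inputs are ignored, and the exponent samples are kept non-negative so that Python's `0.0 ** y` stays defined.

```python
import itertools


def saturate(v: float) -> float:
    """Clamp v to [0, 1]."""
    return min(max(v, 0.0), 1.0)


def original(x: float, y: float, z: float) -> float:
    return min(1.0, max(saturate(x) ** y, z))


def rewritten(x: float, y: float, z: float) -> float:
    return saturate(max(saturate(x) ** y, z))


xs = [-2.0, -0.5, 0.0, 0.25, 1.0, 3.0]
ys = [0.0, 0.5, 2.0]             # non-negative exponents only (assumption)
zs = [-3.0, -0.1, 0.0, 0.4, 1.0, 7.0]

mismatches = [
    (x, y, z)
    for x, y, z in itertools.product(xs, ys, zs)
    if original(x, y, z) != rewritten(x, y, z)
]
```

Because the pow result is never negative, `max(pow(...), z)` is never negative either, so the lower clamp inside `saturate` has no effect and `mismatches` stays empty.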
* nir/range-analysis: Handle constants in nir_op_mov just like nir_op_bcsel | Ian Romanick | 2019-08-29 | 1 | -2/+9
| I discovered this while looking at a shader that was hurt by some other
| work I'm doing. When I examined the changes, I was confused that one
| instance of a comparison that was used in a discard_if was (incorrectly)
| eliminated, while another instance used by a bcsel was (correctly) not
| eliminated. I had to use NIR_PRINT=true to see exactly where things
| went wrong.
|
| A bunch of shaders in Goat Simulator, Dungeon Defenders, Sanctum 2, and
| Strike Suit Zero were impacted.
|
| Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
| Fixes: 405de7ccb6c ("nir/range-analysis: Rudimentary value range analysis pass")
|
| All Intel platforms had similar results. (Ice Lake shown)
| total instructions in shared programs: 16280659 -> 16281075 (<.01%)
| instructions in affected programs: 21042 -> 21458 (1.98%)
| helped: 0
| HURT: 136
| HURT stats (abs) min: 1 max: 9 x̄: 3.06 x̃: 3
| HURT stats (rel) min: 1.16% max: 6.12% x̄: 2.23% x̃: 2.03%
| 95% mean confidence interval for instructions value: 2.93 3.19
| 95% mean confidence interval for instructions %-change: 2.08% 2.37%
| Instructions are HURT.
|
| total cycles in shared programs: 367168270 -> 367170313 (<.01%)
| cycles in affected programs: 172020 -> 174063 (1.19%)
| helped: 14
| HURT: 111
| helped stats (abs) min: 2 max: 80 x̄: 21.21 x̃: 9
| helped stats (rel) min: 0.10% max: 4.47% x̄: 1.35% x̃: 0.79%
| HURT stats (abs) min: 2 max: 584 x̄: 21.08 x̃: 5
| HURT stats (rel) min: 0.12% max: 17.28% x̄: 1.55% x̃: 0.40%
| 95% mean confidence interval for cycles value: 5.41 27.28
| 95% mean confidence interval for cycles %-change: 0.64% 1.81%
| Cycles are HURT.
* nir/range-analysis: Fix incorrect fadd range result for (ne_zero, ne_zero) | Ian Romanick | 2019-08-29 | 1 | -3/+8
| Found by inspection. I tried really, really hard to make a test case
| that would trigger this problem, but I was unsuccessful.
|
| It's very hard to get an instruction to produce a ne_zero result without
| ne_zero sources. The most plausible way is using bcsel. That proves
| problematic because bcsel interprets its sources as integers, so it
| cannot currently be used to "clean" values for floating point
| instructions.
|
| No shader-db changes on any Intel platform.
|
| Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
| Fixes: 405de7ccb6c ("nir/range-analysis: Rudimentary value range analysis pass")
* nir/range-analysis: Adjust result range of multiplication to account for flush-to-zero | Ian Romanick | 2019-08-29 | 1 | -31/+22
| Fixes piglit tests (new in piglit!110):
|
|     - fs-underflow-fma-compare-zero.shader_test
|     - fs-underflow-mul-compare-zero.shader_test
|
| v2: Add back part of comment accidentally deleted. Noticed by
| Caio. Remove is_not_zero function as it is no longer used.
|
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111308
| Fixes: fa116ce357b ("nir/range-analysis: Range tracking for ffma and flrp")
| Fixes: 405de7ccb6c ("nir/range-analysis: Rudimentary value range analysis pass")
| Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
| All Gen7+ platforms** had similar results. (Ice Lake shown)
| total instructions in shared programs: 16278465 -> 16279492 (<.01%)
| instructions in affected programs: 16765 -> 17792 (6.13%)
| helped: 0
| HURT: 23
| HURT stats (abs) min: 7 max: 275 x̄: 44.65 x̃: 8
| HURT stats (rel) min: 1.15% max: 17.51% x̄: 4.23% x̃: 1.62%
| 95% mean confidence interval for instructions value: 9.57 79.74
| 95% mean confidence interval for instructions %-change: 1.85% 6.61%
| Instructions are HURT.
|
| total cycles in shared programs: 367135159 -> 367154270 (<.01%)
| cycles in affected programs: 279306 -> 298417 (6.84%)
| helped: 0
| HURT: 23
| HURT stats (abs) min: 13 max: 6029 x̄: 830.91 x̃: 54
| HURT stats (rel) min: 0.17% max: 45.67% x̄: 7.33% x̃: 0.49%
| 95% mean confidence interval for cycles value: 100.89 1560.94
| 95% mean confidence interval for cycles %-change: 0.94% 13.71%
| Cycles are HURT.
|
| total spills in shared programs: 8870 -> 8869 (-0.01%)
| spills in affected programs: 19 -> 18 (-5.26%)
| helped: 1
| HURT: 0
|
| total fills in shared programs: 21904 -> 21901 (-0.01%)
| fills in affected programs: 81 -> 78 (-3.70%)
| helped: 1
| HURT: 0
|
| LOST: 0
| GAINED: 1
|
| ** On Broadwell, a shader was hurt for spills / fills instead of
| helped.
|
| No changes on any earlier platforms.
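Why a product of two non-zero values cannot be assumed non-zero can be shown with a small Python sketch. It emulates 32-bit floats through `struct` and models flush-to-zero by zeroing any result below the smallest normal float32; the helper names are hypothetical and the sign of a flushed zero is ignored for brevity.

```python
import struct

F32_MIN_NORMAL = 2.0 ** -126  # smallest positive normal float32


def to_f32(v: float) -> float:
    """Round a Python float (binary64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", v))[0]


def fmul_ftz(a: float, b: float) -> float:
    """float32 multiply with denormal results flushed to zero (toy model)."""
    r = to_f32(to_f32(a) * to_f32(b))
    return 0.0 if abs(r) < F32_MIN_NORMAL else r


a = to_f32(1e-20)
b = to_f32(1e-20)
denormal_product = to_f32(a * b)   # about 1e-40: denormal, but not zero
flushed_product = fmul_ftz(a, b)   # flushed to zero
```

Both inputs are strictly positive, yet `flushed_product` compares equal to zero, so a range analysis that derives "result != 0" from "both sources != 0" would let a comparison against zero be folded incorrectly.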
* nir/range-analysis: Adjust result range of exp2 to account for flush-to-zero | Ian Romanick | 2019-08-29 | 1 | -2/+14
| Fixes piglit tests (new in piglit!110):
|
|     - fs-underflow-exp2-compare-zero.shader_test
|
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111308
| Fixes: 405de7ccb6c ("nir/range-analysis: Rudimentary value range analysis pass")
| Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
| Most of the shaders affected are, unsurprisingly, in Unigine Heaven.
|
| All Gen6+ platforms had similar results. (Ice Lake shown)
| total instructions in shared programs: 16278207 -> 16278465 (<.01%)
| instructions in affected programs: 11374 -> 11632 (2.27%)
| helped: 0
| HURT: 58
| HURT stats (abs) min: 2 max: 13 x̄: 4.45 x̃: 4
| HURT stats (rel) min: 0.54% max: 4.11% x̄: 2.42% x̃: 2.82%
| 95% mean confidence interval for instructions value: 3.77 5.13
| 95% mean confidence interval for instructions %-change: 2.19% 2.64%
| Instructions are HURT.
|
| total cycles in shared programs: 367134284 -> 367135159 (<.01%)
| cycles in affected programs: 81207 -> 82082 (1.08%)
| helped: 17
| HURT: 36
| helped stats (abs) min: 6 max: 356 x̄: 90.35 x̃: 6
| helped stats (rel) min: 0.69% max: 21.45% x̄: 5.71% x̃: 0.78%
| HURT stats (abs) min: 4 max: 235 x̄: 66.97 x̃: 16
| HURT stats (rel) min: 0.35% max: 27.58% x̄: 5.34% x̃: 1.09%
| 95% mean confidence interval for cycles value: -20.36 53.38
| 95% mean confidence interval for cycles %-change: -1.08% 4.67%
| Inconclusive result (value mean confidence interval includes 0).
|
| No changes on any earlier platforms.
* nir/algebraic: Clean up value range analysis-based optimizations | Ian Romanick | 2019-08-29 | 1 | -8/+18
| Fix the a / b ordering in some compares. Delete duplicate patterns.
| Add a table explaining things.
|
| While I was cleaning this up, I managed to confuse myself. The table
| helped sort that out.
|
| Reviewed-by: Alyssa Rosenzweig <[email protected]>
| Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir/algebraic: Mark some value range analysis-based optimizations imprecise | Ian Romanick | 2019-08-29 | 1 | -9/+13
| This didn't fix bug #111308, but it was found while trying to find the
| actual cause of that bug.
|
| Fixes piglit tests (new in piglit!110):
|
|     - fs-fract-of-NaN.shader_test
|     - fs-lt-nan-tautology.shader_test
|     - fs-ge-nan-tautology.shader_test
|
| No shader-db changes on any Intel platform.
|
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111308
| Fixes: b77070e293c ("nir/algebraic: Use value range analysis to eliminate tautological compares")
| Reviewed-by: Alyssa Rosenzweig <[email protected]>
| Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir/algrbraic: Don't optimize open-coded bitfield reverse when lowering is enabled | Ian Romanick | 2019-08-28 | 1 | -1/+1
| This caused a problem on Sandybridge where an open-coded
| bitfieldReverse() function could be optimized to a
| nir_op_bitfield_reverse that would generate an unsupported BFREV
| instruction in the backend. This was encountered in some Unreal4 tech
| demos in shader-db. The bug was not previously noticed because we don't
| actually try to run those demos on Sandybridge.
|
| The fixes tag is a bit of a lie. The actual bug was introduced about
| 26,000 commits earlier in 371c4b3c48f ("nir: Recognize open-coded
| bitfield_reverse."). Without the NIR lowering pass, the flag needed to
| avoid the optimization does not exist. Hopefully nobody will care to
| fix this on an earlier Mesa release.
|
| Reviewed-by: Matt Turner <[email protected]>
| Fixes: 7afa26d4e39 ("nir: Add lowering for nir_op_bitfield_reverse.")
* compiler/glsl: Fix warning about unused function | Caio Marcelo de Oliveira Filho | 2019-08-23 | 1 | -1/+3
| The helper check_node_type() is only used when DEBUG is set (in the
| function below), but the ASSERTED macro uses NDEBUG. So just guard the
| helper with #ifdef.
|
| If we see more such cases we might consider an ASSERTED-like macro for
| the DEBUG case.
|
| Reviewed-by: Eric Engestrom <[email protected]>
| Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Remove nir_const_load_to_arr | Alyssa Rosenzweig | 2019-08-22 | 1 | -5/+0
| There are no remaining users in-tree.
|
| Signed-off-by: Alyssa Rosenzweig <[email protected]>
| Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir: Add explicit signs to image min/max intrinsics | Jason Ekstrand | 2019-08-21 | 9 | -30/+64
| This better matches all the other atomic intrinsics such as those for
| SSBOs and shared variables where the sign is part of the intrinsic
| opcode. Both generators (GLSL and SPIR-V) know the sign from the type
| of the image variable or handle. In SPIR-V, signed min/max are separate
| opcodes from unsigned.
|
| Reviewed-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Eric Anholt <[email protected]>
* nir/loop_analyze: Treat do{}while(false) loops as 0 iterations | Danylo Piliaiev | 2019-08-21 | 1 | -0/+49
| Loops like:
|
|     block block_0:
|     vec1 32 ssa_2 = load_const (0x00000020)
|     vec1 32 ssa_3 = load_const (0x00000001)
|     loop {
|         vec1 32 ssa_7 = phi block_0: ssa_3, block_4: ssa_9
|         vec1 1 ssa_8 = ige ssa_2, ssa_7
|         if ssa_8 {
|             break
|         } else {
|         }
|         vec1 32 ssa_9 = iadd ssa_7, ssa_1
|     }
|
| were treated as having more than 1 iteration and produced wrong results
| after unrolling. However, such a loop exits during the first iteration
| if not unrolled.
|
| So we check whether the loop will actually loop.
|
| Fixes tests/shaders/glsl-fs-loop-while-false-02.shader_test
|
| Signed-off-by: Danylo Piliaiev <[email protected]>
| Reviewed-by: Timothy Arceri <[email protected]>
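The pitfall described above can be sketched in Python for a simplified counted loop of the form `for (i = init; i < limit; i += step)`. Both functions are hypothetical illustrations and not the code in loop_analyze: the first uses a closed-form count that silently assumes the condition holds at least once, the second tests the condition on the initial value first.

```python
def ceil_div(n: int, d: int) -> int:
    """Ceiling division for a positive divisor."""
    return -(-n // d)


def naive_iterations(init: int, limit: int, step: int) -> int:
    """Assumes the loop body runs at least once (the flawed assumption)."""
    return max(1, ceil_div(limit - init, step))


def checked_iterations(init: int, limit: int, step: int) -> int:
    """Evaluates the loop condition on the initial value before counting."""
    if not (init < limit):
        return 0
    return ceil_div(limit - init, step)
```

For `init = 5, limit = 3, step = 1` the condition is false on entry: `naive_iterations` still reports one iteration, while `checked_iterations` reports zero, which is what an unroller needs in order not to emit a stray copy of the body. For loops that are actually entered, both agree.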
* nir/loop_unroll: Prepare loop for unrolling in wrapper_unroll | Danylo Piliaiev | 2019-08-21 | 1 | -25/+1
| Without loop_prepare_for_unroll loops are losing phis.
|
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111411
| Fixes: 5db98195 "nir: add loop unroll support for wrapper loops"
| Signed-off-by: Danylo Piliaiev <[email protected]>
| Reviewed-by: Timothy Arceri <[email protected]>
* nir/loop_unroll: Update the comments for loop_prepare_for_unroll | Danylo Piliaiev | 2019-08-21 | 1 | -2/+2
| The comments say that we should remove a continue if it is the last
| instruction in a loop; however, we remove any kind of jump.
|
| Signed-off-by: Danylo Piliaiev <[email protected]>
| Reviewed-by: Timothy Arceri <[email protected]>
* nir/algebraic: some subtraction optimizations | Daniel Schürmann | 2019-08-21 | 1 | -0/+3
| Changes with RADV/ACO:
|
| Totals from affected shaders:
| SGPRS: 444087 -> 455543 (2.58 %)
| VGPRS: 436468 -> 436768 (0.07 %)
| Spilled SGPRs: 0 -> 0 (0.00 %)
| Spilled VGPRs: 0 -> 0 (0.00 %)
| Private memory VGPRs: 0 -> 0 (0.00 %)
| Scratch size: 0 -> 0 (0.00 %) dwords per thread
| Code Size: 13448928 -> 13353520 (-0.71 %) bytes
| LDS: 0 -> 0 (0.00 %) blocks
| Max Waves: 68060 -> 67979 (-0.12 %)
| Wait states: 0 -> 0 (0.00 %)
|
| Reviewed-by: Alyssa Rosenzweig <[email protected]>
| Reviewed-by: Connor Abbott <[email protected]>
* mesa/compiler: rework tear down of builtin/types | Lionel Landwerlin | 2019-08-21 | 7 | -67/+31
| The issue we're running into when running CTS is that glsl types are
| deleted while builtins depending on them are not.
|
| This happens because on one hand we have glsl types ref counted, but
| builtins are not. Instead builtins are destroyed when unloading libGL
| or explicitly calling glReleaseShaderCompiler().
|
| This change removes almost entirely any dealing with glsl types
| ref/unref by letting the builtins deal with it instead. In turn we
| introduce a builtin ref count mechanism. Each GL context takes a
| reference on the builtins when compiling a shader for the first time.
| It releases the reference when the context is destroyed. It can also
| explicitly release those when glReleaseShaderCompiler() is called.
|
| Finally we also take a reference on the glsl types when loading libGL
| to avoid recreating glsl types too often.
|
| v2: Ensure we take a reference if we don't have one in link step (Lionel)
|
| Signed-off-by: Lionel Landwerlin <[email protected]>
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110796
| Reviewed-by: Eric Anholt <[email protected]>
| Reviewed-by: Tapani Pälli <[email protected]>
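The reference-count scheme described above can be modelled in a few lines of Python. This is a loose sketch under stated assumptions: `BuiltinRegistry` and `Context` are hypothetical classes, not Mesa types, and the `loaded` flag stands in for building and tearing down the real builtins.

```python
class BuiltinRegistry:
    """Toy stand-in for a shared, ref-counted set of builtins."""

    def __init__(self):
        self.refs = 0
        self.loaded = False

    def ref(self):
        if self.refs == 0:
            self.loaded = True      # build the builtins on first use
        self.refs += 1

    def unref(self):
        assert self.refs > 0, "unbalanced unref"
        self.refs -= 1
        if self.refs == 0:
            self.loaded = False     # last user gone: tear down


class Context:
    """Toy GL context that holds at most one reference."""

    def __init__(self, registry):
        self.registry = registry
        self.holds_ref = False

    def compile_shader(self):
        if not self.holds_ref:      # only the first compile takes a ref
            self.registry.ref()
            self.holds_ref = True

    def release_shader_compiler(self):
        if self.holds_ref:
            self.registry.unref()
            self.holds_ref = False

    def destroy(self):
        self.release_shader_compiler()


registry = BuiltinRegistry()
ctx_a, ctx_b = Context(registry), Context(registry)
ctx_a.compile_shader()
ctx_a.compile_shader()              # second compile does not add a ref
ctx_b.compile_shader()
refs_with_two_contexts = registry.refs
ctx_a.destroy()
loaded_after_first_destroy = registry.loaded
ctx_b.release_shader_compiler()
loaded_after_all_released = registry.loaded
```

The builtins stay alive as long as any context still holds its reference and are torn down only when the last one lets go, so nothing they depend on can disappear underneath them.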
* compiler: ensure glsl types are not created without a reference | Lionel Landwerlin | 2019-08-21 | 1 | -1/+6
| We want to detect invalid refcounting so assert we have at least one
| use before creating types.
|
| Signed-off-by: Lionel Landwerlin <[email protected]>
| Reviewed-by: Eric Anholt <[email protected]>
| Reviewed-by: Tapani Pälli <[email protected]>
* nir/tests: take reference on glsl types | Lionel Landwerlin | 2019-08-21 | 4 | -1/+16
| Signed-off-by: Lionel Landwerlin <[email protected]>
| Reviewed-by: Eric Anholt <[email protected]>
| Reviewed-by: Tapani Pälli <[email protected]>
* glsl/tests: take refs on glsl types | Lionel Landwerlin | 2019-08-21 | 9 | -18/+64
| Much like each driver, tests as standalone entities must take
| references on the glsl types.
|
| Signed-off-by: Lionel Landwerlin <[email protected]>
| Reviewed-by: Eric Anholt <[email protected]>
| Reviewed-by: Tapani Pälli <[email protected]>
* nir: add divergence analysis pass. | Daniel Schürmann | 2019-08-20 | 3 | -0/+799
| This pass expects the shader to be in LCSSA form.
|
| The algorithm is based on 'The Simple Divergence Analysis' from
| Diogo Sampaio, Rafael De Souza, Sylvain Collange, Fernando Magno Quintão Pereira.
| Divergence Analysis. ACM Transactions on Programming Languages and Systems (TOPLAS)
|
| Reviewed-by: Jason Ekstrand <[email protected]>
| Reviewed-by: Connor Abbott <[email protected]>
* nir/subgroups: Lower clustered reductions with cluster_size >= subgroup_size into reductions | Rhys Perry | 2019-08-20 | 1 | -1/+12
| The behavior for reductions with cluster_size >= subgroup_size is
| implementation defined.
|
| Reviewed-by: Jason Ekstrand <[email protected]>
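As a loose illustration in Python (the list-based subgroup model and the function name `clustered_reduce` are assumptions made for this sketch, not the NIR lowering itself), treating an oversized cluster as a reduction over the whole subgroup looks like this:

```python
import functools
import operator


def clustered_reduce(values, cluster_size, op=operator.add):
    """Reduce each cluster of invocations and broadcast the result.

    values holds one entry per invocation, so len(values) plays the role of
    the subgroup size. A cluster at least as large as the subgroup is
    handled as a plain reduction over all invocations.
    """
    subgroup_size = len(values)
    if cluster_size >= subgroup_size:
        cluster_size = subgroup_size
    out = []
    for start in range(0, subgroup_size, cluster_size):
        cluster = values[start:start + cluster_size]
        total = functools.reduce(op, cluster)
        out.extend([total] * len(cluster))
    return out


pairs = clustered_reduce([1, 2, 3, 4], 2)
whole = clustered_reduce([1, 2, 3, 4], 8)
```

With a cluster size of 2 each pair is summed separately; with a cluster size of 8 on a 4-wide subgroup every invocation receives the sum over the whole subgroup.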
* nir/lcssa: allow to create LCSSA phis for loop-invariant booleans | Rhys Perry | 2019-08-20 | 2 | -3/+7
| ACO depends on LCSSA phis for divergent booleans to work correctly.
|
| Reviewed-by: Connor Abbott <[email protected]>
| Reviewed-by: Jason Ekstrand <[email protected]>
* nir/lcssa: Skip loop invariant variables when converting to LCSSA. | Daniel Schürmann | 2019-08-20 | 2 | -14/+162
| Co-authored-by: Rhys Perry <[email protected]>
| Reviewed-by: Connor Abbott <[email protected]>
| Reviewed-by: Jason Ekstrand <[email protected]>
* nir: make nir_to_lcssa() a general NIR pass. | Rhys Perry | 2019-08-20 | 2 | -3/+42
| Reviewed-by: Connor Abbott <[email protected]>
| Reviewed-by: Jason Ekstrand <[email protected]>
* nir/lcssa: handle deref instructions properly | Daniel Schürmann | 2019-08-20 | 2 | -14/+26
| Reviewed-by: Jason Ekstrand <[email protected]>
| Fixes: 414148cdc124 "nir: Support deref instructions in loop_analyze"
* nir: Add more source types to nir_tex_instr_src_type | Jason Ekstrand | 2019-08-19 | 1 | -3/+14
| Reviewed-by: Dave Airlie <[email protected]>
| Reviewed-by: Alyssa Rosenzweig <[email protected]>
* nir: Add missing dependency in Android.nir.gen.mk | Roman Stratiienko | 2019-08-19 | 1 | -0/+1
| Fixes incremental build with Android
|
| Signed-off-by: Roman Stratiienko <[email protected]>
| Reviewed-by: Tapani Pälli <[email protected]>
* glsl/standalone: init shader stage in init_gl_program() | Vasily Khoruzhick | 2019-08-17 | 1 | -2/+4
| Otherwise lima standalone compiler fails when trying to compile
| fragment shader with:
|
|     lima_compiler: ../src/compiler/nir/nir.c:55: nir_shader_create: Assertion `si->stage == stage' failed
|
| Reviewed-by: Qiang Yu <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
| Signed-off-by: Vasily Khoruzhick <[email protected]>
* nir/algebraic: add a few masking-before-unpack optimizations | Rhys Perry | 2019-08-16 | 1 | -1/+9
| Helps some Dawn of War 3 and F1 2017 shaders with ACO:
|
| Totals from affected shaders:
| SGPRS: 2136 -> 2128 (-0.37 %)
| VGPRS: 1624 -> 1628 (0.25 %)
| Spilled SGPRs: 0 -> 0 (0.00 %)
| Spilled VGPRs: 0 -> 0 (0.00 %)
| Private memory VGPRs: 0 -> 0 (0.00 %)
| Scratch size: 0 -> 0 (0.00 %) dwords per thread
| Code Size: 168068 -> 164332 (-2.22 %) bytes
| LDS: 44 -> 44 (0.00 %) blocks
| Max Waves: 222 -> 221 (-0.45 %)
| Wait states: 0 -> 0 (0.00 %)
|
| Signed-off-by: Rhys Perry <[email protected]>
| Reviewed-by: Ian Romanick <[email protected]>
| Reviewed-by: Eric Anholt <[email protected]>
* win32: unify strcasecmp definitions | Erik Faye-Lund | 2019-08-15 | 1 | -0/+1
| There were two incompatible definitions of strcasecmp, which led to a
| compiler warning. Let's clean this up by only leaving one of them, and
| using that one all the time.
|
| Signed-off-by: Erik Faye-Lund <[email protected]>
| Reviewed-by: Eric Anholt <[email protected]>
* nir: avoid warning when casting bogus pointer | Erik Faye-Lund | 2019-08-15 | 1 | -1/+1
| This intentionally-bogus pointer generates a warning on some 64-bit
| systems, so let's cast to a properly-sized integer first.
|
| Signed-off-by: Erik Faye-Lund <[email protected]>
| Reviewed-by: Eric Anholt <[email protected]>
* glsl: fixup u64-warning | Erik Faye-Lund | 2019-08-15 | 1 | -1/+1
| Similarly to the unsigned version, we need to first cast the result to
| a suitable integer before negating the number, otherwise we'll trigger
| a warning.
|
| Signed-off-by: Erik Faye-Lund <[email protected]>
| Reviewed-by: Eric Anholt <[email protected]>
* meson: add nir tests to the compiler/nir test suite | Eric Engestrom | 2019-08-14 | 1 | -2/+5
| Signed-off-by: Eric Engestrom <[email protected]>
| Reviewed-by: Lionel Landwerlin <[email protected]>
| Reviewed-by: Eric Anholt <[email protected]>
* nir/algebraic: Reassociate shift-by-constant of shift-by-constant | Ian Romanick | 2019-08-14 | 1 | -1/+25
| v2: After some review discussion with Alyssa, the replacements now
| correctly account for cases where (b+c) >= bitsize.
|
| v3: Use a temporary to simplify the Python code quite a bit. Suggested
| by Jason.
|
| Haswell and all Gen8+ platforms had similar results. (Ice Lake shown)
| total instructions in shared programs: 16251155 -> 16249576 (<.01%)
| instructions in affected programs: 232627 -> 231048 (-0.68%)
| helped: 547
| HURT: 1
| helped stats (abs) min: 1 max: 15 x̄: 2.89 x̃: 3
| helped stats (rel) min: 0.04% max: 7.84% x̄: 1.14% x̃: 1.06%
| HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
| HURT stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12%
| 95% mean confidence interval for instructions value: -3.12 -2.65
| 95% mean confidence interval for instructions %-change: -1.20% -1.06%
| Instructions are helped.
|
| total cycles in shared programs: 365924392 -> 365372103 (-0.15%)
| cycles in affected programs: 59207053 -> 58654764 (-0.93%)
| helped: 497
| HURT: 34
| helped stats (abs) min: 1 max: 29300 x̄: 1118.16 x̃: 16
| helped stats (rel) min: <.01% max: 10.59% x̄: 1.82% x̃: 1.82%
| HURT stats (abs) min: 2 max: 424 x̄: 101.03 x̃: 63
| HURT stats (rel) min: 0.07% max: 46.17% x̄: 4.72% x̃: 2.06%
| 95% mean confidence interval for cycles value: -1426.41 -653.77
| 95% mean confidence interval for cycles %-change: -1.66% -1.15%
| Cycles are helped.
|
| total spills in shared programs: 8870 -> 8871 (0.01%)
| spills in affected programs: 104 -> 105 (0.96%)
| helped: 0
| HURT: 1
|
| Ivy Bridge and all pre-Gen7 platforms had similar results. (Ivy Bridge shown)
| total instructions in shared programs: 11956236 -> 11955635 (<.01%)
| instructions in affected programs: 94110 -> 93509 (-0.64%)
| helped: 106
| HURT: 0
| helped stats (abs) min: 1 max: 14 x̄: 5.67 x̃: 4
| helped stats (rel) min: 0.12% max: 4.71% x̄: 1.96% x̃: 0.76%
| 95% mean confidence interval for instructions value: -6.62 -4.72
| 95% mean confidence interval for instructions %-change: -2.27% -1.64%
| Instructions are helped.
|
| total cycles in shared programs: 179296340 -> 178788044 (-0.28%)
| cycles in affected programs: 51009603 -> 50501307 (-1.00%)
| helped: 82
| HURT: 7
| helped stats (abs) min: 5 max: 27820 x̄: 6199.00 x̃: 16
| helped stats (rel) min: 0.30% max: 8.16% x̄: 2.58% x̃: 3.11%
| HURT stats (abs) min: 2 max: 8 x̄: 3.14 x̃: 2
| HURT stats (rel) min: 0.02% max: 1.40% x̄: 0.34% x̃: 0.10%
| 95% mean confidence interval for cycles value: -7649.38 -3773.00
| 95% mean confidence interval for cycles %-change: -2.71% -1.99%
| Cycles are helped.
|
| Reviewed-by: Alyssa Rosenzweig <[email protected]> [v2]
| Reviewed-by: Jason Ekstrand <[email protected]>
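The (b+c) >= bitsize caveat from the v2 note can be checked with a small Python model of 32-bit left shifts. This is a sketch, not the actual algebraic pattern: it assumes the shift count is taken modulo the bit size, and `ishl` and `reassociated` are illustrative names.

```python
MASK32 = 0xFFFFFFFF
BITS = 32


def ishl(a: int, s: int) -> int:
    """32-bit left shift; the count is masked to the low 5 bits (assumption)."""
    return (a << (s & (BITS - 1))) & MASK32


def reassociated(a: int, b: int, c: int) -> int:
    """(a << b) << c folded into one shift, for constants b, c in [0, 31]."""
    total = b + c
    if total >= BITS:
        return 0            # every bit has been shifted out
    return ishl(a, total)


samples = [0, 1, 0x80000000, 0xDEADBEEF, MASK32]
mismatches = [
    (a, b, c)
    for a in samples
    for b in range(BITS)
    for c in range(BITS)
    if ishl(ishl(a, b), c) != reassociated(a, b, c)
]
```

Folding the two counts into `ishl(a, b + c)` without the explicit zero case would be wrong in this model: for `b = c = 16` the masked count becomes 0 and the value would pass through unchanged instead of becoming 0.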
* nir/algebraic: Reassociate add-and-shift to be shift-and-add | Ian Romanick | 2019-08-14 | 1 | -0/+5
| A common thing in many shaders:
|
|     uniform vs {
|         vec4 bones[...];
|     };
|
|     ...
|
|     x = some_calculation(bones[i + 0]);
|     y = some_calculation(bones[i + 1]);
|     z = some_calculation(bones[i + 2]);
|
| This turns into stuff like
|
|     vec1 32 ssa_12 = iadd ssa_11, ssa_0
|     vec1 32 ssa_13 = ishl ssa_12, ssa_3
|     vec1 32 ssa_14 = intrinsic load_ssbo (ssa_7, ssa_13) (16, 4, 0)
|     vec1 32 ssa_15 = iadd ssa_11, ssa_1
|     vec1 32 ssa_16 = ishl ssa_15, ssa_3
|     vec1 32 ssa_17 = intrinsic load_ssbo (ssa_7, ssa_16) (16, 4, 0)
|     vec1 32 ssa_18 = iadd ssa_11, ssa_2
|     vec1 32 ssa_19 = ishl ssa_18, ssa_3
|     vec1 32 ssa_20 = intrinsic load_ssbo (ssa_7, ssa_19) (16, 4, 0)
|
| By reassociating the shift and the add, we can reduce this to
|
|     vec1 32 ssa_12 = ishl ssa_11, ssa_3
|     vec1 32 ssa_13 = iadd ssa_12, ssa_0
|     vec1 32 ssa_14 = intrinsic load_ssbo (ssa_7, ssa_13) (16, 4, 0)
|     vec1 32 ssa_16 = iadd ssa_12, ssa_1
|     vec1 32 ssa_17 = intrinsic load_ssbo (ssa_7, ssa_16) (16, 4, 0)
|     vec1 32 ssa_19 = iadd ssa_12, ssa_2
|     vec1 32 ssa_20 = intrinsic load_ssbo (ssa_7, ssa_19) (16, 4, 0)
|
| v2: Add some commentary from Rhys Perry's nearly identical patch.
|
| All Intel platforms had similar results. (Ice Lake shown)
| total instructions in shared programs: 16277758 -> 16250704 (-0.17%)
| instructions in affected programs: 1440284 -> 1413230 (-1.88%)
| helped: 4920
| HURT: 6
| helped stats (abs) min: 1 max: 69 x̄: 5.50 x̃: 4
| helped stats (rel) min: 0.10% max: 18.33% x̄: 2.21% x̃: 1.79%
| HURT stats (abs) min: 1 max: 12 x̄: 4.50 x̃: 3
| HURT stats (rel) min: 0.18% max: 3.23% x̄: 1.91% x̃: 2.55%
| 95% mean confidence interval for instructions value: -5.67 -5.31
| 95% mean confidence interval for instructions %-change: -2.26% -2.16%
| Instructions are helped.
|
| total cycles in shared programs: 367118526 -> 365895358 (-0.33%)
| cycles in affected programs: 93504145 -> 92280977 (-1.31%)
| helped: 2754
| HURT: 1269
| helped stats (abs) min: 1 max: 47039 x̄: 460.66 x̃: 16
| helped stats (rel) min: <.01% max: 34.93% x̄: 3.77% x̃: 1.12%
| HURT stats (abs) min: 1 max: 1500 x̄: 35.85 x̃: 9
| HURT stats (rel) min: 0.01% max: 17.35% x̄: 2.18% x̃: 0.75%
| 95% mean confidence interval for cycles value: -387.31 -220.78
| 95% mean confidence interval for cycles %-change: -2.11% -1.68%
| Cycles are helped.
|
| LOST: 1
| GAINED: 1
|
| Reviewed-by: Alyssa Rosenzweig <[email protected]>
| Reviewed-by: Jason Ekstrand <[email protected]>
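The identity behind this reassociation, `(i + k) << s == (i << s) + (k << s)` in wrapping 32-bit arithmetic, can be checked with a short Python sketch. The helper names are illustrative; in the real shaders `k` and `s` are constants, so `k << s` folds to a constant and the shifted index is shared by all three loads.

```python
MASK32 = 0xFFFFFFFF


def iadd(a: int, b: int) -> int:
    """Wrapping 32-bit add."""
    return (a + b) & MASK32


def ishl(a: int, s: int) -> int:
    """32-bit left shift with the count masked to 5 bits (assumption)."""
    return (a << (s & 31)) & MASK32


def add_then_shift(i: int, k: int, s: int) -> int:
    return ishl(iadd(i, k), s)


def shift_then_add(i: int, k: int, s: int) -> int:
    return iadd(ishl(i, s), ishl(k, s))


indices = [0, 1, 7, 0x0FFFFFFF, 0xFFFFFFFE]
offsets = [0, 1, 2, 0xFFFFFFFF]
mismatches = [
    (i, k, s)
    for i in indices
    for k in offsets
    for s in range(32)
    if add_then_shift(i, k, s) != shift_then_add(i, k, s)
]
```

Both forms agree even when the add or the shift wraps, because each is computed modulo 2^32.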
* nir/find_array_copies: Reject copies with mismatched lengths | Andrii Simiklit | 2019-08-14 | 1 | -4/+8
| copy_deref for wildcard dereferences requires arrays of the same
| length; otherwise it leads to a crash in optimizations like
| 'nir_opt_copy_prop_vars', because these optimizations expect
| 'copy_deref' only for arrays with the same length.
|
| v2: check was moved to 'try_match_deref' to fix aoa cases
|     (Jason Ekstrand <[email protected]>)
|
| v3: -fixed comment
|     -the condition merged with other one
|     (Jason Ekstrand <[email protected]>)
|
| Reviewed-by: Jason Ekstrand <[email protected]>
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111286
| Signed-off-by: Andrii Simiklit <[email protected]>