path: root/src/compiler
Commit message | Author | Age | Files | Lines
* meson: drop unused inc_nir | Eric Engestrom | 2019-10-07 | 1 | -1/+0
  Signed-off-by: Eric Engestrom <[email protected]>
  Reviewed-by: Dylan Baker <[email protected]>
* meson: drop duplicate inc_nir from spirv2nir | Eric Engestrom | 2019-10-07 | 1 | -1/+1
  Signed-off-by: Eric Engestrom <[email protected]>
  Reviewed-by: Dylan Baker <[email protected]>
* meson: drop duplicate inc_nir from libglsl | Eric Engestrom | 2019-10-07 | 1 | -1/+1
  Signed-off-by: Eric Engestrom <[email protected]>
  Reviewed-by: Dylan Baker <[email protected]>
* meson: rename libnir to _libnir to make it clear it's not meant to be used anywhere else | Eric Engestrom | 2019-10-07 | 1 | -2/+2
  Signed-off-by: Eric Engestrom <[email protected]>
  Reviewed-by: Dylan Baker <[email protected]>
* nir/constant_folding: fold load_constant intrinsics | Rhys Perry | 2019-10-07 | 1 | -0/+58
  These can appear after loop unrolling.
  v2: stylistic changes
  v2: replace state->mem_ctx with state->shader
  v2: add bounds checking
  v3: use nir_intrinsic_range() for bounds checking
  v3: fix issue where partially out-of-bounds reads are replaced with undefs
  v4: fix merge conflicts during rebase
  v5: split into two commits
  v6: set constant_data to NULL after freeing (fixes nir_sweep()/Iris)
  v7: don't remove the constant data if there are no constant loads
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Connor Abbott <[email protected]> (v6)
  Acked-by: Ian Romanick <[email protected]>
* nir/constant_folding: add back and use constant_fold_state | Rhys Perry | 2019-10-07 | 1 | -22/+19
  Useful for load_constant folding.
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Connor Abbott <[email protected]>
* spirv: Implement SPV_KHR_shader_clock | Caio Marcelo de Oliveira Filho | 2019-10-07 | 2 | -0/+36
  We only have the subgroup variant in NIR (equivalent to clockARB), so only support that for now.
  Reviewed-by: Lionel Landwerlin <[email protected]>
* nir: Fix some wonky whitespace in nir_search.h. | Eric Anholt | 2019-10-04 | 1 | -2/+2
  Reviewed-by: Ian Romanick <[email protected]>
  Reviewed-by: Connor Abbott <[email protected]>
* nir: Factor out most of the algebraic passes C code to .c/.h. | Eric Anholt | 2019-10-04 | 3 | -146/+173
  Working on the algebraic implementation, I was being driven nuts by my editor not highlighting and handling indentation for the C code. It turns out that it's basically not pass-specific code, and we can move it over to the relevant .c file.
  Replaces 30KB of code with 34KB of data on my i965 build.
  No perf diff on shader-db (n=3)
  Reviewed-by: Ian Romanick <[email protected]>
  Reviewed-by: Connor Abbott <[email protected]>
* nir: Keep the range analysis HT around intra-pass until we make a change. | Eric Anholt | 2019-10-04 | 7 | -38/+52
  This lets us memoize range analysis work across instructions.
  Changes runtime of shader-db on Intel by -30.0288% +/- 2.1693% (n=3).
  Fixes: 405de7ccb6cb ("nir/range-analysis: Rudimentary value range analysis pass")
  Reviewed-by: Ian Romanick <[email protected]>
  Reviewed-by: Connor Abbott <[email protected]>
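The memoization idea in this commit can be illustrated with a small sketch. This is not the NIR code: `RangeCache`, `query`, and `invalidate` are hypothetical names, and the analysis is a stand-in function. It only shows a lookup table that survives across queries and is dropped once the program is changed.

```python
class RangeCache:
    """Hypothetical memo table: kept across queries, cleared on any change."""

    def __init__(self, analyze):
        self._analyze = analyze   # the (expensive) analysis function
        self._table = {}          # key -> cached analysis result
        self.misses = 0           # how many times the analysis really ran

    def query(self, key):
        # Reuse a previous result when one exists for this key.
        if key not in self._table:
            self.misses += 1
            self._table[key] = self._analyze(key)
        return self._table[key]

    def invalidate(self):
        # Call this after modifying the program: old results may be stale.
        self._table.clear()


cache = RangeCache(lambda value: value * 2)
cache.query(3)
cache.query(3)          # served from the table, no second analysis run
cache.invalidate()      # a change was made, so start over
cache.query(3)
```

With this shape the cost of the analysis is paid once per key between changes, rather than once per instruction that asks.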
* nir: Skip emitting no-op movs from the builder. | Eric Anholt | 2019-10-04 | 2 | -3/+12
  Having passes generate these is just making more work for copy propagation (and thus probably calling more optimization passes) later.
  Noticed while trying to debug nir_opt_algebraic() top-to-bottom having O(n^2) behavior due to not finding new matches in replacement code.
  Reviewed-by: Ian Romanick <[email protected]>
  Reviewed-by: Connor Abbott <[email protected]>
* nir: Make nir_search's dumping go to stderr. | Eric Anholt | 2019-10-04 | 1 | -16/+16
  Reviewed-by: Ian Romanick <[email protected]>
  Reviewed-by: Connor Abbott <[email protected]>
* nir/print: always use the right FILE * | Rhys Perry | 2019-10-04 | 1 | -2/+4
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* nir: initialize needs_helper_invocations as well | Erik Faye-Lund | 2019-10-04 | 1 | -0/+1
  Similar to the previous commit, we should also initialize needs_helper_invocations here.
  Reviewed-by: Kenneth Graunke <[email protected]>
* nir: initialize uses_discard to false | Erik Faye-Lund | 2019-10-04 | 1 | -0/+1
  This matches what we do for uses_sample_qualifier, and what we do in ir_set_program_inouts.cpp as well.
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* glsl: Add helperInvocationEXT() builtin | Caio Marcelo de Oliveira Filho | 2019-09-30 | 3 | -0/+47
  From EXT_demote_to_helper_invocation, implemented with the existing nir_intrinsic_is_helper_invocation.
  Such a builtin is necessary when using `demote` because we can't redefine the value of gl_HelperInvocation (since it is an input variable).
  Reviewed-by: Kenneth Graunke <[email protected]>
* glsl: Parse `demote` statement | Caio Marcelo de Oliveira Filho | 2019-09-30 | 5 | -1/+49
  When the EXT_demote_to_helper_invocation extension is enabled, `demote` is treated as a keyword, and produces an ir_demote.
  Reviewed-by: Kenneth Graunke <[email protected]>
* glsl: Add ir_demote | Caio Marcelo de Oliveira Filho | 2019-09-30 | 9 | -0/+81
  To represent the new `demote` keyword when using the EXT_demote_to_helper_invocation extension. Most of the changes are to include it in the visitors.
  Demote is not considered control flow, so also include an empty visit member function in ir_control_flow_visitor.
  Only NIR actually supports `demote`, so assert in the translations for TGSI and Mesa's gl_program, since the demote is not expected to appear for those.
  Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: Extension boilerplate for EXT_demote_to_helper_invocation | Caio Marcelo de Oliveira Filho | 2019-09-30 | 2 | -0/+3
  Reviewed-by: Kenneth Graunke <[email protected]>
* nir: Remove unnecessary subtraction optimizations | Daniel Schürmann | 2019-09-30 | 1 | -10/+0
  These optimizations are already covered after lowering.
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Connor Abbott <[email protected]>
* nir: recombine nir_op_*sub when lower_sub = false | Daniel Schürmann | 2019-09-30 | 1 | -8/+13
  There are some optimizations which are only implemented for additions and some optimizations which assume that subtractions have been lowered. By lowering all subtractions first and later recombining them for backends which prefer this option, we don't have to implement them twice.
  This patch also moves lower_negate to nir_opt_algebraic_late() to enable these optimizations for backends which make use of it.
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Connor Abbott <[email protected]>
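The lower-then-recombine strategy can be sketched on a toy expression tree. This is an illustration under assumptions, not the nir_opt_algebraic rules themselves: expressions are plain tuples, the opcode names `fsub`, `fadd`, and `fneg` are borrowed from NIR for readability, and `lower_sub` and `recombine_sub` are hypothetical helpers.

```python
def lower_sub(expr):
    """Rewrite ('fsub', a, b) as ('fadd', a, ('fneg', b)), recursively."""
    if not isinstance(expr, tuple):
        return expr                      # leaf: a variable name or constant
    op, *srcs = expr
    srcs = [lower_sub(s) for s in srcs]
    if op == 'fsub':
        a, b = srcs
        return ('fadd', a, ('fneg', b))
    return (op, *srcs)


def recombine_sub(expr):
    """Rewrite ('fadd', a, ('fneg', b)) back to ('fsub', a, b), recursively."""
    if not isinstance(expr, tuple):
        return expr
    op, *srcs = expr
    srcs = [recombine_sub(s) for s in srcs]
    if op == 'fadd' and isinstance(srcs[1], tuple) and srcs[1][0] == 'fneg':
        return ('fsub', srcs[0], srcs[1][1])
    return (op, *srcs)


original = ('fmul', ('fsub', 'a', 'b'), 'c')
lowered = lower_sub(original)            # addition-only form for optimizations
restored = recombine_sub(lowered)        # for backends that prefer subtraction
```

Optimizations written only for additions would run on `lowered`; a backend that prefers native subtraction gets `restored`.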
* android: compiler/nir: build nir_divergence_analysis.c | Mauro Rossi | 2019-09-28 | 1 | -0/+1
  Prerequisite to avoid the following radv linking error with aco:
  FAILED: out/target/product/x86_64/obj_x86/SHARED_LIBRARIES/vulkan.radv_intermediates/LINKED/vulkan.radv.so
  ...
  external/mesa/src/amd/compiler/aco_instruction_selection_setup.cpp:178: error: undefined reference to 'nir_divergence_analysis'
  clang.real: error: linker command failed with exit code 1 (use -v to see invocation)
  Fixes: df86c5f ("nir: add divergence analysis pass.")
  Signed-off-by: Mauro Rossi <[email protected]>
* glsl: disallow incompatible matrices multiplication | Andrii Simiklit | 2019-09-27 | 1 | -3/+3
  GLSL 4.4 spec, section '5.9 Expressions':
  "The operator is multiply (*), where both operands are matrices or one operand is a vector and the other a matrix. A right vector operand is treated as a column vector and a left vector operand as a row vector. In all these cases, it is required that the number of columns of the left operand is equal to the number of rows of the right operand. Then, the multiply (*) operation does a linear algebraic multiply, yielding an object that has the same number of rows as the left operand and the same number of columns as the right operand. Section 5.10 “Vector and Matrix Operations” explains in more detail how vectors and matrices are operated on."
  This fix disallows multiplication of incompatible matrices like:
  mat4x3(..) * mat4x3(..)
  mat4x2(..) * mat4x2(..)
  mat3x2(..) * mat3x2(..)
  ....
  CC: <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111664
  Signed-off-by: Andrii Simiklit <[email protected]>
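The rule quoted from the spec can be checked mechanically. The sketch below is not the GLSL compiler's code: `can_multiply` and `result_shape` are hypothetical helpers, and a matrix type is modelled as a `(columns, rows)` pair, following the GLSL naming where `matCxR` has C columns and R rows.

```python
def can_multiply(left, right):
    """left and right are (columns, rows) pairs, as in GLSL matCxR."""
    left_columns, _left_rows = left
    _right_columns, right_rows = right
    # Spec rule: columns of the left operand must equal rows of the right.
    return left_columns == right_rows


def result_shape(left, right):
    """Shape of left * right: columns of the right, rows of the left."""
    if not can_multiply(left, right):
        raise TypeError("incompatible matrix operands")
    return (right[0], left[1])


mat4x3 = (4, 3)
mat2x4 = (2, 4)
ok = can_multiply(mat4x3, mat2x4)        # 4 columns vs 4 rows: allowed
bad = can_multiply(mat4x3, mat4x3)       # 4 columns vs 3 rows: rejected
shape = result_shape(mat4x3, mat2x4)     # a mat2x3
```

All three cases listed in the commit message multiply a non-square matrix by its own type, so the left column count never matches the right row count.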
* shader_enums: Move MAX_DRAW_BUFFERS to this file. | Eric Anholt | 2019-09-27 | 1 | -1/+3
  We include shader_enums.h from freedreno's compiler for both GL and Vulkan, and the main/config.h include resulted in polluting the namespace with things like MAX_VIEWPORTS that other Vulkan drivers use as their driver-specific maximums.
  Reviewed-by: Kristian H. Kristensen <[email protected]>
* nir/range-analysis: Use types to provide better ranges from bcsel and mov | Ian Romanick | 2019-09-25 | 1 | -25/+4
  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>

  All Gen7+ platforms had similar results. (Ice Lake shown)
  total instructions in shared programs: 16328255 -> 16315391 (-0.08%)
  instructions in affected programs: 218318 -> 205454 (-5.89%)
  helped: 988
  HURT: 0
  helped stats (abs) min: 1 max: 72 x̄: 13.02 x̃: 10
  helped stats (rel) min: 0.33% max: 16.04% x̄: 6.27% x̃: 4.88%
  95% mean confidence interval for instructions value: -13.69 -12.35
  95% mean confidence interval for instructions %-change: -6.55% -5.99%
  Instructions are helped.

  total cycles in shared programs: 363683977 -> 363615417 (-0.02%)
  cycles in affected programs: 1475193 -> 1406633 (-4.65%)
  helped: 923
  HURT: 36
  helped stats (abs) min: 1 max: 624 x̄: 75.78 x̃: 48
  helped stats (rel) min: 0.08% max: 13.89% x̄: 5.20% x̃: 5.08%
  HURT stats (abs) min: 1 max: 179 x̄: 38.58 x̃: 4
  HURT stats (rel) min: 0.06% max: 16.56% x̄: 3.33% x̃: 0.29%
  95% mean confidence interval for cycles value: -75.88 -67.10
  95% mean confidence interval for cycles %-change: -5.10% -4.66%
  Cycles are helped.

  Sandy Bridge
  total instructions in shared programs: 10785779 -> 10785654 (<.01%)
  instructions in affected programs: 13855 -> 13730 (-0.90%)
  helped: 67
  HURT: 0
  helped stats (abs) min: 1 max: 15 x̄: 1.87 x̃: 1
  helped stats (rel) min: 0.20% max: 3.45% x̄: 0.97% x̃: 0.78%
  95% mean confidence interval for instructions value: -2.47 -1.26
  95% mean confidence interval for instructions %-change: -1.13% -0.81%
  Instructions are helped.

  total cycles in shared programs: 153704799 -> 153704481 (<.01%)
  cycles in affected programs: 101509 -> 101191 (-0.31%)
  helped: 38
  HURT: 13
  helped stats (abs) min: 1 max: 38 x̄: 12.53 x̃: 16
  helped stats (rel) min: 0.07% max: 2.69% x̄: 0.87% x̃: 0.53%
  HURT stats (abs) min: 1 max: 36 x̄: 12.15 x̃: 7
  HURT stats (rel) min: 0.06% max: 2.53% x̄: 0.73% x̃: 0.44%
  95% mean confidence interval for cycles value: -10.24 -2.24
  95% mean confidence interval for cycles %-change: -0.75% -0.17%
  Cycles are helped.

  LOST: 2
  GAINED: 0

  No shader-db change on Iron Lake or GM45.
* nir/range-analysis: Use types in the hash key | Ian Romanick | 2019-09-25 | 1 | -38/+98
  This allows the result of mov and bcsel to be separately interpreted as float or int depending on the use.
  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir/range-analysis: Bail if the types don't match | Ian Romanick | 2019-09-25 | 1 | -0/+20
  Some shaders are hurt by this change because now a load_const(0x00000000) is not recognized as eq_zero when loaded as a float. This behavior is restored in a later patch (nir/range-analysis: Use types to provide better ranges from bcsel and mov).
  v2: Add a comment about reinterpretation of int/uint/bool. Suggested by Caio. Rewrite the condition to check for types being float versus checking for types not being all the things that aren't float.
  Fixes: 405de7ccb6c ("nir/range-analysis: Rudimentary value range analysis pass")
  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>

  All Gen7+ platforms had similar results. (Ice Lake shown)
  total instructions in shared programs: 16327543 -> 16328255 (<.01%)
  instructions in affected programs: 55928 -> 56640 (1.27%)
  helped: 0
  HURT: 208
  HURT stats (abs) min: 1 max: 16 x̄: 3.42 x̃: 3
  HURT stats (rel) min: 0.33% max: 6.74% x̄: 1.31% x̃: 1.12%
  95% mean confidence interval for instructions value: 3.06 3.79
  95% mean confidence interval for instructions %-change: 1.17% 1.46%
  Instructions are HURT.

  total cycles in shared programs: 363682759 -> 363683977 (<.01%)
  cycles in affected programs: 325758 -> 326976 (0.37%)
  helped: 44
  HURT: 133
  helped stats (abs) min: 1 max: 179 x̄: 33.61 x̃: 5
  helped stats (rel) min: 0.06% max: 14.21% x̄: 2.47% x̃: 0.29%
  HURT stats (abs) min: 1 max: 157 x̄: 20.28 x̃: 14
  HURT stats (rel) min: 0.07% max: 14.44% x̄: 1.42% x̃: 0.73%
  95% mean confidence interval for cycles value: 0.38 13.39
  95% mean confidence interval for cycles %-change: -0.06% 0.96%
  Inconclusive result (%-change mean confidence interval includes 0).

  Sandy Bridge
  total instructions in shared programs: 10787433 -> 10787443 (<.01%)
  instructions in affected programs: 1842 -> 1852 (0.54%)
  helped: 0
  HURT: 10
  HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
  HURT stats (rel) min: 0.33% max: 1.85% x̄: 0.73% x̃: 0.49%
  95% mean confidence interval for instructions value: 1.00 1.00
  95% mean confidence interval for instructions %-change: 0.36% 1.10%
  Instructions are HURT.

  total cycles in shared programs: 153724543 -> 153724563 (<.01%)
  cycles in affected programs: 8407 -> 8427 (0.24%)
  helped: 1
  HURT: 3
  helped stats (abs) min: 18 max: 18 x̄: 18.00 x̃: 18
  helped stats (rel) min: 0.98% max: 0.98% x̄: 0.98% x̃: 0.98%
  HURT stats (abs) min: 4 max: 18 x̄: 12.67 x̃: 16
  HURT stats (rel) min: 0.21% max: 0.75% x̄: 0.56% x̃: 0.72%
  95% mean confidence interval for cycles value: -21.31 31.31
  95% mean confidence interval for cycles %-change: -1.11% 1.46%
  Inconclusive result (value mean confidence interval includes 0).

  No shader-db changes on Iron Lake or GM45.
* glsl: turn runtime asserts of compile-time value into compile-time asserts | Eric Engestrom | 2019-09-25 | 1 | -6/+12
  Signed-off-by: Eric Engestrom <[email protected]>
  Reviewed-by: Erik Faye-Lund <[email protected]>
* nir: Fix overlapping vars in nir_assign_io_var_locations() | Connor Abbott | 2019-09-25 | 1 | -1/+1
  When handling two variables with overlapping locations, we process the one with the lower location first, and then extend the location -> driver_location map to guarantee that it's contiguous for the second variable too. But the loop had the wrong bound, so we weren't extending the map fully, which could lead to problems later such as an incorrect num_inputs. The loop index i is an index into the slots of the variable, so we need to stop at the final slot of the variable (var_size) instead of the number of unassigned slots.
  This fixes spec@arb_enhanced_layouts@execution@component-layout@vs-fs-array-interleave-range on radeonsi NIR.
  Reviewed-by: Marek Olšák <[email protected]>
* glsl: correct bitcast-helpers | Erik Faye-Lund | 2019-09-25 | 1 | -2/+2
  Without this, we'll incorrectly round off huge values to the nearest representable double instead of keeping them at the exact value as we're supposed to.
  Found by inspecting compiler warnings.
  Signed-off-by: Erik Faye-Lund <[email protected]>
  Fixes: 85faf5082f ("glsl: Add 64-bit integer support for constant expressions")
  Reviewed-by: Eric Engestrom <[email protected]>
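The rounding problem described here can be reproduced in a few lines. This sketch does not show the GLSL helpers themselves; `via_value_conversion` and `bits_as_double_and_back` are hypothetical names that contrast converting a 64-bit integer's value to a double with reinterpreting its bits.

```python
import struct


def via_value_conversion(i):
    """Convert the integer's value to a double and back: may round."""
    return int(float(i))


def bits_as_double_and_back(i):
    """Reinterpret the 64 bits as a double and back: bit-exact."""
    as_double = struct.unpack('<d', struct.pack('<q', i))[0]
    return struct.unpack('<q', struct.pack('<d', as_double))[0]


huge = 2**62 + 1                           # needs more than 53 significant bits
rounded = via_value_conversion(huge)       # the low bit is rounded away
exact = bits_as_double_and_back(huge)      # the bit pattern is untouched
```

A double carries 53 bits of significand, so any 64-bit integer that needs more than that cannot survive a value conversion, while a bit reinterpretation keeps every bit.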
* nir/opt_remove_phis: handle phis with no sources | Rhys Perry | 2019-09-25 | 1 | -5/+6
  This can happen with loops with unreachable exits which are later optimized away.
  Fixes assertion in dEQP-VK.graphicsfuzz.unreachable-loops with RADV.
  Cc: [email protected]
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* nir/opt_large_constants: Handle store writemasks | Connor Abbott | 2019-09-24 | 1 | -20/+24
  This fixes some piglit tests on radeonsi NIR where a varying is initialized to a constant array in the vertex shader. Varying packing after nir_lower_io_to_temporaries creates writemasked stores which persist after pulling the constant initialization down into the fragment shader.
  While we're here, rewrite handle_constant_store() to do the loop over components outside the switch, so that we don't have to duplicate the writemask checking for every bitsize.
  Fixes: 1235850522c ("nir: Add a large constants optimization pass")
  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir: define 8-byte size and alignment for bindless variables | Marek Olšák | 2019-09-23 | 1 | -1/+6
  Reviewed-by: Connor Abbott <[email protected]>
* nir: don't add bindless variables to num_textures and num_images | Marek Olšák | 2019-09-23 | 1 | -0/+4
  It confuses radeonsi.
  Reviewed-by: Connor Abbott <[email protected]>
* nir/repair_ssa: Replace the unreachable check with the phi builder | Jason Ekstrand | 2019-09-23 | 1 | -35/+44
  In a3268599f3c9, I attempted to fix nir_repair_ssa for unreachable blocks. However, that commit missed the possibility that the use is in a block which, itself, is unreachable. In this case, we can end up in an infinite loop trying to replace a def with itself. Even though a no-op replacement is a fine operation, it keeps extending the end of the uses list as we're walking it.
  Instead of explicitly checking for the group of conditions, just check if the phi builder gives us a different def. That's guaranteed to be 100% reliable and, while it lacks symmetry with the is_valid checks, should be more reliable.
  Fixes: a3268599 "nir/repair_ssa: Repair dominance for unreachable..."
  Reviewed-by: Ian Romanick <[email protected]>
* nir/algebraic: Additional D3D Boolean optimization | Ian Romanick | 2019-09-19 | 1 | -0/+1
  I observed this pattern in several shaders in Hand of Fate 2 while investigating bugzilla #111490. This also led to the related bugzilla #111578.
  The shaders from HoF2 are *not* in shader-db.
  Reviewed-by: Kenneth Graunke <[email protected]>

  Skylake and Ice Lake had similar results. (Ice Lake shown)
  total instructions in shared programs: 16222621 -> 16205419 (-0.11%)
  instructions in affected programs: 798418 -> 781216 (-2.15%)
  helped: 548
  HURT: 0
  helped stats (abs) min: 2 max: 158 x̄: 31.39 x̃: 35
  helped stats (rel) min: 0.45% max: 28.64% x̄: 2.83% x̃: 2.09%
  95% mean confidence interval for instructions value: -33.22 -29.56
  95% mean confidence interval for instructions %-change: -3.11% -2.56%
  Instructions are helped.

  total cycles in shared programs: 364676209 -> 363345763 (-0.36%)
  cycles in affected programs: 112810504 -> 111480058 (-1.18%)
  helped: 546
  HURT: 7
  helped stats (abs) min: 2 max: 118913 x̄: 2439.77 x̃: 2340
  helped stats (rel) min: 0.08% max: 37.56% x̄: 1.46% x̃: 1.08%
  HURT stats (abs) min: 2 max: 770 x̄: 238.00 x̃: 43
  HURT stats (rel) min: 0.02% max: 11.24% x̄: 3.71% x̃: 0.35%
  95% mean confidence interval for cycles value: -2884.33 -1927.41
  95% mean confidence interval for cycles %-change: -1.59% -1.21%
  Cycles are helped.

  total spills in shared programs: 8870 -> 8514 (-4.01%)
  spills in affected programs: 1230 -> 874 (-28.94%)
  helped: 161
  HURT: 0

  total fills in shared programs: 21901 -> 21348 (-2.52%)
  fills in affected programs: 2120 -> 1567 (-26.08%)
  helped: 155
  HURT: 5

  Broadwell and Haswell had similar results. (Broadwell shown)
  total instructions in shared programs: 14994910 -> 14975495 (-0.13%)
  instructions in affected programs: 839033 -> 819618 (-2.31%)
  helped: 548
  HURT: 0
  helped stats (abs) min: 2 max: 299 x̄: 35.43 x̃: 49
  helped stats (rel) min: 0.39% max: 19.89% x̄: 2.91% x̃: 2.22%
  95% mean confidence interval for instructions value: -37.46 -33.40
  95% mean confidence interval for instructions %-change: -3.12% -2.70%
  Instructions are helped.

  total cycles in shared programs: 386032453 -> 384450722 (-0.41%)
  cycles in affected programs: 117807357 -> 116225626 (-1.34%)
  helped: 547
  HURT: 6
  helped stats (abs) min: 2 max: 22096 x̄: 2892.01 x̃: 3926
  helped stats (rel) min: 0.17% max: 10.34% x̄: 1.56% x̃: 1.31%
  HURT stats (abs) min: 4 max: 60 x̄: 32.83 x̃: 29
  HURT stats (rel) min: 0.38% max: 12.79% x̄: 5.86% x̃: 4.65%
  95% mean confidence interval for cycles value: -3060.28 -2660.27
  95% mean confidence interval for cycles %-change: -1.59% -1.37%
  Cycles are helped.

  total spills in shared programs: 23372 -> 21869 (-6.43%)
  spills in affected programs: 11730 -> 10227 (-12.81%)
  helped: 352
  HURT: 0

  total fills in shared programs: 34747 -> 35351 (1.74%)
  fills in affected programs: 11013 -> 11617 (5.48%)
  helped: 3
  HURT: 347

  Ivy Bridge and Sandy Bridge had similar results. (Ivy Bridge shown)
  total instructions in shared programs: 11956420 -> 11956126 (<.01%)
  instructions in affected programs: 14898 -> 14604 (-1.97%)
  helped: 98
  HURT: 0
  helped stats (abs) min: 3 max: 3 x̄: 3.00 x̃: 3
  helped stats (rel) min: 1.30% max: 3.57% x̄: 2.08% x̃: 2.00%
  95% mean confidence interval for instructions value: -3.00 -3.00
  95% mean confidence interval for instructions %-change: -2.18% -1.98%
  Instructions are helped.

  total cycles in shared programs: 178791217 -> 178790792 (<.01%)
  cycles in affected programs: 149763 -> 149338 (-0.28%)
  helped: 91
  HURT: 7
  helped stats (abs) min: 3 max: 107 x̄: 20.63 x̃: 16
  helped stats (rel) min: 0.13% max: 6.91% x̄: 1.40% x̃: 1.18%
  HURT stats (abs) min: 3 max: 322 x̄: 207.43 x̃: 322
  HURT stats (rel) min: 0.14% max: 19.85% x̄: 12.73% x̃: 17.41%
  95% mean confidence interval for cycles value: -18.94 10.27
  95% mean confidence interval for cycles %-change: -1.28% 0.49%
  Inconclusive result (value mean confidence interval includes 0).
* nir/algebraic: Do not apply late DPH optimization in vertex processing stages | Ian Romanick | 2019-09-19 | 1 | -3/+12
  Some shaders do not use 'invariant' in vertex and (possibly) geometry shader stages on some outputs that are intended to be invariant. For various reasons, this optimization may not be fully applied in all shaders used for different rendering passes of the same geometry. This can result in Z-fighting artifacts (at best). For now, disable this optimization in these stages.
  In tessellation stages applications seem to use 'precise' when necessary, so allow the optimization in those stages.
  Reviewed-by: Kenneth Graunke <[email protected]>
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111490
  Fixes: 09705747d72 ("nir/algebraic: Reassociate fadd into fmul in DPH-like pattern")

  All Gen8+ platforms had similar results. (Ice Lake shown)
  total instructions in shared programs: 16194726 -> 16344745 (0.93%)
  instructions in affected programs: 2855172 -> 3005191 (5.25%)
  helped: 6
  HURT: 20279
  helped stats (abs) min: 1 max: 3 x̄: 1.33 x̃: 1
  helped stats (rel) min: 0.44% max: 1.00% x̄: 0.54% x̃: 0.44%
  HURT stats (abs) min: 1 max: 32 x̄: 7.40 x̃: 7
  HURT stats (rel) min: 0.14% max: 42.86% x̄: 8.58% x̃: 6.56%
  95% mean confidence interval for instructions value: 7.34 7.45
  95% mean confidence interval for instructions %-change: 8.48% 8.67%
  Instructions are HURT.

  total cycles in shared programs: 364471296 -> 365014683 (0.15%)
  cycles in affected programs: 32421530 -> 32964917 (1.68%)
  helped: 2925
  HURT: 16144
  helped stats (abs) min: 1 max: 403 x̄: 18.39 x̃: 5
  helped stats (rel) min: <.01% max: 22.61% x̄: 1.97% x̃: 1.15%
  HURT stats (abs) min: 1 max: 18471 x̄: 36.99 x̃: 15
  HURT stats (rel) min: 0.02% max: 52.58% x̄: 5.60% x̃: 3.87%
  95% mean confidence interval for cycles value: 21.58 35.41
  95% mean confidence interval for cycles %-change: 4.36% 4.52%
  Cycles are HURT.
* Move blob from compiler/ to util/ | Jason Ekstrand | 2019-09-19 | 8 | -1095/+2
  There's nothing whatsoever compiler-specific about it other than that's currently where it's used.
  Reviewed-by: Kristian H. Kristensen <[email protected]>
  Reviewed-by: Lionel Landwerlin <[email protected]>
* nir/algebraic: refactor inexact opcode restrictions | Samuel Iglesias Gonsálvez | 2019-09-19 | 1 | -3/+5
  Refactor the code to avoid calling auxiliary functions many times when it is not really needed.
  Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
  Reviewed-by: Kristian H. Kristensen <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
* spirv: Add missing break for capability handling | Caio Marcelo de Oliveira Filho | 2019-09-18 | 1 | -0/+1
  Newly added cases "stole" the previous break.
  Fixes: 420ad0a1a3d ("spirv: check support for SPV_KHR_float_controls capabilities")
  Reviewed-by: Eric Engestrom <[email protected]>
* nir/opt_if: Fix undef handling in opt_split_alu_of_phi() | Connor Abbott | 2019-09-18 | 1 | -55/+20
  The pass assumed that "Most ALU ops produce an undefined result if any source is undef" which is completely untrue. Due to how we lower if statements to selects and then optimize on those selects later, we simply cannot make that assumption. In particular this pass tried to replace an ior of undef and true, which had been generated by optimizing a select which itself came from flattening an if statement, to undef causing a miscompilation for a CTS test with radeonsi NIR.
  We fix this by always doing what the non-undef path did, i.e. duplicate the instruction twice. If there are cases where the instruction before the loop can be folded away due to having an undef source, we should add these to opt_undef instead.
  The comment above the pass says that if the phi source from before the loop is undef, and we can fold the instruction before the loop to undef, then we can ignore sources of the original instruction that don't dominate the block before the loop because we don't need them to create the instruction before the loop. This is incorrect, because the instruction at the bottom of the loop would get those sources from the wrong loop iteration. The code never actually did what the comment said, so we only have to update the comment to match what the pass actually does. We also update the example to more closely match what most actual loops look like after vtn and peephole_select.
  There are no shader-db changes with i965, radeonsi NIR, or radv. With anv and my vkpipeline-db there's only one change:
  total instructions in shared programs: 14125290 -> 14125300 (<.01%)
  instructions in affected programs: 2598 -> 2608 (0.38%)
  helped: 0
  HURT: 1

  total cycles in shared programs: 2051473437 -> 2051473397 (<.01%)
  cycles in affected programs: 36697 -> 36657 (-0.11%)
  helped: 1
  HURT: 0

  Fixes KHR-GL45.shader_subroutine.control_flow_and_returned_subroutine_values_used_as_subroutine_input with radeonsi NIR.
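Why an ior of undef and true must not be folded to undef can be seen by enumerating what an undef source may be. This is a toy model, not NIR: booleans stand in for NIR values, `UNDEF` is modelled as "either value", and `possible_results` is a hypothetical helper.

```python
def possible_results(op, a_choices, b_choices):
    """Every result op can produce over the allowed source values."""
    return {op(a, b) for a in a_choices for b in b_choices}


UNDEF = (False, True)    # an undef boolean may turn out to be either value


def ior(a, b):
    return a or b


def iand(a, b):
    return a and b


ior_with_true = possible_results(ior, UNDEF, (True,))      # always true
ior_with_false = possible_results(ior, UNDEF, (False,))    # truly unknown
iand_with_false = possible_results(iand, UNDEF, (False,))  # always false
```

When the set of possible results has a single element, the operation has a well-defined value despite the undef source, so replacing it with undef discards information the rest of the shader may rely on.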
* nir/opcodes: Clear variable names confusion | Andres Gomez | 2019-09-18 | 1 | -10/+15
  Having Python and C variables share a name in the same block of code makes the code a bit confusing to read. Make it explicit that the Python bit_size variable refers to the destination bit size.
  Suggested-by: Caio Marcelo de Oliveira Filho <[email protected]>
  Signed-off-by: Andres Gomez <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir: fix fmin/fmax support for doubles | Samuel Iglesias Gonsálvez | 2019-09-17 | 1 | -2/+2
  Until now, it was using the floating point version of fmin/fmax, instead of the double version.
  Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
  Reviewed-by: Connor Abbott <[email protected]>
* nir: fix denorm flush-to-zero in sqrt's lowering at nir_lower_double_ops | Samuel Iglesias Gonsálvez | 2019-09-17 | 1 | -2/+15
  v2:
  - Replace hard coded value with DBL_MIN (Connor).
  v3:
  - Take into account the FLOAT_CONTROLS_DENORM_PRESERVE_FP64 flag (Caio).
  Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
  Signed-off-by: Andres Gomez <[email protected]>
  Reviewed-by: Connor Abbott <[email protected]> [v2]
* nir: fix denorms in unpack_half_1x16() | Samuel Iglesias Gonsálvez | 2019-09-17 | 4 | -7/+45
  According to VK_KHR_shader_float_controls:
  "Denormalized values obtained via unpacking an integer into a vector of values with smaller bit width and interpreting those values as floating-point numbers must be flushed to zero, unless the entry point is declared with the DenormPreserve execution mode."
  v2:
  - Add nir_op_unpack_half_2x16_flush_to_zero opcode (Connor).
  v3:
  - Adapt to use the new NIR lowering framework (Andres).
  v4:
  - Updated to renamed shader info member and enum values (Andres).
  v5:
  - Simplify flags logic operations (Caio).
  Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
  Signed-off-by: Andres Gomez <[email protected]>
  Reviewed-by: Connor Abbott <[email protected]> [v2]
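The flushing behaviour quoted above can be sketched outside NIR. This is a rough illustration, not the lowering from the commit: `unpack_half_1x16` here is a hypothetical Python function, the `flush_denorms` flag stands in for the execution mode, and keeping the sign of the flushed zero is an assumption of this sketch.

```python
import math
import struct


def unpack_half_1x16(bits, flush_denorms=False):
    """Interpret the low 16 bits as an IEEE 754 half-precision float."""
    bits &= 0xFFFF
    value = struct.unpack('<e', struct.pack('<H', bits))[0]
    exponent = (bits >> 10) & 0x1F
    mantissa = bits & 0x3FF
    # A denormal has a zero exponent field and a non-zero mantissa.
    if flush_denorms and exponent == 0 and mantissa != 0:
        return math.copysign(0.0, value)   # assumption: keep the sign
    return value


smallest_denorm = unpack_half_1x16(0x0001)                 # 2**-24
flushed = unpack_half_1x16(0x0001, flush_denorms=True)     # 0.0
one = unpack_half_1x16(0x3C00, flush_denorms=True)         # normal, unchanged
```

Normal values pass through untouched in both modes; only inputs whose exponent field is zero and mantissa is non-zero are affected by the flag.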
* nir/algebraic: disable inexact optimizations depending on float controls execution mode | Samuel Iglesias Gonsálvez | 2019-09-17 | 1 | -0/+5
  If FLOAT_CONTROLS_SIGNED_ZERO_INF_NAN_PRESERVE or FLOAT_CONTROLS_DENORM_FLUSH_TO_ZERO are enabled, do not apply the inexact optimizations so the VK_KHR_shader_float_controls execution mode is respected.
  v2:
  - Do not apply inexact optimizations if SHADER_DENORM_FLUSH_TO_ZERO is enabled (Andres).
  v3:
  - Updated to renamed shader info member (Andres).
  v4:
  - Directly access execution mode instead of dragging it by parameter (Caio).
  Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
  Signed-off-by: Andres Gomez <[email protected]>
  Reviewed-by: Connor Abbott <[email protected]> [v1]
* nir/algebraic: mark float optimizations returning one parameter as inexact | Andres Gomez | 2019-09-17 | 1 | -8/+8
  With the arrival of VK_KHR_shader_float_controls, algebraic optimizations for float types of the form (('fop', a, b), a) become inexact depending on the execution mode.
  For example, if we have activated SHADER_DENORM_FLUSH_TO_ZERO and the "a" parameter is a denorm value, we cannot return it as a denorm; it needs to be flushed to zero.
  Therefore, we now mark all those operations as inexact.
  Suggested-by: Samuel Iglesias Gonsálvez <[email protected]>
  Signed-off-by: Andres Gomez <[email protected]>
  Reviewed-by: Connor Abbott <[email protected]>
* nir/constant_expressions: mind rounding mode converting from float to float16 destinations | Samuel Iglesias Gonsálvez | 2019-09-17 | 1 | -2/+10
  v2:
  - Move the op-code specific knowledge to nir_opcodes.py even if it means a round trip conversion (Connor).
  Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
  Signed-off-by: Andres Gomez <[email protected]>
  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir/opcodes: make sure f2f16_rtz and f2f16_rtne behavior is not overridden by the float controls execution mode | Samuel Iglesias Gonsálvez | 2019-09-17 | 1 | -1/+20
  Suggested-by: Connor Abbott <[email protected]>
  Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
  Signed-off-by: Andres Gomez <[email protected]>
  Reviewed-by: Connor Abbott <[email protected]>
* nir: mind rounding mode on fadd, fsub, fmul and fma opcodes | Samuel Iglesias Gonsálvez | 2019-09-17 | 2 | -4/+46
  According to the Vulkan spec, the new execution modes affect only correctly rounded SPIR-V instructions, which include fadd, fsub and fmul.
  v2:
  - Fix fmul, fsub and fadd round-to-zero definitions; they should use auxiliary functions to calculate the proper value because Mesa uses round-to-nearest-even rounding mode by default (Connor).
  v3:
  - Do an actual fused multiply-add at ffma (Connor).
  v4:
  - Simplify fadd and fmul for bit sizes < 64 (Connor).
  - Do not use double ffma for 32 bits float (Connor).
  Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
  Signed-off-by: Andres Gomez <[email protected]>
  Reviewed-by: Connor Abbott <[email protected]> [v3]