summaryrefslogtreecommitdiffstats
path: root/src/compiler/nir
Commit message (Collapse)AuthorAgeFilesLines
* nir: add shader_info::last_msaa_imageMarek Olšák2019-10-091-0/+6
| | | | | | for radeonsi Acked-by: Pierre-Eric Pelloux-Prayer <[email protected]>
* nir/sink: Don't sink load_ubo to outside of its defining loopConnor Abbott2019-10-091-5/+32
| | | | | | | | Previously, this could have made the resource divergent in code like that which is genereated by nir_lower_non_uniform_access. Fixes: da8ed68a ('nir: replace nir_move_load_const() with nir_opt_sink()') Reviewed-by: Daniel Schürmann <[email protected]>
* nir/sink: Rewrite loop handling logicConnor Abbott2019-10-091-35/+40
| | | | | | | | | | | | | | | | Previously, for code like: loop { loop { a = load_ubo() } use(a) } adjust_block_for_loops() would return the block before the first loop. Now we compute the range of allowed blocks and then walk the dominance tree directly, guaranteeing directly that we always choose a block that dominates all the uses and is dominated by the definition. Reviewed-by: Daniel Schürmann <[email protected]>
* meson: rename libnir to _libnir to make it clear it's not meant to be used ↵Eric Engestrom2019-10-071-2/+2
| | | | | | | anywhere else Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Dylan Baker <[email protected]>
* nir/constant_folding: fold load_constant intrinsicsRhys Perry2019-10-071-0/+58
| | | | | | | | | | | | | | | | | | These can appear after loop unrolling. v2: stylistic changes v2: replace state->mem_ctx with state->shader v2: add bounds checking v3: use nir_intrinsic_range() for bounds checking v3: fix issue where partially out-of-bounds reads are replaced with undefs v4: fix merge conflicts during rebase v5: split into two commits v6: set constant_data to NULL after freeing (fixes nir_sweep()/Iris) v7: don't remove the constant data if there are no constant loads Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Connor Abbott <[email protected]> (v6) Acked-by: Ian Romanick <[email protected]>
* nir/constant_folding: add back and use constant_fold_stateRhys Perry2019-10-071-22/+19
| | | | | | | Useful for load_constant folding. Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Connor Abbott <[email protected]>
* nir: Fix some wonky whitespace in nir_search.h.Eric Anholt2019-10-041-2/+2
| | | | | Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Connor Abbott <[email protected]>
* nir: Factor out most of the algebraic passes C code to .c/.h.Eric Anholt2019-10-043-146/+173
| | | | | | | | | | | Working on the algebraic implementation, I was being driven nuts by my editor not highlighting and handling indentation for the C code. It turns out that it's basically not pass-specific code, and we can move it over to the relevant .c file. Replaces 30KB of code with 34KB of data on my i965 build. No perf diff on shader-db (n=3) Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Connor Abbott <[email protected]>
* nir: Keep the range analysis HT around intra-pass until we make a change.Eric Anholt2019-10-047-38/+52
| | | | | | | | | This lets us memoize range analysis work across instructions. Reduces runtime of shader-db on Intel by -30.0288% +/- 2.1693% (n=3). Fixes: 405de7ccb6cb ("nir/range-analysis: Rudimentary value range analysis pass") Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Connor Abbott <[email protected]>
* nir: Skip emitting no-op movs from the builder.Eric Anholt2019-10-042-3/+12
| | | | | | | | | | | Having passes generate these is just making more work for copy propagation (and thus probably calling more optimization passes) later. Noticed while trying to debug nir_opt_algebraic() top-to-bottom having O(n^2) behavior due to not finding new matches in replacement code. Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Connor Abbott <[email protected]>
* nir: Make nir_search's dumping go to stderr.Eric Anholt2019-10-041-16/+16
| | | | | Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Connor Abbott <[email protected]>
* nir/print: always use the right FILE *Rhys Perry2019-10-041-2/+4
| | | | | Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* nir: initialize needs_helper_invocations as wellErik Faye-Lund2019-10-041-0/+1
| | | | | | | Similar to the previous commit, we should also initialize needs_helper_invocations here. Reviewed-by: Kenneth Graunke <[email protected]>
* nir: initialize uses_discard to falseErik Faye-Lund2019-10-041-0/+1
| | | | | | | | This matches what we do for uses_sample_qualifier, and what we do in ir_set_program_inouts.cpp as well. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* nir: Remove unnecessary subtraction optimizationsDaniel Schürmann2019-09-301-10/+0
| | | | | | | These optimizations are already covered after lowering. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Connor Abbott <[email protected]>
* nir: recombine nir_op_*sub when lower_sub = falseDaniel Schürmann2019-09-301-8/+13
| | | | | | | | | | | | | There are some optimizations which are only implemented for additions and some optimizations which assume that subtractions have been lowered. By lowering all subtractions first and later recombine for backends which prefer this option, we don't have to implement them twice. This patch also moves lower_negate to nir_opt_algebraic_late() to enable these optimizations for backends which make use of it. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Connor Abbott <[email protected]>
* nir/range-analysis: Use types to provide better ranges from bcsel and movIan Romanick2019-09-251-25/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> All Gen7+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 16328255 -> 16315391 (-0.08%) instructions in affected programs: 218318 -> 205454 (-5.89%) helped: 988 HURT: 0 helped stats (abs) min: 1 max: 72 x̄: 13.02 x̃: 10 helped stats (rel) min: 0.33% max: 16.04% x̄: 6.27% x̃: 4.88% 95% mean confidence interval for instructions value: -13.69 -12.35 95% mean confidence interval for instructions %-change: -6.55% -5.99% Instructions are helped. total cycles in shared programs: 363683977 -> 363615417 (-0.02%) cycles in affected programs: 1475193 -> 1406633 (-4.65%) helped: 923 HURT: 36 helped stats (abs) min: 1 max: 624 x̄: 75.78 x̃: 48 helped stats (rel) min: 0.08% max: 13.89% x̄: 5.20% x̃: 5.08% HURT stats (abs) min: 1 max: 179 x̄: 38.58 x̃: 4 HURT stats (rel) min: 0.06% max: 16.56% x̄: 3.33% x̃: 0.29% 95% mean confidence interval for cycles value: -75.88 -67.10 95% mean confidence interval for cycles %-change: -5.10% -4.66% Cycles are helped. Sandy Bridge total instructions in shared programs: 10785779 -> 10785654 (<.01%) instructions in affected programs: 13855 -> 13730 (-0.90%) helped: 67 HURT: 0 helped stats (abs) min: 1 max: 15 x̄: 1.87 x̃: 1 helped stats (rel) min: 0.20% max: 3.45% x̄: 0.97% x̃: 0.78% 95% mean confidence interval for instructions value: -2.47 -1.26 95% mean confidence interval for instructions %-change: -1.13% -0.81% Instructions are helped. total cycles in shared programs: 153704799 -> 153704481 (<.01%) cycles in affected programs: 101509 -> 101191 (-0.31%) helped: 38 HURT: 13 helped stats (abs) min: 1 max: 38 x̄: 12.53 x̃: 16 helped stats (rel) min: 0.07% max: 2.69% x̄: 0.87% x̃: 0.53% HURT stats (abs) min: 1 max: 36 x̄: 12.15 x̃: 7 HURT stats (rel) min: 0.06% max: 2.53% x̄: 0.73% x̃: 0.44% 95% mean confidence interval for cycles value: -10.24 -2.24 95% mean confidence interval for cycles %-change: -0.75% -0.17% Cycles are helped. LOST: 2 GAINED: 0 No shader-db change on Iron Lake or GM45.
* nir/range-analysis: Use types in the hash keyIan Romanick2019-09-251-38/+98
| | | | | | | This allows the reslut of mov and bcsel to be separately interpreted as float or int depending on the use. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir/range-analysis: Bail if the types don't matchIan Romanick2019-09-251-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some shaders are hurt by this change because now a load_const(0x00000000) is not recognized as eq_zero when loaded as a float. This behavior is restored in a later patch (nir/range-analysis: Use types to provide better ranges from bcsel and mov). v2: Add a comment about reinterpretation of int/uint/bool. Suggested by Caio. Rewrite condition the check for types being float versus checking for types not being all the things that aren't float. Fixes: 405de7ccb6c ("nir/range-analysis: Rudimentary value range analysis pass") Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> All Gen7+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 16327543 -> 16328255 (<.01%) instructions in affected programs: 55928 -> 56640 (1.27%) helped: 0 HURT: 208 HURT stats (abs) min: 1 max: 16 x̄: 3.42 x̃: 3 HURT stats (rel) min: 0.33% max: 6.74% x̄: 1.31% x̃: 1.12% 95% mean confidence interval for instructions value: 3.06 3.79 95% mean confidence interval for instructions %-change: 1.17% 1.46% Instructions are HURT. total cycles in shared programs: 363682759 -> 363683977 (<.01%) cycles in affected programs: 325758 -> 326976 (0.37%) helped: 44 HURT: 133 helped stats (abs) min: 1 max: 179 x̄: 33.61 x̃: 5 helped stats (rel) min: 0.06% max: 14.21% x̄: 2.47% x̃: 0.29% HURT stats (abs) min: 1 max: 157 x̄: 20.28 x̃: 14 HURT stats (rel) min: 0.07% max: 14.44% x̄: 1.42% x̃: 0.73% 95% mean confidence interval for cycles value: 0.38 13.39 95% mean confidence interval for cycles %-change: -0.06% 0.96% Inconclusive result (%-change mean confidence interval includes 0). Sandy Bridge total instructions in shared programs: 10787433 -> 10787443 (<.01%) instructions in affected programs: 1842 -> 1852 (0.54%) helped: 0 HURT: 10 HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.33% max: 1.85% x̄: 0.73% x̃: 0.49% 95% mean confidence interval for instructions value: 1.00 1.00 95% mean confidence interval for instructions %-change: 0.36% 1.10% Instructions are HURT. total cycles in shared programs: 153724543 -> 153724563 (<.01%) cycles in affected programs: 8407 -> 8427 (0.24%) helped: 1 HURT: 3 helped stats (abs) min: 18 max: 18 x̄: 18.00 x̃: 18 helped stats (rel) min: 0.98% max: 0.98% x̄: 0.98% x̃: 0.98% HURT stats (abs) min: 4 max: 18 x̄: 12.67 x̃: 16 HURT stats (rel) min: 0.21% max: 0.75% x̄: 0.56% x̃: 0.72% 95% mean confidence interval for cycles value: -21.31 31.31 95% mean confidence interval for cycles %-change: -1.11% 1.46% Inconclusive result (value mean confidence interval includes 0). No shader-db changes on Iron Lake or GM45.
* nir: Fix overlapping vars in nir_assign_io_var_locations()Connor Abbott2019-09-251-1/+1
| | | | | | | | | | | | | | | | | When handling two variables with overlapping locations, we process the one with lower location first, and then extend the location -> driver_location map to guarantee that it's contiguous for the second variable too. But the loop had the wrong bound, so we weren't extending the map 100%, which could lead to problems later such as an incorrect num_inputs. The loop index i is an index into the slots of the variable, so we need to stop at the final slot of the variable (var_size) instead of the number of unassigned slots. This fixes spec@arb_enhanced_layouts@execution@component-layout@vs-fs-array-interleave-range on radeonsi NIR. Reviewed-by: Marek Olšák <[email protected]>
* nir/opt_remove_phis: handle phis with no sourcesRhys Perry2019-09-251-5/+6
| | | | | | | | | | | This can happen with loops with unreachable exits which are later optimized away. Fixes assertion in dEQP-VK.graphicsfuzz.unreachable-loops with RADV. Cc: [email protected] Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir/opt_large_constants: Handle store writemasksConnor Abbott2019-09-241-20/+24
| | | | | | | | | | | | | | | This fixes some piglit tests on radeonsi NIR where a varying is initialized to a constant array in the vertex shader. Varying packing after nir_lower_io_to_temporaries creates writemasked stores which persist after pulling the constant initialization down into the fragment shader. While we're here, rewrite handle_constant_store() to do the loop over components outside the switch, so that we don't have to duplicate the writemask checking for every bitsize. Fixes: 1235850522c ("nir: Add a large constants optimization pass") Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir: don't add bindless variables to num_textures and num_imagesMarek Olšák2019-09-231-0/+4
| | | | | | It confuses radeonsi. Reviewed-by: Connor Abbott <[email protected]>
* nir/repair_ssa: Replace the unreachable check with the phi builderJason Ekstrand2019-09-231-35/+44
| | | | | | | | | | | | | | | In a3268599f3c9, I attempted to fix nir_repair_ssa for unreachable blocks. However, that commit missed the possibility that the use is in a block which, itself, is unreachable. In this case, we can end up in an infinite loop trying to replace a def with itself. Even though a no-op replacement is a fine operation, it keeps extending the end of the uses list as we're walking it. Instead of explicitly checking for the group of conditions, just check if the phi builder gives us a different def. That's guaranteed to be 100% reliable and, while it lacks symmetry with the is_valid checks, should be more reliable. Fixes: a3268599 "nir/repair_ssa: Repair dominance for unreachable..." Reviewed-by: Ian Romanick <[email protected]>
* nir/algebraic: Additional D3D Boolean optimizationIan Romanick2019-09-191-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I observed this pattern in several shaders in Hand of Fate 2 while investigating bugzilla #111490. This also led to the related bugzilla #111578. The shaders from HoF2 are *not* in shader-db. Reviewed-by: Kenneth Graunke <[email protected]> Skylake and Ice Lake had similar results. (Ice Lake shown) total instructions in shared programs: 16222621 -> 16205419 (-0.11%) instructions in affected programs: 798418 -> 781216 (-2.15%) helped: 548 HURT: 0 helped stats (abs) min: 2 max: 158 x̄: 31.39 x̃: 35 helped stats (rel) min: 0.45% max: 28.64% x̄: 2.83% x̃: 2.09% 95% mean confidence interval for instructions value: -33.22 -29.56 95% mean confidence interval for instructions %-change: -3.11% -2.56% Instructions are helped. total cycles in shared programs: 364676209 -> 363345763 (-0.36%) cycles in affected programs: 112810504 -> 111480058 (-1.18%) helped: 546 HURT: 7 helped stats (abs) min: 2 max: 118913 x̄: 2439.77 x̃: 2340 helped stats (rel) min: 0.08% max: 37.56% x̄: 1.46% x̃: 1.08% HURT stats (abs) min: 2 max: 770 x̄: 238.00 x̃: 43 HURT stats (rel) min: 0.02% max: 11.24% x̄: 3.71% x̃: 0.35% 95% mean confidence interval for cycles value: -2884.33 -1927.41 95% mean confidence interval for cycles %-change: -1.59% -1.21% Cycles are helped. total spills in shared programs: 8870 -> 8514 (-4.01%) spills in affected programs: 1230 -> 874 (-28.94%) helped: 161 HURT: 0 total fills in shared programs: 21901 -> 21348 (-2.52%) fills in affected programs: 2120 -> 1567 (-26.08%) helped: 155 HURT: 5 Broadwell and Haswell had similar results. (Broadwell shown) total instructions in shared programs: 14994910 -> 14975495 (-0.13%) instructions in affected programs: 839033 -> 819618 (-2.31%) helped: 548 HURT: 0 helped stats (abs) min: 2 max: 299 x̄: 35.43 x̃: 49 helped stats (rel) min: 0.39% max: 19.89% x̄: 2.91% x̃: 2.22% 95% mean confidence interval for instructions value: -37.46 -33.40 95% mean confidence interval for instructions %-change: -3.12% -2.70% Instructions are helped. total cycles in shared programs: 386032453 -> 384450722 (-0.41%) cycles in affected programs: 117807357 -> 116225626 (-1.34%) helped: 547 HURT: 6 helped stats (abs) min: 2 max: 22096 x̄: 2892.01 x̃: 3926 helped stats (rel) min: 0.17% max: 10.34% x̄: 1.56% x̃: 1.31% HURT stats (abs) min: 4 max: 60 x̄: 32.83 x̃: 29 HURT stats (rel) min: 0.38% max: 12.79% x̄: 5.86% x̃: 4.65% 95% mean confidence interval for cycles value: -3060.28 -2660.27 95% mean confidence interval for cycles %-change: -1.59% -1.37% Cycles are helped. total spills in shared programs: 23372 -> 21869 (-6.43%) spills in affected programs: 11730 -> 10227 (-12.81%) helped: 352 HURT: 0 total fills in shared programs: 34747 -> 35351 (1.74%) fills in affected programs: 11013 -> 11617 (5.48%) helped: 3 HURT: 347 Ivy Bridge and Sandybridge had similar results. (Ivy Bridge shown) total instructions in shared programs: 11956420 -> 11956126 (<.01%) instructions in affected programs: 14898 -> 14604 (-1.97%) helped: 98 HURT: 0 helped stats (abs) min: 3 max: 3 x̄: 3.00 x̃: 3 helped stats (rel) min: 1.30% max: 3.57% x̄: 2.08% x̃: 2.00% 95% mean confidence interval for instructions value: -3.00 -3.00 95% mean confidence interval for instructions %-change: -2.18% -1.98% Instructions are helped. total cycles in shared programs: 178791217 -> 178790792 (<.01%) cycles in affected programs: 149763 -> 149338 (-0.28%) helped: 91 HURT: 7 helped stats (abs) min: 3 max: 107 x̄: 20.63 x̃: 16 helped stats (rel) min: 0.13% max: 6.91% x̄: 1.40% x̃: 1.18% HURT stats (abs) min: 3 max: 322 x̄: 207.43 x̃: 322 HURT stats (rel) min: 0.14% max: 19.85% x̄: 12.73% x̃: 17.41% 95% mean confidence interval for cycles value: -18.94 10.27 95% mean confidence interval for cycles %-change: -1.28% 0.49% Inconclusive result (value mean confidence interval includes 0).
* nir/algebraic: Do not apply late DPH optimization in vertex processing stagesIan Romanick2019-09-191-3/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some shaders do not use 'invariant' in vertex and (possibly) geometry shader stages on some outputs that are intended to be invariant. For various reasons, this optimization may not be fully applied in all shaders used for different rendering passes of the same geometry. This can result in Z-fighting artifacts (at best). For now, disable this optimization in these stages. In tessellation stages applications seem to use 'precise' when necessary, so allow the optimization in those stages. Reviewed-by: Kenneth Graunke <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111490 Fixes: 09705747d72 ("nir/algebraic: Reassociate fadd into fmul in DPH-like pattern") All Gen8+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 16194726 -> 16344745 (0.93%) instructions in affected programs: 2855172 -> 3005191 (5.25%) helped: 6 HURT: 20279 helped stats (abs) min: 1 max: 3 x̄: 1.33 x̃: 1 helped stats (rel) min: 0.44% max: 1.00% x̄: 0.54% x̃: 0.44% HURT stats (abs) min: 1 max: 32 x̄: 7.40 x̃: 7 HURT stats (rel) min: 0.14% max: 42.86% x̄: 8.58% x̃: 6.56% 95% mean confidence interval for instructions value: 7.34 7.45 95% mean confidence interval for instructions %-change: 8.48% 8.67% Instructions are HURT. total cycles in shared programs: 364471296 -> 365014683 (0.15%) cycles in affected programs: 32421530 -> 32964917 (1.68%) helped: 2925 HURT: 16144 helped stats (abs) min: 1 max: 403 x̄: 18.39 x̃: 5 helped stats (rel) min: <.01% max: 22.61% x̄: 1.97% x̃: 1.15% HURT stats (abs) min: 1 max: 18471 x̄: 36.99 x̃: 15 HURT stats (rel) min: 0.02% max: 52.58% x̄: 5.60% x̃: 3.87% 95% mean confidence interval for cycles value: 21.58 35.41 95% mean confidence interval for cycles %-change: 4.36% 4.52% Cycles are HURT.
* Move blob from compiler/ to util/Jason Ekstrand2019-09-191-1/+1
| | | | | | | | There's nothing whatsoever compiler-specific about it other than that's currently where it's used. Reviewed-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* nir/algebraic: refactor inexact opcode restrictionsSamuel Iglesias Gonsálvez2019-09-191-3/+5
| | | | | | | | | | Refactor the code to avoid calling a lot of time to auxiliary functions when it is not really needed. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* nir/opt_if: Fix undef handling in opt_split_alu_of_phi()Connor Abbott2019-09-181-55/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The pass assumed that "Most ALU ops produce an undefined result if any source is undef" which is completely untrue. Due to how we lower if statements to selects and then optimize on those selects later, we simply cannot make that assumption. In particular this pass tried to replace an ior of undef and true, which had been generated by optimizing a select which itself came from flattening an if statement, to undef causing a miscompilation for a CTS test with radeonsi NIR. We fix this by always doing what the non-undef path did, i.e. duplicate the instruction twice. If there are cases where the instruction before the loop can be folded away due to having an undef source, we should add these to opt_undef instead. The comment above the pass says that if the phi source from before the loop is undef, and we can fold the instruction before the loop to undef, then we can ignore sources of the original instruction that don't dominate the block before the loop because we don't need them to create the instruction before the loop. This is incorrect, because the instruction at the bottom of the loop would get those sources from the wrong loop iteration. The code never actually did what the comment said, so we only have to update the comment to match what the pass actually does. We also update the example to more closely match what most actual loops look like after vtn and peephole_select. There are no shader-db changes with i965, radeonsi NIR, or radv. With anv and my vkpipeline-db there's only one change: total instructions in shared programs: 14125290 -> 14125300 (<.01%) instructions in affected programs: 2598 -> 2608 (0.38%) helped: 0 HURT: 1 total cycles in shared programs: 2051473437 -> 2051473397 (<.01%) cycles in affected programs: 36697 -> 36657 (-0.11%) helped: 1 HURT: 0 Fixes KHR-GL45.shader_subroutine.control_flow_and_returned_subroutine_values_used_as_subroutine_input with radeonsi NIR.
* nir/opcodes: Clear variable names confusionAndres Gomez2019-09-181-10/+15
| | | | | | | | | | | Having Python and C variables sharing name in the same block of code makes its understanding a bit confusing. Make it explicit that the Python bit_size variable refers to the destination bit size. Suggested-by: Caio Marcelo de Oliveira Filho <[email protected]> Signed-off-by: Andres Gomez <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir: fix fmin/fmax support for doublesSamuel Iglesias Gonsálvez2019-09-171-2/+2
| | | | | | | | Until now, it was using the floating point version of fmin/fmax, instead of the double version. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Connor Abbott <[email protected]>
* nir: fix denorm flush-to-zero in sqrt's lowering at nir_lower_double_opsSamuel Iglesias Gonsálvez2019-09-171-2/+15
| | | | | | | | | | | | | v2: - Replace hard coded value with DBL_MIN (Connor). v3: - Have into account the FLOAT_CONTROLS_DENORM_PRESERVE_FP64 flag (Caio). Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Signed-off-by: Andres Gomez <[email protected]> Reviewed-by: Connor Abbott <[email protected]> [v2]
* nir: fix denorms in unpack_half_1x16()Samuel Iglesias Gonsálvez2019-09-173-3/+33
| | | | | | | | | | | | | | | | | | | | | | | | | According to VK_KHR_shader_float_controls: "Denormalized values obtained via unpacking an integer into a vector of values with smaller bit width and interpreting those values as floating-point numbers must: be flushed to zero, unless the entry point is declared with the code:DenormPreserve execution mode." v2: - Add nir_op_unpack_half_2x16_flush_to_zero opcode (Connor). v3: - Adapt to use the new NIR lowering framework (Andres). v4: - Updated to renamed shader info member and enum values (Andres). v5: - Simplify flags logic operations (Caio). Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Signed-off-by: Andres Gomez <[email protected]> Reviewed-by: Connor Abbott <[email protected]> [v2]
* nir/algebraic: disable inexact optimizations depending on float controls ↵Samuel Iglesias Gonsálvez2019-09-171-0/+5
| | | | | | | | | | | | | | | | | | | | | | | execution mode If FLOAT_CONTROLS_SIGNED_ZERO_INF_NAN_PRESERVE or FLOAT_CONTROLS_DENORM_FLUSH_TO_ZERO are enabled, do not apply the inexact optimizations so the VK_KHR_shader_float_controls execution mode is respected. v2: - Do not apply inexact optimizations if SHADER_DENORM_FLUSH_TO_ZERO is enabled (Andres). v3: - Updated to renamed shader info member (Andres). v4: - Directly access execution mode instead of dragging it by parameter (Caio). Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Signed-off-by: Andres Gomez <[email protected]> Reviewed-by: Connor Abbott <[email protected]> [v1]
* nir/algebraic: mark float optimizations returning one parameter as inexactAndres Gomez2019-09-171-8/+8
| | | | | | | | | | | | | | | With the arrival of VK_KHR_shader_float_controls algebraic optimizations for float types of the form (('fop', a, b), a) become inexact depending on the execution mode. For example, if we have activated SHADER_DENORM_FLUSH_TO_ZERO, in case of a denorm value for the "a" parameter, we cannot return it still as a denorm, it needs to be flushed to zero. Therefore, we mark now all those operations as inexact. Suggested-by: Samuel Iglesias Gonsálvez <[email protected]> Signed-off-by: Andres Gomez <[email protected]> Reviewed-by: Connor Abbott <[email protected]>
* nir/constant_expressions: mind rounding mode converting from float to ↵Samuel Iglesias Gonsálvez2019-09-171-2/+10
| | | | | | | | | | | | float16 destinations v2: - Move the op-code specific knowledge to nir_opcodes.py even if it means a rount trip conversion (Connor). Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Signed-off-by: Andres Gomez <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir/opcodes: make sure f2f16_rtz and f2f16_rtne behavior is not overriden by ↵Samuel Iglesias Gonsálvez2019-09-171-1/+20
| | | | | | | | | the float controls execution mode Suggested-by: Connor Abbott <[email protected]> Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Signed-off-by: Andres Gomez <[email protected]> Reviewed-by: Connor Abbott <[email protected]>
* nir: mind rounding mode on fadd, fsub, fmul and fma opcodesSamuel Iglesias Gonsálvez2019-09-172-4/+46
| | | | | | | | | | | | | | | | | | | | | | According to Vulkan spec, the new execution modes affect only correctly rounded SPIR-V instructions, which includes fadd, fsub and fmul. v2: - Fix fmul, fsub and fadd round-to-zero definitions, they should use auxiliary functions to calculate the proper value because Mesa uses round-to-nearest-even rounding mode by default (Connor). v3: - Do an actual fused multiply-add at ffma (Connor). v4: - Simplify fadd and fmul for bit sizes < 64 (Connor). - Do not use double ffma for 32 bits float (Connor). Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Signed-off-by: Andres Gomez <[email protected]> Reviewed-by: Connor Abbott <[email protected]> [v3]
* nir: add support for round to zero rounding mode to nir_op_f2f32Samuel Iglesias Gonsálvez2019-09-172-0/+11
| | | | | | | | | f2f16's rounding modes are already handled and f2f64 don't need it as there is not a floating point type with higher bit size than 64 for now. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Connor Abbott <[email protected]>
* nir: add support for flushing to zero denorm constantsSamuel Iglesias Gonsálvez2019-09-174-40/+106
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | v2: - Refactor conditions and shared function (Connor). - Move code to nir_eval_const_opcode() (Connor). - Don't flush to zero on fquantize2f16 From Vulkan spec, VK_KHR_shader_float_controls section: "3) Do denorm and rounding mode controls apply to OpSpecConstantOp? RESOLVED: Yes, except when the opcode is OpQuantizeToF16." v3: - Fix bit size (Connor). - Fix execution mode on nir_loop_analize (Connor). v4: - Adapt after API changes to nir_eval_const_opcode (Andres). v5: - Simplify constant_denorm_flush_to_zero (Caio). v6: - Adapt after API changes and to use the new constant constructors (Andres). - Replace MAYBE_UNUSED with UNUSED as the first is going away (Andres). v7: - Adapt to newly added calls (Andres). - Simplified the auxiliary to flush denorms to zero (Caio). - Updated to renamed supported capabilities member (Andres). Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Signed-off-by: Andres Gomez <[email protected]> Reviewed-by: Connor Abbott <[email protected]> [v4] Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir: add auxiliary functions to detect if a mode is enabledSamuel Iglesias Gonsálvez2019-09-171-0/+81
| | | | | | | | | | | | | | | | v2: - Added more functions. v3: - Simplify most of the functions (Caio). v4: - Updated to renamed enum values (Andres). Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Signed-off-by: Andres Gomez <[email protected]> Reviewed-by: Connor Abbott <[email protected]> [v2] Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> [v3]
* nir/large_constants: pass after lowering copy_derefSergii Romantsov2019-09-161-25/+2
| | | | | | | | | | | v2: by J.Ekstrand suggestion moved lowering of large constants after lowering of copy_deref is done. CC: Jason Ekstrand <[email protected]> CC: Caio Marcelo de Oliveira Filho <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111450 Signed-off-by: Sergii Romantsov <[email protected]>
* nir/large_constants: more careful data copyingSergii Romantsov2019-09-161-1/+1
| | | | | | | | | | | | | | | | | A filed of nir_variable.location may be equel to -1. That may cause copying to invalid address of list-node, making some internal fields corrupted. Patch fixes segfault during freeing context due to corrupted address of ralloc_header.destructor. v2: copy data if var is constant (Connor Abbott) CC: Caio Marcelo de Oliveira Filho <[email protected]> Fixes: b6d475356846 (nir/large_constants: De-duplicate constants) Signed-off-by: Sergii Romantsov <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111676 Reviewed-by: Connor Abbott <[email protected]>
* nir/lower_point_size: assume scalar PSIZIago Toral Quiroga2019-09-121-14/+3
| | | | | Reviewed-by: Erik Faye-Lund <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* nir/dead_cf: Repair SSA if the pass makes progressJason Ekstrand2019-09-061-2/+13
| | | | | | | | | | | | | | | | | | | | The dead_cf pass calls into the CF manipulation helpers which attempt to keep NIR's SSA form sane. However, when the only break is removed from a loop, dominance gets messed up anyway because the CF SSA clean-up code only looks at phis and doesn't consider the case of code becoming unreachable. One solution to this would be to put the loop into LCSSA form before we modify any of its contents. Another (and the approach taken by this pass) is to just run the repair_ssa pass afterwards because the CF manipulation helpers are smart enough to keep all the use/def stuff sane; they just don't always preserve dominance properties. While we're here, we clean up some bogus indentation. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111405 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111069 Cc: [email protected] Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir/repair_ssa: Insert deref casts when neededJason Ekstrand2019-09-061-2/+29
| | | | | Cc: [email protected] Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir/repair_ssa: Repair dominance for unreachable blocksJason Ekstrand2019-09-061-4/+8
| | | | | | | | | | | | NIR currently assumes that unreachable blocks are trivially dominated by everything. However, when considering well-formed SSA, there is no path from any block to an unreachable block. Therefore, we can break any use-def chains where the use is in an unreachable block. This removes any dependencies on code created by uses in unreachable blocks and lets DCE do a better job of cleaning it up. Cc: [email protected] Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir: Add a block_is_unreachable helperJason Ekstrand2019-09-062-0/+15
| | | | | Cc: [email protected] Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir: Don't infinitely recurse in lower_ssa_defs_to_regs_blockJason Ekstrand2019-09-061-5/+15
| | | | | Cc: [email protected] Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir: Handle complex derefs in nir_split_array_varsJason Ekstrand2019-09-061-2/+5
| | | | | | | | We already bail and don't split the vars but we were passing a NULL to _mesa_hash_table_search which is not allowed. Fixes: f1cb3348f1 "nir/split_vars: Properly bail in the presence of ..." Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>