summaryrefslogtreecommitdiffstats
path: root/src/compiler/nir
Commit message (Collapse)AuthorAgeFilesLines
* nir/format_convert: Add [us]norm conversion helpersJason Ekstrand2018-08-291-0/+56
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* nir/format_convert: Rename nir_format_bitcast_uint_vecJason Ekstrand2018-08-291-2/+3
| | | | | | | | We have a name for that, it's called a uvec. This just makes the function name a bit shorter. While we're here, we also add an assert for one of the assumptions this function makes. Reviewed-by: Kenneth Graunke <[email protected]>
* nir/format_convert: Add vec mask and sign-extend helpersJason Ekstrand2018-08-291-8/+27
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* nir/format_convert: Add support for unpacking signed integersJason Ekstrand2018-08-291-8/+29
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* nir/opcodes: Make unpack_half_2x16_split_* variable-widthJason Ekstrand2018-08-291-4/+4
| | | | | | | There is nothing inherent about these opcodes that requires them to only take scalars. It's very convenient if we let them take vectors as well. Reviewed-by: Kenneth Graunke <[email protected]>
* nir/algebraic: Add some max/min optimizationsJason Ekstrand2018-08-291-0/+6
| | | | | | | | | | | | | | | Found by inspection. This doesn't help much now but we'll see this pattern with images if you load UNORM and then store UNORM. Shader-db results on Kaby Lake: total instructions in shared programs: 15166916 -> 15166910 (<.01%) instructions in affected programs: 761 -> 755 (-0.79%) helped: 6 HURT: 0 Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* nir/algebraic: Add more extract_[iu](8|16) optimizationsJason Ekstrand2018-08-291-0/+10
| | | | | | | | | | | | | | | | This adds the "(a << N) >> M" family of mask or sign-extensions. Not a huge win right now but this pattern will soon be generated by NIR format lowering code. Shader-db results on Kaby Lake: total instructions in shared programs: 15166918 -> 15166916 (<.01%) instructions in affected programs: 36 -> 34 (-5.56%) helped: 2 HURT: 0 Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* nir/algebraic: Be more careful converting ushr to extract_u8/16Jason Ekstrand2018-08-291-2/+2
| | | | | | | | | | If it's not the right bit-size, it may not actually be the correct extraction. For now, we'll only worry about 32-bit versions. Fixes: 905ff8619824 "nir: Recognize open-coded extract_u16" Fixes: 76289fbfa84a "nir: Recognize open-coded extract_u8" Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* nir: add loop unroll support for wrapper loopsTimothy Arceri2018-08-291-0/+77
| | | | | | | | | | | | | | | | | | | | This adds support for unrolling the classic do { // ... } while (false) that is used to wrap multi-line macros. GLSL IR also wraps switch statements in a loop like this. shader-db results IVB: total loops in shared programs: 2515 -> 2512 (-0.12%) loops in affected programs: 33 -> 30 (-9.09%) helped: 3 HURT: 0 Reviewed-by: Jason Ekstrand <[email protected]>
* nir/opt_loop_unroll: Remove unneeded phis if we make progressTimothy Arceri2018-08-291-1/+9
| | | | | | | | | | | Now that SSA values can be derefs and they have special rules, we have to be a bit more careful about our LCSSA phis. In particular, we need to clean up in case LCSSA ended up creating a phi node for a deref. This avoids validation issues with some CTS tests with the following patch, but its possible this we could also see the same problem with the existing unrolling passes. Reviewed-by: Jason Ekstrand <[email protected]>
* nir: add complex_loop bool to loop infoTimothy Arceri2018-08-292-2/+12
| | | | | | | | | | | In order to be sure loop_terminator_list is an accurate representation of all the jumps in the loop we need to be sure we didn't encounter any other complex behaviour such as continues, nested breaks, etc during analysis. This will be used in the following patch. Reviewed-by: Jason Ekstrand <[email protected]>
* nir: always attempt to find loop terminatorsTimothy Arceri2018-08-291-7/+7
| | | | | | | This will help later patches with unrolling loops that end with a break i.e. loops the always exit on their first interation. Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Remove outdated commentCaio Marcelo de Oliveira Filho2018-08-281-3/+0
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* mesa: Add GL/GLSL plumbing for INTEL_fragment_shader_orderingKevin Rogovin2018-08-281-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | This extension provides new GLSL built-in function beginFragmentShaderOrderingIntel() that guarantees (taking wording of GL_INTEL_fragment_shader_ordering extension) that any memory transactions issued by shader invocations from previous primitives mapped to same xy window coordinates (and same sample when per-sample shading is active), complete and are visible to the shader invocation that called beginFragmentShaderOrderingINTEL(). One advantage of INTEL_fragment_shader_ordering over ARB_fragment_shader_interlock is that it provides a function that operates as a memory barrie (instead of a defining a critcial section) that can be called under arbitary control flow from any function (in contrast the begin/end of ARB_fragment_shader_interlock may only be called once, from main(), under no control flow. Signed-off-by: Kevin Rogovin <[email protected]> Reviewed-by: Plamena Manolova <[email protected]>
* nir: Pull block_ends_in_jump into nir.hJason Ekstrand2018-08-273-23/+13
| | | | | | | We had two different implementations in different files. May as well have one and put it in nir.h. Reviewed-by: Timothy Arceri <[email protected]>
* nir: Add an array copy optimizationJason Ekstrand2018-08-233-0/+414
| | | | | | | | | | | | This peephole optimization looks for a series of load/store_deref or copy_deref instructions that copy an array from one variable to another and turns it into a copy_deref that copies the entire array. The pattern it looks for is extremely specific but it's good enough to pick up on the input array copies in DXVK and should also be able to pick up the sequence generated by spirv_to_nir for a OpLoad of a large composite followed by OpStore. It can always be improved later if needed. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir: Add a array-of-vector variable shrinking passJason Ekstrand2018-08-232-0/+718
| | | | | | | This pass looks for variables with vector or array-of-vector types and narrows the type to only the components used. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir: Add an array splitting passJason Ekstrand2018-08-232-0/+584
| | | | | | | | | | | | | | | | | | | | | | | This pass looks for array variables where at least one level of the array is never indirected and splits it into multiple smaller variables. This pass doesn't really do much now because nir_lower_vars_to_ssa can already see through arrays of arrays and can detect indirects on just one level or even see that arr[i][0][5] does not alias arr[i][1][j]. This pass exists to help other passes more easily see through arrays of arrays. If a back-end does implement arrays using scratch or indirects on registers, having more smaller arrays is likely to have better memory efficiency. v2 (Jason Ekstrand): - Better comments and naming (some from Caio) - Rework to use one hash map instead of two v2.1 (Jason Ekstrand): - Fix a couple of bugs that were added in the rework including one which basically prevented it from running Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir: Add a structure splitting passJason Ekstrand2018-08-233-0/+277
| | | | | | | | | | | This pass doesn't really do much now because nir_lower_vars_to_ssa can already see through structures and considers them to be "split". This pass exists to help other passes more easily see through structure variables. If a back-end does implement arrays using scratch or indirects on registers, having more smaller arrays is likely to have better memory efficiency. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir: Add floating point atomic min, max, and compare-swap instrinsicsIan Romanick2018-08-223-2/+24
| | | | | Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir: Add floating point atomic add instrinsicsIan Romanick2018-08-223-0/+8
| | | | | Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir: Give end_block its own indexCaio Marcelo de Oliveira Filho2018-08-221-1/+4
| | | | | | | | | | | Since there's no particular reason for the index to be 0, choose an index that is not used by other block. This is convenient when we store "per-block" data in an array AND look for the successors data (e.g. any kind of backwards data-flow analysis). v2: Add a note about end_block's index. (Jason) Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Skip common instructions when comparing deref pathsCaio Marcelo de Oliveira Filho2018-08-221-0/+3
| | | | | | | | | | | | | | | | | | | Deref paths may share the same deref instructions in their chains, e.g. ssa_100 = deref_var A ssa_101 = deref_struct "array_field" of ssa_100 ssa_102 = deref_array "[1]" of ssa_101 ssa_103 = deref_struct "field_a" of ssa_102 ssa_104 = deref_struct "field_a" of ssa_103 when comparing the two last deref instructions, their paths will share a common sequence ssa_100, ssa_101, ssa_102. This patch skips to next iteration if the deref instructions are the same. Path[0] (the var) is still handled specially, so in the case above, only ssa_101 and ssa_102 will be skipped. Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Export deref comparison functionsCaio Marcelo de Oliveira Filho2018-08-223-132/+132
| | | | | Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir/vars_to_ssa: Don't build deref nodes for non-local variablesJason Ekstrand2018-08-221-4/+14
| | | | Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir: mark *prev_block as MAYBE_UNUSED in opt_peel_loop_initial_ifKai Wasserbäch2018-08-181-1/+1
| | | | | | | | | | | | | Only used, when asserts are enabled. Fixes an unused-variable warning with gcc-8: ../../../src/compiler/nir/nir_opt_if.c: In function 'opt_peel_loop_initial_if': ../../../src/compiler/nir/nir_opt_if.c:109:15: warning: unused variable 'prev_block' [-Wunused-variable] nir_block *prev_block = ^~~~~~~~~~ Signed-off-by: Kai Wasserbäch <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* nir: allow more nested loops to be unrolledTimothy Arceri2018-08-181-14/+17
| | | | | | | | | | | | | | | | | | | | The innermost check was added to stop us from unrolling multiple loops in a single pass, and to stop outer loops from unrolling. When we successfully unroll a loop we need to run the analysis pass again before deciding if we want to go ahead an unroll a second loop. However the logic was flawed because it never tried to unroll any nested loops other than the first innermost loop it found. If this innermost loop is not unrolled we end up skipping all other nested loops. This unrolls a loop in a Deus Ex: MD shader on ultra settings and also unrolls a loop in a shader from the game Prey when running on DXVK. Reviewed-by: Jason Ekstrand <[email protected]>
* nir/glsl: make nir_remap_attributes publicAlejandro Piñeiro2018-08-132-0/+27
| | | | | | As we plan to reuse it for ARB_gl_spirv implementation. Reviewed-by: Timothy Arceri <[email protected]>
* nir: add how_declared to nir_variable.dataAlejandro Piñeiro2018-08-133-1/+26
| | | | | | | | | Equivalent to the already existing how_declared at GLSL IR. The only difference is that we are not adding all the declaration_type available on GLSL, only the one that we will use on the short term. We would add more mode if needed on the future. Reviewed-by: Timothy Arceri <[email protected]>
* meson: Build with Python 3Mathieu Bridon2018-08-101-7/+7
| | | | | | | | | | | | Now that all the build scripts are compatible with both Python 2 and 3, we can flip the switch and tell Meson to use the latter. Since Meson already depends on Python 3 anyway, this means we don't need two different Python stacks to build Mesa. Signed-off-by: Mathieu Bridon <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Reviewed-by: Dylan Baker <[email protected]>
* python: Better check for integer typesMathieu Bridon2018-08-091-3/+5
| | | | | | | | | | | | Python 3 lost the long type: now everything is an int, with the right size. This commit makes the script compatible with Python 2 (where we check for both int and long) and Python 3 (where we only check for int). Signed-off-by: Mathieu Bridon <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Reviewed-by: Dylan Baker <[email protected]>
* python: Do not mix bytes and unicode stringsMathieu Bridon2018-08-091-1/+10
| | | | | | | | | | | | Mixing the two is a long-standing recipe for errors in Python 2, so much so that Python 3 now completely separates them. This commit stops treating both as if they were the same, and in the process makes the script compatible with both Python 2 and 3. Signed-off-by: Mathieu Bridon <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Reviewed-by: Dylan Baker <[email protected]>
* python: Use the right function for the jobMathieu Bridon2018-08-091-1/+1
| | | | | | | | | | The code was just reimplementing itertools.combinations_with_replacement in a less efficient way. This does change the order of the results slightly, but it should be ok. Signed-off-by: Mathieu Bridon <[email protected]> Reviewed-by: Dylan Baker <[email protected]>
* python: Specify the template output encodingMathieu Bridon2018-08-072-2/+2
| | | | | | | | | | | | | | | We're trying to write a unicode string (i.e decoded) to a file opened in binary (i.e encoded) mode. In Python 2 this works, because of the automatic conversion between byte and unicode strings. In Python 3 this fails though, as no automatic conversion is attempted. This change makes the scripts compatible with both versions of Python. Signed-off-by: Mathieu Bridon <[email protected]> Reviewed-by: Dylan Baker <[email protected]>
* nir: Transform expressions of b2f(a) and b2f(b) to a == bIan Romanick2018-08-041-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All Gen7+ platforms had similar results. (Skylake shown) total instructions in shared programs: 14276886 -> 14276838 (<.01%) instructions in affected programs: 312 -> 264 (-15.38%) helped: 2 HURT: 0 total cycles in shared programs: 532578395 -> 532570985 (<.01%) cycles in affected programs: 682562 -> 675152 (-1.09%) helped: 374 HURT: 4 helped stats (abs) min: 2 max: 200 x̄: 20.39 x̃: 18 helped stats (rel) min: 0.07% max: 11.64% x̄: 1.25% x̃: 1.28% HURT stats (abs) min: 2 max: 114 x̄: 53.50 x̃: 49 HURT stats (rel) min: 0.06% max: 11.70% x̄: 5.02% x̃: 4.15% 95% mean confidence interval for cycles value: -21.30 -17.91 95% mean confidence interval for cycles %-change: -1.30% -1.06% Cycles are helped. Sandy Bridge total instructions in shared programs: 10488123 -> 10488075 (<.01%) instructions in affected programs: 336 -> 288 (-14.29%) helped: 2 HURT: 0 total cycles in shared programs: 150260379 -> 150260439 (<.01%) cycles in affected programs: 4726 -> 4786 (1.27%) helped: 0 HURT: 2 No changes on Iron Lake or GM45. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Thomas Helland <[email protected]>
* nir: Transform expressions of b2f(a) and b2f(b) to a ^^ bIan Romanick2018-08-041-0/+3
| | | | | | | | | | | | | | | | All Gen platforms had pretty similar results. (Skylake shown) total instructions in shared programs: 14276892 -> 14276886 (<.01%) instructions in affected programs: 484 -> 478 (-1.24%) helped: 2 HURT: 0 total cycles in shared programs: 532578397 -> 532578395 (<.01%) cycles in affected programs: 3522 -> 3520 (-0.06%) helped: 1 HURT: 0 Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Thomas Helland <[email protected]>
* nir: Transform expressions of b2f(a) and b2f(b) to !(a && b)Ian Romanick2018-08-041-0/+3
| | | | | | | | | | | | | | | | | All Gen platforms had pretty similar results. (Skylake shown) total cycles in shared programs: 532578400 -> 532578397 (<.01%) cycles in affected programs: 2784 -> 2781 (-0.11%) helped: 1 HURT: 1 helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4 helped stats (rel) min: 0.26% max: 0.26% x̄: 0.26% x̃: 0.26% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.08% max: 0.08% x̄: 0.08% x̃: 0.08% v2: s/fmax/fmin/. Noticed by Thomas Helland. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Thomas Helland <[email protected]>
* nir: Transform expressions of b2f(a) and b2f(b) to a && bIan Romanick2018-08-041-0/+3
| | | | | | | | | No changes on any Gen platform. v2: s/fmax/fmin/. Noticed by Thomas Helland. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Thomas Helland <[email protected]>
* nir: Transform expressions of b2f(a) and b2f(b) to !(a || b)Ian Romanick2018-08-041-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All Gen6+ platforms had similar results. (Skylake shown) total instructions in shared programs: 14276961 -> 14276892 (<.01%) instructions in affected programs: 3215 -> 3146 (-2.15%) helped: 28 HURT: 0 helped stats (abs) min: 1 max: 6 x̄: 2.46 x̃: 2 helped stats (rel) min: 0.47% max: 9.52% x̄: 4.34% x̃: 1.92% 95% mean confidence interval for instructions value: -2.87 -2.06 95% mean confidence interval for instructions %-change: -5.73% -2.95% Instructions are helped. total cycles in shared programs: 532577068 -> 532578400 (<.01%) cycles in affected programs: 121864 -> 123196 (1.09%) helped: 35 HURT: 30 helped stats (abs) min: 2 max: 268 x̄: 42.34 x̃: 22 helped stats (rel) min: 0.12% max: 12.14% x̄: 3.22% x̃: 1.86% HURT stats (abs) min: 2 max: 246 x̄: 93.80 x̃: 36 HURT stats (rel) min: 0.09% max: 13.63% x̄: 4.47% x̃: 2.58% 95% mean confidence interval for cycles value: -5.02 46.01 95% mean confidence interval for cycles %-change: -0.99% 1.65% Inconclusive result (value mean confidence interval includes 0). Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 7781299 -> 7781342 (<.01%) instructions in affected programs: 22300 -> 22343 (0.19%) helped: 13 HURT: 40 helped stats (abs) min: 2 max: 3 x̄: 2.85 x̃: 3 helped stats (rel) min: 1.15% max: 7.69% x̄: 3.72% x̃: 3.33% HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 0.26% max: 1.30% x̄: 0.47% x̃: 0.43% 95% mean confidence interval for instructions value: 0.23 1.39 95% mean confidence interval for instructions %-change: -1.18% 0.07% Inconclusive result (%-change mean confidence interval includes 0). total cycles in shared programs: 177878928 -> 177879332 (<.01%) cycles in affected programs: 383298 -> 383702 (0.11%) helped: 7 HURT: 43 helped stats (abs) min: 2 max: 18 x̄: 10.00 x̃: 10 helped stats (rel) min: 0.17% max: 4.81% x̄: 2.62% x̃: 3.40% HURT stats (abs) min: 2 max: 38 x̄: 11.02 x̃: 12 HURT stats (rel) min: 0.08% max: 1.54% x̄: 0.25% x̃: 0.09% 95% mean confidence interval for cycles value: 5.21 10.95 95% mean confidence interval for cycles %-change: -0.51% 0.21% Inconclusive result (%-change mean confidence interval includes 0). v2: s/fmin/fmax/. Noticed by Thomas Helland. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Thomas Helland <[email protected]>
* nir: Transform -fabs(a) >= 0 to a == 0Ian Romanick2018-08-041-0/+9
| | | | | | | | | | | | | | | | | | | | | | All Gen platforms had pretty similar results. (Skylake shown) total instructions in shared programs: 14276964 -> 14276961 (<.01%) instructions in affected programs: 411 -> 408 (-0.73%) helped: 3 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.47% max: 1.96% x̄: 1.04% x̃: 0.68% total cycles in shared programs: 532577062 -> 532577068 (<.01%) cycles in affected programs: 1093 -> 1099 (0.55%) helped: 1 HURT: 1 helped stats (abs) min: 16 max: 16 x̄: 16.00 x̃: 16 helped stats (rel) min: 7.77% max: 7.77% x̄: 7.77% x̃: 7.77% HURT stats (abs) min: 22 max: 22 x̄: 22.00 x̃: 22 HURT stats (rel) min: 2.48% max: 2.48% x̄: 2.48% x̃: 2.48% Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Thomas Helland <[email protected]>
* nir: Transform expressions of b2f(a) and b2f(b) to a || bIan Romanick2018-08-041-0/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All Gen6+ platforms had pretty similar results. (Skylake shown) total instructions in shared programs: 14277184 -> 14276964 (<.01%) instructions in affected programs: 10082 -> 9862 (-2.18%) helped: 37 HURT: 1 helped stats (abs) min: 1 max: 30 x̄: 5.97 x̃: 4 helped stats (rel) min: 0.14% max: 16.00% x̄: 5.23% x̃: 2.04% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.70% max: 0.70% x̄: 0.70% x̃: 0.70% 95% mean confidence interval for instructions value: -7.87 -3.71 95% mean confidence interval for instructions %-change: -6.98% -3.16% Instructions are helped. total cycles in shared programs: 532577990 -> 532577062 (<.01%) cycles in affected programs: 170959 -> 170031 (-0.54%) helped: 33 HURT: 9 helped stats (abs) min: 2 max: 120 x̄: 30.91 x̃: 30 helped stats (rel) min: 0.02% max: 7.65% x̄: 2.66% x̃: 1.13% HURT stats (abs) min: 2 max: 24 x̄: 10.22 x̃: 8 HURT stats (rel) min: 0.09% max: 1.79% x̄: 0.61% x̃: 0.22% 95% mean confidence interval for cycles value: -31.23 -12.96 95% mean confidence interval for cycles %-change: -2.90% -1.02% Cycles are helped. Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 7781539 -> 7781301 (<.01%) instructions in affected programs: 10169 -> 9931 (-2.34%) helped: 32 HURT: 0 helped stats (abs) min: 2 max: 20 x̄: 7.44 x̃: 6 helped stats (rel) min: 0.47% max: 17.02% x̄: 4.03% x̃: 1.88% 95% mean confidence interval for instructions value: -9.53 -5.34 95% mean confidence interval for instructions %-change: -5.94% -2.12% Instructions are helped. total cycles in shared programs: 177878590 -> 177878932 (<.01%) cycles in affected programs: 78706 -> 79048 (0.43%) helped: 7 HURT: 21 helped stats (abs) min: 6 max: 34 x̄: 24.57 x̃: 28 helped stats (rel) min: 0.15% max: 8.33% x̄: 4.66% x̃: 6.37% HURT stats (abs) min: 2 max: 86 x̄: 24.48 x̃: 22 HURT stats (rel) min: 0.01% max: 4.28% x̄: 1.21% x̃: 0.70% 95% mean confidence interval for cycles value: 0.30 24.13 95% mean confidence interval for cycles %-change: -1.52% 1.01% Inconclusive result (%-change mean confidence interval includes 0). v2: s/fmin/fmax/. Noticed by Thomas Helland. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Thomas Helland <[email protected]>
* nir: Transform -fabs(a) < 0 to a != 0Ian Romanick2018-08-041-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Unlike the much older -abs(a) >= 0.0 transformation, this is not precise. The behavior changes if a is NaN. All Gen platforms had pretty similar results. (Skylake shown) total instructions in shared programs: 14277216 -> 14277184 (<.01%) instructions in affected programs: 2300 -> 2268 (-1.39%) helped: 8 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 4.00 x̃: 3 helped stats (rel) min: 0.48% max: 15.15% x̄: 4.41% x̃: 1.01% 95% mean confidence interval for instructions value: -6.45 -1.55 95% mean confidence interval for instructions %-change: -9.96% 1.13% Inconclusive result (%-change mean confidence interval includes 0). total cycles in shared programs: 532577848 -> 532577990 (<.01%) cycles in affected programs: 17486 -> 17628 (0.81%) helped: 2 HURT: 5 helped stats (abs) min: 2 max: 6 x̄: 4.00 x̃: 4 helped stats (rel) min: 0.06% max: 1.81% x̄: 0.93% x̃: 0.93% HURT stats (abs) min: 6 max: 50 x̄: 30.00 x̃: 26 HURT stats (rel) min: 0.55% max: 2.17% x̄: 1.19% x̃: 1.02% 95% mean confidence interval for cycles value: -1.06 41.63 95% mean confidence interval for cycles %-change: -0.58% 1.74% Inconclusive result (value mean confidence interval includes 0). Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Thomas Helland <[email protected]>
* nir: Rearrange bcsel with two bcsel sourcesIan Romanick2018-08-041-0/+4
| | | | | | | | | | | | | | | | All Gen platforms had pretty similar results. (Skylake shown) total instructions in shared programs: 14277220 -> 14277216 (<.01%) instructions in affected programs: 422 -> 418 (-0.95%) helped: 2 HURT: 0 total cycles in shared programs: 532577908 -> 532577848 (<.01%) cycles in affected programs: 2800 -> 2740 (-2.14%) helped: 2 HURT: 0 Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Thomas Helland <[email protected]>
* nir: Collapse more repeated bcsels on the same argumentIan Romanick2018-08-041-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | All Gen platforms had pretty similar results. (Skylake shown) total instructions in shared programs: 14277230 -> 14277220 (<.01%) instructions in affected programs: 751 -> 741 (-1.33%) helped: 4 HURT: 0 helped stats (abs) min: 2 max: 3 x̄: 2.50 x̃: 2 helped stats (rel) min: 1.23% max: 1.40% x̄: 1.32% x̃: 1.32% 95% mean confidence interval for instructions value: -3.42 -1.58 95% mean confidence interval for instructions %-change: -1.47% -1.17% Instructions are helped. total cycles in shared programs: 532577947 -> 532577908 (<.01%) cycles in affected programs: 10641 -> 10602 (-0.37%) helped: 4 HURT: 3 helped stats (abs) min: 1 max: 40 x̄: 13.75 x̃: 7 helped stats (rel) min: 0.11% max: 3.08% x̄: 1.10% x̃: 0.60% HURT stats (abs) min: 2 max: 8 x̄: 5.33 x̃: 6 HURT stats (rel) min: 0.13% max: 0.55% x̄: 0.30% x̃: 0.23% 95% mean confidence interval for cycles value: -20.69 9.55 95% mean confidence interval for cycles %-change: -1.63% 0.63% Inconclusive result (value mean confidence interval includes 0). Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Thomas Helland <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Don't compare i2f or u2i with zeroIan Romanick2018-08-041-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Broadwell and Skylake had similar results. (Skylake shown) total instructions in shared programs: 14277620 -> 14277230 (<.01%) instructions in affected programs: 36905 -> 36515 (-1.06%) helped: 101 HURT: 6 helped stats (abs) min: 1 max: 6 x̄: 4.46 x̃: 6 helped stats (rel) min: 0.32% max: 7.69% x̄: 1.80% x̃: 1.51% HURT stats (abs) min: 1 max: 28 x̄: 10.00 x̃: 1 HURT stats (rel) min: 0.33% max: 1.74% x̄: 0.68% x̃: 0.47% 95% mean confidence interval for instructions value: -4.59 -2.70 95% mean confidence interval for instructions %-change: -1.90% -1.41% Instructions are helped. total cycles in shared programs: 532580716 -> 532577947 (<.01%) cycles in affected programs: 940575 -> 937806 (-0.29%) helped: 92 HURT: 12 helped stats (abs) min: 2 max: 158 x̄: 51.04 x̃: 62 helped stats (rel) min: 0.24% max: 3.99% x̄: 2.14% x̃: 2.41% HURT stats (abs) min: 10 max: 1112 x̄: 160.58 x̃: 63 HURT stats (rel) min: 0.06% max: 21.90% x̄: 4.22% x̃: 0.20% 95% mean confidence interval for cycles value: -50.66 -2.59 95% mean confidence interval for cycles %-change: -2.09% -0.73% Cycles are helped. total spills in shared programs: 8116 -> 8124 (0.10%) spills in affected programs: 200 -> 208 (4.00%) helped: 0 HURT: 2 total fills in shared programs: 11086 -> 11094 (0.07%) fills in affected programs: 436 -> 444 (1.83%) helped: 0 HURT: 2 Ivy Bridge and Haswell had similar results. (Haswell shown) total instructions in shared programs: 12979054 -> 12978067 (<.01%) instructions in affected programs: 33633 -> 32646 (-2.93%) helped: 120 HURT: 2 helped stats (abs) min: 1 max: 13 x̄: 8.53 x̃: 13 helped stats (rel) min: 0.30% max: 16.67% x̄: 4.55% x̃: 3.17% HURT stats (abs) min: 18 max: 18 x̄: 18.00 x̃: 18 HURT stats (rel) min: 1.15% max: 2.84% x̄: 2.00% x̃: 2.00% 95% mean confidence interval for instructions value: -9.19 -6.99 95% mean confidence interval for instructions %-change: -5.27% -3.62% Instructions are helped. total cycles in shared programs: 411212880 -> 411199636 (<.01%) cycles in affected programs: 696441 -> 683197 (-1.90%) helped: 107 HURT: 5 helped stats (abs) min: 2 max: 864 x̄: 124.90 x̃: 146 helped stats (rel) min: 0.03% max: 29.20% x̄: 8.58% x̃: 5.88% HURT stats (abs) min: 2 max: 50 x̄: 24.00 x̃: 22 HURT stats (rel) min: 0.01% max: 5.35% x̄: 1.29% x̃: 0.25% 95% mean confidence interval for cycles value: -136.96 -99.54 95% mean confidence interval for cycles %-change: -9.75% -6.53% Cycles are helped. total spills in shared programs: 78623 -> 78631 (0.01%) spills in affected programs: 66 -> 74 (12.12%) helped: 0 HURT: 2 total fills in shared programs: 80104 -> 80108 (<.01%) fills in affected programs: 133 -> 137 (3.01%) helped: 0 HURT: 2 No changes on Sandy Bridge, Iron Lake, or GM45. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Thomas Helland <[email protected]>
* nir: Remove f2i(i2f(x)) conversionsIan Romanick2018-08-041-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Broadwell and Skylake had similar results. (Skylake shown) total instructions in shared programs: 14277978 -> 14277620 (<.01%) instructions in affected programs: 36957 -> 36599 (-0.97%) helped: 76 HURT: 1 helped stats (abs) min: 2 max: 90 x̄: 4.89 x̃: 4 helped stats (rel) min: 0.44% max: 5.88% x̄: 1.04% x̃: 0.87% HURT stats (abs) min: 14 max: 14 x̄: 14.00 x̃: 14 HURT stats (rel) min: 0.36% max: 0.36% x̄: 0.36% x̃: 0.36% 95% mean confidence interval for instructions value: -7.06 -2.24 95% mean confidence interval for instructions %-change: -1.28% -0.77% Instructions are helped. total cycles in shared programs: 532584581 -> 532580716 (<.01%) cycles in affected programs: 973591 -> 969726 (-0.40%) helped: 76 HURT: 1 helped stats (abs) min: 2 max: 9940 x̄: 159.80 x̃: 32 helped stats (rel) min: <.01% max: 8.70% x̄: 1.15% x̃: 1.19% HURT stats (abs) min: 8280 max: 8280 x̄: 8280.00 x̃: 8280 HURT stats (rel) min: 2.10% max: 2.10% x̄: 2.10% x̃: 2.10% 95% mean confidence interval for cycles value: -386.98 286.59 95% mean confidence interval for cycles %-change: -1.41% -0.81% Inconclusive result (value mean confidence interval includes 0). total spills in shared programs: 8127 -> 8116 (-0.14%) spills in affected programs: 108 -> 97 (-10.19%) helped: 1 HURT: 0 total fills in shared programs: 11090 -> 11086 (-0.04%) fills in affected programs: 440 -> 436 (-0.91%) helped: 1 HURT: 1 Haswell total instructions in shared programs: 12979174 -> 12979054 (<.01%) instructions in affected programs: 9040 -> 8920 (-1.33%) helped: 14 HURT: 1 helped stats (abs) min: 2 max: 34 x̄: 8.79 x̃: 6 helped stats (rel) min: 0.41% max: 7.04% x̄: 2.66% x̃: 1.14% HURT stats (abs) min: 3 max: 3 x̄: 3.00 x̃: 3 HURT stats (rel) min: 0.19% max: 0.19% x̄: 0.19% x̃: 0.19% 95% mean confidence interval for instructions value: -13.58 -2.42 95% mean confidence interval for instructions %-change: -3.94% -1.01% Instructions are helped. total cycles in shared programs: 411227148 -> 411212880 (<.01%) cycles in affected programs: 630506 -> 616238 (-2.26%) helped: 15 HURT: 0 helped stats (abs) min: 2 max: 11192 x̄: 951.20 x̃: 38 helped stats (rel) min: <.01% max: 16.01% x̄: 3.92% x̃: 0.17% 95% mean confidence interval for cycles value: -2544.28 641.88 95% mean confidence interval for cycles %-change: -6.89% -0.94% Inconclusive result (value mean confidence interval includes 0). total spills in shared programs: 78626 -> 78623 (<.01%) spills in affected programs: 42 -> 39 (-7.14%) helped: 1 HURT: 0 total fills in shared programs: 80111 -> 80104 (<.01%) fills in affected programs: 140 -> 133 (-5.00%) helped: 1 HURT: 1 Ivy Bridge total instructions in shared programs: 11684101 -> 11684030 (<.01%) instructions in affected programs: 3080 -> 3009 (-2.31%) helped: 4 HURT: 1 helped stats (abs) min: 5 max: 59 x̄: 18.50 x̃: 5 helped stats (rel) min: 6.47% max: 7.04% x̄: 6.87% x̃: 6.99% HURT stats (abs) min: 3 max: 3 x̄: 3.00 x̃: 3 HURT stats (rel) min: 0.15% max: 0.15% x̄: 0.15% x̃: 0.15% 95% mean confidence interval for instructions value: -45.59 17.19 95% mean confidence interval for instructions %-change: -9.38% -1.56% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 258407697 -> 258389653 (<.01%) cycles in affected programs: 328323 -> 310279 (-5.50%) helped: 5 HURT: 0 helped stats (abs) min: 32 max: 14908 x̄: 3608.80 x̃: 32 helped stats (rel) min: 1.26% max: 17.22% x̄: 9.30% x̃: 10.60% 95% mean confidence interval for cycles value: -11616.71 4399.11 95% mean confidence interval for cycles %-change: -16.56% -2.03% Inconclusive result (value mean confidence interval includes 0). total spills in shared programs: 4537 -> 4528 (-0.20%) spills in affected programs: 64 -> 55 (-14.06%) helped: 1 HURT: 0 total fills in shared programs: 4823 -> 4815 (-0.17%) fills in affected programs: 189 -> 181 (-4.23%) helped: 1 HURT: 1 Sandy Bridge total instructions in shared programs: 10488464 -> 10488449 (<.01%) instructions in affected programs: 272 -> 257 (-5.51%) helped: 3 HURT: 0 helped stats (abs) min: 5 max: 5 x̄: 5.00 x̃: 5 helped stats (rel) min: 5.49% max: 5.56% x̄: 5.51% x̃: 5.49% total cycles in shared programs: 150263359 -> 150263263 (<.01%) cycles in affected programs: 7978 -> 7882 (-1.20%) helped: 3 HURT: 0 helped stats (abs) min: 32 max: 32 x̄: 32.00 x̃: 32 helped stats (rel) min: 1.15% max: 1.23% x̄: 1.20% x̃: 1.23% No changes on Iron Lake or GM45. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Thomas Helland <[email protected]>
* nir: Mark the 0.0 < abs(a) transformation as impreciseIan Romanick2018-08-041-1/+1
| | | | | | | | | | Unlike the much older -abs(a) >= 0.0 transformation, this is not precise. The behavior changes if the source is NaN. No shader-db changes on any platform. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Thomas Helland <[email protected]>
* nir: add fall through comment to nir_gather_infoTimothy Arceri2018-08-031-0/+1
| | | | | | | This stops Coverity reporting a defect and helps make the code less error-prone. Reviewed-by: Jason Ekstrand <[email protected]>
* nir/lower_indirect: Bail early if modes == 0Jason Ekstrand2018-08-011-0/+3
| | | | | | | There's no point in walking the program if we're never going to actually lower anything. Reviewed-by: Timothy Arceri <[email protected]>
* nir/meson: fix c vs cpp args for nir testDylan Baker2018-08-011-1/+1
| | | | | | | Fixes: d1992255bb29054fa51763376d125183a9f602f3 ("meson: Add build Intel "anv" vulkan driver") Signed-off-by: Dylan Baker <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>