aboutsummaryrefslogtreecommitdiffstats
path: root/src/intel
Commit message (Collapse)AuthorAgeFilesLines
...
* android: anv: fix libexpat shared dependencyMauro Rossi2019-03-041-1/+1
| | | | | | | | Fixes undefined reference building errors for XML_* functions Signed-off-by: Mauro Rossi <[email protected]> Reviewed-by: Tapani Pälli <[email protected]> Cc: "19.0" <[email protected]>
* android: anv: fix generated files depedencies (v2)Mauro Rossi2019-03-041-15/+25
| | | | | | | | | | | | | | | | | Fix anv_extrypoints.{c,h} and anv_extensions.{c,h} missing dependencies Rename the variable labels according to targets and python scripts Align the building rules as per Automake for simplification Fixes building errors during rebuils due to missing dependencies (v2) Fixed a missing $(VULKAN_API_XML) reference Fixes: 9a508b7 ("android: anv/extensions: fix generated sources build") Fixes: dd088d4bec7 ("anv/extensions: Generate a header file with extension tables") Signed-off-by: Mauro Rossi <[email protected]> Reviewed-by: Tapani Pälli <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Cc: "19.0" <[email protected]>
* intel/compiler: Move int64/doubles lowering optionsJordan Justen2019-03-022-34/+39
| | | | | | | | | | Instead of calculating the int64 and doubles lowering options each time a shader is preprocessed, save and use the values in nir_shader_compiler_options. Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* intel/fs: Don't assert on b2f with a saturate modifierIan Romanick2019-03-021-1/+3
| | | | | | | | This ran afoul of Iris's use of nir_lower_clamp_color_outputs which applies fsat() before writes to vertex shader color outpus. Reviewed-by: Kenneth Graunke <[email protected]> Fixes: 7725d609387 ("intel/fs: Emit better code for b2f(inot(a)) and b2i(inot(a))")
* anv: add support for INTEL_DEBUG=batLionel Landwerlin2019-03-024-2/+93
| | | | | | | | | | | | | | | | | As requested by Ken ;) v2: Also decode simple batches (Caio) Fix u_vector usage issues (Lionel) v3: Make binding/instruction/state/surface available (Lionel) v4: Going through device pools for simple batches (Lionel) Centralize search BO callbacks into anv_device.c (Lionel) v5: Clear decoded batch buffer var after use (Caio) Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/compiler: Add commas on final values of compaction table arraysMatt Turner2019-03-011-15/+15
| | | | Reviewed-by: Jordan Justen <[email protected]>
* intel/fs: Generate if instructions with inverted conditionsIan Romanick2019-03-011-3/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Per-platform results were all over the place, so I have included all the results here. There is an important note at the bottom of the commit message. Skylake total instructions in shared programs: 15184683 -> 15184679 (<.01%) instructions in affected programs: 2786 -> 2782 (-0.14%) helped: 4 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.05% max: 0.84% x̄: 0.44% x̃: 0.44% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -0.96% 0.07% Inconclusive result (%-change mean confidence interval includes 0). total cycles in shared programs: 370961367 -> 370961173 (<.01%) cycles in affected programs: 205867 -> 205673 (-0.09%) helped: 5 HURT: 1 helped stats (abs) min: 1 max: 149 x̄: 39.60 x̃: 16 helped stats (rel) min: 0.02% max: 1.05% x̄: 0.45% x̃: 0.55% HURT stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4 HURT stats (rel) min: 0.03% max: 0.03% x̄: 0.03% x̃: 0.03% 95% mean confidence interval for cycles value: -93.01 28.34 95% mean confidence interval for cycles %-change: -0.82% 0.08% Inconclusive result (value mean confidence interval includes 0). Broadwell total instructions in shared programs: 15465366 -> 15465362 (<.01%) instructions in affected programs: 2799 -> 2795 (-0.14%) helped: 4 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.04% max: 0.84% x̄: 0.44% x̃: 0.44% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -0.96% 0.07% Inconclusive result (%-change mean confidence interval includes 0). total cycles in shared programs: 410938419 -> 410938531 (<.01%) cycles in affected programs: 566028 -> 566140 (0.02%) helped: 18 HURT: 17 helped stats (abs) min: 1 max: 16 x̄: 3.50 x̃: 1 helped stats (rel) min: <.01% max: 1.05% x̄: 0.13% x̃: <.01% HURT stats (abs) min: 1 max: 12 x̄: 10.29 x̃: 12 HURT stats (rel) min: <.01% max: 0.16% x̄: 0.08% x̃: 0.09% 95% mean confidence interval for cycles value: 0.31 6.09 95% mean confidence interval for cycles %-change: -0.10% 0.05% Inconclusive result (%-change mean confidence interval includes 0). Haswell total instructions in shared programs: 13749760 -> 13749759 (<.01%) instructions in affected programs: 2241 -> 2240 (-0.04%) helped: 1 HURT: 0 total cycles in shared programs: 385398913 -> 385398363 (<.01%) cycles in affected programs: 554914 -> 554364 (-0.10%) helped: 31 HURT: 1 helped stats (abs) min: 1 max: 453 x̄: 18.00 x̃: 6 helped stats (rel) min: <.01% max: 0.25% x̄: 0.03% x̃: 0.05% HURT stats (abs) min: 8 max: 8 x̄: 8.00 x̃: 8 HURT stats (rel) min: 0.06% max: 0.06% x̄: 0.06% x̃: 0.06% 95% mean confidence interval for cycles value: -45.88 11.51 95% mean confidence interval for cycles %-change: -0.05% -0.02% Inconclusive result (value mean confidence interval includes 0). Ivy Bridge total cycles in shared programs: 180663626 -> 180663881 (<.01%) cycles in affected programs: 472350 -> 472605 (0.05%) helped: 15 HURT: 30 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: <.01% max: <.01% x̄: <.01% x̃: <.01% HURT stats (abs) min: 8 max: 10 x̄: 9.00 x̃: 9 HURT stats (rel) min: 0.06% max: 0.14% x̄: 0.10% x̃: 0.10% 95% mean confidence interval for cycles value: 4.21 7.12 95% mean confidence interval for cycles %-change: 0.05% 0.08% Cycles are HURT. Sandy Bridge total cycles in shared programs: 154568664 -> 154569225 (<.01%) cycles in affected programs: 356486 -> 357047 (0.16%) helped: 1 HURT: 31 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.02% max: 0.02% x̄: 0.02% x̃: 0.02% HURT stats (abs) min: 4 max: 33 x̄: 18.16 x̃: 8 HURT stats (rel) min: 0.05% max: 0.23% x̄: 0.14% x̃: 0.10% 95% mean confidence interval for cycles value: 12.19 22.87 95% mean confidence interval for cycles %-change: 0.10% 0.16% Cycles are HURT. Iron Lake total instructions in shared programs: 8206589 -> 8206565 (<.01%) instructions in affected programs: 3024 -> 3000 (-0.79%) helped: 12 HURT: 0 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.75% max: 0.83% x̄: 0.80% x̃: 0.80% 95% mean confidence interval for instructions value: -2.00 -2.00 95% mean confidence interval for instructions %-change: -0.82% -0.77% Instructions are helped. total cycles in shared programs: 187657428 -> 187656228 (<.01%) cycles in affected programs: 95748 -> 94548 (-1.25%) helped: 12 HURT: 0 helped stats (abs) min: 80 max: 120 x̄: 100.00 x̃: 100 helped stats (rel) min: 1.00% max: 1.66% x̄: 1.27% x̃: 1.21% 95% mean confidence interval for cycles value: -113.27 -86.73 95% mean confidence interval for cycles %-change: -1.43% -1.11% Cycles are helped. GM45 total instructions in shared programs: 5037569 -> 5037557 (<.01%) instructions in affected programs: 1521 -> 1509 (-0.79%) helped: 6 HURT: 0 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.75% max: 0.83% x̄: 0.79% x̃: 0.79% 95% mean confidence interval for instructions value: -2.00 -2.00 95% mean confidence interval for instructions %-change: -0.83% -0.75% Instructions are helped. total cycles in shared programs: 128101478 -> 128100758 (<.01%) cycles in affected programs: 52746 -> 52026 (-1.37%) helped: 6 HURT: 0 helped stats (abs) min: 120 max: 120 x̄: 120.00 x̃: 120 helped stats (rel) min: 1.16% max: 1.66% x̄: 1.41% x̃: 1.41% 95% mean confidence interval for cycles value: -120.00 -120.00 95% mean confidence interval for cycles %-change: -1.70% -1.12% Cycles are helped. This change has almost no effect right now. However, removing this patch (but leaving the patch "nir/algebraic: Replace a bcsel of a b2f with a b2f(!(a || b))") after adding a patch that removes !(a < b) -> (a >= b) optimizations (like https://patchwork.freedesktop.org/patch/264787/) has the following results on Skylake: Skylake total instructions in shared programs: 15071022 -> 15089710 (0.12%) instructions in affected programs: 1022219 -> 1040907 (1.83%) helped: 1 HURT: 3937 helped stats (abs) min: 41 max: 41 x̄: 41.00 x̃: 41 helped stats (rel) min: 1.01% max: 1.01% x̄: 1.01% x̃: 1.01% HURT stats (abs) min: 1 max: 256 x̄: 4.76 x̃: 4 HURT stats (rel) min: 0.05% max: 11.18% x̄: 2.59% x̃: 2.60% 95% mean confidence interval for instructions value: 4.56 4.93 95% mean confidence interval for instructions %-change: 2.54% 2.64% Instructions are HURT. total cycles in shared programs: 369777134 -> 370092923 (0.09%) cycles in affected programs: 17516573 -> 17832362 (1.80%) helped: 115 HURT: 3624 helped stats (abs) min: 1 max: 1721 x̄: 81.18 x̃: 28 helped stats (rel) min: <.01% max: 10.74% x̄: 1.24% x̃: 0.65% HURT stats (abs) min: 1 max: 12640 x̄: 89.71 x̃: 54 HURT stats (rel) min: <.01% max: 28.24% x̄: 4.72% x̃: 4.52% 95% mean confidence interval for cycles value: 75.21 93.71 95% mean confidence interval for cycles %-change: 4.43% 4.64% Cycles are HURT. total spills in shared programs: 9450 -> 9442 (-0.08%) spills in affected programs: 166 -> 158 (-4.82%) helped: 2 HURT: 0 total fills in shared programs: 21115 -> 21094 (-0.10%) fills in affected programs: 438 -> 417 (-4.79%) helped: 2 HURT: 0 LOST: 1 GAINED: 0 Reviewed-by: Kenneth Graunke <[email protected]>
* intel/fs: Emit better code for b2f(inot(a)) and b2i(inot(a))Ian Romanick2019-03-012-0/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since Boolean values are either -1 (true) or 0 (false), b2f(inot(a)) maps -1 => 0.0 and 0 => 1.0. This is equivalent to 1.0 + float(boolBitsToInt(a)). On Intel GPUs, ADD is one of the few instructions that can type-convert during write to destination, so we can achieve this in a single instruction: add g47F, g26D, 1D v2: Fix swizzles. v3: Fix typos in comments. Noticed by Ken. All Gen6+ platforms had similar results. (Skylake shown) Skylake total instructions in shared programs: 15185583 -> 15184683 (<.01%) instructions in affected programs: 239389 -> 238489 (-0.38%) helped: 899 HURT: 1 helped stats (abs) min: 1 max: 2 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.15% max: 1.85% x̄: 0.49% x̃: 0.44% HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 0.09% max: 0.09% x̄: 0.09% x̃: 0.09% 95% mean confidence interval for instructions value: -1.01 -0.99 95% mean confidence interval for instructions %-change: -0.51% -0.48% Instructions are helped. total cycles in shared programs: 370964249 -> 370961508 (<.01%) cycles in affected programs: 1487586 -> 1484845 (-0.18%) helped: 420 HURT: 268 helped stats (abs) min: 1 max: 232 x̄: 22.41 x̃: 6 helped stats (rel) min: 0.05% max: 22.60% x̄: 1.30% x̃: 0.41% HURT stats (abs) min: 1 max: 230 x̄: 24.90 x̃: 10 HURT stats (rel) min: <.01% max: 21.60% x̄: 1.45% x̃: 0.52% 95% mean confidence interval for cycles value: -7.61 -0.36 95% mean confidence interval for cycles %-change: -0.44% -0.02% Cycles are helped. No changes on Iron Lake or GM45. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/fs: Use De Morgan's laws to avoid logical-not of a logic result on Gen8+Ian Romanick2019-03-011-0/+59
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of emitting ~(a & b), emit (~a | ~b) since logical-not of operands is free on Gen8+. v2: Fix swizzles. Fix types for cmod propagation. v3: Simplify logic for inverting source of inot(ixor(a, b)). Suggested by Ken. Skylake and Broadwell had similar results. (Skylake shown) Skylake total instructions in shared programs: 15185593 -> 15185583 (<.01%) instructions in affected programs: 5673 -> 5663 (-0.18%) helped: 12 HURT: 1 helped stats (abs) min: 1 max: 2 x̄: 1.17 x̃: 1 helped stats (rel) min: 0.30% max: 5.88% x̄: 1.50% x̃: 0.70% HURT stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4 HURT stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12% 95% mean confidence interval for instructions value: -1.66 0.13 95% mean confidence interval for instructions %-change: -2.60% -0.15% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 370977726 -> 370964249 (<.01%) cycles in affected programs: 869987 -> 856510 (-1.55%) helped: 15 HURT: 2 helped stats (abs) min: 2 max: 6640 x̄: 902.20 x̃: 16 helped stats (rel) min: <.01% max: 4.92% x̄: 1.71% x̃: 1.53% HURT stats (abs) min: 14 max: 42 x̄: 28.00 x̃: 28 HURT stats (rel) min: 1.08% max: 3.18% x̄: 2.13% x̃: 2.13% 95% mean confidence interval for cycles value: -1654.87 69.34 95% mean confidence interval for cycles %-change: -2.29% -0.23% Inconclusive result (value mean confidence interval includes 0). Reviewed-by: Kenneth Graunke <[email protected]>
* intel/fs: Emit logical-not of operands on Gen8+Ian Romanick2019-03-012-6/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On Gen8+ specifying negation of a logical operation such as AND actually performs a logical-not. Take advantage of this to generate fewer instructions. v2: Major rebase. Use nir_src_as_alu_instr. Fix swizzle handling. No changes on any pre-Gen8 platform. Skylake and Broadwell had similar results. (Broadwell shown) total instructions in shared programs: 15466902 -> 15466274 (<.01%) instructions in affected programs: 1262953 -> 1262325 (-0.05%) helped: 682 HURT: 4 helped stats (abs) min: 1 max: 5 x̄: 1.02 x̃: 1 helped stats (rel) min: 0.03% max: 2.40% x̄: 0.18% x̃: 0.04% HURT stats (abs) min: 1 max: 62 x̄: 17.50 x̃: 3 HURT stats (rel) min: 0.03% max: 1.89% x̄: 0.53% x̃: 0.10% 95% mean confidence interval for instructions value: -1.10 -0.73 95% mean confidence interval for instructions %-change: -0.19% -0.15% Instructions are helped. total cycles in shared programs: 410996093 -> 410950440 (-0.01%) cycles in affected programs: 144389048 -> 144343395 (-0.03%) helped: 519 HURT: 51 helped stats (abs) min: 1 max: 1060 x̄: 104.46 x̃: 140 helped stats (rel) min: 0.01% max: 10.98% x̄: 0.34% x̃: 0.03% HURT stats (abs) min: 1 max: 4060 x̄: 167.90 x̃: 22 HURT stats (rel) min: <.01% max: 8.20% x̄: 0.96% x̃: 0.25% 95% mean confidence interval for cycles value: -97.16 -63.02 95% mean confidence interval for cycles %-change: -0.32% -0.13% Cycles are helped. total spills in shared programs: 95311 -> 95329 (0.02%) spills in affected programs: 881 -> 899 (2.04%) helped: 0 HURT: 4 total fills in shared programs: 93629 -> 93634 (<.01%) fills in affected programs: 794 -> 799 (0.63%) helped: 1 HURT: 2 Reviewed-by: Kenneth Graunke <[email protected]>
* intel/fs: Refactor ALU source and destination handling to a separate functionIan Romanick2019-03-012-33/+60
| | | | | | | | | | | | | | Other places will need to do this soon to properly handle source swizzles. The patch looks a little odd, but the change is pretty straight forward. All of the swizzle and mask handling is moved out, but the code for handling move instructions and vecN instructions remains in nir_emit_alu. I'm not terribly pleased with the "need_dest" parameter, but get_nir_dest is (somewhat surprisingly) destructive. I am open to suggestions of alternatives. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/fs: Handle OR source modifiers in algebraic optimizationIan Romanick2019-03-011-1/+10
| | | | | | Found by inspection. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/fs: Relax type matching rules in cmod propagation from MOV instructionsIan Romanick2019-03-011-4/+19
| | | | | | | | | | | | | | | | | | | | To allow cmod propagation from a MOV in a sequence like: and(16) g31<1>UD g20<8,8,1>UD g22<8,8,1>UD mov.nz.f0(16) null<1>F g31<8,8,1>D A similar change to the vec4 backend had no effect. Somewhere between c1ec5820593 and 40fc4b5acd6 (1,094 commits) the effectiveness of this patch diminished, and as of commit d7e0d47b9de (nir: Add a bunch of b2[if] optimizations) this optimization no longer has any effect on any platform. A later patch "intel/fs: Use De Morgan's laws to avoid logical-not of a logic result on Gen8+," generates some instruction sequences that require this change in order for cmod propagation to make progress. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/vec4: Emit constants for some ALU sources as immediate valuesIan Romanick2019-03-011-0/+71
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In some cases of flow control, the constant propagation is not able to determine that the source of an instruction must be a constant value. When we still have NIR SSA values, we can easily determine this. Emit the immediate value during code generation to possible avoid spurious loads of constants into registers. I wrote this patch to prevent a couple trivial regressions in vec4 shaders caused by "nir/algebraic: Replace i2b used by bcsel or if-statement with comparison". The final result was quite a bit better than that... No shader-db changes on any Gen8+ platform. v2: Assert that we never get a negation source modifier on Gen8+. Suggested by Ken. This should never happen because we don't normally use vec4 for Gen8+ (requires and environment variable to force it), and there's no code to generate these negations. Still, erring on the side of caution is better. Haswell total instructions in shared programs: 13776218 -> 13764783 (-0.08%) instructions in affected programs: 663931 -> 652496 (-1.72%) helped: 3495 HURT: 1 helped stats (abs) min: 1 max: 30 x̄: 3.28 x̃: 2 helped stats (rel) min: 0.21% max: 10.00% x̄: 1.79% x̃: 1.49% HURT stats (abs) min: 24 max: 24 x̄: 24.00 x̃: 24 HURT stats (rel) min: 12.24% max: 12.24% x̄: 12.24% x̃: 12.24% 95% mean confidence interval for instructions value: -3.39 -3.15 95% mean confidence interval for instructions %-change: -1.84% -1.75% Instructions are helped. total cycles in shared programs: 386818984 -> 386511910 (-0.08%) cycles in affected programs: 20379636 -> 20072562 (-1.51%) helped: 3052 HURT: 476 helped stats (abs) min: 2 max: 12516 x̄: 110.40 x̃: 6 helped stats (rel) min: 0.05% max: 24.68% x̄: 1.58% x̃: 0.69% HURT stats (abs) min: 2 max: 416 x̄: 62.76 x̃: 24 HURT stats (rel) min: 0.10% max: 10.75% x̄: 4.03% x̃: 2.18% 95% mean confidence interval for cycles value: -115.57 -58.51 95% mean confidence interval for cycles %-change: -0.93% -0.73% Cycles are helped. total spills in shared programs: 100482 -> 100480 (<.01%) spills in affected programs: 79 -> 77 (-2.53%) helped: 3 HURT: 1 total fills in shared programs: 96883 -> 96877 (<.01%) fills in affected programs: 85 -> 79 (-7.06%) helped: 4 HURT: 0 Ivy Bridge total instructions in shared programs: 12000562 -> 11990113 (-0.09%) instructions in affected programs: 572581 -> 562132 (-1.82%) helped: 3106 HURT: 0 helped stats (abs) min: 1 max: 30 x̄: 3.36 x̃: 2 helped stats (rel) min: 0.21% max: 10.00% x̄: 1.86% x̃: 1.49% 95% mean confidence interval for instructions value: -3.49 -3.23 95% mean confidence interval for instructions %-change: -1.91% -1.81% Instructions are helped. total cycles in shared programs: 180958504 -> 180664500 (-0.16%) cycles in affected programs: 19991810 -> 19697806 (-1.47%) helped: 2654 HURT: 486 helped stats (abs) min: 2 max: 12516 x̄: 121.61 x̃: 6 helped stats (rel) min: 0.05% max: 20.66% x̄: 1.48% x̃: 0.68% HURT stats (abs) min: 2 max: 396 x̄: 59.18 x̃: 24 HURT stats (rel) min: 0.05% max: 9.62% x̄: 3.82% x̃: 2.16% 95% mean confidence interval for cycles value: -125.62 -61.64 95% mean confidence interval for cycles %-change: -0.76% -0.56% Cycles are helped. Sandy Bridge total instructions in shared programs: 10842336 -> 10835438 (-0.06%) instructions in affected programs: 395340 -> 388442 (-1.74%) helped: 1926 HURT: 0 helped stats (abs) min: 1 max: 22 x̄: 3.58 x̃: 2 helped stats (rel) min: 0.10% max: 9.68% x̄: 1.78% x̃: 1.42% 95% mean confidence interval for instructions value: -3.73 -3.43 95% mean confidence interval for instructions %-change: -1.84% -1.72% Instructions are helped. total cycles in shared programs: 154590074 -> 154569050 (-0.01%) cycles in affected programs: 8159932 -> 8138908 (-0.26%) helped: 1670 HURT: 228 helped stats (abs) min: 2 max: 260 x̄: 18.13 x̃: 6 helped stats (rel) min: 0.02% max: 8.70% x̄: 0.74% x̃: 0.28% HURT stats (abs) min: 2 max: 1798 x̄: 40.58 x̃: 14 HURT stats (rel) min: 0.03% max: 12.97% x̄: 1.04% x̃: 0.31% 95% mean confidence interval for cycles value: -13.51 -8.64 95% mean confidence interval for cycles %-change: -0.60% -0.46% Cycles are helped. Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8212357 -> 8206587 (-0.07%) instructions in affected programs: 323664 -> 317894 (-1.78%) helped: 1457 HURT: 0 helped stats (abs) min: 1 max: 12 x̄: 3.96 x̃: 3 helped stats (rel) min: 0.33% max: 11.49% x̄: 1.86% x̃: 1.44% 95% mean confidence interval for instructions value: -4.14 -3.78 95% mean confidence interval for instructions %-change: -1.93% -1.78% Instructions are helped. total cycles in shared programs: 187668016 -> 187657422 (<.01%) cycles in affected programs: 14856234 -> 14845640 (-0.07%) helped: 1372 HURT: 83 helped stats (abs) min: 2 max: 24 x̄: 7.92 x̃: 6 helped stats (rel) min: 0.02% max: 1.14% x̄: 0.12% x̃: 0.08% HURT stats (abs) min: 2 max: 14 x̄: 3.20 x̃: 2 HURT stats (rel) min: 0.03% max: 0.60% x̄: 0.12% x̃: 0.12% 95% mean confidence interval for cycles value: -7.65 -6.91 95% mean confidence interval for cycles %-change: -0.11% -0.10% Cycles are helped. Reviewed-by: Kenneth Graunke <[email protected]>
* anv: fix typoEric Engestrom2019-03-011-1/+1
| | | | | Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: remove spaces around kwargs assignmentEric Engestrom2019-03-011-4/+4
| | | | | | | | pylint complains: > C0326: No space allowed around keyword argument assignment Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: drop unused parameterEric Engestrom2019-03-011-3/+2
| | | | | | | | I'm guessing a previous version of this script used an index-based map of entrypoints, but that's not the case anymore. Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: simplify chained comparisonEric Engestrom2019-03-011-1/+1
| | | | | Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/compiler: Re-prefix non-logical surface opcodes with VEC4Jason Ekstrand2019-02-2810-47/+25
| | | | | | | | The scalar back-end uses SHADER_OPCODE_SEND for all surface messages so we no longer need the non-logical opcodes there. Prefix them VEC4 so it's clear that they're only used by the vec4 back-end. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/schedule_instructions: Move some commentsJason Ekstrand2019-02-281-37/+37
| | | | Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/compiler: Drop unused surface opcodesJason Ekstrand2019-02-287-60/+0
| | | | | | | | | The unused typed surface read/write support in the vec4 back-end has been dropped and the fs back-end now uses SHADER_OPCODE_SEND for all image and buffer ops. There's no reason to keep these opcodes around anymore. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/fs: Get rid of the IMAGE_SIZE opcodeJason Ekstrand2019-02-283-10/+5
| | | | | | | Since switching to SHADER_OPCODE_SEND for image operations, we no longer need the non-logical opcode. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/vec4: Drop dead code for handling typed surface messagesJason Ekstrand2019-02-285-272/+0
| | | | Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/fs: Drop the fs_surface_builderJason Ekstrand2019-02-285-471/+195
| | | | | | | | | | All of the actual abstraction (except possibly setting size_written) happens as part of the logical opcodes. The only thing that the surface builder is providing at this point is extra levels of functions to call through. I'm going to be adding bindless image support soon and all the extra abstraction here is just getting in the way. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/fs: Re-order logical surface argumentsJason Ekstrand2019-02-281-2/+2
| | | | | | | | It makes more sense to start at the surface then move on to the address and then the data. Also, this is a really good test of whether or not we got all the places that use the sources by explicit integer number. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/fs: Add an enum type for logical sampler inst sourcesJason Ekstrand2019-02-283-39/+58
| | | | Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/compiler: use correct swizzle for replacementLionel Landwerlin2019-02-271-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | The optimization in 4cd1a0be76883c introduced a replacement of : cmp(8).z.f0.0 vgrf11.y:D, vgrf10.xxxx:D, vgrf2.xyyy:D ... cmp(8).nz.f0.0 null.x:D, vgrf11.yyyy:D, 0D By : cmp(8).z.f0.0 vgrf15.x:D, vgrf10.xxxx:D, vgrf2.yyyy:D ... mov(8) vgrf11.y:D, vgrf15.yyyy:D The first cmp instruction is storing in x while the second mov is sourcing from y. We need to take into account where the replacement on the scan_inst destination is going to store thing so that the replacement mov can source things from the correct location. Signed-off-by: Lionel Landwerlin <[email protected]> Fixes: 4cd1a0be76883c ("i965/vec4: Propagate conditional modifiers from more compares to other compares") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109759 Reviewed-by: Ian Romanick <[email protected]>
* android: add liblog to libmesa_intel_common buildTapani Pälli2019-02-271-1/+1
| | | | | Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: Add support for sampling from XYUV imagesKasireddy, Vivek2019-02-262-0/+2
| | | | | | | | | Add support to the i965 DRI driver to sample from XYUV8888 buffers. Signed-off-by: Vivek Kasireddy <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Tapani Pälli <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
* intel/eu: Add an EOT parameter to send_indirect_[split]_messageJason Ekstrand2019-02-254-19/+29
| | | | | | | For split indirect sends we have to put the EOT parameter in the extended descriptor as well as the instruction itself so just calling brw_inst_set_eot is insufficient. Moving the EOT handling handling into the send_indirect_[split]_message helper lets us handle it properly.
* intel/aub_viewer: silence more compiler warningsLionel Landwerlin2019-02-251-2/+2
| | | | | | | format not a string literal and no format arguments. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
* intel/aub_viewer: silence compiler warningLionel Landwerlin2019-02-251-1/+1
| | | | | | | buffer_addr may be used uninitialized. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
* intel/aub_viewer: printout 48bits addressesLionel Landwerlin2019-02-251-14/+14
| | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
* Revert "anv: add support for INTEL_DEBUG=bat"Lionel Landwerlin2019-02-241-49/+0
| | | | | | This reverts commit e4d88396d259c4ec6032d2834d1c9073d55e9b45. Apologies, I pushed the wrong commit.
* anv: add support for INTEL_DEBUG=batLionel Landwerlin2019-02-231-0/+49
| | | | | | As requested by Ken ;) Signed-off-by: Lionel Landwerlin <[email protected]>
* anv: advertise 8 subpixel precision bitsJuan A. Suarez Romero2019-02-222-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On one side, when emitting 3DSTATE_SF, VertexSubPixelPrecisionSelect is used to select between 8 bit subpixel precision (value 0) or 4 bit subpixel precision (value 1). As this value is not set, means it is taking the value 0, so 8 bit are used. On the other side, in the Vulkan CTS tests, if the reference rasterizer, which uses 8 bit precision, as it is used to check what should be the expected value for the tests, is changed to use 4 bit as ANV was advertising so far, some of the tests will fail. So it seems ANV is actually using 8 bits. v2: explicitly set 3DSTATE_SF::VertexSubPixelPrecisionSelect (Jason) v3: use _8Bit definition as value (Jason) v4: (by Jason) anv: Explicitly set 3DSTATE_CLIP::VertexSubPixelPrecisionSelect This field was added on gen8 even though there's an identically defined one in 3DSTATE_SF. CC: Jason Ekstrand <[email protected]> CC: Kenneth Graunke <[email protected]> CC: 18.3 19.0 <[email protected]> Signed-off-by: Juan A. Suarez Romero <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* genxml: add missing field values for 3DSTATE_SFJuan A. Suarez Romero2019-02-226-6/+24
| | | | | | | | | Fill out "Vertex Sub Pixel Precision Select" possible values. CC: 18.3 19.0 <[email protected]> Signed-off-by: Juan A. Suarez Romero <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* intel: fix urb size for CFL GT1Lionel Landwerlin2019-02-221-0/+1
| | | | | | | | Same 192Kb amount as SKL/KBL GT1 applies. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Anuj Phogat <[email protected]> Fixes: de7ed0ba5522 ("i965/CFL: Add PCI Ids for Coffee Lake.")
* isl: the display engine requires 64B alignment for linear surfacesSamuel Iglesias Gonsálvez2019-02-221-0/+8
| | | | | | | v2: Add PRM quote (Lionel) Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* android: intel/isl: remove redundant building rulesMauro Rossi2019-02-221-13/+0
| | | | | | | | | | | | | | | | Fixes the following building error: including ./external/mesa/Android.mk ... build/core/base_rules.mk:183: *** external/mesa/src/intel: MODULE.TARGET.STATIC_LIBRARIES.libmesa_isl_tiled_memcpy already defined by external/mesa/src/intel. make: *** [build/core/ninja.mk:164: out/build-android_x86_64.ninja] Error 1 ISL_TILED_MEMCPY_FILES is isl/isl_tiled_memcpy_normal.c and that source file includes isl_tiled_memcpy.c source Fixes: 96bb328 ("iris: add Android build") Signed-off-by: Mauro Rossi <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* intel/fs: Rely on undocumented unrestricted regioning for 32x16-bit integer ↵Francisco Jerez2019-02-211-3/+11
| | | | | | | | | | | | | multiply. Even though the hardware spec claims that any "integer DWord multiply" operation is affected by the regioning restrictions of CHV/BXT/GLK, this is inconsistent with the behavior of the simulator and with empirical evidence -- Return false from has_dst_aligned_region_restriction() for such instructions as a micro-optimization. Tested-by: Anuj Phogat <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* intel/fs: Implement extended strides greater than 4 for IR source regions.Francisco Jerez2019-02-211-3/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Strides up to 32B can be implemented for the source regions of most instructions by leveraging either the vertical or the horizontal stride of the hardware Align1 region. The main motivation for this is that currently the lower_integer_multiplication() pass will happily double the stride of one of the 32-bit sources, which can blow up if the stride of the original source was already the maximum value allowed by the hardware. An alternative would be to use the regioning legalization pass in order to lower such strides into the composition of multiple legal strides, but that would be somewhat less efficient. This showed up as a regression from my commit cbea91eb57a501bebb1ca2 in Vulkan 1.1 CTS tests on CHV/BXT platforms, however it was really a pre-existing problem that had affected conformance on other platforms without native support for integer multiplication. CHV/BXT were getting around it because the code I removed in that commit had the "fortunate" side effect of emitting narrower regions that didn't hit the hardware stride limit after lowering. Beyond fixing the regression this fixes ~90 additional Vulkan 1.1 subgroup CTS tests on ICL (that's why this patch is marked for inclusion in mesa-stable even though the original regressing patch was not). According to Jason, a nearly equivalent change had been committed previously as e8c9e65185de3e821e1 and then (mistakenly?) reverted as a31d0382084c8aa8. Cc: [email protected] Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109328 Reported-by: Mark Janes <[email protected]> Tested-by: Anuj Phogat <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* intel/fs: Cap dst-aligned region stride to maximum representable hstride value.Francisco Jerez2019-02-211-5/+23
| | | | | | | | | | | This is required in combination with the following commit, because otherwise if a source region with an extended 8+ stride is present in the instruction (which we're about to declare legal) we'll end up emitting code that attempts to write to such a region, even though strides greater than four are still illegal for the destination. Tested-by: Anuj Phogat <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* intel/fs: Lower integer multiply correctly when destination stride equals 4.Francisco Jerez2019-02-211-4/+8
| | | | | | | | | | | | | | | | | Because the "low" temporary needs to be accessed with word type and twice the original stride, attempting to preserve the alignment of the original destination can potentially lead to instructions with illegal destination stride greater than four. Because the CHV/BXT alignment restrictions are now being enforced by the regioning lowering pass run after lower_integer_multiplication(), there is no real need to preserve the original strides anymore. Note that this bug can be reproduced on stable branches, but back-porting would be non-trivial, because the fix relies on the regioning lowering pass recently introduced. Tested-by: Anuj Phogat <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* intel/fs: Exclude control sources from execution type and region alignment ↵Francisco Jerez2019-02-213-4/+68
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | calculations. Currently the execution type calculation will return a bogus value in cases like: mov_indirect(8) vgrf0:w, vgrf1:w, vgrf2:ud, 32u Which will be considered to have a 32-bit integer execution type even though the actual indirect move operation will be carried out with 16-bit precision. Similarly there's no need to apply the CHV/BXT double-precision region alignment restrictions to such control sources, since they aren't directly involved in the double-precision arithmetic operations emitted by these virtual instructions. Applying the CHV/BXT restrictions to control sources was expected to be harmless if mildly inefficient, but unfortunately it exposed problems at codegen level for virtual instructions (namely the SHUFFLE instruction used for the Vulkan 1.1 subgroup feature) that weren't prepared to accept control sources with an arbitrary strided region. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109328 Reported-by: Mark Janes <[email protected]> Fixes: efa4e4bc5fc "intel/fs: Introduce regioning lowering pass." Tested-by: Anuj Phogat <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* genxml: Remove extra space in gen4/45/5 field nameJordan Justen2019-02-213-15/+15
| | | | | | Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Sagar Ghuge <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* genxml/gen_bits_header.py: Use regex to strip no alphanum charsJordan Justen2019-02-211-26/+4
| | | | | | Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Sagar Ghuge <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* iris: handle PatchVerticesIn as a system value.Kenneth Graunke2019-02-211-0/+2
|
* iris: add Android buildTapani Pälli2019-02-211-0/+13
| | | | | | | | | | | | | | | Note that at least following additional libs/components require changes since they refer to BOARD_GPU_DRIVERS variable which is used to select the driver: - mixins - minigbm - libdrm - drm_gralloc v2: (feedback by Gustaw Smolarczyk) Fix trailing \ in a few cases Signed-off-by: Tapani Pälli <[email protected]>
* imgui: make sure our copy of imgui doesn't clash with others in the same processLionel Landwerlin2019-02-211-0/+2
| | | | | | | | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Acked-by: Jason Ekstrand <[email protected]> +1-by: Mike Lothian <[email protected]> +1-by: Tapani Pälli <[email protected]> +1-by: Eric Engestrom <[email protected]> +1-by: Yurii Kolesnykov <[email protected]> +1-by: myfreeweb <[email protected]> +1-by: Kenneth Graunke <[email protected]>