aboutsummaryrefslogtreecommitdiffstats
path: root/src
Commit message (Collapse)AuthorAgeFilesLines
* pan/bi: Use STAGE srcs for scheduler nopsAlyssa Rosenzweig2020-04-011-2/+2
| | | | | | | | ..rather than using port 0 for the source, which may or may not actually exist. Signed-off-by: Alyssa Rosenzweig <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4396>
* pan/bi: Fix writes_component for VECTORAlyssa Rosenzweig2020-04-011-0/+4
| | | | | | | | I'm not convinced this is the best way and it's sort of a hack, but it fixes RA for st_vary. Signed-off-by: Alyssa Rosenzweig <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4396>
* pan/bit: Wire through I/OAlyssa Rosenzweig2020-04-013-12/+76
| | | | | | | | | We'd like to wire in attributes and uniforms as inputs and look at the varying as output for automatic testing on-device, building up a test framework for us. Signed-off-by: Alyssa Rosenzweig <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4396>
* pan/bit: Add `run` mode to the cmdlineAlyssa Rosenzweig2020-04-011-0/+34
| | | | | | | | | This emulates the functionality of shader_runner (built for kbase) using the bifrost testing infrastructure so it runs on mainline. Ideally this will let us test shaders from the assembler. Signed-off-by: Alyssa Rosenzweig <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4396>
* freedreno/log: fix build errorRob Clark2020-04-011-2/+4
| | | | | | | | | | | | | | | | It seems some versions of gcc are less clever about const initializers: ``` ../src/gallium/drivers/freedreno/freedreno_log.c:58:33: error: initializer element is not constant const unsigned msgs_per_chunk = bo_size / sizeof(uint64_t); ^~~~~~~ ``` See https://gitlab.freedesktop.org/mesa/mesa/-/issues/2713 Signed-off-by: Rob Clark <[email protected]> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4390> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4390>
* nir/algebraic: Remove a redundant fabs patternIan Romanick2020-04-011-1/+0
| | | | | | | | | | | Made redundant by 5544b2cbbd2 ("nir/algebraic: Use value range analysis to eliminate useless unary ops"). No shader-db changes on any Intel platform. Reviewed-by: Matt Turner <[email protected]> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1359> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1359>
* nir/algebraic: Use value range analysis to convert fmax to fsatIan Romanick2020-04-011-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is conceptually similar to the 1-fsat(a) <=> fsat(1-a) rearragement done in: 3b747909419 ("nir/algebraic: Recognize open-coded flrp(a, b, fsat(c))") 2d259713b7 ("nir/algebraic: Commute 1-fsat(a) to fsat(1-a) for all non-fmul instructions"). Note: this helps the Aztex Ruins shader that was hurt for spills and fills on Braodwell in the previous commit, but it does not fix the spills or fills. :( All Intel platforms had similar results. (Ice Lake shown) total instructions in shared programs: 14528985 -> 14526116 (-0.02%) instructions in affected programs: 477300 -> 474431 (-0.60%) helped: 2332 HURT: 0 helped stats (abs) min: 1 max: 18 x̄: 1.23 x̃: 1 helped stats (rel) min: 0.07% max: 8.89% x̄: 0.88% x̃: 0.64% 95% mean confidence interval for instructions value: -1.27 -1.19 95% mean confidence interval for instructions %-change: -0.92% -0.85% Instructions are helped. total cycles in shared programs: 203723684 -> 203692984 (-0.02%) cycles in affected programs: 4878847 -> 4848147 (-0.63%) helped: 1764 HURT: 324 helped stats (abs) min: 1 max: 706 x̄: 22.94 x̃: 17 helped stats (rel) min: <.01% max: 17.75% x̄: 1.94% x̃: 1.66% HURT stats (abs) min: 1 max: 400 x̄: 30.15 x̃: 10 HURT stats (rel) min: <.01% max: 17.76% x̄: 1.91% x̃: 0.69% 95% mean confidence interval for cycles value: -16.55 -12.86 95% mean confidence interval for cycles %-change: -1.44% -1.24% Cycles are helped. Reviewed-by: Matt Turner <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1359>
* nir/algebraic: Distribute source modifiers into instructionsIan Romanick2020-04-013-0/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are three main classes of cases that are helped by this change: 1. When the negation is applied to a value being type converted (e.g., float(-x)). This could possibly also be handled with more clever code generation. 2. When the negation is applied to a phi node source (e.g., x = -(...); at the end of a basic block). This was the original case that caught my attention while looking at shader-db dumps. 3. When the negation is applied to the source of an instruction that cannot have source modifiers. This includes texture instructions and math box instructions on pre-Gen7 platforms (see more details below). In many these cases the negation can be propagated into the instructions that generate the value (e.g., -(a*b) = (-a)*b). In addition to the operations implemtned in this patch, I also tried: - frcp - Helped 6 or fewer shaders on Gen7+, and hurt just as many on pre-Gen7. On Gen6 and earlier, frcp is a math box instruction, and math box instructions cannot have source modifiers. I suspect this is why so many more shaders are helped on Gen6 than on Gen5 or Gen7. Gen6 supports OpenGL 3.3, so a lot more shaders compile on it. A lot of these shaders may have things like cos(-x) or rcp(-x) that could result in an explicit negation instruction. - bcsel - Hurt a few shaders with none helped. bcsel operates on integer sources, so the fabs or fneg cannot be a source modifier in the bcsel itself. - Integer instructions - No changes on any Intel platform. Some notes about the shader-db results below. - On Tiger Lake, a single Deus Ex fragment shader is hurt for both spills and fills. - On Haswell, a different Deus Ex fragment shader is hurt for both spills and fills. - On GM45, the "LOST: 1" and "GAINED: 1" is a single Left4Dead 2 (very high graphics settings, lol) fragment shader that upgrades from SIMD8 to SIMD16. v2: Add support for fsign. Add some patterns that remove redundant negations and redundant absolute value rather than trying to push them down the tree. Tiger Lake total instructions in shared programs: 17611333 -> 17586465 (-0.14%) instructions in affected programs: 3033734 -> 3008866 (-0.82%) helped: 10310 HURT: 632 helped stats (abs) min: 1 max: 35 x̄: 2.61 x̃: 1 helped stats (rel) min: 0.04% max: 16.67% x̄: 1.43% x̃: 1.01% HURT stats (abs) min: 1 max: 47 x̄: 3.21 x̃: 2 HURT stats (rel) min: 0.04% max: 5.08% x̄: 0.88% x̃: 0.63% 95% mean confidence interval for instructions value: -2.33 -2.21 95% mean confidence interval for instructions %-change: -1.32% -1.27% Instructions are helped. total cycles in shared programs: 338365223 -> 338262252 (-0.03%) cycles in affected programs: 125291811 -> 125188840 (-0.08%) helped: 5224 HURT: 2031 helped stats (abs) min: 1 max: 5670 x̄: 46.73 x̃: 12 helped stats (rel) min: <.01% max: 34.78% x̄: 1.91% x̃: 0.97% HURT stats (abs) min: 1 max: 2882 x̄: 69.50 x̃: 14 HURT stats (rel) min: <.01% max: 44.93% x̄: 2.35% x̃: 0.74% 95% mean confidence interval for cycles value: -18.71 -9.68 95% mean confidence interval for cycles %-change: -0.80% -0.63% Cycles are helped. total spills in shared programs: 8942 -> 8946 (0.04%) spills in affected programs: 8 -> 12 (50.00%) helped: 0 HURT: 1 total fills in shared programs: 9399 -> 9401 (0.02%) fills in affected programs: 21 -> 23 (9.52%) helped: 0 HURT: 1 Ice Lake total instructions in shared programs: 16124348 -> 16102258 (-0.14%) instructions in affected programs: 2830928 -> 2808838 (-0.78%) helped: 11294 HURT: 2 helped stats (abs) min: 1 max: 12 x̄: 1.96 x̃: 1 helped stats (rel) min: 0.07% max: 17.65% x̄: 1.32% x̃: 0.93% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 3.45% max: 4.00% x̄: 3.72% x̃: 3.72% 95% mean confidence interval for instructions value: -1.99 -1.93 95% mean confidence interval for instructions %-change: -1.34% -1.29% Instructions are helped. total cycles in shared programs: 335393932 -> 335325794 (-0.02%) cycles in affected programs: 123834609 -> 123766471 (-0.06%) helped: 5034 HURT: 2128 helped stats (abs) min: 1 max: 3256 x̄: 43.39 x̃: 11 helped stats (rel) min: <.01% max: 35.79% x̄: 1.98% x̃: 1.00% HURT stats (abs) min: 1 max: 2634 x̄: 70.63 x̃: 16 HURT stats (rel) min: <.01% max: 49.49% x̄: 2.73% x̃: 0.62% 95% mean confidence interval for cycles value: -13.66 -5.37 95% mean confidence interval for cycles %-change: -0.69% -0.48% Cycles are helped. LOST: 0 GAINED: 2 Skylake total instructions in shared programs: 14949240 -> 14927930 (-0.14%) instructions in affected programs: 2594756 -> 2573446 (-0.82%) helped: 11000 HURT: 2 helped stats (abs) min: 1 max: 12 x̄: 1.94 x̃: 1 helped stats (rel) min: 0.07% max: 18.75% x̄: 1.39% x̃: 0.94% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 4.76% max: 4.76% x̄: 4.76% x̃: 4.76% 95% mean confidence interval for instructions value: -1.97 -1.91 95% mean confidence interval for instructions %-change: -1.42% -1.37% Instructions are helped. total cycles in shared programs: 324829346 -> 324821596 (<.01%) cycles in affected programs: 121566087 -> 121558337 (<.01%) helped: 4611 HURT: 2147 helped stats (abs) min: 1 max: 3715 x̄: 33.29 x̃: 10 helped stats (rel) min: <.01% max: 36.08% x̄: 1.94% x̃: 1.00% HURT stats (abs) min: 1 max: 2551 x̄: 67.88 x̃: 16 HURT stats (rel) min: <.01% max: 53.79% x̄: 3.69% x̃: 0.89% 95% mean confidence interval for cycles value: -4.25 1.96 95% mean confidence interval for cycles %-change: -0.28% -0.02% Inconclusive result (value mean confidence interval includes 0). Broadwell total instructions in shared programs: 14971203 -> 14949957 (-0.14%) instructions in affected programs: 2635699 -> 2614453 (-0.81%) helped: 10982 HURT: 2 helped stats (abs) min: 1 max: 12 x̄: 1.93 x̃: 1 helped stats (rel) min: 0.07% max: 18.75% x̄: 1.39% x̃: 0.94% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 4.76% max: 4.76% x̄: 4.76% x̃: 4.76% 95% mean confidence interval for instructions value: -1.97 -1.90 95% mean confidence interval for instructions %-change: -1.42% -1.37% Instructions are helped. total cycles in shared programs: 336215033 -> 336086458 (-0.04%) cycles in affected programs: 127383198 -> 127254623 (-0.10%) helped: 4884 HURT: 1963 helped stats (abs) min: 1 max: 25696 x̄: 51.78 x̃: 12 helped stats (rel) min: <.01% max: 58.28% x̄: 2.00% x̃: 1.05% HURT stats (abs) min: 1 max: 3401 x̄: 63.33 x̃: 16 HURT stats (rel) min: <.01% max: 39.95% x̄: 2.20% x̃: 0.70% 95% mean confidence interval for cycles value: -29.99 -7.57 95% mean confidence interval for cycles %-change: -0.89% -0.71% Cycles are helped. total fills in shared programs: 24905 -> 24901 (-0.02%) fills in affected programs: 117 -> 113 (-3.42%) helped: 4 HURT: 0 LOST: 0 GAINED: 16 Haswell total instructions in shared programs: 13148927 -> 13131528 (-0.13%) instructions in affected programs: 2220941 -> 2203542 (-0.78%) helped: 8017 HURT: 4 helped stats (abs) min: 1 max: 12 x̄: 2.17 x̃: 1 helped stats (rel) min: 0.07% max: 15.25% x̄: 1.40% x̃: 0.93% HURT stats (abs) min: 1 max: 7 x̄: 2.50 x̃: 1 HURT stats (rel) min: 0.33% max: 4.76% x̄: 2.73% x̃: 2.91% 95% mean confidence interval for instructions value: -2.21 -2.13 95% mean confidence interval for instructions %-change: -1.43% -1.37% Instructions are helped. total cycles in shared programs: 321221791 -> 321079870 (-0.04%) cycles in affected programs: 126886055 -> 126744134 (-0.11%) helped: 4674 HURT: 1729 helped stats (abs) min: 1 max: 23654 x̄: 56.47 x̃: 16 helped stats (rel) min: <.01% max: 53.22% x̄: 2.13% x̃: 1.05% HURT stats (abs) min: 1 max: 3694 x̄: 70.58 x̃: 18 HURT stats (rel) min: <.01% max: 63.06% x̄: 2.48% x̃: 0.90% 95% mean confidence interval for cycles value: -33.31 -11.02 95% mean confidence interval for cycles %-change: -0.99% -0.78% Cycles are helped. total spills in shared programs: 19872 -> 19874 (0.01%) spills in affected programs: 21 -> 23 (9.52%) helped: 0 HURT: 1 total fills in shared programs: 20941 -> 20941 (0.00%) fills in affected programs: 62 -> 62 (0.00%) helped: 1 HURT: 1 LOST: 0 GAINED: 8 Ivy Bridge total instructions in shared programs: 11875553 -> 11853839 (-0.18%) instructions in affected programs: 1553112 -> 1531398 (-1.40%) helped: 7304 HURT: 3 helped stats (abs) min: 1 max: 16 x̄: 2.97 x̃: 2 helped stats (rel) min: 0.07% max: 15.25% x̄: 1.62% x̃: 1.15% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 1.05% max: 3.33% x̄: 2.44% x̃: 2.94% 95% mean confidence interval for instructions value: -3.04 -2.90 95% mean confidence interval for instructions %-change: -1.65% -1.59% Instructions are helped. total cycles in shared programs: 178246425 -> 178184484 (-0.03%) cycles in affected programs: 13702146 -> 13640205 (-0.45%) helped: 4409 HURT: 1566 helped stats (abs) min: 1 max: 531 x̄: 24.52 x̃: 13 helped stats (rel) min: <.01% max: 38.67% x̄: 2.14% x̃: 1.02% HURT stats (abs) min: 1 max: 356 x̄: 29.48 x̃: 10 HURT stats (rel) min: <.01% max: 64.73% x̄: 1.87% x̃: 0.70% 95% mean confidence interval for cycles value: -11.60 -9.14 95% mean confidence interval for cycles %-change: -1.19% -0.99% Cycles are helped. LOST: 0 GAINED: 10 Sandy Bridge total instructions in shared programs: 10695740 -> 10667483 (-0.26%) instructions in affected programs: 2337607 -> 2309350 (-1.21%) helped: 10720 HURT: 1 helped stats (abs) min: 1 max: 49 x̄: 2.64 x̃: 2 helped stats (rel) min: 0.07% max: 20.00% x̄: 1.54% x̃: 1.13% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 1.04% max: 1.04% x̄: 1.04% x̃: 1.04% 95% mean confidence interval for instructions value: -2.69 -2.58 95% mean confidence interval for instructions %-change: -1.57% -1.51% Instructions are helped. total cycles in shared programs: 153478839 -> 153416223 (-0.04%) cycles in affected programs: 22050900 -> 21988284 (-0.28%) helped: 5342 HURT: 2200 helped stats (abs) min: 1 max: 1020 x̄: 20.34 x̃: 16 helped stats (rel) min: <.01% max: 24.05% x̄: 1.51% x̃: 0.86% HURT stats (abs) min: 1 max: 335 x̄: 20.93 x̃: 6 HURT stats (rel) min: <.01% max: 20.18% x̄: 1.03% x̃: 0.30% 95% mean confidence interval for cycles value: -9.18 -7.42 95% mean confidence interval for cycles %-change: -0.82% -0.71% Cycles are helped. Iron Lake total instructions in shared programs: 8114882 -> 8105574 (-0.11%) instructions in affected programs: 1232504 -> 1223196 (-0.76%) helped: 4109 HURT: 2 helped stats (abs) min: 1 max: 6 x̄: 2.27 x̃: 1 helped stats (rel) min: 0.05% max: 8.33% x̄: 0.99% x̃: 0.66% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.94% max: 4.35% x̄: 2.65% x̃: 2.65% 95% mean confidence interval for instructions value: -2.31 -2.21 95% mean confidence interval for instructions %-change: -1.01% -0.96% Instructions are helped. total cycles in shared programs: 188504036 -> 188466296 (-0.02%) cycles in affected programs: 31203798 -> 31166058 (-0.12%) helped: 3447 HURT: 36 helped stats (abs) min: 2 max: 92 x̄: 11.03 x̃: 8 helped stats (rel) min: <.01% max: 5.41% x̄: 0.21% x̃: 0.13% HURT stats (abs) min: 2 max: 30 x̄: 7.33 x̃: 6 HURT stats (rel) min: 0.01% max: 1.65% x̄: 0.18% x̃: 0.10% 95% mean confidence interval for cycles value: -11.16 -10.51 95% mean confidence interval for cycles %-change: -0.22% -0.20% Cycles are helped. LOST: 0 GAINED: 1 GM45 total instructions in shared programs: 4989697 -> 4984531 (-0.10%) instructions in affected programs: 703952 -> 698786 (-0.73%) helped: 2493 HURT: 2 helped stats (abs) min: 1 max: 6 x̄: 2.07 x̃: 1 helped stats (rel) min: 0.05% max: 8.33% x̄: 1.03% x̃: 0.66% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.95% max: 4.35% x̄: 2.65% x̃: 2.65% 95% mean confidence interval for instructions value: -2.13 -2.01 95% mean confidence interval for instructions %-change: -1.07% -0.99% Instructions are helped. total cycles in shared programs: 128929136 -> 128903886 (-0.02%) cycles in affected programs: 21583096 -> 21557846 (-0.12%) helped: 2214 HURT: 17 helped stats (abs) min: 2 max: 92 x̄: 11.44 x̃: 8 helped stats (rel) min: <.01% max: 5.41% x̄: 0.24% x̃: 0.13% HURT stats (abs) min: 2 max: 8 x̄: 4.24 x̃: 4 HURT stats (rel) min: 0.01% max: 1.65% x̄: 0.20% x̃: 0.09% 95% mean confidence interval for cycles value: -11.75 -10.88 95% mean confidence interval for cycles %-change: -0.25% -0.22% Cycles are helped. LOST: 1 GAINED: 1 Reviewed-by: Matt Turner <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1359>
* nir/algebraic: Change the default cursor location when replacing a unary opIan Romanick2020-04-011-1/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the expression tree that is being replaced has a unary operation at its root, set the cursor (location where new instructions are inserted) at the source instruction instead. This doesn't do much now because there are very few patterns that have a unary operation as the root. Almost all of the patterns that do have a unary operation as the root have inot. All of the shaders that are affected by this commit have expression trees with an inot at the root. This change prevents some significant, spurious caused by the next commit. There is further explanation in the large comment added in the code. I also considered a couple other options that may still be worth exploring. 1. Add some mark-up to the search pattern to denote where new instructions should be added. I considered using "@" to denote the cursor location. For example, (('fneg', ('fadd@', a, b)), ...) 2. To prevent other kinds of unintended code motion, add the ability to name expressions in the search pattern so that they can be reused in the replacement. For example, (('bcsel', ('ige', ('find_lsb=b', a), 0), ('find_lsb', a), -1), b), An alternative would be to add some kind of CSE at the time of inserting the replacements. Create a new instruction, then check to see if it already exists. That option might be better overall. Over the years I know Matt has heard me complain, "I added a pattern that just deleted an instruction, but it added a bunch of spills!" This was always in large, complex shaders that are very hard to analyze. I always blamed these cases on the scheduler being dumb. I am now very suspicious that unintended code motion was the real problem. All Gen4+ Intel platforms had similar results. (Tiger Lake shown) total instructions in shared programs: 17611405 -> 17611333 (<.01%) instructions in affected programs: 18613 -> 18541 (-0.39%) helped: 41 HURT: 13 helped stats (abs) min: 1 max: 18 x̄: 4.46 x̃: 4 helped stats (rel) min: 0.27% max: 5.68% x̄: 1.29% x̃: 1.34% HURT stats (abs) min: 1 max: 20 x̄: 8.54 x̃: 7 HURT stats (rel) min: 0.30% max: 4.20% x̄: 2.15% x̃: 2.38% 95% mean confidence interval for instructions value: -3.29 0.63 95% mean confidence interval for instructions %-change: -0.95% 0.02% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 338366118 -> 338365223 (<.01%) cycles in affected programs: 257889 -> 256994 (-0.35%) helped: 42 HURT: 15 helped stats (abs) min: 2 max: 120 x̄: 39.38 x̃: 34 helped stats (rel) min: 0.04% max: 2.55% x̄: 0.86% x̃: 0.76% HURT stats (abs) min: 6 max: 204 x̄: 50.60 x̃: 34 HURT stats (rel) min: 0.11% max: 4.75% x̄: 1.12% x̃: 0.56% 95% mean confidence interval for cycles value: -30.39 -1.02 95% mean confidence interval for cycles %-change: -0.66% -0.02% Cycles are helped. Reviewed-by: Matt Turner <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1359>
* intel/vec4: Allow late copy propagation on vec4Ian Romanick2020-04-011-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change incurs a small amount of hurt now, but it enables a lot of benefit on vec4 shaders on the next commit. nir_opt_algebraic_late converts dph, dot3, etc. to dhp_replicated, dot_replicated3, etc. In the process, it introduces extra moves. If the original NIR contained vec1 32 ssa_45 = fdot4 ssa_51, ssa_44 vec1 32 ssa_46 = fneg ssa_45 nir_opt_algebraic_late will produce vec4 32 ssa_18 = fdot_replicated4 ssa_1, ssa_15 vec1 32 ssa_19 = mov ssa_18.x vec1 32 ssa_17 = fneg ssa_19 The algebraic pass added in the next commit can't see through the move to know that the fneg applies to a fdot_replicated4. Haswell, Ivy Bridge, and Sandybridge had similar results. (Haswell shown) total cycles in shared programs: 187077604 -> 187079858 (<.01%) cycles in affected programs: 350132 -> 352386 (0.64%) helped: 174 HURT: 194 helped stats (abs) min: 2 max: 124 x̄: 23.60 x̃: 16 helped stats (rel) min: 0.12% max: 15.88% x̄: 4.98% x̃: 3.86% HURT stats (abs) min: 2 max: 164 x̄: 32.78 x̃: 16 HURT stats (rel) min: 0.17% max: 22.82% x̄: 6.46% x̃: 0.86% 95% mean confidence interval for cycles value: 2.04 10.21 95% mean confidence interval for cycles %-change: 0.17% 1.93% Cycles are HURT. No shader-db changes on any other Intel platform. Reviewed-by: Matt Turner <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1359>
* nir: fix crash in varying packing on interface mismatchTimothy Arceri2020-03-311-2/+22
| | | | | | | | | | | For example when the outputs are scalars but the inputs are struct members. Fixes: 26aa460940f6 ("nir: rewrite varying component packing") Reviewed-By: Timur Kristóf <[email protected]> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4351> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4351>
* freedreno/turnip: Use the NIR info to decide if we need helper invocations.Eric Anholt2020-03-313-1/+7
| | | | | | | | | | | | | | We had an approximation that was assuming any ddx or tex instruction needed helper invocations, but that's not true for texelFetch() or textureSize(). It also meant that we were setting PIXLOD on vertex and compute shaders doing texturing, which doesn't really make sense. shader-db (with a hack to log pixlod): total pixlod in shared programs: 582 -> 573 (-1.55%) Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2681 Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4308> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4308>
* freedreno: Drop an unnecessary include marked "this should go away"Eric Anholt2020-03-311-3/+0
| | | | | | | | It came in with the initial import, and doesn't seem to be necessary any more. Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4289> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4289>
* freedreno/ir3: fix android buildRob Clark2020-03-311-2/+2
| | | | | | | Fixes: e5339fe4a47 ("Move compiler.h and imports.h/c from src/mesa/main into src/util") Signed-off-by: Rob Clark <[email protected]> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4381> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4381>
* util: move ALIGN/ROUND_DOWN_TO to u_math.hRob Clark2020-03-312-46/+46
| | | | | | | | | These are less mesa specific than the rest of macros.h, and would be nice to use outside of mesa. Prep for next patch. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4381>
* spirv: Implement OpCopyObject and OpCopyLogical as blind copiesJason Ekstrand2020-03-311-3/+23
| | | | | | | | | | | | | | | | Because the types etc. are required to logically match, we can just copy-propagate the guts of the vtn_value. This was causing issues with some new CTS tests that are doing an OpCopyObject of a sampler which is a special-cased type in spirv_to_nir. Of course, this is only a partial solution. Ideally, we've got a bit of work to do to make all the composite stuff able to handle all types including images, sampler, and combined image/samplers but this gets some CTS tests passing. Cc: [email protected] Reviewed-by: Ian Romanick <[email protected]> Acked-by: Caio Marcelo de Oliveira Filho <[email protected]> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4375> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4375>
* isl: don't warn in physical extent calculation for yuv formatsLionel Landwerlin2020-03-312-2/+8
| | | | | | | | | | | | | | | Those format have correct descriptions already with the exception of the planar format. In that case we introduce an assert. This fine because we don't use the planar format in any of our drivers. There are restrictions on how the addresses of the 2 planes are relative to one another which make this annoying. The sampler is also more limited than what we can do with a shader snippet. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2999> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2999>
* isl: set bpb for Y8_UNORMLionel Landwerlin2020-03-311-1/+1
| | | | | | | | | | | | This isn't a format we use in any of the drivers but for consistency just give it a correct bpb. We also set the luminance in the G channel. We can't actually use this format with the 3D sampler (only media). Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2999>
* scons: prune unused Makefile.sourcesEric Engestrom2020-03-312-187/+0
| | | | | | | | | | Fixes: 2e92d3381988a85b2a6d ("scons: Prune out unnecessary targets.") Signed-off-by: Eric Engestrom <[email protected]> Reviewed by: Jose Fonseca <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4373> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4373>
* tu: Return the correct alignment for imagesConnor Abbott2020-03-312-11/+1
| | | | | | | | | The alignment field was never initialized, so we were just returning an alignment of 0. Return the alignment from fdl, and while we're here cleanup some leftovers in tu_private.h. Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4357> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4357>
* freedreno/fdl: Add base_alignConnor Abbott2020-03-312-13/+21
| | | | | | | | | | Tell users what the base address of the image needs to be aligned to. These values are based on experimentation via passing an offset to vkBindImageMemory with turnip and seeing if tests still pass. Note that r8g8 is also special in this regard, however it actually has an increased alignment (in bytes). Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4357>
* anv/allocator: Use util_dynarray for blocks in anv_state_streamJason Ekstrand2020-03-312-38/+22
| | | | | | | | | | | | | | | | | | | | | | When we originally wrote a bunch of the allocation data structures, we re-used the GPU memory for CPU-side data structures. It's a bit more memory efficient and usually ok. However, this has a couple of problems: 1. It makes it MUCH more likely that the GPU will accidentlly stomp CPU-side data structures and cause nearly impossible to debug crashes. 2. With discrete GPUs, the memory will be mapped somehow and that map may be across the BAR so it could have horribly slow CPU access. This is bad for our CPU-side data structures. In the case of anv_state_stream, it also made the data structure massively more complex than it needed to be. Reviewed-by: Lionel Landwerlin <[email protected]> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4336> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4336>
* anv: Account for the header in anv_state_stream_allocJason Ekstrand2020-03-311-2/+3
| | | | | | | | | | | If we have an allocation that's exactly the block size, we end up computing a new block size to allocate that's exactly the block size, add in the header, and then assert fail. When computing the block size, we need to account for the header. Fixes: 955127db937 "anv/allocator: Add support for large stream..." Reviewed-by: Lionel Landwerlin <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4336>
* st/mesa: add environment variable pin_app_thread for faster glthread on AMD ZenMarek Olšák2020-03-301-0/+10
| | | | | | Acked-by: Pierre-Eric Pelloux-Prayer <[email protected]> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4369> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4369>
* gallium/u_threaded: call the driver to pin threads to L3 immediatelyMarek Olšák2020-03-301-6/+14
| | | | | | | | This is thread-safe and we want it to be done immediately for good L3 cache usage. Acked-by: Pierre-Eric Pelloux-Prayer <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4369>
* lima: also check tiled and depth case when importQiang Yu2020-03-311-19/+20
| | | | | | | | | We missed the tiled and depth case when check buffer alignment. Reviewed-by: Vasily Khoruzhick <[email protected]> Signed-off-by: Qiang Yu <[email protected]> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4362> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4362>
* lima: fix buffer import with offsetQiang Yu2020-03-311-4/+15
| | | | | | | | | | | | | | | | | | | | With EGL_EXT_image_dma_buf_import, user can import dma_buf with offset. This is also used by AOSP GLConsumer::updateTexImage with HAL_PIXEL_FORMAT_YV12 buffer which store YUV planes in the same buffer with offset. Render sample from it using GL_OES_EGL_image_external. This should fix some video display problem when using MediaCodec soft decoding which generates HAL_PIXEL_FORMAT_YV12 buffer and render it on screen. Test program: https://github.com/yuq/gfx/tree/master/yuv2rgb/dma-buf Reviewed-by: Vasily Khoruzhick <[email protected]> Signed-off-by: Qiang Yu <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4362>
* pan/bi: Fix handling of constants with COMBINEAlyssa Rosenzweig2020-03-311-0/+6
| | | | | | | | | | We should never see COMBINE constants explicitly since they'll become moves anyway, so we can simplify that. On the other hand, we do need the type information for the lowering to work properly. Signed-off-by: Alyssa Rosenzweig <[email protected]> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382>
* pan/bi: Handle fp16/abs scheduling restrictionAlyssa Rosenzweig2020-03-313-3/+33
| | | | | | | | | See previous commit for the packing side. Here we update the scheduler to accomodate this. Note we don't actually hit this path yet, but it's good to be proactive. Signed-off-by: Alyssa Rosenzweig <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382>
* pan/bi: Handle abs packing for fp16/FMA add/minAlyssa Rosenzweig2020-03-312-3/+62
| | | | | | | | It's seriously quirky, and all to save a single bit. Alas. It also introduces an edge case for the scheduler which is a bit annoying. Signed-off-by: Alyssa Rosenzweig <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382>
* pan/bi: Handle core faddminmax16 packingAlyssa Rosenzweig2020-03-311-4/+33
| | | | | | | | This works without the exception of absolute values, which have some... odd properties to be handled in the next commit. Signed-off-by: Alyssa Rosenzweig <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382>
* pan/bi: Structify fadd/min/max16Alyssa Rosenzweig2020-03-311-0/+19
| | | | | | | There is some quirky encoding here. Signed-off-by: Alyssa Rosenzweig <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382>
* pan/bi: Add v2f16 versions of rounding opsAlyssa Rosenzweig2020-03-311-8/+16
| | | | | Signed-off-by: Alyssa Rosenzweig <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382>
* pan/bi: Handle round opcodes in frontendAlyssa Rosenzweig2020-03-311-0/+22
| | | | | | | These correspond to various ops routed through BI_ROUND Signed-off-by: Alyssa Rosenzweig <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382>
* pan/bi: Assert out i16 related converts for nowAlyssa Rosenzweig2020-03-311-3/+4
| | | | | | | Needs more investigation, and GLSL doesn't use it quite yet sadly. Signed-off-by: Alyssa Rosenzweig <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382>
* pan/bi: Add one-source f32->f16 opAlyssa Rosenzweig2020-03-311-1/+8
| | | | | | | | This really has a second op for vectorization but we don't handle this quite yet... Signed-off-by: Alyssa Rosenzweig <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382>
* pan/bi: Add bifrost_fma_2src genericAlyssa Rosenzweig2020-03-311-0/+6
| | | | | Signed-off-by: Alyssa Rosenzweig <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382>
* pan/bi: Handle standard FMA conversionsAlyssa Rosenzweig2020-03-311-3/+12
| | | | | | | These are plain old 1-sources so they're easy to start with. Signed-off-by: Alyssa Rosenzweig <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382>
* pan/bi: Enumerate conversionsAlyssa Rosenzweig2020-03-311-1/+51
| | | | | | | | There are lots of Bifrost conversion opcodes that can all be emitted from BI_CONVERT, let's pattern match. Signed-off-by: Alyssa Rosenzweig <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382>
* pan/bi: Expand out FMA conversion opcodesAlyssa Rosenzweig2020-03-312-0/+41
| | | | | | | | | | There are a *lot* of them, with lots of symmetry we can exploit to simplify the packing logic (but not entirely). Let's add the corresponding header structs/defines, although we don't actually poke the disassembler at this stage. Signed-off-by: Alyssa Rosenzweig <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382>
* pan/bi: Pack outmod and roundmode with FMAAlyssa Rosenzweig2020-03-311-0/+4
| | | | | | | The fields got missed. Signed-off-by: Alyssa Rosenzweig <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382>
* pan/bi: Add FMA16 packingAlyssa Rosenzweig2020-03-311-12/+51
| | | | | | | It's like the original FMA packing but with swizzles introduced. Signed-off-by: Alyssa Rosenzweig <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382>
* pan/bi: Fix missing type for fmulAlyssa Rosenzweig2020-03-311-0/+1
| | | | | | | | We add a zero argument, we want it to align with the size of whatever the other arguments were for optimization. Signed-off-by: Alyssa Rosenzweig <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382>
* pan/bi: Finish FMA structuresAlyssa Rosenzweig2020-03-311-4/+21
| | | | | | | | There were some missing fields for the 32-bit case, and the 16-bit case has separate packing. Signed-off-by: Alyssa Rosenzweig <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382>
* pan/bi: Ignore swizzle in unwritten componentAlyssa Rosenzweig2020-03-313-0/+13
| | | | | | | Otherwise we can trip the assert for no good reason. Signed-off-by: Alyssa Rosenzweig <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382>
* pan/bi: Handle f2f* opcodesAlyssa Rosenzweig2020-03-311-0/+4
| | | | | | | Just more converts that got missed. Signed-off-by: Alyssa Rosenzweig <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382>
* panfrost: Enable PIPE_SHADER_CAP_FP16 on BifrostAlyssa Rosenzweig2020-03-312-2/+10
| | | | | | | | We don't have fp16 implemented on Midgard yet but on Bifrost we can flip it on now. Signed-off-by: Alyssa Rosenzweig <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382>
* pan/bi: Enable precision lowering in standalone compilerAlyssa Rosenzweig2020-03-311-1/+2
| | | | | | | ..since there's no CAP to guide here. Signed-off-by: Alyssa Rosenzweig <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382>
* pan/bi: Fix off-by-one in scoreboarding packingAlyssa Rosenzweig2020-03-312-6/+5
| | | | | | | Clauses actually encode the *next* clauses' dependencies. Signed-off-by: Alyssa Rosenzweig <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382>
* pan/bi: Fix overzealous write barriersAlyssa Rosenzweig2020-03-311-1/+4
| | | | | | | It's possible this triggers an INSTR_INVALID_ENC. Signed-off-by: Alyssa Rosenzweig <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382>