| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3970>
|
|
|
|
| |
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3686>
|
|
|
|
|
|
|
|
|
|
| |
This fixes a build failure on MSVC.
BTW, it looks like clang supports _Pragma() but I don't know if it
understands the "gcc unroll N" directive.
Signed-off-by: Brian Paul <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
All of the tables are static const, so they only need to be validated
once. As noted in the previous commit, the compiler should be able to
eliminate all of this code when the assertions would pass. Even with
the help of the previous commit, this does not always occur.
-Og: -95.688 +/- 3.91935 (-24.9562% +/- 1.0222%) N=5
-O1: No difference proven at 95.0% confidence. N=5
-O2: -1.962 +/- 0.85001 (-0.860013% +/- 0.372589%) N=5
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I was pretty liberal with these assertions when I wrote this code
because I had assumed that GCC would unroll the loops, inline the look ups
of static const arrays with now constant indices, and then elmininate
all the actuall assertions. It seems none of this happens even at -O3.
Adding the pragmas helps encourage loop unrolling at some optimization
levels. I tested by running shader-db with NIR_VALIDATE=false on a Core
i7 Haswell desktop system.
-Og: No difference proven at 95.0% confidence. N=5
-O1: -48.304 +/- 1.221 (-16.3343% +/- 0.412888%) N=5
-O2: -49.94 +/- 1.23521 (-17.9634% +/- 0.444303%) N=5
v2: Add a _Pragma to an inner loop that was accidentally dropped during
a rebase.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This lets us memoize range analysis work across instructions. Reduces
runtime of shader-db on Intel by -30.0288% +/- 2.1693% (n=3).
Fixes: 405de7ccb6cb ("nir/range-analysis: Rudimentary value range analysis pass")
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
All Gen7+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 16328255 -> 16315391 (-0.08%)
instructions in affected programs: 218318 -> 205454 (-5.89%)
helped: 988
HURT: 0
helped stats (abs) min: 1 max: 72 x̄: 13.02 x̃: 10
helped stats (rel) min: 0.33% max: 16.04% x̄: 6.27% x̃: 4.88%
95% mean confidence interval for instructions value: -13.69 -12.35
95% mean confidence interval for instructions %-change: -6.55% -5.99%
Instructions are helped.
total cycles in shared programs: 363683977 -> 363615417 (-0.02%)
cycles in affected programs: 1475193 -> 1406633 (-4.65%)
helped: 923
HURT: 36
helped stats (abs) min: 1 max: 624 x̄: 75.78 x̃: 48
helped stats (rel) min: 0.08% max: 13.89% x̄: 5.20% x̃: 5.08%
HURT stats (abs) min: 1 max: 179 x̄: 38.58 x̃: 4
HURT stats (rel) min: 0.06% max: 16.56% x̄: 3.33% x̃: 0.29%
95% mean confidence interval for cycles value: -75.88 -67.10
95% mean confidence interval for cycles %-change: -5.10% -4.66%
Cycles are helped.
Sandy Bridge
total instructions in shared programs: 10785779 -> 10785654 (<.01%)
instructions in affected programs: 13855 -> 13730 (-0.90%)
helped: 67
HURT: 0
helped stats (abs) min: 1 max: 15 x̄: 1.87 x̃: 1
helped stats (rel) min: 0.20% max: 3.45% x̄: 0.97% x̃: 0.78%
95% mean confidence interval for instructions value: -2.47 -1.26
95% mean confidence interval for instructions %-change: -1.13% -0.81%
Instructions are helped.
total cycles in shared programs: 153704799 -> 153704481 (<.01%)
cycles in affected programs: 101509 -> 101191 (-0.31%)
helped: 38
HURT: 13
helped stats (abs) min: 1 max: 38 x̄: 12.53 x̃: 16
helped stats (rel) min: 0.07% max: 2.69% x̄: 0.87% x̃: 0.53%
HURT stats (abs) min: 1 max: 36 x̄: 12.15 x̃: 7
HURT stats (rel) min: 0.06% max: 2.53% x̄: 0.73% x̃: 0.44%
95% mean confidence interval for cycles value: -10.24 -2.24
95% mean confidence interval for cycles %-change: -0.75% -0.17%
Cycles are helped.
LOST: 2
GAINED: 0
No shader-db change on Iron Lake or GM45.
|
|
|
|
|
|
|
| |
This allows the reslut of mov and bcsel to be separately interpreted as
float or int depending on the use.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some shaders are hurt by this change because now a
load_const(0x00000000) is not recognized as eq_zero when loaded as a
float. This behavior is restored in a later patch (nir/range-analysis:
Use types to provide better ranges from bcsel and mov).
v2: Add a comment about reinterpretation of int/uint/bool. Suggested by
Caio. Rewrite condition the check for types being float versus checking
for types not being all the things that aren't float.
Fixes: 405de7ccb6c ("nir/range-analysis: Rudimentary value range analysis pass")
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
All Gen7+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 16327543 -> 16328255 (<.01%)
instructions in affected programs: 55928 -> 56640 (1.27%)
helped: 0
HURT: 208
HURT stats (abs) min: 1 max: 16 x̄: 3.42 x̃: 3
HURT stats (rel) min: 0.33% max: 6.74% x̄: 1.31% x̃: 1.12%
95% mean confidence interval for instructions value: 3.06 3.79
95% mean confidence interval for instructions %-change: 1.17% 1.46%
Instructions are HURT.
total cycles in shared programs: 363682759 -> 363683977 (<.01%)
cycles in affected programs: 325758 -> 326976 (0.37%)
helped: 44
HURT: 133
helped stats (abs) min: 1 max: 179 x̄: 33.61 x̃: 5
helped stats (rel) min: 0.06% max: 14.21% x̄: 2.47% x̃: 0.29%
HURT stats (abs) min: 1 max: 157 x̄: 20.28 x̃: 14
HURT stats (rel) min: 0.07% max: 14.44% x̄: 1.42% x̃: 0.73%
95% mean confidence interval for cycles value: 0.38 13.39
95% mean confidence interval for cycles %-change: -0.06% 0.96%
Inconclusive result (%-change mean confidence interval includes 0).
Sandy Bridge
total instructions in shared programs: 10787433 -> 10787443 (<.01%)
instructions in affected programs: 1842 -> 1852 (0.54%)
helped: 0
HURT: 10
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 0.33% max: 1.85% x̄: 0.73% x̃: 0.49%
95% mean confidence interval for instructions value: 1.00 1.00
95% mean confidence interval for instructions %-change: 0.36% 1.10%
Instructions are HURT.
total cycles in shared programs: 153724543 -> 153724563 (<.01%)
cycles in affected programs: 8407 -> 8427 (0.24%)
helped: 1
HURT: 3
helped stats (abs) min: 18 max: 18 x̄: 18.00 x̃: 18
helped stats (rel) min: 0.98% max: 0.98% x̄: 0.98% x̃: 0.98%
HURT stats (abs) min: 4 max: 18 x̄: 12.67 x̃: 16
HURT stats (rel) min: 0.21% max: 0.75% x̄: 0.56% x̃: 0.72%
95% mean confidence interval for cycles value: -21.31 31.31
95% mean confidence interval for cycles %-change: -1.11% 1.46%
Inconclusive result (value mean confidence interval includes 0).
No shader-db changes on Iron Lake or GM45.
|
|
|
|
|
|
|
|
|
| |
It's only correct when 'a' is an integral greater or equal to 0.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111493
Fixes: 5544b2cbbd2 ("nir/algebraic: Use value range analysis to eliminate useless unary ops")
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
| |
v2: Update several of the comments. Drop some redundant uses of
ASSERT_UNION_OF_OTHERS_MATCHES_UNKNOWN_*_SOURCE source. Suggested by
Caio.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Suggested-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
One shader from Metro Last Light and the rest from Rochard. In the
Rochard cases, something like:
min(1.0, max(pow(saturate(x), y), z))
was transformed to
saturate(max(pow(saturate(x), y), z))
because the result of the pow must be >= 0.
The Metro Last Light case was similar. An instance of
min(pow(abs(x), y), 1.0)
became
saturate(pow(abs(x), y))
v2: Fix some comments. Suggested by Caio.
v3: Fix setting is_intgral when the exponent might be negative. See
also Mesa MR !1778.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
All Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 16280670 -> 16280659 (<.01%)
instructions in affected programs: 1130 -> 1119 (-0.97%)
helped: 11
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.72% max: 1.43% x̄: 1.03% x̃: 0.97%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -1.19% -0.86%
Instructions are helped.
total cycles in shared programs: 367168430 -> 367168270 (<.01%)
cycles in affected programs: 10281 -> 10121 (-1.56%)
helped: 10
HURT: 1
helped stats (abs) min: 16 max: 18 x̄: 17.00 x̃: 17
helped stats (rel) min: 1.31% max: 2.43% x̄: 1.79% x̃: 1.70%
HURT stats (abs) min: 10 max: 10 x̄: 10.00 x̃: 10
HURT stats (rel) min: 3.10% max: 3.10% x̄: 3.10% x̃: 3.10%
95% mean confidence interval for cycles value: -20.06 -9.04
95% mean confidence interval for cycles %-change: -2.36% -0.32%
Cycles are helped.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I discovered this while looking at a shader that was hurt by some other
work I'm doing. When I examined the changes, I was confused that one
instance of a comparison that was used in a discard_if was (incorrectly)
eliminated, while another instance used by a bcsel was (correctly) not
eliminated. I had to use NIR_PRINT=true to see exactly where things
when wrong.
A bunch of shaders in Goat Simulator, Dungeon Defenders, Sanctum 2, and
Strike Suit Zero were impacted.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Fixes: 405de7ccb6c ("nir/range-analysis: Rudimentary value range analysis pass")
All Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 16280659 -> 16281075 (<.01%)
instructions in affected programs: 21042 -> 21458 (1.98%)
helped: 0
HURT: 136
HURT stats (abs) min: 1 max: 9 x̄: 3.06 x̃: 3
HURT stats (rel) min: 1.16% max: 6.12% x̄: 2.23% x̃: 2.03%
95% mean confidence interval for instructions value: 2.93 3.19
95% mean confidence interval for instructions %-change: 2.08% 2.37%
Instructions are HURT.
total cycles in shared programs: 367168270 -> 367170313 (<.01%)
cycles in affected programs: 172020 -> 174063 (1.19%)
helped: 14
HURT: 111
helped stats (abs) min: 2 max: 80 x̄: 21.21 x̃: 9
helped stats (rel) min: 0.10% max: 4.47% x̄: 1.35% x̃: 0.79%
HURT stats (abs) min: 2 max: 584 x̄: 21.08 x̃: 5
HURT stats (rel) min: 0.12% max: 17.28% x̄: 1.55% x̃: 0.40%
95% mean confidence interval for cycles value: 5.41 27.28
95% mean confidence interval for cycles %-change: 0.64% 1.81%
Cycles are HURT.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Found by inspection. I tried really, really hard to make a test case
that would trigger this problem, but I was unsuccesful. It's very hard
to get an instruction to produce a ne_zero result without ne_zero
sources. The most plausible way is using bcsel. That proves
problematic because bcsel interprets its sources as integers, so it
cannot currently be used to "clean" values for floating point
instructions.
No shader-db changes on any Intel platform.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Fixes: 405de7ccb6c ("nir/range-analysis: Rudimentary value range analysis pass")
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
flush-to-zero
Fixes piglit tests (new in piglit!110):
- fs-underflow-fma-compare-zero.shader_test
- fs-underflow-mul-compare-zero.shader_test
v2: Add back part of comment accidentally deleted. Noticed by
Caio. Remove is_not_zero function as it is no longer used.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111308
Fixes: fa116ce357b ("nir/range-analysis: Range tracking for ffma and flrp")
Fixes: 405de7ccb6c ("nir/range-analysis: Rudimentary value range analysis pass")
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
All Gen7+ platforms** had similar results. (Ice Lake shown)
total instructions in shared programs: 16278465 -> 16279492 (<.01%)
instructions in affected programs: 16765 -> 17792 (6.13%)
helped: 0
HURT: 23
HURT stats (abs) min: 7 max: 275 x̄: 44.65 x̃: 8
HURT stats (rel) min: 1.15% max: 17.51% x̄: 4.23% x̃: 1.62%
95% mean confidence interval for instructions value: 9.57 79.74
95% mean confidence interval for instructions %-change: 1.85% 6.61%
Instructions are HURT.
total cycles in shared programs: 367135159 -> 367154270 (<.01%)
cycles in affected programs: 279306 -> 298417 (6.84%)
helped: 0
HURT: 23
HURT stats (abs) min: 13 max: 6029 x̄: 830.91 x̃: 54
HURT stats (rel) min: 0.17% max: 45.67% x̄: 7.33% x̃: 0.49%
95% mean confidence interval for cycles value: 100.89 1560.94
95% mean confidence interval for cycles %-change: 0.94% 13.71%
Cycles are HURT.
total spills in shared programs: 8870 -> 8869 (-0.01%)
spills in affected programs: 19 -> 18 (-5.26%)
helped: 1
HURT: 0
total fills in shared programs: 21904 -> 21901 (-0.01%)
fills in affected programs: 81 -> 78 (-3.70%)
helped: 1
HURT: 0
LOST: 0
GAINED: 1
** On Broadwell, a shader was hurt for spills / fills instead of
helped.
No changes on any earlier platforms.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes piglit tests (new in piglit!110):
- fs-underflow-exp2-compare-zero.shader_test
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111308
Fixes: 405de7ccb6c ("nir/range-analysis: Rudimentary value range analysis pass")
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Most of the shaders affected are, unsurprisingly, in Unigine Heaven.
All Gen6+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 16278207 -> 16278465 (<.01%)
instructions in affected programs: 11374 -> 11632 (2.27%)
helped: 0
HURT: 58
HURT stats (abs) min: 2 max: 13 x̄: 4.45 x̃: 4
HURT stats (rel) min: 0.54% max: 4.11% x̄: 2.42% x̃: 2.82%
95% mean confidence interval for instructions value: 3.77 5.13
95% mean confidence interval for instructions %-change: 2.19% 2.64%
Instructions are HURT.
total cycles in shared programs: 367134284 -> 367135159 (<.01%)
cycles in affected programs: 81207 -> 82082 (1.08%)
helped: 17
HURT: 36
helped stats (abs) min: 6 max: 356 x̄: 90.35 x̃: 6
helped stats (rel) min: 0.69% max: 21.45% x̄: 5.71% x̃: 0.78%
HURT stats (abs) min: 4 max: 235 x̄: 66.97 x̃: 16
HURT stats (rel) min: 0.35% max: 27.58% x̄: 5.34% x̃: 1.09%
95% mean confidence interval for cycles value: -20.36 53.38
95% mean confidence interval for cycles %-change: -1.08% 4.67%
Inconclusive result (value mean confidence interval includes 0).
No changes on any earlier platforms.
|
|
|
|
|
| |
Tested-by: Rob Clark <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
| |
A similar technique could be used for fmin3, fmax3, and fmid3.
This could be squashed with the previous commit. I kept it separate to
ease review.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This could be squashed with the previous commit. I kept it separate to
ease review.
v2: Add some missing cases. Use nir_src_is_const helper. Both
suggested by Caio. Use a table for mapping source ranges to a result
range.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This could be squashed with the previous commit. I kept it separate to
ease review.
v2: Use a switch statement and add more comments. Both suggested by
Caio.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
Most integer operations are omitted because dealing with integer
overflow is hard. There are a few things that could be smarter if there
was a small amount more tracking of ranges of integer types (i.e.,
operands are Boolean, operand values fit in 16 bits, etc.).
The changes to nir_search_helpers.h are included in this patch to
simplify reordering the changes to nir_opt_algebraic.py.
v2: Memoize range analysis results. Without this, some shaders appear
to get stuck in infinite loops.
v3: Rebase on many months of Mesa changes, including 1-bit Boolean
changes.
v4: Rebase on "nir: Drop imov/fmov in favor of one mov instruction".
v5: Use nir_alu_srcs_equal for detecting (a*a). Previously just the SSA
value was compared, and this incorrectly matched (a.x*a.y).
v6: Many code improvements including (but not limited to) better names,
more comments, and better use of helper functions. All suggested by
Caio. Rework the handling of several opcodes to use a table for mapping
source ranges to a result range. This change fixed a bug that caused
fmax(gt_zero, ge_zero) to be incorrectly recognized as ge_zero.
Slightly tighten the range of fmul by recognizing that x*x is gt_zero if
x is gt_zero. Add similar handling for -x*x.
v7: Use _______ in the tables as an alias for unknown. Suggested by
Caio.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|