diff options
author | Ian Romanick <[email protected]> | 2019-07-10 16:28:38 -0700 |
---|---|---|
committer | Ian Romanick <[email protected]> | 2019-08-14 11:15:32 -0700 |
commit | 73aaeac0a3a46e441953ef6beab665af18180eec (patch) | |
tree | b080a6e8226c644ca97a60c778f4100a839b5d52 | |
parent | ff2225cf88d8773c599277362306a4bcf4ba6b01 (diff) |
nir/algebraic: Reassociate add-and-shift to be shift-and-add
A common thing in many shaders:
uniform vs { vec4 bones[...]; };
...
x = some_calculation(bones[i + 0]);
y = some_calculation(bones[i + 1]);
z = some_calculation(bones[i + 2]);
This turns into stuff like
vec1 32 ssa_12 = iadd ssa_11, ssa_0
vec1 32 ssa_13 = ishl ssa_12, ssa_3
vec1 32 ssa_14 = intrinsic load_ssbo (ssa_7, ssa_13) (16, 4, 0)
vec1 32 ssa_15 = iadd ssa_11, ssa_1
vec1 32 ssa_16 = ishl ssa_15, ssa_3
vec1 32 ssa_17 = intrinsic load_ssbo (ssa_7, ssa_16) (16, 4, 0)
vec1 32 ssa_18 = iadd ssa_11, ssa_2
vec1 32 ssa_19 = ishl ssa_18, ssa_3
vec1 32 ssa_20 = intrinsic load_ssbo (ssa_7, ssa_19) (16, 4, 0)
By reassociating the shift and the add, we can reduce this to
vec1 32 ssa_12 = ishl ssa_11, ssa_3
vec1 32 ssa_13 = iadd ssa_12, ssa_0
vec1 32 ssa_14 = intrinsic load_ssbo (ssa_7, ssa_13) (16, 4, 0)
vec1 32 ssa_16 = iadd ssa_12, ssa_1
vec1 32 ssa_17 = intrinsic load_ssbo (ssa_7, ssa_16) (16, 4, 0)
vec1 32 ssa_19 = iadd ssa_12, ssa_2
vec1 32 ssa_20 = intrinsic load_ssbo (ssa_7, ssa_19) (16, 4, 0)
v2: Add some commentary from Rhys Perry's nearly identical patch.
All Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 16277758 -> 16250704 (-0.17%)
instructions in affected programs: 1440284 -> 1413230 (-1.88%)
helped: 4920
HURT: 6
helped stats (abs) min: 1 max: 69 x̄: 5.50 x̃: 4
helped stats (rel) min: 0.10% max: 18.33% x̄: 2.21% x̃: 1.79%
HURT stats (abs) min: 1 max: 12 x̄: 4.50 x̃: 3
HURT stats (rel) min: 0.18% max: 3.23% x̄: 1.91% x̃: 2.55%
95% mean confidence interval for instructions value: -5.67 -5.31
95% mean confidence interval for instructions %-change: -2.26% -2.16%
Instructions are helped.
total cycles in shared programs: 367118526 -> 365895358 (-0.33%)
cycles in affected programs: 93504145 -> 92280977 (-1.31%)
helped: 2754
HURT: 1269
helped stats (abs) min: 1 max: 47039 x̄: 460.66 x̃: 16
helped stats (rel) min: <.01% max: 34.93% x̄: 3.77% x̃: 1.12%
HURT stats (abs) min: 1 max: 1500 x̄: 35.85 x̃: 9
HURT stats (rel) min: 0.01% max: 17.35% x̄: 2.18% x̃: 0.75%
95% mean confidence interval for cycles value: -387.31 -220.78
95% mean confidence interval for cycles %-change: -2.11% -1.68%
Cycles are helped.
LOST: 1
GAINED: 1
Reviewed-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
-rw-r--r-- | src/compiler/nir/nir_opt_algebraic.py | 5 |
1 files changed, 5 insertions, 0 deletions
diff --git a/src/compiler/nir/nir_opt_algebraic.py b/src/compiler/nir/nir_opt_algebraic.py index a3a244c41fb..bf5c69314c5 100644 --- a/src/compiler/nir/nir_opt_algebraic.py +++ b/src/compiler/nir/nir_opt_algebraic.py @@ -225,6 +225,11 @@ optimizations = [ # a * (#b << #c) (('ishl', ('imul', a, '#b'), '#c'), ('imul', a, ('ishl', b, c))), + # This is common for address calculations. Reassociating may enable the + # 'a<<c' to be CSE'd. It also helps architectures that have an ISHLADD + # instruction or a constant offset field for in load / store instructions. + (('ishl', ('iadd', a, '#b'), '#c'), ('iadd', ('ishl', a, c), ('ishl', b, c))), + # Comparison simplifications (('~inot', ('flt', a, b)), ('fge', a, b)), (('~inot', ('fge', a, b)), ('flt', a, b)), |