nir/algebraic: Recognize open-coded flrp(a, b, fsat(c))

All Gen6+ GPUs had similar results. (Skylake shown) total instructions in shared programs: 15336712 -> 15336622 (<.01%) instructions in affected programs: 3952 -> 3862 (-2.28%) helped: 24 HURT: 0 helped stats (abs) min: 3 max: 5 x̄: 3.75 x̃: 4 helped stats (rel) min: 1.75% max: 2.70% x̄: 2.34% x̃: 2.46% 95% mean confidence interval for instructions value: -4.06 -3.44 95% mean confidence interval for instructions %-change: -2.47% -2.22% Instructions are helped. total cycles in shared programs: 355722052 -> 355721235 (<.01%) cycles in affected programs: 27326 -> 26509 (-2.99%) helped: 20 HURT: 4 helped stats (abs) min: 1 max: 227 x̄: 44.75 x̃: 14 helped stats (rel) min: 0.12% max: 22.95% x̄: 3.83% x̃: 1.23% HURT stats (abs) min: 2 max: 64 x̄: 19.50 x̃: 6 HURT stats (rel) min: 0.21% max: 3.63% x̄: 1.24% x̃: 0.55% 95% mean confidence interval for cycles value: -61.61 -6.47 95% mean confidence interval for cycles %-change: -5.59% -0.39% Cycles are helped. No changes on Ice Lake, Iron Lake, or GM45. Reviewed-by: Matt Turner <[email protected]>
author: Ian Romanick <[email protected]> 2019-03-12 17:43:38 -0700
committer: Ian Romanick <[email protected]> 2019-05-14 11:38:21 -0700
commit: 3b747909419d35b4d23def90ba3c49a79b404170 (patch)
tree: 66aff7c020f2a2e8eef34142aff6634bb0b957c8 /src/compiler
parent: a79570099bddecbd927b87c43e51b9a2c77c4833 (diff)
1 files changed, 12 insertions, 3 deletions
diff --git a/src/compiler/nir/nir_opt_algebraic.py b/src/compiler/nir/nir_opt_algebraic.py
index 837ac5fd6bf..bdf432be09d 100644
--- a/src/compiler/nir/nir_opt_algebraic.py
+++ b/src/compiler/nir/nir_opt_algebraic.py
@@ -147,9 +147,18 @@ optimizations = [
    (('fadd', a, ('fneg', ('ffract', a))), ('ffloor', a), '!options->lower_ffloor'),
    (('ffract', a), ('fsub', a, ('ffloor', a)), 'options->lower_ffract'),
    (('fceil', a), ('fneg', ('ffloor', ('fneg', a))), 'options->lower_fceil'),
-   (('~fadd', ('fmul', a, ('fadd', 1.0, ('fneg', ('b2f', 'c@1')))), ('fmul', b, ('b2f', c))), ('bcsel', c, b, a), 'options->lower_flrp32'),
-   (('~fadd@32', ('fmul', a, ('fadd', 1.0, ('fneg',         c ))), ('fmul', b,         c )), ('flrp', a, b, c), '!options->lower_flrp32'),
-   (('~fadd@64', ('fmul', a, ('fadd', 1.0, ('fneg',         c ))), ('fmul', b,         c )), ('flrp', a, b, c), '!options->lower_flrp64'),
+   (('~fadd',    ('fmul', a,          ('fadd', 1.0, ('fneg', ('b2f', 'c@1')))), ('fmul', b, ('b2f',  c))), ('bcsel', c, b, a), 'options->lower_flrp32'),
+   (('~fadd@32', ('fmul', a,          ('fadd', 1.0, ('fneg',          c   ) )), ('fmul', b,          c )), ('flrp', a, b, c), '!options->lower_flrp32'),
+   (('~fadd@64', ('fmul', a,          ('fadd', 1.0, ('fneg',          c   ) )), ('fmul', b,          c )), ('flrp', a, b, c), '!options->lower_flrp64'),
+   # These are the same as the previous three rules, but it depends on
+   # 1-fsat(x) <=> fsat(1-x):
+   #
+   # If x >= 0 and x <= 1: fsat(1 - x) == 1 - fsat(x) trivially
+   # If x < 0: 1 - fsat(x) => 1 - 0 => 1 and fsat(1 - x) => fsat(> 1) => 1
+   # If x > 1: 1 - fsat(x) => 1 - 1 => 0 and fsat(1 - x) => fsat(< 0) => 0
+   (('~fadd@32', ('fmul', a, ('fsat', ('fadd', 1.0, ('fneg',          c   )))), ('fmul', b, ('fsat', c))), ('flrp', a, b, ('fsat', c)), '!options->lower_flrp32'),
+   (('~fadd@64', ('fmul', a, ('fsat', ('fadd', 1.0, ('fneg',          c   )))), ('fmul', b, ('fsat', c))), ('flrp', a, b, ('fsat', c)), '!options->lower_flrp64'),
+
    (('~fadd', a, ('fmul', ('b2f', 'c@1'), ('fadd', b, ('fneg', a)))), ('bcsel', c, b, a), 'options->lower_flrp32'),
    (('~fadd@32', a, ('fmul',         c , ('fadd', b, ('fneg', a)))), ('flrp', a, b, c), '!options->lower_flrp32'),
    (('~fadd@64', a, ('fmul',         c , ('fadd', b, ('fneg', a)))), ('flrp', a, b, c), '!options->lower_flrp64'),
author	Ian Romanick <[email protected]>	2019-03-12 17:43:38 -0700
committer	Ian Romanick <[email protected]>	2019-05-14 11:38:21 -0700
commit	3b747909419d35b4d23def90ba3c49a79b404170 (patch)
tree	66aff7c020f2a2e8eef34142aff6634bb0b957c8 /src/compiler
parent	a79570099bddecbd927b87c43e51b9a2c77c4833 (diff)