diff options
author | Ian Romanick <[email protected]> | 2018-08-19 12:42:05 -0700 |
---|---|---|
committer | Ian Romanick <[email protected]> | 2019-05-06 22:52:29 -0700 |
commit | ab86926156053cbc78a2be995db6b1f01c44c6c0 (patch) | |
tree | 96728a97aff57ce9ac7678d36ab9f48871daeb78 /src/compiler/nir/nir_opt_algebraic.py | |
parent | c995d1ca3a3f14c2e6823ecdad90e7bb03e70c41 (diff) |
nir/algebraic: Reassociate open-coded flrp(1, b, c)
In a previous verion of this patch, Jason commented,
"Re-associating based on whether or not something has a constant
value of 1.0 seems a bit sneaky. I think it's well within the rules
but it seems like something that could bite you."
That is possibly true. The reassociation will generate different
results if fabs(b) >= 2**24 and fabs(c) < 0.5. The delta increases as
fabs(c) approaches 0.
However, i965 has done this same reassociation indirectly for years.
We would previously allow nir_op_flrp on all pre-Gen11 hardware even
though Gen4 and Gen5 do not have a LRP instruction. Optimizations in
nir_opt_algebraic would convert expressions like a+c(b-a) into flrp(a,
b, c). On Gen7+, the hardware performs the same arithmetic as
a(1-c)+bc. Gen6 seems to implement LRP as a+c(b-a). On Gen4 and
Gen5, we would lower LRP to a sequence of instructions that implement
a(1-c)+bc. The lowering happens after all constant folding, so we
would litterally generate a 1+(-1) instruction sequence in this
scenario: one instruction to load either 1 or -1 in a register, and
another instruction to add either -1 or 1 to it.
This patch just cuts out the middle man. Do the reassociation that
we've always done, but do it explicitly at a time when we can benefit
from other optimizations.
A few cases that were hurt by "nir: Lower flrp(±1, b, c) and flrp(a,
±1, c) differently" are restored by this patch. This includes a few
shaders in ET:QW.
I tried a similar thing for open-coded flrp(-1, b, c), and it hurt
instructions on 35 shaders for ILK without helping any. The helped /
hurt cycles was about even.
No changes on any other Intel platforms.
Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8172020 -> 8164367 (-0.09%)
instructions in affected programs: 1089851 -> 1082198 (-0.70%)
helped: 3285
HURT: 64
helped stats (abs) min: 1 max: 6 x̄: 2.35 x̃: 2
helped stats (rel) min: 0.13% max: 12.00% x̄: 1.15% x̃: 0.83%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 0.24% max: 0.64% x̄: 0.39% x̃: 0.38%
95% mean confidence interval for instructions value: -2.32 -2.25
95% mean confidence interval for instructions %-change: -1.16% -1.09%
Instructions are helped.
total cycles in shared programs: 188758338 -> 188719974 (-0.02%)
cycles in affected programs: 20004922 -> 19966558 (-0.19%)
helped: 3012
HURT: 477
helped stats (abs) min: 2 max: 142 x̄: 13.41 x̃: 12
helped stats (rel) min: 0.01% max: 6.37% x̄: 0.52% x̃: 0.24%
HURT stats (abs) min: 2 max: 328 x̄: 4.27 x̃: 4
HURT stats (rel) min: <.01% max: 1.55% x̄: 0.14% x̃: 0.11%
95% mean confidence interval for cycles value: -11.38 -10.62
95% mean confidence interval for cycles %-change: -0.46% -0.41%
Cycles are helped.
Reviewed-by: Matt Turner <[email protected]>
Diffstat (limited to 'src/compiler/nir/nir_opt_algebraic.py')
-rw-r--r-- | src/compiler/nir/nir_opt_algebraic.py | 3 |
1 files changed, 3 insertions, 0 deletions
diff --git a/src/compiler/nir/nir_opt_algebraic.py b/src/compiler/nir/nir_opt_algebraic.py index 6379a399431..0d708d4fe1a 100644 --- a/src/compiler/nir/nir_opt_algebraic.py +++ b/src/compiler/nir/nir_opt_algebraic.py @@ -1107,6 +1107,9 @@ late_optimizations = [ (('b2f(is_used_more_than_once)', ('inot', 'a@1')), ('bcsel', a, 0.0, 1.0)), (('fneg(is_used_more_than_once)', ('b2f', ('inot', 'a@1'))), ('bcsel', a, -0.0, -1.0)), + (('~fadd@32', 1.0, ('fmul(is_used_once)', c , ('fadd', b, -1.0 ))), ('fadd', ('fadd', 1.0, ('fneg', c)), ('fmul', b, c)), 'options->lower_flrp32'), + (('~fadd@64', 1.0, ('fmul(is_used_once)', c , ('fadd', b, -1.0 ))), ('fadd', ('fadd', 1.0, ('fneg', c)), ('fmul', b, c)), 'options->lower_flrp64'), + # we do these late so that we don't get in the way of creating ffmas (('fmin', ('fadd(is_used_once)', '#c', a), ('fadd(is_used_once)', '#c', b)), ('fadd', c, ('fmin', a, b))), (('fmax', ('fadd(is_used_once)', '#c', a), ('fadd(is_used_once)', '#c', b)), ('fadd', c, ('fmax', a, b))), |