diff options
author | Jason Ekstrand <[email protected]> | 2016-12-09 22:32:08 -0800 |
---|---|---|
committer | Jason Ekstrand <[email protected]> | 2017-01-06 16:44:29 -0800 |
commit | 45912fb908f7a1d2efbce0f1dbe81e5bc975fbe1 (patch) | |
tree | f29612fecd38e068a9dee0a8260d33b7629f54b6 /m4/ax_prog_flex.m4 | |
parent | 62332d139c8f6deb7fd8b72a48b34b4b652df7c1 (diff) |
i965/compiler: Use the new nir_opt_copy_prop_vars pass
We run this after nir_lower_vars_to_ssa so that as many load/store_var
intrinsics as possible before copy_prop_vars executes. This is because the
pass isn't particularly efficient (it does a lot of linear walks of a
linked list) so we'd like as much of the work as possible to be done before
copy_prop_vars runs.
Shader DB results on Sky Lake:
total instructions in shared programs: 12020290 -> 12013627 (-0.06%)
instructions in affected programs: 26033 -> 19370 (-25.59%)
helped: 16
HURT: 13
total cycles in shared programs: 137772848 -> 137549012 (-0.16%)
cycles in affected programs: 6955660 -> 6731824 (-3.22%)
helped: 217
HURT: 237
total loops in shared programs: 3208 -> 3208 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0
total spills in shared programs: 4112 -> 4057 (-1.34%)
spills in affected programs: 483 -> 428 (-11.39%)
helped: 2
HURT: 0
total fills in shared programs: 5519 -> 5102 (-7.56%)
fills in affected programs: 993 -> 576 (-41.99%)
helped: 2
HURT: 0
LOST: 0
GAINED: 0
Broadwell had similar results. On older hardware, the impact isn't as
large because they don't advertise GL 4.5. Of the hurt programs, all but
one are hurt by a single instruction and the one is hurt by 3 instructions.
All of the helped programs, on the other hand, are helped by at least 3
instructions and one kerbal space program shader is helped by 44.59%.
The real star of the show, however, is the Gl43CSDof synmark2 benchmark
which has two shaders which are cut by 28% and 40% and the over-all runtime
performance of the benchmark on my Sky Lake laptop is improved by around
25-30% (it's a bit hard to be exact due to thermal throttling).
Reviewed-by: Timothy Arceri <[email protected]>
Diffstat (limited to 'm4/ax_prog_flex.m4')
0 files changed, 0 insertions, 0 deletions