diff options
author | Jason Ekstrand <[email protected]> | 2019-05-09 10:01:06 -0500 |
---|---|---|
committer | Jason Ekstrand <[email protected]> | 2019-05-14 12:30:22 -0500 |
commit | 41b310e2196a89a2fdd05509f8160b207d0e4d9b (patch) | |
tree | 064d99bfb58f79b8b7f9109fbf0dfa35e6a3f11d /src/intel/compiler/brw_fs.cpp | |
parent | e1511f1d4c3949b107fda284ea9117632b9ef61d (diff) |
util/ra: Improve the performance of ra_simplify
The most expensive part of register allocation is the ra_simplify step
which is a fixed-point algorithm with a worst-case complexity of O(n^2)
which adds the registers to a stack which we then use later to do the
actual allocation. This commit uses bit sets and changes the core loop
of ra_simplify to first walk 32-node chunks and then walk each chunk.
This lets us skip whole 32-node chunks in one go based on bit operations
and compute the minimum q value potentially 32x as fast. Of course, the
algorithm still has the same fundamental O(n^2) run-time but the
constant is now much lower.
In the nasty Aztec Ruins compute shader, this shaves a full four seconds
off the 30s compile time for a release build of mesa. In a debug build
(needed for accurate stack traces), perf says that ra_select takes 20%
of runtime before this patch and only 5-6% of runtime after this patch.
It also makes shader-db runs faster.
Shader-db results on Kaby Lake:
total instructions in shared programs: 15311100 -> 15311100 (0.00%)
instructions in affected programs: 0 -> 0
helped: 0
HURT: 0
total cycles in shared programs: 355468050 -> 355468050 (0.00%)
cycles in affected programs: 0 -> 0
helped: 0
HURT: 0
Total CPU time (seconds): 2602.37 -> 2524.31 (-3.00%)
Reviewed-by: Eric Anholt <[email protected]>
Diffstat (limited to 'src/intel/compiler/brw_fs.cpp')
0 files changed, 0 insertions, 0 deletions