summaryrefslogtreecommitdiffstats
path: root/src/intel/compiler/brw_fs.cpp
diff options
context:
space:
mode:
authorJason Ekstrand <[email protected]>2019-05-09 10:01:06 -0500
committerJason Ekstrand <[email protected]>2019-05-14 12:30:22 -0500
commit41b310e2196a89a2fdd05509f8160b207d0e4d9b (patch)
tree064d99bfb58f79b8b7f9109fbf0dfa35e6a3f11d /src/intel/compiler/brw_fs.cpp
parente1511f1d4c3949b107fda284ea9117632b9ef61d (diff)
util/ra: Improve the performance of ra_simplify
The most expensive part of register allocation is the ra_simplify step which is a fixed-point algorithm with a worst-case complexity of O(n^2) which adds the registers to a stack which we then use later to do the actual allocation. This commit uses bit sets and changes the core loop of ra_simplify to first walk 32-node chunks and then walk each chunk. This lets us skip whole 32-node chunks in one go based on bit operations and compute the minimum q value potentially 32x as fast. Of course, the algorithm still has the same fundamental O(n^2) run-time but the constant is now much lower. In the nasty Aztec Ruins compute shader, this shaves a full four seconds off the 30s compile time for a release build of mesa. In a debug build (needed for accurate stack traces), perf says that ra_select takes 20% of runtime before this patch and only 5-6% of runtime after this patch. It also makes shader-db runs faster. Shader-db results on Kaby Lake: total instructions in shared programs: 15311100 -> 15311100 (0.00%) instructions in affected programs: 0 -> 0 helped: 0 HURT: 0 total cycles in shared programs: 355468050 -> 355468050 (0.00%) cycles in affected programs: 0 -> 0 helped: 0 HURT: 0 Total CPU time (seconds): 2602.37 -> 2524.31 (-3.00%) Reviewed-by: Eric Anholt <[email protected]>
Diffstat (limited to 'src/intel/compiler/brw_fs.cpp')
0 files changed, 0 insertions, 0 deletions