path: root/src/intel/compiler/brw_fs_validate.cpp
author Jason Ekstrand <[email protected]> 2019-05-09 14:44:16 -0500
committer Jason Ekstrand <[email protected]> 2019-05-14 12:30:22 -0500
commit 621232694176ea83752505643b106c8d1c719893
tree 1d38d58f72eff779ea151b52ba4718b528b7ab99 /src/intel/compiler/brw_fs_validate.cpp
parent 41b310e2196a89a2fdd05509f8160b207d0e4d9b
intel/fs: Stop doing extra RA calls

In the last phase of the schedule and RA loop, the RA call is redundant if we spill. Immediately afterwards, we're going to see that we couldn't allocate without spilling and call back into RA and tell it to go ahead and spill. We've known about it for a while but we've always brushed over it on the theory that, if you're going to spill, you'll be calling RA a bunch anyway and what does one extra RA hurt? As it turns out, it hurts more than you'd expect. Because the RA interference graph gets sparser with each spill and the RA algorithm is more efficient on sparser graphs, the RA call that we're duplicating is actually the most expensive call in the RA-and-spill loop.

There's another extra RA call we do that's a bit harder to see which this also removes. If we try to compile a shader that isn't the minimum dispatch width and it fails to allocate without spilling we call fail() to set an error but then go ahead and do the first spilling RA pass and only after that's complete do we detect the fail and bail out. By making minimum dispatch widths part of the spill condition, we side-step this problem.

Getting rid of these extra spills takes the compile time of a nasty Aztec Ruins shader from about 28 seconds to about 26 seconds on my laptop. It also makes shader-db 1.5% faster.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15311100 -> 15311100 (0.00%)
    instructions in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    total cycles in shared programs: 355468050 -> 355468050 (0.00%)
    cycles in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    Total CPU time (seconds): 2524.31 -> 2486.63 (-1.49%)

Reviewed-by: Kenneth Graunke <[email protected]>
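
The two redundant RA calls described in the message can be illustrated with a toy model. This is only a sketch built from the description above, not the Mesa code: `ToyRA`, `kScheduleModes`, `ra_calls_before` and `ra_calls_after` are hypothetical names, and the allocator is reduced to a counter that succeeds once enough registers have been spilled.

```cpp
// Toy stand-in for the register allocator (hypothetical, not Mesa code).
struct ToyRA {
    int spills_needed;  // spills required before allocation can succeed
    int calls;          // number of RA invocations so far

    explicit ToyRA(int n) : spills_needed(n), calls(0) {}

    // One RA attempt. If it fails and spilling is allowed, spill one register.
    bool assign_regs(bool allow_spill) {
        ++calls;
        if (spills_needed == 0) return true;
        if (allow_spill) --spills_needed;
        return false;
    }
};

const int kScheduleModes = 3;  // assumed number of scheduling heuristics

// Flow before the change, as the message describes it: every schedule mode
// tries RA without spilling, then a separate spilling loop starts over.
int ra_calls_before(int spills_needed, bool at_min_width) {
    ToyRA ra(spills_needed);
    bool allocated = false;
    for (int i = 0; i < kScheduleModes && !allocated; ++i)
        allocated = ra.assign_regs(false);
    if (!allocated) {
        bool failed = !at_min_width;      // the error is only recorded here
        while (!ra.assign_regs(true)) {   // first spilling pass runs regardless
            if (failed) break;            // failure is noticed only afterwards
        }
    }
    return ra.calls;
}

// Flow after the change: the last schedule mode is allowed to spill, but only
// at the minimum dispatch width, so no RA call is repeated or wasted.
int ra_calls_after(int spills_needed, bool at_min_width) {
    ToyRA ra(spills_needed);
    bool allocated = false;
    for (int i = 0; i < kScheduleModes && !allocated; ++i) {
        bool can_spill = (i == kScheduleModes - 1) && at_min_width;
        allocated = ra.assign_regs(can_spill);
    }
    if (!allocated && at_min_width) {
        while (!ra.assign_regs(true)) {}  // keep spilling until RA succeeds
    }
    return ra.calls;
}
```

In this model each flow saves exactly one call when allocation does not succeed without spilling: at the minimum dispatch width the duplicated non-spilling call on the last schedule mode disappears, and at wider dispatch widths the spilling pass that ran before the failure was detected disappears. The model does not capture the cost difference the message points out, namely that the removed call operates on the densest interference graph and is therefore the most expensive one.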
Diffstat (limited to 'src/intel/compiler/brw_fs_validate.cpp')
0 files changed, 0 insertions, 0 deletions