intel/fs: Stop doing extra RA calls

In the last phase of the schedule-and-RA loop, the RA call is redundant if we spill. Immediately afterwards, we see that we couldn't allocate without spilling, so we call back into RA and tell it to go ahead and spill. We've known about this for a while, but we've always brushed over it on the theory that, if you're going to spill, you'll be calling RA a bunch anyway, so what does one extra RA call hurt?

As it turns out, it hurts more than you'd expect. The RA interference graph gets sparser with each spill, and the RA algorithm is more efficient on sparser graphs. That makes the RA call we're duplicating the most expensive call in the whole RA-and-spill loop.

This also removes another extra RA call that is a bit harder to see. Suppose we compile a shader at a dispatch width above the minimum and it fails to allocate without spilling. We call fail() to set an error, but then we go ahead and do the first spilling RA pass anyway. Only after that pass completes do we notice the failure and bail out. Making the minimum dispatch width part of the spill condition side-steps this problem.

Getting rid of these extra RA calls takes the compile time of a nasty Aztec Ruins shader from about 28 seconds to about 26 seconds on my laptop. It also makes shader-db 1.5% faster.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15311100 -> 15311100 (0.00%)
    instructions in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    total cycles in shared programs: 355468050 -> 355468050 (0.00%)
    cycles in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    Total CPU time (seconds): 2524.31 -> 2486.63 (-1.49%)

Reviewed-by: Kenneth Graunke <[email protected]>
author: Jason Ekstrand <[email protected]> 2019-05-09 14:44:16 -0500
committer: Jason Ekstrand <[email protected]> 2019-05-14 12:30:22 -0500
commit: 621232694176ea83752505643b106c8d1c719893
tree: 1d38d58f72eff779ea151b52ba4718b528b7ab99 /src/intel/compiler
parent: 41b310e2196a89a2fdd05509f8160b207d0e4d9b
1 files changed, 27 insertions, 19 deletions
diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index 66ee7605bea..f9fbffca7ce 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -7246,7 +7246,7 @@ fs_visitor::fixup_3src_null_dest()
 void
 fs_visitor::allocate_registers(unsigned min_dispatch_width, bool allow_spilling)
 {
-   bool allocated_without_spills;
+   bool allocated;
 
    static const enum instruction_scheduler_mode pre_modes[] = {
       SCHEDULE_PRE,
@@ -7265,15 +7265,28 @@ fs_visitor::allocate_registers(unsigned min_dispatch_width, bool allow_spilling)
 
       if (0) {
          assign_regs_trivial();
-         allocated_without_spills = true;
-      } else {
-         allocated_without_spills = assign_regs(false, spill_all);
+         allocated = true;
+         break;
       }
-      if (allocated_without_spills)
+
+      /* We only allow spilling for the last schedule mode and only if the
+       * allow_spilling parameter and dispatch width work out ok.
+       */
+      bool can_spill = allow_spilling &&
+                       (i == ARRAY_SIZE(pre_modes) - 1) &&
+                       dispatch_width == min_dispatch_width;
+
+      /* We should only spill registers on the last scheduling. */
+      assert(!spilled_any_registers);
+
+      do {
+         allocated = assign_regs(can_spill, spill_all);
+      } while (!allocated && can_spill && !failed);
+      if (allocated)
          break;
    }
 
-   if (!allocated_without_spills) {
+   if (!allocated) {
       if (!allow_spilling)
          fail("Failure to register allocate and spilling is not allowed.");
 
@@ -7284,21 +7297,16 @@ fs_visitor::allocate_registers(unsigned min_dispatch_width, bool allow_spilling)
       if (dispatch_width > min_dispatch_width) {
          fail("Failure to register allocate.  Reduce number of "
               "live scalar values to avoid this.");
-      } else {
-         compiler->shader_perf_log(log_data,
-                                   "%s shader triggered register spilling.  "
-                                   "Try reducing the number of live scalar "
-                                   "values to improve performance.\n",
-                                   stage_name);
       }
 
-      /* Since we're out of heuristics, just go spill registers until we
-       * get an allocation.
-       */
-      while (!assign_regs(true, spill_all)) {
-         if (failed)
-            break;
-      }
+      /* If we failed to allocate, we must have a reason */
+      assert(failed);
+   } else if (spilled_any_registers) {
+      compiler->shader_perf_log(log_data,
+                                "%s shader triggered register spilling.  "
+                                "Try reducing the number of live scalar "
+                                "values to improve performance.\n",
+                                stage_name);
    }
 
    /* This must come after all optimization and register allocation, since