author	Jason Ekstrand <[email protected]>	2019-03-07 15:01:37 -0600
committer	Jason Ekstrand <[email protected]>	2019-03-12 15:34:06 +0000
commit6d5d89d25a0a4299dbfcbfeca71b6c7e65ef3d45 (patch)
tree47ae49a676ffc842f2fc0139a55720acf423a61e /src/intel
parent5ef2b8f1f2ebcdb4ffe5c98b3f4f48e584cb4b22 (diff)
intel/nir: Vectorize all IO
The IO scalarization pass that we run to help with linking ends up turning some shader I/O, such as that for tessellation and geometry shaders, into many scalar URB operations rather than one vector operation. To alleviate this, we now vectorize the I/O once again. This fixes a 10% performance regression in the GfxBench tessellation test that was caused by scalarizing.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15224023 -> 15220871 (-0.02%)
    instructions in affected programs: 342009 -> 338857 (-0.92%)
    helped: 1236
    HURT: 443

    total spills in shared programs: 23471 -> 23465 (-0.03%)
    spills in affected programs: 6 -> 0
    helped: 1
    HURT: 0

    total fills in shared programs: 31770 -> 31766 (-0.01%)
    fills in affected programs: 4 -> 0
    helped: 1
    HURT: 0

Cycles saw a lot of churn due to moves ending up in different places. Most of the pure churn in instructions was +/- one or two instructions in fragment shaders.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107510
Fixes: 4434591bf56a "intel/nir: Call nir_lower_io_to_scalar_early"
Fixes: 8d8222461f9d "intel/nir: Enable nir_opt_find_array_copies"
Reviewed-by: Connor Abbott <[email protected]>
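For illustration only (this sketch is not from the commit; the NIR textual form is simplified and the SSA names are hypothetical), the effect of nir_lower_io_to_vector on scalarized output stores is roughly:

```
/* Before: scalarization left four scalar URB writes,
 * one per component of the output.
 */
intrinsic store_output (ssa_1) (base=0, component=0)
intrinsic store_output (ssa_2) (base=0, component=1)
intrinsic store_output (ssa_3) (base=0, component=2)
intrinsic store_output (ssa_4) (base=0, component=3)

/* After: the components are gathered and written as one
 * vector URB operation with a write-mask.
 */
vec4 32 ssa_5 = vec4 ssa_1, ssa_2, ssa_3, ssa_4
intrinsic store_output (ssa_5) (base=0, component=0, wrmask=xyzw)
```

As the commit message notes, the write-masked stores this produces are only handled by the back-end for TCS outputs, which is why the diff below runs nir_lower_io_to_temporaries (plus copy-cleanup passes) on all other producer stages.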
Diffstat (limited to 'src/intel')
-rw-r--r--src/intel/compiler/brw_nir.c17
1 file changed, 17 insertions, 0 deletions
diff --git a/src/intel/compiler/brw_nir.c b/src/intel/compiler/brw_nir.c
index ae239de6117..ba90a8392dc 100644
--- a/src/intel/compiler/brw_nir.c
+++ b/src/intel/compiler/brw_nir.c
@@ -788,6 +788,23 @@ brw_nir_link_shaders(const struct brw_compiler *compiler,
*producer = brw_nir_optimize(*producer, compiler, p_is_scalar, false);
*consumer = brw_nir_optimize(*consumer, compiler, c_is_scalar, false);
}
+
+ NIR_PASS_V(*producer, nir_lower_io_to_vector, nir_var_shader_out);
+ NIR_PASS_V(*consumer, nir_lower_io_to_vector, nir_var_shader_in);
+
+ if ((*producer)->info.stage != MESA_SHADER_TESS_CTRL) {
+ /* Calling lower_io_to_vector creates output variable writes with
+ * write-masks. On non-TCS outputs, the back-end can't handle it and we
+ * need to call nir_lower_io_to_temporaries to get rid of them. This,
+ * in turn, creates temporary variables and extra copy_deref intrinsics
+ * that we need to clean up.
+ */
+ NIR_PASS_V(*producer, nir_lower_io_to_temporaries,
+ nir_shader_get_entrypoint(*producer), true, false);
+ NIR_PASS_V(*producer, nir_lower_global_vars_to_local);
+ NIR_PASS_V(*producer, nir_split_var_copies);
+ NIR_PASS_V(*producer, nir_lower_var_copies);
+ }
}
/* Prepare the given shader for codegen