i965/nir: Run the ffma peephole after the rest of the optimizations

The idea here is that fusing multiply-add combinations too early can
reduce our ability to perform CSE and value-numbering.  Instead, we split
ffma opcodes up-front, hope CSE cleans up, and then fuse after the fact.
Unless an algebraic pass does something silly and inserts something
between the multiply and the add, splitting and re-fusing should never
cause a problem.  We run the late algebraic optimizations after this so
that things like compare-with-zero don't hurt our ability to fuse things.

shader-db results for fragment shaders on Haswell:

   total instructions in shared programs: 4390538 -> 4379236 (-0.26%)
   instructions in affected programs:     989359 -> 978057 (-1.14%)
   helped:                                5308
   HURT:                                  97
   GAINED:                                78
   LOST:                                  5

This does, unfortunately, cause some substantial hurt to a shader in
Kerbal Space Program.  However, the damage is caused by changing a single
instruction from an ffma to an add.  This, in turn, *increases* register
pressure in one part of the program, causing it to fail to register
allocate and spill.  Given the overwhelmingly positive results in other
shaders and the fact that the NIR for the Kerbal shaders is actually
better, this should be considered a positive.

Reviewed-by: Matt Turner <[email protected]>
author: Jason Ekstrand <[email protected]> 2015-03-23 15:08:31 -0700
committer: Jason Ekstrand <[email protected]> 2015-04-01 12:51:04 -0700
commit: 37703040a142da6bc7c458479a70e35118e10e6b (patch)
tree: a374db9eb3199a20212d86b63ce8609ab1367499 /src
parent: 7f344721b1a94a6166b53f959ff6b159af3b5f9a (diff)
2 files changed, 11 insertions, 2 deletions
diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c
index ed6fdffd265..21c8bd331b5 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -558,6 +558,11 @@ brw_initialize_context_constants(struct brw_context *brw)
 
    static const nir_shader_compiler_options gen6_nir_options = {
       .native_integers = true,
+      /* In order to help allow for better CSE at the NIR level we tell NIR
+       * to split all ffma instructions during opt_algebraic and we then
+       * re-combine them as a later step.
+       */
+      .lower_ffma = true,
    };
 
    /* We want the GLSL compiler to emit code that uses condition codes */
diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
index 4f4b74620fe..94641cf2ec1 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
@@ -52,8 +52,6 @@ nir_optimize(nir_shader *nir)
       nir_validate_shader(nir);
       progress |= nir_opt_algebraic(nir);
       nir_validate_shader(nir);
-      progress |= nir_opt_peephole_ffma(nir);
-      nir_validate_shader(nir);
       progress |= nir_opt_constant_folding(nir);
       nir_validate_shader(nir);
       progress |= nir_opt_remove_phis(nir);
@@ -149,6 +147,12 @@ fs_visitor::emit_nir_code()
 
    nir_optimize(nir);
 
+   if (brw->gen >= 6) {
+      /* Try and fuse multiply-adds */
+      nir_opt_peephole_ffma(nir);
+      nir_validate_shader(nir);
+   }
+
    nir_opt_algebraic_late(nir);
    nir_validate_shader(nir);