i965/nir: do int64 lowering before optimization

Otherwise loop unrolling will fail to see the actual cost of the unrolling operations when the loop body contains 64-bit integer instructions, especially when the divmod64 lowering applies, since that lowering is quite expensive.

Without this change, some in-development CTS tests for int64 get stuck forever trying to register allocate a shader with over 50K SSA values. The large number of SSA values is the result of NIR first unrolling multiple seemingly simple loops that involve int64 instructions, only to then lower these instructions and produce a massive pile of code (due to the divmod64 lowering in the unrolled instructions).

With this change, loop unrolling will see the loops with the int64 code already lowered and will realize that they are too expensive to unroll.

v2: Run nir_opt_algebraic first so we can hopefully get rid of some of the int64 instructions before we even attempt to lower them.

Reviewed-by: Matt Turner <[email protected]>
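
The ordering problem described above can be illustrated with a toy cost model. This is only a sketch and not the heuristic NIR uses: the `toy_loop` struct, the `body_cost` and `should_unroll` helpers, and both constants are hypothetical and chosen for illustration. It assumes an unroller that compares (estimated body size × trip count) against a fixed budget, and that a 64-bit div/mod counts as one instruction before lowering but expands to many afterwards.

```c
#include <stdbool.h>

/* Hypothetical numbers for illustration only; not NIR's real limits. */
#define UNROLL_BUDGET       96  /* max instructions allowed after unrolling */
#define LOWERED_DIVMOD_COST 60  /* assumed size of one lowered 64-bit div/mod */

/* Toy description of a loop body (hypothetical, not a NIR type). */
struct toy_loop {
   unsigned trip_count;       /* known iteration count */
   unsigned simple_instrs;    /* ordinary ALU instructions in the body */
   unsigned divmod64_instrs;  /* 64-bit div/mod instructions in the body */
};

/* Estimated size of one iteration. Before lowering, each 64-bit div/mod
 * looks like a single instruction; after lowering, its real size shows.
 */
static unsigned
body_cost(const struct toy_loop *loop, bool int64_lowered)
{
   unsigned per_divmod = int64_lowered ? LOWERED_DIVMOD_COST : 1;
   return loop->simple_instrs + loop->divmod64_instrs * per_divmod;
}

/* Unroll only if the fully unrolled loop fits in the budget. */
static bool
should_unroll(const struct toy_loop *loop, bool int64_lowered)
{
   return body_cost(loop, int64_lowered) * loop->trip_count <= UNROLL_BUDGET;
}
```

Under these assumed numbers, a loop with 8 iterations, 4 simple instructions and 2 64-bit divisions is estimated at 48 instructions before lowering, so it gets unrolled and later expands. Estimated after lowering, the same loop costs 992 instructions and is left alone.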
author: Iago Toral Quiroga <[email protected]> 2017-12-01 13:46:23 +0100
committer: Iago Toral Quiroga <[email protected]> 2018-02-06 07:49:27 +0100
commit: 1d20001d9711a7ea06f42292d3ed545d7ca0f50c (patch)
tree: b1ac5c33ce199a5695601cb27c5f887992bae87c /src/intel/compiler
parent: 02a6d901eee188492af54e98c92680a607b02bf8 (diff)
1 files changed, 12 insertions, 4 deletions
diff --git a/src/intel/compiler/brw_nir.c b/src/intel/compiler/brw_nir.c
index dbddef0d04d..287bd4908d9 100644
--- a/src/intel/compiler/brw_nir.c
+++ b/src/intel/compiler/brw_nir.c
@@ -634,6 +634,18 @@ brw_preprocess_nir(const struct brw_compiler *compiler, nir_shader *nir)
 
    OPT(nir_split_var_copies);
 
+   /* Run opt_algebraic before int64 lowering so we can hopefully get rid
+    * of some int64 instructions.
+    */
+   OPT(nir_opt_algebraic);
+
+   /* Lower int64 instructions before nir_optimize so that loop unrolling
+    * sees their actual cost.
+    */
+   nir_lower_int64(nir, nir_lower_imul64 |
+                        nir_lower_isign64 |
+                        nir_lower_divmod64);
+
    nir = brw_nir_optimize(nir, compiler, is_scalar);
 
    if (is_scalar) {
@@ -661,10 +673,6 @@ brw_preprocess_nir(const struct brw_compiler *compiler, nir_shader *nir)
       brw_nir_no_indirect_mask(compiler, nir->info.stage);
    nir_lower_indirect_derefs(nir, indirect_mask);
 
-   nir_lower_int64(nir, nir_lower_imul64 |
-                        nir_lower_isign64 |
-                        nir_lower_divmod64);
-
    /* Get rid of split copies */
    nir = brw_nir_optimize(nir, compiler, is_scalar);