author | Jason Ekstrand <[email protected]> | 2019-03-07 15:01:37 -0600 |
---|---|---|
committer | Dylan Baker <[email protected]> | 2019-03-12 09:27:29 -0700 |
commit | 8d43691b269f94f88e469ff5eb15fa9e00be47c6 (patch) | |
tree | 84fcddb24c39a60c683ef74c828d6c70a822dfe8 | |
parent | 47db151b9b8222fee45f5185018f01c51493f61a (diff) |
intel/nir: Vectorize all IO
The IO scalarization pass that we run to help with linking ends up
turning some shader I/O such as that for tessellation and geometry
shaders into many scalar URB operations rather than one vector operation. To
alleviate this, we now vectorize the I/O once again. This fixes a 10%
performance regression in the GfxBench tessellation test that was caused
by scalarizing.
Shader-db results on Kaby Lake:
total instructions in shared programs: 15224023 -> 15220871 (-0.02%)
instructions in affected programs: 342009 -> 338857 (-0.92%)
helped: 1236
HURT: 443
total spills in shared programs: 23471 -> 23465 (-0.03%)
spills in affected programs: 6 -> 0
helped: 1
HURT: 0
total fills in shared programs: 31770 -> 31766 (-0.01%)
fills in affected programs: 4 -> 0
helped: 1
HURT: 0
Cycle counts were mostly churn due to moves landing in different places.
Most of the pure churn in instructions was +/- one or two instructions in
fragment shaders.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107510
Fixes: 4434591bf56a "intel/nir: Call nir_lower_io_to_scalar_early"
Fixes: 8d8222461f9d "intel/nir: Enable nir_opt_find_array_copies"
Reviewed-by: Connor Abbott <[email protected]>
(cherry picked from commit 6d5d89d25a0a4299dbfcbfeca71b6c7e65ef3d45)
-rw-r--r-- | src/intel/compiler/brw_nir.c | 17 |
1 files changed, 17 insertions, 0 deletions
diff --git a/src/intel/compiler/brw_nir.c b/src/intel/compiler/brw_nir.c
index bc1e2a5ef53..bc3abef709a 100644
--- a/src/intel/compiler/brw_nir.c
+++ b/src/intel/compiler/brw_nir.c
@@ -829,6 +829,23 @@ brw_nir_link_shaders(const struct brw_compiler *compiler,
       *producer = brw_nir_optimize(*producer, compiler, p_is_scalar, false);
       *consumer = brw_nir_optimize(*consumer, compiler, c_is_scalar, false);
    }
+
+   NIR_PASS_V(*producer, nir_lower_io_to_vector, nir_var_shader_out);
+   NIR_PASS_V(*consumer, nir_lower_io_to_vector, nir_var_shader_in);
+
+   if ((*producer)->info.stage != MESA_SHADER_TESS_CTRL) {
+      /* Calling lower_io_to_vector creates output variable writes with
+       * write-masks. On non-TCS outputs, the back-end can't handle it and we
+       * need to call nir_lower_io_to_temporaries to get rid of them. This,
+       * in turn, creates temporary variables and extra copy_deref intrinsics
+       * that we need to clean up.
+       */
+      NIR_PASS_V(*producer, nir_lower_io_to_temporaries,
+                 nir_shader_get_entrypoint(*producer), true, false);
+      NIR_PASS_V(*producer, nir_lower_global_vars_to_local);
+      NIR_PASS_V(*producer, nir_split_var_copies);
+      NIR_PASS_V(*producer, nir_lower_var_copies);
+   }
 }
 
 /* Prepare the given shader for codegen
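
For readers unfamiliar with the write-mask issue mentioned in the comment above, the following is a rough toy model in plain C of what re-vectorizing scalar I/O amounts to. It does not use any Mesa or NIR API: `struct scalar_write`, `struct vector_write`, and `merge_scalar_writes` are hypothetical names invented for this sketch, and the real pass operates on NIR variables and intrinsics rather than on arrays like these.

```c
#include <stdio.h>

/* Hypothetical toy type, not a NIR structure: a write to one component
 * of an output slot, as left behind by I/O scalarization. */
struct scalar_write {
   unsigned location;   /* output slot index */
   unsigned component;  /* 0..3 for x, y, z, w */
   float value;
};

/* Hypothetical toy type: one vec4 write.  Bit i of write_mask is set
 * when component i is actually written. */
struct vector_write {
   unsigned location;
   unsigned write_mask;
   float value[4];
};

/* Merge scalar writes that target the same location into a single
 * vector write per location.  The caller must provide room for up to
 * `count` entries in `out`.  Returns the number of vector writes. */
static unsigned
merge_scalar_writes(const struct scalar_write *in, unsigned count,
                    struct vector_write *out)
{
   unsigned num_out = 0;

   for (unsigned i = 0; i < count; i++) {
      struct vector_write *dst = NULL;

      /* Look for a vector write already created for this location. */
      for (unsigned j = 0; j < num_out; j++) {
         if (out[j].location == in[i].location) {
            dst = &out[j];
            break;
         }
      }

      /* None yet: start a new one with an empty write-mask. */
      if (dst == NULL) {
         dst = &out[num_out++];
         dst->location = in[i].location;
         dst->write_mask = 0;
         for (unsigned c = 0; c < 4; c++)
            dst->value[c] = 0.0f;
      }

      dst->value[in[i].component] = in[i].value;
      dst->write_mask |= 1u << in[i].component;
   }

   return num_out;
}
```

In this model, three scalar writes to components x, y, and z of one slot collapse into a single vector write whose mask is 0x7 rather than 0xf. That partial mask is the toy analogue of the write-masked output writes that, per the comment in the diff, the back-end cannot handle for non-TCS stages and that the commit removes by routing outputs through temporaries.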