From 3ee3024804f9817dfa4f9ee4fa3d6b963a84c9cb Mon Sep 17 00:00:00 2001
From: Caio Marcelo de Oliveira Filho
Date: Wed, 27 Mar 2019 15:07:59 -0700
Subject: intel/fs: Add support for CS to group invocations in quads

When using quads, instead of mapping the elements to the next 4 local
invocation indices, we map the two next in the "current" row and the two
next in the "next" row. A side effect is that a thread will execute the
indices in a different order.

We now perform the lowering of both local invocation ID and index
together -- and no longer rely on the lowering done by
nir_lower_system_values. That is convenient when doing the math for
quads, because we need X and Y to get the right invocation index.

When the pass progresses, fold the constants and clean up to reduce
the noise from the indexing math.

This implements the derivative_group_quadsNV semantics from
NV_compute_shader_derivatives.

v2: Take subgroup_id into account, otherwise only values in the first
    subgroup would be used. (Jason)

v3: Calculate invocation index and ID together, to avoid duplicating
    some math in the quads case when both index and ID are used. (Jason)

v4: Don't call cleanup passes as part of the lowering, leave that to
    the call site. (Jason)
    Change calculation to use fewer instructions. (Jason)

Reviewed-by: Ian Romanick (v3)
Reviewed-by: Jason Ekstrand
---
 src/intel/compiler/brw_fs.cpp | 5 +++++
 1 file changed, 5 insertions(+)

(limited to 'src/intel/compiler/brw_fs.cpp')

diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index 0c2439d9daf..a637ee3422f 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -8017,6 +8017,11 @@ compile_cs_to_nir(const struct brw_compiler *compiler,
    nir_shader *shader = nir_shader_clone(mem_ctx, src_shader);
    shader = brw_nir_apply_sampler_key(shader, compiler, &key->tex, true);
    brw_nir_lower_cs_intrinsics(shader, dispatch_width);
+
+   /* Clean up after the local index and ID calculations. */
+   nir_opt_constant_folding(shader);
+   nir_opt_dce(shader);
+
    return brw_postprocess_nir(shader, compiler, true);
 }

--
cgit v1.2.3
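
Illustration (not part of the patch): the quad grouping described in the
commit message can be sketched in standalone C++. This is only a sketch
under stated assumptions, not the code of brw_nir_lower_cs_intrinsics,
which is outside this file-limited view. The names QuadInvocation and
quad_map are hypothetical, the workgroup width is assumed to be even, and
Z is ignored. "linear" stands for the position of a channel in execution
order, which presumably corresponds to subgroup_id * dispatch_width +
channel in the real pass.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: X/Y of the local invocation ID plus the
// row-major local invocation index derived from them.
struct QuadInvocation {
   uint32_t x;
   uint32_t y;
   uint32_t index;
};

// Map a linear execution slot to a local invocation so that every group
// of 4 consecutive slots forms a 2x2 quad: two invocations in the
// "current" row followed by two in the "next" row.
// Assumption: size_x is non-zero and even.
inline QuadInvocation quad_map(uint32_t linear, uint32_t size_x)
{
   assert(size_x != 0 && size_x % 2 == 0);

   // A pair of rows holds 2 * size_x invocations.
   const uint32_t double_size_x = size_x * 2;
   const uint32_t id_in_row_pair = linear % double_size_x;
   const uint32_t row_pair = linear / double_size_x;

   // Bit 0 selects the column inside the quad, bit 1 selects the row
   // inside the quad, and the remaining bits select which quad.
   const uint32_t x = ((id_in_row_pair >> 2) << 1) | (id_in_row_pair & 1);
   const uint32_t y = (row_pair << 1) | ((id_in_row_pair >> 1) & 1);

   return { x, y, y * size_x + x };
}
```

With size_x = 4, slots 0 through 7 map to local invocation indices
0, 1, 4, 5, 2, 3, 6, 7, which shows the "different order" side effect
mentioned in the commit message. Both X and Y are needed before the
index can be formed, which is why computing ID and index together is
convenient.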