author | Caio Marcelo de Oliveira Filho <[email protected]> | 2019-03-27 15:07:59 -0700 |
---|---|---|
committer | Caio Marcelo de Oliveira Filho <[email protected]> | 2019-04-08 19:29:33 -0700 |
commit | 3ee3024804f9817dfa4f9ee4fa3d6b963a84c9cb (patch) | |
tree | 11abd7e67d12017067fe1f6eab7cb9b908c84135 /src/intel/compiler/brw_compiler.c | |
parent | ef0339d5ea645390dd2ab8b6c328311fc945025a (diff) |
intel/fs: Add support for CS to group invocations in quads
When using quads, instead of mapping the elements to the next 4 local
invocation indices, we map the next two in the "current" row and the
next two in the "next" row. A side effect is that a thread will execute
the indices in a different order.
We now perform the lowering of both local invocation ID and index
together -- and no longer rely on the lowering done by
nir_lower_system_values. That is convenient when doing the math for
quads, because we need X and Y to get the right invocation index.
When the pass progresses, fold the constants and clean up to reduce
the noise from the indexing math.
This implements the derivative_group_quadsNV semantics from
NV_compute_shader_derivatives.
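The quad mapping described above can be sketched in scalar C. This is an
illustration only, not the NIR code from the patch: quad_pos, quad_map and
dump_quad_order are hypothetical names, "linear" stands for the position of
an invocation in execution order, and size_x is assumed to be even (the
number of rows is assumed even as well).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper type for this sketch; not part of Mesa. */
struct quad_pos {
    unsigned x;     /* column of the invocation */
    unsigned y;     /* row of the invocation (Z layers treated as extra rows) */
    unsigned index; /* resulting local invocation index */
};

/* Map the linear execution position to a position inside a 2x2 quad.
 * Every group of 4 consecutive positions covers two columns of the
 * "current" row and the same two columns of the "next" row. */
static struct quad_pos quad_map(unsigned linear, unsigned size_x)
{
    assert(size_x > 0 && size_x % 2 == 0); /* assumption of this sketch */

    /* Position within a pair of rows, and which pair of rows it is. */
    unsigned row_pair_id = linear % (2 * size_x);
    unsigned row_pair = linear / (2 * size_x);

    struct quad_pos p;
    /* Bit 0 selects the column inside the quad, bits 2+ select the quad. */
    p.x = ((row_pair_id >> 2) << 1) | (row_pair_id & 1);
    /* Bit 1 selects the row inside the quad. */
    p.y = (row_pair << 1) | ((row_pair_id >> 1) & 1);
    /* The index needs both X and Y. */
    p.index = p.y * size_x + p.x;
    return p;
}

/* Print the order in which indices are visited for a size_x * rows group. */
static void dump_quad_order(unsigned size_x, unsigned rows)
{
    for (unsigned linear = 0; linear < size_x * rows; linear++) {
        struct quad_pos p = quad_map(linear, size_x);
        printf("linear %u -> x=%u y=%u index=%u\n", linear, p.x, p.y, p.index);
    }
}
```

With size_x = 4 and two rows, the first four positions cover the 2x2 block at
columns 0-1 and the next four cover columns 2-3, so the indices are visited
in an order that differs from the plain row-by-row one.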
v2: Take subgroup_id into account, otherwise only values in the first
subgroup would be used. (Jason)
v3: Calculate invocation index and ID together, to avoid duplicating
some math in the quads case when both index and ID are used. (Jason)
v4: Don't call cleanup passes as part of the lowering, leave that to the
call site. (Jason)
Change calculation to use fewer instructions. (Jason)
Reviewed-by: Ian Romanick <[email protected]> (v3)
Reviewed-by: Jason Ekstrand <[email protected]>
Diffstat (limited to 'src/intel/compiler/brw_compiler.c')
-rw-r--r-- | src/intel/compiler/brw_compiler.c | 1 |
1 files changed, 0 insertions, 1 deletions
diff --git a/src/intel/compiler/brw_compiler.c b/src/intel/compiler/brw_compiler.c
index a3a0a393fad..d3f8c7ef1e0 100644
--- a/src/intel/compiler/brw_compiler.c
+++ b/src/intel/compiler/brw_compiler.c
@@ -45,7 +45,6 @@
    .lower_flrp64 = true, \
    .lower_isign = true, \
    .lower_ldexp = true, \
-   .lower_cs_local_id_from_index = true, \
    .lower_device_index_to_zero = true, \
    .native_integers = true, \
    .use_interpolated_input_intrinsics = true, \