author | Chris Wilson <[email protected]> | 2017-07-21 16:36:52 +0100 |
---|---|---|
committer | Kenneth Graunke <[email protected]> | 2017-08-04 10:26:37 -0700 |
commit | 6c530ad1160518d9f035da4aba5a9d4df7369972 (patch) | |
tree | 8b144bdd9332dacd3466ba17211ffe381ec7c84e /src/mesa/drivers/dri/i965/brw_compute.c | |
parent | 2aacd22c0b7935b40911593c3b01f0b8d12eddd4 (diff) |
i965: Reduce passing 2x32b of reloc_domains to 2 bits
The kernel only cares about whether the object is to be written to or
not, and reduces (reloc.read_domains, reloc.write_domain) down to just
!!reloc.write_domain. When we use NO_RELOC, the kernel doesn't even read
those relocs and instead userspace has to pass that information in the
execobject.flags. We can simplify our reloc API by also removing the
unused read/write domains and passing only the resultant flags.
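
The reduction described above can be sketched in a few lines of C. This is a minimal illustration, not the driver's or the kernel's code: `struct sketch_reloc`, `sketch_reloc_flags()` and `SKETCH_EXEC_OBJECT_WRITE` are hypothetical names, and the flag value is chosen for the example rather than taken from the i915 uapi headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag for illustration only; not the i915 uapi definition. */
#define SKETCH_EXEC_OBJECT_WRITE (1u << 0)

/* Hypothetical stand-in for the two 32-bit domain fields of a reloc. */
struct sketch_reloc {
   uint32_t read_domains;
   uint32_t write_domain;
};

/* Collapse (read_domains, write_domain) to the single bit that matters:
 * whether the object is written at all, i.e. !!write_domain.
 * read_domains is deliberately ignored.
 */
static inline unsigned
sketch_reloc_flags(const struct sketch_reloc *reloc)
{
   assert(reloc != NULL);
   return reloc->write_domain ? SKETCH_EXEC_OBJECT_WRITE : 0;
}
```

Under these assumptions, any read-only pair maps to `0` and any pair with a nonzero write domain maps to the write flag, so a caller could pass the flag directly instead of the two domain words.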
The caveat to the above is when we need to make the kernel aware that
certain objects need to take different workarounds into account.
Previously, this was done using the magic (INSTRUCTION, INSTRUCTION)
reloc domains. NO_RELOC requires this to be passed in the execobject
flags as well, and now we push that up the callstack.
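
As a rough sketch of the caveat above, the magic (INSTRUCTION, INSTRUCTION) pair can be thought of as one more flag bit alongside the write bit. Everything here is an assumption made for illustration: the `SKETCH_` names, their values, and the helper are hypothetical and are not the identifiers this commit or the kernel uses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values for illustration only. */
#define SKETCH_DOMAIN_INSTRUCTION 0x10u
#define SKETCH_FLAG_WRITE         (1u << 0)
#define SKETCH_FLAG_WORKAROUND    (1u << 1)

/* Translate a legacy domain pair into explicit flags. The magic
 * (INSTRUCTION, INSTRUCTION) pair becomes its own workaround bit
 * instead of being inferred from the domains.
 */
static inline unsigned
sketch_legacy_domains_to_flags(uint32_t read_domains, uint32_t write_domain)
{
   unsigned flags = 0;

   if (write_domain)
      flags |= SKETCH_FLAG_WRITE;

   if (read_domains == SKETCH_DOMAIN_INSTRUCTION &&
       write_domain == SKETCH_DOMAIN_INSTRUCTION)
      flags |= SKETCH_FLAG_WORKAROUND;

   /* Only the two flag bits defined above may ever be set. */
   assert((flags & ~(SKETCH_FLAG_WRITE | SKETCH_FLAG_WORKAROUND)) == 0);
   return flags;
}
```

With an explicit bit, the call site states the workaround it needs, which matches the trade-off the message goes on to describe: more knowledge is required at the point of use, but it is applied only where it is needed.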
The API is more compact and more expressive of what happens underneath, but
unfortunately requires more knowledge of the system at the point of use.
Conversely, it also means that knowledge is specific and not generally
applied, and so not overused.
text data bss dec hex filename
8502991 356912 424944 9284847 8dacef lib/i965_dri.so (before)
8500455 356912 424944 9282311 8da307 lib/i965_dri.so (after)
v2: (by Ken) Rebase.
Reviewed-by: Kenneth Graunke <[email protected]>
Diffstat (limited to 'src/mesa/drivers/dri/i965/brw_compute.c')
-rw-r--r-- | src/mesa/drivers/dri/i965/brw_compute.c | 24 |
1 files changed, 6 insertions, 18 deletions
diff --git a/src/mesa/drivers/dri/i965/brw_compute.c b/src/mesa/drivers/dri/i965/brw_compute.c
index d6cb0161f40..ed22d712a67 100644
--- a/src/mesa/drivers/dri/i965/brw_compute.c
+++ b/src/mesa/drivers/dri/i965/brw_compute.c
@@ -40,15 +40,9 @@ prepare_indirect_gpgpu_walker(struct brw_context *brw)
    GLintptr indirect_offset = brw->compute.num_work_groups_offset;
    struct brw_bo *bo = brw->compute.num_work_groups_bo;
 
-   brw_load_register_mem(brw, GEN7_GPGPU_DISPATCHDIMX, bo,
-                         I915_GEM_DOMAIN_VERTEX, 0,
-                         indirect_offset + 0);
-   brw_load_register_mem(brw, GEN7_GPGPU_DISPATCHDIMY, bo,
-                         I915_GEM_DOMAIN_VERTEX, 0,
-                         indirect_offset + 4);
-   brw_load_register_mem(brw, GEN7_GPGPU_DISPATCHDIMZ, bo,
-                         I915_GEM_DOMAIN_VERTEX, 0,
-                         indirect_offset + 8);
+   brw_load_register_mem(brw, GEN7_GPGPU_DISPATCHDIMX, bo, indirect_offset + 0);
+   brw_load_register_mem(brw, GEN7_GPGPU_DISPATCHDIMY, bo, indirect_offset + 4);
+   brw_load_register_mem(brw, GEN7_GPGPU_DISPATCHDIMZ, bo, indirect_offset + 8);
 
    if (brw->gen > 7)
       return;
@@ -65,9 +59,7 @@ prepare_indirect_gpgpu_walker(struct brw_context *brw)
    ADVANCE_BATCH();
 
    /* Load compute_dispatch_indirect_x_size into SRC0 */
-   brw_load_register_mem(brw, MI_PREDICATE_SRC0, bo,
-                         I915_GEM_DOMAIN_INSTRUCTION, 0,
-                         indirect_offset + 0);
+   brw_load_register_mem(brw, MI_PREDICATE_SRC0, bo, indirect_offset + 0);
 
    /* predicate = (compute_dispatch_indirect_x_size == 0); */
    BEGIN_BATCH(1);
@@ -78,9 +70,7 @@ prepare_indirect_gpgpu_walker(struct brw_context *brw)
    ADVANCE_BATCH();
 
    /* Load compute_dispatch_indirect_y_size into SRC0 */
-   brw_load_register_mem(brw, MI_PREDICATE_SRC0, bo,
-                         I915_GEM_DOMAIN_INSTRUCTION, 0,
-                         indirect_offset + 4);
+   brw_load_register_mem(brw, MI_PREDICATE_SRC0, bo, indirect_offset + 4);
 
    /* predicate |= (compute_dispatch_indirect_y_size == 0); */
    BEGIN_BATCH(1);
@@ -91,9 +81,7 @@ prepare_indirect_gpgpu_walker(struct brw_context *brw)
    ADVANCE_BATCH();
 
    /* Load compute_dispatch_indirect_z_size into SRC0 */
-   brw_load_register_mem(brw, MI_PREDICATE_SRC0, bo,
-                         I915_GEM_DOMAIN_INSTRUCTION, 0,
-                         indirect_offset + 8);
+   brw_load_register_mem(brw, MI_PREDICATE_SRC0, bo, indirect_offset + 8);
 
    /* predicate |= (compute_dispatch_indirect_z_size == 0); */
    BEGIN_BATCH(1);
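
For readers following the predicate comments in the hunks above, here is a CPU-side model of what they describe: three 32-bit sizes are read at `indirect_offset + 0`, `+ 4` and `+ 8`, and the predicate is set if any of them is zero. This is only a sketch of the arithmetic; `sketch_dispatch_is_empty()` is a hypothetical helper, the real work is done by GPU commands emitted into the batch, and how the predicate is consumed lies outside this diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical CPU-side model of the predicate built in the diff:
 *   predicate  = (x_size == 0);
 *   predicate |= (y_size == 0);
 *   predicate |= (z_size == 0);
 * buf stands in for the contents of the num_work_groups buffer object.
 */
static bool
sketch_dispatch_is_empty(const uint8_t *buf, size_t buf_size,
                         size_t indirect_offset)
{
   uint32_t dims[3];

   /* The three sizes must lie entirely inside the buffer. */
   assert(buf_size >= sizeof(dims) &&
          indirect_offset <= buf_size - sizeof(dims));

   /* memcpy avoids unaligned or aliasing-unsafe reads of the byte buffer. */
   memcpy(dims, buf + indirect_offset, sizeof(dims));

   bool predicate = (dims[0] == 0);
   predicate |= (dims[1] == 0);
   predicate |= (dims[2] == 0);
   return predicate;
}
```

For example, sizes of (4, 2, 1) leave the predicate clear, while (4, 0, 1) set it.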