i965/vec4: implement access to DF source components Z/W

The general idea is that with 32-bit swizzles we cannot address DF
components Z/W directly, so instead we select the region that starts at
the 16B offset into the register and use X/Y swizzles.

The above, however, has the caveat that we can't do that without
violating register region restrictions unless we resort to what would
probably be some sort of SIMD splitting.

Alternatively, we can accomplish what we need without SIMD splitting by
exploiting the gen7 hardware decompression bug for instructions with
vstride=0. For example, an instruction like this:

   mov(8) r2.x:DF r0.2<0>xyzw:DF

activates the hardware bug and produces this region:

   Component: x0   y0   z0   w0   x1   y1   z1   w1
   Register:  r0.2 r0.3 r0.2 r0.3 r1.2 r1.3 r1.2 r1.3

where r0.2 and r0.3 are r0.z:DF for the first vertex of the SIMD4x2
execution and r1.2 and r1.3 are the same for the second vertex.

Using this to our advantage, we can select r0.z:DF by doing
r0.2<0,2,1>.xyxy and r0.w by doing r0.2<0,2,1>.zwzw without needing to
split the instruction.

Of course, this only works for gen7, but that is the only hardware
platform where we implement align16/fp64 at the moment.

v2: Adapted to the fact that we now do this after converting to
    hardware registers (Iago)

Reviewed-by: Matt Turner <[email protected]>
author: Iago Toral Quiroga <[email protected]> 2016-07-18 13:43:00 +0200
committer: Samuel Iglesias Gonsálvez <[email protected]> 2017-01-03 11:26:51 +0100
commit: ac5a06ff83c32ab14e01e526e729b2fbfe3a2426 (patch)
tree: 4be33bc5cce899fd1992a4a2af1398173eba7837
parent: e238601a2da9512c0fd263e8378f30498a0a1507 (diff)
1 files changed, 21 insertions, 0 deletions
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 6d73bb2faec..cc0a76a7eb4 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -2267,7 +2267,28 @@ vec4_visitor::apply_logical_swizzle(struct brw_reg *hw_reg,
     */
    assert(brw_is_single_value_swizzle(reg.swizzle));
 
+   /* To gain access to Z/W components we need to select the second half
+    * of the register and then use a X/Y swizzle to select Z/W respectively.
+    */
    unsigned swizzle = BRW_GET_SWZ(reg.swizzle, 0);
+
+   if (swizzle >= 2) {
+      *hw_reg = suboffset(*hw_reg, 2);
+      swizzle -= 2;
+   }
+
+   /* Any 64-bit source with an offset at 16B is intended to address the
+    * second half of a register and needs a vertical stride of 0 so we:
+    *
+    * 1. Don't violate register region restrictions.
+    * 2. Activate the gen7 instruction decompression bug exploit when
+    *    execsize > 4
+    */
+   if (hw_reg->subnr % REG_SIZE == 16) {
+      assert(devinfo->gen == 7);
+      hw_reg->vstride = BRW_VERTICAL_STRIDE_0;
+   }
+
    hw_reg->swizzle = BRW_SWIZZLE4(swizzle * 2, swizzle * 2 + 1,
                                   swizzle * 2, swizzle * 2 + 1);
 }
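
The mapping this hunk performs can be checked outside the driver. The following is only a minimal standalone sketch, not Mesa code: `DfSelect` and `select_df_component` are hypothetical names, the 16-byte offset assumes that `suboffset(*hw_reg, 2)` on a DF register advances by two 8-byte elements (which matches the `subnr % REG_SIZE == 16` check in the patch), and the sketch ties vstride=0 to the Z/W case only, whereas the patch applies it to any source whose offset lands at 16B.

```cpp
#include <array>
#include <cassert>

// Hypothetical stand-in for the fields the patch touches on the
// hardware register; this is not a Mesa type.
struct DfSelect {
    unsigned byte_offset;             // 0 for X/Y, 16 for Z/W (assumed DF = 8 bytes)
    std::array<unsigned, 4> swz32;    // 32-bit swizzle channels, 0 = x ... 3 = w
    bool vstride_zero;                // whether vertical stride 0 would be requested
};

// Translate a single-value logical 64-bit swizzle (0 = X ... 3 = W)
// into a register offset plus a 32-bit swizzle, mirroring the steps
// of the patched apply_logical_swizzle() under the assumptions above.
DfSelect select_df_component(unsigned logical)
{
    assert(logical < 4);

    DfSelect sel{0, {}, false};

    // Z/W live in the second half of the register: move the region
    // start to the 16B offset and address them as X/Y from there.
    if (logical >= 2) {
        sel.byte_offset = 16;
        sel.vstride_zero = true;
        logical -= 2;
    }

    // Each 64-bit component spans two consecutive 32-bit channels,
    // repeated twice to fill the four swizzle slots.
    sel.swz32 = {logical * 2, logical * 2 + 1, logical * 2, logical * 2 + 1};
    return sel;
}
```

With this sketch, logical X and Z both yield the 32-bit swizzle xyxy and differ only in the offset, while Y and W both yield zwzw, which agrees with the r0.2<0,2,1>.xyxy and r0.2<0,2,1>.zwzw selections described in the commit message.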