From ac5a06ff83c32ab14e01e526e729b2fbfe3a2426 Mon Sep 17 00:00:00 2001 From: Iago Toral Quiroga Date: Mon, 18 Jul 2016 13:43:00 +0200 Subject: i965/vec4: implement access to DF source components Z/W The general idea is that with 32-bit swizzles we cannot address DF components Z/W directly, so instead we select the region that starts at the the 16B offset into the register and use X/Y swizzles. The above, however, has the caveat that we can't do that without violating register region restrictions unless we probably do some sort of SIMD splitting. Alternatively, we can accomplish what we need without SIMD splitting by exploiting the gen7 hardware decompression bug for instructions with a vstride=0. For example, an instruction like this: mov(8) r2.x:DF r0.2<0>xyzw:DF Activates the hardware bug and produces this region: Component: x0 y0 z0 w0 x1 y1 z1 w1 Register: r0.2 r0.3 r0.2 r0.3 r1.2 r1.3 r1.2 r1.3 Where r0.2 and r0.3 are r0.z:DF for the first vertex of the SIMD4x2 execution and r1.2 and r1.3 are the same for the second vertex. Using this to our advantage we can select r0.z:DF by doing r0.2<0,2,1>.xyxy and r0.w by doing r0.2<0,2,1>.zwzw without needing to split the instruction. Of course, this only works for gen7, but that is the only hardware platform were we implement align16/fp64 at the moment. v2: Adapted to the fact that we now do this after converting to hardware registers (Iago) Reviewed-by: Matt Turner --- src/mesa/drivers/dri/i965/brw_vec4.cpp | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index 6d73bb2faec..cc0a76a7eb4 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp @@ -2267,7 +2267,28 @@ vec4_visitor::apply_logical_swizzle(struct brw_reg *hw_reg, */ assert(brw_is_single_value_swizzle(reg.swizzle)); + /* To gain access to Z/W components we need to select the second half + * of the register and then use a X/Y swizzle to select Z/W respectively. + */ unsigned swizzle = BRW_GET_SWZ(reg.swizzle, 0); + + if (swizzle >= 2) { + *hw_reg = suboffset(*hw_reg, 2); + swizzle -= 2; + } + + /* Any 64-bit source with an offset at 16B is intended to address the + * second half of a register and needs a vertical stride of 0 so we: + * + * 1. Don't violate register region restrictions. + * 2. Activate the gen7 instruction decompresion bug exploit when + * execsize > 4 + */ + if (hw_reg->subnr % REG_SIZE == 16) { + assert(devinfo->gen == 7); + hw_reg->vstride = BRW_VERTICAL_STRIDE_0; + } + hw_reg->swizzle = BRW_SWIZZLE4(swizzle * 2, swizzle * 2 + 1, swizzle * 2, swizzle * 2 + 1); } -- cgit v1.2.3