intel/compiler: workaround for SIMD8 half-float MAD in gen8

Empirical testing shows that gen8 has a bug where MAD instructions with a half-float source starting at a non-zero offset fail to execute properly. This scenario usually happened in SIMD8 executions, where we used to pack vector components Y and W in the second half of SIMD registers (therefore, with a 16B offset). It looks like we are not currently doing this any more but this would handle the situation properly if we ever happen to produce code like this again. v2 (Jason): - Move this workaround to the lower_regioning pass as an additional case to has_invalid_src_region() - Do not apply the workaround if the stride of the source operand is 0, testing suggests the problem doesn't exist in that case. v3 (Jason): - We want offset % REG_SIZE > 0, not just offset > 0 - Use a helper to compute the offset Reviewed-by: Topi Pohjolainen <[email protected]> (v1)
author: Iago Toral Quiroga <[email protected]> 2019-01-21 12:11:44 +0100
committer: Juan A. Suarez Romero <[email protected]> 2019-04-18 11:05:18 +0200
commit: 0986199b31ab2a6086131887e474bc8f79fbc28d (patch)
tree: 1a70ea5493efd1d21c1e33f8240e5af857284c09 /src/intel/compiler
parent: aaae24179ff1007776d2f3a5a813f2c52dc83eba (diff)
1 files changed, 28 insertions, 11 deletions
diff --git a/src/intel/compiler/brw_fs_lower_regioning.cpp b/src/intel/compiler/brw_fs_lower_regioning.cpp
index c60d4700419..a76fd262a10 100644
--- a/src/intel/compiler/brw_fs_lower_regioning.cpp
+++ b/src/intel/compiler/brw_fs_lower_regioning.cpp
@@ -127,20 +127,37 @@ namespace {
    has_invalid_src_region(const gen_device_info *devinfo, const fs_inst *inst,
                           unsigned i)
    {
-      if (is_unordered(inst) || inst->is_control_source(i)) {
+      if (is_unordered(inst) || inst->is_control_source(i))
          return false;
-      } else {
-         const unsigned dst_byte_stride = inst->dst.stride * type_sz(inst->dst.type);
-         const unsigned src_byte_stride = inst->src[i].stride *
-            type_sz(inst->src[i].type);
-         const unsigned dst_byte_offset = reg_offset(inst->dst) % REG_SIZE;
-         const unsigned src_byte_offset = reg_offset(inst->src[i]) % REG_SIZE;
 
-         return has_dst_aligned_region_restriction(devinfo, inst) &&
-                !is_uniform(inst->src[i]) &&
-                (src_byte_stride != dst_byte_stride ||
-                 src_byte_offset != dst_byte_offset);
+      /* Empirical testing shows that Broadwell has a bug affecting half-float
+       * MAD instructions when any of its sources has a non-zero offset, such
+       * as:
+       *
+       * mad(8) g18<1>HF -g17<4,4,1>HF g14.8<4,4,1>HF g11<4,4,1>HF { align16 1Q };
+       *
+       * We used to generate code like this for SIMD8 executions where we
+       * used to pack components Y and W of a vector at offset 16B of a SIMD
+       * register. The problem doesn't occur if the stride of the source is 0.
+       */
+      if (devinfo->gen == 8 &&
+          inst->opcode == BRW_OPCODE_MAD &&
+          inst->src[i].type == BRW_REGISTER_TYPE_HF &&
+          reg_offset(inst->src[i]) % REG_SIZE > 0 &&
+          inst->src[i].stride != 0) {
+         return true;
       }
+
+      const unsigned dst_byte_stride = inst->dst.stride * type_sz(inst->dst.type);
+      const unsigned src_byte_stride = inst->src[i].stride *
+         type_sz(inst->src[i].type);
+      const unsigned dst_byte_offset = reg_offset(inst->dst) % REG_SIZE;
+      const unsigned src_byte_offset = reg_offset(inst->src[i]) % REG_SIZE;
+
+      return has_dst_aligned_region_restriction(devinfo, inst) &&
+             !is_uniform(inst->src[i]) &&
+             (src_byte_stride != dst_byte_stride ||
+              src_byte_offset != dst_byte_offset);
    }
 
    /*
author	Iago Toral Quiroga <[email protected]>	2019-01-21 12:11:44 +0100
committer	Juan A. Suarez Romero <[email protected]>	2019-04-18 11:05:18 +0200
commit	0986199b31ab2a6086131887e474bc8f79fbc28d (patch)
tree	1a70ea5493efd1d21c1e33f8240e5af857284c09 /src/intel/compiler
parent	aaae24179ff1007776d2f3a5a813f2c52dc83eba (diff)