intel/fs: Implement extended strides greater than 4 for IR source regions.

Strides up to 32B can be implemented for the source regions of most instructions by leveraging either the vertical or the horizontal stride of the hardware Align1 region. The main motivation for this is that currently the lower_integer_multiplication() pass will happily double the stride of one of the 32-bit sources, which can blow up if the stride of the original source was already the maximum value allowed by the hardware. An alternative would be to use the regioning legalization pass in order to lower such strides into the composition of multiple legal strides, but that would be somewhat less efficient. This showed up as a regression from my commit cbea91eb57a501bebb1ca2 in Vulkan 1.1 CTS tests on CHV/BXT platforms, however it was really a pre-existing problem that had affected conformance on other platforms without native support for integer multiplication. CHV/BXT were getting around it because the code I removed in that commit had the "fortunate" side effect of emitting narrower regions that didn't hit the hardware stride limit after lowering. Beyond fixing the regression this fixes ~90 additional Vulkan 1.1 subgroup CTS tests on ICL (that's why this patch is marked for inclusion in mesa-stable even though the original regressing patch was not). According to Jason, a nearly equivalent change had been committed previously as e8c9e65185de3e821e1 and then (mistakenly?) reverted as a31d0382084c8aa8. Cc: [email protected] Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109328 Reported-by: Mark Janes <[email protected]> Tested-by: Anuj Phogat <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
author: Francisco Jerez <[email protected]> 2019-01-18 12:51:57 -0800
committer: Francisco Jerez <[email protected]> 2019-02-21 14:07:25 -0800
commit: e03be78252afa8f1033b0824eff8d48df4fd6727 (patch)
tree: 2e8626006ab489596255873200680728946b6142 /src/intel
parent: 7f9f6263c1ecc3ab9e1984f5a9814f6820eb85cf (diff)
1 files changed, 10 insertions, 3 deletions
diff --git a/src/intel/compiler/brw_fs_generator.cpp b/src/intel/compiler/brw_fs_generator.cpp
index 64872fc9487..1b9ef490502 100644
--- a/src/intel/compiler/brw_fs_generator.cpp
+++ b/src/intel/compiler/brw_fs_generator.cpp
@@ -90,9 +90,16 @@ brw_reg_from_fs_reg(const struct gen_device_info *devinfo, fs_inst *inst,
           *       different execution size when the number of components
           *       written to each destination GRF is not the same.
           */
-         const unsigned width = MIN2(reg_width, phys_width);
-         brw_reg = brw_vecn_reg(width, brw_file_from_reg(reg), reg->nr, 0);
-         brw_reg = stride(brw_reg, width * reg->stride, width, reg->stride);
+         if (reg->stride > 4) {
+            assert(reg != &inst->dst);
+            assert(reg->stride * type_sz(reg->type) <= REG_SIZE);
+            brw_reg = brw_vecn_reg(1, brw_file_from_reg(reg), reg->nr, 0);
+            brw_reg = stride(brw_reg, reg->stride, 1, 0);
+         } else {
+            const unsigned width = MIN2(reg_width, phys_width);
+            brw_reg = brw_vecn_reg(width, brw_file_from_reg(reg), reg->nr, 0);
+            brw_reg = stride(brw_reg, width * reg->stride, width, reg->stride);
+         }
 
          if (devinfo->gen == 7 && !devinfo->is_haswell) {
             /* From the IvyBridge PRM (EU Changes by Processor Generation, page 13):
author	Francisco Jerez <[email protected]>	2019-01-18 12:51:57 -0800
committer	Francisco Jerez <[email protected]>	2019-02-21 14:07:25 -0800
commit	e03be78252afa8f1033b0824eff8d48df4fd6727 (patch)
tree	2e8626006ab489596255873200680728946b6142 /src/intel
parent	7f9f6263c1ecc3ab9e1984f5a9814f6820eb85cf (diff)