author    Kenneth Graunke <[email protected]>  2018-06-08 14:24:16 -0700
committer Kenneth Graunke <[email protected]>  2018-06-14 14:58:59 -0700
commit    f6898f2b554e88255909bdd6bd0f7a91d99c446b (patch)
tree      cbead1bd0a96a88aeee807b09e7f2b90ea2eef51 /src/intel/compiler
parent    37bd9ccd21b860d2b5ffea7e1f472ec83b68b43b (diff)
intel/compiler: Properly consider UBO loads that cross 32B boundaries.
The UBO push analysis pass incorrectly assumed that all values would fit
within a 32B chunk, and only recorded a bit for the 32B chunk containing
the starting offset.

For example, if a UBO contained the following, tightly packed:

   vec4 a;  // [0, 16)
   float b; // [16, 20)
   vec4 c;  // [20, 36)

then c would start at offset 20 / 32 = 0 and end at 36 / 32 = 1, which
means that we ought to record two 32B chunks in the bitfield.  Similarly,
dvec4s would suffer from the same problem.

v2: Rewrite the accounting, my calculations were wrong.
v3: Write a comment about partial values (requested by Jason).

Reviewed-by: Rafael Antognolli <[email protected]> [v1]
Reviewed-by: Jason Ekstrand <[email protected]> [v3]
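
As a concrete check of the arithmetic above, here is a minimal standalone
C sketch (not part of the patch; the offset and size are taken from the
vec4 c example in the message) printing the first and last 32B chunk that
the load touches:

   #include <stdio.h>

   int main(void)
   {
      /* Byte offset and size of "vec4 c" from the example above. */
      const int offset = 20;   /* c starts at byte 20 */
      const int bytes  = 16;   /* a vec4 is 16 bytes  */

      const int first_chunk = offset / 32;               /* 20 / 32 = 0 */
      const int last_chunk  = (offset + bytes - 1) / 32; /* 35 / 32 = 1 */

      printf("vec4 c touches 32B chunks %d through %d\n",
             first_chunk, last_chunk);
      return 0;
   }

Since the load's last byte lands in chunk 1 while its first byte lands in
chunk 0, recording a single bit for the starting chunk loses the second
half of the vector.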
Diffstat (limited to 'src/intel/compiler')
-rw-r--r--  src/intel/compiler/brw_nir_analyze_ubo_ranges.c  16
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/src/intel/compiler/brw_nir_analyze_ubo_ranges.c b/src/intel/compiler/brw_nir_analyze_ubo_ranges.c
index d58fe3dd2e3..cd5137da06e 100644
--- a/src/intel/compiler/brw_nir_analyze_ubo_ranges.c
+++ b/src/intel/compiler/brw_nir_analyze_ubo_ranges.c
@@ -137,14 +137,26 @@ analyze_ubos_block(struct ubo_analysis_state *state, nir_block *block)
          const int block = block_const->u32[0];
          const int offset = offset_const->u32[0] / 32;
 
-         /* Won't fit in our bitfield */
+         /* Avoid shifting by larger than the width of our bitfield, as this
+          * is undefined in C.  Even if we require multiple bits to represent
+          * the entire value, it's OK to record a partial value - the backend
+          * is capable of falling back to pull loads for later components of
+          * vectors, as it has to shrink ranges for other reasons anyway.
+          */
          if (offset >= 64)
             continue;
 
+         /* The value might span multiple 32-byte chunks. */
+         const int bytes = nir_intrinsic_dest_components(intrin) *
+                           (nir_dest_bit_size(intrin->dest) / 8);
+         const int start = ROUND_DOWN_TO(offset_const->u32[0], 32);
+         const int end = ALIGN(offset_const->u32[0] + bytes, 32);
+         const int chunks = (end - start) / 32;
+
          /* TODO: should we count uses in loops as higher benefit? */
          struct ubo_block_info *info = get_block_info(state, block);
-         info->offsets |= 1ull << offset;
+         info->offsets |= ((1ull << chunks) - 1) << offset;
          info->uses[offset]++;
       }
    }
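
To see the new accounting in isolation, here is a self-contained sketch,
assuming the usual power-of-two definitions of ROUND_DOWN_TO and ALIGN
(in Mesa they come from the util headers) and hard-coding the inputs
that the pass derives from the load_ubo intrinsic's constant offset and
from nir_intrinsic_dest_components() * nir_dest_bit_size() / 8:

   #include <inttypes.h>
   #include <stdint.h>
   #include <stdio.h>

   /* Local stand-ins for Mesa's power-of-two alignment macros. */
   #define ROUND_DOWN_TO(x, align) ((x) & ~((align) - 1))
   #define ALIGN(x, align)         ROUND_DOWN_TO((x) + (align) - 1, align)

   int main(void)
   {
      /* Hard-coded inputs for the "vec4 c" example; in the pass these
       * come from the intrinsic's constant offset and destination type. */
      const int byte_offset = 20;
      const int bytes = 16;

      const int offset = byte_offset / 32;                /* first chunk: 0 */
      const int start  = ROUND_DOWN_TO(byte_offset, 32);  /* 0  */
      const int end    = ALIGN(byte_offset + bytes, 32);  /* 64 */
      const int chunks = (end - start) / 32;              /* 2  */

      uint64_t offsets = 0;
      offsets |= ((1ull << chunks) - 1) << offset;        /* sets bits 0, 1 */

      printf("chunks = %d, offsets = 0x%" PRIx64 "\n", chunks, offsets);
      return 0;
   }

For the vec4 c example this prints chunks = 2 and offsets = 0x3: both
chunk bits are recorded, whereas the old expression (1ull << offset) set
only bit 0, so the tail of the vector crossing into the next 32B chunk
was never considered for pushing.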