i965/fs: force pull model for 64-bit GS inputs

Triggering the push model when 64-bit inputs are involved is not easy due to the constrains on the maximum number of registers that we allow for this mode, however, for GS with 'points' primitive type and just a couple of double varyings we can trigger this and it just doesn't work because the implementation is not 64-bit aware at all. For now, let's make sure that we don't attempt this model whith 64-bit inputs and we always fall back to pull model for them. Also, don't enable the VUE handles in the thread payload on the fly when we find an input for which we need the pull model, this is not safe: if we need to resort to the pull model we need to account for that when we setup the thread payload so we compute the first non-payload register properly. If we didn't do that correctly and we enable it on-the-fly here then we will end up VUE handles on the first non-payload register which will probably lead to GPU hangs. Instead, always enable the VUE handles for the pull model so we can safely use them when needed. The GS is going to resort to pull model almost in every situation anyway, so this shouldn't make a significant difference and it makes things easier and safer. v2: Always enable the VUE handles for pull model, this is easier and safer and the GS is going to fallback to pull model almost always anyway (Ken) v3: Only clamp the URB read length if we are over the maximum reserved for push inputs as we were doing in the original code (Ken). v4: No need to clamp the urb read length if invocations > 1 Reviewed-by: Kenneth Graunke <[email protected]>
author: Iago Toral Quiroga <[email protected]> 2017-09-27 11:36:31 +0200
committer: Iago Toral Quiroga <[email protected]> 2017-09-29 08:18:25 +0200
commit: 47e527bd81f21cd18333fa7923d8ae8fd3a74d2c (patch)
tree: 65f1f2f88f1c8c1dbad02de21d8532bf7733f2da /src/intel
parent: 2df897cf1ffd5ae01ecdbb66195d292a2a15df91 (diff)
2 files changed, 15 insertions, 7 deletions
diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index eb9b4c38909..e33cb0e1186 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -5602,6 +5602,17 @@ fs_visitor::setup_gs_payload()
       payload.num_regs++;
    }
 
+   /* Always enable VUE handles so we can safely use pull model if needed.
+    *
+    * The push model for a GS uses a ton of register space even for trivial
+    * scenarios with just a few inputs, so just make things easier and a bit
+    * safer by always having pull model available.
+    */
+   gs_prog_data->base.include_vue_handles = true;
+
+   /* R3..RN: ICP Handles for each incoming vertex (when using pull model) */
+   payload.num_regs += nir->info.gs.vertices_in;
+
    /* Use a maximum of 24 registers for push-model inputs. */
    const unsigned max_push_components = 24;
 
@@ -5612,12 +5623,7 @@ fs_visitor::setup_gs_payload()
     * have to multiply by VerticesIn to obtain the total storage requirement.
     */
    if (8 * vue_prog_data->urb_read_length * nir->info.gs.vertices_in >
-       max_push_components || gs_prog_data->invocations > 1) {
-      gs_prog_data->base.include_vue_handles = true;
-
-      /* R3..RN: ICP Handles for each incoming vertex (when using pull model) */
-      payload.num_regs += nir->info.gs.vertices_in;
-
+       max_push_components) {
       vue_prog_data->urb_read_length =
          ROUND_DOWN_TO(max_push_components / nir->info.gs.vertices_in, 8) / 8;
    }
diff --git a/src/intel/compiler/brw_fs_nir.cpp b/src/intel/compiler/brw_fs_nir.cpp
index d760946e624..aa57bb9d786 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -1915,7 +1915,9 @@ fs_visitor::emit_gs_input_load(const fs_reg &dst,
    const unsigned push_reg_count = gs_prog_data->base.urb_read_length * 8;
 
    /* TODO: figure out push input layout for invocations == 1 */
+   /* TODO: make this work with 64-bit inputs */
    if (gs_prog_data->invocations == 1 &&
+       type_sz(dst.type) <= 4 &&
        offset_const != NULL && vertex_const != NULL &&
        4 * (base_offset + offset_const->u32[0]) < push_reg_count) {
       int imm_offset = (base_offset + offset_const->u32[0]) * 4 +
@@ -1928,7 +1930,7 @@ fs_visitor::emit_gs_input_load(const fs_reg &dst,
    }
 
    /* Resort to the pull model.  Ensure the VUE handles are provided. */
-   gs_prog_data->base.include_vue_handles = true;
+   assert(gs_prog_data->base.include_vue_handles);
 
    unsigned first_icp_handle = gs_prog_data->include_primitive_id ? 3 : 2;
    fs_reg icp_handle = bld.vgrf(BRW_REGISTER_TYPE_UD, 1);
author	Iago Toral Quiroga <[email protected]>	2017-09-27 11:36:31 +0200
committer	Iago Toral Quiroga <[email protected]>	2017-09-29 08:18:25 +0200
commit	47e527bd81f21cd18333fa7923d8ae8fd3a74d2c (patch)
tree	65f1f2f88f1c8c1dbad02de21d8532bf7733f2da /src/intel
parent	2df897cf1ffd5ae01ecdbb66195d292a2a15df91 (diff)