diff options
author | Jason Ekstrand <[email protected]> | 2016-10-28 14:48:53 -0700 |
---|---|---|
committer | Jason Ekstrand <[email protected]> | 2016-10-28 17:11:16 -0700 |
commit | 2a4a86862c949055c71637429f6d5f2e725d07d8 (patch) | |
tree | 243e8cfae6d5b705987f452d3b5acb13f33d3b1a | |
parent | 4bf45a6079b5cc6b0360b637c0c7baa456b8257d (diff) |
i965/fs/generator: Don't use the address immediate for MOV_INDIRECT
The address immediate field is only 9 bits and, since the value is in
bytes, the highest GRF we can point to with it is g15. This makes it
pretty close to useless for MOV_INDIRECT. There were already piles of
restrictions preventing us from using it prior to Broadwell, so let's get
rid of the gen8+ code path entirely.
Signed-off-by: Jason Ekstrand <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97779
Cc: "12.0 13.0" <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
-rw-r--r-- | src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 55 |
1 files changed, 27 insertions, 28 deletions
diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp index d7b5cc66584..c5b50e1f73e 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp @@ -400,34 +400,33 @@ fs_generator::generate_mov_indirect(fs_inst *inst, indirect_byte_offset = retype(spread(indirect_byte_offset, 2), BRW_REGISTER_TYPE_UW); - struct brw_reg ind_src; - if (devinfo->gen < 8) { - /* From the Haswell PRM section "Register Region Restrictions": - * - * "The lower bits of the AddressImmediate must not overflow to - * change the register address. The lower 5 bits of Address - * Immediate when added to lower 5 bits of address register gives - * the sub-register offset. The upper bits of Address Immediate - * when added to upper bits of address register gives the register - * address. Any overflow from sub-register offset is dropped." - * - * This restriction is only listed in the Haswell PRM but emperical - * testing indicates that it applies on all older generations and is - * lifted on Broadwell. - * - * Since the indirect may cause us to cross a register boundary, this - * makes the base offset almost useless. We could try and do - * something clever where we use a actual base offset if - * base_offset % 32 == 0 but that would mean we were generating - * different code depending on the base offset. Instead, for the - * sake of consistency, we'll just do the add ourselves. - */ - brw_ADD(p, addr, indirect_byte_offset, brw_imm_uw(imm_byte_offset)); - ind_src = brw_VxH_indirect(0, 0); - } else { - brw_MOV(p, addr, indirect_byte_offset); - ind_src = brw_VxH_indirect(0, imm_byte_offset); - } + /* There are a number of reasons why we don't use the base offset here. + * One reason is that the field is only 9 bits which means we can only + * use it to access the first 16 GRFs. Also, from the Haswell PRM + * section "Register Region Restrictions": + * + * "The lower bits of the AddressImmediate must not overflow to + * change the register address. The lower 5 bits of Address + * Immediate when added to lower 5 bits of address register gives + * the sub-register offset. The upper bits of Address Immediate + * when added to upper bits of address register gives the register + * address. Any overflow from sub-register offset is dropped." + * + * Since the indirect may cause us to cross a register boundary, this + * makes the base offset almost useless. We could try and do something + * clever where we use a actual base offset if base_offset % 32 == 0 but + * that would mean we were generating different code depending on the + * base offset. Instead, for the sake of consistency, we'll just do the + * add ourselves. This restriction is only listed in the Haswell PRM + * but empirical testing indicates that it applies on all older + * generations and is lifted on Broadwell. + * + * In the end, while base_offset is nice to look at in the generated + * code, using it saves us 0 instructions and would require quite a bit + * of case-by-case work. It's just not worth it. + */ + brw_ADD(p, addr, indirect_byte_offset, brw_imm_uw(imm_byte_offset)); + struct brw_reg ind_src = brw_VxH_indirect(0, 0); brw_inst *mov = brw_MOV(p, dst, retype(ind_src, dst.type)); |