From b2b621a0ec57f08586b9afcf666c0eadc0993ca0 Mon Sep 17 00:00:00 2001 From: Francisco Jerez Date: Mon, 8 Aug 2016 12:43:18 -0700 Subject: i965/fs: Switch to per-subspan discard jumps. ANY4H is more efficient than ANY8H and ANY16H because it makes sure that whenever a whole subspan hits a discard statement it gets disabled by the EU until the end of the program, regardless of whether the discard condition is uniform across all channels of the SIMD8-16 thread. OTOH ANY8H/ANY16H would cause the rest of the program to be executed for *all* channels if only one of the channels hadn't taken the discard branch, potentially increasing the bandwidth and ALU usage of the program unnecessarily. This change increases the FPS by over 3x of a simple micro-benchmark that discards a bunch of fragments and then does a single costly texturing operation. I've just re-verified the FPS change on HSW and SKL, but I expect all platforms from Gen6 up to get a similar benefit. Note that we could potentially be more aggressive and use the NORMAL predicate to discard individual channels, but that would need to happen post-scheduling because the scheduler currently doesn't care to reorder HALT instructions with respect to other instructions, and the NORMAL predicate would cause the results of subsequent derivative computations to become undefined -- If the scheduler didn't reorder HALT instructions it would actually be safe to switch to NORMAL because the behavior of derivative computations after a non-uniform discard statement is undefined by the GLSL spec, but that would make the optimization implemented by one of the following commits somewhat more difficult. Reviewed-by: Jason Ekstrand --- src/mesa/drivers/dri/i965/brw_fs.cpp | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index b89c6721ea0..c23b5588944 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -1397,9 +1397,7 @@ fs_visitor::emit_discard_jump() fs_inst *discard_jump = bld.emit(FS_OPCODE_DISCARD_JUMP); discard_jump->flag_subreg = 1; - discard_jump->predicate = (dispatch_width == 8) - ? BRW_PREDICATE_ALIGN1_ANY8H - : BRW_PREDICATE_ALIGN1_ANY16H; + discard_jump->predicate = BRW_PREDICATE_ALIGN1_ANY4H; discard_jump->predicate_inverse = true; } -- cgit v1.2.3