summaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorKenneth Graunke <[email protected]>2016-04-21 21:42:08 -0700
committerKenneth Graunke <[email protected]>2016-05-05 14:24:00 -0700
commitb593737ed8349b280fa29242c35f565b59ab3025 (patch)
tree4aaec0651dbd97f5b2a20902dc2eb548f5de92fa
parentbc0062c54abeab6ee2848315990066fde2ca4d97 (diff)
i965: Switch to scalar TCS by default.
Normally, we expect SIMD8 shaders to be more instructions than SIMD4x2 shaders, as it takes four instructions to operate on a vec4, rather than a single instruction. However, the benefit is that it can process 8 objects per shader thread instead of 2. Surprisingly, the shader-db statistics show an improvement in both instruction and cycle counts: Synmark: -31.25% instructions, -29.27% cycles, 0 hurt. Tessmark: -36.92% instructions, -37.81% cycles, 0 hurt. Unigine Heaven: -3.42% instructions, -17.95% cycles, 0 hurt. Shadow of Mordor: +13.24% instructions (26 with fewer instructions, 45 with more), -5.23% cycles (44 with fewer cycles, 27 with more cycles). Presumably, this is because the SIMD8 URB messages are a much more natural fit than the SIMD4x2 URB messages - there's a ton less header setup. I benchmarked Shadow of Mordor and Unigine Heaven on my Skylake GT3e, and the performance seems to be the same or increase ever so slightly (< 1 FPS difference). So I believe it's strictly superior. There's also a lot more optimization potential we can do in scalar mode. This will also help us finish fp64 support, as scalar support is going to land much sooner than vec4-mode support. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
-rw-r--r--src/mesa/drivers/dri/i965/brw_compiler.c2
1 files changed, 1 insertions, 1 deletions
diff --git a/src/mesa/drivers/dri/i965/brw_compiler.c b/src/mesa/drivers/dri/i965/brw_compiler.c
index 9977d792679..0ea5e8bba40 100644
--- a/src/mesa/drivers/dri/i965/brw_compiler.c
+++ b/src/mesa/drivers/dri/i965/brw_compiler.c
@@ -153,7 +153,7 @@ brw_compiler_create(void *mem_ctx, const struct brw_device_info *devinfo)
compiler->scalar_stage[MESA_SHADER_VERTEX] =
devinfo->gen >= 8 && !(INTEL_DEBUG & DEBUG_VEC4VS);
compiler->scalar_stage[MESA_SHADER_TESS_CTRL] =
- devinfo->gen >= 8 && env_var_as_boolean("INTEL_SCALAR_TCS", false);
+ devinfo->gen >= 8 && env_var_as_boolean("INTEL_SCALAR_TCS", true);
compiler->scalar_stage[MESA_SHADER_TESS_EVAL] =
devinfo->gen >= 8 && env_var_as_boolean("INTEL_SCALAR_TES", true);
compiler->scalar_stage[MESA_SHADER_GEOMETRY] =