author | Kenneth Graunke <[email protected]> | 2016-04-21 21:42:08 -0700 |
---|---|---|
committer | Kenneth Graunke <[email protected]> | 2016-05-05 14:24:00 -0700 |
commit | b593737ed8349b280fa29242c35f565b59ab3025 (patch) | |
tree | 4aaec0651dbd97f5b2a20902dc2eb548f5de92fa /src/intel/genxml | |
parent | bc0062c54abeab6ee2848315990066fde2ca4d97 (diff) |
i965: Switch to scalar TCS by default.

Normally, we expect SIMD8 shaders to have more instructions than SIMD4x2
shaders, as it takes four instructions to operate on a vec4, rather than
a single instruction. However, the benefit is that a SIMD8 shader can
process 8 objects per shader thread instead of 2.
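
The trade-off described above can be sketched with a toy cost model. This is only an illustration of the arithmetic in the paragraph, not code from the driver: the function name and the "one instruction per component" rule are assumptions, and the model ignores URB messages, register pressure, and scheduling.

```python
# Toy model (assumption): count instructions for one operation on a value
# with `components` channels (1 to 4), then divide by the number of objects
# a single shader thread handles in that mode.

def instructions_per_object(mode: str, components: int) -> float:
    if not 1 <= components <= 4:
        raise ValueError("components must be between 1 and 4")
    if mode == "simd4x2":
        # One instruction covers up to a full vec4 for 2 objects.
        instructions, objects = 1, 2
    elif mode == "simd8":
        # One instruction per component, each covering 8 objects.
        instructions, objects = components, 8
    else:
        raise ValueError(f"unknown mode: {mode}")
    return instructions / objects


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n,
              instructions_per_object("simd4x2", n),
              instructions_per_object("simd8", n))
```

Under these assumptions a full vec4 operation costs the same per object in both modes (4 instructions for 8 objects versus 1 for 2), and SIMD8 comes out ahead whenever fewer than four components are actually used.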
Surprisingly, the shader-db statistics show an improvement in cycle
counts everywhere, and in instruction counts everywhere except Shadow
of Mordor:

  Synmark:        -31.25% instructions, -29.27% cycles, 0 hurt.
  Tessmark:       -36.92% instructions, -37.81% cycles, 0 hurt.
  Unigine Heaven:  -3.42% instructions, -17.95% cycles, 0 hurt.
  Shadow of Mordor:
    +13.24% instructions (26 with fewer instructions, 45 with more),
     -5.23% cycles (44 with fewer cycles, 27 with more cycles).
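
For readers unfamiliar with this style of report, here is a minimal sketch of how such a summary could be derived from per-shader counts. It assumes the percentage is taken over the summed counts of all shaders and that "helped"/"hurt" simply count shaders whose value went down or up; the function names and the sample numbers are hypothetical and are not taken from shader-db or from this commit.

```python
# Hypothetical helper: summarize before/after counts for a set of shaders.

def percent_change(before: int, after: int) -> float:
    """Relative change in percent; negative means the count went down."""
    return (after - before) / before * 100.0


def summarize(pairs):
    """pairs is a list of (before, after) counts, one entry per shader."""
    total_before = sum(b for b, _ in pairs)
    total_after = sum(a for _, a in pairs)
    helped = sum(1 for b, a in pairs if a < b)
    hurt = sum(1 for b, a in pairs if a > b)
    return percent_change(total_before, total_after), helped, hurt


if __name__ == "__main__":
    # Made-up instruction counts for three shaders.
    sample = [(100, 90), (200, 240), (50, 50)]
    change, helped, hurt = summarize(sample)
    print(f"{change:+.2f}% instructions ({helped} helped, {hurt} hurt)")
```

A summary computed this way can show a net increase in one metric and a net decrease in another for the same set of shaders, which is the pattern reported for Shadow of Mordor.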
Presumably, this is because the SIMD8 URB messages are a much more
natural fit than the SIMD4x2 URB messages - there's far less header
setup.

I benchmarked Shadow of Mordor and Unigine Heaven on my Skylake GT3e,
and performance is the same or slightly higher (< 1 FPS difference),
so I believe scalar mode is strictly superior.

There's also a lot more optimization potential in scalar mode.
This will also help us finish fp64 support, as scalar fp64 support is
going to land much sooner than vec4-mode support.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Diffstat (limited to 'src/intel/genxml')
0 files changed, 0 insertions, 0 deletions