author | Kenneth Graunke <[email protected]> | 2014-04-13 14:15:49 -0700 |
---|---|---|
committer | Kenneth Graunke <[email protected]> | 2014-04-15 02:15:11 -0700 |
commit | be000b4d1911d2d520ec7b2366403d2ae3cf8bdc (patch) | |
tree | 889e7d742d5259457f90ad0b2e77b456cdf59b8f /src/mesa | |
parent | 313104e8d58002ad00d297e1b229ecd984d79298 (diff) |
i965: Update comments about Z16 being slow.
We've learned a few things since we originally disabled Z16; this attempts
to summarize the issue. I am no expert on this subject, though, so the
comment may not be totally accurate.
I did some benchmarking on GM45 and Ironlake, and discovered that for
GLBenchmark 2.7 EgyptHD, using Z16 was 3% slower on GM45 (n=15), and
4.5% slower on Ironlake (n=95). So, we can drop the "on Ivybridge"
aspect of the comment - it's always slower.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Chia-I Wu <[email protected]>
Diffstat (limited to 'src/mesa')
-rw-r--r-- | src/mesa/drivers/dri/i965/brw_surface_formats.c | 17 |
1 files changed, 10 insertions, 7 deletions
diff --git a/src/mesa/drivers/dri/i965/brw_surface_formats.c b/src/mesa/drivers/dri/i965/brw_surface_formats.c
index cef4020d9ab..196f13930d0 100644
--- a/src/mesa/drivers/dri/i965/brw_surface_formats.c
+++ b/src/mesa/drivers/dri/i965/brw_surface_formats.c
@@ -620,13 +620,16 @@ brw_init_surface_formats(struct brw_context *brw)
    ctx->TextureFormatSupported[MESA_FORMAT_Z_FLOAT32] = true;
    ctx->TextureFormatSupported[MESA_FORMAT_Z32_FLOAT_S8X24_UINT] = true;
 
-   /* It appears that Z16 is slower than Z24 (on Intel Ivybridge and newer
-    * hardware at least), so there's no real reason to prefer it unless you're
-    * under memory (not memory bandwidth) pressure. Our speculation is that
-    * this is due to either increased fragment shader execution from
-    * GL_LEQUAL/GL_EQUAL depth tests at the reduced precision, or due to
-    * increased depth stalls from a cacheline-based heuristic for detecting
-    * depth stalls.
+   /* Benchmarking shows that Z16 is slower than Z24, so there's no reason to
+    * use it unless you're under memory (not memory bandwidth) pressure.
+    *
+    * Apparently, the GPU's depth scoreboarding works on a 32-bit granularity,
+    * which corresponds to one pixel in the depth buffer for Z24 or Z32 formats.
+    * However, it corresponds to two pixels with Z16, which means both need to
+    * hit the early depth case in order for it to happen.
+    *
+    * Other speculation is that we may be hitting increased fragment shader
+    * execution from GL_LEQUAL/GL_EQUAL depth tests at reduced precision.
     *
     * However, desktop GL 3.0+ require that you get exactly 16 bits when
     * asking for DEPTH_COMPONENT16, so we have to respect that.