author | Sergii Romantsov <[email protected]> | 2018-09-19 19:21:11 +0300 |
---|---|---|
committer | Jason Ekstrand <[email protected]> | 2018-10-16 13:20:51 -0500 |
commit | 0fa9e6d7b304f6a8064ed78a4b9c557e1026e7e5 (patch) | |
tree | 7e00821118a401e3970d9dfe96ffd3672004dc13 | |
parent | 322a919a41f92f65f4621e565c94aa45a737bd03 (diff) |
anv/skylake: disable ForceThreadDispatchEnable
On Skylake, enabling ForceThreadDispatchEnable causes a GPU hang.
-v2: ForceThreadDispatchEnable is now enabled only on gen8; on gen9
and higher, the enabling of PixelShaderHasUAV is restored.
-v3 (Jason Ekstrand): Rework the comments a bit.
CC: Jason Ekstrand <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107941
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107760
Fixes: 79270d2140ec (anv: Stop setting 3DSTATE_PS_EXTRA::PixelShaderHasUAV)
Signed-off-by: Sergii Romantsov <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
-rw-r--r-- | src/intel/vulkan/genX_pipeline.c | 42 |
1 files changed, 35 insertions, 7 deletions
diff --git a/src/intel/vulkan/genX_pipeline.c b/src/intel/vulkan/genX_pipeline.c
index 9595a7133ae..33f1f7832ac 100644
--- a/src/intel/vulkan/genX_pipeline.c
+++ b/src/intel/vulkan/genX_pipeline.c
@@ -1445,12 +1445,12 @@ emit_3dstate_wm(struct anv_pipeline *pipeline, struct anv_subpass *subpass,
             wm.EarlyDepthStencilControl = EDSC_NORMAL;
          }
 
-#if GEN_GEN >= 8
-         /* Gen8 hardware tries to compute ThreadDispatchEnable for us but
-          * doesn't take into account KillPixels when no depth or stencil
-          * writes are enabled. In order for occlusion queries to work
-          * correctly with no attachments, we need to force-enable PS thread
-          * dispatch.
+#if GEN_GEN == 8
+         /* Gen8 and later hardware tries to compute ThreadDispatchEnable for
+          * us but doesn't take into account KillPixels when no depth or
+          * stencil writes are enabled. In order for occlusion queries to
+          * work correctly with no attachments, we need to force-enable PS
+          * thread dispatch.
           *
           * The BDW docs are pretty clear that that this bit isn't validated
           * and probably shouldn't be used in production:
@@ -1460,7 +1460,9 @@ emit_3dstate_wm(struct anv_pipeline *pipeline, struct anv_subpass *subpass,
           *
           * Unfortunately, however, the other mechanism we have for doing this
           * is 3DSTATE_PS_EXTRA::PixelShaderHasUAV which causes hangs on BDW.
-          * Given two bad options, we choose the one which works.
+          * Given two bad options, we choose the one which works. On Skylake
+          * and later, setting ForceThreadDispatchEnable causes GPU hangs so
+          * we use the PixelShaderHasUAV mechanism there.
           */
          if ((wm_prog_data->has_side_effects || wm_prog_data->uses_kill) &&
              !has_color_buffer_write_enabled(pipeline, blend))
@@ -1663,6 +1665,32 @@ emit_3dstate_ps_extra(struct anv_pipeline *pipeline,
                                   wm_prog_data->uses_kill;
 
 #if GEN_GEN >= 9
+      /* Gen8 and later hardware tries to compute ThreadDispatchEnable for us
+       * but doesn't take into account KillPixels when no depth or stencil
+       * writes are enabled. In order for occlusion queries to work correctly
+       * with no attachments, we need to force-enable PS thread dispatch.
+       *
+       * The stricter cross-primitive coherency guarantees that the hardware
+       * gives us with the "Accesses UAV" bit set for at least one shader stage
+       * and the "UAV coherency required" bit set on the 3DPRIMITIVE command are
+       * redundant within the current image, atomic counter and SSBO GL and
+       * Vulkan APIs, which all have very loose ordering and coherency
+       * requirements and generally rely on the application to insert explicit
+       * barriers when a shader invocation is expected to see the memory
+       * writes performed by the invocations of some previous primitive.
+       * Regardless of the value of "UAV coherency required", the "Accesses
+       * UAV" bits will implicitly cause an in most cases useless DC flush
+       * when the lowermost stage with the bit set finishes execution.
+       *
+       * Unfortunately, however, the other mechanism we have for doing this is
+       * 3DSTATE_WM::ForceThreadDispatchEnable which causes GPU hangs on
+       * Skylake and later hardware. On Broadwell, however, setting this bit
+       * causes GPU hangs so we use ForceThreadDispatchEnable there.
+       */
+      if ((wm_prog_data->has_side_effects || wm_prog_data->uses_kill) &&
+          !has_color_buffer_write_enabled(pipeline, blend))
+         ps.PixelShaderHasUAV = true;
+
       ps.PixelShaderComputesStencil = wm_prog_data->computed_stencil;
       ps.PixelShaderPullsBary = wm_prog_data->pulls_bary;
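The per-generation choice made by the two `#if` blocks above can be summarized in a small standalone sketch. This is an illustration only, not Mesa code: the enum, the function name `choose_ps_dispatch_workaround`, and the plain `int gen` parameter are hypothetical stand-ins for the compile-time `GEN_GEN` checks and the `wm_prog_data` / `has_color_buffer_write_enabled()` inputs used in the patch. The diff does not show what happens before gen8, so returning "none" there is an assumption.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; none of these exist in Mesa. */
enum ps_dispatch_workaround {
   PS_DISPATCH_NONE,
   PS_DISPATCH_FORCE_THREAD_DISPATCH, /* 3DSTATE_WM::ForceThreadDispatchEnable */
   PS_DISPATCH_HAS_UAV,               /* 3DSTATE_PS_EXTRA::PixelShaderHasUAV */
};

/* Mirrors the condition in the patch: a workaround is only needed when the
 * fragment shader has side effects or uses kill AND no color buffer write
 * is enabled. Which bit is set then depends on the hardware generation. */
static enum ps_dispatch_workaround
choose_ps_dispatch_workaround(int gen, bool has_side_effects, bool uses_kill,
                              bool color_write_enabled)
{
   if (!(has_side_effects || uses_kill) || color_write_enabled)
      return PS_DISPATCH_NONE;

   if (gen == 8)
      return PS_DISPATCH_FORCE_THREAD_DISPATCH; /* HasUAV hangs on BDW */
   if (gen >= 9)
      return PS_DISPATCH_HAS_UAV; /* ForceThreadDispatchEnable hangs on SKL+ */

   return PS_DISPATCH_NONE; /* assumption: pre-gen8 is not covered by the diff */
}
```

In the real driver the generation is fixed at compile time per `genX` build, so the selection is done with preprocessor conditionals rather than a runtime branch as in this sketch.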