anv: Use on-the-fly surface states for dynamic buffer descriptors

We have a performance problem with dynamic buffer descriptors. Because we are currently implementing them by pushing an offset into the shader and adding that offset onto the already existing offset for the UBO/SSBO operation, all UBO/SSBO operations on dynamic descriptors are indirect. The back-end compiler implements indirect pull constant loads using what basically amounts to a texelFetch instruction. For pull constant loads with constant offsets, however, we use an oword block read message which goes through the constant cache and reads a whole cache line at a time. Because of these two things, direct pull constant loads are much faster than indirect pull constant loads. Because all loads from dynamically bound buffers are indirect, the user takes a substantial performance penalty when using this "performance" feature.

There are two potential solutions I have seen for this problem. One is the approach taken here: set up surface states on the fly with the dynamic offset baked in. The alternative is to continue pushing offsets into the shader but wire things up in the back-end compiler so that we use the oword block read messages anyway. The only reason we could do this is that we know a priori that the dynamic offsets are uniform and 16-byte aligned. Unfortunately, thanks to the 16-byte alignment requirement of the oword messages, we can't do some general "if the indirect offset is uniform, use an oword message" sort of thing.

The solution in this commit, however, is recommended for a few reasons:

 1. Surface states are relatively cheap. We've been using on-the-fly surface state setup for some time in GL and it works well. Also, dynamic offsets with on-the-fly surface state should still be cheaper than allocating new descriptor sets every time you want to change a buffer offset, which is really the only requirement of the dynamic offsets feature.

 2. This requires substantially less compiler plumbing. Not only can we delete the entire apply_dynamic_offsets pass but we can also avoid having to add architecture for passing dynamic offsets to the back-end compiler in such a way that it can continue using oword messages.

 3. We get robust buffer access range-checking for free. Because the offset and range are baked into the surface state, we no longer need to pass ranges around and do bounds-checking in the shader.

 4. Once we finally get UBO pushing implemented, it will be much easier to handle pushing chunks of dynamic descriptors if the compiler remains blissfully unaware of dynamic descriptors.

This commit improves performance of The Talos Principle on ULTRA settings by around 50% and brings it nicely into line with OpenGL performance.

Reviewed-by: Lionel Landwerlin <[email protected]>
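
Illustrative note: the idea of baking the dynamic offset and range into the surface state can be sketched roughly as below. This is a simplified sketch and not the driver's actual code. The struct and function names (sketch_buffer_binding, sketch_surface_range, sketch_resolve_dynamic_binding) are hypothetical, and the clamping policy shown is an assumption about how an out-of-range offset might be handled.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-ins; not the driver's real structures. */
struct sketch_buffer_binding {
    uint64_t buffer_size; /* total size of the underlying buffer, in bytes */
    uint64_t offset;      /* static offset recorded in the descriptor */
    uint64_t range;       /* static range recorded in the descriptor */
};

struct sketch_surface_range {
    uint64_t offset; /* final offset to bake into the surface state */
    uint64_t range;  /* final range to bake into the surface state */
};

static uint64_t
sketch_min_u64(uint64_t a, uint64_t b)
{
    return a < b ? a : b;
}

/* Fold the dynamic offset into the binding at bind time, so the shader
 * only ever sees a plain buffer that starts at offset zero. */
static struct sketch_surface_range
sketch_resolve_dynamic_binding(const struct sketch_buffer_binding *binding,
                               uint32_t dynamic_offset)
{
    struct sketch_surface_range result;

    /* Assumed policy: clamp the combined offset to the end of the buffer. */
    result.offset = sketch_min_u64(binding->offset + dynamic_offset,
                                   binding->buffer_size);

    /* Clamp the range so it never extends past the end of the buffer. */
    result.range = sketch_min_u64(binding->range,
                                  binding->buffer_size - result.offset);

    return result;
}
```

With the offset and range resolved this way before the surface state is written, the shader can keep using constant offsets, and accesses past the clamped range are left to the hardware's surface bounds rather than to checks in the shader.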
author: Jason Ekstrand <[email protected]> 2017-03-04 09:23:26 -0800
committer: Jason Ekstrand <[email protected]> 2017-03-13 07:58:00 -0700
commit: dd4db84640bbb694f180dd50850c3388f67228be (patch)
tree: 0d8c18b7c7bb21d2f331f4266b289ad4c80529af /src/intel/Makefile.sources
parent: 6b644e571e2344691e4d58ff0bba3ddc059c1a5d (diff)
1 files changed, 0 insertions, 1 deletions
diff --git a/src/intel/Makefile.sources b/src/intel/Makefile.sources
index 13375749ae3..4eaf380492f 100644
--- a/src/intel/Makefile.sources
+++ b/src/intel/Makefile.sources
@@ -176,7 +176,6 @@ VULKAN_FILES := \
 	vulkan/anv_image.c \
 	vulkan/anv_intel.c \
 	vulkan/anv_nir.h \
-	vulkan/anv_nir_apply_dynamic_offsets.c \
 	vulkan/anv_nir_apply_pipeline_layout.c \
 	vulkan/anv_nir_lower_input_attachments.c \
 	vulkan/anv_nir_lower_push_constants.c \