From dd4db84640bbb694f180dd50850c3388f67228be Mon Sep 17 00:00:00 2001 From: Jason Ekstrand Date: Sat, 4 Mar 2017 09:23:26 -0800 Subject: anv: Use on-the-fly surface states for dynamic buffer descriptors We have a performance problem with dynamic buffer descriptors. Because we are currently implementing them by pushing an offset into the shader and adding that offset onto the already existing offset for the UBO/SSBO operation, all UBO/SSBO operations on dynamic descriptors are indirect. The back-end compiler implements indirect pull constant loads using what basically amounts to a texelFetch instruction. For pull constant loads with constant offsets, however, we use an oword block read message which goes through the constant cache and reads a whole cache line at a time. Because of these two things, direct pull constant loads are much faster than indirect pull constant loads. Because all loads from dynamically bound buffers are indirect, the user takes a substantial performance penalty when using this "performance" feature. There are two potential solutions I have seen for this problem. The alternate solution is to continue pushing offsets into the shader but wire things up in the back-end compiler so that we use the oword block read messages anyway. The only reason we can do this because we know a priori that the dynamic offsets are uniform and 16-byte aligned. Unfortunately, thanks to the 16-byte alignment requirement of the oword messages, we can't do some general "if the indirect offset is uniform, use an oword message" sort of thing. This solution, however, is recommended for a few of reasons: 1. Surface states are relatively cheap. We've been using on-the-fly surface state setup for some time in GL and it works well. Also, dynamic offsets with on-the-fly surface state should still be cheaper than allocating new descriptor sets every time you want to change a buffer offset which is really the only requirement of the dynamic offsets feature. 2. This requires substantially less compiler plumbing. Not only can we delete the entire apply_dynamic_offsets pass but we can also avoid having to add architecture for passing dynamic offsets to the back- end compiler in such a way that it can continue using oword messages. 3. We get robust buffer access range-checking for free. Because the offset and range are baked into the surface state, we no longer need to pass ranges around and do bounds-checking in the shader. 4. Once we finally get UBO pushing implemented, it will be much easier to handle pushing chunks of dynamic descriptors if the compiler remains blissfully unaware of dynamic descriptors. This commit improves performance of The Talos Principle on ULTRA settings by around 50% and brings it nicely into line with OpenGL performance. Reviewed-by: Lionel Landwerlin --- src/intel/vulkan/anv_pipeline.c | 6 ------ 1 file changed, 6 deletions(-) (limited to 'src/intel/vulkan/anv_pipeline.c') diff --git a/src/intel/vulkan/anv_pipeline.c b/src/intel/vulkan/anv_pipeline.c index 6362c2db6b1..1e8e28dc3d3 100644 --- a/src/intel/vulkan/anv_pipeline.c +++ b/src/intel/vulkan/anv_pipeline.c @@ -352,9 +352,6 @@ anv_pipeline_compile(struct anv_pipeline *pipeline, prog_data->nr_params += MAX_PUSH_CONSTANTS_SIZE / sizeof(float); } - if (pipeline->layout && pipeline->layout->stage[stage].has_dynamic_offsets) - prog_data->nr_params += MAX_DYNAMIC_BUFFERS * 2; - if (nir->info->num_images > 0) { prog_data->nr_params += nir->info->num_images * BRW_IMAGE_PARAM_SIZE; pipeline->needs_data_cache = true; @@ -386,9 +383,6 @@ anv_pipeline_compile(struct anv_pipeline *pipeline, } } - /* Set up dynamic offsets */ - anv_nir_apply_dynamic_offsets(pipeline, nir, prog_data); - /* Apply the actual pipeline layout to UBOs, SSBOs, and textures */ if (pipeline->layout) anv_nir_apply_pipeline_layout(pipeline, nir, prog_data, map); -- cgit v1.2.3