anv/cmd_buffer: Rework aux tracking

This commit completely reworks aux tracking.  This includes a number of
somewhat distinct changes:

 1) Since we are no longer fast-clearing multiple slices, we only need
    to track one fast clear color and one fast clear type.

 2) We store two bits for fast clear instead of one to let us
    distinguish between zero and non-zero fast clear colors.  This is
    needed so that we can do full resolves when transitioning to
    PRESENT_SRC_KHR with gen9 CCS images where we allow zero clear
    values in all sorts of places we wouldn't normally.

 3) We now track compression state as a boolean separate from fast
    clear type and this is tracked on a per-slice granularity.

The previous scheme had some issues when it came to individual slices of
a multi-LOD image.  In particular, we only tracked "needs resolve"
per-LOD but you could do a vkCmdPipelineBarrier that would only resolve
a portion of the image and would set "needs resolve" to false anyway.
Also, any transition from an undefined layout would reset the clear
color for the entire LOD regardless of whether or not there was some
clear color on some other slice.

As far as full/partial resolves go, the assumptions of the previous
scheme held because the one case where we do need a full resolve when
CCS_E is enabled is for window-system images.  Since we only ever
allowed X-tiled window-system images, CCS was entirely disabled on gen9+
and we never got CCS_E.  With the advent of Y-tiled window-system
buffers, we now need to properly support doing a full resolve of images
marked CCS_E.

v2 (Jason Ekstrand):
 - Fix a bug in the compressed flag offset calculation
 - Treat 3D images as multi-slice for the purposes of resolve tracking

v3 (Jason Ekstrand):
 - Set the compressed flag whenever we fast-clear
 - Simplify the resolve predicate computation logic

Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
author: Jason Ekstrand <[email protected]> 2017-11-21 08:46:25 -0800
committer: Jason Ekstrand <[email protected]> 2018-02-08 16:35:31 -0800
commit: de3be6180169f95b781308398b31fbdd3db319e1 (patch)
tree: b34004cd9af2ae8aad718644f71e1c0148b66094 /src/intel/vulkan/anv_image.c
parent: 2cbfcb205ef777cb6e17ebca3ff658f9f2cb915f (diff)
1 files changed, 60 insertions, 44 deletions
diff --git a/src/intel/vulkan/anv_image.c b/src/intel/vulkan/anv_image.c
index 11942d0f320..a297cc47320 100644
--- a/src/intel/vulkan/anv_image.c
+++ b/src/intel/vulkan/anv_image.c
@@ -190,46 +190,58 @@ all_formats_ccs_e_compatible(const struct gen_device_info *devinfo,
  * fast-clear values in non-trivial cases (e.g., outside of a render pass in
  * which a fast clear has occurred).
  *
- * For the purpose of discoverability, the algorithm used to manage this buffer
- * is described here. A clear value in this buffer is updated when a fast clear
- * is performed on a subresource. One of two synchronization operations is
- * performed in order for a following memory access to use the fast-clear
- * value:
- *    a. Copy the value from the buffer to the surface state object used for
- *       reading. This is done implicitly when the value is the clear value
- *       predetermined to be the default in other surface state objects. This
- *       is currently only done explicitly for the operation below.
- *    b. Do (a) and use the surface state object to resolve the subresource.
- *       This is only done during layout transitions for decent performance.
+ * In order to avoid having multiple clear colors for a single plane of an
+ * image (hence a single RENDER_SURFACE_STATE), we only allow fast-clears on
+ * the first slice (level 0, layer 0).  At the time of our testing (Jan 17,
+ * 2018), there were no known applications which would benefit from fast-
+ * clearing more than just the first slice.
  *
- * With the above scheme, we can fast-clear whenever the hardware allows except
- * for two cases in which synchronization becomes impossible or undesirable:
- *    * The subresource is in the GENERAL layout and is cleared to a value
- *      other than the special default value.
+ * The fast clear portion of the image is laid out in the following order:
  *
- *      Performing a synchronization operation in order to read from the
- *      subresource is undesirable in this case. Firstly, b) is not an option
- *      because a layout transition isn't required between a write and read of
- *      an image in the GENERAL layout. Secondly, it's undesirable to do a)
- *      explicitly because it would require large infrastructural changes. The
- *      Vulkan API supports us in deciding not to optimize this layout by
- *      stating that using this layout may cause suboptimal performance. NOTE:
- *      the auxiliary buffer must always be enabled to support a) implicitly.
+ *  * 1 or 4 dwords (depending on hardware generation) for the clear color
+ *  * 1 dword for the anv_fast_clear_type of the clear color
+ *  * On gen9+, 1 dword per level and layer of the image (3D levels count
+ *    multiple layers) in level-major order for compression state.
  *
+ * For the purpose of discoverability, the algorithm used to manage
+ * compression and fast-clears is described here:
  *
- *    * For the given miplevel, only some of the layers are cleared at once.
+ *  * On a transition from UNDEFINED or PREINITIALIZED to a defined layout,
+ *    all of the values in the fast clear portion of the image are initialized
+ *    to default values.
  *
- *      If the user clears each layer to a different value, then tries to
- *      render to multiple layers at once, we have no ability to perform a
- *      synchronization operation in between. a) is not helpful because the
- *      object can only hold one clear value. b) is not an option because a
- *      layout transition isn't required in this case.
+ *  * On fast-clear, the clear value is written into surface state and also
+ *    into the buffer and the fast clear type is set appropriately.  Both
+ *    setting the fast-clear value in the buffer and setting the fast-clear
+ *    type happen from the GPU using MI commands.
+ *
+ *  * Whenever a render or blorp operation is performed with CCS_E, we call
+ *    genX(cmd_buffer_mark_image_written) to set the compression state to
+ *    true (which is represented by UINT32_MAX).
+ *
+ *  * On pipeline barrier transitions, the worst-case transition is computed
+ *    from the image layouts.  The command streamer inspects the fast clear
+ *    type and compression state dwords and constructs a predicate.  The
+ *    worst-case resolve is performed with the given predicate and the fast
+ *    clear and compression state is set accordingly.
+ *
+ * See anv_layout_to_aux_usage and anv_layout_to_fast_clear_type functions for
+ * details on exactly what is allowed in what layouts.
+ *
+ * On gen7-9, we do not have a concept of indirect clear colors in hardware.
+ * In order to deal with this, we have to do some clear color management.
+ *
+ *  * For LOAD_OP_LOAD at the top of a renderpass, we have to copy the clear
+ *    value from the buffer into the surface state with MI commands.
+ *
+ *  * For any blorp operations, we pass the address to the clear value into
+ *    blorp and it knows to copy the clear color.
  */
 static void
-add_fast_clear_state_buffer(struct anv_image *image,
-                            VkImageAspectFlagBits aspect,
-                            uint32_t plane,
-                            const struct anv_device *device)
+add_aux_state_tracking_buffer(struct anv_image *image,
+                              VkImageAspectFlagBits aspect,
+                              uint32_t plane,
+                              const struct anv_device *device)
 {
    assert(image && device);
    assert(image->planes[plane].aux_surface.isl.size > 0 &&
@@ -251,20 +263,24 @@ add_fast_clear_state_buffer(struct anv_image *image,
              (image->planes[plane].offset + image->planes[plane].size));
    }
 
-   const unsigned entry_size = anv_fast_clear_state_entry_size(device);
-   /* There's no padding between entries, so ensure that they're always a
-    * multiple of 32 bits in order to enable GPU memcpy operations.
-    */
-   assert(entry_size % 4 == 0);
+   /* Clear color and fast clear type */
+   unsigned state_size = device->isl_dev.ss.clear_value_size + 4;
 
-   const unsigned plane_state_size =
-      entry_size * anv_image_aux_levels(image, aspect);
+   /* We only need to track compression on CCS_E surfaces. */
+   if (image->planes[plane].aux_usage == ISL_AUX_USAGE_CCS_E) {
+      if (image->type == VK_IMAGE_TYPE_3D) {
+         for (uint32_t l = 0; l < image->levels; l++)
+            state_size += anv_minify(image->extent.depth, l) * 4;
+      } else {
+         state_size += image->levels * image->array_size * 4;
+      }
+   }
 
    image->planes[plane].fast_clear_state_offset =
       image->planes[plane].offset + image->planes[plane].size;
 
-   image->planes[plane].size += plane_state_size;
-   image->size += plane_state_size;
+   image->planes[plane].size += state_size;
+   image->size += state_size;
 }
 
 /**
@@ -437,7 +453,7 @@ make_surface(const struct anv_device *dev,
             }
 
             add_surface(image, &image->planes[plane].aux_surface, plane);
-            add_fast_clear_state_buffer(image, aspect, plane, dev);
+            add_aux_state_tracking_buffer(image, aspect, plane, dev);
 
             /* For images created without MUTABLE_FORMAT_BIT set, we know that
              * they will always be used with the original format.  In
@@ -461,7 +477,7 @@ make_surface(const struct anv_device *dev,
                                  &image->planes[plane].aux_surface.isl);
       if (ok) {
          add_surface(image, &image->planes[plane].aux_surface, plane);
-         add_fast_clear_state_buffer(image, aspect, plane, dev);
+         add_aux_state_tracking_buffer(image, aspect, plane, dev);
          image->planes[plane].aux_usage = ISL_AUX_USAGE_MCS;
       }
    }
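
The layout described in the new comment (clear color, then one dword of fast clear type, then one compression dword per level and layer in level-major order) can be sketched in standalone C. This is only an illustration, not code from the patch: `struct sketch_image` and every `sketch_*` function are hypothetical stand-ins, `clear_value_size` stands in for `device->isl_dev.ss.clear_value_size`, and the offset helper is an assumption derived from the "level-major order" wording, since the driver's actual offset calculation is not part of this diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the anv_image fields the patch reads. */
struct sketch_image {
   bool is_3d;          /* image->type == VK_IMAGE_TYPE_3D */
   bool ccs_e;          /* aux_usage == ISL_AUX_USAGE_CCS_E */
   uint32_t levels;     /* image->levels */
   uint32_t array_size; /* image->array_size */
   uint32_t depth;      /* image->extent.depth */
};

/* Shrink a dimension by one miplevel step per level, never below 1. */
static uint32_t sketch_minify(uint32_t n, uint32_t level)
{
   uint32_t v = n >> level;
   return v > 0 ? v : 1;
}

/* Slices tracked at a level: minified depth for 3D, else array layers. */
static uint32_t sketch_layers_at_level(const struct sketch_image *img,
                                       uint32_t level)
{
   return img->is_3d ? sketch_minify(img->depth, level) : img->array_size;
}

/* Total tracking-buffer size in bytes, mirroring the loop in the patch. */
static uint32_t sketch_state_size(const struct sketch_image *img,
                                  uint32_t clear_value_size)
{
   /* Clear color plus one dword for the fast clear type. */
   uint32_t size = clear_value_size + 4;

   /* Compression state is only tracked for CCS_E surfaces. */
   if (img->ccs_e) {
      for (uint32_t l = 0; l < img->levels; l++)
         size += sketch_layers_at_level(img, l) * 4;
   }
   return size;
}

/* ASSUMPTION: byte offset of one slice's compression dword, taking
 * "level-major order" to mean all layers of level 0, then level 1, ...
 */
static uint32_t sketch_compression_offset(const struct sketch_image *img,
                                          uint32_t clear_value_size,
                                          uint32_t level, uint32_t layer)
{
   assert(img->ccs_e);
   assert(level < img->levels);
   assert(layer < sketch_layers_at_level(img, level));

   uint32_t offset = clear_value_size + 4;
   for (uint32_t l = 0; l < level; l++)
      offset += sketch_layers_at_level(img, l) * 4;
   return offset + layer * 4;
}
```

For example, with a 16-byte clear color, a 3D CCS_E image of depth 8 with 4 levels tracks 8 + 4 + 2 + 1 = 15 slices, so the buffer would take 16 + 4 + 15 * 4 = 80 bytes under this sketch.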