r600: increase performance for DRI PRIME offloading if 2nd GPU is Evergreen+

This is a direct port of Marek Olšák's patch "radeonsi: increase performance for DRI PRIME offloading if 2nd GPU is CIK or VI" to r600.

It uses SDMA for the detiling blit from renderoffload VRAM to GTT, as SDMA is much faster for tiled->linear blits from VRAM to GTT.

Testing on a dual Radeon HD-5770 setup reduced the time for the render offload GPU to get its rendering into system RAM from approximately 16 msecs to 5 msecs for simple rendering at 1920x1080 pixels, 32 bpp, a > 3x speedup!

This was measured using ftrace to trace the time the radeon KMS driver waited on the dmabuf fence of the renderoffload GPU to complete.

All in all this brought the time for a flip down from 20 msecs to 9 msecs, so the prime setup can display at full 60 fps instead of barely 30 fps vsync'ed.

The current r600 implementation supports SDMA on Evergreen and later, but not on R600/R700, due to some bugs apparently present in their SDMA implementation.

Signed-off-by: Mario Kleiner <[email protected]>
Cc: Marek Olšák <[email protected]>
Signed-off-by: Marek Olšák <[email protected]>
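
The 30 fps versus 60 fps figures in the message follow from simple vsync arithmetic. The sketch below is an illustration only, not part of the patch. It assumes a 60 Hz display and a simplified model in which each flip must finish before the next one starts, so a flip occupies a whole number of refresh intervals. The helper name `vsynced_fps` and the microsecond units are hypothetical.

```c
#include <assert.h>

/*
 * Hypothetical helper (not from Mesa): estimate the vsync'ed frame rate
 * when one flip takes flip_us microseconds on a display that refreshes
 * refresh_hz times per second.
 *
 * Assumption: flips are fully serialized, so a flip that misses one
 * vblank has to wait for the next one.
 */
static unsigned vsynced_fps(unsigned flip_us, unsigned refresh_hz)
{
	unsigned interval_us;
	unsigned intervals;

	assert(refresh_hz > 0);

	/* Length of one refresh interval, rounded down to whole microseconds. */
	interval_us = 1000000u / refresh_hz;

	/* Number of refresh intervals the flip occupies, rounded up. */
	intervals = (flip_us + interval_us - 1) / interval_us;
	if (intervals == 0)
		intervals = 1;

	return refresh_hz / intervals;
}
```

Under these assumptions, `vsynced_fps(20000, 60)` yields 30 and `vsynced_fps(9000, 60)` yields 60: a 20 msec flip spans two 16.7 msec refresh intervals, while a 9 msec flip fits in one.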
author: Mario Kleiner <[email protected]> 2016-08-26 18:59:05 +0200
committer: Marek Olšák <[email protected]> 2016-08-26 19:57:21 +0200
commit: 2cc880cba54d687a122298c8187ecc31b4a0ee2d (patch)
tree: cd188a909363e0aebcf22bd3e7954e4ad03861f1
parent: 7970238fcff37c2450aebaae76e84b5c446d1b46 (diff)
1 files changed, 19 insertions, 0 deletions
diff --git a/src/gallium/drivers/r600/r600_blit.c b/src/gallium/drivers/r600/r600_blit.c
index adf616e6881..8fdc51c253a 100644
--- a/src/gallium/drivers/r600/r600_blit.c
+++ b/src/gallium/drivers/r600/r600_blit.c
@@ -850,11 +850,30 @@ static void r600_blit(struct pipe_context *ctx,
                       const struct pipe_blit_info *info)
 {
 	struct r600_context *rctx = (struct r600_context*)ctx;
+	struct r600_texture *rdst = (struct r600_texture *)info->dst.resource;
 
 	if (do_hardware_msaa_resolve(ctx, info)) {
 		return;
 	}
 
+	/* Using SDMA for copying to a linear texture in GTT is much faster.
+	 * This improves DRI PRIME performance.
+	 *
+	 * resource_copy_region can't do this yet, because dma_copy calls it
+	 * on failure (recursion).
+	 */
+	if (rdst->surface.level[info->dst.level].mode ==
+	    RADEON_SURF_MODE_LINEAR_ALIGNED &&
+	    rctx->b.dma_copy &&
+	    util_can_blit_via_copy_region(info, false)) {
+		rctx->b.dma_copy(ctx, info->dst.resource, info->dst.level,
+				 info->dst.box.x, info->dst.box.y,
+				 info->dst.box.z,
+				 info->src.resource, info->src.level,
+				 &info->src.box);
+		return;
+	}
+
 	assert(util_blitter_is_blit_supported(rctx->blitter, info));
 
 	/* The driver doesn't decompress resources automatically while
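
The path selection added by this hunk can be modelled in isolation. The following is a minimal sketch, not driver code: every `mock_` name is hypothetical, the `is_plain_copy` flag stands in for the result of `util_can_blit_via_copy_region(info, false)`, and the function pointer stands in for `rctx->b.dma_copy`. It only shows that all three conditions must hold before the DMA path is taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the destination level's surface mode. */
enum mock_surf_mode {
	MOCK_MODE_LINEAR_ALIGNED,
	MOCK_MODE_TILED
};

/* Hypothetical stand-in for the subset of blit info the check reads. */
struct mock_blit {
	enum mock_surf_mode dst_mode; /* layout of the destination level */
	bool is_plain_copy;           /* blit is expressible as a copy region */
};

enum mock_path {
	MOCK_PATH_DMA,      /* early return through the dma_copy hook */
	MOCK_PATH_GFX_BLIT  /* fall through to the regular blit code */
};

/* Hypothetical stand-in for the optional dma_copy hook. */
typedef void (*mock_dma_copy_fn)(void);

static void mock_dma_copy(void)
{
	/* A real hook would submit the copy; the model does nothing. */
}

/*
 * Mirror the condition in the hunk: linear destination, DMA hook
 * present, and a blit that needs no scaling, blending or conversion.
 */
static enum mock_path mock_choose_path(const struct mock_blit *blit,
				       mock_dma_copy_fn dma_copy)
{
	assert(blit != NULL);

	if (blit->dst_mode == MOCK_MODE_LINEAR_ALIGNED &&
	    dma_copy != NULL &&
	    blit->is_plain_copy)
		return MOCK_PATH_DMA;

	return MOCK_PATH_GFX_BLIT;
}
```

In this model a PRIME-style copy into a linear destination selects `MOCK_PATH_DMA`, while a tiled destination, a missing hook, or a blit that is not a plain copy falls through to `MOCK_PATH_GFX_BLIT`, matching the early `return` and fall-through structure of the hunk.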