i965: perf: flush batchbuffers at the beginning of queries

As Chris commented, it makes more sense to flush the batch buffer before the query rather than after it. Applications like frame_retrace usually issue a series of queries; with flushes at the end of each query, the first query in the series can still be split across 2 different batches. More generally, any query can easily end up split across 2 batch buffers, because we never know how full the current batch buffer is.

If we move the flush to the beginning of each query, it's pretty much guaranteed that the query will be contained in a single batch buffer (unless the amount of commands is huge, but then it's only fair to include request reloading times in the measurements).

Fixes: adafe4b733c02 ("i965: perf: minimize the chances to spread queries across batchbuffers")
Reported-by: Chris Wilson <[email protected]>
Signed-off-by: Lionel Landwerlin <[email protected]>
Cc: "17.2 17.1" <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
(cherry picked from commit 9f439ae1201cb049ffedb9b0e2d4f393fb0a761e)
author: Lionel Landwerlin <[email protected]> 2017-07-25 17:49:22 +0100
committer: Emil Velikov <[email protected]> 2017-08-03 00:19:06 +0100
commit: f28a9b2bf9d9de226d6e51c71b260e1bc0e62fc9 (patch)
tree: 22326b037df77ce1e5acdb991a7a21bcfa9c01c3
parent: 3a9c5afe136b6c64eda063a7360cb264983d83ec (diff)
1 files changed, 8 insertions, 0 deletions
diff --git a/src/mesa/drivers/dri/i965/brw_performance_query.c b/src/mesa/drivers/dri/i965/brw_performance_query.c
index 95f112e99f0..2f49efae00a 100644
--- a/src/mesa/drivers/dri/i965/brw_performance_query.c
+++ b/src/mesa/drivers/dri/i965/brw_performance_query.c
@@ -1001,6 +1001,14 @@ brw_begin_perf_query(struct gl_context *ctx,
       obj->oa.begin_report_id = brw->perfquery.next_query_start_report_id;
       brw->perfquery.next_query_start_report_id += 2;
 
+      /* We flush the batchbuffer here to minimize the chances that MI_RPC
+       * delimiting commands end up in different batchbuffers. If that's the
+       * case, the measurement will include the time it takes for the kernel
+       * scheduler to load a new request into the hardware. This is manifested in
+       * tools like frameretrace by spikes in the "GPU Core Clocks" counter.
+       */
+      intel_batchbuffer_flush(brw);
+
       /* Take a starting OA counter snapshot. */
       emit_mi_report_perf_count(brw, obj->oa.bo, 0,
                                 obj->oa.begin_report_id);
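
The reasoning in the commit message can be illustrated with a toy model. The sketch below is not Mesa code: `toy_batch`, `toy_emit`, `toy_query`, `count_split_queries` and the 64-slot `BATCH_CAPACITY` are all hypothetical names and values chosen for illustration. It assumes a batch that flushes automatically when it would overflow, and counts how many queries have their begin and end markers land in different batches under each flush policy.

```c
#include <stdbool.h>

/* Hypothetical batch size, in abstract "command slots". */
#define BATCH_CAPACITY 64

struct toy_batch {
    int used;      /* slots consumed in the current batch */
    int batch_id;  /* incremented every time a non-empty batch is flushed */
};

/* Submit the current batch and start a new one (no-op if it is empty). */
static void toy_flush(struct toy_batch *b)
{
    if (b->used > 0) {
        b->used = 0;
        b->batch_id++;
    }
}

/* Emit `slots` slots of commands, flushing first if they would not fit.
 * Assumes slots <= BATCH_CAPACITY. */
static void toy_emit(struct toy_batch *b, int slots)
{
    if (b->used + slots > BATCH_CAPACITY)
        toy_flush(b);
    b->used += slots;
}

/* Run one query: begin marker, `work_slots` one-slot commands, end marker.
 * Returns true if both markers ended up in the same batch. */
static bool toy_query(struct toy_batch *b, int work_slots, bool flush_at_begin)
{
    if (flush_at_begin)
        toy_flush(b);

    toy_emit(b, 1);                       /* begin marker */
    int begin_batch = b->batch_id;

    for (int i = 0; i < work_slots; i++)
        toy_emit(b, 1);                   /* work measured by the query */

    toy_emit(b, 1);                       /* end marker */
    int end_batch = b->batch_id;

    if (!flush_at_begin)
        toy_flush(b);                     /* old policy: flush after the query */

    return begin_batch == end_batch;
}

/* Count the queries whose markers were split across batches, starting from
 * a batch that already holds `preexisting_slots` slots of unrelated work. */
static int count_split_queries(int preexisting_slots, int work_slots,
                               int num_queries, bool flush_at_begin)
{
    struct toy_batch b = { .used = 0, .batch_id = 0 };
    int split = 0;

    toy_emit(&b, preexisting_slots);
    for (int i = 0; i < num_queries; i++) {
        if (!toy_query(&b, work_slots, flush_at_begin))
            split++;
    }
    return split;
}
```

With a batch that is already nearly full (60 of 64 slots) and five queries of 10 slots each, `count_split_queries(60, 10, 5, false)` reports one split query (the first), while `count_split_queries(60, 10, 5, true)` reports none. A query larger than the batch itself, such as `count_split_queries(0, 100, 1, true)`, is still split, which corresponds to the "huge amount of commands" caveat in the commit message.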