path: root/src/mesa/drivers/dri/i965/gen6_queryobj.c
Commit message | Author | Age | Files | Lines
* i965: Roll intel_reg.h into brw_defines.h | Jason Ekstrand | 2016-08-19 | 1 | -1/+0
    More than half of the stuff in intel_reg.h had nothing whatsoever to do with registers and really belongs in brw_defines.h anyway.

    Signed-off-by: Jason Ekstrand <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
* i965: fix ignored qualifiers warning | Francesco Ansanelli | 2016-07-11 | 1 | -1/+1
    Reviewed-by: Eric Engestrom <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Implement rasterizer discard via SOL unless required for queries. | Kenneth Graunke | 2016-06-23 | 1 | -0/+4
    We currently use CL_INVOCATION_COUNT for the GL_PRIMITIVES_GENERATED query, which involves passing all primitives to the clipper. When rasterizer discard is enabled, we program the clipper in REJECT_ALL mode, rather than using the SOL stage's "Rendering Disable" feature. See commit f09b91f78247409f54c975f56cb10d5f350fe64e for an explanation of why we implement GL_PRIMITIVES_GENERATED this way.

    Apparently the SOL stage's "Rendering Disable" feature is a lot faster than having the clipper reject all primitives. It's safe to use when no GL_PRIMITIVES_GENERATED query is active, as we don't care about CL_INVOCATION_COUNT incrementing.

    This patch makes us use SO_RENDERING_DISABLE when no query is active, but continues falling back to the clipper in REJECT_ALL mode when the queries are enabled. It brings back the perf_debug for the clipper case (which I removed in commit 1f9445ff57b, thinking it wasn't useful).

    Improves performance in Gl32GSCloth by 84.8303% +/- 2.07132% (n = 10) on my Broadwell GT2 laptop.

    Cc: [email protected]
    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
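The selection rule described in this message can be sketched as a small C function. This is only an illustration: the enum and function names are hypothetical and are not the driver's identifiers; only the rule itself (SOL "Rendering Disable" when no GL_PRIMITIVES_GENERATED query is active, clipper REJECT_ALL otherwise) comes from the message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names, used only for this illustration. */
enum discard_method {
   DISCARD_NONE,                  /* rasterizer discard is off */
   DISCARD_SOL_RENDERING_DISABLE, /* fast path: SOL "Rendering Disable" */
   DISCARD_CLIPPER_REJECT_ALL,    /* slow path: keeps CL_INVOCATION_COUNT counting */
};

/* Pick how to discard primitives.  The clipper path is only needed while a
 * GL_PRIMITIVES_GENERATED query is active, because that query is resolved
 * from CL_INVOCATION_COUNT and so primitives must still reach the clipper. */
static enum discard_method
choose_discard_method(bool rasterizer_discard, bool prims_generated_query_active)
{
   if (!rasterizer_discard)
      return DISCARD_NONE;

   return prims_generated_query_active ? DISCARD_CLIPPER_REJECT_ALL
                                       : DISCARD_SOL_RENDERING_DISABLE;
}
```

In the real driver the decision would have to be re-evaluated whenever the query state or the discard state changes; the sketch leaves that out.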
* i965: Implement ARB_query_buffer_object for HSW+ | Jordan Justen | 2016-05-04 | 1 | -0/+45
    v2:
    * Declare loop index variable at loop site (idr)
    * Make arrays of MI_MATH instructions 'static const' (idr)
    * Remove commented debug code (idr)
    * Updated comment in set_query_availability (Ken)
    * Replace switch with if/else in hsw_result_to_gpr0 (Ken)
    * Only divide GL_FRAGMENT_SHADER_INVOCATIONS_ARB by 4 on hsw and gen8 (Ken)

    Signed-off-by: Jordan Justen <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Ian Romanick <[email protected]>
* i965: Use offset instead of index in brw_store_register_mem64 | Jordan Justen | 2016-05-04 | 1 | -48/+9
    This matches the byte based offset of brw_load_register_mem*.

    The function is also moved into intel_batchbuffer.c like brw_load_register_mem*.

    Signed-off-by: Jordan Justen <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Implement ARB_pipeline_statistics_query tessellation counters. | Kenneth Graunke | 2015-11-17 | 1 | -8/+8
    We basically just need to uncomment Ben's code.

    v2: Fix obvious bugs caught by Ben.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Ben Widawsky <[email protected]>
    Reviewed-by: Kristian Høgsberg <[email protected]>
* i965: Rename intel_emit* to reflect their new location in brw_pipe_control | Chris Wilson | 2015-06-24 | 1 | -3/+3
    Signed-off-by: Chris Wilson <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
* i965/gen9: Use raw PS invocation count for queries | Ben Widawsky | 2015-06-09 | 1 | -1/+1
    Previously the number needed to be divided by 4 to get the proper results. Now the hardware does the right thing. Through experimentation it seems Braswell (CHV) also needs the division by 4.

    Fixes piglit test: arb_pipeline_statistics_query-frag

    Signed-off-by: Ben Widawsky <[email protected]>
    Reviewed-by: Anuj Phogat <[email protected]>
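As a rough illustration of the workaround described here and in the ARB_query_buffer_object entry above ("only divide ... by 4 on hsw and gen8"), the fix-up might look like the sketch below. The struct and function names are hypothetical; Braswell (CHV) is treated as gen8 in this sketch, and gen9 returns the raw count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device description, not the driver's real structure. */
struct toy_device_info {
   int gen;          /* hardware generation, e.g. 7, 8, 9 */
   bool is_haswell;  /* Haswell is gen 7 but needs the workaround */
};

/* Convert the raw PS invocation counter into the value reported to the
 * application.  Per the log: divide by 4 on Haswell and gen8 (which this
 * sketch assumes covers Braswell/CHV); use the raw value elsewhere. */
static uint64_t
toy_fixup_ps_invocations(const struct toy_device_info *dev, uint64_t raw)
{
   assert(dev != NULL);

   bool needs_divide = dev->is_haswell || dev->gen == 8;
   return needs_divide ? raw / 4 : raw;
}
```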
* i965/gen6: Fix GL_GEOMETRY_SHADER_PRIMITIVES_EMITTED_ARB | Iago Toral Quiroga | 2015-02-20 | 1 | -0/+5
    In gen6 we need to compute the primitive count in the generated GS program. The current implementation only counts full primitives, that is, if the output primitive type is a triangle strip, it won't count individual triangles in the strip, only complete strips.

    If we want to count basic primitives instead we have two options: rework the assembly code we generate for strip primitives or simply use CL_INVOCATION_COUNT to resolve the query and let the hardware do that work for us.

    This patch implements the latter approach.

    Fixes the following piglit test: bin/arb_pipeline_statistics_query-geom -auto

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89210
    Tested-by: Mark Janes <[email protected]>
    Reviewed-by: Ben Widawsky <[email protected]>
* i965: implement ARB_pipeline_statistics_query | Ben Widawsky | 2015-02-17 | 1 | -0/+104
    NOTE: The implementation was initially one patch, this. All the history is kept here, even though all the core mesa changes were moved to the parent of this patch.

    This patch implements ARB_pipeline_statistics_query. This addition to GL does not add a new API. Instead, it adds new tokens to the existing query APIs. The work to hook up the new tokens is trivial due to its similarity to the previous work done for the query APIs. I've implemented all the new tokens to some degree, but have stubbed out the untested ones at the entry point for Begin(). Doing this should allow the remainder of the code to be left in.

    The new tokens give GL clients a way to obtain stats about the GL pipeline. Generally, you get the number of things going in (invocations) and the number of things coming out (primitives) of the various stages. There are two immediate uses for this: performance information, and debugging various types of misrendering. I doubt one can use these for debugging very complex applications, but for piglit tests, it should be quite useful.

    Tessellation shaders and compute shaders are not addressed in this patch because there is no upstream implementation. I've implemented how I believe tessellation shader stats will work for Intel hardware (though there is a bit of ambiguity). Compute shaders are a bit more interesting though, and I don't yet know what we'll do there.

    For the lazy, here is a link to the relevant part of the spec: https://www.opengl.org/registry/specs/ARB/pipeline_statistics_query.txt

    Running the piglit tests http://lists.freedesktop.org/archives/piglit/2014-November/013321.html (http://cgit.freedesktop.org/~bwidawsk/piglit/log/?h=pipe_stats) yields the following results:

    > piglit-run.py -t stats tests/all.py output/pipeline_stats
    > [5/5] pass: 5 Running Test(s): 5

    v2:
    - Don't allow pipeline_stats to be per stream (Ilia). This may (not sure) be needed for AMD_transform_feedback4, which we do not support.
      > If AMD_transform_feedback4 is supported then GEOMETRY_SHADER_PRIMITIVES_EMITTED_ARB
      > counts primitives emitted to any of the vertex streams for
      > which STREAM_RASTERIZATION_AMD is enabled.
    - Remove comment from GL3.txt because it is only used for extensions that are part of required versions (Ilia)
    - Move the new tokens to a new XML doc instead of using the main GL4x.xml (Ilia)
    - Add a fallthrough comment (Ilia)
    - Only divide PS invocations by 4 on HSW+ (Ben)

    v3:
    - Add ARB_pipeline_statistics_query to relnotes.html
    - Add ARB_pipeline_statistics_query.xml to the Makefile.am, and master XML (Ilia)
    - Correct extension number (Ilia)
    - Add link to xml in the main GL API xml (Ilia)
    - Remove special GS case from gen6_end_query (Ian)
    - Make lookup table static so gcc doesn't initialize it on every call (Ian)
    - Use if (_mesa_has_geometry_shaders(ctx)) instead of explicit checks (Ian)
    - Core mesa parts moved into a prep patch (Ilia)

    v4:
    - Change to 10.6 relnotes since we missed 10.5 window
    - Moved compute shader stuff into the switch statement (Jordan)
    - Jordan: Add compute shader support

    v5:
    - Fixed relnote style (Ilia)

    v6:
    - Rebased on master which beat me to adding the first relnotes - essentially this undoes v5 (which had a typo anyway)
    - Some code style fixes (Ken)
    - Remove some excess comments (Ken)
    - Unify tessellation failure style - unreachable (Ken)
    - Fix workaround comment for PS invocations (Ken)

    Signed-off-by: Ben Widawsky <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
* i965/query: Cache whether the batch references the query BO. | Kenneth Graunke | 2014-12-16 | 1 | -4/+23
    Chris Wilson noted that repeated calls to CheckQuery() would call drm_intel_bo_references(brw->batch.bo, query->bo) on each invocation, which is expensive. Once we've flushed, we know that future batches won't reference query->bo, so there's no point in asking more than once.

    This patch adds a brw_query_object::flushed flag, which is a conservative estimate of whether the batch has been flushed.

    On the first call to CheckQuery() or WaitQuery(), we check if the batch references query->bo. If not, it must have been flushed for some reason (such as being full). We record that it was flushed. If it does reference query->bo, we explicitly flush, and record that we did so.

    Any subsequent checks will simply see that query->flushed is set, and skip the drm_intel_bo_references() call.

    Inspired by a patch from Chris Wilson.

    According to Eero, this does not affect the performance of Witcher 2 on Haswell, but approximately halves the userspace CPU usage.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86969
    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Ben Widawsky <[email protected]>
    Reviewed-by: Ian Romanick <[email protected]>
* i965/query: Use brw_bo_map to handle stall warnings. | Kenneth Graunke | 2014-12-16 | 1 | -7/+1
    This is less code and also measures the duration of the stall for us. Our old code predates the existence of brw_bo_map().

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Ben Widawsky <[email protected]>
    Reviewed-by: Ian Romanick <[email protected]>
* i965/query: Remove redundant drm_intel_bo_references call in CheckQuery. | Kenneth Graunke | 2014-12-16 | 1 | -7/+8
    CheckQuery calls drm_intel_bo_references to see if the batch references the query BO, and if so, flushes. It then checks if the query BO is busy, and if not, calls gen6_queryobj_get_results().

    Stupidly, gen6_queryobj_get_results() immediately did a second redundant drm_intel_bo_references check, even though we know the buffer is not referenced and in fact idle.

    This patch moves the batch-flush check out of gen6_queryobj_get_results and into WaitQuery() (the other caller). That way, both callers do a single batch-flush check.

    This should only be a minor improvement, since it would only affect the first CheckQuery call where the result is actually available.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86969
    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Ben Widawsky <[email protected]>
    Reviewed-by: Ian Romanick <[email protected]>
* i965/query: Add query->bo == NULL early return in CheckQuery hook. | Kenneth Graunke | 2014-12-16 | 1 | -2/+8
    If query->bo == NULL, this is a redundant CheckQuery call, and we should simply return.

    We didn't do anything anyway - we skipped the batch flushing block, and although we called get_results(), it has an early return and does nothing. Why bother?

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Ben Widawsky <[email protected]>
    Reviewed-by: Ian Romanick <[email protected]>
* i965/query: Set Ready flag in gen6_queryobj_get_results(). | Kenneth Graunke | 2014-12-16 | 1 | -2/+2
    q->Ready means that the results are in, and core Mesa is free to return them to the application. gen6_queryobj_get_results() is a natural place to set that flag; doing so means callers don't have to.

    The older non-hardware-context aware code couldn't do this, because we had to call brw_queryobj_get_results() to gather intermediate results when we ran out of space for snapshots in the query buffer. We only gather complete results in the Gen6+ code, however.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Ben Widawsky <[email protected]>
    Reviewed-by: Ian Romanick <[email protected]>
* i965: Implement GL_PRIMITIVES_GENERATED with non-zero streams. | Iago Toral Quiroga | 2014-09-11 | 1 | -4/+9
    So far we have been using CL_INVOCATION_COUNT to resolve this query but this is no good with streams, as only stream 0 reaches the clipping stage.

    From ARB_transform_feedback3:

    "When a generated primitive query for a vertex stream is active, the primitives-generated count is incremented every time a primitive emitted to that stream reaches the Discarding Rasterization stage (see Section 3.x) right before rasterization. This counter is incremented whether or not transform feedback is active."

    Unfortunately, we don't have any registers that provide the number of primitives written to a specific stream other than the ones that track the number of primitives written to transform feedback in the SOL stage, so we can't implement this exactly as specified.

    In the past we implemented this feature by activating the SOL unit even if transform feedback was disabled, but making it so that all buffers were disabled and it only recorded statistics, which gave us the right semantics (see 3178d2474ae5bdd1102fb3d76a60d1d63c961ff5). Unfortunately, this came with a significant performance impact and had to be reverted.

    This new take does not intend to implement the exact semantics required by the spec, but improves what we have now, since now we return the primitive count for stream 0 in all cases. With this patch we use GEN7_SO_PRIM_STORAGE_NEEDED to resolve GL_PRIMITIVES_GENERATED queries for non-zero streams. This would return the number of primitives written to transform feedback for each stream instead. Since non-zero streams are only useful in combination with transform feedback this should not be too bad, and the only case that I think we would not be supporting would be the one in which we want to use both GL_PRIMITIVES_GENERATED and GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN on the same non-zero stream to detect buffer overflow.

    This patch also fixes the following piglit test: arb_gpu_shader5-xfb-streams-without-invocations

    This test uses both GL_PRIMITIVES_GENERATED and GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN queries on non-zero streams, but it never hits the overflow case, so both queries are always expected to return the same value.

    Reviewed-by: Kenneth Graunke <[email protected]>
    Cc: "10.3" <[email protected]>
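The per-stream counter choice described above can be summarised in a short C sketch. The types and function name are hypothetical; the limit of four vertex streams is taken from the "Gen7+ supports four transform feedback streams" note elsewhere in this log, and the sketch only names the counters rather than using real register offsets.

```c
#include <assert.h>

#define TOY_MAX_VERTEX_STREAMS 4  /* Gen7+ stream count, per this log */

/* Which hardware counter resolves GL_PRIMITIVES_GENERATED (hypothetical enum). */
enum toy_counter_kind {
   TOY_COUNTER_CL_INVOCATION_COUNT,    /* clipper invocations, stream 0 only */
   TOY_COUNTER_SO_PRIM_STORAGE_NEEDED, /* per-stream SOL statistics */
};

struct toy_counter_choice {
   enum toy_counter_kind kind;
   unsigned stream;                    /* meaningful for the SOL counter */
};

/* Stream 0 reaches the clipper, so CL_INVOCATION_COUNT works there.
 * Non-zero streams never reach the clipper, so fall back to the per-stream
 * SO_PRIM_STORAGE_NEEDED counter, which only counts while transform
 * feedback is active (the approximation the message accepts). */
static struct toy_counter_choice
toy_prims_generated_counter(unsigned stream)
{
   assert(stream < TOY_MAX_VERTEX_STREAMS);

   struct toy_counter_choice choice;
   if (stream == 0) {
      choice.kind = TOY_COUNTER_CL_INVOCATION_COUNT;
      choice.stream = 0;
   } else {
      choice.kind = TOY_COUNTER_SO_PRIM_STORAGE_NEEDED;
      choice.stream = stream;
   }
   return choice;
}
```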
* Revert "i965: Implement GL_PRIMITIVES_GENERATED with non-zero streams." | Kenneth Graunke | 2014-07-16 | 1 | -9/+4
    This reverts commit 3178d2474ae5bdd1102fb3d76a60d1d63c961ff5.

    This caused GPU hangs on Ivybridge for some users and huge (80%) performance regressions across the board on multiple platforms. We need to find a better solution. I've made several attempts, but none of them have worked yet. In the meantime, we should revert this.

    Reverting it breaks GL_PRIMITIVES_GENERATED for non-zero streams, but that's okay, since we don't expose GL_ARB_gpu_shader5 yet.

    Fixes Piglit's EXT_transform_feedback/generatemipmap prims_generated test case on Haswell.
* i965: Use unreachable() instead of unconditional assert(). | Matt Turner | 2014-07-01 | 1 | -6/+3
    Reviewed-by: Ian Romanick <[email protected]>
* i965: Implement GL_PRIMITIVES_GENERATED with non-zero streams. | Iago Toral Quiroga | 2014-06-30 | 1 | -4/+9
    So far we have been using CL_INVOCATION_COUNT to resolve this query but this is no good with streams, as only stream 0 reaches the clipping stage. Instead we will use SO_PRIM_STORAGE_NEEDED which can keep track of the primitives sent to each individual stream.

    Since SO_PRIM_STORAGE_NEEDED is related to the SOL stage and, according to ARB_transform_feedback3, we need to be able to query primitives generated in each stream whether transform feedback is active or not, we enable the SOL unit even if transform feedback is not active, but disable all output buffers in that case. This effectively disables transform feedback but permits activation of statistics, enabling SO_PRIM_STORAGE_NEEDED even when transform feedback is not active.

    Reviewed-by: Chris Forbes <[email protected]>
* i965: Implement GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN with non-zero streams. | Iago Toral Quiroga | 2014-06-30 | 1 | -4/+4
    Reviewed-by: Chris Forbes <[email protected]>
* i965: Re-combine the Gen4-5 and Gen6+ write_depth_count functions. | Kenneth Graunke | 2014-01-20 | 1 | -18/+2
    Now that we have a helper function that handles the PIPE_CONTROL variations between the various platforms, these are basically the same.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* i965: Create a helper function for emitting PIPE_CONTROL writes. | Kenneth Graunke | 2014-01-20 | 1 | -11/+4
    There are a lot of places that use PIPE_CONTROL to write a value to a buffer (either an immediate write, TIMESTAMP, or PS_DEPTH_COUNT). Creating a single function to do this seems convenient.

    As part of this refactor, we now set the PPGTT/GTT selection bit correctly on Gen7+. Previously, we set bit 2 of DW2 on all platforms. This is correct for Sandybridge, but actually part of the address on Ivybridge and later!

    Broadwell will also increase the length of these packets by 1; with the refactoring, we should have to adjust that in substantially fewer places, giving us confidence that we've hit them all.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* i965: Fix MI_STORE_REGISTER_MEM for Broadwell. | Kenneth Graunke | 2014-01-20 | 1 | -10/+23
    It now takes a 48-bit address.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
* i965: Delete duplicate write_timestamp function. | Kenneth Graunke | 2014-01-10 | 1 | -34/+2
    brw_queryobj.c needs a version of write_timestamp that works on all generations for the QueryCounter() driver hook. So there's no point in duplicating it in gen6_queryobj.c.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Chris Forbes <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
* i965: Fix brw_store_register_mem64 to stay within a single batch. | Kenneth Graunke | 2013-10-31 | 1 | -4/+1
    Previously, the write of each 32-bit half might land in separate batch buffers, which is insane.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Chris Forbes <[email protected]>
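One way to picture the fix is to reserve space for both 32-bit stores up front, so a flush can only happen before the pair and never between the halves. The sketch below is a toy model, not the driver's batch code: the batch size, the three-dword command layout, and the opcode value are all placeholders chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_BATCH_DWORDS 8
#define TOY_STORE_OPCODE 0xABCD0000u  /* placeholder, not a real encoding */

struct toy_batch {
   uint32_t dwords[TOY_BATCH_DWORDS];
   unsigned used;     /* dwords already emitted */
   unsigned flushes;  /* number of times the batch was flushed */
};

/* Reserve `count` contiguous dwords, flushing first if they do not fit. */
static uint32_t *
toy_reserve(struct toy_batch *batch, unsigned count)
{
   assert(count <= TOY_BATCH_DWORDS);

   if (batch->used + count > TOY_BATCH_DWORDS) {
      batch->used = 0;      /* models submitting and starting a new batch */
      batch->flushes++;
   }

   uint32_t *out = &batch->dwords[batch->used];
   batch->used += count;
   return out;
}

/* Store a 64-bit register as two 32-bit stores (low half, then high half),
 * reserving all six dwords in one go so both land in the same batch. */
static void
toy_store_register_mem64(struct toy_batch *batch, uint32_t reg, uint32_t offset)
{
   uint32_t *out = toy_reserve(batch, 6);

   for (unsigned half = 0; half < 2; half++) {
      out[3 * half + 0] = TOY_STORE_OPCODE;
      out[3 * half + 1] = reg + 4 * half;     /* register address of this half */
      out[3 * half + 2] = offset + 4 * half;  /* byte offset in the buffer */
   }
}
```

Reserving three dwords at a time instead would allow a flush to fall between the two halves, which is the situation the message describes.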
* i965: CS writes/reads should use I915_GEM_INSTRUCTION | Daniel Vetter | 2013-10-28 | 1 | -2/+2
    Otherwise the gen6 w/a in the kernel won't kick in and the write will land nowhere.

    Inspired by a patch Ken pointed me at which had the same issue (but isn't yet merged and also for a gen7+ feature). An audit of the entire driver didn't reveal any other case than the one in the write_reg helper used by the gen6 queryobj code.

    Acked-by: Kenneth Graunke <[email protected]>
    Signed-off-by: Daniel Vetter <[email protected]>
    Tested-by: Xinkai Chen <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
    Cc: "9.2" <[email protected]>
* i965: Expose write_reg() as brw_store_register_mem64(). | Kenneth Graunke | 2013-10-23 | 1 | -9/+9
    Writing a 64-bit register value to memory is sufficiently complicated that it makes sense to reuse this function rather than duplicating it.

    Exposing it outside of gen6_queryobj.c means it needs a more descriptive function name. It could probably be moved to brw_util.c or somewhere else, but this works too.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
* i965: Move flushing out of write_reg and into the callers. | Kenneth Graunke | 2013-10-23 | 1 | -4/+8
    The current callers just want to write a single register, so combining the register read with a pipeline flush made sense. However, in the future we'll want to do multiple register reads back to back, and we'll only want to flush once.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
* i965: Tidy preprocessor macros for SO_NUM_PRIMS_WRITTEN registers. | Kenneth Graunke | 2013-08-06 | 1 | -2/+2
    Gen7+ supports four transform feedback streams. Using a function-like macro makes it easy to access them by stream number or loop over them. "GEN7_" prefixes are more common than "_IVB" suffixes, so we use that.

    Gen6 only supports a single stream, so the single #define should be fine. However, SO_NUM_PRIMS_WRITTEN was confusingly generic, as it doesn't exist on Gen7+. Add a "GEN6_" prefix for clarity.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Paul Berry <[email protected]>
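A function-like macro of the kind this message describes might look like the following sketch. The `EXAMPLE_` names, the base offset, and the 8-byte stride are illustrative assumptions and are not taken from the driver headers; the point is only that the stream number becomes a macro argument, so the registers can be indexed or looped over.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values only: the base offset and the 8-byte stride are
 * assumptions for this sketch, not the hardware's documented layout. */
#define EXAMPLE_SO_NUM_PRIMS_WRITTEN_BASE 0x1000u
#define EXAMPLE_SO_NUM_PRIMS_WRITTEN(n) \
   (EXAMPLE_SO_NUM_PRIMS_WRITTEN_BASE + (n) * 8u)

#define EXAMPLE_MAX_STREAMS 4  /* "Gen7+ supports four ... streams" */

/* Fill `out` with the register offset for every stream, showing how the
 * macro lets callers loop instead of naming four separate #defines. */
static void
example_list_stream_registers(uint32_t out[EXAMPLE_MAX_STREAMS])
{
   assert(out != 0);

   for (unsigned stream = 0; stream < EXAMPLE_MAX_STREAMS; stream++)
      out[stream] = EXAMPLE_SO_NUM_PRIMS_WRITTEN(stream);
}
```

The parentheses around `(n)` in the macro body keep an argument such as `i + 1` from being mis-parsed.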
* i965: Move intel_context::gen and gt fields to brw_context. | Kenneth Graunke | 2013-07-09 | 1 | -8/+4
    Most functions no longer use intel_context, so this patch additionally removes the local "intel" variables to avoid compiler warnings.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Acked-by: Chris Forbes <[email protected]>
    Acked-by: Paul Berry <[email protected]>
    Acked-by: Anuj Phogat <[email protected]>
* i965: Move intel_context::perf_debug to brw_context. | Kenneth Graunke | 2013-07-09 | 1 | -2/+1
    Signed-off-by: Kenneth Graunke <[email protected]>
    Acked-by: Chris Forbes <[email protected]>
    Acked-by: Paul Berry <[email protected]>
    Acked-by: Anuj Phogat <[email protected]>
* i965: Move intel_context::batch to brw_context. | Kenneth Graunke | 2013-07-09 | 1 | -3/+2
    Signed-off-by: Kenneth Graunke <[email protected]>
    Acked-by: Chris Forbes <[email protected]>
    Acked-by: Paul Berry <[email protected]>
    Acked-by: Anuj Phogat <[email protected]>
* i965: Move intel_context::bufmgr to brw_context. | Kenneth Graunke | 2013-07-09 | 1 | -2/+1
    Signed-off-by: Kenneth Graunke <[email protected]>
    Acked-by: Chris Forbes <[email protected]>
    Acked-by: Paul Berry <[email protected]>
    Acked-by: Anuj Phogat <[email protected]>
* i965: Pass brw_context to functions rather than intel_context. | Kenneth Graunke | 2013-07-09 | 1 | -21/+28
    This makes brw_context available in every function that used intel_context. This makes it possible to start migrating fields from intel_context to brw_context.

    Surprisingly, this actually removes some code, as functions that use OUT_BATCH don't need to declare "intel"; they just use "brw."

    Signed-off-by: Kenneth Graunke <[email protected]>
    Acked-by: Chris Forbes <[email protected]>
    Acked-by: Paul Berry <[email protected]>
    Acked-by: Anuj Phogat <[email protected]>
* i965: Implement transform feedback query support in hardware on Gen6+. | Kenneth Graunke | 2013-05-20 | 1 | -35/+60
    Now that we have hardware contexts and can use MI_STORE_REGISTER_MEM, we can use the GPU's pipeline statistics counters rather than going out of our way to count primitives in software.

    Aside from being simpler, this also paves the way for Geometry Shaders, which can output an arbitrary number of primitives on the GPU. It will also allow us to use hardware primitive restart when these queries are in use.

    The GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN query is easy: it corresponds to the SO_NUM_PRIMS_WRITTEN/SO_NUM_PRIMS_WRITTEN0_IVB counters.

    The GL_PRIMITIVES_GENERATED query is trickier. Gen provides several statistics registers which /almost/ match the semantics required:

    - IA_PRIMITIVES_COUNT
      The number of primitives fetched by the VF or IA (input assembler). This undercounts when GS is enabled, as it can output many primitives.
    - GS_PRIMITIVES_COUNT
      The number of primitives output by the GS. Unfortunately, this doesn't increment unless the GS unit is actually enabled, and it usually isn't.
    - SO_PRIM_STORAGE_NEEDED*_IVB
      The amount of space needed to write primitives output by transform feedback. These naturally only work when transform feedback is on. We'd also have to add the counters for all four streams.
    - CL_INVOCATION_COUNT
      The number of primitives processed by the clipper. This doesn't work if the GS or SOL throw away primitives for rasterizer discard. However, it does increment even if the clipper is in REJECT_ALL mode.

    Dynamically switching between counters would be painfully complicated, especially since GS, rasterizer discard, and transform feedback can all be switched on and off repeatedly during a single query.

    The most usable counter is CL_INVOCATION_COUNT. The previous two patches reworked rasterizer discard support so that all primitives hit the clipper, making this work.

    v2: Occlusion query bug fixes removed and squashed in earlier patches.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
    Reviewed-by: Paul Berry <[email protected]>
* i965: Rely on hardware contexts for query objects on Gen6+. | Kenneth Graunke | 2013-05-20 | 1 | -0/+359
    Hardware contexts greatly simplify the query object code. The pipeline statistics counters get saved and restored with the context, which means that we don't need to worry about other workloads polluting them.

    This means that we can simply write a single pair of values (one at BeginQuery and one at EndQuery) rather than a series of pairs. This also means we don't need to worry about the BO getting full. We also don't need to delay BO allocation and starting snapshot until the first draw.

    The generation split here is a little off: technically, Ironlake can also support hardware contexts. However, the kernel currently doesn't, and even if it were to do so someday, we'd need to wait a while before bumping the kernel requirement to take advantage of it.

    v2: Incorporate Paul's feedback.
    - Clarify which functions are Gen4/5-only via assertions and comments.
    - Change how driver hook initialization happens.
    - Update comments.
    - Squash a bug fix from a later commit here where it belongs.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]> [v1]
    Acked-by: Paul Berry <[email protected]>