| Commit message | Author | Age | Files | Lines |
| |
The nv_conditional_render piglits were sporadically failing. Moving
the control flush from the write and placing it just before the read
was sufficient to make the piglits pass 1000/1000 times. The bspec
says that the flush enable bit "waits until all previous writes of
immediate data from post sync circles are complete before executing the
next command" - the operative word being previous!
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90691
Signed-off-by: Chris Wilson <[email protected]>
Cc: Neil Roberts <[email protected]>
Cc: Kenneth Graunke <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
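Why the placement matters can be shown with a toy C model. This is not driver code. It assumes, following the bspec wording quoted above, that a flush only waits for post-sync writes emitted *before* it. A flush bit carried on the write packet itself is therefore modeled as a flush ordered ahead of that write. The race is treated pessimistically: a write that no later flush has retired is considered not yet visible. The command names and `reads_see_write` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical command kinds; not real PIPE_CONTROL encodings. */
enum toy_cmd { TOY_WRITE, TOY_FLUSH, TOY_READ };

/* Returns true if every TOY_READ observes a completed write.  A post-sync
 * write is asynchronous here: it becomes visible only when a *later*
 * flush retires it, matching "all previous writes" in the bspec. */
static bool
reads_see_write(const enum toy_cmd *stream, size_t n)
{
   bool write_pending = false;   /* emitted but not yet retired */
   bool write_visible = false;   /* observable by a subsequent read */

   assert(stream != NULL || n == 0);
   for (size_t i = 0; i < n; i++) {
      switch (stream[i]) {
      case TOY_WRITE:
         write_pending = true;
         write_visible = false;
         break;
      case TOY_FLUSH:
         if (write_pending) {
            write_visible = true;
            write_pending = false;
         }
         break;
      case TOY_READ:
         if (!write_visible)
            return false;
         break;
      }
   }
   return true;
}
```

With the flush on the write (`FLUSH, WRITE, READ`), the read can race the write. With the flush moved just before the read (`WRITE, FLUSH, READ`), it cannot.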
| |
I was mistaken, I thought we already had fixed this in the kernel a
couple of years ago. We had not, and the broken read (the hardware
shifts the register output on 64bit kernels, but not on 32bit kernels) is
now enshrined into the ABI. I also had the buggy architecture reversed,
believing it to be 32bit that had the shifted results. On the basis of
those mistakes, I wrote
    commit c8d3ebaffc0d7d915c1c19d54dba61fd1e57b338
    Author: Chris Wilson <[email protected]>
    Date: Wed Apr 29 13:32:38 2015 +0100
    i965: Query whether we have kernel support for the TIMESTAMP register once
Now that we do have an extended register read interface for always
reporting the full 36bit TIMESTAMP (irrespective of whether the hardware
is buggy or not), make use of it and in the process fix my reversed
detection of the buggy reads for unpatched kernels.
Signed-off-by: Chris Wilson <[email protected]>
Cc: Martin Peres <[email protected]>
Cc: Kenneth Graunke <[email protected]>
Cc: Michał Winiarski <[email protected]>
Cc: Daniel Vetter <[email protected]>
Tested-and-acked-by: Chris Forbes <[email protected]>
Reviewed-by: Daniel Vetter <[email protected]>
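The sketch below shows how a driver might classify and normalize the legacy TIMESTAMP read. It relies on the message's description: 64-bit kernels return the value shifted into the upper dword, 32-bit kernels return it unshifted, and the extended interface returns the full 36 bits. The enum, the two-read detection heuristic, and the function names are hypothetical, not Mesa's code. A real probe would take several reads and guard against counter wraparound.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical classification of how the kernel reports TIMESTAMP. */
enum ts_mode {
   TS_NONE = 0,     /* counter not readable or not advancing */
   TS_UNSHIFTED,    /* 32-bit kernel: value as read */
   TS_SHIFTED,      /* 64-bit kernel: value shifted up by 32 bits */
   TS_FULL_36BIT,   /* extended read interface: full 36-bit value */
};

#define TS_MASK ((UINT64_C(1) << 36) - 1)

/* Compare two consecutive raw reads of a ticking counter: if unshifted,
 * the low dword changes; if shifted, the low dword is always zero and
 * only the high dword changes. */
static enum ts_mode
detect_ts_mode(uint64_t first, uint64_t second)
{
   if ((first & 0xffffffffu) != (second & 0xffffffffu))
      return TS_UNSHIFTED;
   if ((first >> 32) != (second >> 32))
      return TS_SHIFTED;
   return TS_NONE;
}

static uint64_t
normalize_ts(enum ts_mode mode, uint64_t raw)
{
   assert(mode != TS_NONE);
   if (mode == TS_SHIFTED)
      raw >>= 32;   /* bits 35:32 were shifted out; only 32 bits survive */
   return raw & TS_MASK;
}
```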
| |
Previously, whenever a primitive was drawn, the driver would call
_mesa_check_conditional_render which blocks waiting for the result of
the query to determine whether to render. On Gen7+ there is a bit in
the 3DPRIMITIVE command which can be used to disable the primitive
based on the value of a state bit. This state bit can be set based on
whether two registers have different values using the MI_PREDICATE
command. We can load these two registers with the pixel count values
stored in the query begin and end to implement conditional rendering
without stalling.
Unfortunately these two source registers were not in the whitelist of
available registers in the kernel driver until v3.19. This patch uses
the command parser version from intel_screen to detect whether to
attempt to set the predicate data registers.
The predicate enable bit is currently only used for drawing 3D
primitives. For blits, clears, bitmaps, copypixels and drawpixels it
still causes a stall. For most of these it would probably just work to
call the new brw_check_conditional_render function instead of
_mesa_check_conditional_render because they already work in terms of
rendering primitives. However, it's a bit trickier for blits because they
can use the BLT ring or the blorp codepath. I think these operations
are less useful for conditional rendering than rendering primitives so
it might be best to leave it for a later patch.
v2: Use the command parser version to detect whether we can write to
the predicate data registers instead of trying to execute a
register load command.
v3: Simple rebase
v4: Changes suggested by Kenneth Graunke: Split the
load_64bit_register function out to a separate patch so it can be
a shared public function. Avoid calling
_mesa_check_conditional_render if we've already determined that
there's no query object. Some styling fixes.
Reviewed-by: Kenneth Graunke <[email protected]>
| |
This change fixes a regression with timer queries introduced with
commit 3eb6258. There, the pending batchbuffer is flushed
only when glEndQuery is executed. This commit adds the same
flush to glQueryCounter, which also schedules a value query
just like glEndQuery does. The patch fixes GPU timer queries
going mad from within osgviewer.
Reviewed-by: Kenneth Graunke <[email protected]>
Signed-off-by: Mathias Froehlich <[email protected]>
Cc: [email protected]
| |
Now, we only use ctx->NewDriverState.
I used this bash & sed command in the i965 directory:
for file in *.[ch] *.[ch]pp; do
sed -i -e 's/state\.dirty\.brw/ctx.NewDriverState/g' $file
done
Followed by manual changes to brw_state_upload.c.
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
| |
Sandybridge requires the post-sync non-zero workaround in a ton of
places, and if you ever miss one, the GPU usually hangs.
Currently, we try to track exactly when a workaround flush is
necessary (via the brw->batch.need_workaround_flush flag). This is
tricky to get right, and we've botched it several times in the past.
This patch unconditionally performs the post-sync non-zero flush at the
start of each primitive's state upload (including BLORP). We drop the
need_workaround_flush flag, and drop all the other callers, as the
flush has already been performed.
We have no data to indicate that simply flushing all the time will
hurt performance, and it has the potential to help stability.
v2: Add post-sync workaround to initial GPU state upload to be extra
cautious (suggested by Chad Versace).
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
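The new policy can be sketched with a toy batch that records packet tags. Instead of consulting a "need workaround flush" flag, Sandybridge (gen 6) always emits the post-sync non-zero workaround before a primitive's state; per the message, this also covers BLORP. The packet tags and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical packet tags recorded by a toy batch. */
enum toy_pkt { PKT_POST_SYNC_NONZERO_WA, PKT_STATE_ATOM, PKT_3DPRIMITIVE };

struct toy_batch {
   enum toy_pkt pkts[64];
   size_t count;
};

static void
toy_emit(struct toy_batch *b, enum toy_pkt p)
{
   assert(b->count < sizeof(b->pkts) / sizeof(b->pkts[0]));
   b->pkts[b->count++] = p;
}

/* No flag tracking: on Sandybridge the workaround is unconditional at
 * the start of every primitive's state upload. */
static void
toy_upload_state_and_draw(struct toy_batch *b, int gen, int num_atoms)
{
   if (gen == 6)
      toy_emit(b, PKT_POST_SYNC_NONZERO_WA);
   for (int i = 0; i < num_atoms; i++)
      toy_emit(b, PKT_STATE_ATOM);
   toy_emit(b, PKT_3DPRIMITIVE);
}
```

The trade-off is the one the message states: a few extra packets per draw in exchange for never missing a required flush.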
| |
Reverts
* "i965: Modify state upload to allow 2 different sets of state atoms."
8e27a4d2b3e4e74e9a77446bce49607433d86be3
* "i965: Modify dirty bit handling to support 2 pipelines."
373143ed9187c4d4ce1e3c486b5dd0880d18ec8b
* "i965: Create a macro for checking a dirty bit."
c5bdf9be1eca190417998d548fd140c1eca37a54
Conflicts:
src/mesa/drivers/dri/i965/brw_context.h
* "i965: Create a macro for setting all dirty bits."
6f56e1424d923fd80c84090fbf4506c9eaaffea1
Conflicts:
src/mesa/drivers/dri/i965/brw_blorp.cpp
src/mesa/drivers/dri/i965/brw_state_cache.c
src/mesa/drivers/dri/i965/brw_state_upload.c
* "i965: Create a macro for setting a dirty bit."
88e3d404dad009d8cff5124cf8acee7daeaceb64
Signed-off-by: Jordan Justen <[email protected]>
| |
This will make it easier to extend dirty bit handling to support
compute shaders.
Reviewed-by: Jordan Justen <[email protected]>
| |
Reviewed-by: Ian Romanick <[email protected]>
| |
Now that we have a helper function that handles the PIPE_CONTROL
variations between the various platforms, these are basically the same.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
| |
There are a lot of places that use PIPE_CONTROL to write a value to a
buffer (either an immediate write, TIMESTAMP, or PS_DEPTH_COUNT).
Creating a single function to do this seems convenient.
As part of this refactor, we now set the PPGTT/GTT selection bit
correctly on Gen7+. Previously, we set bit 2 of DW2 on all platforms.
This is correct for Sandybridge, but actually part of the address on
Ivybridge and later!
Broadwell will also increase the length of these packets by 1; with the
refactoring, we should have to adjust that in substantially fewer
places, giving us confidence that we've hit them all.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
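The GTT-selection change can be sketched as a helper that fills DW1 and DW2 of a PIPE_CONTROL write. On Sandybridge the bit lives in DW2 bit 2, as the message says. On Ivybridge and later, DW2 bit 2 is address, so the sketch sets the bit in DW1 instead. The DW1 bit position (24) is our recollection of the i915 kernel's `PIPE_CONTROL_GLOBAL_GTT_IVB` and should be treated as an assumption to verify against the PRM. The struct and helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SNB_GTT_BIT_IN_DW2 (UINT32_C(1) << 2)    /* per the message */
#define IVB_GTT_BIT_IN_DW1 (UINT32_C(1) << 24)   /* assumed position */

struct pc_dwords {
   uint32_t dw1;   /* flags */
   uint32_t dw2;   /* destination address (low bits) */
};

/* Fill DW1/DW2 of a Gen6+ PIPE_CONTROL post-sync write to a GTT address. */
static struct pc_dwords
pipe_control_write_dwords(int gen, uint32_t flags, uint32_t offset)
{
   struct pc_dwords d = { flags, offset };

   assert(gen >= 6);
   assert((offset & 3) == 0);          /* dword-aligned destination */
   if (gen == 6) {
      assert((offset & 4) == 0);       /* bit 2 is the GTT flag here */
      d.dw2 |= SNB_GTT_BIT_IN_DW2;
   } else {
      d.dw1 |= IVB_GTT_BIT_IN_DW1;     /* DW2 bit 2 is address on Gen7+ */
   }
   return d;
}
```

Setting DW2 bit 2 on Ivybridge, as the old code did, would have silently redirected a write aimed at offset 0x1000 to 0x1004.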
| |
These days, we need to emit PIPE_CONTROL flushes all over the place.
Being able to do that via a single function call seems convenient.
Broadwell will also increase the length of these packets by 1; with the
refactoring, we should have to do this in substantially fewer places.
v2: Add back forgotten intel_emit_post_sync_nonzero_flush (caught by
Eric Anholt). Drop unlikely() from BLT_RING check.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
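A single flush entry point mainly has to make three decisions: which packet the current ring needs, whether the Sandybridge workaround must come first, and how long the PIPE_CONTROL is on this generation. The sketch below captures those decisions as data. The packet lengths (4 dwords on Gen4-5, 5 on Gen6-7, one more on Broadwell, as the message predicts) are assumptions from memory. The struct and `plan_flush` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what one "emit a flush" call puts in the batch. */
struct flush_plan {
   bool use_mi_flush_dw;        /* BLT ring: MI_FLUSH_DW, not PIPE_CONTROL */
   bool post_sync_nonzero_wa;   /* Sandybridge workaround emitted first */
   int pipe_control_dwords;     /* PIPE_CONTROL length, 0 if unused */
};

static struct flush_plan
plan_flush(int gen, bool on_blt_ring)
{
   struct flush_plan p = { false, false, 0 };

   assert(gen >= 4);
   if (on_blt_ring && gen >= 6) {
      p.use_mi_flush_dw = true;
      return p;
   }
   p.post_sync_nonzero_wa = (gen == 6);
   /* Assumed lengths; Broadwell adds a dword for the wider address. */
   p.pipe_control_dwords = gen >= 8 ? 6 : gen >= 6 ? 5 : 4;
   return p;
}
```

Keeping the length in one place is the point of the refactor. When Broadwell grows the packet, only this helper needs to change.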
| |
brw_queryobj.c needs a version of write_timestamp that works on all
generations for the QueryCounter() driver hook. So there's no point in
duplicating it in gen6_queryobj.c.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
| |
This makes brw_context inherit directly from gl_context; that was the
only thing left in intel_context.
Signed-off-by: Kenneth Graunke <[email protected]>
Acked-by: Chris Forbes <[email protected]>
Acked-by: Paul Berry <[email protected]>
Acked-by: Anuj Phogat <[email protected]>
| |
Most functions no longer use intel_context, so this patch additionally
removes the local "intel" variables to avoid compiler warnings.
Signed-off-by: Kenneth Graunke <[email protected]>
Acked-by: Chris Forbes <[email protected]>
Acked-by: Paul Berry <[email protected]>
Acked-by: Anuj Phogat <[email protected]>
| |
Signed-off-by: Kenneth Graunke <[email protected]>
Acked-by: Chris Forbes <[email protected]>
Acked-by: Paul Berry <[email protected]>
Acked-by: Anuj Phogat <[email protected]>
| |
Signed-off-by: Kenneth Graunke <[email protected]>
Acked-by: Chris Forbes <[email protected]>
Acked-by: Paul Berry <[email protected]>
Acked-by: Anuj Phogat <[email protected]>
| |
Signed-off-by: Kenneth Graunke <[email protected]>
Acked-by: Chris Forbes <[email protected]>
Acked-by: Paul Berry <[email protected]>
Acked-by: Anuj Phogat <[email protected]>
| |
Signed-off-by: Kenneth Graunke <[email protected]>
Acked-by: Chris Forbes <[email protected]>
Acked-by: Paul Berry <[email protected]>
Acked-by: Anuj Phogat <[email protected]>
| |
Signed-off-by: Kenneth Graunke <[email protected]>
Acked-by: Chris Forbes <[email protected]>
Acked-by: Paul Berry <[email protected]>
Acked-by: Anuj Phogat <[email protected]>
| |
This makes brw_context available in every function that used
intel_context. This makes it possible to start migrating fields from
intel_context to brw_context.
Surprisingly, this actually removes some code, as functions that use
OUT_BATCH don't need to declare "intel"; they just use "brw."
Signed-off-by: Kenneth Graunke <[email protected]>
Acked-by: Chris Forbes <[email protected]>
Acked-by: Paul Berry <[email protected]>
Acked-by: Anuj Phogat <[email protected]>
| |
EXT_transform_feedback isn't yet supported on Gen4-5, so none of this
query code is actually used. This also means we can remove some of the
surrounding support code.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
| |
Hardware contexts greatly simplify the query object code. The pipeline
statistics counters get saved and restored with the context, which means
that we don't need to worry about other workloads polluting them.
This means that we can simply write a single pair of values (one at
BeginQuery and one at EndQuery) rather than a series of pairs. This
also means we don't need to worry about the BO getting full. We also
don't need to delay BO allocation and the starting snapshot until the first
draw.
The generation split here is a little off: technically, Ironlake can also
support hardware contexts. However, the kernel currently doesn't, and
even if it were to do so someday, we'd need to wait a while before
bumping the kernel requirement to take advantage of it.
v2: Incorporate Paul's feedback.
- Clarify which functions are Gen4/5-only via assertions and comments.
- Change how driver hook initialization happens.
- Update comments.
- Squash a bug fix from a later commit here where it belongs.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]> [v1]
Acked-by: Paul Berry <[email protected]>
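The difference in bookkeeping can be shown with two hypothetical helpers. Without hardware contexts (Gen4/5), other workloads can disturb the counters, so the driver snapshots them around every batch that draws while the query is active and sums the deltas. With hardware contexts, one snapshot at BeginQuery and one at EndQuery suffice. The interleaved begin/end layout is an assumption for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Gen4/5 style: pairs of (begin, end) snapshots, one pair per batch. */
static uint64_t
result_from_pairs(const uint64_t *snapshots, size_t num_pairs)
{
   uint64_t total = 0;
   for (size_t i = 0; i < num_pairs; i++)
      total += snapshots[2 * i + 1] - snapshots[2 * i];
   return total;
}

/* Gen6+ with hardware contexts: counters are saved and restored with
 * the context, so a single pair covers the whole query. */
static uint64_t
result_from_single_pair(const uint64_t snapshots[2])
{
   assert(snapshots[1] >= snapshots[0]);
   return snapshots[1] - snapshots[0];
}
```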
| |
We don't want to set the flag for Gallium.
I think only swrast needs the flag to be set for occlusion queries.
v2: fix stats_wm updates in i965
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
| |
The code doesn't set brw->query.obj to NULL, it sets query->bo to NULL.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
| |
This was only necessary because our bounds checking was off by one, and
thus we read an extra pair of values.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
| |
If we've written N pairs of values to the buffer, then last_index = N,
but the values are 0 .. N-1. Thus, we need to use <, not <=.
This worked anyway because we fill the buffer with zeroes, so we just
added an extra (0 - 0) to our results.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
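The off-by-one, and why it went unnoticed, fits in a few lines. With N pairs written, the valid pairs are 0 .. N-1. The old `<=` bound reads pair N, which is still zero in a zero-filled buffer and so contributes (0 - 0). `sum_pairs` and its `inclusive` switch are hypothetical, added only to compare the two bounds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Sum (end - begin) over the first last_index pairs.  'inclusive'
 * reproduces the old '<=' bound for comparison; the correct bound is '<'. */
static uint64_t
sum_pairs(const uint64_t *bo, size_t last_index, bool inclusive)
{
   uint64_t total = 0;
   for (size_t i = 0; inclusive ? i <= last_index : i < last_index; i++)
      total += bo[2 * i + 1] - bo[2 * i];
   return total;
}
```

Both bounds agree only because the unused slots are zero. If the buffer were completely full, the `<=` bound would also read past its end.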
| |
I tried to ensure that performance in the non-debug case doesn't change
(we still just check one condition up front), and I think the impact is
small enough in the debug context case to warrant including all of it.
Reviewed-by: Jordan Justen <[email protected]>
| |
We'll want to reuse this for non-occlusion queries in the future.
Plus, it's a single logical task, so having it as a helper function
clarifies the code somewhat.
Signed-off-by: Kenneth Graunke <[email protected]>
| |
Again, eliminating a global variable in favor of a per-query object
variable will help in a future where we have more queries in hardware.
Personally, I find this clearer: there's just the query object's BO,
rather than two variables that usually shadow each other.
Signed-off-by: Kenneth Graunke <[email protected]>
| |
The code a few lines above calls brw_emit_query_begin() if !query->bo,
and that creates query->bo. So it should always be non-NULL.
Signed-off-by: Kenneth Graunke <[email protected]>
| |
If we haven't allocated a BO yet, we need to do that. Or, if there
isn't enough room to write another pair of values, we need to gather up
the existing results and start a new one. This is simple enough.
However, the old code was awkwardly split into two blocks, with a
write_depth_count() placed in the middle. The new depth count isn't
relevant to gathering the old BO's data, so that can go after the
reallocation is done. With the two blocks adjacent, we can merge them.
Signed-off-by: Kenneth Graunke <[email protected]>
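Here is a sketch of the merged control flow, using a heap array in place of the buffer object. A single check handles both "no BO yet" and "no room for another pair". The old BO's data is gathered before it is replaced, and only then is the new starting depth count written. The capacity, struct, and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_PAIRS_PER_BO 4   /* hypothetical capacity */

struct toy_query {
   uint64_t *bo;          /* stands in for the query buffer object */
   size_t last_index;     /* number of complete pairs written */
   uint64_t result;       /* accumulated (end - begin) */
};

static void
gather_results(struct toy_query *q)
{
   for (size_t i = 0; i < q->last_index; i++)
      q->result += q->bo[2 * i + 1] - q->bo[2 * i];
}

/* One merged block: (re)allocate if needed, gathering the old data
 * first; the new depth count is written only afterwards. */
static void
toy_emit_query_begin(struct toy_query *q, uint64_t depth_count)
{
   if (q->bo == NULL || q->last_index >= TOY_PAIRS_PER_BO) {
      if (q->bo != NULL) {
         gather_results(q);
         free(q->bo);
      }
      q->bo = calloc(2 * TOY_PAIRS_PER_BO, sizeof(uint64_t));
      assert(q->bo != NULL);
      q->last_index = 0;
   }
   q->bo[2 * q->last_index] = depth_count;   /* like write_depth_count() */
}

static void
toy_emit_query_end(struct toy_query *q, uint64_t depth_count)
{
   assert(q->bo != NULL && q->last_index < TOY_PAIRS_PER_BO);
   q->bo[2 * q->last_index + 1] = depth_count;
   q->last_index++;
}
```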
| |
Since we already have an index in the brw_query_object, there's no need
to also keep a global variable that shadows it.
Plus, if we ever add support for more types of queries that still need
the per-batch before/after treatment we do for occlusion queries, we
won't be able to use a single global variable. In contrast, per-query
object variables will work fine.
Signed-off-by: Kenneth Graunke <[email protected]>
| |
brw->query.index is initialized to 0 just a few lines before it's
copied to first_index.
Presumably the idea here was to reuse the query BO for subsequent
queries of the same type, but since that doesn't happen, there's no need
to have the extra code complexity.
Signed-off-by: Kenneth Graunke <[email protected]>
| |
This code was really difficult to follow, for a number of reasons:
- Queries were handled in four different ways (TIMESTAMP writes a single
value, TIME_ELAPSED writes a single pair of values, occlusion queries
write pairs of values for the start and end of each batch, and other
queries are done entirely in software). It turns out that there are
very good reasons each query is handled the way it is, but
insufficient comments explaining the rationale.
- It wasn't immediately obvious which functions were driver hooks
and which were helper functions. For example, brw_query_begin() is
a driver hook that implements glBeginQuery() for all query types, but
the similarly named brw_emit_query_begin() is a helper function that's
only relevant for occlusion queries.
Extra explanatory comments should save me and others from constantly
having to ask how this code works and why various query types are
handled differently.
v2: Incorporate Eric's feedback: change "as soon as possible" to "the
results will be present when mapped."
Signed-off-by: Kenneth Graunke <[email protected]>
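The four handling strategies listed above can be summarized as a lookup. The enums are hypothetical stand-ins rather than GL enums or Mesa types. `TOY_OTHER` represents "other queries", which the message says are done entirely in software.

```c
#include <assert.h>

/* Hypothetical stand-ins for the query targets discussed above. */
enum toy_query_target {
   TOY_TIMESTAMP,
   TOY_TIME_ELAPSED,
   TOY_SAMPLES_PASSED,   /* occlusion query */
   TOY_OTHER,            /* any other target, handled in software */
};

enum toy_query_strategy {
   WRITE_SINGLE_VALUE,     /* one GPU write */
   WRITE_SINGLE_PAIR,      /* one write at begin, one at end */
   WRITE_PAIR_PER_BATCH,   /* begin/end pair around every batch */
   SOFTWARE_ONLY,          /* no GPU writes at all */
};

static enum toy_query_strategy
strategy_for(enum toy_query_target target)
{
   switch (target) {
   case TOY_TIMESTAMP:      return WRITE_SINGLE_VALUE;
   case TOY_TIME_ELAPSED:   return WRITE_SINGLE_PAIR;
   case TOY_SAMPLES_PASSED: return WRITE_PAIR_PER_BATCH;
   case TOY_OTHER:          return SOFTWARE_ONLY;
   }
   assert(!"unknown query target");
   return SOFTWARE_ONLY;
}
```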
| |
For timestamp queries, we just write a single value to a BO. The
natural place to write that is element 0, so we should do that.
Previously, we wrote it into element 1 (the second slot) leaving
element 0 filled with garbage.
Signed-off-by: Kenneth Graunke <[email protected]>
| |
This moves the GL_TIMESTAMP handling out of EndQuery.
Signed-off-by: Kenneth Graunke <[email protected]>
| |
The specification requires that query results are processed in order
(when one query result is returned, all previous queries of the same type
must also be available). The implementation was failing this requirement
in the case of BeginQuery and EndQuery with no intervening drawing (the
result would be made available immediately without flushing previous queries).
This fixes the following es3conform test:
occlusion_query_query_order
as well as the following piglit test:
occlusion_query_order
Reviewed-by: Ian Romanick <[email protected]>
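The ordering fix can be illustrated with a toy model in which "available" simply means "its batch has been submitted". A query with no drawing has nothing queued. Reporting it immediately would let it overtake earlier queries still in the batch, so those are flushed first. All names are hypothetical. A real flush only submits work, and results land once the GPU finishes it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_query {
   bool wrote_to_batch;   /* any GPU writes queued for this query? */
   bool available;
};

struct toy_ctx {
   struct toy_query *pending[8];   /* queries with writes in the batch */
   size_t num_pending;
};

static void
toy_flush(struct toy_ctx *ctx)
{
   for (size_t i = 0; i < ctx->num_pending; i++)
      ctx->pending[i]->available = true;
   ctx->num_pending = 0;
}

/* End a query; a query with no GPU writes still flushes earlier ones
 * so that results become available in order. */
static void
toy_end_query(struct toy_ctx *ctx, struct toy_query *q)
{
   if (q->wrote_to_batch) {
      assert(ctx->num_pending < 8);
      ctx->pending[ctx->num_pending++] = q;
   } else {
      toy_flush(ctx);
      q->available = true;
   }
}
```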
| |
We just treat this as an alias for GL_ANY_SAMPLES_PASSED.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
| |
This is a leftover from when we had to split those two functions due to
the separate BO validation step.
Reviewed-by: Kenneth Graunke <[email protected]>
| |
"Active" is an already-used term for the query being between
glBeginQuery() and glEndQuery(), while this is tracking whether the
start of the packet pair for emitting state has been inserted into the
current batchbuffer.
Reviewed-by: Kenneth Graunke <[email protected]>
| |
v2: Fix mangled sentence in the comment, and make the loop exit early.
Reviewed-by: Ian Romanick <[email protected]> (v1)
| |
Given the use case we have of trying to measure timestamps across individual
draw calls, flushing will totally mess up what people are trying to measure.
Reviewed-by: Kenneth Graunke <[email protected]>
| |
The theory I had when I wrote the code was that you wanted to minimize latency
on your queries because the app was going to ask soon. Only, it turns out
that everybody batches up their queries and asks for the results later (often
after the next SwapBuffers!), so this was a pessimization.
Until now, I had no workload where it mattered enough to benchmark. Recently
I started playing some Minecraft, which uses tons of queries to decide whether
to render chunks of the terrain. For that app, avoiding the flush in the
query-generation loop improves performance 22.7% +/- 4.7% (n=3) on an apitrace
capture of it (confirmed in game by watching the fps meter found by pressing
F3, 15/16 -> 20/21 fps).
Reviewed-by: Kenneth Graunke <[email protected]>
| |
Reviewed-by: Kenneth Graunke <[email protected]>
| |
Needs updated libdrm.
Reviewed-by: Kenneth Graunke <[email protected]>
| |
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
| |
This removes the CS stall on Ivybridge.
On Sandybridge, the depth stall needs to be preceded by a non-zero
post-sync op, which requires a CS stall, which needs a stall at
scoreboard. Emit the full workaround.
Reviewed-by: Daniel Vetter <[email protected]>
Cc: Eric Anholt <[email protected]>
Signed-off-by: Kenneth Graunke <[email protected]>
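The Sandybridge dependency chain can be written out as an ordered packet sequence in a toy batch. The flag names follow the message. The bit values are placeholders, not hardware encodings. The two-packet split of the post-sync non-zero workaround is an assumption for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder flag bits; not the real PIPE_CONTROL encodings. */
#define TOY_STALL_AT_SCOREBOARD (1u << 0)
#define TOY_CS_STALL            (1u << 1)
#define TOY_WRITE_IMMEDIATE     (1u << 2)   /* a non-zero post-sync op */
#define TOY_DEPTH_STALL         (1u << 3)

struct toy_batch {
   uint32_t pipe_controls[8];   /* flags of each emitted PIPE_CONTROL */
   size_t count;
};

static void
toy_pipe_control(struct toy_batch *b, uint32_t flags)
{
   assert(b->count < 8);
   b->pipe_controls[b->count++] = flags;
}

static void
toy_emit_depth_stall(struct toy_batch *b, int gen)
{
   if (gen == 6) {
      /* The CS stall required by the post-sync op needs a stall at
       * scoreboard alongside it. */
      toy_pipe_control(b, TOY_CS_STALL | TOY_STALL_AT_SCOREBOARD);
      /* The non-zero post-sync op that must precede the depth stall. */
      toy_pipe_control(b, TOY_WRITE_IMMEDIATE);
   }
   /* The depth stall itself; Ivybridge needs no extra CS stall. */
   toy_pipe_control(b, TOY_DEPTH_STALL);
}
```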
| |
The hardware seems to use the length of the PIPE_CONTROL command to
indicate whether the write is 64 bits or 32 bits, which makes sense
for immediate writes.
Daniel discovered this by writing a pattern into the query object bo
and noticing that the high 32-bits were left intact, even on those
pipe control writes that seemingly worked.
Signed-off-by: Daniel Vetter <[email protected]>
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Daniel Vetter <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
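Daniel's pattern test can be reproduced in a toy model of the observed behavior. The model assumes this layout: DW0 header, DW1 flags, DW2 address, DW3 low data, DW4 high data. Under that assumption, a 4-dword packet stores only the low 32 bits and a 5-dword packet stores all 64. `toy_post_sync_write` is hypothetical, and the destination is modeled as two dwords to stay endian-neutral.

```c
#include <assert.h>
#include <stdint.h>

/* dst[0] is the low dword of the 64-bit slot, dst[1] the high dword. */
static void
toy_post_sync_write(uint32_t dst[2], uint64_t value, int packet_dwords)
{
   assert(packet_dwords == 4 || packet_dwords == 5);
   dst[0] = (uint32_t)value;               /* low dword: always written */
   if (packet_dwords == 5)
      dst[1] = (uint32_t)(value >> 32);    /* high dword: length 5 only */
}
```

Prefilling the slot with a pattern makes the short packet's behavior visible: the high dword keeps the pattern, exactly as described above.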
| |
This consolidates the complexity in one place, which is important
because it's about to get even more complicated.
Signed-off-by: Kenneth Graunke <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
Reviewed-by: Daniel Vetter <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>