| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are a lot of places that use PIPE_CONTROL to write a value to a
buffer (either an immediate write, TIMESTAMP, or PS_DEPTH_COUNT).
Creating a single function to do this seems convenient.
As part of this refactor, we now set the PPGTT/GTT selection bit
correctly on Gen7+. Previously, we set bit 2 of DW2 on all platforms.
This is correct for Sandybridge, but actually part of the address on
Ivybridge and later!
Broadwell will also increase the length of these packets by 1; with the
refactoring, we should have to adjust that in substantially fewer
places, giving us confidence that we've hit them all.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These days, we need to emit PIPE_CONTROL flushes all over the place.
Being able to do that via a single function call seems convenient.
Broadwell will also increase the length of these packets by 1; with the
refactoring, we should have to do this in substantially fewer places.
v2: Add back forgotten intel_emit_post_sync_nonzero_flush (caught by
Eric Anholt). Drop unlikely() from BLT_RING check.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Broadwell uses 48-bit addresses. The first DWord is the low 32 bits,
and the second DWord is the high 16 bits.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
| |
Nothing in i965 uses it.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Using an unoptimized variant of glamor spending 50% of its CPU time in
brw_draw_prims() (and hitting the cache *very* frequently):
N Min Max Median Avg Stddev
x 200 29200 40500 34900 34750 958.43256
+ 200 31000 40300 34700 34622 916.35941
No difference proven at 95.0% confidence
Similarly, no difference on GLB2.7:
N Min Max Median Avg Stddev
x 63 64.1 71.36 70.69 70.113175 1.6782026
+ 63 63.6 71.18 70.75 70.223651 1.6044186
No difference proven at 95.0% confidence
v2: Rebase on master (by anholt)
v3: Add a missing BEGIN_BATCH(3) to aa_line_parameters -- CACHED_BATCH
didn't have the asserts about batchbuffer usage that ADVANCE_BATCH
does, so we started assertion failing.
Signed-off-by: Kenneth Graunke <[email protected]>
Signed-off-by: Eric Anholt <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On Gen4+, OUT_RELOC_FENCED is equivalent to OUT_RELOC; libdrm silently
ignores the fenced flag:
/* We never use HW fences for rendering on 965+ */
if (bufmgr_gem->gen >= 4)
need_fence = false;
Thanks to Eric for noticing this.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
| |
These are identical: main/compiler.h defines INLINE to "inline".
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In order to use the Observability Architecture effectively, we'll need
to take snapshots of the OA counters via MI_REPORT_PERF_COUNT at the
start and end of each batch.
Experimentation reveals that we need to flush before and after each
MI_REPORT_PERF_COUNT to get working values. For simplicitly, I chose to
use intel_batchbuffer_emit_mi_flush(), which unfortunately expands to
triple pipe controls on Sandybridge.
We may want to start computing per-generation reserved batch space to
avoid the insanity of Sandybridge's PIPE_CONTROL cost. That said, much
of this cost existed before I rewrote the query object support to use
hardware contexts, so it's at least not entirely new.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We need to start OA at the beginning of each batch where monitors are
active. OACONTROL isn't part of the hardware context, so to avoid
leaving counters enabled for other applications, we turn them off at the
end of the batch too.
We also need to start them at BeginPerfMonitor time (unless they've
already been started). We stop them when the monitor last ends as well.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The new intel_batchbuffer_emit_render_ring_prelude() hook will be called
when switching from BLT or UNKNOWN_RING to RENDER_RING. This provides a
place to emit state that should go at the start of each render ring
batch, with minimal overhead.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When we first create a batch buffer, it's empty. We don't actually
know what ring it will be targeted at until the first BEGIN_BATCH or
BEGIN_BATCH_BLT macro.
Previously, one could determine the state of the batch by checking
brw->batch.ring (blit vs. render) and brw->batch.used != 0 (known vs.
unknown).
This should be functionally equivalent, but the tri-state enum is a bit
clearer.
v2: Catch three explicit require_space callers (thanks to Carl and Eric).
v3: Split the boolean -> enum change from the UNKNOWN_RING change.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
| |
Passing BLT_RING or RENDER_RING to batchbuffer functions is a lot more
obvious than passing true or false.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
v2: Don't go to extra work to avoid extraneous flushes. (Previous
experiments in the kernel have suggested that flushing the pipeline
when it is already empty is extremely cheap).
Cc: "10.0" <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These are things that happen to be occurring because of the batch flush at
the start of the blorp op (which exists to prevent batch space or aperture
space overflow), but the intention was for this sequence of state resets at
the end of blorp to be everything necessary for the next draw call.
Found when debugging the next commit, by comparing brw_new_batch() and
intel_batchbuffer_reset() to brw_blorp_exec().
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This hasn't done anything in a long time, and it's only used in a couple
places...which means we couldn't use it without doing a bunch of work
anyway.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Most functions no longer use intel_context, so this patch additionally
removes the local "intel" variables to avoid compiler warnings.
Signed-off-by: Kenneth Graunke <[email protected]>
Acked-by: Chris Forbes <[email protected]>
Acked-by: Paul Berry <[email protected]>
Acked-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
| |
Signed-off-by: Kenneth Graunke <[email protected]>
Acked-by: Chris Forbes <[email protected]>
Acked-by: Paul Berry <[email protected]>
Acked-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This makes brw_context available in every function that used
intel_context. This makes it possible to start migrating fields from
intel_context to brw_context.
Surprisingly, this actually removes some code, as functions that use
OUT_BATCH don't need to declare "intel"; they just use "brw."
Signed-off-by: Kenneth Graunke <[email protected]>
Acked-by: Chris Forbes <[email protected]>
Acked-by: Paul Berry <[email protected]>
Acked-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
brw_context.h includes intel_context.h, but additionally makes the
brw_context structure available. Switching this allows us to start
using brw_context in more places.
Signed-off-by: Kenneth Graunke <[email protected]>
Acked-by: Chris Forbes <[email protected]>
Acked-by: Paul Berry <[email protected]>
Acked-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
| |
maxBatchSize was only ever initialized to BATCH_SZ, and a few places
used BATCH_SZ directly anyway.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Now that i915's forked off, they don't need to live in a shared directory.
Acked-by: Kenneth Graunke <[email protected]>
Acked-by: Chad Versace <[email protected]>
Acked-by: Adam Jackson <[email protected]>
(and I hear second hand that idr is OK with it, too)
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is currently believed to work but be a significant performance loss.
Performance recovery should be soon to follow.
The dri_bo_fake_disable_backing_store() call was added to allow backing store
disable like bufmgr_fake.c did, which is a significant performance win (though
it's missing the no-fence-subdata part).
This commit is a squash merge of the 965-ttm branch, which had some history
I wanted to avoid pulling due to noisiness and brokenness at many points
for git-bisecting.
|
|
|
|
| |
This is in preparation for 965 TTM.
|
|
|
|
|
|
|
| |
This reverts commit b2f1aa2389473ed09170713301b042661d70a48e.
Somehow I ended up with my branch's save-this-while-I-work-on-master commit
actually on master.
|
| |
|
| |
|
|
|
|
|
|
|
|
| |
rendering operations to take place after evicting all resident
buffers.
Cope better with memory allocation failures throughout the driver and
improve tracking of failures.
|
|
This driver comes from Tungsten Graphics, with a few further modifications by
Intel.
|