| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
| |
We can end up with scenarios where last_fence is associated with a batch
that is flushed through some other path before needs_out_fence_fd gets
set. Resulting in returning a fence that has no backing fd.
The simplest thing is to just skip the optimization to try and avoid
no-op batches when a fence-fd is requested. This should normally be
just once a frame anyways.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
The pscreen param was just there to satisfy pipe_screen::fence_reference
But some of the internal uses passed NULL for screen. Which is a bit
ugly. Instead drop the param and add a shim function to plug into the
screen.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
The `force` arg has been unused for a while.. but apparently I forgot to
garbage collect it.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
We just need to do a sequence of commands to flush the cache.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The glMemoryBarrier() function makes shader memory stores ordered with
respect to things specified by the given bits. Until now, st/mesa has
ignored GL_TEXTURE_UPDATE_BARRIER_BIT and GL_BUFFER_UPDATE_BARRIER_BIT,
saying that drivers should implicitly perform the needed flushing.
This seems like a pretty big assumption to make. Instead, this commit
opts to translate them to new PIPE_BARRIER bits, and adjusts existing
drivers to continue ignoring them (preserving the current behavior).
The i965 driver performs actions on these memory barriers. Shader
memory stores go through a "data cache" which is separate from the
render cache and other read caches (like the texture cache). All
memory barriers need to flush the data cache (to ensure shader memory
stores are visible), and possibly invalidate read caches (to ensure
stale data is no longer visible). The driver implicitly flushes for
most caches, but not for data cache, since ARB_shader_image_load_store
introduced MemoryBarrier() precisely to order these explicitly.
I would like to follow i965's approach in iris, flushing the data cache
on any MemoryBarrier() call, so I need st/mesa to actually call the
pipe->memory_barrier() callback.
Fixes KHR-GL45.shader_image_load_store.advanced-sync-textureUpdate
and Piglit's spec/arb_shader_image_load_store/host-mem-barrier on
the iris driver.
Roland said this looks reasonable to him.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
First step to unify the way fd5 and fd6 blitter works. Currently a6xx
bypasses the blit API in order to also accelerate resource_copy_region()
But this approach can lead to infinite recursion:
#0 fd_alloc_staging (ctx=0x5555936480, rsc=0x7fac485f90, level=0, box=0x7fbab29220) at ../src/gallium/drivers/freedreno/freedreno_resource.c:291
#1 0x0000007fbdebed04 in fd_resource_transfer_map (pctx=0x5555936480, prsc=0x7fac485f90, level=0, usage=258, box=0x7fbab29220, pptrans=0x7fbab29240) at ../src/gallium/drivers/freedreno/freedreno_resource.c:479
#2 0x0000007fbe5c5068 in u_transfer_helper_transfer_map (pctx=0x5555936480, prsc=0x7fac485f90, level=0, usage=258, box=0x7fbab29220, pptrans=0x7fbab29240) at ../src/gallium/auxiliary/util/u_transfer_helper.c:243
#3 0x0000007fbde2dcb8 in util_resource_copy_region (pipe=0x5555936480, dst=0x7fac485f90, dst_level=0, dst_x=0, dst_y=0, dst_z=0, src=0x7fac47c780, src_level=0, src_box_in=0x7fbab2945c) at ../src/gallium/auxiliary/util/u_surface.c:350
#4 0x0000007fbdf2282c in fd_resource_copy_region (pctx=0x5555936480, dst=0x7fac485f90, dst_level=0, dstx=0, dsty=0, dstz=0, src=0x7fac47c780, src_level=0, src_box=0x7fbab2945c) at ../src/gallium/drivers/freedreno/freedreno_blitter.c:173
#5 0x0000007fbdf085d4 in fd6_resource_copy_region (pctx=0x5555936480, dst=0x7fac485f90, dst_level=0, dstx=0, dsty=0, dstz=0, src=0x7fac47c780, src_level=0, src_box=0x7fbab2945c) at ../src/gallium/drivers/freedreno/a6xx/fd6_blitter.c:587
#6 0x0000007fbde2f3d0 in util_try_blit_via_copy_region (ctx=0x5555936480, blit=0x7fbab29430) at ../src/gallium/auxiliary/util/u_surface.c:864
#7 0x0000007fbdec02c4 in fd_blit (pctx=0x5555936480, blit_info=0x7fbab29588) at ../src/gallium/drivers/freedreno/freedreno_resource.c:993
#8 0x0000007fbdf08408 in fd6_blit (pctx=0x5555936480, info=0x7fbab29588) at ../src/gallium/drivers/freedreno/a6xx/fd6_blitter.c:546
#9 0x0000007fbdebdc74 in do_blit (ctx=0x5555936480, blit=0x7fbab29588, fallback=false) at ../src/gallium/drivers/freedreno/freedreno_resource.c:129
#10 0x0000007fbdebe58c in fd_blit_from_staging (ctx=0x5555936480, trans=0x7fac47b7e8) at ../src/gallium/drivers/freedreno/freedreno_resource.c:326
#11 0x0000007fbdebea38 in fd_resource_transfer_unmap (pctx=0x5555936480, ptrans=0x7fac47b7e8) at ../src/gallium/drivers/freedreno/freedreno_resource.c:416
#12 0x0000007fbe5c5c68 in u_transfer_helper_transfer_unmap (pctx=0x5555936480, ptrans=0x7fac47b7e8) at ../src/gallium/auxiliary/util/u_transfer_helper.c:516
#13 0x0000007fbde2de24 in util_resource_copy_region (pipe=0x5555936480, dst=0x7fac485f90, dst_level=0, dst_x=0, dst_y=0, dst_z=0, src=0x7fac47b8e0, src_level=0, src_box_in=0x7fbab2997c) at ../src/gallium/auxiliary/util/u_surface.c:376
#14 0x0000007fbdf2282c in fd_resource_copy_region (pctx=0x5555936480, dst=0x7fac485f90, dst_level=0, dstx=0, dsty=0, dstz=0, src=0x7fac47b8e0, src_level=0, src_box=0x7fbab2997c) at ../src/gallium/drivers/freedreno/freedreno_blitter.c:173
#15 0x0000007fbdf085d4 in fd6_resource_copy_region (pctx=0x5555936480, dst=0x7fac485f90, dst_level=0, dstx=0, dsty=0, dstz=0, src=0x7fac47b8e0, src_level=0, src_box=0x7fbab2997c) at ../src/gallium/drivers/freedreno/a6xx/fd6_blitter.c:587
...
Instead rework the API to push the fallback back to core code, so that
we can rework resource_copy_region() to have it's own fallback path,
and then finally convert fd6 over to work in the same way.
This also makes ctx->blit() optional, and cleans up some unnecessary
callers.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
Noticed that with webgl (in chromium, at least) we end up generating a
lot of no-op submits just to get a fence. Tracking the last fence and
returning that if there is no rendering since last flush avoids this.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These are not necessary because the corresponding settings are set via
the .dir-locals.el file anyway. Most of them were missing a ‘:’ after
“tab-width” which was making Emacs display an annoying warning
whenever you open the file.
This patch was made with:
sed -ri '/-\*- mode:/,/^$/d' \
$(find src/gallium/{drivers,winsys} -name \*.\[ch\] \
-exec grep -l -- '-\*- mode:' {} \+)
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
Needs to specify nondraw when creating a batch through
fd_bc_alloc_batch since it'd better create a batch through
it rather than fd_batch_create.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Don't fall over when app wants more than 32 contexts. Instead allocate
contexts on demand.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
For cases in which (after the following commit) ctx->batch may be null.
Prep work for following commit.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
Following patches will be doing further cleanup after calling
fd_context_destroy() so it is easier if we move the free() into
the per-gen backend code.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
Instead of the reading batch setting a dependency on the writing batch,
simply flush the writing batch immediately. This avoids situations
where we have to flush the context's current batch later.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Mainly for testing, FD_MESA_DEBUG=hiprio will force high priority
contexts.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
For devices (and kernels) which support different priority ringbuffers,
expose context priority support.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Basically a clone of util_blitter_blit() but with special handling to
blit PIPE_BUFFER as a PIPE_TEXTURE_1D.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Get rid of "gmem" (ie. tiling) ringbuffer, and just emit setup commands
directly to "draw" ringbuffer for compute (and in future for blits not
using the 3d pipe). This way we can have a simple flat cmdstream buffer
and bypass setup related to 3d pipe.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
ctx->last_fence isn't such a terribly clever idea, if batches can be
flushed out of order. Instead, each batch now holds a fence, which is
created before the batch is flushed (useful for next patch), that later
gets populated after the batch is actually flushed.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
To enable per-context priorities, we need to have per-context pipe's.
Unfortunately we still need to keep the global screen pipe, mostly just
for screen->get_timestamp().
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
To add context priority support we need to have an fd_pipe per context,
rather than per-screen. Which conflicts with existing ctx->pipe (which
is actually a visibility stream pipe (hw resource). So just rename it.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
Prep work for later patch.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Android tries to create a FENCE_FD fence without any rendering. And
then falls over when that fails. So just always create an initial
batch.
Fixes: e4ad8695 ("freedreno: fix crash when flush() but no rendering")
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
If we haven't created a batch, just bail in pipe->flush(), since there
is nothing to do.
Fixes crash in warsow, which creates a whole bunch of contexts used for
nothing but texture uploads.
Signed-off-by: Rob Clark <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some queries on a4xx and all queries on a5xx can do result accumulation
on CP so we don't need to track per-tile samples. We do still need to
handle pausing/resuming while switching batches (in case the query is
active over multiple draws which are executed out of order).
So introduce new accumulated-query helpers for these sorts of queries,
since it doesn't really fit in cleanly with the original query infra-
structure.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
For a5xx (and actually some queries on a4xx) we can accumulate results
in the cmdstream, so we don't need this elaborate mechanism of tracking
per-tile query results. So make it into vfuncs so generation specific
backend can use it when it makes sense.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
In this case, ctx->flush_queue would not have been initialized.
Fixes: 0b613c20 ("freedreno: enable draw/batch reordering by default")
Cc: "17.1" <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Notes:
- make sure the default size is large enough to handle all state trackers
- pipe wrappers don't receive transfer calls from stream_uploader, because
pipe_context::stream_uploader points directly to the underlying driver's
stream_uploader (to keep it simple for now)
v2: add error handling to nv50, nvc0, noop
v3: set const_uploader
Reviewed-by: Nicolai Hähnle <[email protected]>
Tested-by: Edmondo Tommasina <[email protected]> (v1)
Tested-by: Charmaine Lee <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
If app tries to create a fence but there is no rendering to submit, we
need a dummy/no-op submit. Use a string-marker for the purpose.. mostly
since it avoids needing to realize that the packet format changes in
later gen's (so one less place to fixup for a5xx).
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
Prep-work for next patch, mostly move to tracking last_fence as a
pipe_fence_handle (created now only in fd_gmem_render_tiles()), and a
bit of superficial renaming.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
| |
|
|
|
|
|
|
|
|
| |
Since clears are more or less just normal draws, there isn't that much
benefit in having hand-rolled clear path. Add support to use u_blitter
instead if gen specific backend doesn't implement ctx->clear().
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
These are the same for a3xx and later. (a2xx could probably use them
too, but due to limited hw support and ancient downstream kernels, it
isn't so easy to test.)
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Some apps, like warsow, create a bazillion contexts but don't render on
most of them.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
This is also used in gmem code, which executes from the "bottom half"
(ie. from the flush_queue worker thread), so it cannot be in fd_context.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With the state accessed from GMEM+submit factored out of fd_context and
into fd_batch, now it is possible to punt this off to a helper thread.
And more importantly, since there are cases where one context might
force the batch-cache to flush another context's batches (ie. when there
are too many in-flight batches), using a per-context helper thread keeps
various different flushes for a given context serialized.
TODO as with batch-cache, there are a few places where we'll need a
mutex to protect critical sections, which is completely missing at the
moment.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Push query state down to batch, and use the resource tracking to figure
out which batch(es) need to be flushed to get the query result.
This means we actually need to allocate the prsc up front, before we
know the size. So we have to add a special way to allocate an un-
backed resource, and then later allocate the backing storage.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Make it easier to track batches, to ensure things happen properly when
they are reordered.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Note that I originally also had a entry-point that would construct a key
and do lookup from a pipe_surface. I ended up not needing that (yet?)
but it is easy-enough to re-introduce later if we need it for the blit
path.
For now, not enabled by default, but can be enabled (on a3xx/a4xx) with
FD_MESA_DEBUG=reorder.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To flush batches out of order, the gmem code needs to not depend on
state from fd_context (since that may apply to a more recent batch).
So this all moves into batch.
The one exception is the gmem/pipe/tile state itself. But this is
only used from gmem code (and batches are flushed serially). The
alternative would be having to re-calculate GMEM layout on every
batch, even if the dimensions of the render targets are the same.
Note: This opens up the possibility of pushing gmem/submit into a
helper thread.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Introduce the batch object, to track a batch/submit's worth of
ringbuffers and other bookkeeping. In this first step, just move
the ringbuffers into batch, since that is mostly uninteresting
churn.
For now there is just a single batch at a time. Note that one
outcome of this change is that rb's are allocated/freed on each
use. But the expectation is that the bo pool in libdrm_freedreno
will save us the GEM bo alloc/free which was the initial reason
to implement a rb pool in gallium.
The purpose of the batch is to eventually facilitate out-of-order
rendering, with batches associated to framebuffer state, and
tracking the dependencies on other batches.
Signed-off-by: Rob Clark <[email protected]>
|