| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
Signed-off-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
| |
Signed-off-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
| |
We're using vs and fs now, and adding hs, ds and gs soon. It's
confusing enough that we have both DS/TCS and HS/TES. At least for VS
and FS there doesn't have to be multiple names.
Signed-off-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The driver can't determine PIPE_QUERY_PRIMITIVES_GENERATED or
PIPE_QUERY_PRIMITIVES_EMITTED once we support geometry or
tessellation, since these stages add primitives at runtime. Use the
WRITE_PRIMITIVE_COUNTS event to write back the primitive counts and
implement a hw query for this.
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The GPU writes out streamout offsets as it goes to the FLUSH_BASE
pointer. We use that value with CP_MEM_TO_REG when appending to the
stream so that we don't have to track the offsets with the CPU in the
driver. This ensures that streamout continues to work once we enable
geometry and tessellation shader stages that add geometry.
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
framebuffer_barrier() still depends on the ctx, but the rest can move to
screen.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
These don't need to be in context, and we'll need them in screen in a
later patch. Plus it's a good cleanup.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
If no clear, and no geometry according to VSC_STATE[pipe] we can skip
the tile entirely. If there is a fast-clear, we can't skip restore
(clear) or resolve IBs, but we can still skip draw IB.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
We just need to do a sequence of commands to flush the cache.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch replaces the a2xx TGSI compiler with a NIR compiler.
It also adds several new features:
-gl_FrontFacing, gl_FragCoord, gl_PointCoord, gl_PointSize
-control flow (including loops)
-texture related features (LOD/bias, cubemaps)
-filling scalar ALU slot when possible
Signed-off-by: Jonathan Marek <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
First step to unify the way fd5 and fd6 blitter works. Currently a6xx
bypasses the blit API in order to also accelerate resource_copy_region()
But this approach can lead to infinite recursion:
#0 fd_alloc_staging (ctx=0x5555936480, rsc=0x7fac485f90, level=0, box=0x7fbab29220) at ../src/gallium/drivers/freedreno/freedreno_resource.c:291
#1 0x0000007fbdebed04 in fd_resource_transfer_map (pctx=0x5555936480, prsc=0x7fac485f90, level=0, usage=258, box=0x7fbab29220, pptrans=0x7fbab29240) at ../src/gallium/drivers/freedreno/freedreno_resource.c:479
#2 0x0000007fbe5c5068 in u_transfer_helper_transfer_map (pctx=0x5555936480, prsc=0x7fac485f90, level=0, usage=258, box=0x7fbab29220, pptrans=0x7fbab29240) at ../src/gallium/auxiliary/util/u_transfer_helper.c:243
#3 0x0000007fbde2dcb8 in util_resource_copy_region (pipe=0x5555936480, dst=0x7fac485f90, dst_level=0, dst_x=0, dst_y=0, dst_z=0, src=0x7fac47c780, src_level=0, src_box_in=0x7fbab2945c) at ../src/gallium/auxiliary/util/u_surface.c:350
#4 0x0000007fbdf2282c in fd_resource_copy_region (pctx=0x5555936480, dst=0x7fac485f90, dst_level=0, dstx=0, dsty=0, dstz=0, src=0x7fac47c780, src_level=0, src_box=0x7fbab2945c) at ../src/gallium/drivers/freedreno/freedreno_blitter.c:173
#5 0x0000007fbdf085d4 in fd6_resource_copy_region (pctx=0x5555936480, dst=0x7fac485f90, dst_level=0, dstx=0, dsty=0, dstz=0, src=0x7fac47c780, src_level=0, src_box=0x7fbab2945c) at ../src/gallium/drivers/freedreno/a6xx/fd6_blitter.c:587
#6 0x0000007fbde2f3d0 in util_try_blit_via_copy_region (ctx=0x5555936480, blit=0x7fbab29430) at ../src/gallium/auxiliary/util/u_surface.c:864
#7 0x0000007fbdec02c4 in fd_blit (pctx=0x5555936480, blit_info=0x7fbab29588) at ../src/gallium/drivers/freedreno/freedreno_resource.c:993
#8 0x0000007fbdf08408 in fd6_blit (pctx=0x5555936480, info=0x7fbab29588) at ../src/gallium/drivers/freedreno/a6xx/fd6_blitter.c:546
#9 0x0000007fbdebdc74 in do_blit (ctx=0x5555936480, blit=0x7fbab29588, fallback=false) at ../src/gallium/drivers/freedreno/freedreno_resource.c:129
#10 0x0000007fbdebe58c in fd_blit_from_staging (ctx=0x5555936480, trans=0x7fac47b7e8) at ../src/gallium/drivers/freedreno/freedreno_resource.c:326
#11 0x0000007fbdebea38 in fd_resource_transfer_unmap (pctx=0x5555936480, ptrans=0x7fac47b7e8) at ../src/gallium/drivers/freedreno/freedreno_resource.c:416
#12 0x0000007fbe5c5c68 in u_transfer_helper_transfer_unmap (pctx=0x5555936480, ptrans=0x7fac47b7e8) at ../src/gallium/auxiliary/util/u_transfer_helper.c:516
#13 0x0000007fbde2de24 in util_resource_copy_region (pipe=0x5555936480, dst=0x7fac485f90, dst_level=0, dst_x=0, dst_y=0, dst_z=0, src=0x7fac47b8e0, src_level=0, src_box_in=0x7fbab2997c) at ../src/gallium/auxiliary/util/u_surface.c:376
#14 0x0000007fbdf2282c in fd_resource_copy_region (pctx=0x5555936480, dst=0x7fac485f90, dst_level=0, dstx=0, dsty=0, dstz=0, src=0x7fac47b8e0, src_level=0, src_box=0x7fbab2997c) at ../src/gallium/drivers/freedreno/freedreno_blitter.c:173
#15 0x0000007fbdf085d4 in fd6_resource_copy_region (pctx=0x5555936480, dst=0x7fac485f90, dst_level=0, dstx=0, dsty=0, dstz=0, src=0x7fac47b8e0, src_level=0, src_box=0x7fbab2997c) at ../src/gallium/drivers/freedreno/a6xx/fd6_blitter.c:587
...
Instead rework the API to push the fallback back to core code, so that
we can rework resource_copy_region() to have it's own fallback path,
and then finally convert fd6 over to work in the same way.
This also makes ctx->blit() optional, and cleans up some unnecessary
callers.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
On older gens, the CLIP_ADJ bitfields were actually 3.6 fixed point.
Which might make more sense. Although this formula comes up with values
pretty close to what blob does for various viewport sizes (for at least
a5xx and a6xx), and seems to work.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Just massive search/replace for the most part.
Step towards removing ir3 dependency on disasm.h which is shared by
a2xx. One step closer to being able to move ir3 out of gallium.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
Noticed that with webgl (in chromium, at least) we end up generating a
lot of no-op submits just to get a fence. Tracking the last fence and
returning that if there is no rendering since last flush avoids this.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These are not necessary because the corresponding settings are set via
the .dir-locals.el file anyway. Most of them were missing a ‘:’ after
“tab-width” which was making Emacs display an annoying warning
whenever you open the file.
This patch was made with:
sed -ri '/-\*- mode:/,/^$/d' \
$(find src/gallium/{drivers,winsys} -name \*.\[ch\] \
-exec grep -l -- '-\*- mode:' {} \+)
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Don't fall over when app wants more than 32 contexts. Instead allocate
contexts on demand.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
For cases in which (after the following commit) ctx->batch may be null.
Prep work for following commit.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
Somehow this got lost from the initial MSAA patch.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Avg number of (half) regs per draw, so we can corrolate fps dips to
shader register usage.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
These never got updated in fd_context_all_dirty() so actually trying to
rely on them (in the case of fd5_emit_images()) ends up in some cases
where state is not emitted but should be. Best to just rip this out.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Overall a nice 5-10% gain for most games. And more for things like
glmark2 texture benchmark.
There are some rough edges. In particular, the hardware seems to only
support tiling or component swap. (Ie. from hw PoV, ARGB/ABGR/RGBA/
BGRA are all the same format but with different component swap.) For
tiled formats, only ARGB is possible. This isn't a big problem for
*sampling* since we also have swizzle state there (and since
util_format_compose_swizzles() already takes into account the component
order, we didn't use COLOR_SWAP for sampling). But it is a problem if
you try to render to a tiled BGRA (for example) surface.
The next patch introduces a workaround for blitter, so we can generate
tiled textures in ABGR/RGBA/BGRA, but that doesn't help the render-
target case. To handle that, I think we'd need to keep track that the
tiled format is different from the linear format, which seems like it
would get extra fun with sampler views/etc.
So for now, disabled by default, enable with FD_MESA_DEBUG=ttile. In
practice it works fine for all the games I've tried, but makes piglit
grumpy.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
For dealing with indirect-draw + gl_VertexID, we'll introduce another
case where we need to use CP_MEM_TO_MEM. Rather than adding more
if(a5xx)/else make this a ctx vfunc.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Basically a clone of util_blitter_blit() but with special handling to
blit PIPE_BUFFER as a PIPE_TEXTURE_1D.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Get rid of "gmem" (ie. tiling) ringbuffer, and just emit setup commands
directly to "draw" ringbuffer for compute (and in future for blits not
using the 3d pipe). This way we can have a simple flat cmdstream buffer
and bypass setup related to 3d pipe.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
ctx->last_fence isn't such a terribly clever idea, if batches can be
flushed out of order. Instead, each batch now holds a fence, which is
created before the batch is flushed (useful for next patch), that later
gets populated after the batch is actually flushed.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
It is unfortunate that image state isn't a real CSO, since (at least for
a4xx/a5xx) it is a combination of sampler and "SSBO" image state, and it
would be useful to pre-compute the state block "register" values rather
than doing it at emit time.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
To enable per-context priorities, we need to have per-context pipe's.
Unfortunately we still need to keep the global screen pipe, mostly just
for screen->get_timestamp().
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
To add context priority support we need to have an fd_pipe per context,
rather than per-screen. Which conflicts with existing ctx->pipe (which
is actually a visibility stream pipe (hw resource). So just rename it.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
Prep work for later patch.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
We probably *could* do this with blit path, but I think it would involve
clobbering settings from batch->gmem (see emit_zs()).
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
pipe_draw_info::indexed is replaced with index_size. index_size == 0 means
non-indexed.
Instead of pipe_index_buffer::offset, pipe_draw_info::start is used.
For indexed indirect draws, pipe_draw_info::start is added to the indirect
start. This is the only case when "start" affects indirect draws.
pipe_draw_info::index is a union. Use either index::resource or
index::user depending on the value of pipe_draw_info::has_user_indices.
v2: fixes for nine, svga
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
The generation-independent support for binding shader buffer objects.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
It is always the draw ring. Except for a5xx queries like time-elapsed,
where we will eventually want to emit cmds into both binning and draw
rings.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some queries on a4xx and all queries on a5xx can do result accumulation
on CP so we don't need to track per-tile samples. We do still need to
handle pausing/resuming while switching batches (in case the query is
active over multiple draws which are executed out of order).
So introduce new accumulated-query helpers for these sorts of queries,
since it doesn't really fit in cleanly with the original query infra-
structure.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
Move a bit more of the logic shared by all query types (active tracking,
etc) into common code. This avoids introducing a 3rd copy of that logic
for a5xx.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
For a5xx (and actually some queries on a4xx) we can accumulate results
in the cmdstream, so we don't need this elaborate mechanism of tracking
per-tile query results. So make it into vfuncs so generation specific
backend can use it when it makes sense.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
In particular, move per-shader-stage info out to a seperate array of
enum's indexed by shader stage. This will make it easier to add more
shader stages as well as new per-stage state (like SSBOs).
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
Note that this involves juggling around a bit when we emit and clear
texture state. So split out from the patch that adds the helper to set
all state dirty.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
This will simplify things when we break out per-shader-stage dirty bits.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
Make this an array indexed by shader stage, as is done elsewhere for
other per-shader-stage state. This will simplify things as more shader
stages are eventually added.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
pipe_mutex_unlock() was made unnecessary with fd33a6bcd7f12.
Replaced using:
find ./src -type f -exec sed -i -- \
's:pipe_mutex_unlock(\([^)]*\)):mtx_unlock(\&\1):g' {} \;
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
replace pipe_mutex_lock() was made unnecessary with fd33a6bcd7f12.
Replaced using:
find ./src -type f -exec sed -i -- \
's:pipe_mutex_lock(\([^)]*\)):mtx_lock(\&\1):g' {} \;
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
Fixes some issues at least with GMEM bypass mode, where we'd sometimes
end up with some FS quads not hitting memory.
Signed-off-by: Rob Clark <[email protected]>
|