| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
We test the condition, declare a few variables, then test the exact
same condition again. Let's not do that.
Signed-off-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
| |
Fixes: 912a9c8d
Signed-off-by: Jonathan Marek <[email protected]>
Cc: 19.0 <[email protected]>
|
|
|
|
|
|
|
| |
The big thing here is the 0x60 offset for the mem2gmem copy which I missed
in my last patch.
Signed-off-by: Jonathan Marek <[email protected]>
|
|
|
|
|
|
|
|
| |
This fixes the depth/stencil clear on a20x, and adds a fast clear path.
The fast clear path is only used for a20x, needs performance tests on a22x.
Signed-off-by: Jonathan Marek <[email protected]>
|
|
|
|
| |
Signed-off-by: Jonathan Marek <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch replaces the a2xx TGSI compiler with a NIR compiler.
It also adds several new features:
-gl_FrontFacing, gl_FragCoord, gl_PointCoord, gl_PointSize
-control flow (including loops)
-texture related features (LOD/bias, cubemaps)
-filling scalar ALU slot when possible
Signed-off-by: Jonathan Marek <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
On a20x, set VGT_VERTEX_REUSE_BLOCK_CNTL to 2 and don't change it. Small
rearrangement on a220 to reduce the size of draw commands.
Only set DEALLOC_CNTL on a20x because the correct a220 value is not known.
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Only 3 vertices are used so we can drop the data for vertex 4
It doesn't make sense to have 1.1 for some coordinates, use 1.0 instead
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jonathan Marek <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These are not necessary because the corresponding settings are set via
the .dir-locals.el file anyway. Most of them were missing a ‘:’ after
“tab-width” which was making Emacs display an annoying warning
whenever you open the file.
This patch was made with:
sed -ri '/-\*- mode:/,/^$/d' \
$(find src/gallium/{drivers,winsys} -name \*.\[ch\] \
-exec grep -l -- '-\*- mode:' {} \+)
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
the format of the CLEAR_COLOR register doesn't depend on the target format
this fixes clear color when rendering to 32-bit RGBA and 16-bit targets
Signed-off-by: Jonathan Marek <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
this patch adds support for a20x, which has some differences with a220:
-no VGT_MAX_VTX_INDX register
-no CLEAR_COLOR register
-set RB_BC_CONTROL in restore (hangs without)
-different CP_DRAW_INDX format
tested with kmscube and glmark2 scenes, on par with a220
Signed-off-by: Jonathan Marek <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
We probably *could* do this with blit path, but I think it would involve
clobbering settings from batch->gmem (see emit_zs()).
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
My fault for not having time to test Marek's patches while they were on
list.
Fixes: 330d0607 ("gallium: remove pipe_index_buffer and set_index_buffer")
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
pipe_draw_info::indexed is replaced with index_size. index_size == 0 means
non-indexed.
Instead of pipe_index_buffer::offset, pipe_draw_info::start is used.
For indexed indirect draws, pipe_draw_info::start is added to the indirect
start. This is the only case when "start" affects indirect draws.
pipe_draw_info::index is a union. Use either index::resource or
index::user depending on the value of pipe_draw_info::has_user_indices.
v2: fixes for nine, svga
|
| |
|
|
|
|
|
|
|
|
| |
In particular, move per-shader-stage info out to a seperate array of
enum's indexed by shader stage. This will make it easier to add more
shader stages as well as new per-stage state (like SSBOs).
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
a3xx/a4xx use the generic u_blitter path, which will make state dirty
bits be set appropriately thanks to the automagic of generic code
setting generic state in the driver. And a5xx has a blit/dma engine
(actually, two) so it doesn't need these extra dirty bits set.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
Note that this involves juggling around a bit when we emit and clear
texture state. So split out from the patch that adds the helper to set
all state dirty.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To flush batches out of order, the gmem code needs to not depend on
state from fd_context (since that may apply to a more recent batch).
So this all moves into batch.
The one exception is the gmem/pipe/tile state itself. But this is
only used from gmem code (and batches are flushed serially). The
alternative would be having to re-calculate GMEM layout on every
batch, even if the dimensions of the render targets are the same.
Note: This opens up the possibility of pushing gmem/submit into a
helper thread.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Introduce the batch object, to track a batch/submit's worth of
ringbuffers and other bookkeeping. In this first step, just move
the ringbuffers into batch, since that is mostly uninteresting
churn.
For now there is just a single batch at a time. Note that one
outcome of this change is that rb's are allocated/freed on each
use. But the expectation is that the bo pool in libdrm_freedreno
will save us the GEM bo alloc/free which was the initial reason
to implement a rb pool in gallium.
The purpose of the batch is to eventually facilitate out-of-order
rendering, with batches associated to framebuffer state, and
tracking the dependencies on other batches.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Pretty much only happens if shader variant compile fails. But in this
case, if we haven't emitted cmdstream, we don't want to set needs_flush.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This appears to need the A2XX version of the point list, so select it at
draw time if necessary.
Experimentally, always using the A2XX version causes hangs when PSIZE
isn't actually emitted.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Gets rid of a namespace conflict w/ a4xx which wants an fd4_draw()
version of fd_draw()..
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Get rid of fd3_vertex_buf and use fd_vertex_state directly for all
draws. Removes a tiny bit of CPU overhead for munging around the vertex
state every time it is emitted, but more importantly it cleans things up
for later optimizations, so the emit paths don't have to special case
internal draws (gmem<->mem, clears, etc) with regular draws.
Instead of constructing fd3_vertex_buf array each time for internal
draws, and context init time pre-create solid_vbuf_state and
blit_vbuf_state.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Still some open questions.. and at any rate, no additional piglit passes
due to various wrap modes that we need to emulate in at least some
cases :-(
But it does fix some mystery page-faults.. So add some comments in the
code where there are things that we need to emulate or do more r/e, and
push as-is.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
util_color often merely represents a collection of bytes, however it is
inconvenient if those bytes can only be accessed as floats/doubles for int
formats exceeding 32bits.
(Note that since rgba8 formats use one uint, not 4 bytes, hence the byte and
short member were left as is.)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The binning pass sorts vertices into which bins/tiles they apply to.
The visibility information generated during the binning pass can be
used to speed up the rendering pass by filtering out vertices which
do not apply to the current tile. See:
https://github.com/freedreno/freedreno/wiki/Adreno-tiling#optimized-approach
This brings a significant fps boost. A rough assortment of tests
(supertuxkart, etracer, tremulous, glmark2 'build' test, etc) seems
to yield a ~35-45% fps improvement.
For now, to be conservative, the binning pass is not enabled yet by
default. To enable it use:
FD_MESA_DEBUG=binning
So far I haven't found anything that breaks with binning enabled,
but I'd like a bit more testing before I enable it as default.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
Mostly just to give an easy debug/instrumentation point.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Have a single helper that all draws come through.. mainly for a
convenient debug and instrumentation point.
Signed-off-by: Rob Clark <[email protected]>
|
|
Split the parts that are specific to adreno a2xx series GPUs from the
parts that will be in common with a3xx, so that a3xx support can be
added more cleanly.
Signed-off-by: Rob Clark <[email protected]>
|