| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
I never actually implemented the stubbed out fence stuff back in the
early days. Fix that.
We'll need a few libdrm_freedreno changes to handle timeout properly,
so ignore that for now to avoid a libdrm_freedreno dependency bump.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
The optimization of avoiding restore (mem2gmem) if there was a clear
falls down a bit if you don't have a fullscreen scissor. We need to
make the decision logic a bit more clever to keep track of *what* was
cleared, so that we can (a) completely skip mem2gmem if entire buffer
was cleared, or (b) skip mem2gmem on a per-tile basis for tiles that
were completely cleared.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
Blitter can still have transfers hanging around which it frees in
util_blitter_destroy(). So let it clean up before we yank the
transfer_pool from under it.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Real GPU queries need some infrastructure to track samples per tile and
accumulate the results. But fortunately this can be shared across GPU
generation.
See:
https://github.com/freedreno/freedreno/wiki/Queries#hardware-queries
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Instead in the common code, construct these shaders from TGSI. For now
we let a2xx keep it's hand coded shaders, as it's compiler isn't quite
up to the job yet. All the same it is a net drop in code size and gets
rid of special cases.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
Since we are now consuming two ringbuffers at a time, we probably want a
pool larger than 4.. but we don't need each individual ringbuffer to be
so large, so offset the pool size increase by reducing rb size.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
Updates to non-banked registers, CP_LOAD_STATE, etc, need a WFI if there
is potentially pending rendering. Track this better, and add fd_wfi()
calls everywhere that might potentially need CP_WAIT_FOR_IDLE.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
The ctx should hold ref to dev to avoid problems if screen is destroyed
before ctx. Doesn't really fix the egl/glx issues, but at least it
prevents things from getting much worse.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add for now some simple/basic query support (ie. things not actually
requiring the GPU). Might change around a bit when I actually add
GPU queries, but for now this enables some useful performance info
in the GALLIUM_HUD. For example:
GALLIUM_HUD=fps+batches+batches-sysmem+batches-gmem+restores,draw-calls
The driver specific specific queries are:
+ draw-calls
+ batches - number of batches per second, sum of batches-sysmem
plus batches-gmem
+ batches-gmem - render a set of tiles in GMEM, for each tile
(optionally) system mem -> gmem (restore), plus N draws,
plus gmem -> system mem (resolve) per second
+ batches-sysmem - N draws to system memory (GMEM bypass) per
second
+ restores - number of GMEM batches that required restore per
second
Ideally for GMEM rendering, you want batches-gmem to equal fps. If
the app is doing something that triggers multiple passes (ie. requires
extra round trip gmem <-> system memory) then the # of batches per
second will go up relative to fps.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The binning pass sorts vertices into which bins/tiles they apply to.
The visibility information generated during the binning pass can be
used to speed up the rendering pass by filtering out vertices which
do not apply to the current tile. See:
https://github.com/freedreno/freedreno/wiki/Adreno-tiling#optimized-approach
This brings a significant fps boost. A rough assortment of tests
(supertuxkart, etracer, tremulous, glmark2 'build' test, etc) seems
to yield a ~35-45% fps improvement.
For now, to be conservative, the binning pass is not enabled yet by
default. To enable it use:
FD_MESA_DEBUG=binning
So far I haven't found anything that breaks with binning enabled,
but I'd like a bit more testing before I enable it as default.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Using RMW on banked context registers is not safe. The value read
could be the wrong one. So if there has been a DRAW_IDX launched,
the RMW must be preceded by a WAIT_FOR_IDLE to ensure the read part
of RMW sees the correct value.
To avoid unnecessary WFI's, keep track if there is a need for WFI,
and only emit one if needed. Furthermore, keep track if we even
need to update the register in the first place.
And to cut down on the amount of RMW to avoid excessive WFI's, at the
tiling/GMEM level we can always overwrite RB_RENDER_CONTROL, as the
state at beginning of draw/clear cmds (which we IB to) is always
undefined. In the draw/clear commands, we always still use RMW (with
WFI if needed), but only if the register value actually changes. (At
points where the current value cannot be known, the saved value is
reset to ~0, which includes bits outside of RBRC_DRAW_STATE, so there
never is chance for confusion.)
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Actually assign VSC_PIPE's properly, which will be needed for tiling.
And introduce fd_tile for per-tile state (including the assignment of
tile to VSC_PIPE). This gives us the proper pipe setup that we'll
need for hw binning pass, and also cleans things up a bit by not having
to pass so many parameters around. And will also make it easier to
introduce different tiling patterns (since we may no longer render
tiles in a simple left-to-right top-to-bottom pattern).
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Use u_primconvert to convert unsupported primitives into supported
primitive plus index buffer.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Because of how the tiling works, we can't really flush at arbitrary
points very easily. So wraparound is handled by resetting to top of
ringbuffer. Previously this would stall until current rendering is
complete. Instead cycle through multiple ringbuffers to avoid a stall.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Don't crash when no color buffer bound. Something caught when starting
to run piglit, fixes a hanful of piglit tests.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The GPU (at least a3xx, but I think also a2xx) can render directly to
memory, bypassing tiling. Although it can't do this if blend, depth,
and a few other features of the pipeline are enabled. This direct
memory mode can be faster for some sorts of operations, such as simple
blits. In particular, this significantly speeds up XA by avoiding to
pull the entire dest pixmap into GMEM, render tiles, and write it all
back out again. This should also speed up resource copy-region and
blit.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
Split the parts that are specific to adreno a2xx series GPUs from the
parts that will be in common with a3xx, so that a3xx support can be
added more cleanly.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It should be unsigned, not enum pipe_flush_flags.
Fixed a build error:
src/gallium/state_trackers/egl/android/native_android.cpp:426:29: error:
invalid conversion from 'int' to 'pipe_flush_flags' [-fpermissive]
v2: replace all occurrences of enum pipe_flush_flags by unsigned
Signed-off-by: Chia-I Wu <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
[olv: document the parameter now that the type is unsigned]
|
|
|
|
|
|
|
|
|
|
| |
Optimize out parts of the render target that are scissored out by taking
into account maximal scissor bounds in fd_gmem_render_tiles().
This is a big win on things like gnome-shell which frequently do partial
screen updates.
Signed-off-by: Rob Clark <[email protected]>
|
|
Currently works on a220. Others in the a2xx family look pretty similar
and should be pretty straightforward to support with the same driver.
The a3xx has a new shader ISA, and while many registers appear similar,
the register addresses have been completely shuffled around. I am not
sure yet whether it is best to support with the same driver, but
different compiler, or whether it should be split into a different
driver.
v1: original
v2: build file updates from review comments, and remove GPL licensed
header files from msm kernel
v3: smarter temp/pred register assignment, fix clear and depth/stencil
format issues, resource_transfer fixes, scissor fixes
Signed-off-by: Rob Clark <[email protected]>
|