| Commit message (Collapse) | Author | Age | Files | Lines |
|
Some queries on a4xx and all queries on a5xx can do result accumulation
on the CP, so we don't need to track per-tile samples. We do still need to
handle pausing/resuming while switching batches (in case the query is
active over multiple draws which are executed out of order).
So introduce new accumulated-query helpers for these sorts of queries,
since they don't really fit in cleanly with the original query
infrastructure.
Signed-off-by: Rob Clark <[email protected]>
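The pause/resume behaviour described in the commit above can be modelled roughly as follows. This is only a conceptual sketch in Python, not the driver's code: the class name, method names, and the idea of passing raw counter values in are all assumptions made for illustration, and on real hardware the accumulation is done by the CP rather than on the CPU.

```python
class AccumulatedQuery:
    """Hypothetical model of a query whose result is summed over active intervals."""

    def __init__(self):
        self.result = 0
        self._start = None  # counter value at the last resume; None while paused

    def resume(self, counter):
        # Called when a batch that the query is active in starts (or resumes).
        if self._start is not None:
            raise RuntimeError("query is already running")
        self._start = counter

    def pause(self, counter):
        # Called when switching away from the batch: fold the interval into the result.
        if self._start is None:
            raise RuntimeError("query is not running")
        self.result += counter - self._start
        self._start = None


def demo():
    q = AccumulatedQuery()
    q.resume(10)   # first batch begins
    q.pause(25)    # switch to another batch
    q.resume(100)  # come back later
    q.pause(130)
    return q.result
```

Because only the sum of intervals is kept, no per-tile sample list is needed in this model, which matches the motivation given in the commit message.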
|
Signed-off-by: Rob Clark <[email protected]>
|
Note that I originally also had an entry point that would construct a key
and do a lookup from a pipe_surface. I ended up not needing that (yet?)
but it is easy enough to re-introduce later if we need it for the blit
path.
For now, not enabled by default, but can be enabled (on a3xx/a4xx) with
FD_MESA_DEBUG=reorder.
Signed-off-by: Rob Clark <[email protected]>
|
Introduce the batch object, to track a batch/submit's worth of
ringbuffers and other bookkeeping. In this first step, just move
the ringbuffers into batch, since that is mostly uninteresting
churn.
For now there is just a single batch at a time. Note that one
outcome of this change is that rb's are allocated/freed on each
use. But the expectation is that the bo pool in libdrm_freedreno
will save us the GEM bo alloc/free which was the initial reason
to implement a rb pool in gallium.
The purpose of the batch is to eventually facilitate out-of-order
rendering, with batches associated to framebuffer state, and
tracking the dependencies on other batches.
Signed-off-by: Rob Clark <[email protected]>
|
We seem to need range reduction to get sane results. Fixes glmark2
jellyfish bench, and a whole bunch of
dEQP-GLES3.functional.shaders.builtin_functions.precision.{sin,cos,tan}.*
v2: squashed in android build fixes from Rob Herring
Signed-off-by: Rob Clark <[email protected]>
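As a rough illustration of what range reduction means for the commit above, the sketch below folds an argument into [-pi, pi) before evaluating the function. It is written in Python with hypothetical helper names; the commit does not say which interval or formula the driver uses, so the choice of [-pi, pi) and the floor-based formula are assumptions.

```python
import math


def reduce_range(x):
    """Map x into [-pi, pi) via x - 2*pi*floor(x / (2*pi) + 0.5)."""
    two_pi = 2.0 * math.pi
    return x - two_pi * math.floor(x / two_pi + 0.5)


def sin_reduced(x):
    # Evaluate sin on the reduced argument. A hardware sin with a limited
    # accurate input range would be fed the reduced value in the same way.
    return math.sin(reduce_range(x))
```

Python's own `math.sin` already handles large arguments, so this only demonstrates the reduction step itself, not any precision problem in the hardware instruction.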
|
Immediately convert into NIR and do an initial key-agnostic lowering/
optimization pass. This should let us share most of the per-variant
transformations between each variant, and hopefully minimize the draw-
time variant creation part of the compilation process.
Signed-off-by: Rob Clark <[email protected]>
|
Right now, just provides a cleaner way to get at the gpu-id, given the
separation between compiler and context. But we will need this also to
hold the reg-set for new register allocation.
Signed-off-by: Rob Clark <[email protected]>
|
Also remove ir3_flatten which was only used by tgsi f/e.
Signed-off-by: Rob Clark <[email protected]>
|
At least for now. Right now the instruction and instruction list
printing should suffice, and the re-working of ir3_block would require
a lot of changes in that code.
Signed-off-by: Rob Clark <[email protected]>
|
For now, completely flatten if/else blocks. That will almost certainly
change once we have flow control.
Signed-off-by: Rob Clark <[email protected]>
|
The NIR compiler frontend is an alternative to the TGSI f/e, producing
the same ir3 IR and using the same backend passes for scheduling, etc.
It is not enabled by default yet, as there are still some regressions.
To enable, use 'FD_MESA_DEBUG=nir'. It is enough to use with, for
example, xonotic or supertuxkart.
With the NIR f/e, scalarizing and a number of other lowering steps
happen in NIR, so we don't have to do them in ir3. This simplifies the
f/e and allows the lowered instructions to pass through other
optimization stages.
Signed-off-by: Rob Clark <[email protected]>
|
Now that piglit is no longer falling back to the old compiler for any tests,
we can remove it. Hurray \o/
Signed-off-by: Rob Clark <[email protected]>
|
Group inputs/outputs, in addition to fanin/fanout, as they must also
exist in sequential scalar registers. This lets us simplify RA by
working in terms of neighbor groups.
NOTE: has the slight problem that it can't optimize out mov's for things
like:
MOV OUT[n], IN[m]
To avoid this, instead of trying to figure out what mov's we can
eliminate, we first remove all mov's prior to grouping, and then
re-insert mov's as needed while grouping inputs/outputs/fanins.
Eventually we'd prefer the frontend to not insert extra mov's in the
first place (so we don't have to bother removing them). This is the
plan for an eventual NIR based frontend, so separate out the instr
grouping (which will still be needed for NIR frontend) from the mov
elimination (which won't).
Signed-off-by: Rob Clark <[email protected]>
|
Signed-off-by: Rob Clark <[email protected]>
|
Signed-off-by: Rob Clark <[email protected]>
|
All the "util" helpers are actually format-related
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
|
... or autotools will fail to pick them up for the distribution tarball.
Signed-off-by: Emil Velikov <[email protected]>
|
Very initial support. Basic stuff working (es2gears, es2tri, and maybe
about half of glmark2). Expect broken stuff. Still missing: mem->gmem
(restore), queries, mipmaps (blob segfaults!), hw binning, etc.
Signed-off-by: Rob Clark <[email protected]>
|
Now that the freedreno_lowering code is moved to tgsi_lowering, remove
our private copy and switch over to using the common version.
Signed-off-by: Rob Clark <[email protected]>
|
- include all headers in Makefile.sources
- sort the list(s)
- bundle the android build
Cc: [email protected]
Cc: Rob Clark <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
Acked-by: Matt Turner <[email protected]>
|
Move the bits we want to share between generations from fd3_program to
ir3_shader. So overall structure is:
fdN_shader_stateobj -> ir3_shader -> ir3_shader_variant -> ir3
|- ...
\- ir3_shader_variant -> ir3
So the ir3_shader becomes the topmost generation-neutral object, which
manages the set of variants, each of which generates, compiles, and
assembles its own ir.
There is a bit of additional renaming to s/fd3_compiler/ir3_compiler/,
etc.
Keep the split between the gallium level stateobj and the shader helper
object because it might be a good idea to pre-compute some generation
specific register values (ie. anything that is independent of linking).
Signed-off-by: Rob Clark <[email protected]>
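The shader/variant relationship described in the commit above can be pictured with a small sketch. This is a Python model with made-up names (`Shader`, `get_variant`, the key tuples); it assumes variants are looked up by a key and compiled on first use, which the commit message implies but does not spell out.

```python
class Shader:
    """Hypothetical generation-neutral shader object that owns its variants."""

    def __init__(self, source):
        self.source = source
        self.variants = {}      # key -> compiled variant
        self.compile_count = 0  # only here to make the caching visible

    def get_variant(self, key):
        # Return the cached variant for this key, compiling it on first request.
        variant = self.variants.get(key)
        if variant is None:
            variant = self._compile(key)
            self.variants[key] = variant
        return variant

    def _compile(self, key):
        # Stand-in for "generate, compile, and assemble" one variant's ir.
        self.compile_count += 1
        return {"key": key, "ir": "%s compiled for %r" % (self.source, key)}
```

A generation-specific state object would hold a reference to one such shader and ask it for the variant matching the current state, so the per-generation code never has to manage variants itself.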
|
Signed-off-by: Rob Clark <[email protected]>
|
Real GPU queries need some infrastructure to track samples per tile and
accumulate the results. But fortunately this can be shared across GPU
generations.
See:
https://github.com/freedreno/freedreno/wiki/Queries#hardware-queries
Signed-off-by: Rob Clark <[email protected]>
|
Split out fd_query into an abstract base class, to allow multiple
implementations. The current sw based queries are moved into
fd_sw_query.
Signed-off-by: Rob Clark <[email protected]>
|
Instead, in the common code, construct these shaders from TGSI. For now
we let a2xx keep its hand-coded shaders, as its compiler isn't quite
up to the job yet. All the same, it is a net drop in code size and gets
rid of special cases.
Signed-off-by: Rob Clark <[email protected]>
|
The new compiler generates a dependency graph of instructions, including
a few meta-instructions to handle PHI and preserve some extra
information needed for register assignment, etc.
The depth pass assigns a weight/depth to each node (based on sum of
instruction cycles of a given node and all its dependent nodes), which
is used to schedule instructions. The scheduling takes into account the
minimum number of cycles/slots between dependent instructions, etc.
This was something that could not be handled properly with the original
compiler (which was more of a naive TGSI translator than an actual
compiler).
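A small sketch may help picture the depth pass described above. It is a Python model, not the compiler's code: the function name and the dict-based graph are hypothetical, and it takes one plausible reading of the description, namely that a node's depth is its own cycle count plus the largest depth among the nodes it depends on. The real pass may weight nodes differently.

```python
def compute_depths(cycles, srcs):
    """Return {node: depth} for a dependency graph.

    cycles: {node: cycles the instruction takes}
    srcs:   {node: list of nodes whose results it consumes}
    """
    depth = {}

    def visit(node):
        if node not in depth:
            # Assumption: depth = own cycles + deepest source chain.
            deepest_src = max((visit(s) for s in srcs.get(node, ())), default=0)
            depth[node] = cycles[node] + deepest_src
        return depth[node]

    for node in cycles:
        visit(node)
    return depth


def demo():
    cycles = {"a": 1, "b": 4, "c": 1, "d": 1}
    srcs = {"c": ["a", "b"], "d": ["c"]}
    return compute_depths(cycles, srcs)
```

In this model the slow instruction `b` dominates the depth of everything that consumes it, which is the kind of information a scheduler needs in order to start long chains early.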
The register assignment is currently split out as a standalone pass. I
expect that it will be replaced at some point, once I figure out what to
do about relative addressing (which is currently the only thing that
should cause fallback to old compiler).
There are a couple new debug options for FD_MESA_DEBUG env var:
optmsgs - enable debug prints in optimizer
optdump - dump instruction graph in .dot format, for example:
http://people.freedesktop.org/~robclark/a3xx/frag-0000.dot.png
http://people.freedesktop.org/~robclark/a3xx/frag-0000.dot
At this point, thanks to proper handling of instruction scheduling, the
new compiler fixes a lot of things that were broken before, and does not
appear to break anything that was working before[1]. So even though it
is not finished, it seems useful to merge it in its current state.
[1] Not merged in this commit, because I'm not sure if it really belongs
in mesa tree, but the following commit implements a simple shader
emulator, which I've used to compare the output of the new compiler to
the original compiler (ie. run it on all the TGSI shaders dumped out via
ST_DEBUG=tgsi with various games/apps):
https://github.com/freedreno/mesa/commit/163b6306b1660e05ece2f00d264a8393d99b6f12
Signed-off-by: Rob Clark <[email protected]>
|
For the time being, keep old compiler as fallback for things that the
new compiler does not support yet. Split out as its own commit to make
the later new-compiler commits easier to follow.
Signed-off-by: Rob Clark <[email protected]>
|
Shuffle things around to prepare for new compiler.
Signed-off-by: Rob Clark <[email protected]>
|
Currently lowers the following instructions:
DST, XPD, SCS, LRP, FRC, POW, LIT, EXP, LOG, DP4,
DP3, DPH, DP2
translating these into equivalent simpler TGSI instructions.
This probably should be moved to util so other drivers can use
it, but just adding under freedreno for now so that I can clear
out a lot of the lowering code in a3xx compiler before beginning
to add new compiler.
Signed-off-by: Rob Clark <[email protected]>
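To make "translating these into equivalent simpler instructions" concrete, here is a sketch of lowering one of the listed opcodes, LRP, on scalar values. It is a Python model using plain tuples as instructions; the tuple encoding, the register names, and the tiny interpreter are all assumptions for illustration, and the exact instruction sequence the real lowering pass emits is not given in the commit. The sketch relies on the TGSI definition LRP dst = src0*src1 + (1 - src0)*src2, rewritten as src0*(src1 - src2) + src2.

```python
def lower_lrp(dst, a, b, c, tmp="tmp"):
    """Rewrite LRP dst, a, b, c as a subtract followed by a multiply-add."""
    return [
        ("SUB", tmp, b, c),       # tmp = b - c
        ("MAD", dst, a, tmp, c),  # dst = a * tmp + c
    ]


def run(instrs, regs):
    """Minimal scalar interpreter for the two opcodes used above."""
    regs = dict(regs)
    for op, dst, *src in instrs:
        v = [regs[s] for s in src]
        if op == "SUB":
            regs[dst] = v[0] - v[1]
        elif op == "MAD":
            regs[dst] = v[0] * v[1] + v[2]
        else:
            raise ValueError("unsupported opcode: " + op)
    return regs
```

Running the lowered sequence and comparing against the direct formula is a cheap way to check that a lowering keeps the original semantics.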
|
Add for now some simple/basic query support (ie. things not actually
requiring the GPU). Might change around a bit when I actually add
GPU queries, but for now this enables some useful performance info
in the GALLIUM_HUD. For example:
GALLIUM_HUD=fps+batches+batches-sysmem+batches-gmem+restores,draw-calls
The driver-specific queries are:
+ draw-calls
+ batches - number of batches per second, sum of batches-sysmem
plus batches-gmem
+ batches-gmem - render a set of tiles in GMEM, for each tile
(optionally) system mem -> gmem (restore), plus N draws,
plus gmem -> system mem (resolve) per second
+ batches-sysmem - N draws to system memory (GMEM bypass) per
second
+ restores - number of GMEM batches that required restore per
second
Ideally for GMEM rendering, you want batches-gmem to equal fps. If
the app is doing something that triggers multiple passes (ie. requires
extra round trip gmem <-> system memory) then the # of batches per
second will go up relative to fps.
Signed-off-by: Rob Clark <[email protected]>
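The software queries described in the commit above need no GPU involvement, so they can be modelled as plain counters. The sketch below is a Python illustration with hypothetical names; the begin/end snapshot scheme is an assumption, while the relation "batches is the sum of batches-sysmem and batches-gmem" comes from the commit message.

```python
class Stats:
    """Hypothetical per-context counters bumped by the driver as it works."""

    def __init__(self):
        self.draw_calls = 0
        self.batches_sysmem = 0
        self.batches_gmem = 0
        self.restores = 0

    @property
    def batches(self):
        # batches is defined as the sum of the two batch kinds.
        return self.batches_sysmem + self.batches_gmem


class SwQuery:
    """Snapshot a counter at begin() and report the delta at end()."""

    def __init__(self, stats, name):
        self.stats = stats
        self.name = name
        self.start = 0
        self.result = 0

    def begin(self):
        self.start = getattr(self.stats, self.name)

    def end(self):
        self.result = getattr(self.stats, self.name) - self.start
        return self.result
```

A HUD would then divide each delta by the elapsed time to get the per-second figures listed above.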
|
Nearly everything within the three Makefile.am's is identical.
Let's simplify things a little.
v2: Rebase and rewrite the commit message (Emil Velikov)
Signed-off-by: Emil Velikov <[email protected]>
|
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Tom Stellard <[email protected]>
|