aboutsummaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers/freedreno/Makefile.sources
Commit message (Collapse)AuthorAgeFilesLines
* freedreno: Add a6xx backendKristian H. Kristensen2018-08-161-0/+31
| | | | | | | | | | This adds a freedreno backend for the a6xx generation GPUs, which at the time of this commit is about 98% GLES2 conformant. Much remains to be done - both performance work and feature work towards more recent GLES versions, but this is a good start. Signed-off-by: Kristian H. Kristensen <[email protected]> Signed-off-by: Rob Clark <[email protected]>
* freedreno/a5xx: perfmance countersRob Clark2018-07-181-0/+1
| | | | | | AMD_performance_monitor support Signed-off-by: Rob Clark <[email protected]>
* freedreno: batch query support (perfcounters)Rob Clark2018-07-181-0/+1
| | | | | | | Core infrastructure for performance counters, using gallium's batch query interface (to support AMD_performance_monitor). Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: remove lower_if_else passRob Clark2018-02-101-1/+0
| | | | | | Now that it is unused. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a5xx: texture tilingRob Clark2018-01-141-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | Overall a nice 5-10% gain for most games. And more for things like glmark2 texture benchmark. There are some rough edges. In particular, the hardware seems to only support tiling or component swap. (Ie. from hw PoV, ARGB/ABGR/RGBA/ BGRA are all the same format but with different component swap.) For tiled formats, only ARGB is possible. This isn't a big problem for *sampling* since we also have swizzle state there (and since util_format_compose_swizzles() already takes into account the component order, we didn't use COLOR_SWAP for sampling). But it is a problem if you try to render to a tiled BGRA (for example) surface. The next patch introduces a workaround for blitter, so we can generate tiled textures in ABGR/RGBA/BGRA, but that doesn't help the render- target case. To handle that, I think we'd need to keep track that the tiled format is different from the linear format, which seems like it would get extra fun with sampler views/etc. So for now, disabled by default, enable with FD_MESA_DEBUG=ttile. In practice it works fine for all the games I've tried, but makes piglit grumpy. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a5xx: add a5xx blitterRob Clark2017-12-171-0/+2
| | | | | | FD_MESA_DEBUG=noblit to disable Signed-off-by: Rob Clark <[email protected]>
* freedreno: add generic blitterRob Clark2017-12-171-0/+2
| | | | | | | Basically a clone of util_blitter_blit() but with special handling to blit PIPE_BUFFER as a PIPE_TEXTURE_1D. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: add a pass to lower tg4 to txl, enable gather on a4xxIlia Mirkin2017-11-251-0/+1
| | | | | | | | | | Unfortunately Adreno A4xx hardware returns incorrect results with the GATHER4 opcodes. As a result, we have to lower to 4 individual texture calls (txl since we have to force lod to 0). We achieve this using offsets, including on cube maps which normally never have offsets. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* freedreno/a5xx: image supportRob Clark2017-11-121-0/+2
|
* freedreno/a5xx: compute shader supportRob Clark2017-05-041-0/+2
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno: add support for hw accumulating queriesRob Clark2017-04-221-0/+2
| | | | | | | | | | | | | Some queries on a4xx and all queries on a5xx can do result accumulation on CP so we don't need to track per-tile samples. We do still need to handle pausing/resuming while switching batches (in case the query is active over multiple draws which are executed out of order). So introduce new accumulated-query helpers for these sorts of queries, since it doesn't really fit in cleanly with the original query infra- structure. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a5xx: initial supportRob Clark2016-11-301-0/+27
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno: add batch-cache and batch reorderingRob Clark2016-07-301-0/+2
| | | | | | | | | | | | Note that I originally also had a entry-point that would construct a key and do lookup from a pipe_surface. I ended up not needing that (yet?) but it is easy-enough to re-introduce later if we need it for the blit path. For now, not enabled by default, but can be enabled (on a3xx/a4xx) with FD_MESA_DEBUG=reorder. Signed-off-by: Rob Clark <[email protected]>
* freedreno: introduce fd_batchRob Clark2016-07-301-0/+2
| | | | | | | | | | | | | | | | | | | Introduce the batch object, to track a batch/submit's worth of ringbuffers and other bookkeeping. In this first step, just move the ringbuffers into batch, since that is mostly uninteresting churn. For now there is just a single batch at a time. Note that one outcome of this change is that rb's are allocated/freed on each use. But the expectation is that the bo pool in libdrm_freedreno will save us the GEM bo alloc/free which was the initial reason to implement a rb pool in gallium. The purpose of the batch is to eventually facilitate out-of-order rendering, with batches associated to framebuffer state, and tracking the dependencies on other batches. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: fix sin/cosRob Clark2016-04-251-0/+3
| | | | | | | | | | We seem to need range reduction to get sane results. Fixes glmark2 jellyfish bench, and a whole bunch of dEQP-GLES3.functional.shaders.builtin_functions.precision.{sin,cos,tan}.* v2: squashed in android build fixes from Rob Herring Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: refactor NIR IR handlingRob Clark2016-01-031-0/+1
| | | | | | | | | Immediately convert into NIR and do an initial key-agnostic lowering/ optimization pass. This should let us share most of the per-variant transformations between each variant, and hopefully minimize the draw- time variant creation part of the compilation process. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: introduce ir3_compiler objectRob Clark2015-06-211-0/+1
| | | | | | | | Right now, just provides a cleaner way to get at the gpu-id, given the separation between compiler and context. But we will need this also to hold the reg-set for new register allocation. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: remove tgsi f/eRob Clark2015-06-211-2/+0
| | | | | | Also remove ir3_flatten which was only used by tgsi f/e. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: drop dot graph dumpingRob Clark2015-06-211-1/+1
| | | | | | | | At least for now.. right now the instruction and instruction list printing should suffice, and the re-working of ir3_block would require a lot of changes in that code. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3/nir: lower if/elseRob Clark2015-04-171-0/+2
| | | | | | | For now, completely flatten if/else blocks. That will almost certainly change once we have flow control. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: add NIR compilerRob Clark2015-04-051-0/+1
| | | | | | | | | | | | | | | | The NIR compiler frontend is an alternative to the TGSI f/e, producing the same ir3 IR and using the same backend passes for scheduling, etc. It is not enabled by default yet, as there are still some regressions. To enable, use 'FD_MESA_DEBUG=nir'. It is enough to use with, for example, xonotic or supertuxkart. With the NIR f/e, scalarizing and a number of other lowering steps happen in NIR, so we don't have to do them in ir3. Which simplifies the f/e and allows the lowered instructions to pass through other optimization stages. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: remove old compilerRob Clark2015-03-151-1/+0
| | | | | | | Now that piglit is no longer falling back to old compiler for any tests, we can remove it. Hurray \o/ Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: simplify RARob Clark2015-01-071-2/+2
| | | | | | | | | | | | | | | | | | | | | | Group inputs/outputs, in addition to fanin/fanout, as they must also exist in sequential scalar registers. This lets us simplify RA by working in terms of neighbor groups. NOTE: has the slight problem that it can't optimize out mov's for things like: MOV OUT[n], IN[m] To avoid this, instead of trying to figure out what mov's we can eliminate, we first remove all mov's prior to grouping, and then re-insert mov's as needed while grouping inputs/outputs/fanins. Eventually we'd prefer the frontend to not insert extra mov's in the first place (so we don't have to bother removing them). This is the plan for an eventual NIR based frontend, so separate out the instr grouping (which will still be needed for NIR frontend) from the mov elimination (which won't). Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: split out legalize passRob Clark2014-12-231-0/+1
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/a4xx: fd4_util -> fd4_formatRob Clark2014-12-041-2/+2
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: fd3_util -> fd3_formatIlia Mirkin2014-11-291-2/+2
| | | | | | | All the "util" helpers are actually format-related Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* freedreno: add missing headers in Makefile.sourcesEmil Velikov2014-11-161-1/+14
| | | | | | ... or autotools will fail to pick them up for the distribution tarball. Signed-off-by: Emil Velikov <[email protected]>
* freedreno: add adreno 420 supportRob Clark2014-11-151-0/+14
| | | | | | | | Very initial support. Basic stuff working (es2gears, es2tri, and maybe about half of glmark2). Expect broken stuff. Still missing: mem->gmem (restore), queries, mipmaps (blob segfaults!), hw binning, etc. Signed-off-by: Rob Clark <[email protected]>
* freedreno: use tgsi_loweringRob Clark2014-10-141-2/+0
| | | | | | | Now that the freedreno_lowering code is moved to tgsi_lowering, remove our private copy and switch over to using the common version. Signed-off-by: Rob Clark <[email protected]>
* gallium/freedreno: ship all files in the tarballEmil Velikov2014-09-051-12/+63
| | | | | | | | | | | - include all headers in Makefile.sources - sort the list(s) - bundle the android build Cc: [email protected] Cc: Rob Clark <[email protected]> Signed-off-by: Emil Velikov <[email protected]> Acked-by: Matt Turner <[email protected]>
* freedreno/ir3: split out shader compiler from a3xxRob Clark2014-07-251-11/+14
| | | | | | | | | | | | | | | | | | | | | | Move the bits we want to share between generations from fd3_program to ir3_shader. So overall structure is: fdN_shader_stateobj -> ir3_shader -> ir3_shader_variant -> ir3 |- ... \- ir3_shader_variant -> ir3 So the ir3_shader becomes the topmost generation neutral object, which manages the set of variants each of which generates, compiles, and assembles it's own ir. There is a bit of additional renaming to s/fd3_compiler/ir3_compiler/, etc. Keep the split between the gallium level stateobj and the shader helper object because it might be a good idea to pre-compute some generation specific register values (ie. anything that is independent of linking). Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: occlusion query supportRob Clark2014-05-131-0/+1
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno: add support for hw queriesRob Clark2014-05-131-0/+1
| | | | | | | | | | | Real GPU queries need some infrastructure to track samples per tile and accumulate the results. But fortunately this can be shared across GPU generation. See: https://github.com/freedreno/freedreno/wiki/Queries#hardware-queries Signed-off-by: Rob Clark <[email protected]>
* freedreno/query: allow multiple query implementationsRob Clark2014-05-131-0/+1
| | | | | | | | Split out fd_query into an abstract base class, to allow multiple implementations. The current sw based queries are moved into fd_sw_query. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: drop hand-coded blit/solid shadersRob Clark2014-02-231-0/+1
| | | | | | | | | Instead in the common code, construct these shaders from TGSI. For now we let a2xx keep it's hand coded shaders, as it's compiler isn't quite up to the job yet. All the same it is a net drop in code size and gets rid of special cases. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx/compiler: new compilerRob Clark2014-02-031-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The new compiler generates a dependency graph of instructions, including a few meta-instructions to handle PHI and preserve some extra information needed for register assignment, etc. The depth pass assigned a weight/depth to each node (based on sum of instruction cycles of a given node and all it's dependent nodes), which is used to schedule instructions. The scheduling takes into account the minimum number of cycles/slots between dependent instructions, etc. Which was something that could not be handled properly with the original compiler (which was more of a naive TGSI translator than an actual compiler). The register assignment is currently split out as a standalone pass. I expect that it will be replaced at some point, once I figure out what to do about relative addressing (which is currently the only thing that should cause fallback to old compiler). There are a couple new debug options for FD_MESA_DEBUG env var: optmsgs - enable debug prints in optimizer optdump - dump instruction graph in .dot format, for example: http://people.freedesktop.org/~robclark/a3xx/frag-0000.dot.png http://people.freedesktop.org/~robclark/a3xx/frag-0000.dot At this point, thanks to proper handling of instruction scheduling, the new compiler fixes a lot of things that were broken before, and does not appear to break anything that was working before[1]. So even though it is not finished, it seems useful to merge it in it's current state. [1] Not merged in this commit, because I'm not sure if it really belongs in mesa tree, but the following commit implements a simple shader emulator, which I've used to compare the output of the new compiler to the original compiler (ie. run it on all the TGSI shaders dumped out via ST_DEBUG=tgsi with various games/apps): https://github.com/freedreno/mesa/commit/163b6306b1660e05ece2f00d264a8393d99b6f12 Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx/compiler: split out old compilerRob Clark2014-02-031-0/+1
| | | | | | | | For the time being, keep old compiler as fallback for things that the new compiler does not support yet. Split out as it's own commit to make the later new-compiler commits easier to follow. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx/compiler: prepare for new compilerRob Clark2014-02-031-1/+1
| | | | | | Shuffle things around to prepare for new compiler. Signed-off-by: Rob Clark <[email protected]>
* freedreno: add tgsi lowering passRob Clark2014-02-011-0/+1
| | | | | | | | | | | | | | | | Currently lowers the following instructions: DST, XPD, SCS, LRP, FRC, POW, LIT, EXP, LOG, DP4, DP3, DPH, DP2 translating these into equivalent simpler TGSI instructions. This probably should be moved to util so other drivers can use it, but just adding under freedreno for now so that I can clear out a lot of the lowering code in a3xx compiler before beginning to add new compiler. Signed-off-by: Rob Clark <[email protected]>
* freedreno: add basic query supportRob Clark2014-01-081-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add for now some simple/basic query support (ie. things not actually requiring the GPU). Might change around a bit when I actually add GPU queries, but for now this enables some useful performance info in the GALLIUM_HUD. For example: GALLIUM_HUD=fps+batches+batches-sysmem+batches-gmem+restores,draw-calls The driver specific specific queries are: + draw-calls + batches - number of batches per second, sum of batches-sysmem plus batches-gmem + batches-gmem - render a set of tiles in GMEM, for each tile (optionally) system mem -> gmem (restore), plus N draws, plus gmem -> system mem (resolve) per second + batches-sysmem - N draws to system memory (GMEM bypass) per second + restores - number of GMEM batches that required restore per second Ideally for GMEM rendering, you want batches-gmem to equal fps. If the app is doing something that triggers multiple passes (ie. requires extra round trip gmem <-> system memory) then the # of batches per second will go up relative to fps. Signed-off-by: Rob Clark <[email protected]>
* freedreno: compact a2xx and a3xx makefiles into parent onesJohannes Obermayr2013-11-161-0/+32
| | | | | | | | | Nearly everything within the three Makefile.am's is identical. Let's simplify things a little. v2: Rebase and rewrite the commit message (Emil Velikov) Signed-off-by: Emil Velikov <[email protected]>
* freedreno: consolidate C sources list into Makefile.sourcesEmil Velikov2013-10-011-0/+11
Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Tom Stellard <[email protected]>