path: root/src/gallium/drivers/freedreno/freedreno_gmem.c
Commit message (Author, Age, Files, Lines)
* freedreno: per-generation OUT_IB packet (Rob Clark, 2016-01-18, 1 file, -2/+2)
  Some a4xx firmware doesn't implement the "PFD" (prefetch-disabled) version of the CP_INDIRECT_BUFFER packet, so allow PFD vs PFE to be chosen per generation. Switch a3xx and a4xx over to the prefetch-enabled version (which is also what the blob does; it seems that only on a2xx we cannot use PFE).
  Signed-off-by: Rob Clark <[email protected]>
* freedreno: small bit of cleanup about max rendertargets (Rob Clark, 2015-08-04, 1 file, -5/+10)
  We hard-coded 4 or 8 as the max in various places. Switch it all to a define since the limit will go up with a4xx (and maybe even again in the future?)
  Signed-off-by: Rob Clark <[email protected]>
* freedreno: fix bug in tile/slot calculation (Rob Clark, 2015-05-14, 1 file, -5/+4)
  This was causing corruption with hw binning on a306. Unlikely that it is a306 specific, but rather the smaller gmem size resulted in different tile configuration which was triggering the bug at certain resolutions.
  Signed-off-by: Rob Clark <[email protected]>
  Cc: "10.4" and "10.5" and "10.6" <[email protected]>
* freedreno/a3xx: add support for S8 and Z32F_S8 (Ilia Mirkin, 2015-04-27, 1 file, -10/+19)
  Enables ARB_depth_buffer_float. There is no sampling support for interleaved Z32F_S8, so we store the two textures separately, one as Z32F, the other as S8. As a result, we need a lot of additional logic for restores and transfers.
  Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno: don't bother setting resource timestamps (Ilia Mirkin, 2015-04-05, 1 file, -9/+0)
  Waiting on a bo being ready is handled in fd_bo_cpu_prep. No need to keep separate timestamps around.
  Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno: add support for laying out MRTs in gmem (Ilia Mirkin, 2015-04-02, 1 file, -14/+39)
  Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno: add core infrastructure support for MRTs (Ilia Mirkin, 2015-04-02, 1 file, -3/+4)
  Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno: add is_a3xx()/is_a4xx() helpers (Rob Clark, 2014-12-13, 1 file, -1/+3)
  A bunch of open-coded 'gpu_id > 300's seems like it will eventually cause problems with future generations. There were already a few minor problems with caps for features that still need additional work on a4xx.
  Signed-off-by: Rob Clark <[email protected]>
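A minimal sketch of what such generation helpers could look like, assuming gpu_id encodes the generation in its hundreds digit (a220 as 220, a306 as 306). The struct and function names here are hypothetical stand-ins, not the driver's actual definitions:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver's screen object;
 * only the GPU id matters for this illustration. */
struct example_screen {
    unsigned gpu_id;   /* e.g. 220, 306, 420 */
};

/* Range check instead of an open-coded 'gpu_id > 300', so a
 * later generation is not mistaken for a3xx. */
static inline bool example_is_a3xx(const struct example_screen *s)
{
    assert(s);
    return s->gpu_id >= 300 && s->gpu_id < 400;
}

static inline bool example_is_a4xx(const struct example_screen *s)
{
    assert(s);
    return s->gpu_id >= 400 && s->gpu_id < 500;
}
```

The bounded range is the point: an open-coded 'gpu_id > 300' would also match a4xx parts, which is the kind of problem the commit message anticipates.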
* freedreno: rename a couple debug flags (Rob Clark, 2014-10-25, 1 file, -2/+2)
    dscis -> noscis
    dbypass -> nobypass
  A bit more consistent w/ nobin, etc. And IMO a bit more sensible names.
  Signed-off-by: Rob Clark <[email protected]>
* freedreno: clear vs scissor (Rob Clark, 2014-10-21, 1 file, -2/+46)
  The optimization of avoiding restore (mem2gmem) if there was a clear falls down a bit if you don't have a fullscreen scissor. We need to make the decision logic a bit more clever to keep track of *what* was cleared, so that we can (a) completely skip mem2gmem if entire buffer was cleared, or (b) skip mem2gmem on a per-tile basis for tiles that were completely cleared.
  Signed-off-by: Rob Clark <[email protected]>
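The per-tile decision described above can be sketched roughly as follows. This is an illustration only: the rectangle type, the function names, and the simplification that a single cleared rectangle is tracked are all assumptions, not the driver's actual data structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical rectangle: min is inclusive, max is exclusive. */
struct example_rect {
    unsigned minx, miny, maxx, maxy;
};

/* True if 'inner' lies entirely within 'outer'. */
static bool example_rect_contains(const struct example_rect *outer,
                                  const struct example_rect *inner)
{
    return outer->minx <= inner->minx && outer->miny <= inner->miny &&
           outer->maxx >= inner->maxx && outer->maxy >= inner->maxy;
}

/* Decide whether a tile needs a restore (mem2gmem).  A tile can skip
 * the restore only if a clear happened and the cleared region covers
 * the whole tile; a partially cleared tile must still be restored. */
static bool example_tile_needs_restore(bool was_cleared,
                                       const struct example_rect *cleared,
                                       const struct example_rect *tile)
{
    assert(cleared && tile);
    if (!was_cleared)
        return true;
    return !example_rect_contains(cleared, tile);
}
```

Case (a) from the commit message falls out of the same check: if the cleared rectangle covers the whole render target, every tile is contained in it and no tile is restored.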
* freedreno: "fix" problems with excessive flushes (Rob Clark, 2014-09-12, 1 file, -24/+1)
  4f338c9b introduced logic to trigger a flush rather than overflowing cmdstream buffer. But the threshold was too low, triggering flushes where they were not needed. This caused problems with games like xonotic.
  Part of the problem is that we need to mark all state dirty between cmdstream submit ioctls, because we cannot rely on state being preserved across ioctls. But even with that, there are still some problems that are still being debugged.
  For now:
    1) correctly mark all state dirty
    2) introduce FD_MESA_DEBUG flush flag to force rendering to be flushed between each draw, to trigger problems (so that I can debug)
    3) use a more reasonable threshold so for normal usecases we don't trigger the problems
  This at least corrects the regression, but there is still more debugging to do.
  Signed-off-by: Rob Clark <[email protected]>
* freedreno: try for more squarish tile dimensions (Rob Clark, 2014-06-13, 1 file, -3/+9)
  Worth about 0.5 fps in xonotic, for example.
  Signed-off-by: Rob Clark <[email protected]>
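One way to arrive at roughly square tiles is to start with a single bin covering the whole render target and keep splitting whichever dimension is currently longer until a bin fits in GMEM. The sketch below illustrates that idea only; the 32-pixel alignment, the names, and the loop itself are assumptions, not the code from this commit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result of the tile search. */
struct example_tiling {
    uint32_t nbins_x, nbins_y;   /* number of tiles in each direction */
    uint32_t bin_w, bin_h;       /* size of one tile in pixels */
};

/* Round v up to a multiple of a (a must be a power of two). */
static uint32_t example_align(uint32_t v, uint32_t a)
{
    return (v + a - 1) & ~(a - 1);
}

static struct example_tiling
example_pick_tiles(uint32_t width, uint32_t height,
                   uint32_t cpp, uint32_t gmem_size)
{
    struct example_tiling t = {
        1, 1, example_align(width, 32), example_align(height, 32)
    };

    /* The smallest possible tile (32x32) must fit, or the loop
     * below could never terminate. */
    assert(width > 0 && height > 0 && cpp > 0);
    assert((uint64_t)32 * 32 * cpp <= gmem_size);

    while ((uint64_t)t.bin_w * t.bin_h * cpp > gmem_size) {
        if (t.bin_w > t.bin_h) {
            /* Tile is wider than tall: add a column. */
            t.nbins_x++;
            t.bin_w = example_align((width + t.nbins_x - 1) / t.nbins_x, 32);
        } else {
            /* Tile is at least as tall as wide: add a row. */
            t.nbins_y++;
            t.bin_h = example_align((height + t.nbins_y - 1) / t.nbins_y, 32);
        }
    }
    return t;
}
```

Always splitting the longer side keeps width and height close to each other, rather than producing long thin strips as splitting along one axis only would.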
* freedreno: add support for hw queries (Rob Clark, 2014-05-13, 1 file, -1/+18)
  Real GPU queries need some infrastructure to track samples per tile and accumulate the results. But fortunately this can be shared across GPU generations.
  See: https://github.com/freedreno/freedreno/wiki/Queries#hardware-queries
  Signed-off-by: Rob Clark <[email protected]>
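To illustrate why per-tile tracking is needed: with tiled rendering a query spans several tiles, so its result is the sum of what the counter advanced by within each tile. A minimal sketch, assuming a counter snapshot is taken at the start and end of the query in every tile (the type and function names are hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-tile sample: counter value when the query
 * became active in this tile and when it stopped. */
struct example_tile_sample {
    uint64_t start, end;
};

/* Accumulate the per-tile deltas into one query result. */
static uint64_t example_accumulate(const struct example_tile_sample *samples,
                                   size_t n)
{
    uint64_t total = 0;

    assert(samples || n == 0);
    for (size_t i = 0; i < n; i++)
        total += samples[i].end - samples[i].start;
    return total;
}
```

Nothing in this accumulation depends on a particular GPU generation, which is consistent with the commit's note that the infrastructure can be shared.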
* WIP: freedreno/a3xx: incorrect scissor for binning pass (Rob Clark, 2014-03-05, 1 file, -0/+2)
  If scissor optimization is used (to avoid bringing scissored portions of the render target into GMEM and then back out to system memory) in combination with hw binning pass, the result would be a scissor mismatch between binning pass and rendering pass. This would cause rendering bugs in some scenarios with (for example) gnome-shell.
  I would have expected that simply using the correct screen-scissor during the binning pass would be enough, but seems like there is something else missing. So for now disable binning pass if scissor optimization is used.
* freedreno: better manage our WFI's (Rob Clark, 2014-02-01, 1 file, -1/+5)
  Updates to non-banked registers, CP_LOAD_STATE, etc, need a WFI if there is potentially pending rendering. Track this better, and add fd_wfi() calls everywhere that might potentially need CP_WAIT_FOR_IDLE.
  Signed-off-by: Rob Clark <[email protected]>
* freedreno: add basic query support (Rob Clark, 2014-01-08, 1 file, -0/+7)
  Add for now some simple/basic query support (ie. things not actually requiring the GPU). Might change around a bit when I actually add GPU queries, but for now this enables some useful performance info in the GALLIUM_HUD. For example:
    GALLIUM_HUD=fps+batches+batches-sysmem+batches-gmem+restores,draw-calls
  The driver specific queries are:
    + draw-calls
    + batches - number of batches per second, sum of batches-sysmem plus batches-gmem
    + batches-gmem - render a set of tiles in GMEM, for each tile (optionally) system mem -> gmem (restore), plus N draws, plus gmem -> system mem (resolve) per second
    + batches-sysmem - N draws to system memory (GMEM bypass) per second
    + restores - number of GMEM batches that required restore per second
  Ideally for GMEM rendering, you want batches-gmem to equal fps. If the app is doing something that triggers multiple passes (ie. requires extra round trip gmem <-> system memory) then the # of batches per second will go up relative to fps.
  Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: support for hw binning pass (Rob Clark, 2014-01-08, 1 file, -24/+76)
  The binning pass sorts vertices into which bins/tiles they apply to. The visibility information generated during the binning pass can be used to speed up the rendering pass by filtering out vertices which do not apply to the current tile.
  See: https://github.com/freedreno/freedreno/wiki/Adreno-tiling#optimized-approach
  This brings a significant fps boost. A rough assortment of tests (supertuxkart, etracer, tremulous, glmark2 'build' test, etc) seems to yield a ~35-45% fps improvement.
  For now, to be conservative, the binning pass is not enabled yet by default. To enable it use:
    FD_MESA_DEBUG=binning
  So far I haven't found anything that breaks with binning enabled, but I'd like a bit more testing before I enable it as default.
  Signed-off-by: Rob Clark <[email protected]>
* freedreno: be more clever about gmem usage (Rob Clark, 2014-01-08, 1 file, -9/+17)
  Only need to leave room for depth/stencil if it is actually used, etc.
  Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: fix blend state corruption issue (Rob Clark, 2013-12-26, 1 file, -0/+2)
  Using RMW on banked context registers is not safe. The value read could be the wrong one. So if there has been a DRAW_IDX launched, the RMW must be preceded by a WAIT_FOR_IDLE to ensure the read part of RMW sees the correct value.
  To avoid unnecessary WFI's, keep track if there is a need for WFI, and only emit one if needed. Furthermore, keep track if we even need to update the register in the first place.
  And to cut down on the amount of RMW to avoid excessive WFI's, at the tiling/GMEM level we can always overwrite RB_RENDER_CONTROL, as the state at beginning of draw/clear cmds (which we IB to) is always undefined. In the draw/clear commands, we always still use RMW (with WFI if needed), but only if the register value actually changes. (At points where the current value cannot be known, the saved value is reset to ~0, which includes bits outside of RBRC_DRAW_STATE, so there never is chance for confusion.)
  Signed-off-by: Rob Clark <[email protected]>
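The bookkeeping described above can be modeled in a few lines. This is a simulation for illustration only: instead of emitting real command-stream packets it counts them, and the struct, field, and function names are assumptions rather than the driver's own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical context state for tracking one register. */
struct example_ctx {
    bool needs_wfi;        /* a draw may still be in flight */
    uint32_t saved_value;  /* last value written, or ~0 if unknown */
    unsigned wfi_count;    /* WFIs that would have been emitted */
    unsigned write_count;  /* register updates that would have been emitted */
};

/* A draw was launched: a later RMW must wait for idle first. */
static void example_draw(struct example_ctx *ctx)
{
    ctx->needs_wfi = true;
}

/* The current register value can no longer be known. */
static void example_invalidate(struct example_ctx *ctx)
{
    ctx->saved_value = ~0u;
}

/* Update the register only if the value changes, and precede the
 * update with a WFI only if rendering is potentially pending. */
static void example_update_reg(struct example_ctx *ctx, uint32_t val)
{
    /* ~0 is reserved as the "unknown" marker, so real values must differ. */
    assert(val != ~0u);

    if (ctx->saved_value == val)
        return;                 /* no change: no RMW, no WFI */
    if (ctx->needs_wfi) {
        ctx->wfi_count++;
        ctx->needs_wfi = false; /* GPU is idle after the WFI */
    }
    ctx->write_count++;
    ctx->saved_value = val;
}
```

Using ~0 as the unknown marker works in this model for the same reason given in the commit message: a real value only uses a subset of the bits, so it can never compare equal to the marker.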
* freedreno: prepare for hw binning (Rob Clark, 2013-12-26, 1 file, -29/+69)
  Actually assign VSC_PIPE's properly, which will be needed for tiling. And introduce fd_tile for per-tile state (including the assignment of tile to VSC_PIPE). This gives us the proper pipe setup that we'll need for hw binning pass, and also cleans things up a bit by not having to pass so many parameters around. And will also make it easier to introduce different tiling patterns (since we may no longer render tiles in a simple left-to-right top-to-bottom pattern).
  Signed-off-by: Rob Clark <[email protected]>
* freedreno: add debug option to disable GMEM bypass (Rob Clark, 2013-09-14, 1 file, -1/+1)
  Useful for debugging.
  Signed-off-by: Rob Clark <[email protected]>
* freedreno: fix segfault when no color buffer bound (Rob Clark, 2013-08-24, 1 file, -7/+11)
  Don't crash when no color buffer bound. Something caught when starting to run piglit, fixes a handful of piglit tests.
  Signed-off-by: Rob Clark <[email protected]>
* freedreno: add debug option to disable scissor optimization (Rob Clark, 2013-08-24, 1 file, -10/+16)
  Useful for testing and debugging.
  Signed-off-by: Rob Clark <[email protected]>
* freedreno: gmem bypass (Rob Clark, 2013-06-08, 1 file, -15/+47)
  The GPU (at least a3xx, but I think also a2xx) can render directly to memory, bypassing tiling. Although it can't do this if blend, depth, and a few other features of the pipeline are enabled. This direct memory mode can be faster for some sorts of operations, such as simple blits. In particular, this significantly speeds up XA by avoiding the need to pull the entire dest pixmap into GMEM, render tiles, and write it all back out again. This should also speed up resource copy-region and blit.
  Signed-off-by: Rob Clark <[email protected]>
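The bypass decision amounts to a predicate over the pipeline state. The sketch below covers only the two conditions the message names explicitly (blend and depth) plus a debug override; the "few other features" are not listed in the source and are left out. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the state that matters for the decision. */
struct example_pipeline {
    bool blend_enabled;
    bool depth_test_enabled;
    bool debug_no_bypass;   /* stands in for a debug option that disables bypass */
};

/* True if rendering may go straight to system memory instead of
 * going through GMEM tiles. */
static bool example_use_bypass(const struct example_pipeline *p)
{
    assert(p);
    if (p->debug_no_bypass)
        return false;
    return !p->blend_enabled && !p->depth_test_enabled;
}
```

A simple blit with neither blend nor depth enabled takes the direct path in this model, which is the case the commit message highlights as the main win.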
* freedreno: prepare for a3xx (Rob Clark, 2013-06-08, 1 file, -317/+11)
  Split the parts that are specific to adreno a2xx series GPUs from the parts that will be in common with a3xx, so that a3xx support can be added more cleanly.
  Signed-off-by: Rob Clark <[email protected]>
* freedreno: clear fixes and debugging (Rob Clark, 2013-04-24, 1 file, -0/+3)
  Set a few extra registers to make sure we are in proper state for clearing. And also add some debug options to mark all state dirty in clear and gmem operations to aid in debugging.
  Signed-off-by: Rob Clark <[email protected]>
* freedreno: use u_math macros/helpers more (Rob Clark, 2013-04-24, 1 file, -7/+7)
  Get rid of a few self-defined macros:
    ALIGN() -> align()
    min() -> MIN2()
    max() -> MAX2()
  Signed-off-by: Rob Clark <[email protected]>
* freedreno: set SWAP bit based on format (Rob Clark, 2013-04-24, 1 file, -7/+19)
  Really this should be set based on buffer format, not on color vs depth/stencil. Probably there should be more formats that set the bit as we add support for more render target formats.
  Signed-off-by: Rob Clark <[email protected]>
* freedreno: use autogenerated register defs (Rob Clark, 2013-04-05, 1 file, -123/+111)
  Switch to use the envytools generated headers for register/bitfield definitions. This is the first step in preparing to add a3xx support, since it avoids having conflicting names for a3xx and a2xx registers. And since I'm using envytools for a3xx it is simpler to just use it for everything.
  This shouldn't cause any functional change, it is really just a lot of renaming.
  Signed-off-by: Rob Clark <[email protected]>
* freedreno: track maximal scissor bounds (Rob Clark, 2013-03-25, 1 file, -84/+124)
  Optimize out parts of the render target that are scissored out by taking into account maximal scissor bounds in fd_gmem_render_tiles(). This is a big win on things like gnome-shell which frequently do partial screen updates.
  Signed-off-by: Rob Clark <[email protected]>
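Tracking maximal scissor bounds amounts to keeping the union of every scissor rectangle used since the last flush, so that only tiles intersecting that union need to be rendered. A minimal sketch, with hypothetical type and function names:

```c
#include <assert.h>

/* Hypothetical scissor rectangle. */
struct example_scissor {
    unsigned minx, miny, maxx, maxy;
};

static unsigned example_min(unsigned a, unsigned b) { return a < b ? a : b; }
static unsigned example_max(unsigned a, unsigned b) { return a > b ? a : b; }

/* Start from an "empty" region so that the first draw defines it. */
static void example_scissor_reset(struct example_scissor *max)
{
    max->minx = max->miny = ~0u;
    max->maxx = max->maxy = 0;
}

/* Grow the tracked region to also cover the scissor of one draw. */
static void example_scissor_grow(struct example_scissor *max,
                                 const struct example_scissor *s)
{
    assert(s->minx <= s->maxx && s->miny <= s->maxy);
    max->minx = example_min(max->minx, s->minx);
    max->miny = example_min(max->miny, s->miny);
    max->maxx = example_max(max->maxx, s->maxx);
    max->maxy = example_max(max->maxy, s->maxy);
}
```

For a partial screen update, the tracked region stays small, and anything outside it never has to be brought into GMEM or written back.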
* freedreno: clear fixes (Rob Clark, 2013-03-19, 1 file, -1/+1)
  Some fixes for clearing only depth or only stencil.
  Signed-off-by: Rob Clark <[email protected]>
* freedreno: gallium driver for adreno (Rob Clark, 2013-03-11, 1 file, -0/+491)
  Currently works on a220. Others in the a2xx family look pretty similar and should be pretty straightforward to support with the same driver. The a3xx has a new shader ISA, and while many registers appear similar, the register addresses have been completely shuffled around. I am not sure yet whether it is best to support with the same driver, but different compiler, or whether it should be split into a different driver.
    v1: original
    v2: build file updates from review comments, and remove GPL licensed header files from msm kernel
    v3: smarter temp/pred register assignment, fix clear and depth/stencil format issues, resource_transfer fixes, scissor fixes
  Signed-off-by: Rob Clark <[email protected]>