summaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers/freedreno
Commit message (Collapse)AuthorAgeFilesLines
* gallium: remove PIPE_CAP_MAX_COMBINED_SAMPLERSMarek Olšák2014-02-041-2/+0
| | | | | | | This can be derived from the shader caps. All GPUs from ATI/AMD, NVIDIA, and INTEL have separate texture slots for each shader stage.
* freedreno: enabling binning and opt by defaultRob Clark2014-02-033-16/+11
| | | | | | | | | Hw binning pass doesn't seem to have broken anything. And optimizing compiler fixes a lot of shaders and doesn't seem to break anything. So re-org slightly FD_MESA_DEBUG params and make both hw binning and optimizer enabled by default. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx/compiler: new compilerRob Clark2014-02-0317-209/+2777
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The new compiler generates a dependency graph of instructions, including a few meta-instructions to handle PHI and preserve some extra information needed for register assignment, etc. The depth pass assigned a weight/depth to each node (based on sum of instruction cycles of a given node and all it's dependent nodes), which is used to schedule instructions. The scheduling takes into account the minimum number of cycles/slots between dependent instructions, etc. Which was something that could not be handled properly with the original compiler (which was more of a naive TGSI translator than an actual compiler). The register assignment is currently split out as a standalone pass. I expect that it will be replaced at some point, once I figure out what to do about relative addressing (which is currently the only thing that should cause fallback to old compiler). There are a couple new debug options for FD_MESA_DEBUG env var: optmsgs - enable debug prints in optimizer optdump - dump instruction graph in .dot format, for example: http://people.freedesktop.org/~robclark/a3xx/frag-0000.dot.png http://people.freedesktop.org/~robclark/a3xx/frag-0000.dot At this point, thanks to proper handling of instruction scheduling, the new compiler fixes a lot of things that were broken before, and does not appear to break anything that was working before[1]. So even though it is not finished, it seems useful to merge it in it's current state. [1] Not merged in this commit, because I'm not sure if it really belongs in mesa tree, but the following commit implements a simple shader emulator, which I've used to compare the output of the new compiler to the original compiler (ie. run it on all the TGSI shaders dumped out via ST_DEBUG=tgsi with various games/apps): https://github.com/freedreno/mesa/commit/163b6306b1660e05ece2f00d264a8393d99b6f12 Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx/compiler: split out old compilerRob Clark2014-02-036-1/+1531
| | | | | | | | For the time being, keep old compiler as fallback for things that the new compiler does not support yet. Split out as it's own commit to make the later new-compiler commits easier to follow. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx/compiler: prepare for new compilerRob Clark2014-02-039-146/+308
| | | | | | Shuffle things around to prepare for new compiler. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: remove useless reg tracking in disasm-a3xxRob Clark2014-02-031-174/+0
| | | | | | | Not really used for anything anymore. So strip it out and avoid conflicting symbols with upcoming new-compiler. Signed-off-by: Rob Clark <[email protected]>
* freedreno: better manage our WFI'sRob Clark2014-02-017-24/+36
| | | | | | | | Updates to non-banked registers, CP_LOAD_STATE, etc, need a WFI if there is potentially pending rendering. Track this better, and add fd_wfi() calls everywhere that might potentially need CP_WAIT_FOR_IDLE. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: add logicopRob Clark2014-02-013-6/+27
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: handle frag z writeRob Clark2014-02-017-25/+53
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno: resync generated headersRob Clark2014-02-014-9/+39
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: fix const confusionRob Clark2014-02-012-9/+9
| | | | | | | | | | | | Gallium can leave const buffers bound above what is used by the current shader. Which can have a couple bad effects: 1) write beyond const space assigned, which can trigger HLSQ lockup 2) double emit of immed consts, first with bound const buffer vals followed by with actual immed vals. This seems to be a sort of undefined condition. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx/compiler: compiler cleanupsRob Clark2014-02-017-145/+198
| | | | | | Drop color/pos/psize_regid, plus a few compiler and IR cleanups. Signed-off-by: Rob Clark <[email protected]>
* freedreno/compiler/a3xx: remove lowered instructionsRob Clark2014-02-011-354/+0
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno: add tgsi lowering passRob Clark2014-02-014-2/+1229
| | | | | | | | | | | | | | | | Currently lowers the following instructions: DST, XPD, SCS, LRP, FRC, POW, LIT, EXP, LOG, DP4, DP3, DPH, DP2 translating these into equivalent simpler TGSI instructions. This probably should be moved to util so other drivers can use it, but just adding under freedreno for now so that I can clear out a lot of the lowering code in a3xx compiler before beginning to add new compiler. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx/compiler: add CLAMPRob Clark2014-02-011-7/+24
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx/compiler: various fixesRob Clark2014-02-011-14/+34
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno: ctx should hold ref to devRob Clark2014-02-016-2/+8
| | | | | | | | The ctx should hold ref to dev to avoid problems if screen is destroyed before ctx. Doesn't really fix the egl/glx issues, but at least it prevents things from getting much worse. Signed-off-by: Rob Clark <[email protected]>
* freedreno: add prims-emitted driver queryRob Clark2014-02-011-0/+1
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno: Set PIPE_CAP_MIN_MAP_BUFFER_ALIGNMENT to 64Ian Romanick2014-01-291-0/+3
| | | | | | | | Allocations actually have page alignment, but 64 is still a reasonable value. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* gallium: remove PIPE_CAP_SCALED_RESOLVEMarek Olšák2014-01-231-1/+0
| | | | | | | If any driver doesn't support this, it can use a blit after resolving the samples. Reviewed-by: Brian Paul <[email protected]>
* freedreno: add basic query supportRob Clark2014-01-088-1/+275
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add for now some simple/basic query support (ie. things not actually requiring the GPU). Might change around a bit when I actually add GPU queries, but for now this enables some useful performance info in the GALLIUM_HUD. For example: GALLIUM_HUD=fps+batches+batches-sysmem+batches-gmem+restores,draw-calls The driver specific specific queries are: + draw-calls + batches - number of batches per second, sum of batches-sysmem plus batches-gmem + batches-gmem - render a set of tiles in GMEM, for each tile (optionally) system mem -> gmem (restore), plus N draws, plus gmem -> system mem (resolve) per second + batches-sysmem - N draws to system memory (GMEM bypass) per second + restores - number of GMEM batches that required restore per second Ideally for GMEM rendering, you want batches-gmem to equal fps. If the app is doing something that triggers multiple passes (ie. requires extra round trip gmem <-> system memory) then the # of batches per second will go up relative to fps. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: use cs patch instead of RFI+RMWRob Clark2014-01-088-52/+46
| | | | | | | | Since we now have the cmdstream patch mechanism needed for hw binning, might as well also use it for RB_RENDER_CONTROL updates. This avoids the need to use RMW (and associated WFI) to update RB_RENDER_CONTROL. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: support for hw binning passRob Clark2014-01-0815-158/+706
| | | | | | | | | | | | | | | | | | | | | | | The binning pass sorts vertices into which bins/tiles they apply to. The visibility information generated during the binning pass can be used to speed up the rendering pass by filtering out vertices which do not apply to the current tile. See: https://github.com/freedreno/freedreno/wiki/Adreno-tiling#optimized-approach This brings a significant fps boost. A rough assortment of tests (supertuxkart, etracer, tremulous, glmark2 'build' test, etc) seems to yield a ~35-45% fps improvement. For now, to be conservative, the binning pass is not enabled yet by default. To enable it use: FD_MESA_DEBUG=binning So far I haven't found anything that breaks with binning enabled, but I'd like a bit more testing before I enable it as default. Signed-off-by: Rob Clark <[email protected]>
* freedreno: be more clever about gmem usageRob Clark2014-01-082-9/+18
| | | | | | Only need to leave room for depth/stencil if it is actually used, etc. Signed-off-by: Rob Clark <[email protected]>
* freedreno: resync generated headersRob Clark2014-01-085-24/+214
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: fix blend state corruption issueRob Clark2013-12-268-33/+66
| | | | | | | | | | | | | | | | | | | | | | Using RMW on banked context registers is not safe. The value read could be the wrong one. So if there has been a DRAW_IDX launched, the RMW must be preceded by a WAIT_FOR_IDLE to ensure the read part of RMW sees the correct value. To avoid unnecessary WFI's, keep track if there is a need for WFI, and only emit one if needed. Furthermore, keep track if we even need to update the register in the first place. And to cut down on the amount of RMW to avoid excessive WFI's, at the tiling/GMEM level we can always overwrite RB_RENDER_CONTROL, as the state at beginning of draw/clear cmds (which we IB to) is always undefined. In the draw/clear commands, we always still use RMW (with WFI if needed), but only if the register value actually changes. (At points where the current value cannot be known, the saved value is reset to ~0, which includes bits outside of RBRC_DRAW_STATE, so there never is chance for confusion.) Signed-off-by: Rob Clark <[email protected]>
* freedreno: prepare for hw binningRob Clark2013-12-269-142/+159
| | | | | | | | | | | | Actually assign VSC_PIPE's properly, which will be needed for tiling. And introduce fd_tile for per-tile state (including the assignment of tile to VSC_PIPE). This gives us the proper pipe setup that we'll need for hw binning pass, and also cleans things up a bit by not having to pass so many parameters around. And will also make it easier to introduce different tiling patterns (since we may no longer render tiles in a simple left-to-right top-to-bottom pattern). Signed-off-by: Rob Clark <[email protected]>
* freedreno: resync generated headersRob Clark2013-12-266-76/+131
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno: dummy-draw workaround for a320Rob Clark2013-12-142-1/+17
| | | | | | Fixes gpu lockups in supertuxkart. Signed-off-by: Rob Clark <[email protected]>
* gallium/winsys/drm: Prepare for passing prime fds in winsys_handleChristopher James Halse Rogers2013-12-101-0/+5
| | | | | | Signed-off-by: Christopher James Halse Rogers <[email protected]> Reviewed-by: Thomas Hellstrom <[email protected]> Signed-off-by: Maarten Lankhorst <[email protected]>
* freedreno/a3xx: add adreno 330 supportRob Clark2013-12-072-4/+7
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx/compiler: add ROUNDRob Clark2013-12-071-0/+1
| | | | Signed-off-by: Rob Clark <[email protected]>
* gallium: add support for AMD_vertex_shader_layerMarek Olšák2013-12-031-0/+1
|
* freedreno: Add a few texture formatsAndreas Heider2013-12-021-0/+3
|
* gallium: new shader cap bit for the amount of sampler viewsRoland Scheidegger2013-11-281-0/+1
| | | | | | | | | Ever since introducing separate sampler and sampler view max this was really missing. Every driver but llvmpipe reports the same number as number of samplers for now, so nothing should break. Reviewed-by: Jose Fonseca <[email protected]>
* gallium/drivers: compact compiler flags into Automake.incEmil Velikov2013-11-161-6/+4
| | | | | | | | | | * minimise flags duplication * distingush between VISIBILITY C and CXX flags * set only required flags - C and/or CXX v2: add LLVM_CFLAGS back to AM_CFLAGS (add missing backslash) Signed-off-by: Emil Velikov <[email protected]>
* gallium/drivers: enable automake subdir-objectsEmil Velikov2013-11-161-0/+2
| | | | Signed-off-by: Emil Velikov <[email protected]>
* freedreno: compact a2xx and a3xx makefiles into parent onesJohannes Obermayr2013-11-166-66/+36
| | | | | | | | | Nearly everything within the three Makefile.am's is identical. Let's simplify things a little. v2: Rebase and rewrite the commit message (Emil Velikov) Signed-off-by: Emil Velikov <[email protected]>
* freedreno/a3xx/texture: min/max lodRob Clark2013-11-011-5/+3
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: update envytools headersRob Clark2013-11-014-8/+22
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: fix VS out / FS in linkingRob Clark2013-11-013-7/+47
| | | | | | | | Actually link VS out / FS in based on semantic info, keeping in mind that position/pointsize can also be an input to the FS. This fixes a few fragment shaders which were using gl_Position. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: allow num_samplers != num_texturesRob Clark2013-11-012-56/+55
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx/compiler: highp frag shaderRob Clark2013-11-014-12/+14
| | | | | | | | | | | | | | Fixes use of full-precision in fragment shader (ie. don't clobber r0.x since that can be used by future bary instructions for varying fetch). And makes use of full-precision the default in fragment shader (but can be overriden via FD_MESA_DEBUG=fraghalf). Seems like half precision is often not enough for texture coordinates. The blob compiler is clever enough to keep texture coords in full precision registers while using half precision for everything else. But we aren't quite that clever yet, so better to default to full precision. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx/compiler: relative addressing fixes.Rob Clark2013-11-011-28/+48
| | | | | | | Handle some relative addressing constraints: cannot handle const or relative in cat5 and src2 of cat3. Signed-off-by: Rob Clark <[email protected]>
* freedreno: we do actually support sqrtRob Clark2013-11-012-0/+8
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno: emulated unsupported primitive typesRob Clark2013-10-295-25/+74
| | | | | | | Use u_primconvert to convert unsupported primitives into supported primitive plus index buffer. Signed-off-by: Rob Clark <[email protected]>
* freedreno: update generated headersRob Clark2013-10-296-125/+238
| | | | | | pull in some fixes to draw-initiator/prim-type. Signed-off-by: Rob Clark <[email protected]>
* gallium: add PIPE_CAP_MIXED_FRAMEBUFFER_SIZESIlia Mirkin2013-10-261-0/+1
| | | | | | | | | This CAP will determine whether ARB_framebuffer_object can be enabled. The nv30 driver does not allow mixing swizzled and linear zsbuf/cbuf textures. Signed-off-by: Ilia Mirkin <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* freedreno/a3xx/compiler: relative addressingRob Clark2013-10-241-1/+123
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: fix const/rel/const-rel encodingRob Clark2013-10-244-88/+300
| | | | | | | | | | | | | | | | | | | | | | | | The encoding of constant, relative, and relative-const src registers is a bit more complex than originally thought, which gives an extra bit to encode const reg # at expense of taking a bit from relative offset. In most cases a3xx seems to actually use a scheme whereby it can encode an extra bit for const register. You have three possible encodings in thirteen bits: register: (11 bits for N.c) 00........... rN.c relative: (10 bits for N) 010.......... r<a0.x + N> 011.......... c<a0.x + N> const: (12 bits for N.c) 1............ cN.c Which means we can deal w/ more consts than previously thought. Signed-off-by: Rob Clark <[email protected]>