summaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers
Commit message (Collapse)AuthorAgeFilesLines
* radeonsi: unify code setting dirty_level_mask for fast clearMarek Olšák2017-11-291-14/+11
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: clean up si_do_fast_color_clear parametersMarek Olšák2017-11-291-10/+5
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: remove r600_common_context::clear_bufferMarek Olšák2017-11-295-20/+4
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: move r600_test_dma.c into si_test_dma.cMarek Olšák2017-11-298-20/+20
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: move si_pipe_clear_buffer into si_cp_dma.cMarek Olšák2017-11-292-61/+61
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: move all clear() code into si_clear.cMarek Olšák2017-11-299-719/+764
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: enable DCC with MSAA for VIMarek Olšák2017-11-295-2/+15
| | | | | Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: implement fast color clear for DCC with MSAA for VIMarek Olšák2017-11-291-5/+30
| | | | | Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: add a workaround for blending with DCC and MSAAMarek Olšák2017-11-291-8/+23
| | | | | Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: clear PIPE_IMAGE_ACCESS_WRITE when it's invalid to be on the safe sideMarek Olšák2017-11-291-0/+10
| | | | | Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* ac/surface: enable DCC computation for MSAAMarek Olšák2017-11-291-1/+2
| | | | | Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: fix layered DCC fast clearMarek Olšák2017-11-291-1/+4
| | | | | Cc: 17.2 17.3 <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* r600: lds load cleanups.Dave Airlie2017-11-291-6/+8
| | | | | | This is just some cleanups on top of the last patch from my compute branch. Signed-off-by: Dave Airlie <[email protected]>
* r600_shader: only load from LDS what is really usedGert Wollny2017-11-291-7/+26
| | | | | | | | Use the destination write mask to determine which values are really to be read from LDS and load only these. Reviewed-by: Dave Airlie <[email protected]> Signed-off-by: Gert Wollny <[email protected]>
* r600/sb: handle jump after target to end of program. (v2)Dave Airlie2017-11-291-0/+5
| | | | | | | | | | | | | | | | This fixes hangs on cayman with tests/spec/arb_tessellation_shader/execution/trivial-tess-gs_no-gs-inputs.shader_test This has a single if/else in it, and when this peephole activated, it would set the jump target to NULL if there was no instruction after the final POP. This adds a NOP if we get a jump in this case, and seems to fix the hangs, so we have a valid target for the ELSE instruction to go to, instead of 0 (which causes infinite loops). v2: update last_cf correctly. (I had some other patches hide this) Cc: <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* meson: build virgl driverDylan Baker2017-11-281-0/+39
| | | | | | | Build tested only. Signed-off-by: Dylan Baker <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* meson: build svga driver on linuxDylan Baker2017-11-281-0/+88
| | | | | | | Build tested only. Signed-off-by: Dylan Baker <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* meson: build r600 driverDylan Baker2017-11-281-0/+128
| | | | | | | | | v4: - Ensure inc_amd_common defined when radeonsi is disabled (needed by r600) Signed-off-by: Dylan Baker <[email protected]> Tested-by: Aaron Watry <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* meson: build r300 driverDylan Baker2017-11-281-0/+156
| | | | | | | This is build tested only Signed-off-by: Dylan Baker <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* meson: build i915g driverDylan Baker2017-11-281-0/+70
| | | | | | | Build tested only. Signed-off-by: Dylan Baker <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* svga: move svga_is_format_supported() to svga_format.cBrian Paul2017-11-283-121/+129
| | | | | | where the other format-related functions live. Reviewed-by: Charmaine Lee <[email protected]>
* svga: s/unsigned/SVGA3dDevCapIndex/Brian Paul2017-11-281-3/+6
| | | | Reviewed-by: Charmaine Lee <[email protected]>
* vc4: check preprocessor token existence using #ifdef instead of #ifEric Engestrom2017-11-281-3/+3
| | | | | | | (other uses of USE_VC4_SIMULATOR are already correct) Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* radeonsi/gfx9: simplify condition for on-chip ESGSNicolai Hähnle2017-11-281-3/+1
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: clarify that si_shader_selector::esgs_itemsize is set for the ES partNicolai Hähnle2017-11-281-1/+3
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: use si_shader_context instead of lp_build_context in more placesNicolai Hähnle2017-11-281-27/+23
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: cleanup si_initialize_color_surfaceNicolai Hähnle2017-11-281-12/+12
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: avoid attempting to create CMASK if the tiling mode doesn't have itNicolai Hähnle2017-11-281-0/+2
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: check that we don't leak fine.buf referencesNicolai Hähnle2017-11-281-0/+2
| | | | | | Just as an added precaution. Reviewed-by: Marek Olšák <[email protected]>
* amd/common: sid.h cleanupsNicolai Hähnle2017-11-281-1/+1
| | | | | | | Fix a bunch of labels indicating when registers were added/removed and normalize the SI-class GRBM_GFX_INDEX. Reviewed-by: Marek Olšák <[email protected]>
* ac: change legacy_surf_level::slice_size to dword unitsMarek Olšák2017-11-278-28/+30
| | | | | | | | | The next commit will reduce the size even more. v2: typecast to uint64_t manually v3: add more typecasts, add asserts Reviewed-by: Nicolai Hähnle <[email protected]>
* ac: pack ac_surface betterMarek Olšák2017-11-272-7/+7
| | | | | | r600_texture: 1736 -> 1488 bytes Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: always initialize max_forced_staging_uploadsMarek Olšák2017-11-271-0/+2
| | | | | | | | | r600_resource is malloc'd. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103808 Fixes: 4b0dc098b256 ("gallium/u_threaded: don't map big VRAM buffers for the first upload directly") Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: remove an old hack for evergreenMarek Olšák2017-11-271-10/+0
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: set COMPUTE_RESOURCE_LIMITS.FORCE_SIMD_DIST when profitableMarek Olšák2017-11-271-1/+16
| | | | | | ported from Vulkan Reviewed-by: Nicolai Hähnle <[email protected]>
* r600/eg: dump event type in dumpsDave Airlie2017-11-271-0/+1
| | | | | | This just makes it easier to debug some things. Signed-off-by: Dave Airlie <[email protected]>
* nouveau/compiler: Allow to omit line numbers when printing instructionsTobias Klausmann2017-11-265-4/+13
| | | | | | | | | | | | | | | | This comes in handy when checking "NV50_PROG_DEBUG=1" outputs with diff! V2: - Use environmental variable (Karol Herbst) V3: - Use the already populated nv50_ir_prog_info to forward information to the print pass (Pierre Moreau) V4: - get rid of default value in PrintPass constructor Signed-off-by: Tobias Klausmann <[email protected]> Reviewed-by: Pierre Moreau <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* radeonsi: try flushing unflushed fences in si_fence_finish even when timeout ↵Nicolai Hähnle2017-11-261-3/+3
| | | | | | | | | | | | | | | | | == 0 Under certain conditions, waiting on a GL sync objects should act like a flush, regardless of the timeout. Portal 2, CS:GO, and presumably other Source engine games rely on this behavior and hang during loading without this fix. Fixes: bc65dcab3bc4 ("radeonsi: avoid syncing the driver thread in si_fence_finish") Signed-off-by: Marek Olšák <[email protected]> Tested-by: Kai Wasserbäch <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103902 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103904
* nv50/ir: move LateAlgebraicOpt to the very endIlia Mirkin2017-11-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | Memory loads can take offsets, but the SHLADD will often attempt to consume the offsets too. As there may be multiple memory loads with the same base but different offsets, those would end up in a SHLADD instead of the offset of the memory operation. This moves the pass after we've had a chance to attempt to propagate immediate adds into the indirect offset. total instructions in shared programs : 6580681 -> 6567716 (-0.20%) total gprs used in shared programs : 944261 -> 943375 (-0.09%) total shared used in shared programs : 0 -> 0 (0.00%) total local used in shared programs : 15328 -> 15328 (0.00%) total bytes used in shared programs : 60339896 -> 60221504 (-0.20%) local shared gpr inst bytes helped 0 0 555 2698 2698 hurt 0 0 138 336 336 Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: when merging immediates/consts, load directlyIlia Mirkin2017-11-261-1/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | When a MERGE operation gets its constraint moves added, it susbstantially extends live ranges to be reusing an immediate from earlier in the program (not to mention the silliness of loading an immediate into a register, and then moving into another register). We detect these scenarios and insert moves that take the immediate or constbuf load directly into the register. If it's the last use, then we can just move that operation to the closer location. With SM35 (255 regs) we get these results: total instructions in shared programs : 6583670 -> 6580681 (-0.05%) total gprs used in shared programs : 950818 -> 944261 (-0.69%) total shared used in shared programs : 0 -> 0 (0.00%) total local used in shared programs : 15328 -> 15328 (0.00%) total bytes used in shared programs : 60367456 -> 60339896 (-0.05%) local shared gpr inst bytes helped 0 0 4584 3186 3186 hurt 0 0 55 968 968 I suspect they will be better for SM20 and SM30. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: add optimization for modulo by a non-power-of-2 valueIlia Mirkin2017-11-261-0/+15
| | | | | | | | We can still use the optimized division methods which make use of multiplication with overflow. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tobias Klausmann <[email protected]>
* nv50/ir: optimize signed integer modulo by pow-of-2Ilia Mirkin2017-11-252-10/+29
| | | | | | | | | It's common to use signed int modulo in GLSL. As it happens, the GLSL specs allow the result to be undefined, but that seems fairly surprising. It's not that much more effort to get it right, at least for positive modulo operators. Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno/a4xx: add ARB_framebuffer_no_attachments supportIlia Mirkin2017-11-252-1/+6
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* freedreno/a4xx: add indirect draw supportIlia Mirkin2017-11-252-0/+33
| | | | | | | | This is a copy of the a5xx logic. Fails a few tests, but basic functionality is there. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* freedreno: regenerate pm4 header, adjust code for new namesIlia Mirkin2017-11-253-114/+171
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* freedreno/a4xx: add stencil texturing supportIlia Mirkin2017-11-253-12/+35
| | | | | | | Copied from a5xx, should be identical. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* freedreno/ir3: add a pass to lower tg4 to txl, enable gather on a4xxIlia Mirkin2017-11-257-3/+151
| | | | | | | | | | Unfortunately Adreno A4xx hardware returns incorrect results with the GATHER4 opcodes. As a result, we have to lower to 4 individual texture calls (txl since we have to force lod to 0). We achieve this using offsets, including on cube maps which normally never have offsets. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* radeonsi: expose all CB performance counters on StoneyMarek Olšák2017-11-251-1/+1
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: handle imported textures with DCC robustlyMarek Olšák2017-11-251-1/+1
| | | | | | | now you can hack the driver to enable DCC for displayable textures and Glamor that doesn't enable that by default won't crash anymore. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: fix a typo in creating monolithic ES-GSMarek Olšák2017-11-251-1/+1
| | | | | | This has no effect because both occupy the same memory in a union. Reviewed-by: Nicolai Hähnle <[email protected]>