path: root/src/gallium/drivers
Commit message | Author | Age | Files | Lines
* gallium: Add missing includes | Thomas Helland | 2017-06-07 | 2 | -0/+2
  These will need to be in place to avoid regressions when removing these includes from the u_dynarray
  Reviewed-by: Marek Olšák <[email protected]>
  Reviewed-by: Eric Engestrom <[email protected]>
* radeonsi: update clip_regs on shader state changes only when it's needed | Marek Olšák | 2017-06-07 | 1 | -3/+32
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: precompute some fields for PA_CL_VS_OUT_CNTL in si_shader_selector | Marek Olšák | 2017-06-07 | 4 | -16/+25
  Reviewed-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: add a new helper si_get_vs | Marek Olšák | 2017-06-07 | 2 | -17/+19
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: isolate real framebuffer changes from the decompression passes (v3) | Samuel Pitoiset | 2017-06-07 | 3 | -2/+28
  When a stencil buffer is part of the framebuffer state, it is decompressed but because it's bindless, all draw calls set stencil_dirty_level_mask to 1.

  v2: Marek
    - set the flags outside the loop
    - also clear and set framebuffer.do_update_surf_dirtiness there
    - do it in the DB->CB copy path too
  v3: Marek
    - save and restore the do_update_surf_dirtiness flag

  Signed-off-by: Marek Olšák <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: do EarlyCSEMemSSA LLVM pass | Marek Olšák | 2017-06-07 | 1 | -0/+2
  so that LLVM IR looks like CSE has been run on it. It's also recommended by the instruction combining pass.

  This also fixes:
    - GL45-CTS.arrays_of_arrays_gl.InteractionFunctionCalls2 (crash)
    - piglit/spec/arb_shader_ballot/execution/fs-readFirstInvocation-uint-loop (fail)

  The code size decrease is positive, the register usage isn't. There is a decrease in VGPR spilling for Tomb Raider, but increase in DiRT Showdown and GRID Autosport.

  EarlyCSEMemSSA has a -0.01% change in code size compared EarlyCSE.

  SGPRS: 1935420 -> 1938076 (0.14 %)
  VGPRS: 1645504 -> 1645988 (0.03 %)
  Spilled SGPRs: 2493 -> 2651 (6.34 %)
  Spilled VGPRs: 107 -> 115 (7.48 %)
  Private memory VGPRs: 1332 -> 1332 (0.00 %)
  Scratch size: 1512 -> 1516 (0.26 %) dwords per thread
  Code Size: 61981592 -> 61890012 (-0.15 %) bytes
  Max Waves: 371847 -> 371798 (-0.01 %)

  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: remove 8 bytes from si_shader_key | Marek Olšák | 2017-06-07 | 3 | -14/+17
  We can use a union in si_shader_key::mono.
  Reviewed-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: move PSIZE and CLIPDIST unique IO indices after GENERIC | Marek Olšák | 2017-06-07 | 2 | -8/+14
  Heaven LDS usage for LS+HS is below. The masks are "outputs_written" for LS and HS. Note that 32K is the maximum size.

  Before:
    heaven_x64: ls=1f1 tcs=1f1, lds=32K
    heaven_x64: ls=31 tcs=31, lds=24K
    heaven_x64: ls=71 tcs=71, lds=28K

  After:
    heaven_x64: ls=3f tcs=3f, lds=24K
    heaven_x64: ls=7 tcs=7, lds=13K
    heaven_x64: ls=f tcs=f, lds=17K

  All other apps have a similar decrease in LDS usage, because the "outputs_written" masks are similar. Also, most apps don't write POSITION in these shader stages, so there is room for improvement. (tight per-component input/output packing might help even more)

  It's unknown whether this improves performance.

  Tested-by: Edmondo Tommasina <[email protected]>
  Tested-by: Dieter Nützel <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
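The masks quoted in this message can be inspected with a few lines of Python. This is only a sketch: it assumes, which the commit message does not state, that the LDS footprint follows the highest bit set in "outputs_written" rather than the number of bits set. The helper names `slots_spanned` and `outputs_used` are hypothetical.

```python
# Masks quoted in the commit message (LS "outputs_written", hexadecimal).
BEFORE = [0x1f1, 0x31, 0x71]
AFTER = [0x3f, 0x7, 0xf]


def slots_spanned(mask: int) -> int:
    """IO slots from bit 0 up to the highest set bit (assumed to drive LDS size)."""
    return mask.bit_length()


def outputs_used(mask: int) -> int:
    """Number of outputs actually written (bits set in the mask)."""
    return bin(mask).count("1")


for old, new in zip(BEFORE, AFTER):
    print(f"{old:#x}: {outputs_used(old)} outputs over {slots_spanned(old)} slots -> "
          f"{new:#x}: {outputs_used(new)} outputs over {slots_spanned(new)} slots")
```

Each before/after pair writes the same number of outputs, but the "After" masks span fewer slots, which is consistent with the lower LDS figures reported in the message.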
* svga: Always set the alpha value to 1 when sampling using an XRGB view | Thomas Hellstrom | 2017-06-07 | 1 | -13/+30
  If the XRGB view is sampling from an ARGB svga format, change PIPE_SWIZZLE_W to PIPE_SWIZZLE_1 for all channels. Previously we unconditionally set PIPE_SWIZZLE_1 on the alpha channel which could be both insufficient and incorrect.
  Signed-off-by: Thomas Hellstrom <[email protected]>
  Reviewed-by: Charmaine Lee <[email protected]>
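The difference between the two approaches described in this message can be illustrated with a small sketch. It is not the driver code: the string constants are local stand-ins for Gallium's PIPE_SWIZZLE_* enum, and `xrgb_view_swizzle`, `old_behaviour` and the `backing_has_alpha` flag are hypothetical names introduced for illustration.

```python
# Local stand-ins for Gallium's PIPE_SWIZZLE_* enum (values chosen for this sketch).
PIPE_SWIZZLE_X, PIPE_SWIZZLE_Y, PIPE_SWIZZLE_Z, PIPE_SWIZZLE_W = "X", "Y", "Z", "W"
PIPE_SWIZZLE_0, PIPE_SWIZZLE_1 = "0", "1"


def xrgb_view_swizzle(swizzle, backing_has_alpha):
    """Replace every read of the W component with the constant 1."""
    if not backing_has_alpha:
        return tuple(swizzle)
    return tuple(PIPE_SWIZZLE_1 if s == PIPE_SWIZZLE_W else s for s in swizzle)


def old_behaviour(swizzle):
    """Previous approach per the message: force only the alpha channel to 1."""
    return tuple(swizzle[:3]) + (PIPE_SWIZZLE_1,)


# A view that broadcasts W into RGB and reads X into alpha.
view = (PIPE_SWIZZLE_W, PIPE_SWIZZLE_W, PIPE_SWIZZLE_W, PIPE_SWIZZLE_X)
print(old_behaviour(view))            # RGB still read the backing alpha; alpha loses X
print(xrgb_view_swizzle(view, True))  # every W read becomes 1; alpha keeps X
```

With this view the old approach is insufficient (the RGB channels still sample the backing alpha) and incorrect (the alpha channel should have read X), which matches the wording of the message.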
* svga: Fix imported surface view creation | Thomas Hellstrom | 2017-06-07 | 4 | -11/+33
  When deciding to create a view with or without an alpha channel we need to look at the SVGA3D format and not the PIPE format. This fixes the glx-tfp piglit test for dri3/xa.
  Signed-off-by: Thomas Hellstrom <[email protected]>
  Reviewed-by: Charmaine Lee <[email protected]>
* svga: Set alpha to 1 for non-alpha views | Thomas Hellstrom | 2017-06-07 | 1 | -0/+18
  Gallium RGB textures may be backed by imported ARGB svga3d surfaces. In those and similar cases we need to set the alpha value to 1 when sampling.
  Fixes piglit glx::glx-tfp
  Signed-off-by: Thomas Hellstrom <[email protected]>
  Reviewed-by: Brian Paul <[email protected]>
  Reviewed-by: Charmaine Lee <[email protected]>
* svga: Allow format differences in 16-bit RGBA surface sharing | Thomas Hellstrom | 2017-06-07 | 1 | -1/+5
  For the purpose of surface sharing, treat SVGA3D_R5G6B5 and SVGA3D_B5G6R5_UNORM as identical formats.
  This fixes the following piglit tests with dri3/xa:
    glx@glx-visuals-depth -pixmap
    glx@glx-visuals-stencil -pixmap
  Signed-off-by: Thomas Hellstrom <[email protected]>
  Reviewed-by: Deepak Singh Rawat <[email protected]>
  Reviewed-by: Charmaine Lee <[email protected]>
* radeonsi: clean up decompress blend state names | Marek Olšák | 2017-06-07 | 4 | -10/+10
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: clean up a misleading statement from the old days | Marek Olšák | 2017-06-07 | 1 | -4/+1
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't use 1D tiling for Z/S on VI to get TC-compatible HTILE | Marek Olšák | 2017-06-07 | 1 | -3/+13
  It's always good to have fewer decompress blits.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: enable TC-compatible stencil compression on VI | Marek Olšák | 2017-06-07 | 3 | -5/+8
  Most things are in place. Ideally we won't see decompress blits for stencil anymore.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi/gfx9: prevent a race when the previous shader's main part is missing | Marek Olšák | 2017-06-07 | 1 | -0/+2
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi/gfx9: wait for main part compilation of 1st shaders of merged shaders | Marek Olšák | 2017-06-07 | 1 | -0/+4
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi/gfx9: fix LS scratch buffer support without TCS for GFX9 | Marek Olšák | 2017-06-07 | 1 | -3/+18
  LS is merged into TCS. If there is no TCS, LS is merged into fixed-func TCS. The problem is the fixed-func TCS was ignored by scratch update functions, so LS didn't have the scratch buffer set up.
  Note that Mesa 17.1 doesn't have merged shaders.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: move streamout state update out of si_update_shaders | Marek Olšák | 2017-06-07 | 2 | -16/+25
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: remove dead code in declare_input_fs | Marek Olšák | 2017-06-07 | 1 | -5/+0
  Colors are interpolated in the PS prolog. This was never used.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: move handling of DBG_NO_OPT_VARIANT into si_shader_selector_key | Marek Olšák | 2017-06-07 | 1 | -4/+3
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: use a compiler queue with a low priority for optimized shaders | Marek Olšák | 2017-06-07 | 3 | -8/+34
  Reviewed-by: Nicolai Hähnle <[email protected]>
* util/u_queue: add an option to set the minimum thread priority | Marek Olšák | 2017-06-07 | 2 | -2/+2
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: decrease the number of compiler threads to num CPUs - 1 | Marek Olšák | 2017-06-07 | 1 | -1/+4
  Reserve one core for other things (like draw calls).
  Reviewed-by: Nicolai Hähnle <[email protected]>
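The sizing rule in this entry can be written down in a couple of lines. The sketch below is an illustration only: `compiler_thread_count` is a hypothetical name, and both the floor of one thread and the optional upper cap are assumptions that the commit message does not confirm.

```python
import os


def compiler_thread_count(num_cpus, max_threads=None):
    """Leave one CPU free for other work, but keep at least one compiler thread."""
    count = max(num_cpus - 1, 1)          # assumed floor of one thread
    if max_threads is not None:           # assumed optional cap on the pool size
        count = min(count, max_threads)
    return count


print(compiler_thread_count(os.cpu_count() or 1))
```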
* radeonsi: drop unfinished shader compilations when destroying shaders | Marek Olšák | 2017-06-07 | 2 | -3/+5
  If we enqueue too many jobs and destroy the GL context, it may take several seconds before the jobs finish. Just drop them instead.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* freedreno/a5xx: set SP_BLEND_CONTROL properly | Rob Clark | 2017-06-07 | 3 | -1/+4
  Signed-off-by: Rob Clark <[email protected]>
* freedreno/a5xx: LRZ support | Rob Clark | 2017-06-07 | 14 | -14/+234
  Signed-off-by: Rob Clark <[email protected]>
* freedreno: drop timestamp field | Rob Clark | 2017-06-07 | 2 | -3/+0
  unused.
  Signed-off-by: Rob Clark <[email protected]>
* freedreno/a5xx: refactor out helper for LRZ flush | Rob Clark | 2017-06-07 | 3 | -11/+19
  Signed-off-by: Rob Clark <[email protected]>
* freedreno: reshuffle FD_MESA_DEBUG bitmask | Rob Clark | 2017-06-07 | 1 | -3/+3
  Signed-off-by: Rob Clark <[email protected]>
* freedreno: update generated headers | Rob Clark | 2017-06-07 | 7 | -17/+31
  Signed-off-by: Rob Clark <[email protected]>
* gallium/u_blitter: use 2D_ARRAY for cubemap blits if possible | Marek Olšák | 2017-06-07 | 3 | -3/+3
  so that we can use TXF.
  The cubemap blit pixel shader code size: 148 -> 92 bytes
  Reviewed-by: Nicolai Hähnle <[email protected]>
* tree-wide: remove trailing backslash | Eric Engestrom | 2017-06-07 | 3 | -4/+4
  Simple search for a backslash followed by two newlines. If one of the newlines were to be removed, this would cause issues, so let's just remove these trailing backslashes.
  Signed-off-by: Eric Engestrom <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
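A search of the kind this message describes could look like the sketch below. It is not the command the author ran (the message does not give one); the pattern, the helper names and the sample text are all assumptions made for illustration.

```python
import re

# A backslash at the end of a line, followed directly by an empty line.
TRAILING_BACKSLASH = re.compile(r"\\\n\n")


def find_trailing_backslashes(text):
    """Return the 1-based line numbers of backslashes followed by a blank line."""
    return [text.count("\n", 0, m.start()) + 1
            for m in TRAILING_BACKSLASH.finditer(text)]


def strip_trailing_backslashes(text):
    """Drop the offending backslash and keep both newlines."""
    return TRAILING_BACKSLASH.sub("\n\n", text)


# Hypothetical input: a macro whose last line ends in a stray continuation.
SAMPLE = "#define FOO(x) \\\n   do_thing(x); \\\n\nint y;\n"

print(find_trailing_backslashes(SAMPLE))
print(strip_trailing_backslashes(SAMPLE))
```

In the sample, deleting the blank line would make the stray backslash continue the macro onto `int y;`, which is the kind of issue the message warns about.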
* radeonsi: fix a GPU hang with tessellation on 2-CU configs | Marek Olšák | 2017-06-06 | 1 | -1/+5
  Only harvested Stoney has 2 CUs.
  Tested on 2-CU Stoney and Fiji forced to 2 CUs.
  Cc: 17.0 17.1 <[email protected]>
  Tested-by: Edmondo Tommasina <[email protected]>
  Tested-by: Dieter Nützel <[email protected]>
* radeon: remove out of date LLVM_REVISION.txt | Emil Velikov | 2017-06-05 | 2 | -4/+0
  The file was introduced to track which LLVM revision was required, yet that has quickly gone out of shape. It has seen no updates since 2013.
  Cc: Nicolai Hähnle <[email protected]>
  Cc: Marek Olšák <[email protected]>
  Signed-off-by: Emil Velikov <[email protected]>
  Reviewed-by: Aaron Watry <[email protected]>
* r600: refactor out some compressed resource state code. | Dave Airlie | 2017-06-06 | 1 | -24/+28
  This just takes this out to a separate function as it will get more complex with images.
  Reviewed-by: Glenn Kennard <[email protected]>
* r600: document some of the missing shader constants. | Dave Airlie | 2017-06-06 | 1 | -0/+4
  These are used for fragment shader thread calculations.
  Reviewed-by: Glenn Kennard <[email protected]>
* r600: add register info for atomic counters. | Dave Airlie | 2017-06-06 | 2 | -0/+51
  The atomic counters on evergreen are implemented via append/consume UAV counters. This just adds the register info for them. The EOS packets are used to get the atomic totals extracted post shader execution for storing into a buffer.
  Reviewed-by: Glenn Kennard <[email protected]>
* r600: add missing RAT registers and operations. | Dave Airlie | 2017-06-06 | 3 | -0/+59
  This just documents in the headers the RAT operation list, and the RAT encoding for exports.
  The immediate registers are used to point to buffers for the RAT return values (_RTN instructions).
  Reviewed-by: Glenn Kennard <[email protected]>
* r600/sb: fix typo in field definitions | Dave Airlie | 2017-06-06 | 1 | -1/+1
  Pointed out by glennk.
* r600: fix incorrect and missing bit field in register headers. | Dave Airlie | 2017-06-05 | 1 | -3/+4
  The compression field was incorrect, and we were missing the depth before shader field.
* nvc0: Add support for ARB_post_depth_coverage | Lyude | 2017-06-02 | 8 | -1/+15
  Reviewed-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* gallium: Add a cap to check if the driver supports ARB_post_depth_coverage | Lyude | 2017-06-02 | 15 | -0/+15
  Reviewed-by: Marek Olšák <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: disable BGRA8 images on Fermi | Lyude | 2017-06-02 | 1 | -5/+14
  BGRA8 image stores on Fermi don't work, which results in breaking PBO downloads, such that they always return 0x0. Discovered this through a glamor bug, and confirmed it does indeed break a good number of piglit tests such as spec/arb_pixel_buffer_object/pbo-read-argb8888
  Fixes: 8e7893eb53213 ("nvc0: add support for BGRA8 images")
  Signed-off-by: Lyude <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
  Cc: [email protected]
* etnaviv: always do cpu_fini in transfer_unmap | Lucas Stach | 2017-06-01 | 1 | -3/+6
  The cpu_fini() call pushes the buffer back into the GPU domain, which needs to be done for all buffers, not just the ones with CPU written content. The etnaviv kernel driver currently doesn't validate this, but may start to do so at a later point in time.
  If there is a temporary resource the fini needs to happen before the RS uses this one as the source for the upload.
  Also remove an invalid comment about flushing CPU caches, cpu_fini takes care of everything involved in this.
  Fixes: c9e8b49b885 ("etnaviv: gallium driver for Vivante GPUs")
  Cc: [email protected]
  Signed-off-by: Lucas Stach <[email protected]>
  Reviewed-by: Philipp Zabel <[email protected]>
  Reviewed-By: Wladimir J. van der Laan <[email protected]>
* nvc0: Clean up unnecessary includes from gallium/auxiliary/vl/ | Rhys Kidd | 2017-06-01 | 1 | -3/+0
  Signed-off-by: Rhys Kidd <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* r600/eg: add support for tracing IBs after a hang. | Dave Airlie | 2017-06-01 | 11 | -7/+785
  This is a poor man's version of radeonsi ddebug stuff, this should get hooked into that infrastructure, and grow more stuff, but for now, just create R600_TRACE var that points to a file that you want to dump the last IB to.
  Signed-off-by: Dave Airlie <[email protected]>
* radeonsi: remove unused si_pm4_state::compute_pkt | Samuel Pitoiset | 2017-05-31 | 2 | -4/+1
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: remove chip_class define from si_pm4.h | Samuel Pitoiset | 2017-05-31 | 1 | -1/+0
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>