path: root/src/gallium
Commit message | Author | Age | Files | Lines
* radeonsi: call LLVMAddEarlyCSEMemSSAPass only for LLVM >= 4.0 | Juan A. Suarez Romero | 2017-06-08 | 1 | -0/+2
    LLVMAddEarlyCSEMemSSAPass() is defined in LLVM 4.0.
    Fixes: 257b538 ("radeonsi: do EarlyCSEMemSSA LLVM pass")
    Signed-off-by: Marek Olšák <[email protected]>
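    Note: the version guard this commit describes can be sketched as below. This is a minimal, self-contained illustration and not the actual radeonsi change. EXAMPLE_LLVM_VERSION, example_pass_manager, add_early_cse_memssa_pass and setup_passes are hypothetical names; the stub stands in for the real LLVM-C call so the sketch compiles without LLVM headers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical version macro encoded as 0xMMmm (major, minor).
 * A real build would receive this from the build system. */
#ifndef EXAMPLE_LLVM_VERSION
#define EXAMPLE_LLVM_VERSION 0x0400
#endif

/* Stand-in for a pass manager: it only counts registered passes. */
struct example_pass_manager {
    int num_passes;
};

#if EXAMPLE_LLVM_VERSION >= 0x0400
/* Stub standing in for LLVMAddEarlyCSEMemSSAPass(). Because the real
 * function is only declared by LLVM >= 4.0, the stub is guarded too. */
static void add_early_cse_memssa_pass(struct example_pass_manager *pm)
{
    pm->num_passes++;
}
#endif

/* Registers the optional pass only when the headers provide it and
 * returns the number of passes that were registered. */
static int setup_passes(struct example_pass_manager *pm)
{
    assert(pm != NULL);
    pm->num_passes = 0;
#if EXAMPLE_LLVM_VERSION >= 0x0400
    add_early_cse_memssa_pass(pm);
#endif
    return pm->num_passes;
}
```

    Building the sketch with -DEXAMPLE_LLVM_VERSION=0x0309 compiles out both the stub and the call, which is the property the fix relies on: an older LLVM never sees a reference to the missing symbol.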
* gallium/radeon: don't allocate HTILE in a separate buffer | Marek Olšák | 2017-06-08 | 8 | -59/+41
    Reviewed-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: rename depth decompress functions | Marek Olšák | 2017-06-08 | 1 | -16/+15
    Reviewed-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: rename shader resource decompress masks to their true meaning | Marek Olšák | 2017-06-08 | 3 | -28/+28
    Reviewed-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: rename is_compressed_colortex -> color_needs_decompression | Marek Olšák | 2017-06-08 | 1 | -5/+5
    Reviewed-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: disable the patch ID workaround on SI when the patch ID isn't used (v2) | Marek Olšák | 2017-06-08 | 2 | -15/+21
    The workaround causes a massive performance decrease on 1-SE parts (Cape Verde, Hainan, Oland).
    The performance regression is already part of 17.0 and 17.1.
    v2: check tess_uses_prim_id
    Cc: 17.0 17.1 <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't update dependent states if it has no effect (v2) | Marek Olšák | 2017-06-08 | 3 | -12/+76
    This and the previous clip_regs commit decrease IB sizes and the number of si_update_shaders invocations as follows:
                         IB size   si_update_shaders calls
      Borderlands 2       -10%      -27%
      Deus Ex: MD          -5%      -11%
      Talos Principle      -8%      -30%
    v2: always dirty cb_render_state in set_framebuffer_state
    Reviewed-by: Nicolai Hähnle <[email protected]>
* i915g: Add blitter_context argument. | Vinson Lee | 2017-06-08 | 1 | -1/+1
    Fix build error.
      CC       i915_surface.lo
    i915_surface.c:108:63: error: too few arguments to function call, expected 4, have 3
       util_blitter_default_src_texture(&src_templ, src, src_level);
       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                           ^
    ../../../../src/gallium/auxiliary/util/u_blitter.h:271:1: note: 'util_blitter_default_src_texture' declared here
    void util_blitter_default_src_texture(struct blitter_context *blitter,
    ^
    Fixes: a893c9169733 ("gallium/u_blitter: use 2D_ARRAY for cubemap blits if possible")
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101340
    Signed-off-by: Vinson Lee <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
    Reviewed-by: Juan A. Suarez Romero <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* etnaviv: flush resource when binding as sampler view | Lucas Stach | 2017-06-08 | 1 | -0/+3
    As TS is also allowed on sampler resources, we need to make sure to resolve to self when binding the resource as a texture, to avoid stale content being sampled.
    Signed-off-by: Lucas Stach <[email protected]>
* etnaviv: don't flush resource to self without TS | Lucas Stach | 2017-06-08 | 1 | -1/+1
    A resolve to self is only necessary if the resource is fast cleared, so there is never a need to do so if there is no TS allocated.
    Signed-off-by: Lucas Stach <[email protected]>
    Reviewed-by: Wladimir J. van der Laan <[email protected]>
* etnaviv: upgrade DISCARD_RANGE to DISCARD_WHOLE_RESOURCE if possible | Lucas Stach | 2017-06-08 | 1 | -0/+14
    Stolen from VC4. As we don't do any fancy reallocation tricks yet, it's possible to upgrade also coherent mappings and shared resources.
    Signed-off-by: Lucas Stach <[email protected]>
    Reviewed-by: Wladimir J. van der Laan <[email protected]>
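    Note: the flag upgrade named in this commit can be sketched roughly as below. The sketch assumes the upgrade applies when the mapped box covers the whole of a single-level resource; that condition, the EX_* flag values, and the ex_box / ex_resource types are illustrative assumptions, not the etnaviv or VC4 code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values; the real transfer flags are defined by Gallium. */
enum {
    EX_TRANSFER_DISCARD_RANGE          = 1 << 0,
    EX_TRANSFER_DISCARD_WHOLE_RESOURCE = 1 << 1
};

/* Simplified stand-ins for a transfer box and a resource description. */
struct ex_box      { int x, y, z, width, height, depth; };
struct ex_resource { int width, height, depth, last_level; };

/* True when the box starts at the origin and spans every dimension. */
static bool box_covers_resource(const struct ex_resource *res,
                                const struct ex_box *box)
{
    return box->x == 0 && box->y == 0 && box->z == 0 &&
           box->width == res->width &&
           box->height == res->height &&
           box->depth == res->depth;
}

/* Adds DISCARD_WHOLE_RESOURCE when a DISCARD_RANGE mapping in fact
 * discards everything the resource holds. */
static unsigned upgrade_discard(const struct ex_resource *res,
                                const struct ex_box *box,
                                unsigned usage)
{
    assert(res != NULL && box != NULL);
    if ((usage & EX_TRANSFER_DISCARD_RANGE) &&
        res->last_level == 0 &&            /* assumption: no other mip levels */
        box_covers_resource(res, box))
        usage |= EX_TRANSFER_DISCARD_WHOLE_RESOURCE;
    return usage;
}
```

    A partial box or a mapping without DISCARD_RANGE leaves the usage flags unchanged, so callers that need the old contents are unaffected.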
* etnaviv: simplify transfer tiling handling | Lucas Stach | 2017-06-08 | 1 | -41/+29
    There is no need to special case compressed resources, as they are already marked as linear on allocation. With that out of the way, there is room to cut down on the number of if clauses used.
    Signed-off-by: Lucas Stach <[email protected]>
    Reviewed-by: Wladimir J. van der Laan <[email protected]>
* etnaviv: don't read back resource if transfer discards contents | Lucas Stach | 2017-06-08 | 1 | -1/+3
    Reduces bandwidth usage of transfers which discard the buffer contents, as well as skipping unnecessary command stream flushes and CPU/GPU synchronization.
    Signed-off-by: Lucas Stach <[email protected]>
    Reviewed-by: Wladimir J. van der Laan <[email protected]>
* etnaviv: honor PIPE_TRANSFER_UNSYNCHRONIZED flag | Lucas Stach | 2017-06-08 | 1 | -12/+23
    This gets rid of quite a bit of CPU/GPU sync on frequent vertex buffer uploads and I haven't seen any of the issues mentioned in the comment, so this one seems stale.
    Ignore the flag if there exists a temporary resource, as those ones are never busy.
    Signed-off-by: Lucas Stach <[email protected]>
    Reviewed-by: Wladimir J. van der Laan <[email protected]>
* etnaviv: slim down resource waiting | Lucas Stach | 2017-06-08 | 4 | -23/+6
    cpu_prep() already does all the required waiting, so the only thing that needs to be done is flushing the commandstream, if a GPU write is pending.
    Signed-off-by: Lucas Stach <[email protected]>
    Reviewed-by: Wladimir J. van der Laan <[email protected]>
* radeonsi: Use libdrm to get chipset name | Samuel Li | 2017-06-07 | 3 | -1/+20
    v2: Add a func pointer to radeon_winsys to support radeon later.
    Change-Id: I614ea71424f9e5c97e4ae68654315d28c89eaa5f
    Signed-off-by: Samuel Li <[email protected]>
    Signed-off-by: Marek Olšák <[email protected]>
* util: Port nir_array functionality to u_dynarray | Thomas Helland | 2017-06-07 | 6 | -8/+8
    Reviewed-by: Marek Olšák <[email protected]>
    Reviewed-by: Eric Engestrom <[email protected]>
* util: Move u_dynarray to src/util | Thomas Helland | 2017-06-07 | 2 | -115/+0
    This will be used as the basis for unification
    Reviewed-by: Marek Olšák <[email protected]>
    Reviewed-by: Eric Engestrom <[email protected]>
* gallium: Add missing includes | Thomas Helland | 2017-06-07 | 4 | -0/+4
    These will need to be in place to avoid regressions when removing these includes from the u_dynarray
    Reviewed-by: Marek Olšák <[email protected]>
    Reviewed-by: Eric Engestrom <[email protected]>
* radeonsi: update clip_regs on shader state changes only when it's needed | Marek Olšák | 2017-06-07 | 1 | -3/+32
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: precompute some fields for PA_CL_VS_OUT_CNTL in si_shader_selector | Marek Olšák | 2017-06-07 | 4 | -16/+25
    Reviewed-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: add a new helper si_get_vs | Marek Olšák | 2017-06-07 | 2 | -17/+19
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: isolate real framebuffer changes from the decompression passes (v3) | Samuel Pitoiset | 2017-06-07 | 3 | -2/+28
    When a stencil buffer is part of the framebuffer state, it is decompressed but because it's bindless, all draw calls set stencil_dirty_level_mask to 1.
    v2: Marek
      - set the flags outside the loop
      - also clear and set framebuffer.do_update_surf_dirtiness there
      - do it in the DB->CB copy path too
    v3: Marek
      - save and restore the do_update_surf_dirtiness flag
    Signed-off-by: Marek Olšák <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: do EarlyCSEMemSSA LLVM pass | Marek Olšák | 2017-06-07 | 1 | -0/+2
    so that LLVM IR looks like CSE has been run on it. It's also recommended by the instruction combining pass.
    This also fixes:
      - GL45-CTS.arrays_of_arrays_gl.InteractionFunctionCalls2 (crash)
      - piglit/spec/arb_shader_ballot/execution/fs-readFirstInvocation-uint-loop (fail)
    The code size decrease is positive, the register usage isn't. There is a decrease in VGPR spilling for Tomb Raider, but an increase in DiRT Showdown and GRID Autosport.
    EarlyCSEMemSSA has a -0.01% change in code size compared to EarlyCSE.
      SGPRS: 1935420 -> 1938076 (0.14 %)
      VGPRS: 1645504 -> 1645988 (0.03 %)
      Spilled SGPRs: 2493 -> 2651 (6.34 %)
      Spilled VGPRs: 107 -> 115 (7.48 %)
      Private memory VGPRs: 1332 -> 1332 (0.00 %)
      Scratch size: 1512 -> 1516 (0.26 %) dwords per thread
      Code Size: 61981592 -> 61890012 (-0.15 %) bytes
      Max Waves: 371847 -> 371798 (-0.01 %)
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: remove 8 bytes from si_shader_key | Marek Olšák | 2017-06-07 | 3 | -14/+17
    We can use a union in si_shader_key::mono.
    Reviewed-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: move PSIZE and CLIPDIST unique IO indices after GENERIC | Marek Olšák | 2017-06-07 | 2 | -8/+14
    Heaven LDS usage for LS+HS is below. The masks are "outputs_written" for LS and HS. Note that 32K is the maximum size.
    Before:
      heaven_x64: ls=1f1 tcs=1f1, lds=32K
      heaven_x64: ls=31 tcs=31, lds=24K
      heaven_x64: ls=71 tcs=71, lds=28K
    After:
      heaven_x64: ls=3f tcs=3f, lds=24K
      heaven_x64: ls=7 tcs=7, lds=13K
      heaven_x64: ls=f tcs=f, lds=17K
    All other apps have a similar decrease in LDS usage, because the "outputs_written" masks are similar. Also, most apps don't write POSITION in these shader stages, so there is room for improvement. (tight per-component input/output packing might help even more)
    It's unknown whether this improves performance.
    Tested-by: Edmondo Tommasina <[email protected]>
    Tested-by: Dieter Nützel <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
* svga: Always set the alpha value to 1 when sampling using an XRGB view | Thomas Hellstrom | 2017-06-07 | 1 | -13/+30
    If the XRGB view is sampling from an ARGB svga format, change PIPE_SWIZZLE_W to PIPE_SWIZZLE_1 for all channels. Previously we unconditionally set PIPE_SWIZZLE_1 on the alpha channel which could be both insufficient and incorrect.
    Signed-off-by: Thomas Hellstrom <[email protected]>
    Reviewed-by: Charmaine Lee <[email protected]>
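    Note: the swizzle rewrite described here can be illustrated with the sketch below. The ex_swizzle enum and force_alpha_one are hypothetical stand-ins for the Gallium swizzle values and the driver code; only the rule itself (every channel that selects W is redirected to the constant 1) comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical swizzle selectors mirroring the X/Y/Z/W/0/1 choices. */
enum ex_swizzle {
    EX_SWIZZLE_X,
    EX_SWIZZLE_Y,
    EX_SWIZZLE_Z,
    EX_SWIZZLE_W,
    EX_SWIZZLE_0,
    EX_SWIZZLE_1
};

/* For an XRGB view of ARGB storage: any output channel that would read
 * the stored alpha (W) reads the constant 1 instead. Channels that read
 * X, Y, Z or a constant are left untouched. */
static void force_alpha_one(enum ex_swizzle swz[4])
{
    assert(swz != NULL);
    for (int i = 0; i < 4; i++) {
        if (swz[i] == EX_SWIZZLE_W)
            swz[i] = EX_SWIZZLE_1;
    }
}
```

    One plausible reading of "insufficient and incorrect": patching only the fourth output misses views that route W into a colour channel, and it clobbers views whose fourth output does not select W at all. Checking each selector covers both cases.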
* svga: Fix imported surface view creation | Thomas Hellstrom | 2017-06-07 | 4 | -11/+33
    When deciding to create a view with or without an alpha channel we need to look at the SVGA3D format and not the PIPE format.
    This fixes the glx-tfp piglit test for dri3/xa.
    Signed-off-by: Thomas Hellstrom <[email protected]>
    Reviewed-by: Charmaine Lee <[email protected]>
* svga: Set alpha to 1 for non-alpha views | Thomas Hellstrom | 2017-06-07 | 1 | -0/+18
    Gallium RGB textures may be backed by imported ARGB svga3d surfaces. In those and similar cases we need to set the alpha value to 1 when sampling.
    Fixes piglit glx::glx-tfp
    Signed-off-by: Thomas Hellstrom <[email protected]>
    Reviewed-by: Brian Paul <[email protected]>
    Reviewed-by: Charmaine Lee <[email protected]>
* svga: Allow format differences in 16-bit RGBA surface sharing | Thomas Hellstrom | 2017-06-07 | 1 | -1/+5
    For the purpose of surface sharing, treat SVGA3D_R5G6B5 and SVGA3D_B5G6R5_UNORM as identical formats.
    This fixes the following piglit tests with dri3/xa:
      glx@glx-visuals-depth -pixmap
      glx@glx-visuals-stencil -pixmap
    Signed-off-by: Thomas Hellstrom <[email protected]>
    Reviewed-by: Deepak Singh Rawat <[email protected]>
    Reviewed-by: Charmaine Lee <[email protected]>
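    Note: one way to express "treat two formats as identical for sharing" is to map both to a canonical value before comparing, as sketched below. The ex_format enum, its members, and both functions are hypothetical; the real SVGA3D format enum and the driver's comparison code are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical format tags standing in for SVGA3D surface formats. */
enum ex_format {
    EX_FORMAT_R5G6B5,
    EX_FORMAT_B5G6R5_UNORM,
    EX_FORMAT_B8G8R8A8_UNORM
};

/* Collapses formats that count as identical for sharing purposes
 * onto a single representative. */
static enum ex_format canonical_share_format(enum ex_format f)
{
    return f == EX_FORMAT_B5G6R5_UNORM ? EX_FORMAT_R5G6B5 : f;
}

/* True when a surface created with format a may be imported as b. */
static bool formats_shareable(enum ex_format a, enum ex_format b)
{
    return canonical_share_format(a) == canonical_share_format(b);
}

/* Small self-check showing the intended behaviour. */
static void check_share_rules(void)
{
    assert(formats_shareable(EX_FORMAT_R5G6B5, EX_FORMAT_B5G6R5_UNORM));
    assert(!formats_shareable(EX_FORMAT_R5G6B5, EX_FORMAT_B8G8R8A8_UNORM));
}
```

    Canonicalising first keeps the comparison symmetric, so the order in which the two formats are passed does not matter.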
* st/dri: Allow gallium drivers to turn off two GLX extensions | Thomas Hellstrom | 2017-06-07 | 1 | -0/+2
    Allow gallium drivers to turn off GLX_EXT_buffer_age and GLX_OML_sync_control if needed, using driconf.
    Signed-off-by: Thomas Hellstrom <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* st/dri: Allow dri users to query also driver options | Thomas Hellstrom | 2017-06-07 | 1 | -1/+64
    There will be situations where we want to control, for example, the GLX behaviour based on applications and drivers. So allow DRI users access to the driver options.
    Signed-off-by: Thomas Hellstrom <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: clean up decompress blend state names | Marek Olšák | 2017-06-07 | 4 | -10/+10
    Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: clean up a misleading statement from the old days | Marek Olšák | 2017-06-07 | 1 | -4/+1
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't use 1D tiling for Z/S on VI to get TC-compatible HTILE | Marek Olšák | 2017-06-07 | 1 | -3/+13
    It's always good to have fewer decompress blits.
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: enable TC-compatible stencil compression on VI | Marek Olšák | 2017-06-07 | 3 | -5/+8
    Most things are in place. Ideally we won't see decompress blits for stencil anymore.
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi/gfx9: prevent a race when the previous shader's main part is missing | Marek Olšák | 2017-06-07 | 1 | -0/+2
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi/gfx9: wait for main part compilation of 1st shaders of merged shaders | Marek Olšák | 2017-06-07 | 1 | -0/+4
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi/gfx9: fix LS scratch buffer support without TCS for GFX9 | Marek Olšák | 2017-06-07 | 1 | -3/+18
    LS is merged into TCS. If there is no TCS, LS is merged into fixed-func TCS. The problem is the fixed-func TCS was ignored by scratch update functions, so LS didn't have the scratch buffer set up.
    Note that Mesa 17.1 doesn't have merged shaders.
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: move streamout state update out of si_update_shaders | Marek Olšák | 2017-06-07 | 2 | -16/+25
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: remove dead code in declare_input_fs | Marek Olšák | 2017-06-07 | 1 | -5/+0
    Colors are interpolated in the PS prolog. This was never used.
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: move handling of DBG_NO_OPT_VARIANT into si_shader_selector_key | Marek Olšák | 2017-06-07 | 1 | -4/+3
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: use a compiler queue with a low priority for optimized shaders | Marek Olšák | 2017-06-07 | 3 | -8/+34
    Reviewed-by: Nicolai Hähnle <[email protected]>
* util/u_queue: add an option to set the minimum thread priority | Marek Olšák | 2017-06-07 | 5 | -5/+5
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: decrease the number of compiler threads to num CPUs - 1 | Marek Olšák | 2017-06-07 | 1 | -1/+4
    Reserve one core for other things (like draw calls).
    Reviewed-by: Nicolai Hähnle <[email protected]>
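    Note: the "num CPUs - 1" sizing can be sketched as below. The lower bound of one thread and the max_threads cap are assumptions added for the sketch so that single-core machines and fixed-size thread arrays are handled; compiler_thread_count is a hypothetical name, not the radeonsi function.

```c
#include <assert.h>

/* Returns how many compiler threads to start: one fewer than the number
 * of CPUs, so a core stays free for other work such as draw calls.
 * Assumptions of this sketch: never fewer than one thread, and never
 * more than max_threads (a caller-provided upper limit). */
static unsigned compiler_thread_count(unsigned num_cpus, unsigned max_threads)
{
    assert(max_threads >= 1);
    unsigned n = num_cpus > 1 ? num_cpus - 1 : 1;
    return n < max_threads ? n : max_threads;
}
```

    On a POSIX system num_cpus could come from sysconf(_SC_NPROCESSORS_ONLN); the sketch takes it as a parameter so the arithmetic can be checked in isolation.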
* radeonsi: drop unfinished shader compilations when destroying shaders | Marek Olšák | 2017-06-07 | 2 | -3/+5
    If we enqueue too many jobs and destroy the GL context, it may take several seconds before the jobs finish. Just drop them instead.
    Reviewed-by: Nicolai Hähnle <[email protected]>
* freedreno/a5xx: set SP_BLEND_CONTROL properly | Rob Clark | 2017-06-07 | 3 | -1/+4
    Signed-off-by: Rob Clark <[email protected]>
* freedreno/a5xx: LRZ support | Rob Clark | 2017-06-07 | 14 | -14/+234
    Signed-off-by: Rob Clark <[email protected]>
* freedreno: drop timestamp field | Rob Clark | 2017-06-07 | 2 | -3/+0
    unused.
    Signed-off-by: Rob Clark <[email protected]>
* freedreno/a5xx: refactor out helper for LRZ flush | Rob Clark | 2017-06-07 | 3 | -11/+19
    Signed-off-by: Rob Clark <[email protected]>