summaryrefslogtreecommitdiffstats
path: root/src
Commit message (Collapse)AuthorAgeFilesLines
* gallium: Add missing includesThomas Helland2017-06-074-0/+4
| | | | | | | | These will need to be in place to avoid regressions when removing these includes from the u_dynarray Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
* radeonsi: update clip_regs on shader state changes only when it's neededMarek Olšák2017-06-071-3/+32
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: precompute some fields for PA_CL_VS_OUT_CNTL in si_shader_selectorMarek Olšák2017-06-074-16/+25
| | | | | Reviewed-by: Samuel Pitoiset <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: add a new helper si_get_vsMarek Olšák2017-06-072-17/+19
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: isolate real framebuffer changes from the decompression passes (v3)Samuel Pitoiset2017-06-073-2/+28
| | | | | | | | | | | | | | When a stencil buffer is part of the framebuffer state, it is decompressed but because it's bindless, all draw calls set stencil_dirty_level_mask to 1. v2: Marek - set the flags outside the loop - also clear and set framebuffer.do_update_surf_dirtiness there - do it in the DB->CB copy path too v3: Marek - save and restore the do_update_surf_dirtiness flag Signed-off-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: do EarlyCSEMemSSA LLVM passMarek Olšák2017-06-071-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | so that LLVM IR looks like CSE has been run on it. It's also recommended by the instruction combining pass. This also fixes: - GL45-CTS.arrays_of_arrays_gl.InteractionFunctionCalls2 (crash) - piglit/spec/arb_shader_ballot/execution/fs-readFirstInvocation-uint-loop (fail) The code size decrease is positive, the register usage isn't. There is a decrease in VGPR spilling for Tomb Raider, but increase in DiRT Showdown and GRID Autosport. EarlyCSEMemSSA has a -0.01% change in code size compared EarlyCSE. SGPRS: 1935420 -> 1938076 (0.14 %) VGPRS: 1645504 -> 1645988 (0.03 %) Spilled SGPRs: 2493 -> 2651 (6.34 %) Spilled VGPRs: 107 -> 115 (7.48 %) Private memory VGPRs: 1332 -> 1332 (0.00 %) Scratch size: 1512 -> 1516 (0.26 %) dwords per thread Code Size: 61981592 -> 61890012 (-0.15 %) bytes Max Waves: 371847 -> 371798 (-0.01 %) Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: remove 8 bytes from si_shader_keyMarek Olšák2017-06-073-14/+17
| | | | | | | We can use a union in si_shader_key::mono. Reviewed-by: Samuel Pitoiset <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: move PSIZE and CLIPDIST unique IO indices after GENERICMarek Olšák2017-06-072-8/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | Heaven LDS usage for LS+HS is below. The masks are "outputs_written" for LS and HS. Note that 32K is the maximum size. Before: heaven_x64: ls=1f1 tcs=1f1, lds=32K heaven_x64: ls=31 tcs=31, lds=24K heaven_x64: ls=71 tcs=71, lds=28K After: heaven_x64: ls=3f tcs=3f, lds=24K heaven_x64: ls=7 tcs=7, lds=13K heaven_x64: ls=f tcs=f, lds=17K All other apps have a similar decrease in LDS usage, because the "outputs_written" masks are similar. Also, most apps don't write POSITION in these shader stages, so there is room for improvement. (tight per-component input/output packing might help even more) It's unknown whether this improves performance. Tested-by: Edmondo Tommasina <[email protected]> Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* svga: Always set the alpha value to 1 when sampling using an XRGB viewThomas Hellstrom2017-06-071-13/+30
| | | | | | | | | | If the XRGB view is sampling from an ARGB svga format, change PIPE_SWIZZLE_W to PIPE_SWIZZLE_1 for all channels. Previously we unconditionally set PIPE_SWIZZLE_1 on the alpha channel which could be both insufficient and incorrect. Signed-off-by: Thomas Hellstrom <[email protected]> Reviewed-by: Charmaine Lee <[email protected]>
* svga: Fix imported surface view creationThomas Hellstrom2017-06-074-11/+33
| | | | | | | | | | When deciding to create a view with or without an alpha channel we need to look at the SVGA3D format and not the PIPE format. This fixes the glx-tfp piglit test for dri3/xa. Signed-off-by: Thomas Hellstrom <[email protected]> Reviewed-by: Charmaine Lee <[email protected]>
* svga: Set alpha to 1 for non-alpha viewsThomas Hellstrom2017-06-071-0/+18
| | | | | | | | | | | Gallium RGB textures may be backed by imported ARGB svga3d surfaces. In those and similar cases we need to set the alpha value to 1 when sampling. Fixes piglit glx::glx-tfp Signed-off-by: Thomas Hellstrom <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Charmaine Lee <[email protected]>
* svga: Allow format differences in 16-bit RGBA surface sharingThomas Hellstrom2017-06-071-1/+5
| | | | | | | | | | | | | For the purpose of surface sharing, treat SVGA3D_R5G6B5 and SVGA3D_B5G6R5_UNORM as identical formats. This fixes the following piglit tests with dri3/xa: glx@glx-visuals-depth -pixmap glx@glx-visuals-stencil -pixmap Signed-off-by: Thomas Hellstrom <[email protected]> Reviewed-by: Deepak Singh Rawat <[email protected]> Reviewed-by: Charmaine Lee <[email protected]>
* dri/vmwgfx: Disable a couple of glx extensions also for Ubuntu unity / compizThomas Hellstrom2017-06-071-0/+4
| | | | | | | | | | | It appears like the GLX_EXT_buffer_age extension also prevents Compiz / Ubuntu Unity from performing partial buffer swaps when it otherwise feels like doing so. So try to get them back again. We also disable GLX_OML_sync_control since it appears it had a favourable impact on gnome-shell. Signed-off-by: Thomas Hellstrom <[email protected]> Reviewed-by: Sinclair Yeh <[email protected]>
* dri: Turn of a couple of glx extensions for gnome-shell on vmwgfx.Thomas Hellstrom2017-06-071-0/+7
| | | | | | | | Increases performance on vmwgfx since we're avoiding full buffer damage and since we can't sync to vertical retrace anyway. Signed-off-by: Thomas Hellstrom <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* st/dri: Allow gallium drivers to turn off two GLX extensionsThomas Hellstrom2017-06-071-0/+2
| | | | | | | | Allow gallium drivers to turn off GLX_EXT_buffer_age and GLX_OML_sync_control if needed, using driconf. Signed-off-by: Thomas Hellstrom <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* dri: Optionally turn off a couple of GLX extensions based on driconf optionsThomas Hellstrom2017-06-073-3/+25
| | | | | | | | | | | | With GLX_EXT_buffer_age turned on, gnome-shell will use full-screen damage with GLX, which severely hurts performance with architectures that emulate page-flips with copies. Like vmware. We would like to be able to turn off that extension. Similarly, typically the GLX_OML_sync_control doesn't make much sense on a virtual architecture since we don't really sync to the host's vertical retrace. We'd like to be able to turn it off as well. Signed-off-by: Thomas Hellstrom <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* st/dri: Allow dri users to query also driver optionsThomas Hellstrom2017-06-071-1/+64
| | | | | | | | | There will be situations where we want to control, for example, the GLX behaviour based on applications and drivers. So allow DRI users access to the driver options. Signed-off-by: Thomas Hellstrom <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: clean up decompress blend state namesMarek Olšák2017-06-074-10/+10
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: clean up a misleading statement from the old daysMarek Olšák2017-06-071-4/+1
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't use 1D tiling for Z/S on VI to get TC-compatible HTILEMarek Olšák2017-06-071-3/+13
| | | | | | It's always good to have fewer decompress blits. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: enable TC-compatible stencil compression on VIMarek Olšák2017-06-073-5/+8
| | | | | | | Most things are in place. Ideally we won't see decompress blits for stencil anymore. Reviewed-by: Nicolai Hähnle <[email protected]>
* st/mesa: don't keep framebuffer state in st_contextMarek Olšák2017-06-077-51/+48
| | | | | Reviewed-by: Samuel Pitoiset <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* st/mesa: cache pipe_surface for GL_FRAMEBUFFER_SRGB changesMarek Olšák2017-06-074-29/+41
| | | | | Reviewed-by: Samuel Pitoiset <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* st/mesa: use gl_driver_flags::NewFramebufferSRGBMarek Olšák2017-06-071-3/+6
| | | | | | | also call st_init_driver_flags when st_context is initialized. Reviewed-by: Samuel Pitoiset <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* mesa: add gl_driver_flags::NewFramebufferSRGBMarek Olšák2017-06-072-1/+7
| | | | | | | _NEW_BUFFERS updates too much stuff. Reviewed-by: Samuel Pitoiset <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi/gfx9: prevent a race when the previous shader's main part is missingMarek Olšák2017-06-071-0/+2
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi/gfx9: wait for main part compilation of 1st shaders of merged shadersMarek Olšák2017-06-071-0/+4
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi/gfx9: fix LS scratch buffer support without TCS for GFX9Marek Olšák2017-06-071-3/+18
| | | | | | | | | | LS is merged into TCS. If there is no TCS, LS is merged into fixed-func TCS. The problem is the fixed-func TCS was ignored by scratch update functions, so LS didn't have the scratch buffer set up. Note that Mesa 17.1 doesn't have merged shaders. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: move streamout state update out of si_update_shadersMarek Olšák2017-06-072-16/+25
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: remove dead code in declare_input_fsMarek Olšák2017-06-071-5/+0
| | | | | | Colors are interpolated in the PS prolog. This was never used. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: move handling of DBG_NO_OPT_VARIANT into si_shader_selector_keyMarek Olšák2017-06-071-4/+3
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: use a compiler queue with a low priority for optimized shadersMarek Olšák2017-06-073-8/+34
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* util/u_queue: add an option to set the minimum thread priorityMarek Olšák2017-06-078-8/+29
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: decrease the number of compiler threads to num CPUs - 1Marek Olšák2017-06-071-1/+4
| | | | | | Reserve one core for other things (like draw calls). Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: drop unfinished shader compilations when destroying shadersMarek Olšák2017-06-072-3/+5
| | | | | | | If we enqueue too many jobs and destroy the GL context, it may take several seconds before the jobs finish. Just drop them instead. Reviewed-by: Nicolai Hähnle <[email protected]>
* util/u_queue: add a way to remove a job when we just want to destroy itMarek Olšák2017-06-072-6/+49
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* freedreno/a5xx: set SP_BLEND_CONTROL properlyRob Clark2017-06-073-1/+4
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/a5xx: LRZ supportRob Clark2017-06-0714-14/+234
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno: drop timestamp fieldRob Clark2017-06-072-3/+0
| | | | | | unused. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a5xx: refactor out helper for LRZ flushRob Clark2017-06-073-11/+19
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno: reshuffle FD_MESA_DEBUG bitmaskRob Clark2017-06-071-3/+3
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno: update generated headersRob Clark2017-06-077-17/+31
| | | | Signed-off-by: Rob Clark <[email protected]>
* gallium/u_blitter: use 2D_ARRAY for cubemap blits if possibleMarek Olšák2017-06-075-22/+39
| | | | | | | | so that we can use TXF. The cubemap blit pixel shader code size: 148 -> 92 bytes Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/u_blitter: use TXF if possibleMarek Olšák2017-06-071-102/+190
| | | | | | | | | | | | | | | | | | This fixes piglit: arb_texture_view-rendering-r32ui TEX (image_sample) flushes denorms to 0 with FP32 textures on GCN, but such a texture can contain integer data written using an integer render view. If we do a transfer blit with TEX, denorms are flushed to 0. Luckily, TXF (image_load) doesn't do that. TXF also doesn't need to load the sampler state, so blit shaders don't have to do s_load_dwordx4. TXF doesn't do CLAMP_TO_EDGE, so it can only be used if the src box is in bounds, or if we clamp manually (this commit doesn't). Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/u_blitter: use TEX_LZ if it's supportedMarek Olšák2017-06-071-4/+8
| | | | | | | The sampler views always have first_level == last_level. Now radeonsi doesn't have to use the WQM. (a few SALU removed) Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/util: add _LZ and TXF options to simple shadersMarek Olšák2017-06-076-33/+77
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/ureg: add TEX/TXF_LZ opcodes to uregMarek Olšák2017-06-071-0/+2
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* i965: Use BLORP for all HiZ opsJason Ekstrand2017-06-073-162/+10
| | | | | | | | BLORP has been capable of doing gen8-style HiZ ops for a while now. We might as well start using it. The one downside is that this may cause a bit more state emission since we still re-emit most things for BLORP. Reviewed-by: Topi Pohjolainen <[email protected]>
* blorp: Use FullSurfaceDepthandStencilClear for blorp_hiz_opJason Ekstrand2017-06-073-0/+5
| | | | | | | The blorp_hiz_op entrypoint always acts on a full subresource of a HiZ buffer so we can just set the flag unconditionally. Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Move the post-HiZ-clear flush/stall to intel_hiz_execJason Ekstrand2017-06-072-16/+18
| | | | | | | | This also changes it to be predicated so we only do the flush/stall on clears and HiZ resolves. The docs only say it's needed for clears but empirical evidence says it's also needed for HiZ resolves. Reviewed-by: Topi Pohjolainen <[email protected]>