summaryrefslogtreecommitdiffstats
path: root/src
Commit message (Collapse)AuthorAgeFilesLines
* st/mesa/glsl/i965: move ShaderStorageBlocks to gl_programTimothy Arceri2017-01-066-10/+11
| | | | | | | | | | | | Having it here rather than in gl_linked_shader allows us to simplify the code. Also it is error prone to depend on the gl_linked_shader for programs in current use because a failed linking attempt will free infomation about the current program. In i965 we could be trying to recompile a shader variant but may have lost some required fields. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
* st/mesa/glsl/i965: set num_ssbos directly in shader_infoTimothy Arceri2017-01-069-23/+24
| | | | | | | Here we also remove the duplicate field in gl_linked_shader and always get the value from shader_info instead. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
* st/mesa/glsl/i965: move per stage UniformBlocks to gl_programTimothy Arceri2017-01-067-21/+19
| | | | | | | This will help allow us to store pointers to gl_program structs in the CurrentProgram array resulting in a bunch of code simplifications. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
* st/mesa/glsl/i965: set num_ubos directly in shader_infoTimothy Arceri2017-01-069-21/+17
| | | | | | | This also removes the duplicate field in gl_linked_shader, and gets num_ubos from shader_info instead. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
* st/mesa/glsl/i965: move ImageUnits and ImageAccess fields to gl_programTimothy Arceri2017-01-0612-67/+50
| | | | | | | | | | | | | | | Having it here rather than in gl_linked_shader allows us to simplify the code. Also it is error prone to depend on the gl_linked_shader for programs in current use because a failed linking attempt will free infomation about the current program. In i965 we could be trying to recompile a shader variant but may have lost some required fields. We drop the memset on ImageUnits because gl_program is already created using rzalloc(). Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
* i965: get InfoLog and LinkStatus via the pointer in gl_programTimothy Arceri2017-01-061-4/+4
| | | | Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
* i965: get shared_size from shader_info rather than gl_shader_programTimothy Arceri2017-01-061-2/+2
| | | | Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
* i965: stop depending on gl_shader_program for brw_compute_vue_map() paramsTimothy Arceri2017-01-061-1/+1
| | | | | | | | This removes another dependency on gl_shader_program from the codegen functions, this will help allow us to use gl_program for the CurrentProgram array rather than gl_shader_program. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
* i965: pass gl_program to the brw_*_debug_recompile() functionsTimothy Arceri2017-01-067-138/+125
| | | | | | | | | | | | | | Rather then passing gl_shader_program. The only field use was Name which is the same as the Id field in gl_program. For wm and vs we also make the functions static and move them before the codegen functions. This change reduces the codegen functions dependency on gl_shader_program. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
* gallivm: (trivial) fix typo bug with small AoS format unpackingRoland Scheidegger2017-01-061-1/+1
| | | | | | | Fix typo using wrong (uninitialized) build context introduced by 4634cb5921b985f04f2daf00cda2d28036143bd3. (This only affects very rare small packed formats which have a PIPE_SWIZZLE_0 channel, such as r4a4, which is never used by mesa/st. Nevertheless it broke lp_test_format.)
* gallivm: implement aos unpack (to unorm8) for small unorm formatsRoland Scheidegger2017-01-052-17/+155
| | | | | | | | | | | | | | | | | | | Using bit replication. This path now resembles something which might make sense. (The logic was mostly copied from llvmpipe fs backend.) I am not convinced though it is actually faster than SoA sampling (actually I'm quite certain it's always a loss with AVX). With SoA it's just shift/mask/cvt/mul for getting the colors, whereas there's still roughly 3 shifts, 3 or/and per channel for AoS (i.e. for SoA it's exactly the same as it would be for a rgba8 format, whereas the extra effort for AoS is significant). The filtering might still be faster (albeit with FMA the instruction count gets down quite a bit there on the SoA float filtering path on new cpus). And those small unorm formats often don't have an alpha channel (which makes things worse relatively for AoS path). (This also fixes a trivial bug in the llvmpipe fs code this was derived from, albeit it was only relevant for 4-bit channels.) Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
* gallivm: optimize lp_build_unpack_arith_rgba_aos slightlyRoland Scheidegger2017-01-051-19/+97
| | | | | | | | | | | | | | | | | This code uses a vector shift which has to be emulated on x86 unless there's AVX2. Luckily in some cases we can actually avoid the shift altogether, so do that. Also make sure we hit the fast lp_build_conv() path when applicable, albeit that's quite the hack... That said, this path is taken for AoS sampling for small unorm (smaller than rgba8) formats, and it is completely hopeless even with those changes, with or without AVX. (Probably should have some code similar to the one in the llvmpipe fs backend code, using bit replication to extend to rgba8888 - rounding is not quite 100% accurate but if it's good enough there it should be here as well.) Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
* gallivm: use 2 srcs for 32->16bit conversions in lp_bld_conv_autoRoland Scheidegger2017-01-051-2/+19
| | | | | | | | | | | | | | | | | If we only feed one source vector at a time, we cannot use pack intrinsics (as we only have a 64bit destination dst vector). lp_bld_conv_auto is specifically designed to alter the length and number of destination vectors, so this works just fine (if we use single source vectors at a time, afterwards we immediately reassemble the vectors). For AVX though this isn't really possible, since we expect 128bit output already for a single 256bit input. (One day we should handle AVX2 which again would need multiple inputs, however there's the problem that we get different ordered output there and we don't want to reorder, so would need to be able to tell build_conv to handle upper and lower halfs independently.) A similar strategy would probably work for 32->8bit too (if it doesn't hit the special case) but I'm going to try something different for that... Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
* llvmpipe: (trivial) minimally simplify mask constructionRoland Scheidegger2017-01-053-17/+53
| | | | | | | | | | | | | | | | simd instruction sets usually have comparisons for equal, not unequal. So use a different comparison against the mask itself - which also means we don't need a all-zero as well as a all-one (for the pxor) reg. Also add code to avoid scalar expansion of i1 values which we definitely shouldn't do. There's problems with this though with llvm select interaction, so it's disabled (basically using llvm select instead of intrinsics may still produce atrocious code, even in cases where we figured it should not, albeit I think this could probably be fixed with some better selection of optimization passes, but I have zero idea there really). Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
* anv: fix multiple creation with internal failureLionel Landwerlin2017-01-051-18/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | The specification section 9.4 says : When an application attempts to create many pipelines in a single command, it is possible that some subset may fail creation. In that case, the corresponding entries in the pPipelines output array will be filled with VK_NULL_HANDLE values. If any pipeline fails creation (for example, due to out of memory errors), the vkCreate*Pipelines commands will return an error code. The implementation will attempt to create all pipelines, and only return VK_NULL_HANDLE values for those that actually failed. Fixes : dEQP-VK.api.object_management.alloc_callback_fail_multiple.graphics_pipeline dEQP-VK.api.object_management.alloc_callback_fail_multiple.compute_pipeline v2: C is hard let's go shopping (Lionel) v3: Remove unnecessary condition in for loops (Lionel) v4: Document why we return on first failure (Eduardo) Move i declaration inside for() (Eduardo) v5: Move array cleanup out of loop (Jason) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
* swr: [rasterizer core/common/jitter] gl_double supportTim Rowley2017-01-059-33/+341
| | | | | Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99214 Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
* dri3: Fix MakeCurrent without a default framebufferFredrik Höglund2017-01-051-4/+10
| | | | | | | | | | In OpenGL 3.0 and later it is legal to make a context current without a default framebuffer. This has been broken since DRI3 support was introduced. Cc: "13.0 12.0" <mesa-stable@lists.freedesktop.org> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: turn SDMA IBs into de-facto preambles of GFX IBsMarek Olšák2017-01-052-6/+18
| | | | | | | | | | Draw calls no longer flush SDMA IBs. r600_need_dma_space is responsible for synchronizing execution between both IBs. Initial buffer clears and fast clears will stay unflushed in the SDMA IB (up to 64 MB) as long as the GFX IB isn't flushed either. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: implement SDMA-based buffer clearing for SIMarek Olšák2017-01-052-1/+41
| | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: do all math in bytes in SI DMA codeMarek Olšák2017-01-052-19/+21
| | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* gallium/radeon: prevent SDMA stalls by detecting RAW hazards in need_dma_spaceMarek Olšák2017-01-058-36/+27
| | | | | | | | Call r600_dma_emit_wait_idle only when there is a possibility of a read-after-write hazard. Buffers not yet used by the SDMA IB don't have to wait. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* gallium/radeon: move unrelated code from dma_emit_wait_idle to need_dma_spaceMarek Olšák2017-01-051-18/+15
| | | | | | | | r600_dma_emit_wait_idle is going away in its current form. The only difference is that the moved code is executed before DMA calls instead of after them. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: inline cik_sdma_do_copy_bufferMarek Olšák2017-01-051-28/+16
| | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: also wait for SDMA in the clear_buffer CPU fallbackMarek Olšák2017-01-051-3/+2
| | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: simplify r600_resource typecasts in si_clear_bufferMarek Olšák2017-01-051-5/+5
| | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: always use SDMA for big buffer clears and first buffer usesMarek Olšák2017-01-051-0/+20
| | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: use SDMA in rvid_buffer_clear on CIK-VIMarek Olšák2017-01-051-2/+2
| | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: use SDMA for initial clearing of DCC/CMASK/HTILE on CIK-VIMarek Olšák2017-01-053-8/+6
| | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: implement SDMA-based buffer clearing for CIK-VIMarek Olšák2017-01-053-0/+54
| | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* gallium/hud: increase the vertex buffer size for textMarek Olšák2017-01-051-1/+1
| | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* gallium/hud: add an option to sort items below graphsMarek Olšák2017-01-052-5/+33
| | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* gallium/hud: add an option to reset the color counterMarek Olšák2017-01-052-3/+19
| | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* gallium/hud: allow more data sources per paneMarek Olšák2017-01-051-13/+15
| | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* gallium/hud: add an option to rename each data sourceMarek Olšák2017-01-051-2/+35
| | | | | | | | useful for radeonsi performance counters v2: allow specifying both : and = Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* gallium: remove TGSI_OPCODE_SUBMarek Olšák2017-01-0533-202/+82
| | | | | | It's redundant with the source modifier. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* gallium: remove TGSI_OPCODE_ABSMarek Olšák2017-01-0523-96/+41
| | | | | | It's redundant with the source modifier. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* st/nine: Remove all usage of ureg_SUB in nine_shaderAxel Davy2017-01-051-8/+8
| | | | | | | This is required to drop gallium SUB. Signed-off-by: Axel Davy <axel.davy@ens.fr> Signed-off-by: Marek Olšák <marek.olsak@amd.com>
* st/nine: Remove all usage of ureg_SUB in nine_ffAxel Davy2017-01-051-20/+20
| | | | | | | This is required to remove gallium SUB. Signed-off-by: Axel Davy <axel.davy@ens.fr> Signed-off-by: Marek Olšák <marek.olsak@amd.com>
* st/nine: Do not map SUB and ABS to their gallium equivalent.Axel Davy2017-01-051-2/+23
| | | | | | | This is required for gallium SUB and ABS to be removed. Signed-off-by: Axel Davy <axel.davy@ens.fr> Signed-off-by: Marek Olšák <marek.olsak@amd.com>
* st/mesa: fix a segfault when prog->sh.data is NULLMarek Olšák2017-01-051-1/+3
| | | | | | | Broken by: st/mesa: get Version from gl_program rather than gl_shader_program Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
* st/va: fix incorrect argument in vl_compositor_cleanupNayan Deshmukh2017-01-051-1/+1
| | | | | | | | This fixes the mistake introduced in commit b6737a8bcd03ea68952799144c0c6e6e6679bee9 Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com>
* swr: remove unneeded llvm version checkTim Rowley2017-01-051-4/+0
| | | | | | | Old test caused breakage with llvm-svn (4.0.0svn), and not needed as the minimum required llvm version for swr is 3.6. Reviewed-by: George Kyriazis <george.kyriazis@intel.com>
* swr: fix windows build breakGeorge Kyriazis2017-01-052-4/+7
| | | | | | | | | | wrap lp_bld_type.h around extern "C". Windows decorates global variables, so when used from .cpp files, need to use an undecorated version. Also, removed related and unneeded code from swr_screen.cpp Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
* radeonsi: update clip_regs if clip_disable changes to fix a hangMarek Olšák2017-01-051-0/+5
| | | | | | | | | | | | | | This seems to fix the GPU hangs caused by: commit ed3190b3f3a776fc8c75b1e6130a88079166d115 Author: Marek Olšák <marek.olsak@amd.com> Date: Sun Nov 13 18:41:43 2016 +0100 radeonsi: don't export ClipVertex and ClipDistance[] if clipping is disabled Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99219 Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
* st/mesa: enable GLSLOptimizeConservatively for drivers that want itMarek Olšák2017-01-051-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | GLSL compilation now takes 24% less time with the Gallium noop driver. I used my shader-db for the measurement. The difference for the whole radeonsi driver can be ~10%. The generated TGSI is mostly the same. For example, the compilation success rate with a TGSI->GCN bytecode converter without any optimizations is the same. Note that glsl_to_tgsi does its own copy propagation and simple register allocation. shader-db GCN report: - Talos spills fewer SGPRs. - DOTA 2 spills more SGPRs. - The average shader-db score is better, but it's just due to randomness. 29045 shaders in 17564 tests Totals: SGPRS: 1325929 -> 1325017 (-0.07 %) VGPRS: 1010808 -> 1010172 (-0.06 %) Spilled SGPRs: 1432 -> 1399 (-2.30 %) Spilled VGPRs: 93 -> 92 (-1.08 %) Private memory VGPRs: 688 -> 688 (0.00 %) Scratch size: 2540 -> 2484 (-2.20 %) dwords per thread Code Size: 39336732 -> 39342936 (0.02 %) bytes Max Waves: 217937 -> 217969 (0.01 %) Reviewed-by: Eric Anholt <eric@anholt.net>
* glsl_to_tgsi: do fewer optimizations with GLSLOptimizeConservativelyMarek Olšák2017-01-051-9/+67
| | | | Reviewed-by: Eric Anholt <eric@anholt.net>
* mesa: add gl_constants::GLSLOptimizeConservativelyMarek Olšák2017-01-054-10/+37
| | | | | | to reduce the amount of GLSL optimizations for drivers that can do better. Reviewed-by: Eric Anholt <eric@anholt.net>
* gallium: add PIPE_CAP_GLSL_OPTIMIZE_CONSERVATIVELYMarek Olšák2017-01-0517-0/+19
| | | | | | Drivers with good compilers don't need aggressive optimizations before TGSI. Reviewed-by: Eric Anholt <eric@anholt.net>
* glsl: run do_lower_jumps properly in do_common_optimizationsMarek Olšák2017-01-053-10/+3
| | | | | | so that backends don't have to run it manually Reviewed-by: Eric Anholt <eric@anholt.net>
* i965: Print VS output VUE map in Vulkan too.Kenneth Graunke2017-01-052-3/+5
| | | | | | | We need to move this to the shared layer. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>