path: root/src/gallium
Commit message | Author | Age | Files | Lines
* gallium: add st_context_iface::flush_resource to call FLUSH_VERTICES | Marek Olšák | 2020-01-17 | 1 | -0/+7
|
* gallium/swr: Disable showing detected arch message. | Krzysztof Raszkowski | 2020-01-17 | 3 | -16/+29
|
| When the swr driver is in use, it prints a detected-architecture
| message to standard error. This can be harmful when swr is used in
| multi-node environments.
|
| The message can be enabled again by setting the environment variable
| SWR_PRINT_INFO to 1.
|
| Reviewed-by: Jan Zielinski <[email protected]>
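A minimal sketch of how such an opt-in switch could be checked, assuming the message is printed only when SWR_PRINT_INFO is exactly "1". The helper name `swr_print_info_enabled` is hypothetical and is not taken from the swr sources; the real driver may parse the variable differently.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not the swr implementation): returns true only
 * when the SWR_PRINT_INFO environment variable is set to "1". An unset
 * variable or any other value keeps the message disabled. */
static bool swr_print_info_enabled(void)
{
    const char *value = getenv("SWR_PRINT_INFO");
    return value != NULL && strcmp(value, "1") == 0;
}
```

With a check like this, a job launcher on a multi-node system gets quiet output by default, and a developer can still run with `SWR_PRINT_INFO=1` to see which architecture was detected.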
* util: call bind_sampler_states before setting sampler_views | Pierre-Eric Pelloux-Prayer | 2020-01-17 | 1 | -6/+6
|
| Fixes the following valgrind error:
|
|   Invalid read of size 16
|     at 0x28F458A1: si_set_sampler_view_desc (in radeonsi_drv_video.so)
|     by 0x28F4657E: si_set_sampler_views (in radeonsi_drv_video.so)
|     by 0x28D62BF5: util_compute_blit (in radeonsi_drv_video.so)
|     by 0x28D3A944: vlVaHandleVAProcPipelineParameterBufferType (in radeonsi_drv_video.so)
|     by 0x28D34EE1: vlVaRenderPicture (in radeonsi_drv_video.so)
|     by 0x4B2582B: vaRenderPicture (in libva.so.2.500.0)
|   Address 0x18142a10 is 0 bytes inside a block of size 48 free'd
|     at 0x48369AB: free (vg_replace_malloc.c:540)
|     by 0x28D62D51: util_compute_blit (in radeonsi_drv_video.so)
|     by 0x28D3A944: vlVaHandleVAProcPipelineParameterBufferType (in radeonsi_drv_video.so)
|     by 0x28D34EE1: vlVaRenderPicture (in radeonsi_drv_video.so)
|     by 0x4B2582B: vaRenderPicture (in libva.so.2.500.0)
|   Block was alloc'd at
|     at 0x4837B65: calloc (vg_replace_malloc.c:762)
|     by 0x28EFB2EC: si_create_sampler_state (in radeonsi_drv_video.so)
|     by 0x28D62C30: util_compute_blit (in radeonsi_drv_video.so)
|     by 0x28D3A944: vlVaHandleVAProcPipelineParameterBufferType (in radeonsi_drv_video.so)
|     by 0x28D34EE1: vlVaRenderPicture (in radeonsi_drv_video.so)
|     by 0x4B2582B: vaRenderPicture (in libva.so.2.500.0)
|
| Fixes: 69430d7e59e ("va: use a compute shader for the blit")
| Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2321
| Reviewed-by: Marek Olšák <[email protected]>
| Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3428>
| Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3428>
* lima: Fix alpha blending | Andreas Baierl | 2020-01-16 | 1 | -23/+101
|
| Introduce separate helper functions to set the blendfactor bits.
|
| Lima uses bits 0-2 for the type, bit 3 sets the inverted function,
| and bit 4 is set if alpha is used.
|
| alpha_src_factor and alpha_dst_factor don't need the alpha bit, so
| they are masked with 0xf. There is only room for 4 bits anyway.
|
| If alpha_src_factor is PIPE_BLENDFACTOR_SRC_ALPHA_SATURATE, we need
| to change it to PIPE_BLENDFACTOR_ONE first.
|
| This is exactly what the blob does and we pass all
| dEQP-GLES2.functional.fragment_ops.blend.* tests now.
| Better than the blob btw...
|
| Reviewed-by: Vasily Khoruzhick <[email protected]>
| Signed-off-by: Andreas Baierl <[email protected]>
| Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3411>
| Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3411>
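The bit layout described above can be sketched as follows. Only the bit positions and the 0xf mask come from the commit message; the macro names and the `pack_rgb_factor` / `pack_alpha_factor` helpers are hypothetical, and the hardware's actual type codes are not given here, so the type is taken as an opaque number.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as described in the commit message. */
#define BLEND_TYPE_MASK  0x07u /* bits 0-2: factor type        */
#define BLEND_INVERT_BIT 0x08u /* bit 3: inverted function     */
#define BLEND_ALPHA_BIT  0x10u /* bit 4: alpha is used         */

/* Hypothetical packer for an RGB blend factor (5 bits wide). */
static uint8_t pack_rgb_factor(uint8_t type, bool invert, bool alpha)
{
    uint8_t bits = type & BLEND_TYPE_MASK;
    if (invert)
        bits |= BLEND_INVERT_BIT;
    if (alpha)
        bits |= BLEND_ALPHA_BIT;
    return bits;
}

/* Hypothetical packer for an alpha blend factor: the alpha bit is
 * redundant there, so the value is masked down to 4 bits. */
static uint8_t pack_alpha_factor(uint8_t type, bool invert, bool alpha)
{
    return pack_rgb_factor(type, invert, alpha) & 0x0fu;
}
```

For example, type 3 with both flags set packs to 0x1b as an RGB factor and to 0x0b as an alpha factor.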
* iris: set depth stall enabled when depth flush enabled on gen12 | Tapani Pälli | 2020-01-16 | 1 | -0/+9
|
| This implements HW workaround #1409600907 for iris driver.
|
| Signed-off-by: Tapani Pälli <[email protected]>
| Reviewed-by: Lionel Landwerlin <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
| Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3378>
* iris: implement another workaround for non pipelined states | Lionel Landwerlin | 2020-01-16 | 1 | -1/+14
|
| v2: add comment (Ken)
|
| Signed-off-by: Lionel Landwerlin <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
| Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3408>
* iris: handle new PIPE_CONTROL field | Lionel Landwerlin | 2020-01-16 | 2 | -1/+6
|
| Signed-off-by: Lionel Landwerlin <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
| Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3408>
* lima: fix handling of reverse depth range | Vasily Khoruzhick | 2020-01-16 | 2 | -4/+16
|
| Looks like we need to handle the cases when near > far and
| near == far. In the first case we just need to swap near and far,
| and in the second we need to subtract epsilon from near if it's not
| zero.
|
| Fixes 10 tests in dEQP-GLES2.functional.depth_range.*
|
| Reviewed-by: Qiang Yu <[email protected]>
| Signed-off-by: Vasily Khoruzhick <[email protected]>
| Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3400>
| Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3400>
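The two cases above can be sketched as a small normalization step. This is an illustration only: the `depth_range` struct, the `fix_depth_range` name, and the value of `DEPTH_EPSILON` are assumptions, since the commit message does not state which epsilon the driver uses.

```c
/* Assumed epsilon; the value used by the driver is not given above. */
#define DEPTH_EPSILON 1e-6f

struct depth_range {
    float near_val;
    float far_val;
};

/* Hypothetical helper: returns a range with near_val <= far_val.
 * - near > far: swap the two values.
 * - near == far and near != 0: nudge near down by an epsilon so the
 *   range is not degenerate. */
static struct depth_range fix_depth_range(float near_val, float far_val)
{
    struct depth_range r = { near_val, far_val };

    if (r.near_val > r.far_val) {
        float tmp = r.near_val;
        r.near_val = r.far_val;
        r.far_val = tmp;
    } else if (r.near_val == r.far_val && r.near_val != 0.0f) {
        r.near_val -= DEPTH_EPSILON;
    }
    return r;
}
```

A range of (0.0, 0.0) is left untouched, matching the "if it's not zero" condition in the message.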
* nvc0: disable xfb's which don't have a stride | Ilia Mirkin | 2020-01-15 | 1 | -4/+4
|
| No stride / no attributes means that nothing is being written to the
| buffer. However it might still prevent primitives from being written
| out to the other buffers. Disabling it entirely seems to fix it.
|
| Fixes GTF-GL45.gtf30.GL3Tests.transform_feedback.transform_feedback_overflow
|
| Signed-off-by: Ilia Mirkin <[email protected]>
* lima/ppir: implement full liveness analysis for regalloc | Erico Nunes | 2020-01-15 | 6 | -166/+359
|
| The existing liveness analysis in ppir still ultimately relies on a
| single continuous live_in and live_out range per register and was
| observed to be the bottleneck for register allocation on complicated
| examples with several control flow blocks.
| The use of live_in and live_out ranges was fine before ppir got
| control flow, but now it ends up creating unnecessary interferences
| as live_in and live_out ranges may span across entire blocks after
| blocks get placed sequentially.
|
| This new liveness analysis implementation generates a set of live
| variables at each program point; before and after each instruction
| and beginning and end of each block.
| This is a global analysis and propagates the sets of live registers
| across blocks independently of their sequence.
| The resulting sets optimally represent all variables that cannot
| share a register at each program point, so can be directly translated
| as interferences to the register allocator.
|
| Special care has to be taken with non-ssa registers. In order to
| properly define their live range, their alive components also need
| to be tracked. Therefore ppir can't use simple bitsets to keep track
| of live registers.
|
| The algorithm uses an auxiliary set data structure to keep track of
| the live registers. The initial implementation used only trivial
| arrays, however regalloc execution time was then prohibitive
| (>1 minute on Cortex-A53) on extreme benchmarks with hundreds of
| instructions, hundreds of registers and several spilling iterations,
| mostly due to the n^2 complexity to generate the interferences from
| the live sets. Since the set of live registers is only a very sparse
| subset of all registers at each instruction, iterating only over this
| subset allows it to run very fast again (a couple of seconds for the
| same benchmark).
|
| Signed-off-by: Erico Nunes <[email protected]>
| Reviewed-by: Vasily Khoruzhick <[email protected]>
| Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3358>
| Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3358>
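The global propagation described above follows the usual backward dataflow scheme, which can be sketched as below. This is a deliberately simplified illustration and not the ppir code: it works per block rather than per instruction, and it uses plain 32-bit masks, which the message explicitly says ppir cannot do because it has to track live components of non-ssa registers. All names (`struct block`, `compute_liveness`) are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_SUCCS 2

/* Simplified block: one bit per register (no component tracking). */
struct block {
    uint32_t use;           /* registers read before being written   */
    uint32_t def;           /* registers written in this block       */
    int succs[MAX_SUCCS];   /* successor block indices, -1 if unused */
    uint32_t live_in;       /* result: live at block entry           */
    uint32_t live_out;      /* result: live at block exit            */
};

/* Iterate to a fixed point:
 *   live_out(b) = union of live_in(s) for every successor s
 *   live_in(b)  = use(b) | (live_out(b) & ~def(b))
 * The result does not depend on the order in which blocks are laid
 * out; walking them in reverse only helps it converge sooner. */
static void compute_liveness(struct block *blocks, int count)
{
    bool changed = true;

    while (changed) {
        changed = false;
        for (int i = count - 1; i >= 0; i--) {
            struct block *b = &blocks[i];
            uint32_t out = 0;

            for (int s = 0; s < MAX_SUCCS; s++) {
                if (b->succs[s] >= 0)
                    out |= blocks[b->succs[s]].live_in;
            }

            uint32_t in = b->use | (out & ~b->def);
            if (in != b->live_in || out != b->live_out) {
                b->live_in = in;
                b->live_out = out;
                changed = true;
            }
        }
    }
}
```

Any two registers whose bits are set in the same live set at some program point cannot share a hardware register, which is how the sets translate into interferences for the allocator.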
* lima/ppir: remove orphan load node after cloning | Erico Nunes | 2020-01-15 | 3 | -1/+27
|
| There are some cases in shaders using control flow where the varying
| load is cloned to every block, and then the original node is left
| orphaned. This is not harmful for program execution, but it
| complicates analysis for register allocation as there is now a case
| of writing to a register that is never read.
| While ppir doesn't have a dead code elimination pass for its own
| optimizations and it is not hard to detect when we cloned the last
| load, let's remove it early.
|
| Signed-off-by: Erico Nunes <[email protected]>
| Reviewed-by: Vasily Khoruzhick <[email protected]>
| Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3358>
* iris: Print warning and return *out = NULL when fd to syncobj fails | Kristian H. Kristensen | 2020-01-15 | 1 | -1/+6
|
| Signed-off-by: Kristian H. Kristensen <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Advertise PIPE_CAP_NATIVE_FENCE_FD | Kristian H. Kristensen | 2020-01-15 | 1 | -0/+1
|
| Enables EGL_ANDROID_native_fence_sync.
|
| Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Fix export of fences that have already completed. | Kenneth Graunke | 2020-01-15 | 1 | -0/+16
|
| After flushing batches, iris_fence_flush() asks the kernel whether
| each batch's last_syncpt has already signalled or not. (The idea is
| that either the compute or render batch may not have actually had any
| work queued up, so last_syncpt there might have been signalled a long
| time ago.) If it's already completed, we don't bother to record it.
|
| A strange corner is the case of repeated flushes. For example, we
| might flush for some reason, and hit a glFlush(), and hit SwapBuffers.
| It's possible for all the batches to have been flushed previously,
| -and- for them to have actually completed. In this case, we'll see
| that there are no syncobj's to wait on, and record fence->count == 0.
|
| This works fine internally - fence_finish can see count == 0 and
| realize that it doesn't need to wait, for example.
|
| But when working with native FDs, we may be asked to export a fence
| with count == 0. So we need an actual synchronization primitive we
| can hand off. Because all of the relevant batches had been signalled
| when creating the fence, we want the new dummy fence to be signalled
| as well. So we just make a signalled syncobj and export it.
|
| Reviewed-by: Kristian H. Kristensen <[email protected]>
* android: Fix whitespace issue | Robert Foss | 2020-01-15 | 1 | -1/+1
|
| Signed-off-by: Robert Foss <[email protected]>
| Reviewed-by: Alyssa Rosenzweig <[email protected]>
* radeonsi: merge si_compile_llvm and si_llvm_compile functions | Marek Olšák | 2020-01-15 | 4 | -109/+81
|
| Reviewed-by: Timothy Arceri <[email protected]>
| Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3399>
| Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3399>
* radeonsi: remove useless #includes | Marek Olšák | 2020-01-15 | 7 | -18/+0
|
| Reviewed-by: Timothy Arceri <[email protected]>
| Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3399>
* radeonsi: move code for shader resources into si_shader_llvm_resources.c | Marek Olšák | 2020-01-15 | 7 | -302/+327
|
| Reviewed-by: Timothy Arceri <[email protected]>
| Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3399>
* radeonsi: move geometry shader code into si_shader_llvm_gs.c | Marek Olšák | 2020-01-15 | 7 | -812/+865
|
| Reviewed-by: Timothy Arceri <[email protected]>
| Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3399>
* radeonsi: remove llvm_type_is_64bit | Marek Olšák | 2020-01-15 | 3 | -17/+7
|
| Reviewed-by: Timothy Arceri <[email protected]>
| Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3399>
* radeonsi: move tessellation shader code into si_shader_llvm_tess.c | Marek Olšák | 2020-01-15 | 6 | -1290/+1343
|
| Reviewed-by: Timothy Arceri <[email protected]>
| Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3399>
* radeonsi: move si_insert_input_* functions | Marek Olšák | 2020-01-15 | 2 | -28/+28
|
| Reviewed-by: Timothy Arceri <[email protected]>
| Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3399>
* radeonsi: work around an LLVM crash when using llvm.amdgcn.icmp.i64.i1 | Marek Olšák | 2020-01-15 | 1 | -0/+1
|
| Cc: 19.2 19.3 <[email protected]>
| Acked-by: Pierre-Eric Pelloux-Prayer <[email protected]>
| Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3338>
| Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3338>
* radeonsi: fix si_build_wrapper_function for compute-based primitive culling | Marek Olšák | 2020-01-15 | 1 | -1/+14
|
| Fixes: 3b143369a55 "ac/nir, radv, radeonsi: Switch to using ac_shader_args"
|
| Acked-by: Pierre-Eric Pelloux-Prayer <[email protected]>
| Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3338>
* radeonsi/gfx10: separate code for determining the number of vertices for NGG | Marek Olšák | 2020-01-15 | 1 | -25/+41
|
| Reviewed-by: Pierre-Eric Pelloux-Prayer <[email protected]>
* radeonsi/gfx10: separate code for getting edgeflags from the gs_invocation_id VGPR | Marek Olšák | 2020-01-15 | 1 | -9/+13
|
| Reviewed-by: Pierre-Eric Pelloux-Prayer <[email protected]>
* radeonsi: move VS_STATE.LS_OUT_PATCH_SIZE a few bits higher to make space there | Marek Olšák | 2020-01-15 | 3 | -5/+8
|
| Reviewed-by: Pierre-Eric Pelloux-Prayer <[email protected]>
* radeonsi: make si_insert_input_* functions non-static | Marek Olšák | 2020-01-15 | 2 | -9/+12
|
| Reviewed-by: Pierre-Eric Pelloux-Prayer <[email protected]>
* radeonsi: separate code computing info for small primitive culling | Marek Olšák | 2020-01-15 | 3 | -40/+54
|
| Reviewed-by: Pierre-Eric Pelloux-Prayer <[email protected]>
* radeonsi: drop the negation from fmask_is_not_identity | Pierre-Eric Pelloux-Prayer | 2020-01-15 | 4 | -5/+5
|
| This change eases code reading ("fmask_is_identity = true" is clearer
| than "fmask_is_not_identity = false").
|
| Initialization is not changed so fmask_is_identity is false when a
| texture is created.
|
| Reviewed-by: Marek Olšák <[email protected]>
| Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3174>
| Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3174>
* radeonsi: unbind image before compute clear | Pierre-Eric Pelloux-Prayer | 2020-01-15 | 1 | -0/+5
|
| The image is not used, and unbinding it avoids infinite recursion
| when the clear is called from si_compute_expand_fmask.
|
| Reviewed-by: Marek Olšák <[email protected]>
| Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3174>
* radeonsi: make sure fmask expand is done if needed | Pierre-Eric Pelloux-Prayer | 2020-01-15 | 1 | -1/+2
|
| Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2248
| Fixes: 095a58204d9 ("radeonsi: expand FMASK before MSAA image stores are used")
| Reviewed-by: Marek Olšák <[email protected]>
| Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3174>
* radeonsi: fix fmask expand compute shader | Pierre-Eric Pelloux-Prayer | 2020-01-15 | 1 | -1/+1
|
| The 'coord' variable was using TGSI_WRITEMASK_XYZ, so subsequent
| uses of TGSI_WRITEMASK_W were dropped.
|
| The result for a 2-sample program was:
|
|   0: UMAD TEMP[0].xy, SV[1].xyyy, IMM[0].xxxx, SV[0].xyyy
|   1: STORE IMAGE[0], TEMP[0], TEMP[1], RESTRICT, 2D_MSAA
|   2: STORE IMAGE[0], TEMP[0], TEMP[2], RESTRICT, 2D_MSAA
|   3: END
|
| instead of the expected:
|
|   0: UMAD TEMP[0].xy, SV[1].xyyy, IMM[0].xxxx, SV[0].xyyy
|   1: MOV TEMP[0].w, IMM[0].yyyy
|   2: LOAD TEMP[1], IMAGE[0], TEMP[0], RESTRICT, 2D_MSAA
|   3: MOV TEMP[0].w, IMM[0].zzzz
|   4: LOAD TEMP[2], IMAGE[0], TEMP[0], RESTRICT, 2D_MSAA
|   5: MOV TEMP[0].w, IMM[0].yyyy
|   6: STORE IMAGE[0], TEMP[0], TEMP[1], RESTRICT, 2D_MSAA
|   7: MOV TEMP[0].w, IMM[0].zzzz
|   8: STORE IMAGE[0], TEMP[0], TEMP[2], RESTRICT, 2D_MSAA
|   9: END
|
| This fixes half of https://gitlab.freedesktop.org/mesa/mesa/issues/2248
|
| Fixes: 095a58204d9 ("radeonsi: expand FMASK before MSAA image stores are used")
| Reviewed-by: Marek Olšák <[email protected]>
| Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3174>
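Why the W writes vanished can be illustrated with a tiny model of writemask narrowing. This is a sketch under stated assumptions, not the TGSI builder API: the `MASK_*` constants and both helpers are hypothetical names, and the model assumes that narrowing a destination intersects its mask with the requested one and that an instruction with an empty destination mask is not emitted.

```c
#include <stdbool.h>

/* One bit per component (hypothetical local names). */
enum {
    MASK_X    = 1 << 0,
    MASK_Y    = 1 << 1,
    MASK_Z    = 1 << 2,
    MASK_W    = 1 << 3,
    MASK_XYZ  = MASK_X | MASK_Y | MASK_Z,
    MASK_XYZW = MASK_XYZ | MASK_W
};

/* Assumed behavior: narrowing can only remove components, so the
 * result is the intersection of the current and requested masks. */
static unsigned narrow_writemask(unsigned current, unsigned requested)
{
    return current & requested;
}

/* Assumed behavior: an instruction that writes no component has no
 * effect and is skipped. */
static bool instruction_is_emitted(unsigned writemask)
{
    return writemask != 0;
}
```

Under this model, a destination declared with XYZ narrowed to W yields an empty mask, so the `MOV TEMP[0].w` instructions are skipped, while a destination whose mask includes W keeps them.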
* v3d: fix bug when checking result of syncobj fence import | Iago Toral Quiroga | 2020-01-15 | 1 | -1/+1
|
| Reviewed-by: Alejandro Piñeiro <[email protected]>
| Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3383>
| Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3383>
* st/mesa: don't lower YUV when driver supports it natively | Jonathan Marek | 2020-01-15 | 1 | -14/+14
|
| This fixes YUYV support on etnaviv.
|
| Fixes: 7404833c "gallium: add handling for YUV planar surfaces"
| Signed-off-by: Jonathan Marek <[email protected]>
| Reviewed-by: Christian Gmeiner <[email protected]>
| Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1896>
* st/dri: track if image is created by a dmabuf | Gurchetan Singh | 2020-01-15 | 4 | -0/+12
|
| Will be used by EXT_EGL_image_storage later.
|
| Acked-by: Marek Olšák <[email protected]>
| Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3375>
* radeonsi: move PS LLVM code into si_shader_llvm_ps.c | Marek Olšák | 2020-01-14 | 7 | -1283/+1317
|
| This is an attempt to clean up si_shader.c.
|
| v2: don't move code that is not specific to LLVM
|
| Reviewed-by: Timothy Arceri <[email protected]> (v1)
* radeonsi: remove always constant ballot_mask_bits from si_llvm_context_init | Marek Olšák | 2020-01-14 | 3 | -10/+6
|
| Reviewed-by: Timothy Arceri <[email protected]>
* radeonsi: fold si_create_function into si_llvm_create_func | Marek Olšák | 2020-01-14 | 4 | -43/+30
|
| Reviewed-by: Timothy Arceri <[email protected]>
* radeonsi: rename si_shader_create -> si_create_shader_variant for clarity | Marek Olšák | 2020-01-14 | 4 | -8/+10
|
| Reviewed-by: Timothy Arceri <[email protected]>
* radeonsi: rename si_compile_tgsi_main -> si_build_main_function | Marek Olšák | 2020-01-14 | 1 | -5/+5
|
| Reviewed-by: Timothy Arceri <[email protected]>
* radeonsi: clean up si_shader_info | Marek Olšák | 2020-01-14 | 3 | -131/+45
|
| Reviewed-by: Timothy Arceri <[email protected]>
* radeonsi: merge si_tessctrl_info into si_shader_info | Marek Olšák | 2020-01-14 | 4 | -23/+10
|
| Reviewed-by: Timothy Arceri <[email protected]>
* radeonsi: fork tgsi_shader_info and tgsi_tessctrl_info | Marek Olšák | 2020-01-14 | 12 | -56/+205
|
| Reviewed-by: Timothy Arceri <[email protected]>
* radeonsi: rename si_shader_info -> si_shader_binary_info | Marek Olšák | 2020-01-14 | 1 | -2/+2
|
| Reviewed-by: Timothy Arceri <[email protected]>
* radeonsi: remove TGSI from comments | Marek Olšák | 2020-01-14 | 4 | -11/+9
|
| Reviewed-by: Timothy Arceri <[email protected]>
* radeonsi: rename DBG_NO_TGSI -> DBG_NO_NIR | Marek Olšák | 2020-01-14 | 3 | -3/+3
|
| Reviewed-by: Timothy Arceri <[email protected]>
* radeonsi: don't adjust depth and stencil PS output locations | Marek Olšák | 2020-01-14 | 2 | -11/+3
|
| This was for compatibility with TGSI.
|
| Reviewed-by: Timothy Arceri <[email protected]>
* panfrost: Fix linear depth textures | Alyssa Rosenzweig | 2020-01-14 | 4 | -20/+26
|
| As pointed out by Boris, what we were calling PAN_LINEAR depth
| textures was in fact u-interleaved tiled (!), but we never noticed
| since we flipped the flag used for sampling, leading to all sorts of
| fun bugs when attempting to directly access depth textures from the
| CPU.
|
| Which begs the question -- if what we called LINEAR was tiled, how do
| we actually render linear depth textures? It turns out the flags for
| AFBC form a mali_block_format 2-bit code just like their render-target
| counterparts, so we can render to any of the above.
|
| Signed-off-by: Alyssa Rosenzweig <[email protected]>
| Reported-by: Boris Brezillon <[email protected]>
| Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3393>
| Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3393>
* radeonsi: actually enable VBOs in user SGPRs | Marek Olšák | 2020-01-14 | 1 | -1/+1
|
| Fixes: 363b4027fcb - radeonsi: put up to 5 VBO descriptors into user SGPRs
|
| Reviewed-by: Pierre-Eric Pelloux-Prayer <[email protected]>