path: root/src/gallium
Commit message (Author, Age, Files, Lines)
* freedreno: try blitter for fd_resource_copy_region() (Rob Clark, 2019-01-03, 1 file, -0/+27)
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno: rework blit API (Rob Clark, 2019-01-03, 8 files, -27/+29)
| First step to unify the way the fd5 and fd6 blitters work. Currently a6xx bypasses the blit API in order to also accelerate resource_copy_region(), but this approach can lead to infinite recursion:
|
|   #0 fd_alloc_staging (ctx=0x5555936480, rsc=0x7fac485f90, level=0, box=0x7fbab29220) at ../src/gallium/drivers/freedreno/freedreno_resource.c:291
|   #1 0x0000007fbdebed04 in fd_resource_transfer_map (pctx=0x5555936480, prsc=0x7fac485f90, level=0, usage=258, box=0x7fbab29220, pptrans=0x7fbab29240) at ../src/gallium/drivers/freedreno/freedreno_resource.c:479
|   #2 0x0000007fbe5c5068 in u_transfer_helper_transfer_map (pctx=0x5555936480, prsc=0x7fac485f90, level=0, usage=258, box=0x7fbab29220, pptrans=0x7fbab29240) at ../src/gallium/auxiliary/util/u_transfer_helper.c:243
|   #3 0x0000007fbde2dcb8 in util_resource_copy_region (pipe=0x5555936480, dst=0x7fac485f90, dst_level=0, dst_x=0, dst_y=0, dst_z=0, src=0x7fac47c780, src_level=0, src_box_in=0x7fbab2945c) at ../src/gallium/auxiliary/util/u_surface.c:350
|   #4 0x0000007fbdf2282c in fd_resource_copy_region (pctx=0x5555936480, dst=0x7fac485f90, dst_level=0, dstx=0, dsty=0, dstz=0, src=0x7fac47c780, src_level=0, src_box=0x7fbab2945c) at ../src/gallium/drivers/freedreno/freedreno_blitter.c:173
|   #5 0x0000007fbdf085d4 in fd6_resource_copy_region (pctx=0x5555936480, dst=0x7fac485f90, dst_level=0, dstx=0, dsty=0, dstz=0, src=0x7fac47c780, src_level=0, src_box=0x7fbab2945c) at ../src/gallium/drivers/freedreno/a6xx/fd6_blitter.c:587
|   #6 0x0000007fbde2f3d0 in util_try_blit_via_copy_region (ctx=0x5555936480, blit=0x7fbab29430) at ../src/gallium/auxiliary/util/u_surface.c:864
|   #7 0x0000007fbdec02c4 in fd_blit (pctx=0x5555936480, blit_info=0x7fbab29588) at ../src/gallium/drivers/freedreno/freedreno_resource.c:993
|   #8 0x0000007fbdf08408 in fd6_blit (pctx=0x5555936480, info=0x7fbab29588) at ../src/gallium/drivers/freedreno/a6xx/fd6_blitter.c:546
|   #9 0x0000007fbdebdc74 in do_blit (ctx=0x5555936480, blit=0x7fbab29588, fallback=false) at ../src/gallium/drivers/freedreno/freedreno_resource.c:129
|   #10 0x0000007fbdebe58c in fd_blit_from_staging (ctx=0x5555936480, trans=0x7fac47b7e8) at ../src/gallium/drivers/freedreno/freedreno_resource.c:326
|   #11 0x0000007fbdebea38 in fd_resource_transfer_unmap (pctx=0x5555936480, ptrans=0x7fac47b7e8) at ../src/gallium/drivers/freedreno/freedreno_resource.c:416
|   #12 0x0000007fbe5c5c68 in u_transfer_helper_transfer_unmap (pctx=0x5555936480, ptrans=0x7fac47b7e8) at ../src/gallium/auxiliary/util/u_transfer_helper.c:516
|   #13 0x0000007fbde2de24 in util_resource_copy_region (pipe=0x5555936480, dst=0x7fac485f90, dst_level=0, dst_x=0, dst_y=0, dst_z=0, src=0x7fac47b8e0, src_level=0, src_box_in=0x7fbab2997c) at ../src/gallium/auxiliary/util/u_surface.c:376
|   #14 0x0000007fbdf2282c in fd_resource_copy_region (pctx=0x5555936480, dst=0x7fac485f90, dst_level=0, dstx=0, dsty=0, dstz=0, src=0x7fac47b8e0, src_level=0, src_box=0x7fbab2997c) at ../src/gallium/drivers/freedreno/freedreno_blitter.c:173
|   #15 0x0000007fbdf085d4 in fd6_resource_copy_region (pctx=0x5555936480, dst=0x7fac485f90, dst_level=0, dstx=0, dsty=0, dstz=0, src=0x7fac47b8e0, src_level=0, src_box=0x7fbab2997c) at ../src/gallium/drivers/freedreno/a6xx/fd6_blitter.c:587
|   ...
|
| Instead rework the API to push the fallback back to core code, so that we can rework resource_copy_region() to have its own fallback path, and then finally convert fd6 over to work in the same way.
|
| This also makes ctx->blit() optional, and cleans up some unnecessary callers.
|
| Signed-off-by: Rob Clark <[email protected]>
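The idea of pushing the fallback back to core code can be sketched roughly as follows. This is only an illustration of the pattern, not the freedreno implementation: `struct context`, `core_blit()`, `core_fallback_blit()` and `small_only_blit()` are hypothetical names. The optional hook reports whether it handled the blit, and only the core entry point chooses the fallback, so the hook never re-enters the generic path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified types; not the real freedreno structures. */
struct blit_info { int width, height; };

struct context {
    /* Optional per-generation hook: returns true if it handled the blit. */
    bool (*blit)(struct context *ctx, const struct blit_info *info);
    int hw_blits;       /* blits handled by the hook */
    int fallback_blits; /* blits handled by the generic path */
};

/* Generic path owned by core code; it never calls ctx->blit(). */
static void core_fallback_blit(struct context *ctx, const struct blit_info *info)
{
    (void)info;
    ctx->fallback_blits++;
}

/* Core entry point: try the hook if present, otherwise fall back once. */
static void core_blit(struct context *ctx, const struct blit_info *info)
{
    assert(ctx != NULL && info != NULL);
    if (ctx->blit && ctx->blit(ctx, info))
        return;
    core_fallback_blit(ctx, info);
}

/* Example hook that only accepts small blits and declines the rest. */
static bool small_only_blit(struct context *ctx, const struct blit_info *info)
{
    if (info->width > 64 || info->height > 64)
        return false; /* decline; the caller picks the fallback */
    ctx->hw_blits++;
    return true;
}
```

Because the hook only returns a status, a context without a hook still works, which matches the note that ctx->blit() becomes optional.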
* freedreno: skip depth resolve if not written (Rob Clark, 2019-01-03, 3 files, -4/+14)
| | | | | | | | | | | | For multi-pass rendering, it is common to keep the same depth buffer from previous pass, to discard geometry that would be hidden by later draws. In the later passes with depth-test enabled, but depth-write disabled, there is no reason to do gmem2mem resolve. TODO probably do something similar for stencil.. although stencil buffer isn't used as commonly these days Signed-off-by: Rob Clark <[email protected]>
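A minimal sketch of the bookkeeping this describes, under assumed names (`struct batch`, `record_draw()` and `resolve_mask()` are hypothetical, not the driver's API): depth is marked as written only when a draw has depth-write enabled, and the end-of-batch resolve covers only buffers that are both bound and written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical buffer bits and per-batch state, for illustration only. */
enum { BUF_COLOR = 1 << 0, BUF_DEPTH = 1 << 1, BUF_STENCIL = 1 << 2 };

struct batch {
    unsigned bound;   /* buffers attached to the framebuffer */
    unsigned written; /* buffers some draw in this batch wrote to */
};

/* Record a draw; depth counts as written only if depth-write is enabled. */
static void record_draw(struct batch *b, bool depth_test, bool depth_write)
{
    assert(b != NULL);
    b->written |= BUF_COLOR;
    if (depth_test && depth_write)
        b->written |= BUF_DEPTH;
}

/* Buffers that need a resolve back to memory at the end of the batch. */
static unsigned resolve_mask(const struct batch *b)
{
    return b->bound & b->written;
}
```

With depth-test on and depth-write off, `resolve_mask()` leaves out the depth bit, so the later pass does not pay for a depth resolve it does not need.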
* v3d: Refactor compiler entrypoints. (Eric Anholt, 2019-01-02, 1 file, -26/+6)
| | | | | | Before, I had per-stage entrypoints with some helpers shared between them. As I extended for compute shaders and shader-db, it turned out that the other common code in the middle wanted to be shared too.
* v3d: Don't forget to include RT writes in precompiles. (Eric Anholt, 2019-01-02, 1 file, -0/+10)
| | | | | Looking at some assembly dumps for an optimization, we were clearly missing important parts of the shader!
* v3d: Fix segfault when failing to compile a program. (Eric Anholt, 2019-01-02, 1 file, -2/+4)
| | | | | | | We'll still fail at draw time, but this avoids a regression in shader-db execution once I enable TLB writes in precompiles. Fixes: b38e4d313fc2 ("v3d: Create a state uploader for packing our shaders together.")
* radeonsi: always unmap texture CPU mappings on 32-bit CPU architectures (Marek Olšák, 2019-01-02, 1 file, -0/+16)
| | | | | | Team Fortress 2 32-bit version runs out of the CPU address space. Tested-by: Dieter Nützel <[email protected]>
* radeonsi: remove unused variables in si_insert_input_ptr (Marek Olšák, 2019-01-02, 1 file, -3/+1)
| | | | Tested-by: Dieter Nützel <[email protected]>
* radeonsi: use u_decomposed_prims_for_vertices instead of u_prims_for_vertices (Marek Olšák, 2019-01-02, 1 file, -1/+3)
| | | | | | | It seems to be the same, but this doesn't use integer division with a variable divisor. Tested-by: Dieter Nützel <[email protected]>
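The point about avoiding a variable divisor can be illustrated with a small sketch. The two functions below are hypothetical stand-ins, not the Mesa helpers themselves, and cover only a few primitive types: the table-driven form divides by a value loaded at run time, while the switch form divides only by literal constants, which a compiler can turn into multiplies and shifts.

```c
#include <assert.h>

/* Hypothetical primitive types; a small subset for illustration. */
enum prim {
    PRIM_POINTS,
    PRIM_LINES,
    PRIM_TRIANGLES,
    PRIM_LINE_STRIP,
    PRIM_TRIANGLE_STRIP
};

/* Table-driven form: the divisor is read from memory at run time. */
static unsigned prims_table(enum prim p, unsigned nr)
{
    static const struct { unsigned min, incr; } info[] = {
        [PRIM_POINTS]         = { 1, 1 },
        [PRIM_LINES]          = { 2, 2 },
        [PRIM_TRIANGLES]      = { 3, 3 },
        [PRIM_LINE_STRIP]     = { 2, 1 },
        [PRIM_TRIANGLE_STRIP] = { 3, 1 },
    };
    assert(p <= PRIM_TRIANGLE_STRIP);
    if (nr < info[p].min)
        return 0;
    return 1 + (nr - info[p].min) / info[p].incr;
}

/* Switch form: every division uses a compile-time constant divisor. */
static unsigned prims_switch(enum prim p, unsigned nr)
{
    switch (p) {
    case PRIM_POINTS:         return nr;
    case PRIM_LINES:          return nr / 2;
    case PRIM_TRIANGLES:      return nr / 3;
    case PRIM_LINE_STRIP:     return nr >= 2 ? nr - 1 : 0;
    case PRIM_TRIANGLE_STRIP: return nr >= 3 ? nr - 2 : 0;
    }
    return 0;
}
```

For the primitive types listed here the two forms give the same counts, which is consistent with the "seems to be the same" remark in the commit message.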
* radeonsi: make si_cp_wait_mem more configurable (Marek Olšák, 2019-01-02, 5 files, -8/+8)
| | | | Tested-by: Dieter Nützel <[email protected]>
* radeonsi: call si_fix_resource_usage for the GS copy shader as well (Marek Olšák, 2019-01-02, 1 file, -0/+4)
| | | | Tested-by: Dieter Nützel <[email protected]>
* radeonsi: don't emit redundant PKT3_NUM_INSTANCES packets (Marek Olšák, 2019-01-02, 2 files, -2/+10)
| | | | Tested-by: Dieter Nützel <[email protected]>
* st/glsl_to_nir: call nir_lower_load_const_to_scalar() in the st (Timothy Arceri, 2019-01-02, 1 file, -2/+0)
| | | | | | | | | This will help the new opt introduced in the following patches allowing us to remove extra duplicate varyings. Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* radeonsi: make use of ac_are_tessfactors_def_in_all_invocs() (Timothy Arceri, 2019-01-02, 1 file, -8/+2)
| | | | | Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: remove unrequired param in si_nir_scan_tess_ctrl() (Timothy Arceri, 2019-01-02, 3 files, -3/+1)
| | | | | Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* tgsi/scan: correctly walk instructions in tgsi_scan_tess_ctrl() (Timothy Arceri, 2019-01-02, 1 file, -29/+43)
| | | | | | | | | | | | The previous code used a do while loop and continues after walking a nested loop/if-statement. This means we end up evaluating the last instruction from the nested block against the while condition and potentially exit early if it matches the exit condition of the outer block. Fixes: 386d165d8d09 ("tgsi/scan: add a new pass that analyzes tess factor writes") Reviewed-by: Marek Olšák <[email protected]>
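The class of bug described here can be reproduced in a few lines. The sketch below is an illustration with a made-up opcode enum and made-up walkers, not the tgsi_scan code: in a do-while loop, `continue` jumps straight to the loop condition, so the nested block's closing ENDIF gets tested against the outer block's exit condition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a tiny structured instruction stream. */
enum op { OP_MOV, OP_IF, OP_ENDIF };

/* Given the index of an IF, return the index of its matching ENDIF.
 * Assumes the stream is well formed. */
static unsigned skip_nested(const enum op *ops, unsigned i)
{
    unsigned depth = 0;
    for (;; i++) {
        if (ops[i] == OP_IF)
            depth++;
        else if (ops[i] == OP_ENDIF && --depth == 0)
            return i;
    }
}

/* Buggy walker: counts top-level MOVs of a block body starting at i. */
static unsigned count_movs_buggy(const enum op *ops, unsigned i)
{
    unsigned movs = 0;
    assert(ops != NULL);
    do {
        if (ops[i] == OP_IF) {
            i = skip_nested(ops, i); /* i now indexes the nested ENDIF */
            continue;                /* jumps to the while() test below */
        }
        if (ops[i] == OP_MOV)
            movs++;
        i++;
    } while (ops[i] != OP_ENDIF);    /* nested ENDIF ends the outer walk */
    return movs;
}

/* Fixed walker: steps past the nested ENDIF before testing again. */
static unsigned count_movs_fixed(const enum op *ops, unsigned i)
{
    unsigned movs = 0;
    assert(ops != NULL);
    while (ops[i] != OP_ENDIF) {
        if (ops[i] == OP_IF) {
            i = skip_nested(ops, i) + 1;
            continue;
        }
        if (ops[i] == OP_MOV)
            movs++;
        i++;
    }
    return movs;
}
```

For a body of the form MOV, IF, MOV, ENDIF, MOV, ENDIF, the buggy walker stops at the nested ENDIF and misses the trailing MOV, while the fixed walker reaches the outer ENDIF.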
* tgsi/scan: fix loop exit point in tgsi_scan_tess_ctrl() (Timothy Arceri, 2019-01-02, 1 file, -1/+1)
| | | | | | | | | | | This just happened not to crash/assert because all loops have at least 1 if-statement and due to a second bug we end up matching the same ENDIF to exit both the iteration over the if-statement and the loop. The second bug is fixed in the following patch. Fixes: 386d165d8d09 ("tgsi/scan: add a new pass that analyzes tess factor writes") Reviewed-by: Marek Olšák <[email protected]>
* nv30: disable rendering to 3D textures (Ilia Mirkin, 2019-01-01, 1 file, -0/+6)
| | | | | | | | | | | There's no way to tell the 3D engine about swizzling on such textures. While rendering to NPOT ones may be possible, there's no great way to expose that in gallium, nor would there be any practical benefit. Fixes the non-compressed-format "copyteximage 3D" failures. Something odd going on with the compressed formats. Signed-off-by: Ilia Mirkin <[email protected]>
* nv30: fix some s3tc layout issues (Ilia Mirkin, 2018-12-30, 2 files, -7/+26)
| | | | | | | | | | | | | | s3tc layouts are a bit finicky - they're packed, but not swizzled. Adjust logic to allow for that case: - Don't set a uniform pitch for POT-sized compressed textures - Adjust define_rect API to be less confused about block sizes - Only mark a texture as linear if it has a uniform pitch set This has been tested to fix xonotic (as well as the s3tc-* piglits) on nv3x and keeps it working on nv4x. Signed-off-by: Ilia Mirkin <[email protected]>
* nv30: use correct helper to get blocks in y direction (Ilia Mirkin, 2018-12-30, 1 file, -1/+1)
| | | | | | | This doesn't matter since all compressed formats supported by this hardware use square blocks, but best to use the correct helper. Signed-off-by: Ilia Mirkin <[email protected]>
* nv30: add support for multi-layer transfers (Ilia Mirkin, 2018-12-30, 1 file, -4/+35)
| | | | | | | | This logic mirrors what we do on nv50. The relatively new texture_subdata callback can cause this to happen with 3D textures, which is triggered at least by xonotic, and probably many piglits. Signed-off-by: Ilia Mirkin <[email protected]>
* nv30: fix rare issue with fp unbinding not finding the bufctx (Ilia Mirkin, 2018-12-30, 1 file, -1/+1)
| | | | | | | | | | | | | If the last-active context gets deleted, the pushbuf doesn't have a bufctx to reference. Then there could be a sequence of binds which would trigger a reset on that bin before validation was done. Instead we just pass in the bufctx in question directly. All other instances of PUSH_RESET happen strictly after a validation is run. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102349 Signed-off-by: Ilia Mirkin <[email protected]>
* nv30: avoid setting user_priv without setting cur_ctx (Ilia Mirkin, 2018-12-30, 1 file, -3/+1)
| | | | | | | | | | | | The whole user_priv thing is a mess, but as long as it's there, it basically has to map 1:1 to the cur_ctx. Unfortunately we were setting user_priv to some context, then that context could get deleted without any draws/validations in it, leading user_priv to become NULL, with cur_ctx still pointing at some old context. Then we wouldn't run the switch logic, which in turn led to a NULL bufctx being dereferenced. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102349 Signed-off-by: Ilia Mirkin <[email protected]>
* v3d: Add support for requesting the sample offsets. (Eric Anholt, 2018-12-30, 1 file, -0/+22)
|
* v3d: Hook up some shader-db output to GL_ARB_debug_output. (Eric Anholt, 2018-12-30, 1 file, -0/+12)
| | | | | | | This allows the original shader-db project's run.c runner to parse things easily, and is probably a good thing to have for GL_ARB_debug_output in general. I formatted it more like Intel's so I can mostly reuse their report script.
* v3d: Add a "precompile" debug flag for shader-db. (Eric Anholt, 2018-12-29, 1 file, -0/+76)
| | | | | | | | | I've been using my apitrace-based shader-db so far, but it's slow (apitrace decompression), intrusive (apitrace windows spamming the screen), and doesn't have much coverage. The original shader-db provides a lot more coverage and compiles faster, at the expense of not having the actual runtime variant key. As v3d has a lot less runtime variation than vc4 did, this tradeoff makes more sense.
* meson: Override C++ standard to gnu++11 when building with altivec on ppc64 (Dylan Baker, 2018-12-28, 1 file, -0/+3)
| | | | | | | | | | Otherwise there will be symbol collisions for the vector name. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108943 Distro Bug: https://bugs.gentoo.org/673622 Fixes: 42ea0631f108d82554339530d6c88aa1b448af1e ("meson: build clover") Acked-by: Matt Turner <[email protected]>
* radeonsi: Enable adaptive_sync by default for radeon (Nicholas Kazlauskas, 2018-12-28, 1 file, -0/+4)
| | | | | | | | | It's better to let most applications make use of adaptive sync by default. Problematic applications can be placed on the blacklist or the user can manually disable the feature. Reviewed-by: Michel Dänzer <[email protected]> Signed-off-by: Nicholas Kazlauskas <[email protected]>
* etnaviv: Consolidate buffer references from framebuffers (Tomeu Vizoso, 2018-12-28, 3 files, -10/+9)
| | | | | | | | | | | | | | | | | | We were leaking surfaces because the references taken in etna_set_framebuffer_state weren't being released on context destroy. Instead of just directly releasing those references in etna_context_destroy, use the util_copy_framebuffer_state helper. Take the chance to remove the duplicated buffer references in compiled_framebuffer_state to avoid confusion. The leak can be reproduced with a client that continuously creates and destroys contexts. Signed-off-by: Tomeu Vizoso <[email protected]> Reported-by: Sjoerd Simons <[email protected]> Reviewed-by: Christian Gmeiner <[email protected]>
* virgl/vtest: fix front buffer flush with protocol version 0. (Dave Airlie, 2018-12-28, 1 file, -1/+1)
| | | | | | | | Older versions of virglrenderer before 33da7361aec486290df0aec4ad8dfa8ff6adde2c in vtest mode, misrender gears. Fixes: 9d81cd8e7c (virgl: Pass resource size and transfer offsets) Reviewed-By: Gert Wollny <[email protected]>
* nv50,nvc0: add missing CAPs for unsupported features (Ilia Mirkin, 2018-12-26, 2 files, -0/+3)
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0: enable GL_NV_shader_atomic_float on pre-Maxwell (Ilia Mirkin, 2018-12-26, 1 file, -0/+2)
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: add support for converting ATOMFADD to proper ir (Ilia Mirkin, 2018-12-26, 1 file, -0/+4)
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* gallium: add PIPE_CAP_TGSI_ATOMFADD to indicate support (Ilia Mirkin, 2018-12-26, 3 files, -0/+4)
| | | | | | | | ATOMFADD is a little special -- make drivers have to specify it explicitly. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* tgsi: add ATOMFADD operation (Ilia Mirkin, 2018-12-26, 6 files, -2/+23)
| | | | | | This is supported by at least NVIDIA hardware, and exposable via GL extensions. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium/ttn: Fix setup of outputs_written. (Eric Anholt, 2018-12-26, 1 file, -1/+1)
| | | | | | | | | We need a 64-bit value, otherwise we only handle the low 32, and happen to sign-extend to claim to write all varying slots if VARYING_SLOT_VAR2 was used. Fixes: 4d0b2c7aaac3 ("ttn: Update shader->info as we generate code.") Reviewed-by: Rob Clark <[email protected]>
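The arithmetic behind this fix can be shown in isolation. The helpers below are hypothetical and use illustrative slot numbers, not the actual value of VARYING_SLOT_VAR2: a mask bit built in 64 bits stays a single bit, whereas a 32-bit signed value with its top bit set turns into a run of ones in the upper half when it is widened.

```c
#include <assert.h>
#include <stdint.h>

/* Build the mask bit for a slot directly in 64 bits; valid for 0..63. */
static uint64_t slot_bit64(unsigned slot)
{
    assert(slot < 64);
    return UINT64_C(1) << slot;
}

/* Show what widening a signed 32-bit mask to 64 bits does: the sign
 * bit is copied into every upper bit (sign extension). */
static uint64_t widen_signed32(int32_t mask)
{
    return (uint64_t)(int64_t)mask;
}
```

`widen_signed32(INT32_MIN)` yields 0xFFFFFFFF80000000, that is, bit 31 plus every bit above it, which is how a single 32-bit write can end up claiming all of the higher slots.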
* st/nine: Increase the limit of cached ff shaders (Axel Davy, 2018-12-23, 1 file, -2/+2)
| | | | | | | | 100 is too small for some games, which triggers recompilations every frame. Increase to 1024. Signed-off-by: Axel Davy <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* st/nine: Add src reference to nine_context_range_upload (Axel Davy, 2018-12-23, 3 files, -1/+8)
| | | | | | | | | | Just like nine_context_box_upload, nine_context_range_upload should reference the src, which holds the ram source buffer. Fixes: https://github.com/iXit/Mesa-3D/issues/327 Signed-off-by: Axel Davy <[email protected]> Tested-by: Dieter Nützel <[email protected]> Cc: [email protected]
* st/nine: Bind src not dst in nine_context_box_upload (Axel Davy, 2018-12-23, 4 files, -6/+6)
| | | | | | | | | | | | nine_context_box_upload uploads a ram buffer (from src) to a pipe_resource (dst). We already have a refcount on the pipe_resource, what needs to be protected from release is the ram buffer, thus a reference to src. Signed-off-by: Axel Davy <[email protected]> Tested-by: Dieter Nützel <[email protected]> Cc: [email protected]
* st/nine: Fix volumetexture dtor on ctor failure (Axel Davy, 2018-12-23, 1 file, -1/+2)
| | | | | | | | | | The dtor is called on allocation failure, thus we must check the volumes are allocated before trying to release them. Signed-off-by: Axel Davy <[email protected]> Tested-by: Dieter Nützel <[email protected]> Cc: [email protected]
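A rough sketch of the pattern, with hypothetical types and a `fail_array_alloc` flag added only to simulate the failure (none of this is the nine state tracker's code): since the constructor calls the destructor on failure, the destructor has to cope with a volumes array that was never allocated.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified objects; not the nine data structures. */
struct volume { unsigned level; };

struct volume_texture {
    struct volume **volumes; /* NULL if the array was never allocated */
    unsigned num_levels;
};

/* Destructor: safe to call on a partially constructed texture. */
static void texture_dtor(struct volume_texture *t)
{
    assert(t != NULL);
    if (t->volumes) {
        for (unsigned i = 0; i < t->num_levels; i++)
            free(t->volumes[i]); /* free(NULL) is a no-op */
        free(t->volumes);
        t->volumes = NULL;
    }
}

/* Constructor: on any allocation failure it runs the destructor.
 * fail_array_alloc simulates the array allocation failing. */
static int texture_ctor(struct volume_texture *t, unsigned num_levels,
                        int fail_array_alloc)
{
    assert(t != NULL && num_levels > 0);
    t->num_levels = num_levels;
    t->volumes = fail_array_alloc
        ? NULL
        : calloc(num_levels, sizeof(*t->volumes));
    if (!t->volumes) {
        texture_dtor(t);
        return -1;
    }
    for (unsigned i = 0; i < num_levels; i++) {
        t->volumes[i] = malloc(sizeof(struct volume));
        if (!t->volumes[i]) {
            texture_dtor(t);
            return -1;
        }
        t->volumes[i]->level = i;
    }
    return 0;
}
```

Without the `if (t->volumes)` check, the simulated failure path would index a NULL array inside the destructor.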
* st/nine: Switch to presentation buffer if resize is detected (Axel Davy, 2018-12-23, 1 file, -1/+36)
| | | | | | | This makes it possible to match the window size on resize in all cases, as it only works currently with presentation buffers. Signed-off-by: Axel Davy <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* st/nine: Use helper to release swapchain buffers later (Axel Davy, 2018-12-23, 2 files, -8/+42)
| | | | | | | | | | | This patch introduces a structure to release the present_handles only when they are fully released by the server, thus making "DestroyD3DWindowBuffer" actually release the buffer right away when called. Signed-off-by: Axel Davy <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* freedreno/a6xx: fix 3d texture layout (Rob Clark, 2018-12-22, 3 files, -3/+15)
| | | | | | | Maybe not 100% perfect, but seems to be a pretty good approximation of that. Signed-off-by: Rob Clark <[email protected]>
* freedreno: update generated headers (Rob Clark, 2018-12-22, 7 files, -21/+28)
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: improve setup_slices() debug msgs (Rob Clark, 2018-12-22, 1 file, -6/+5)
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: simplify special case for 3d layout (Rob Clark, 2018-12-22, 1 file, -9/+10)
| | | | | | | | This logic can be re-written as the two cases for 3d (ie. before/after the miplevel sizes start reducing) vs everything else. I think it is easier to read this way. Signed-off-by: Rob Clark <[email protected]>
* freedreno: combine fd_resource_layer_offset()/fd_resource_offset() (Rob Clark, 2018-12-22, 1 file, -13/+2)
| | | | | | We really only need this logic in one place. Signed-off-by: Rob Clark <[email protected]>
* gallivm: abort when trying to use non-existing intrinsic (Roland Scheidegger, 2018-12-21, 1 file, -0/+10)
| | | | | | | | | | | Whenever llvm removes an intrinsic (we're using), we're hitting segfaults due to llvm doing calls to address 0 in the jitted code instead. However, Jose figured out we can actually detect this with LLVMGetIntrinsicID(), so use this to abort, so we don't have to wonder what got broken. (Of course, someone still needs to fix the code to no longer use this intrinsic.) Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: don't use pavg.b intrinsic on llvm >= 6.0 (Roland Scheidegger, 2018-12-21, 2 files, -51/+95)
| | | | | | | | | | | | | | | | | | | | | This intrinsic disappeared with llvm 6.0, and using it results in segfaults (due to llvm issuing a call to a NULL address in the jitted shaders). Add code doing the same thing as the autoupgrade code in llvm so it can be matched and replaced back with a pavgb. While here, also improve lp_test_format, so it tests both with and without cache (as it was, it tested the cache versions only, whereas cache is actually disabled in llvmpipe, and in any case even with it enabled vertex and geometry shaders wouldn't use it). (Although at least for the unorm8 uncached fetch, the code is still quite different to what llvmpipe is using, since that would use unorm8x16 type, whereas the test code is using unorm8x4 type, hence disabling some intrinsic paths.) Fixes: 6f4083143bb8 ("gallivm: use llvm jit code for decoding s3tc") Reviewed-by: Jose Fonseca <[email protected]> Tested-by: Michel Dänzer <[email protected]>
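For reference, the operation that pavgb performs is a rounding average of unsigned bytes, (a + b + 1) >> 1, computed without overflow. The scalar sketch below assumes the usual widen, add, shift, truncate formulation; the function names are hypothetical and this is not the gallivm code, which builds LLVM IR rather than C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rounding average of two unsigned bytes, computed in a wider type so
 * the intermediate sum (at most 255 + 255 + 1) cannot overflow. */
static uint8_t avg_round_u8(uint8_t a, uint8_t b)
{
    uint16_t wide = (uint16_t)((uint16_t)a + (uint16_t)b + 1u);
    return (uint8_t)(wide >> 1);
}

/* The same operation applied lane by lane to 16 bytes. */
static void avg_round_u8x16(uint8_t dst[16], const uint8_t a[16],
                            const uint8_t b[16])
{
    assert(dst != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < 16; i++)
        dst[i] = avg_round_u8(a[i], b[i]);
}
```

Expressing the average this way, rather than calling the removed intrinsic, gives the backend a pattern it can match back to a single pavgb instruction, as the commit message describes.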
* pipe-loader: meson: reference correct library (Emil Velikov, 2018-12-13, 1 file, -1/+1)
| | | | | | | | The library is called libgalliumvl_stub - note singular. Fixes: 42ea0631f10 ("meson: build clover") Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Dylan Baker <[email protected]>