path: root/src/gallium
Commit message | Author | Age | Files | Lines
* swr/rast: Remove hardcoded clip/cull slot from clipper | Tim Rowley | 2017-09-13 | 1 | -14/+21
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Start to remove hardcoded clipcull_dist vertex attrib slot | Tim Rowley | 2017-09-13 | 3 | -8/+15
| | | | | | | | Add new field in SWR_BACKEND_STATE::vertexClipCullOffset to specify the start of the clip/cull section of the vertex header. Removed use of hardcoded slot from binner. Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Move clip/cull enables in API | Tim Rowley | 2017-09-13 | 9 | -40/+40
| | | | | | Moved from SWR_RASTSTATE to SWR_BACKEND_STATE. Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Add new API SwrStallBE | Tim Rowley | 2017-09-13 | 2 | -0/+17
| | | | | | | | SwrStallBE stalls the backend threads until all work submitted before the stall has finished. The frontend threads can continue to make forward progress. Reviewed-by: Bruce Cherniak <[email protected]>
* Revert "winsys/amdgpu: disable local BOs on Raven" | Marek Olšák | 2017-09-12 | 1 | -2/+1
| | | | | | This reverts commit 1cda9a2fee05effd9c64bd773bc6005281593662. It works now.
* radeonsi: optimize TCS epilog when invocation 0 writes tess factors | Marek Olšák | 2017-09-11 | 5 | -30/+89
| | | | | | | | | | This removes the barrier and LDS stores and loads for tess factors when it's possible. The removal of the barrier seems more important to me though. In one shader, it removes 17 * 4 bytes from the shader binary. Reviewed-by: Nicolai Hähnle <[email protected]>
* tgsi/scan: add a new pass that analyzes tess factor writes (v2) | Marek Olšák | 2017-09-11 | 2 | -0/+235
| | | | | | | | | | | | | | | | | | | The pass tries to deduce whether tess factors are always written by all shader invocations. The implication for radeonsi is that it doesn't have to use a barrier near the end of TCS, and doesn't have to use LDS for passing the tess factors to the epilog. v2: Handle barriers and do the analysis pass for each code segment surrounded by barriers separately, and AND results from all such segments writing tess factors. The change is trivial in the main switch statement. Also, the result is renamed to "tessfactors_are_def_in_all_invocs" to make the name accurate. Reviewed-by: Nicolai Hähnle <[email protected]>
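The v2 combination rule described above can be sketched in a few lines of C. This is an illustrative model only, not the code added to tgsi/scan: `struct segment_result` and `combine_segments` are hypothetical names, and returning false when no segment writes tess factors is an assumption made here for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-segment summary; a segment is the code between two
 * barriers (or between a barrier and the start/end of the shader). */
struct segment_result {
   bool writes_tessfactors;    /* the segment contains a tess factor write */
   bool written_in_all_invocs; /* ...and every invocation performs it */
};

/* AND together the results of all segments that write tess factors. */
static bool combine_segments(const struct segment_result *segs, size_t count)
{
   bool any_writer = false;
   bool all_invocs = true;

   assert(segs != NULL || count == 0);
   for (size_t i = 0; i < count; i++) {
      if (!segs[i].writes_tessfactors)
         continue; /* segments without writes do not affect the result */
      any_writer = true;
      all_invocs = all_invocs && segs[i].written_in_all_invocs;
   }
   /* Assumption of this sketch: with no writer at all, report false. */
   return any_writer && all_invocs;
}
```

A single segment whose write is conditional on the invocation is enough to make the combined result false, which is the conservative outcome a driver would want before dropping the barrier.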
* winsys/amdgpu: use the new raw CS API | Marek Olšák | 2017-09-11 | 2 | -77/+93
| | | | | | This also cleans things up. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: implement pipe_context::fence_server_sync | Marek Olšák | 2017-09-11 | 3 | -0/+68
| | | | | | This will be more useful once we have sync_file support. Reviewed-by: Nicolai Hähnle <[email protected]>
* winsys/amdgpu: factor out some fence dependency code into separate functions | Marek Olšák | 2017-09-11 | 1 | -21/+34
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* winsys/amdgpu: rename fence_dependency functions | Marek Olšák | 2017-09-11 | 1 | -12/+12
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: add a proper fail path for calloc in r600_flush_from_st | Marek Olšák | 2017-09-11 | 1 | -3/+6
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* winsys/amdgpu: don't allow interprocess resource sharing for IBs | Marek Olšák | 2017-09-11 | 1 | -1/+2
| | | | | | | Now we should get IB submissions with bo_list == NULL when DRI buffers aren't referenced. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi/gfx9: fix interprocess resource sharing on Raven | Marek Olšák | 2017-09-11 | 1 | -1/+3
| | | | | | This is kinda fragile, but it at least unbreaks the driver. Reviewed-by: Nicolai Hähnle <[email protected]>
* r600: handle the non-TXF_LZ support path. | Dave Airlie | 2017-09-11 | 1 | -1/+1
| | | | | | | It appears that texcoord.z/w will be 0 in all cases already, so just put them into the vbo always. Signed-off-by: Dave Airlie <[email protected]>
* gallium/u_blitter: use UTIL_BLITTER_ATTRIB_NONE (0) instead of 0 directly | Marek Olšák | 2017-09-11 | 1 | -2/+2
| | | | | Reviewed-by: Nicolai Hähnle <[email protected]> Tested-by: Brian Paul <[email protected]>
* gallium/u_blitter: don't pass GENERIC in VS if it's not needed | Marek Olšák | 2017-09-11 | 1 | -17/+45
| | | | | | | Now, depth-only clears and custom passes don't read memory in VS. Reviewed-by: Nicolai Hähnle <[email protected]> Tested-by: Brian Paul <[email protected]>
* gallium/u_blitter: use draw_rectangle for all blits except cubemaps | Marek Olšák | 2017-09-11 | 4 | -92/+107
| | | | | | | Add ZW coordinates to the draw_rectangle callback and use it. Reviewed-by: Nicolai Hähnle <[email protected]> Tested-by: Brian Paul <[email protected]>
* gallium/u_blitter: use draw_rectangle callback for layered clears | Marek Olšák | 2017-09-11 | 6 | -36/+47
| | | | | | | They are done with instancing. Reviewed-by: Nicolai Hähnle <[email protected]> Tested-by: Brian Paul <[email protected]>
* gallium/u_blitter: add new union blitter_attrib to replace pipe_color_union | Marek Olšák | 2017-09-11 | 6 | -71/+72
| | | | | Reviewed-by: Nicolai Hähnle <[email protected]> Tested-by: Brian Paul <[email protected]>
* gallium/radeon: use rectangles for 1D and 2D texture blits | Marek Olšák | 2017-09-11 | 1 | -7/+13
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* llvmpipe, draw: improve shader cache debugging | Roland Scheidegger | 2017-09-09 | 3 | -31/+59
| | | | | | | | | | | | | | | | | With GALLIVM_DEBUG=perf set, output the relevant stats for shader cache usage whenever we have to evict shader variants. Also add some output when shaders are deleted (but not with the perf setting to keep this one less noisy). While here, also don't delete that many shaders when we have to evict. For fs, there's potentially some cost if we have to evict due to the required flush, however certainly shader recompiles have a high cost too so I don't think evicting one quarter of the cache size makes sense (and, if we're evicting based on IR count, we probably typically evict only very few or just one shader too). For vs, I'm not sure it even makes sense to evict more than one shader at a time, but keep the logic the same for now. Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* llvmpipe: enable PIPE_CAP_QUERY_PIPELINE_STATISTICS | Roland Scheidegger | 2017-09-09 | 1 | -1/+1
| | | | | | | | | | | | | | | | This has been implemented since forever, but not enabled. It passes all piglit tests except one, arb_pipeline_statistics_query-frag. The reason is that the test (for drawing a 10x10 rect) expects between 100 and 150 pixel shader invocations. But since llvmpipe counts this with 4x4 granularity (and due to the rect being 2 tris) we end up with 224 invocations. I believe however what llvmpipe is doing violates neither the spirit nor the letter of the spec (our fragment shader granularity really is 4x4 pixels, albeit we will bail out early on 2x2 or 4x2 (the latter if AVX is available) granularity); the spec allows counting additional invocations for implementation reasons. Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
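The arithmetic behind the 224 figure can be illustrated with a toy model in C. This is not llvmpipe's rasterizer: the sketch assumes the square starts at the origin, is split along the diagonal x + y = size, and that each triangle is charged a full 4x4 block for every block in which it covers at least one pixel centre. `pixel_in_triangle` and `count_invocations` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define BLOCK 4 /* assumed counting granularity: 4x4 pixels */

/* Pixel (px, py) has its centre at (px + 0.5, py + 0.5). The square is
 * split along the line x + y = size: triangle 0 is the side nearer the
 * origin, triangle 1 takes centres on or beyond the line. */
static bool pixel_in_triangle(int tri, int px, int py, int size)
{
   int centre_sum = px + py + 1; /* (px + 0.5) + (py + 0.5) */
   return tri == 0 ? centre_sum < size : centre_sum >= size;
}

/* Charge BLOCK*BLOCK invocations per triangle for every block it touches. */
static int count_invocations(int size)
{
   int blocks_per_side = (size + BLOCK - 1) / BLOCK;
   int invocations = 0;

   assert(size > 0);
   for (int tri = 0; tri < 2; tri++) {
      for (int by = 0; by < blocks_per_side; by++) {
         for (int bx = 0; bx < blocks_per_side; bx++) {
            bool touched = false;
            /* Only pixels inside the size x size square can be covered. */
            for (int y = by * BLOCK; y < (by + 1) * BLOCK && y < size; y++)
               for (int x = bx * BLOCK; x < (bx + 1) * BLOCK && x < size; x++)
                  if (pixel_in_triangle(tri, x, y, size))
                     touched = true;
            if (touched)
               invocations += BLOCK * BLOCK;
         }
      }
   }
   return invocations;
}
```

Under these assumptions `count_invocations(10)` gives 224: one triangle touches 6 blocks, the other 8, and 14 blocks of 16 pixels is 224. That this matches the number in the message does not establish how the piglit test actually places or splits its rect; a different diagonal or offset changes the block count.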
* gallivm: fix gather implementation a bit | Roland Scheidegger | 2017-09-09 | 1 | -10/+48
| | | | | | | | | | | | | | | | | | gather is defined in terms of bilinear filtering, just without the filtering part. However, there are actually some subtle differences required in our implementation, because we use some tricks to simplify coord wrapping for the two coords per direction. For bilinear filtering, we don't care if we end up with an incorrect texel, as long as the filter weight is 0.0 for it. Likewise, the order of the texels doesn't actually matter (as long as they still have the correct filter weight). But for gather, these tricks lead to incorrect results. Fix this for CLAMP_TO_EDGE, and add some comments to the other wrap functions which look broken (the 3 mirror_clamp plus mirror_repeat) (too complex to fix right now, and no one really seems to care...). Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
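Why a zero-weight texel is harmless for bilinear filtering but wrong for gather can be shown with a small C sketch of one coordinate under CLAMP_TO_EDGE. None of this is gallivm code: `texels_per_texel_clamp` clamps each integer texel coordinate independently, while `texels_clamp_first` is a made-up stand-in for the kind of coordinate shortcut the message describes.

```c
#include <assert.h>

static int clamp_int(int v, int lo, int hi)
{
   return v < lo ? lo : (v > hi ? hi : v);
}

/* floor() for small floats, written out so the sketch does not need libm. */
static int floor_to_int(float v)
{
   int i = (int)v; /* truncates toward zero */
   return ((float)i > v) ? i - 1 : i;
}

/* Clamp both integer texel coordinates independently. */
static void texels_per_texel_clamp(float s, int size, int *i0, int *i1)
{
   int base;

   assert(size > 0);
   base = floor_to_int(s * (float)size - 0.5f);
   *i0 = clamp_int(base, 0, size - 1);
   *i1 = clamp_int(base + 1, 0, size - 1);
}

/* Hypothetical shortcut: clamp the float coordinate first, then derive the
 * two texels from it. weight1 is the bilinear weight of texel *i1. */
static void texels_clamp_first(float s, int size, int *i0, int *i1,
                               float *weight1)
{
   float u;

   assert(size > 0);
   u = s * (float)size - 0.5f;
   if (u < 0.0f)
      u = 0.0f;
   if (u > (float)(size - 1))
      u = (float)(size - 1);
   *i0 = floor_to_int(u);
   *i1 = clamp_int(*i0 + 1, 0, size - 1);
   *weight1 = u - (float)*i0;
}
```

At s = 0 with size = 4, the first function selects texels (0, 0) while the shortcut selects (0, 1) with weight 0 on texel 1. A bilinear filter produces the same value either way, but a gather that returns the raw texels does not.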
* svga: abort shader translation upon indirect indexing of temporaries | Charmaine Lee | 2017-09-08 | 1 | -0/+6
| | | | | | | | | | This patch aborts shader translation upon indirect indexing of a temporary register on a non-vgpu10 device. This prevents an unsupported feature from being sent to the device. Tested with MTT-piglit, glretrace. Reviewed-by: Brian Paul <[email protected]>
* gallium/tests: use ARRAY_SIZE macro | Eric Engestrom | 2017-09-08 | 3 | -3/+9
| | | | | Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* r300: use ARRAY_SIZE macro | Eric Engestrom | 2017-09-08 | 1 | -1/+3
| | | | | Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: move the guts of ARB_shader_group_vote emission to ac | Connor Abbott | 2017-09-08 | 1 | -21/+3
| | | | | | Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* radeonsi: move si_emit_ballot() to ac | Connor Abbott | 2017-09-08 | 1 | -32/+6
| | | | | | Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* radeonsi: move emit_optimization_barrier() to ac | Connor Abbott | 2017-09-08 | 1 | -43/+2
| | | | | | Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* radeonsi: move llvm_get_type_size() to ac | Connor Abbott | 2017-09-08 | 1 | -34/+9
| | | | | | Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* Revert "st/va: add enviromental variable to disable interlace" | Leo Liu | 2017-09-07 | 1 | -4/+0
| | | | | | | | This reverts commit 10dec2de2d9f568675d66d736b48701fa26f7b50. The environment variable is no longer needed with the previous change Reviewed-by: Christian König <[email protected]>
* st/va: move YUV content to deinterlaced buffer when reallocated for encoder | Leo Liu | 2017-09-07 | 1 | -1/+10
| | | | | | | | v2: use deinterlace common function v3: make sure deinterlace only Signed-off-by: Leo Liu <[email protected]> Reviewed-by: Christian König <[email protected]>
* st/va: reallocate the buffer if the layout isn't supported | Leo Liu | 2017-09-07 | 1 | -9/+12
| | | | | | This makes buffer reallocation based on buffer layout clearer for both the decoder and the encoder. Signed-off-by: Leo Liu <[email protected]> Reviewed-by: Christian König <[email protected]>
* vl/compositor: make vl_compositor_set_yuv_layer() static | Leo Liu | 2017-09-07 | 2 | -44/+28
| | | | | | | Since it's no longer being called outside of compositor Signed-off-by: Leo Liu <[email protected]> Reviewed-by: Christian König <[email protected]>
* st/omx: use vl/compositor helper function for YUV deinterlacing | Leo Liu | 2017-09-07 | 1 | -30/+2
| | | | | | | v2: separate helper function in different patch Signed-off-by: Leo Liu <[email protected]> Reviewed-by: Christian König <[email protected]>
* vl/compositor: make a helper function for YUV deinterlacing | Leo Liu | 2017-09-07 | 2 | -0/+40
| | | | | | A similar function exists in OMX and is only used by OMX. Move it to vl/compositor so that other state trackers can use it later. Signed-off-by: Leo Liu <[email protected]> Reviewed-by: Christian König <[email protected]>
* ac/surface: add radeon_surf::has_stencil for convenience | Marek Olšák | 2017-09-07 | 8 | -11/+12
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't read tcs_out_lds_layout.patch_stride from an SGPR | Marek Olšák | 2017-09-07 | 1 | -6/+14
| | | | | | | Same as before, writing TCS outputs to LDS is rare. Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't read tcs_out_lds_layout.vertex_size from an SGPR | Marek Olšák | 2017-09-07 | 3 | -6/+20
| | | | | | | TCS outputs are usually not written to LDS, so no stats here. Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi/gfx9: don't read LS out vertex stride from an SGPR in monolithic HS | Marek Olšák | 2017-09-07 | 2 | -1/+11
| | | | | | | -44 bytes in a monolithic LS-HS binary. Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't read the LS output vertex stride from an SGPR in LS | Marek Olšák | 2017-09-07 | 1 | -4/+21
| | | | | | | | | Now it's able to generate ds_write2_b64 instead of ds_write2_b32. -20 bytes in one shader binary. (having only 1 output) Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't read the number of TCS out vertices from an SGPR in TCS | Marek Olšák | 2017-09-07 | 1 | -2/+15
| | | | | | | -16 bytes in one shader binary. Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't always apply the PrimID instancing bug workaround on SI | Marek Olšák | 2017-09-07 | 1 | -1/+1
| | | | | | | | It looks like commit 391673af7ad1565a5f6ac8fc2f8c9fcdd1fe9908 that should have fixed the perf regression didn't really change much if anything. Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: remove 2 callbacks from si_shader_context | Marek Olšák | 2017-09-07 | 3 | -17/+13
| | | | | Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* winsys/amdgpu: disable local BOs on Raven | Marek Olšák | 2017-09-07 | 1 | -1/+2
| | | | | | It hangs with a high degree of reproducibility. Acked-by: Nicolai Hähnle <[email protected]>
* llvmpipe, tgsi: hook up dx10 gather4 opcode | Roland Scheidegger | 2017-09-07 | 2 | -8/+25
| | | | | | | | | Trivial. We already support tg4 for legacy tex opcodes, so the actual texture sampling code already handles it. (Just like TG4, we don't handle additional capabilities and always sample red channel.) Reviewed-by: Jose Fonseca <[email protected]>
* llvmpipe, draw: increase shader cache limits | Roland Scheidegger | 2017-09-07 | 2 | -4/+2
| | | | | | | | | | | | | | | We're not particularly concerned with memory usage, if the tradeoff is shader recompiles. And it's common for apps to have a lot of shaders nowadays (and, since our shaders include a LOT of context state of course we may create quite a bit more shaders even). So quadruple the amount of shaders draw will cache (from 128 to 512). For llvmpipe (fs shaders) quadruple the number of instructions, keep the number of variants the same for now (only with very simple, non-texturing shaders the variant limit could really be reached), and simplify the definition, it's probably easier to just have one different definition per branch... Reviewed-by: Jose Fonseca <[email protected]>
* radeon/uvd: fix the assertion check for YUYV format | Leo Liu | 2017-09-06 | 1 | -3/+5
| | | | | | | Fixes: 7319ff87 ("radeon/uvd: add YUYV format support for target buffer") Signed-off-by: Leo Liu <[email protected]> Reviewed-by: Christian König <[email protected]>
* swr/rast: FE/Clipper - unify SIMD8/16 functions using simdlib types | Tim Rowley | 2017-09-06 | 3 | -1189/+446
| | | | Reviewed-by: Bruce Cherniak <[email protected]>