aboutsummaryrefslogtreecommitdiffstats
path: root/src/gallium/auxiliary
Commit message (Collapse)AuthorAgeFilesLines
* gallium: add PIPE_FORMAT_R10G10B10X2_UNORMNicolai Hähnle2017-10-022-0/+7
| | | | Reviewed-by: Marek Olšák <[email protected]>
* gallium/vl: don't use the template keywordMarek Olšák2017-09-301-14/+14
| | | | | | for C++ editors Reviewed-by: Brian Paul <[email protected]>
* gallium: add new LOD opcodeRoland Scheidegger2017-09-303-4/+59
| | | | | | | | | | The operation performed is all the same as LODQ, but with the usual differences between dx10 and GL texture opcodes, that is separate resource and sampler indices (plus result swizzling, and setting z/w channels to zero). Reviewed-by: Jose Fonseca <[email protected]> Acked-by: Nicolai Hähnle <[email protected]>
* gallium: add LDEXP TGSI instruction and corresponding capNicolai Hähnle2017-09-295-2/+20
| | | | | Reviewed-by: Marek Olšák <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* tgsi: infer that dst[1] of DFRACEXP is an integerNicolai Hähnle2017-09-294-5/+8
| | | | | Reviewed-by: Marek Olšák <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* gallivm: add support for TGSI instructions with two outputsNicolai Hähnle2017-09-293-1/+31
| | | | | Reviewed-by: Marek Olšák <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* gallivm: add dst register index to lp_build_tgsi_context::emit_storeNicolai Hähnle2017-09-293-9/+9
| | | | | Reviewed-by: Marek Olšák <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* tgsi: clarify the semantics of DFRACEXPNicolai Hähnle2017-09-292-10/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The status quo is quite the mess: 1. tgsi_exec will do a per-channel computation, and store the dst[0] result (significand) correctly for each channel. The dst[1] result (exponent) will be written to the first bit set in the writemask. So per-component calculation only works partially. 2. r600 will only do a single computation. It will replicate the exponent but not the significand. 3. The docs pretend that there's per-component calculation, but even get dst[0] and dst[1] confused. 4. Luckily, st_glsl_to_tgsi only ever emits single-component instructions, and kind-of assumes that everything is replicated, generating this for the dvec4 case: DFRACEXP TEMP[0].xy, TEMP[1].x, CONST[0][0].xyxy DFRACEXP TEMP[0].zw, TEMP[1].y, CONST[0][0].zwzw DFRACEXP TEMP[2].xy, TEMP[1].z, CONST[0][1].xyxy DFRACEXP TEMP[2].zw, TEMP[1].w, CONST[0][1].zwzw Settle on the simplest behavior, which is single-component calculation with replication, document it, and adjust tgsi_exec and r600. Reviewed-by: Marek Olšák <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* tgsi: infer that DLDEXP's second source has an integer typeNicolai Hähnle2017-09-294-7/+11
| | | | | Reviewed-by: Marek Olšák <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* gallium/util: use new util_vasprintf() functionBrian Paul2017-09-281-1/+2
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: Weaken assertion about u_mm's align2 field.Eric Anholt2017-09-261-1/+4
| | | | | | | | | vc5 MMU mappings are access-controlled at a 128kb boundary, so the 4kb here was too small for that purpose. Allowing any valid align2 value that u_mm's 32-bit addressing can represent will still catch most cases of people passing in a byte alignment. Reviewed-by: Nicolai Hähnle <[email protected]>
* vl/compositor: convert RGB buffer to YUV with color conversionLeo Liu2017-09-252-0/+81
| | | | Acked-by: Christian König <[email protected]>
* vl/csc: add a RGB to YUV CSC matrixLeo Liu2017-09-252-1/+11
| | | | Acked-by: Christian König <[email protected]>
* vl/compositor: create RGB to YUV fragment shaderLeo Liu2017-09-252-0/+51
| | | | Acked-by: Christian König <[email protected]>
* vl/compositor: add Bob top and bottom to YUV deint functionLeo Liu2017-09-251-6/+28
| | | | Acked-by: Christian König <[email protected]>
* vl/compositor: remove vl_compositor_yuv_deint() functionLeo Liu2017-09-252-40/+0
| | | | | | No longer used. Acked-by: Christian König <[email protected]>
* vl/compositor: add a new function for YUV deintLeo Liu2017-09-252-0/+42
| | | | | | | It will replace previous deint function with abilities of scaling and field deinterlacing Acked-by: Christian König <[email protected]>
* vl/compositor: extend YUV deint function to do field deintLeo Liu2017-09-252-12/+26
| | | | | | It will add Bob deint ability to interlaced video for HW encoder Acked-by: Christian König <[email protected]>
* vl/compositor: separate YUV part from shader video buffer functionLeo Liu2017-09-251-13/+18
| | | | | | So that it can be re-used Acked-by: Christian König <[email protected]>
* gallium/util: Remove unused keymapThomas Helland2017-09-213-388/+0
| | | | | | | | | | This is not used anywhere in the codebase. It's a hashtable implementation that is based around cso_hash, and is therefore (and as mentioned in a comment in the source) quite similar to u_hash_table. CC: Brian Paul<[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium: Add PIPE_SHADER_CAP_INT64_ATOMICSJan Vesely2017-09-212-0/+2
| | | | | | | Denotes availability of 64bit int atomic instructions Signed-off-by: Jan Vesely <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* llvmpipe, gallivm: implement lod queries (LODQ opcode)Roland Scheidegger2017-09-204-57/+143
| | | | | | | | | | | | | | | | | | | | | | This uses all the existing code to calculate lod values for mip linear filtering. Though we'll have to disable the simplifications (if we know some parts of the lod calculation won't actually matter for filtering purposes due to mip clamps etc.). For better or worse, we'll also disable lod calculation hacks (mostly should make a difference for cube maps) always - the issue with per-pixel lod being difficult is mostly because we then have different mipmaps needed for the actual texel fetch, which isn't a problem with lodq. We still use approximation for the log2 - for that reason I believe the float part of the lod is only accurate to about 4-5 bits (and one bit less with 1d textures actually) which is hopefully good enough (though d3d10 technically requires 6 bits - could use quadratic interpolation instead of linear to get 8 bits or so). Since lodq requires unclamped lod, we also have to move some sampler key calculations to texture sampling code - even if we know we're going to access mipmap 0 we still have to calculate lod and apply lod_bias for lodq. Passes piglit ARB_texture_query_lod tests (after having fixed the test). Reviewed-by: Jose Fonseca <[email protected]>
* ttn: Fix out-of-bounds accesses since the always-2D-constants change.Eric Anholt2017-09-181-2/+3
| | | | | | | | | | Only one of the three checks for dim was updated, so we would try to set a UBO buffer index source value on a nir_load_uniform, and wouldn't actually declare non-UBO uniforms. Fixes: 37dd8e8dee1d ("gallium: all drivers should accept two-dimensional constant buffer indexing") Tested-by: Derek Foreman <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: Add PIPE_SHADER_CAP_FP16Jan Vesely2017-09-182-0/+4
| | | | | | | | | Denotes native half precision float operations capability v2: PIPE_CAP_HALFS -> PIPE_SHADER_CAP_FP16 fix indentation Signed-off-by: Jan Vesely <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: add PIPE_QUERY_OCCLUSION_PREDICATE_CONSERVATIVENicolai Hähnle2017-09-182-0/+2
| | | | | | | | | | | | | | | | | To be able to properly distinguish between GL_ANY_SAMPLES_PASSED and GL_ANY_SAMPLES_PASSED_CONSERVATIVE. This patch goes through all drivers, having them treat the two query types identically, except: 1. radeon incorrectly enabled conservative mode on PIPE_QUERY_OCCLUSION_PREDICATE. We now do it correctly, only on PIPE_QUERY_OCCLUSION_PREDICATE_CONSERVATIVE. 2. st/mesa uses the new query type. Fixes dEQP-GLES31.functional.fbo.no_attachments.* Reviewed-by: Marek Olšák <[email protected]>
* gallium: add CONSTBUF type to tgsi_file_typeTimothy Arceri2017-09-151-0/+1
| | | | | | | This will be use to distinguish between load types when using the TGSI_OPCODE_LOAD opcode. Reviewed-by: Marek Olšák <[email protected]>
* gallium/{r600, radeonsi}: Fix segfault with color format (v2)Denis Pauk2017-09-141-0/+4
| | | | | | | | | | | | Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102552 v2: Patch cleanup proposed by Nicolai Hähnle. * deleted changes in si_translate_texformat. Cc: Nicolai Hähnle <[email protected]> Cc: Ilia Mirkin <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* radeonsi: optimize TCS epilog when invocation 0 writes tess factorsMarek Olšák2017-09-111-2/+0
| | | | | | | | | | This removes the barrier and LDS stores and loads for tess factors when it's possible. The removal of the barrier seems more important to me though. In one shader, it removes 17 * 4 bytes from the shader binary. Reviewed-by: Nicolai Hähnle <[email protected]>
* tgsi/scan: add a new pass that analyzes tess factor writes (v2)Marek Olšák2017-09-112-0/+235
| | | | | | | | | | | | | | | | | | | The pass tries to deduce whether tess factors are always written by all shader invocations. The implication for radeonsi is that it doesn't have to use a barrier near the end of TCS, and doesn't have to use LDS for passing the tess factors to the epilog. v2: Handle barriers and do the analysis pass for each code segment surrounded by barriers separately, and AND results from all such segments writing tess factors. The change is trivial in the main switch statement. Also, the result is renamed to "tessfactors_are_def_in_all_invocs" to make the name accurate. Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/u_blitter: use UTIL_BLITTER_ATTRIB_NONE (0) instead of 0 directlyMarek Olšák2017-09-111-2/+2
| | | | | Reviewed-by: Nicolai Hähnle <[email protected]> Tested-by: Brian Paul <[email protected]>
* gallium/u_blitter: don't pass GENERIC in VS if it's not neededMarek Olšák2017-09-111-17/+45
| | | | | | | Now, depth-only clears and custom passes don't read memory in VS. Reviewed-by: Nicolai Hähnle <[email protected]> Tested-by: Brian Paul <[email protected]>
* gallium/u_blitter: use draw_rectangle for all blits except cubemapsMarek Olšák2017-09-112-88/+98
| | | | | | | Add ZW coordinates to the draw_rectangle callback and use it. Reviewed-by: Nicolai Hähnle <[email protected]> Tested-by: Brian Paul <[email protected]>
* gallium/u_blitter: use draw_rectangle callback for layered clearsMarek Olšák2017-09-112-28/+29
| | | | | | | They are done with instancing. Reviewed-by: Nicolai Hähnle <[email protected]> Tested-by: Brian Paul <[email protected]>
* gallium/u_blitter: add new union blitter_attrib to replace pipe_color_unionMarek Olšák2017-09-112-52/+53
| | | | | Reviewed-by: Nicolai Hähnle <[email protected]> Tested-by: Brian Paul <[email protected]>
* llvmpipe, draw: improve shader cache debuggingRoland Scheidegger2017-09-092-22/+43
| | | | | | | | | | | | | | | | | With GALLIVM_DEBUG=perf set, output the relevant stats for shader cache usage whenever we have to evict shader variants. Also add some output when shaders are deleted (but not with the perf setting to keep this one less noisy). While here, also don't delete that many shaders when we have to evict. For fs, there's potentially some cost if we have to evict due to the required flush, however certainly shader recompiles have a high cost too so I don't think evicting one quarter of the cache size makes sense (and, if we're evicting based on IR count, we probably typically evict only very few or just one shader too). For vs, I'm not sure it even makes sense to evict more than one shader at a time, but keep the logic the same for now. Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* gallivm: fix gather implementation a bitRoland Scheidegger2017-09-091-10/+48
| | | | | | | | | | | | | | | | | | gather is defined in terms of bilinear filtering, just without the filtering part. However, there's actually some subtle differences required in our implementation, because we use some tricks to simplify coord wrapping for the two coords per direction. For bilinear filtering, we don't care if we end up with an incorrect texel, as long as the filter weight is 0.0 for it. Likewise, the order of the texels doesn't actually matter (as long as they still have the correct filter weight). But for gather, these tricks lead to incorrect results. Fix this for CLAMP_TO_EDGE, and add some comments to the other wrap functions which look broken (the 3 mirror_clamp plus mirror_repeat) (too complex to fix right now, and noone really seems to care...). Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* vl/compositor: make vl_compositor_set_yuv_layer() staticLeo Liu2017-09-072-44/+28
| | | | | | | Since it's no longer being called outside of compositor Signed-off-by: Leo Liu <[email protected]> Reviewed-by: Christian König <[email protected]>
* vl/compositor: make a helper function for YUV deinterlacingLeo Liu2017-09-072-0/+40
| | | | | | | | The similar function is in OMX, and only used by OMX. Now have it moved to vl/compositor for other state tracker to use later. Signed-off-by: Leo Liu <[email protected]> Reviewed-by: Christian König <[email protected]>
* llvmpipe, tgsi: hook up dx10 gather4 opcodeRoland Scheidegger2017-09-072-8/+25
| | | | | | | | | Trivial. We already support tg4 for legacy tex opcodes, so the actual texture sampling code already handles it. (Just like TG4, we don't handle additional capabilities and always sample red channel.) Reviewed-by: Jose Fonseca <[email protected]>
* llvmpipe, draw: increase shader cache limitsRoland Scheidegger2017-09-071-1/+1
| | | | | | | | | | | | | | | We're not particularly concerned with memory usage, if the tradeoff is shader recompiles. And it's common for apps to have a lot of shaders nowadays (and, since our shaders include a LOT of context state of course we may create quite a bit more shaders even). So quadruple the amount of shaders draw will cache (from 128 to 512). For llvmpipe (fs shaders) quadruple the number of instructions, keep the number of variants the same for now (only with very simple, non-texturing shaders the variant limit could really be reached), and simplify the definition, it's probably easier to just have one different definition per branch... Reviewed-by: Jose Fonseca <[email protected]>
* gallium/tests: always use two-dimensional constant referencesNicolai Hähnle2017-09-041-2/+2
| | | | | | Acked-by: Roland Scheidegger <[email protected]> Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* pp: always use two-dimensional constant referencesNicolai Hähnle2017-09-041-10/+10
| | | | | | Acked-by: Roland Scheidegger <[email protected]> Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* gallium/hud: always use two-dimensional constant referencesNicolai Hähnle2017-09-041-4/+4
| | | | | | Acked-by: Roland Scheidegger <[email protected]> Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* tgsi/build: always generate two-dimensional constant file accessesNicolai Hähnle2017-09-042-31/+45
| | | | Reviewed-by: Timothy Arceri <[email protected]>
* tgsi/ureg: always emit constants (and their decls) as 2DNicolai Hähnle2017-09-041-15/+7
| | | | | | Acked-by: Roland Scheidegger <[email protected]> Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* gallium: all drivers should accept two-dimensional constant buffer indexingNicolai Hähnle2017-09-041-1/+1
| | | | | | | | | Most older drivers seem to just ignore the Dimension setting, so virtually no changes should be needed. Acked-by: Roland Scheidegger <[email protected]> Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* llvmpipe: lp_build_gather_elem_vec BE fix for 3x16 loadBen Crocker2017-09-011-2/+28
| | | | | | | | | | | | | | | | | | | | | | Fix loading of a 3x16 vector as a single 48-bit load on big-endian systems (PPC64, S390). Roland Scheidegger's commit e827d9175675aaa6cfc0b981e2a80685fb7b3a74 plus Ray Strode's patch reduce pre-Roland Piglit failures from ~4000 to ~2000. This patch fixes three of the four regressions observed by Ray: - draw-vertices - draw-vertices-half-float - draw-vertices-half-float_gles2 One regression remains: - draw-vertices-2101010 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100613 Cc: "17.2" "17.1" <[email protected]> Signed-off-by: Ben Crocker <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: correct channel shift logic on big endianRay Strode2017-09-011-1/+7
| | | | | | | | | | | | | | | | | | | lp_build_fetch_rgba_soa fetches a texel from a texture. Part of that process involves first gathering the element together from memory into a packed format, and then breaking out the individual color channels into separate, parallel arrays. The code fails to account for endianess when reading the packed values. This commit attempts to correct the problem by reversing the order the packed values are read on big endian systems. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100613 Cc: "17.2" "17.1" <[email protected]> Signed-off-by: Ray Strode <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* gallium/u_threaded: rename IGNORE_VALID_RANGE -> NO_INFER_UNSYNCHRONIZEDMarek Olšák2017-08-282-4/+4
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/u_threaded: disallow discard_range if map_buffer is unsynchronizedMarek Olšák2017-08-281-1/+3
| | | | | | | The discard range codepath takes precedence, so if we get both unsynchronized and discard_range, choose unsynchronized. Reviewed-by: Nicolai Hähnle <[email protected]>