aboutsummaryrefslogtreecommitdiffstats
path: root/src/gallium/auxiliary
Commit message (Collapse)AuthorAgeFilesLines
* gallivm/ppc64le: adjust VSX code generation control.Ben Crocker2017-10-051-7/+30
| | | | | | | | | | | | | | | | | | | | | | | In lp_build_create_jit_compiler_for_module(), advance the minimum version of LLVM for VSX code generation to 4.0; this is the minimum revision at which several known VSX code generation bugs are fixed: https://llvm.org/bugs/show_bug.cgi?id=25503 (fixed in 3.8.1) https://llvm.org/bugs/show_bug.cgi?id=26775 (fixed in 3.8.1) https://llvm.org/bugs/show_bug.cgi?id=33531 (fixed in 4.0) An llc performance bug introduced in LLVM 4.0, https://llvm.org/bugs/show_bug.cgi?id=34647 is still pending as of LLVM 5.0, but only has a pronounced effect on one of the Piglit tests: ext_transform_feedback-max-varyings. All changes tested via Piglit. Cc: "17.2" <[email protected]> Signed-off-by: Ben Crocker <[email protected]> Acked-by: Roland Scheidegger <[email protected]>
* gallivm: allow additional llc optionsBen Crocker2017-10-051-0/+23
| | | | | | | | | | | | | | | | In init_native_targets, allow the passing of additional options to the LLC compiler via new GALLIVM_LLC_OPTIONS environmental control. This option is available only #ifdef DEBUG, initially. At top, add #include <llvm-c/Support.h> for LLVMParseCommandLineOptions() declaration. v2: Fix compile error with old llvm versions (sroland) Cc: "17.2" <[email protected]> Signed-off-by: Ben Crocker <[email protected]> Acked-by: Nicolai Hähnle <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: fix typo in debug_printf messageBen Crocker2017-10-051-1/+1
| | | | | | | | | | In gallivm_compile_module, fix a typo in the debug_printf("Invoke as \"llc ..." message. Cc: "17.2" <[email protected]> Signed-off-by: Ben Crocker <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* llvmpipe: silence 'variable may be used uninitialized' warningsBrian Paul2017-10-031-1/+1
| | | | Reviewed-by: Charmaine Lee <[email protected]>
* gallium/u_tests: fix ifdef for sync_file fencesGeorge Kyriazis2017-10-031-1/+1
| | | | | | include libsync.h only when libdrm is compiled in Reviewed-by: Marek Olšák <[email protected]>
* mesa: Remove force_s3tc_enable driconf variableMatt Turner2017-10-021-1/+0
| | | | | Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* gallium: Remove util_format_s3tc_init()Matt Turner2017-10-022-78/+5
| | | | | Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* gallium: Remove util_format_s3tc_enabledMatt Turner2017-10-023-59/+1
| | | | | Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* gallium/u_tests: test sync_file fencesMarek Olšák2017-10-031-0/+101
| | | | | | | This should be sufficient for testing all kernel/libdrm/radeonsi codepaths that are used by radeonsi. Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: add PIPE_FORMAT_R10G10B10X2_UNORMNicolai Hähnle2017-10-022-0/+7
| | | | Reviewed-by: Marek Olšák <[email protected]>
* gallium/vl: don't use the template keywordMarek Olšák2017-09-301-14/+14
| | | | | | for C++ editors Reviewed-by: Brian Paul <[email protected]>
* gallium: add new LOD opcodeRoland Scheidegger2017-09-303-4/+59
| | | | | | | | | | The operation performed is all the same as LODQ, but with the usual differences between dx10 and GL texture opcodes, that is separate resource and sampler indices (plus result swizzling, and setting z/w channels to zero). Reviewed-by: Jose Fonseca <[email protected]> Acked-by: Nicolai Hähnle <[email protected]>
* gallium: add LDEXP TGSI instruction and corresponding capNicolai Hähnle2017-09-295-2/+20
| | | | | Reviewed-by: Marek Olšák <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* tgsi: infer that dst[1] of DFRACEXP is an integerNicolai Hähnle2017-09-294-5/+8
| | | | | Reviewed-by: Marek Olšák <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* gallivm: add support for TGSI instructions with two outputsNicolai Hähnle2017-09-293-1/+31
| | | | | Reviewed-by: Marek Olšák <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* gallivm: add dst register index to lp_build_tgsi_context::emit_storeNicolai Hähnle2017-09-293-9/+9
| | | | | Reviewed-by: Marek Olšák <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* tgsi: clarify the semantics of DFRACEXPNicolai Hähnle2017-09-292-10/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The status quo is quite the mess: 1. tgsi_exec will do a per-channel computation, and store the dst[0] result (significand) correctly for each channel. The dst[1] result (exponent) will be written to the first bit set in the writemask. So per-component calculation only works partially. 2. r600 will only do a single computation. It will replicate the exponent but not the significand. 3. The docs pretend that there's per-component calculation, but even get dst[0] and dst[1] confused. 4. Luckily, st_glsl_to_tgsi only ever emits single-component instructions, and kind-of assumes that everything is replicated, generating this for the dvec4 case: DFRACEXP TEMP[0].xy, TEMP[1].x, CONST[0][0].xyxy DFRACEXP TEMP[0].zw, TEMP[1].y, CONST[0][0].zwzw DFRACEXP TEMP[2].xy, TEMP[1].z, CONST[0][1].xyxy DFRACEXP TEMP[2].zw, TEMP[1].w, CONST[0][1].zwzw Settle on the simplest behavior, which is single-component calculation with replication, document it, and adjust tgsi_exec and r600. Reviewed-by: Marek Olšák <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* tgsi: infer that DLDEXP's second source has an integer typeNicolai Hähnle2017-09-294-7/+11
| | | | | Reviewed-by: Marek Olšák <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* gallium/util: use new util_vasprintf() functionBrian Paul2017-09-281-1/+2
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: Weaken assertion about u_mm's align2 field.Eric Anholt2017-09-261-1/+4
| | | | | | | | | vc5 MMU mappings are access-controlled at a 128kb boundary, so the 4kb here was too small for that purpose. Allowing any valid align2 value that u_mm's 32-bit addressing can represent will still catch most cases of people passing in a byte alignment. Reviewed-by: Nicolai Hähnle <[email protected]>
* vl/compositor: convert RGB buffer to YUV with color conversionLeo Liu2017-09-252-0/+81
| | | | Acked-by: Christian König <[email protected]>
* vl/csc: add a RGB to YUV CSC matrixLeo Liu2017-09-252-1/+11
| | | | Acked-by: Christian König <[email protected]>
* vl/compositor: create RGB to YUV fragment shaderLeo Liu2017-09-252-0/+51
| | | | Acked-by: Christian König <[email protected]>
* vl/compositor: add Bob top and bottom to YUV deint functionLeo Liu2017-09-251-6/+28
| | | | Acked-by: Christian König <[email protected]>
* vl/compositor: remove vl_compositor_yuv_deint() functionLeo Liu2017-09-252-40/+0
| | | | | | No longer used. Acked-by: Christian König <[email protected]>
* vl/compositor: add a new function for YUV deintLeo Liu2017-09-252-0/+42
| | | | | | | It will replace previous deint function with abilities of scaling and field deinterlacing Acked-by: Christian König <[email protected]>
* vl/compositor: extend YUV deint function to do field deintLeo Liu2017-09-252-12/+26
| | | | | | It will add Bob deint ability to interlaced video for HW encoder Acked-by: Christian König <[email protected]>
* vl/compositor: separate YUV part from shader video buffer functionLeo Liu2017-09-251-13/+18
| | | | | | So that it can be re-used Acked-by: Christian König <[email protected]>
* gallium/util: Remove unused keymapThomas Helland2017-09-213-388/+0
| | | | | | | | | | This is not used anywhere in the codebase. It's a hashtable implementation that is based around cso_hash, and is therefore (and as mentioned in a comment in the source) quite similar to u_hash_table. CC: Brian Paul<[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium: Add PIPE_SHADER_CAP_INT64_ATOMICSJan Vesely2017-09-212-0/+2
| | | | | | | Denotes availability of 64bit int atomic instructions Signed-off-by: Jan Vesely <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* llvmpipe, gallivm: implement lod queries (LODQ opcode)Roland Scheidegger2017-09-204-57/+143
| | | | | | | | | | | | | | | | | | | | | | This uses all the existing code to calculate lod values for mip linear filtering. Though we'll have to disable the simplifications (if we know some parts of the lod calculation won't actually matter for filtering purposes due to mip clamps etc.). For better or worse, we'll also disable lod calculation hacks (mostly should make a difference for cube maps) always - the issue with per-pixel lod being difficult is mostly because we then have different mipmaps needed for the actual texel fetch, which isn't a problem with lodq. We still use approximation for the log2 - for that reason I believe the float part of the lod is only accurate to about 4-5 bits (and one bit less with 1d textures actually) which is hopefully good enough (though d3d10 technically requires 6 bits - could use quadratic interpolation instead of linear to get 8 bits or so). Since lodq requires unclamped lod, we also have to move some sampler key calculations to texture sampling code - even if we know we're going to access mipmap 0 we still have to calculate lod and apply lod_bias for lodq. Passes piglit ARB_texture_query_lod tests (after having fixed the test). Reviewed-by: Jose Fonseca <[email protected]>
* ttn: Fix out-of-bounds accesses since the always-2D-constants change.Eric Anholt2017-09-181-2/+3
| | | | | | | | | | Only one of the three checks for dim was updated, so we would try to set a UBO buffer index source value on a nir_load_uniform, and wouldn't actually declare non-UBO uniforms. Fixes: 37dd8e8dee1d ("gallium: all drivers should accept two-dimensional constant buffer indexing") Tested-by: Derek Foreman <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: Add PIPE_SHADER_CAP_FP16Jan Vesely2017-09-182-0/+4
| | | | | | | | | Denotes native half precision float operations capability v2: PIPE_CAP_HALFS -> PIPE_SHADER_CAP_FP16 fix indentation Signed-off-by: Jan Vesely <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: add PIPE_QUERY_OCCLUSION_PREDICATE_CONSERVATIVENicolai Hähnle2017-09-182-0/+2
| | | | | | | | | | | | | | | | | To be able to properly distinguish between GL_ANY_SAMPLES_PASSED and GL_ANY_SAMPLES_PASSED_CONSERVATIVE. This patch goes through all drivers, having them treat the two query types identically, except: 1. radeon incorrectly enabled conservative mode on PIPE_QUERY_OCCLUSION_PREDICATE. We now do it correctly, only on PIPE_QUERY_OCCLUSION_PREDICATE_CONSERVATIVE. 2. st/mesa uses the new query type. Fixes dEQP-GLES31.functional.fbo.no_attachments.* Reviewed-by: Marek Olšák <[email protected]>
* gallium: add CONSTBUF type to tgsi_file_typeTimothy Arceri2017-09-151-0/+1
| | | | | | | This will be use to distinguish between load types when using the TGSI_OPCODE_LOAD opcode. Reviewed-by: Marek Olšák <[email protected]>
* gallium/{r600, radeonsi}: Fix segfault with color format (v2)Denis Pauk2017-09-141-0/+4
| | | | | | | | | | | | Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102552 v2: Patch cleanup proposed by Nicolai Hähnle. * deleted changes in si_translate_texformat. Cc: Nicolai Hähnle <[email protected]> Cc: Ilia Mirkin <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* radeonsi: optimize TCS epilog when invocation 0 writes tess factorsMarek Olšák2017-09-111-2/+0
| | | | | | | | | | This removes the barrier and LDS stores and loads for tess factors when it's possible. The removal of the barrier seems more important to me though. In one shader, it removes 17 * 4 bytes from the shader binary. Reviewed-by: Nicolai Hähnle <[email protected]>
* tgsi/scan: add a new pass that analyzes tess factor writes (v2)Marek Olšák2017-09-112-0/+235
| | | | | | | | | | | | | | | | | | | The pass tries to deduce whether tess factors are always written by all shader invocations. The implication for radeonsi is that it doesn't have to use a barrier near the end of TCS, and doesn't have to use LDS for passing the tess factors to the epilog. v2: Handle barriers and do the analysis pass for each code segment surrounded by barriers separately, and AND results from all such segments writing tess factors. The change is trivial in the main switch statement. Also, the result is renamed to "tessfactors_are_def_in_all_invocs" to make the name accurate. Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/u_blitter: use UTIL_BLITTER_ATTRIB_NONE (0) instead of 0 directlyMarek Olšák2017-09-111-2/+2
| | | | | Reviewed-by: Nicolai Hähnle <[email protected]> Tested-by: Brian Paul <[email protected]>
* gallium/u_blitter: don't pass GENERIC in VS if it's not neededMarek Olšák2017-09-111-17/+45
| | | | | | | Now, depth-only clears and custom passes don't read memory in VS. Reviewed-by: Nicolai Hähnle <[email protected]> Tested-by: Brian Paul <[email protected]>
* gallium/u_blitter: use draw_rectangle for all blits except cubemapsMarek Olšák2017-09-112-88/+98
| | | | | | | Add ZW coordinates to the draw_rectangle callback and use it. Reviewed-by: Nicolai Hähnle <[email protected]> Tested-by: Brian Paul <[email protected]>
* gallium/u_blitter: use draw_rectangle callback for layered clearsMarek Olšák2017-09-112-28/+29
| | | | | | | They are done with instancing. Reviewed-by: Nicolai Hähnle <[email protected]> Tested-by: Brian Paul <[email protected]>
* gallium/u_blitter: add new union blitter_attrib to replace pipe_color_unionMarek Olšák2017-09-112-52/+53
| | | | | Reviewed-by: Nicolai Hähnle <[email protected]> Tested-by: Brian Paul <[email protected]>
* llvmpipe, draw: improve shader cache debuggingRoland Scheidegger2017-09-092-22/+43
| | | | | | | | | | | | | | | | | With GALLIVM_DEBUG=perf set, output the relevant stats for shader cache usage whenever we have to evict shader variants. Also add some output when shaders are deleted (but not with the perf setting to keep this one less noisy). While here, also don't delete that many shaders when we have to evict. For fs, there's potentially some cost if we have to evict due to the required flush, however certainly shader recompiles have a high cost too so I don't think evicting one quarter of the cache size makes sense (and, if we're evicting based on IR count, we probably typically evict only very few or just one shader too). For vs, I'm not sure it even makes sense to evict more than one shader at a time, but keep the logic the same for now. Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* gallivm: fix gather implementation a bitRoland Scheidegger2017-09-091-10/+48
| | | | | | | | | | | | | | | | | | gather is defined in terms of bilinear filtering, just without the filtering part. However, there's actually some subtle differences required in our implementation, because we use some tricks to simplify coord wrapping for the two coords per direction. For bilinear filtering, we don't care if we end up with an incorrect texel, as long as the filter weight is 0.0 for it. Likewise, the order of the texels doesn't actually matter (as long as they still have the correct filter weight). But for gather, these tricks lead to incorrect results. Fix this for CLAMP_TO_EDGE, and add some comments to the other wrap functions which look broken (the 3 mirror_clamp plus mirror_repeat) (too complex to fix right now, and noone really seems to care...). Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* vl/compositor: make vl_compositor_set_yuv_layer() staticLeo Liu2017-09-072-44/+28
| | | | | | | Since it's no longer being called outside of compositor Signed-off-by: Leo Liu <[email protected]> Reviewed-by: Christian König <[email protected]>
* vl/compositor: make a helper function for YUV deinterlacingLeo Liu2017-09-072-0/+40
| | | | | | | | The similar function is in OMX, and only used by OMX. Now have it moved to vl/compositor for other state tracker to use later. Signed-off-by: Leo Liu <[email protected]> Reviewed-by: Christian König <[email protected]>
* llvmpipe, tgsi: hook up dx10 gather4 opcodeRoland Scheidegger2017-09-072-8/+25
| | | | | | | | | Trivial. We already support tg4 for legacy tex opcodes, so the actual texture sampling code already handles it. (Just like TG4, we don't handle additional capabilities and always sample red channel.) Reviewed-by: Jose Fonseca <[email protected]>
* llvmpipe, draw: increase shader cache limitsRoland Scheidegger2017-09-071-1/+1
| | | | | | | | | | | | | | | We're not particularly concerned with memory usage, if the tradeoff is shader recompiles. And it's common for apps to have a lot of shaders nowadays (and, since our shaders include a LOT of context state of course we may create quite a bit more shaders even). So quadruple the amount of shaders draw will cache (from 128 to 512). For llvmpipe (fs shaders) quadruple the number of instructions, keep the number of variants the same for now (only with very simple, non-texturing shaders the variant limit could really be reached), and simplify the definition, it's probably easier to just have one different definition per branch... Reviewed-by: Jose Fonseca <[email protected]>
* gallium/tests: always use two-dimensional constant referencesNicolai Hähnle2017-09-041-2/+2
| | | | | | Acked-by: Roland Scheidegger <[email protected]> Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>