path: root/src/gallium/auxiliary/gallivm
Commit message | Author | Age | Files | Lines
* gallivm: add coroutine pass manager supportDave Airlie2019-09-043-1/+32
| | | | | | | coroutines require a proper pass manager, so add the passes to the correct places Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: add coroutine support files to gallivm.Dave Airlie2019-09-042-0/+269
| | | | | | | These wrap the coroutine intrinsics and also add some higher level wrappers around coroutine begin, end and suspend procedures Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm/flow: add counter reset for loopsDave Airlie2019-09-042-0/+20
| | | | | | This allows the counter value to be forced to a certain value Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: disable accurate cube corner for integer textures.Dave Airlie2019-08-301-1/+6
| | | | | Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111511 Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: use fallback code for mul_hi with llvm >= 7.0Roland Scheidegger2019-08-291-1/+6
| | | | | | | | | | | | | LLVM 7.0 ditched the pmulu intrinsics. This is only a trivial patch to use the fallback code instead. It'll likely produce atrocious code since the pattern doesn't match what llvm itself uses in its autoupgrade paths, hence the pattern won't be recognized. Should fix https://bugs.freedesktop.org/show_bug.cgi?id=111496 Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
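As context for the commit above: a "multiply high" returns the upper half of a double-width product. The scalar sketch below shows only that arithmetic. gallivm builds the operation on vectors, the exact fallback sequence is not shown in this log, and `mul_hi_u32` is a hypothetical name used for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: upper 32 bits of the 64-bit product of two
 * unsigned 32-bit values (widen, multiply, shift right by 32). */
static uint32_t mul_hi_u32(uint32_t a, uint32_t b)
{
    uint64_t wide = (uint64_t)a * (uint64_t)b;
    return (uint32_t)(wide >> 32);
}
```

Whether a compiler turns such a widen/multiply/shift sequence back into a single instruction depends on its pattern matching, which is the concern the commit message raises.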
* gallivm: fix appveyor build after images changesDave Airlie2019-08-271-1/+2
* llvmpipe: enable ARB_shader_image_load_storeDave Airlie2019-08-271-1/+2
| | | | Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: add memory barrier supportDave Airlie2019-08-271-0/+11
| | | | Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: add support for fences api on older llvmDave Airlie2019-08-272-0/+16
| | | | Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: add image load/store/atomic supportDave Airlie2019-08-277-10/+684
| | | | Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm/tgsi: add image interface to tgsi builderDave Airlie2019-08-272-1/+20
| | | | | | | This adds the callbacks for the driver/gallium binding for image operations. Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: add a basic image limitDave Airlie2019-08-271-0/+2
| | | | Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: handle helper invocation (v2)Dave Airlie2019-08-271-0/+5
| | | | | | | | Just invert the exec_mask to get if this is a helper or not. v2: get the bld mask (Roland) Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: make lp_build_float_to_r11g11b10 take a const srcDave Airlie2019-08-272-2/+2
| | | | | | This allows using it with a const src later. Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: fix atomic compare-and-swapDave Airlie2019-08-271-0/+2
| | | | | | | | Not sure how I missed this before, but compswap was hitting an assert here as it is its own special case. Fixes: b5ac381d8f ("gallivm: add buffer operations to the tgsi->llvm conversion.") Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: fix issue with AtomicCmpXchg wrapper on llvm 3.5-3.8Roland Scheidegger2019-08-021-1/+3
| | | | | | | | | | | | | These versions still need wrapper but already have both success and failure ordering. (Compile tested on llvm 3.3, 3.7, 3.8.) v2: don't duplicate whole function (suggested by Brian). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111102 Reviewed-by: Charmaine Lee <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* gallivm: rework lp_build_tgsi_soa to take a structDave Airlie2019-07-242-46/+38
| | | | | | | The parameters were getting messy and I have to add a few more for compute shaders, so clean it up before proceeding. Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: fix warning: ‘op’ may be used uninitializedMarek Olšák2019-07-221-0/+3
| | | | Reviewed-by: Dave Airlie <[email protected]>
* util: use standard name for vsnprintf()Eric Engestrom2019-07-191-1/+1
| | | | | | | Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* util: use standard name for snprintf()Eric Engestrom2019-07-199-18/+18
| | | | | | | Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* util: use standard name for strncat()Eric Engestrom2019-07-191-3/+3
| | | | | | | Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* llvmpipe: enable ARB_shader_storage_buffer_objectDave Airlie2019-07-071-1/+2
| | | | Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: add buffer operations to the tgsi->llvm conversion.Dave Airlie2019-07-073-4/+315
| | | | | | | | | | | | This adds load, store and atomic operations. These operations have to respect the exec_mask, and can't operate in lanes where the execute is off. This is needed to avoid side effects seen outside the shaders. There is also bounds checking on the ssbo accesses vs the size ptr. Reviewed-by: Roland Scheidegger <[email protected]>
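The two constraints described in the commit above (skip lanes whose execution mask is off, and bounds-check each access against the buffer size) can be sketched per lane in scalar C. This is an illustration under assumptions, not the gallivm code: `LANES`, `masked_store_u32` and the dword-based size are hypothetical, and the real implementation emits LLVM IR rather than a loop.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 4  /* hypothetical SIMD width for the sketch */

/* Store value[i] to ssbo[index[i]] only for lanes that are both active
 * (non-zero exec_mask) and in bounds. Inactive or out-of-bounds lanes
 * leave memory untouched, so no side effect escapes the shader. */
static void masked_store_u32(uint32_t *ssbo, size_t ssbo_dwords,
                             const uint32_t index[LANES],
                             const uint32_t value[LANES],
                             const uint32_t exec_mask[LANES])
{
    for (int lane = 0; lane < LANES; lane++) {
        if (exec_mask[lane] != 0 && index[lane] < ssbo_dwords)
            ssbo[index[lane]] = value[lane];
    }
}
```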
* gallivm: move mask_vec function up higher so it can be reused.Dave Airlie2019-07-071-14/+15
| | | | Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: add some basic SSBO limits. (v2)Dave Airlie2019-07-071-0/+4
| | | | | | v2: update ssbo size Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: add ssbo pointers to the soa build api.Dave Airlie2019-07-072-2/+11
| | | | | | Need to pass ssbo + ssbo size pointers just like constants. Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: add compare exchange wrapperDave Airlie2019-07-073-1/+39
| | | | | | This just pulls the wrapper from LLVM for older versions Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: Improve lp_build_rcp_refine.Jose Fonseca2019-06-281-6/+6
| | | | | | | | | | | Use the alternative more accurate expression from https://en.wikipedia.org/wiki/Division_algorithm#Newton%E2%80%93Raphson_division v2: Use lp_build_fmuladd as suggested by Roland Tested by enabling this code path, and running lp_test_arit. Reviewed-by: Roland Scheidegger <[email protected]>
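The alternative Newton-Raphson step referenced above is x' = x + x * (1 - a * x), where x approximates 1/a. A scalar sketch follows; `rcp_refine` is a hypothetical name, and plain multiply/add is used here where the commit uses lp_build_fmuladd on vectors.

```c
/* One Newton-Raphson refinement of an approximate reciprocal.
 * rcp_a is the current estimate of 1/a. */
static float rcp_refine(float a, float rcp_a)
{
    float err = 1.0f - a * rcp_a;   /* residual; maps to one multiply-add */
    return rcp_a + rcp_a * err;     /* correction; maps to a second multiply-add */
}
```

Each step roughly squares the relative error, so starting from 0.3 as an estimate of 1/3 a single call gives about 0.33.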
* gallivm: fix default cbuf info.Roland Scheidegger2019-05-241-1/+1
| | | | | | | | | The default null_output really needs to be static, otherwise the values we'll eventually get later are doubly random (they are not initialized, and even if they were it's a pointer to a local stack variable). VMware bug 2349556. Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: fix broken 8-wide s3tc decodingRoland Scheidegger2019-05-071-17/+15
| | | | | | | | | | | | | | | | | | Brian noticed there was an uninitialized var for the 8-wide case and 128 bit blocks, which made it always crash. Likewise, the 64bit block case had another crash bug due to type mismatch. Color decode (used for all s3tc formats) also had a bogus shuffle for this case, leading to decode artifacts. Fix these all up, which makes the code actually work 8-wide. Note that it's still not used - I've verified it works, and the generated assembly does look quite a bit simpler actually (20-30% fewer instructions for the s3tc decode part with avx2), however in practice it still seems to be slightly slower for some unknown reason (tested with openarena) on my haswell box, so for now continue to split things into 4-wide vectors before decoding. Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* llvmpipe: init some vars to NULL to silence MinGW compiler warningsBrian Paul2019-05-011-2/+2
| | | | Reviewed-by: Neha Bhende <[email protected]>
* gallivm: disable NEON instructions if they are not supportedLubomir Rintel2019-04-221-0/+7
| | | | | | | | | | | | | | The LLVM project made some questionable decisions about defaults for armv7 (e.g. they enable NEON that is not there on NVIDIA and Marvell platforms). On top of that, getHostCPUFeatures() doesn't disable missing machine attributes. Finally, -neon alone is not sufficient to disable emission of NEON instructions. Signed-off-by: Lubomir Rintel <[email protected]> Cc: <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* gallivm: guess CPU features also on ARMLubomir Rintel2019-04-221-7/+6
| | | | | | | | | | | getHostCPUFeatures() is also available on ARM, for even longer time than for x86. Use it -- it potentially enables instructions that may speed things up. Signed-off-by: Lubomir Rintel <[email protected]> Cc: <[email protected]> Closes: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/518 Reviewed-by: Matt Turner <[email protected]>
* Add no_aos_sampling GALLIVM_PERF optionDominik Drees2019-04-173-4/+11
| | | | | This forces using general sampling and should improve precision and performance in some cases.
* gallivm: fix saturated signed add / sub with llvm 9Roland Scheidegger2019-04-171-0/+14
| | | | | | | | | | | | | | | | llvm 8 removed saturated unsigned add / sub x86 sse2 intrinsics, and now llvm 9 removed the signed versions as well - they were proposed for removal earlier, but the pattern to recognize those was very complex, so it wasn't done then. However, instead of these arch-specific intrinsics, there's now arch-independent intrinsics for saturated add / sub, both for signed and unsigned, so use these. They should have only advantages (work with arbitrary vector sizes, optimal code for all archs), although I don't know how well they work in practice for other archs (at least for x86 they do the right thing). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110454 Reviewed-by: Brian Paul <[email protected]>
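For reference, the result a signed saturated add is expected to produce can be written in scalar C as below. This sketches the semantics only, for one 8-bit lane; `add_sat_i8` is a hypothetical name and the commit itself calls LLVM's arch-independent saturating intrinsics on vectors.

```c
#include <stdint.h>

/* Signed saturating add: compute in a wider type, then clamp to the
 * int8_t range instead of wrapping around. */
static int8_t add_sat_i8(int8_t a, int8_t b)
{
    int sum = (int)a + (int)b;
    if (sum > INT8_MAX)
        return INT8_MAX;
    if (sum < INT8_MIN)
        return INT8_MIN;
    return (int8_t)sum;
}
```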
* gallivm: fix bogus assert in get_indirect_indexRoland Scheidegger2019-04-161-1/+1
| | | | | | | | | | | 0 is a valid value as max index, and the code handles it fine. This isn't commonly seen, as it will only happen with array declarations of size 1. Fixes piglit tests/shaders/complex-loop-analysis-bug.shader_test Fixes: a3c898dc97ec "gallivm: fix improper clamping of vertex index when fetching gs inputs" Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110441 Reviewed-by: Brian Paul <[email protected]>
* gallivm: Return true from arch_rounding_available() if NEON is availableMatt Turner2019-01-241-1/+3
| | | | | | | | LLVM uses the single instruction "FRINTI" to implement llvm.nearbyint. Fixes the rounding tests of lp_test_arit. Bug: https://bugs.gentoo.org/665570 Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: abort when trying to use non-existing intrinsicRoland Scheidegger2018-12-211-0/+10
| | | | | | | | | | | Whenever llvm removes an intrinsic (we're using), we're hitting segfaults due to llvm doing calls to address 0 in the jitted code instead. However, Jose figured out we can actually detect this with LLVMGetIntrinsicID(), so use this to abort, so we don't have to wonder what got broken. (Of course, someone still needs to fix the code to no longer use this intrinsic.) Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: don't use pavg.b intrinsic on llvm >= 6.0Roland Scheidegger2018-12-211-9/+46
| | | | | | | | | | | | | | | | | | | | | This intrinsic disappeared with llvm 6.0, using it ends up in segfaults (due to llvm issuing call to NULL address in the jitted shaders). Add code doing the same thing as the autoupgrade code in llvm so it can be matched and replaced back with a pavgb. While here, also improve lp_test_format, so it tests both with and without cache (as it was, it tested the cache versions only, whereas cache is actually disabled in llvmpipe, and in any case even with it enabled vertex and geometry shaders wouldn't use it). (Although at least for the unorm8 uncached fetch, the code is still quite different to what llvmpipe is using, since that would use unorm8x16 type, whereas the test code is using unorm8x4 type, hence disabling some intrinsic paths.) Fixes: 6f4083143bb8 ("gallivm: use llvm jit code for decoding s3tc") Reviewed-by: Jose Fonseca <[email protected]> Tested-by: Michel Dänzer <[email protected]>
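The arithmetic a pavgb instruction performs per byte is a rounding average, (a + b + 1) >> 1, computed without overflow. A scalar sketch of that arithmetic follows; `avg_round_u8` is a hypothetical name, and the exact IR sequence the commit emits to match LLVM's autoupgrade code is not shown in this log.

```c
#include <stdint.h>

/* Rounding average of two unsigned bytes: widen so the sum cannot
 * overflow, add one to round up, halve, then narrow again. */
static uint8_t avg_round_u8(uint8_t a, uint8_t b)
{
    uint16_t wide = (uint16_t)((uint16_t)a + (uint16_t)b + 1u);
    return (uint8_t)(wide >> 1);
}
```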
* gallivm: use llvm jit code for decoding s3tcRoland Scheidegger2018-12-205-381/+2237
| | | | | | | | | | | | This is (much) faster than using the util fallback. (Note that there's two methods here, one would use a cache, similar to the existing code (although the cache was disabled), except the block decode is done with jit code, the other directly decodes the required pixels. For now don't use the cache (being direct-mapped is suboptimal, but it's difficult to come up with something better which doesn't have too much overhead.) Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: remove unused float coord wrapping for aos samplingRoland Scheidegger2018-12-121-507/+23
| | | | | | | | | | | | | | | | | | | | | | | AoS sampling tries to use integers for coord wrapping when possible, as it should be faster. However, for AVX, this was suboptimal, because only floats can use 8x32bit vectors, whereas integers have to be split into 4x32bit vectors. (I believe part of why it was slower was also that at least earlier llvm versions had trouble optimizing it properly, since you can still do simple bit ops with 8x32bit vectors, so a sequence of int add / and / int add / and with such vectors would actually end up doing 128bit inserts/extracts between the operations instead of just doing the cheap 128bit ands.) Hence, a special float coord wrapping path was added to AoS sampling. But this path was actually disabled for a long time already, since we found that just splitting everything before entering the AoS path was still slightly faster usually, so none of this float coord wrapping code was used anymore (AoS sampling code, when avx2 isn't supported, never sees vectors with length > 4). I thought it might be useful some day again, but I'm not interested anymore in optimizing for very weird instruction sets which have support for 256bit vectors for floats but not for ints, so just drop it. Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: Use nextafterf(0.5, 0.0) as rounding constantMatt Turner2018-11-281-1/+1
| | | | | | | | | | | The common truncf(x + 0.5) fails for the floating-point value just less than 0.5 (nextafterf(0.5, 0.0)). nextafterf(0.5, 0.0) + 0.5, after rounding is 1.0, thus truncf does not produce the desired value. The solution is to add nextafterf(0.5, 0.0) instead of 0.5 before truncating. This works for all values. Reviewed-by: Roland Scheidegger <[email protected]>
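The failure case described above can be reproduced in a few lines of C. The sketch assumes IEEE-754 single precision with round-to-nearest-even and non-negative inputs. The function names are hypothetical, the hex literal 0x1.fffffep-2f is the value nextafterf(0.5, 0.0) returns, and an integer cast stands in for truncf.

```c
/* Largest float strictly below 0.5, i.e. nextafterf(0.5f, 0.0f). */
#define HALF_MINUS_ULP 0x1.fffffep-2f

/* Common idiom: add 0.5, then truncate. The volatile store forces the
 * sum to be rounded to single precision before truncation. */
static int round_add_half(float x)
{
    volatile float sum = x + 0.5f;
    return (int)sum;
}

/* Variant from the commit: add the float just below 0.5 instead. */
static int round_add_below_half(float x)
{
    volatile float sum = x + HALF_MINUS_ULP;
    return (int)sum;
}
```

For x equal to the float just below 0.5, the first function's sum rounds up to exactly 1.0 and truncates to 1, while the second produces a sum just below 1.0 and truncates to 0. An input of exactly 0.5 still rounds up to 1 in both.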
* gallivm: fix improper clamping of vertex index when fetching gs inputsRoland Scheidegger2018-11-091-10/+31
| | | | | | | | | | | | | | | | | | | | Because we only have one file_max for the (2d) gs input file, the value actually represents the max of attrib and vertex index (although I'm not entirely sure if we really want the max, since the max valid value of the vertex dimension can be easily deduced from the input primitive). Thus in cases where the number of inputs is higher than the number of vertices per prim, we did not properly clamp the vertex index, which would result in out-of-bound fetches, potentially causing segfaults (the segfaults seemed actually difficult to trigger, but valgrind certainly wasn't happy). This might have happened even if the shader did not actually try to fetch bogus vertices, if the fetching happened in non-active conditional clauses. To fix simply use the correct max vertex index value (derived from the input prim type) instead when clamping for this case. Reviewed-by: Jose Fonseca <[email protected]>
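The fix described above amounts to clamping the vertex index against a limit derived from the input primitive rather than against the shared file_max. A scalar sketch under assumptions: the enum, the helper names and the per-primitive vertex counts listed here are illustrative and are not taken from the gallivm source.

```c
/* Hypothetical primitive types for the sketch. */
enum prim_type {
    PRIM_POINTS,
    PRIM_LINES,
    PRIM_TRIANGLES,
    PRIM_LINES_ADJACENCY,
    PRIM_TRIANGLES_ADJACENCY
};

/* Number of vertices a geometry shader sees per input primitive. */
static unsigned verts_per_prim(enum prim_type prim)
{
    switch (prim) {
    case PRIM_POINTS:              return 1;
    case PRIM_LINES:               return 2;
    case PRIM_TRIANGLES:           return 3;
    case PRIM_LINES_ADJACENCY:     return 4;
    case PRIM_TRIANGLES_ADJACENCY: return 6;
    default:                       return 1;
    }
}

/* Clamp an indirect vertex index to the last valid vertex of the prim. */
static unsigned clamp_vertex_index(unsigned index, enum prim_type prim)
{
    unsigned max_index = verts_per_prim(prim) - 1;
    return index < max_index ? index : max_index;
}
```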
* gallivm: Make it possible to disable some optimization shortcuts in release buildsGert Wollny2018-10-064-21/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For testing it is of interest that all tests of dEQP pass, e.g. to test virglrenderer on a host only providing software rendering like in a CI. Hence make it possible to disable certain optimizations that make tests fail. While we are there also add some documentation to the flags to make it clear that this is opt-out. Setting the environment variable "GALLIVM_PERF=no_filter_hacks" can be used to make the following tests pass in release mode: dEQP-GLES2.functional.texture.mipmap.2d.affine.*_linear_* dEQP-GLES2.functional.texture.mipmap.cube.generate.* dEQP-GLES2.functional.texture.vertex.2d.filtering.*_mipmap_linear_* dEQP-GLES2.functional.texture.vertex.2d.wrap.* Related: https://bugs.freedesktop.org/show_bug.cgi?id=94957 v2: rename optimization disabling flag to 'safemath' and also move the nopt flag to the perf flags. v3: rename flag "safemath" to "no_filter_hacks" since safemath is usually associated with floating point operations (Roland) Signed-off-by: Gert Wollny <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: ensure string is null-terminated instead of assert()ingEric Engestrom2018-09-251-3/+1
| | | | | Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Dylan Baker <[email protected]>
* gallivm: Detect VSX separately from AltivecVicki Pfau2018-08-301-18/+3
| | | | | | | | | | Previously gallivm would attempt to use VSX instructions on all systems where it detected that Altivec is supported; however, VSX was added to POWER long after Altivec, causing lots of crashes on older POWER/PPC hardware, e.g. PPC Macs. By detecting VSX separately from Altivec we can automatically disable it on hardware that supports Altivec but not VSX. Signed-off-by: Vicki Pfau <[email protected]>
* gallivm: allow to pass two swizzles into fetches.Dave Airlie2018-08-302-30/+65
| | | | | | | | | | | | This hijacks the top 16-bits of swizzle, to pass in the swizzle for the second channel. This fixes handling .yx swizzles of 64-bit values. This should fixup radeonsi and llvmpipe. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107524 Reviewed-by: Marek Olšák <[email protected]>
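Packing a second swizzle into the top 16 bits of an existing swizzle argument, as the commit above describes, can be sketched as follows. The helper names are hypothetical, and how gallivm encodes an individual swizzle value is not shown in this log.

```c
#include <stdint.h>

/* Combine two swizzle selectors into one 32-bit value: the first in the
 * low 16 bits, the second (for the other half of a 64-bit value) in the
 * high 16 bits. */
static uint32_t pack_swizzles(uint32_t first, uint32_t second)
{
    return (first & 0xffffu) | ((second & 0xffffu) << 16);
}

/* Extract the selector for the first channel. */
static uint32_t swizzle_first(uint32_t packed)
{
    return packed & 0xffffu;
}

/* Extract the selector for the second channel. */
static uint32_t swizzle_second(uint32_t packed)
{
    return packed >> 16;
}
```

With a .yx swizzle of a 64-bit value, each half of a component can then be fetched from its own source channel instead of assuming the second immediately follows the first.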
* gallivm: don't use saturated unsigned add/sub intrinsics for llvm 8.0Roland Scheidegger2018-08-241-27/+60
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These have been removed. Unfortunately auto-upgrade doesn't work for jit. (Worse, it seems we don't get a compilation error anymore when compiling the shader, rather llvm will just do a call to a null function in the jitted shaders making it difficult to detect when intrinsics vanish.) Luckily the signed ones are still there. I helped convince llvm that removing them is a bad idea for now, since while the unsigned ones have sort of agreed-upon simplest patterns to replace them with, this is not the case for the signed ones, and they require _significantly_ more complex patterns - to the point that the recognition is IMHO probably unlikely to ever work reliably in practice (due to other optimizations interfering). (Even for the relatively trivial unsigned patterns, llvm already added test cases where recognition doesn't work, unsaturated add followed by saturated add may produce atrocious code.) Nevertheless, it seems there's a serious quest to squash all cpu-specific intrinsics going on, so I'd expect patches to nuke them as well to resurface. Adapt the existing fallback code to match the simple patterns llvm uses and hope for the best. I've verified with lp_test_blend that it does produce the expected saturated assembly instructions. Our cmp/select build helpers don't use boolean masks, but that doesn't seem to interfere with llvm's ability to recognize the pattern. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106231 Reviewed-by: Jose Fonseca <[email protected]>
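A compare-and-select formulation of unsigned saturated add and subtract looks like the scalar sketch below. It illustrates the general shape of such patterns for one 8-bit lane. It is an assumption that this resembles what the commit emits, since the exact IR is not shown in this log, and the function names are hypothetical.

```c
#include <stdint.h>

/* Unsigned saturating add: a wrapped sum smaller than an operand means
 * the addition overflowed, so select the maximum value instead. */
static uint8_t add_sat_u8(uint8_t a, uint8_t b)
{
    uint8_t sum = (uint8_t)(a + b);
    return sum < a ? UINT8_MAX : sum;
}

/* Unsigned saturating subtract: select zero when b exceeds a. */
static uint8_t sub_sat_u8(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;
}
```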
* gallium: add scalar isa shader capChristian Gmeiner2018-06-201-0/+2
| | | | | | | | | | | | | | | | v1 -> v2: - nv30 is _NOT_ scalar as suggested by Ilia Mirkin. - Change from a screen cap to a shader cap as suggested by Eric Anholt. - radeonsi is scalar as suggested by Marek Olšák. - Change missing ones to be scalar. v2 -> v3: - r600 prefers vec4 as suggested by Marek Olšák. Signed-off-by: Christian Gmeiner <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallivm: Use alloca_undef with array type instead of alloca_arrayRoland Scheidegger2018-05-161-28/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use a single allocation of array type instead of the old-style array allocation for the temp and immediate arrays. Probably only makes a difference if they aren't used indirectly (so, if we used them solely because there's too many temps or immediates). In this case the sroa and early-cse passes can sometimes do some optimizations which they otherwise cannot. (As a side note, for the temp reg array, we actually really should use one allocation per array id, not just one for everything.) Note that the instcombine pass would actually promote such allocations to single alloc of array type as well, but it's too late for some artificial shaders we've seen to help (we don't want to run instcombine at the beginning due to its cost, hence would need another sroa/cse pass after instcombine). sroa/early-cse help there because they can actually eliminate all of the huge shader, reducing it to a single const output (don't ask...). (Interestingly, instcombine also removes all the bitcasts we do on that allocation for single-value gathering, and in the end directly indexes into the single vector elements, which according to spec is only semi-valid, but this happens regardless. Another thing instcombine also does is use inbound GEPs, which is probably something we should do manually as well - for indirectly indexed reg files llvm may not be able to figure it out on its own, but we should be able to guarantee all pointers are always inbound. In any case, by the looks of it using single allocation with array type seems to be the right thing to do even for ordinary shaders.) No piglit change. Reviewed-by: Jose Fonseca <[email protected]>