aboutsummaryrefslogtreecommitdiffstats
path: root/src/gallium/auxiliary/gallivm
Commit message (Collapse)AuthorAgeFilesLines
...
* llvmpipe: lp_build_gather_elem_vec BE fix for 3x16 loadBen Crocker2017-09-011-2/+28
| | | | | | | | | | | | | | | | | | | | | | Fix loading of a 3x16 vector as a single 48-bit load on big-endian systems (PPC64, S390). Roland Scheidegger's commit e827d9175675aaa6cfc0b981e2a80685fb7b3a74 plus Ray Strode's patch reduce pre-Roland Piglit failures from ~4000 to ~2000. This patch fixes three of the four regressions observed by Ray: - draw-vertices - draw-vertices-half-float - draw-vertices-half-float_gles2 One regression remains: - draw-vertices-2101010 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100613 Cc: "17.2" "17.1" <[email protected]> Signed-off-by: Ben Crocker <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: correct channel shift logic on big endianRay Strode2017-09-011-1/+7
| | | | | | | | | | | | | | | | | | | lp_build_fetch_rgba_soa fetches a texel from a texture. Part of that process involves first gathering the element together from memory into a packed format, and then breaking out the individual color channels into separate, parallel arrays. The code fails to account for endianess when reading the packed values. This commit attempts to correct the problem by reversing the order the packed values are read on big endian systems. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100613 Cc: "17.2" "17.1" <[email protected]> Signed-off-by: Ray Strode <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: remove unused variableBrian Paul2017-08-241-2/+0
| | | | Trivial.
* gallium: use tgsi_get_opcode_name instead of tgsi_opcode_info::mnemonicNicolai Hähnle2017-08-232-2/+2
| | | | Reviewed-by: Marek Olšák <[email protected]>
* gallium: remove TGSI opcode SCSMarek Olšák2017-08-222-29/+0
| | | | | | | use COS+SIN instead. Reviewed-by: Roland Scheidegger <[email protected]> Acked-by: Jose Fonseca <[email protected]>
* gallium: remove TGSI opcode BREAKCMarek Olšák2017-08-223-45/+0
| | | | | Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: remove TGSI opcode XPDMarek Olšák2017-08-222-59/+0
| | | | | | use MUL+MAD+MOV instead. Reviewed-by: Roland Scheidegger <[email protected]>
* gallium: remove TGSI opcode DPHMarek Olšák2017-08-222-20/+0
| | | | | | use DP4 or DP3 + ADD. Reviewed-by: Roland Scheidegger <[email protected]>
* gallium: remove TGSI opcode DP2AMarek Olšák2017-08-222-32/+0
| | | | | | use DP3 instead. Reviewed-by: Roland Scheidegger <[email protected]>
* gallium: remove TGSI_OPCODE_CALLNZMarek Olšák2017-08-222-2/+0
| | | | Reviewed-by: Roland Scheidegger <[email protected]>
* gallium: remove TGSI opcodes PUSHA, POPA, SAD, TXQ_LZMarek Olšák2017-08-222-20/+0
| | | | Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: handle call attributes for llvm < 4.0 in lp_add_function_attrRoland Scheidegger2017-07-211-3/+7
| | | | | | | | | | | | | | | | | | | We had some caller using LLVMAddInstrAttributes, which couldn't be converted to lp_add_function_attr, because attributes were only handled for functions in this case, so fix this. For llvm >= 4.0, this already works correctly. (radeonsi seems to avoid setting call site attributes prior to llvm 4.0, the patch then citing it doesn't work when calling intrinsics. But at least for calling external functions we always used that, albeit only for actual call attributes, not call parameter attributes, though some quick test shows llvm seems to handle that as well. The attribute index is sort of iffy though, since attribute 0 of the call is the actual function, attribute 1 corresponds to the first parameter of the called function.) (Verified with GALLIVM_DEBUG=dumpbc plus llvm-dis that the correct attributes are shown for calls, both for llvm 4.0 and 3.3.) Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* gallivm: inline gallivm_init_llvm_targetsMarek Olšák2017-07-172-18/+8
| | | | | | there is only one user. Reviewed-by: Nicolai Hähnle <[email protected]>
* gallivm: Make sure module has the correct data layout when pass manager runsTom Stellard2017-05-181-16/+18
| | | | | | | | | | | | | | | | | | | The datalayout for modules was purposely not being set in order to work around the fact that the ExecutionEngine requires that the module's datalayout matches the datalayout of the TargetMachine that the ExecutionEngine is using. When the pass manager runs on a module with no datalayout, it uses the default datalayout which is little-endian. This causes problems on big-endian targets, because some optimizations that are legal on little-endian or illegal on big-endian. To resolve this, we set the datalayout prior to running the pass manager, and then clear it before creating the ExectionEngine. This patch fixes a lot of piglit tests on big-endian ppc64. Cc: [email protected]
* gallivm: Fix build against LLVM SVN >= r302589Michel Dänzer2017-05-111-3/+9
| | | | | | | deregisterEHFrames doesn't take any parameters anymore. Reviewed-by: Vedran Miletić <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium: add PIPE_SHADER_CAP_TGSI_SKIP_MERGE_REGISTERSSamuel Pitoiset2017-04-261-0/+1
| | | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallivm: init vars to silence gcc warningsBrian Paul2017-04-071-2/+2
| | | | | | Silence warnings about using possibly uninitialized values. Signed-off-by: Brian Paul <[email protected]>
* gallivm: add lp_build_emit_fetch_src() helperSamuel Pitoiset2017-04-012-5/+24
| | | | | | | | | | | | | | lp_build_emit_fetch() is useful when the source type can be infered from the instruction opcode. However, for bindless samplers/images we can't do that easily because tgsi_opcode_infer_src_type() returns TGSI_TYPE_FLOAT for TEX instructions, while we need TGSI_TYPE_UNSIGNED64 if the resource register is bindless. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: remove support for predicates from TGSI (v2)Marek Olšák2017-04-015-181/+18
| | | | | | | | | | | Neved used. v2: gallivm: rename "pred" -> "exec_mask" etnaviv: remove the cap gallium: fix tgsi_instruction::Padding Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: fix a maybe-uninitialized warningMarek Olšák2017-03-301-1/+1
| | | | | | | /home/marek/dev/mesa-main/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c:3598: warning: 'level' may be used uninitialized in this function [-Wmaybe-uninitialized] out1 = lp_build_cmp(&leveli_bld, PIPE_FUNC_GREATER, level, last_level); ^
* gallivm: remove lp_add_attr_dereferenceable in favor of amd/commonMarek Olšák2017-03-222-14/+0
| | | | | Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* gallivm: (trivial) remove duplicated lineRoland Scheidegger2017-03-161-1/+0
| | | | pointed out by clang (stored value never read)
* gallivm,ac: add LP_FUNC_ATTR_CONVERGENTMarek Olšák2017-03-062-0/+2
| | | | Reviewed-by: Dave Airlie <[email protected]>
* gallivm, ac: add writeonly and inaccessiblememonly attributesMarek Olšák2017-03-032-0/+4
| | | | Reviewed-by: Dave Airlie <[email protected]>
* gallivm,ac: add function attributes at call sites instead of declarationsMarek Olšák2017-03-012-24/+55
| | | | | | | | | | | | | | | | They can vary at call sites if the intrinsic is NOT a legacy SI intrinsic. We need this to force readnone or inaccessiblememonly on some amdgcn intrinsics. This is only used with LLVM 4.0 and later. Intrinsics only used with LLVM <= 3.9 don't need the LEGACY flag. gallivm and ac code is in the same patch, because splitting would be more complicated with all the LEGACY uses all over the place. v2: don't change the prototype of lp_add_function_attr. Reviewed-by: Jose Fonseca <[email protected]> (v1)
* gallivm,ac: remove unused FUNC_ATTR_LAST enumsMarek Olšák2017-03-011-1/+0
| | | | Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: add no-signed-zeros-fp-math option to lp_create_builder (v2)Marek Olšák2017-02-212-4/+19
| | | | | | v2: define lp_float_mode Reviewed-by: Nicolai Hähnle <[email protected]>
* gallivm: Reenable PPC VSX (v3)Ben Crocker2017-02-201-1/+13
| | | | | | | | | | | | | | | Reenable the PPC64LE Vector-Scalar Extension for LLVM versions >= 3.8.1, now that LLVM bug 26775 and its corollary, 25503, are fixed. Amendment: remove extraneous spaces in macro def & invocations. We would prefer a runtime check, e.g. via an LLVMQueryString (analogous to glGetString, eglQueryString) or LLVMGetVersion API, but no such API exists at this time. Signed-off-by: Ben Crocker <[email protected]> [Emil Velikov: remove LLVM_VERSION macro] Signed-off-by: Emil Velikov <[email protected]>
* gallivm: Override getHostCPUName() "generic" w/ "pwr8" (v4)Ben Crocker2017-02-201-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | If llvm::sys::getHostCPUName() returns "generic", override it with "pwr8" (on PPC64LE). This is a work-around for a bug in LLVM: a table entry for "POWER8NVL" is missing, resulting in (big-endian) "generic" being returned on little-endian Power8NVL systems. The result is that code that attempts to load the least significant 32 bits of a 64-bit quantity in memory loads the wrong half. This omission should be fixed in the next version of LLVM (4.0), but this work-around should be left in place in case some future version of POWER<n> also ends up unrepresented in LLVM's table. This workaround fixes failures in the Piglit arb_gpu_shader_fp64 conversion tests on POWER8NVL processors. (V4: add similar comment in the code.) Signed-off-by: Ben Crocker <[email protected]> Cc: 12.0 13.0 17.0 <[email protected]> Acked-by: Emil Velikov <[email protected]>
* gallivm: Improve debug output (V2)Ben Crocker2017-02-202-1/+18
| | | | | | | | | | | | | | | Improve debug output from gallivm_compile_module and lp_build_create_jit_compiler_for_module, printing the -mcpu and -mattr options passed to LLC. V2: enclose MAttrs debug_printf block and llc -mcpu debug_printf in "if (gallivm_debug & <flags>)..." Signed-off-by: Ben Crocker <[email protected]> Cc: 12.0 13.0 17.0 <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]> (v2) [Emil Velikov: rebase] Signed-off-by: Emil Velikov <[email protected]>
* gallium: remove TGSI_OPCODE_CLAMPMarek Olšák2017-02-182-24/+0
| | | | | | | Not used and not widely supported. Use MIN+MAX instead. Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: turn PIPE_SHADER_CAP_DOUBLES into a screen capabilityNicolai Hähnle2017-02-021-2/+0
| | | | | | | | | | | | | | | | | | | Make the cap consistent with PIPE_CAP_INT64. Aside from the hypothetical case of using draw for vertex shaders (and actually caring about doubles...), every implementation supports doubles either nowhere or everywhere. Also, st/mesa didn't even check the cap correctly in all supported shader stages. While at it, add a missing LLVM version check for 64-bit integers in radeonsi. This is conservative: judging by the log, LLVM 3.8 might be sufficient, but there are probably bugs that have been fixed since then. v2: fix clover (Marek) Reviewed-by: Marek Olšák <[email protected]>
* gallivm: remove explicit __STDC_.*_MACROS definesEmil Velikov2017-01-271-8/+0
| | | | | | | | | | Correctly handled by the build systems. Cc: Roland Scheidegger <[email protected]> Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: don't try to use fast rcp for fdivRoland Scheidegger2017-01-241-1/+3
| | | | | | | | The use of fast rcp instruction is disabled, and will always fall back to use a division instead (1 / x). Hence, if we get a division opcode, it doesn't make much sense trying to split that into rcp/mul. Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: (trivial) fix ddiv cpu implementationRoland Scheidegger2017-01-241-1/+0
| | | | | | | | | | | we can't use the cpu implementation of fdiv, as this one uses different lp_build_context, which causes assertion failure. Just use default fdiv action (there is no fast rcp for doubles which we could potentially use anyway). Cc: 17.0 <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: use #ifdef not #if for PIPE_ARCH_BIG_ENDIANDave Airlie2017-01-191-1/+1
| | | | | | | | This fixes the build on ppc/s390. Reviewed-by: Roland Scheidegger <[email protected]> Cc: "17.0" <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* gallivm: (trivial) fix copy/paste bug with big endian codeRoland Scheidegger2017-01-181-2/+4
| | | | | | 8bd67a35c50e68c21aed043de11e095c284d151a introduced using undefined variable on big endian archs due to copy/paste bug. (compile hack tested only)
* gallivm: Cleanup USE_MCJIT.Jose Fonseca2017-01-181-10/+25
| | | | | | | Split USE_MCJIT macro dual nature into a separate constant time define and a run-time variable. Reviewed-by: Emil Velikov <[email protected]>
* tgsi: add DDIV instructionNicolai Hähnle2017-01-161-0/+2
| | | | | | | | | Double-precision division, to allow more precision than a DRCP + DMUL sequence. Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallivm: generalize 4x4f->1x16ub special case conversionRoland Scheidegger2017-01-061-56/+118
| | | | | | | | | | | | | | | | | | | | | | | | | | | This special packing path can be easily extended to handle not just float->unorm8 but also float->snorm8 and uint32->uint8 and int32->int8 (i.e. all interesting cases for llvmpipe fs backend code). The packing parts all stay the same (only the last step packing will be signed->signed instead of signed->unsigned but luckily even sse2 can do both). While here also note some bugs with that (we keep the bugs identical to what we did before on x86, albeit other archs may differ). In particular float->unorm8 too large values will still get clamped to 0, not 255, and for float->snorm8 NaNs will end up as -1, not 0 (but we do the clamp against 1.0 there to prevent too large values ending up as -1.0 - this is inconsistent to unorm8 handling but is what we ended up before, I'm not sure we can get away without it). This is quite fishy in any case as we depend on arch-dependent behavior of the iround (my understanding is in fact with altivec the conversion would actually saturate although I've no idea about NaNs, so probably wouldn't need to do anything for snorm). (There are only minimal piglit tests for unorm clamping behavior AFAICT, in particular nothing seems to test values which are too large to be handled by the float->int conversion.) For uint32->uint8 we also do a min against MAX_INT, since the source for the packs is always signed (again, on x86 - should probably be able to express these arch-dependent bits better some day). Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: (trivial) fix typo bug with small AoS format unpackingRoland Scheidegger2017-01-061-1/+1
| | | | | | | Fix typo using wrong (uninitialized) build context introduced by 4634cb5921b985f04f2daf00cda2d28036143bd3. (This only affects very rare small packed formats which have a PIPE_SWIZZLE_0 channel, such as r4a4, which is never used by mesa/st. Nevertheless it broke lp_test_format.)
* gallivm: implement aos unpack (to unorm8) for small unorm formatsRoland Scheidegger2017-01-051-12/+152
| | | | | | | | | | | | | | | | | | | Using bit replication. This path now resembles something which might make sense. (The logic was mostly copied from llvmpipe fs backend.) I am not convinced though it is actually faster than SoA sampling (actually I'm quite certain it's always a loss with AVX). With SoA it's just shift/mask/cvt/mul for getting the colors, whereas there's still roughly 3 shifts, 3 or/and per channel for AoS (i.e. for SoA it's exactly the same as it would be for a rgba8 format, whereas the extra effort for AoS is significant). The filtering might still be faster (albeit with FMA the instruction count gets down quite a bit there on the SoA float filtering path on new cpus). And those small unorm formats often don't have an alpha channel (which makes things worse relatively for AoS path). (This also fixes a trivial bug in the llvmpipe fs code this was derived from, albeit it was only relevant for 4-bit channels.) Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: optimize lp_build_unpack_arith_rgba_aos slightlyRoland Scheidegger2017-01-051-19/+97
| | | | | | | | | | | | | | | | | This code uses a vector shift which has to be emulated on x86 unless there's AVX2. Luckily in some cases we can actually avoid the shift altogether, so do that. Also make sure we hit the fast lp_build_conv() path when applicable, albeit that's quite the hack... That said, this path is taken for AoS sampling for small unorm (smaller than rgba8) formats, and it is completely hopeless even with those changes, with or without AVX. (Probably should have some code similar to the one in the llvmpipe fs backend code, using bit replication to extend to rgba8888 - rounding is not quite 100% accurate but if it's good enough there it should be here as well.) Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: use 2 srcs for 32->16bit conversions in lp_bld_conv_autoRoland Scheidegger2017-01-051-2/+19
| | | | | | | | | | | | | | | | | If we only feed one source vector at a time, we cannot use pack intrinsics (as we only have a 64bit destination dst vector). lp_bld_conv_auto is specifically designed to alter the length and number of destination vectors, so this works just fine (if we use single source vectors at a time, afterwards we immediately reassemble the vectors). For AVX though this isn't really possible, since we expect 128bit output already for a single 256bit input. (One day we should handle AVX2 which again would need multiple inputs, however there's the problem that we get different ordered output there and we don't want to reorder, so would need to be able to tell build_conv to handle upper and lower halfs independently.) A similar strategy would probably work for 32->8bit too (if it doesn't hit the special case) but I'm going to try something different for that... Reviewed-by: Jose Fonseca <[email protected]>
* llvmpipe: (trivial) minimally simplify mask constructionRoland Scheidegger2017-01-051-0/+2
| | | | | | | | | | | | | | | | simd instruction sets usually have comparisons for equal, not unequal. So use a different comparison against the mask itself - which also means we don't need a all-zero as well as a all-one (for the pxor) reg. Also add code to avoid scalar expansion of i1 values which we definitely shouldn't do. There's problems with this though with llvm select interaction, so it's disabled (basically using llvm select instead of intrinsics may still produce atrocious code, even in cases where we figured it should not, albeit I think this could probably be fixed with some better selection of optimization passes, but I have zero idea there really). Reviewed-by: Jose Fonseca <[email protected]>
* gallium: remove TGSI_OPCODE_SUBMarek Olšák2017-01-052-39/+5
| | | | | | It's redundant with the source modifier. Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: remove TGSI_OPCODE_ABSMarek Olšák2017-01-053-17/+3
| | | | | | It's redundant with the source modifier. Reviewed-by: Nicolai Hähnle <[email protected]>
* swr: fix windows build breakGeorge Kyriazis2017-01-051-0/+7
| | | | | | | | | | wrap lp_bld_type.h around extern "C". Windows decorates global variables, so when used from .cpp files, need to use an undecorated version. Also, removed related and unneeded code from swr_screen.cpp Reviewed-by: Ilia Mirkin <[email protected]>
* gallivm: generalize the compressed format soa fetch a bitRoland Scheidegger2016-12-211-37/+49
| | | | | | | | | This can now handle rgtc (unorm) too - this path no longer handles plain formats, but that's unnecessary they now all have their proper SoA unpack (this will still be dog-slow though due to the actual fetch being per-pixel util fallbacks). Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: provide soa fetch path handling formats with more than 32bitRoland Scheidegger2016-12-211-154/+375
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This previously always fell back to AoS conversion. Even for 4-float formats (which is the optimal case by far for that fallback case) this was suboptimal, since it meant the conversion couldn't be done with 256bit vectors. While this may still only be partly possible for some formats, (unless there's AVX2 support) at least the transpose can be done with half the unpacks (and before using the transpose for AoS fallbacks, it was worse still). With less than 4 channels, things got way worse with the AoS fallback quickly even with 128bit vectors. The strategy is pretty much the same as the existing one for formats which fit into 32 bits, except there's now multiple vectors to be fetched (2 or 4 to be exact), which need to be shuffled first (if it's 4 vectors, this amounts to a transpose, for 2 it's a bit different), then the unpack is done the same (with the exception that the shift of the channels is now modulo 32, and we need to select the right vector). In fact the most complex part about it is to get the shuffles right for separating into lo/hi parts for AVX/AVX2... This also makes use of the new ability of gather to use provided type information, which we abuse to outsmart llvm so we get decent shuffles, and to fetch 3x32bit vectors without having to ZExt the scalar. And just because we can, we handle double formats too, albeit they are a bit different (draw sometimes needs to handle that). v2: fix typo float/int bug (generating inefficient code). Reviewed-by: Jose Fonseca <[email protected]>