aboutsummaryrefslogtreecommitdiffstats
path: root/src/gallium/auxiliary/gallivm
Commit message (Collapse)AuthorAgeFilesLines
...
* gallivm: optimize gather a bit, by using supplied destination typeRoland Scheidegger2016-12-217-78/+332
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | By using a dst_type in the the gather interface, gather has some more knowledge about how values should be fetched. E.g. if this is a 3x32bit fetch and dst_type is 4x32bit vector gather will no longer do a ZExt with a 96bit scalar value to 128bit, but just fetch the 96bit as 3x32bit vector (this is still going to be 2 loads of course, but the loads can be done directly to simd vector that way). Also, we can now do some try to use the right int/float type. This should make no difference really since there's typically no domain transition penalties for such simd loads, however it actually makes a difference since llvm will use different shuffle lowering afterwards so the caller can use this to trick llvm into using sane shuffle afterwards (and yes llvm is really stupid there - nothing against using the shuffle instruction from the correct domain, but not at the cost of doing 3 times more shuffles, the case which actually matters is refusal to use shufps for integer values). Also do some attempt to avoid things which look great on paper but llvm doesn't really handle (e.g. fetching 3-element 8 bit and 16 bit vectors which is simply disastrous - I suspect type legalizer is to blame trying to extend these vectors to 128bit types somehow, so fetching these with scalars like before which is suboptimal due to the ZExt). Remove the ability for truncation (no point, this is gather, not conversion) as it is complex enough already. While here also implement not just the float, but also the 64bit avx2 gathers (disabled though since based on the theoretical numbers the benefit just isn't there at all until Skylake at least). Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: optimize SoA AoS fallback fetch path a littleRoland Scheidegger2016-12-211-22/+46
| | | | | | | | | | We should do transpose, not extract/insert, at least with "sufficient" amount of channels (for 4 channels, extract/insert shuffles generated otherwise look truly terrifying). Albeit we shouldn't fallback to that so often in any case. v2: ditch the extract/insert path, not worth keeping (we're going to avoid hitting the fallback that often with future patches). Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: (trivial) handle non-aligned fetch for lp_build_fetch_rgba_soaRoland Scheidegger2016-12-213-8/+12
| | | | | | | | | | soa fetch so far always assumed that data was aligned. However, we want to use this for vertex fetch, and data might not be aligned there, so handle it in this path too (basically just pass through alignment through to other functions). (It looks like it wouldn't work for for cached s3tc but this is no different than with AoS fetch.) Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: optimize 16bit->32bit gather path a bitRoland Scheidegger2016-12-061-3/+39
| | | | | | | | | | LLVM can't really optimize anything which crosses scalar/vector boundaries, so help a bit with some particular gather operations when the width is expanded (only do it for 16->32bit expansion for now), by doing expansion after fetch. That is probably a better solution anyway even if llvm would recognize it, makes for cleaner IR... Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: handle 16bit float fetches in lp_build_fetch_rgba_soaRoland Scheidegger2016-12-061-4/+18
| | | | | | | | | | | | | | | | | | | | Note that we really want to _never_ reach the bottom of the function, which resorts to AoS fetch. Half floats can be handled just like other formats which fit into 32bit vectors (so, only 1x16 and 2x16 formats, albeit with more channels things are not THAT bad), with minimal plumbing. I've seen code size go down nearly by a factor of 3 for a complete texture sampling function (including bilinear filtering) using R16F. (What we should do for everything not special cased is to do AoS gather, shuffle/shift things into SoA vectors, and then do the conversion there. Otherwise it's particularly bad with 1 or 2 channel formats - that r16f format with either 4 or 8-wide vectors was still doing one element at a time, essentially doing exactly the same work as for rgba16f. Also replacing the channels with SWIZZLE0/1 (particularly the latter) adds even more work, as it has to be done per aos vector, and not just straightforward at the end with the SoA vector.) Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: use getHostCPUFeatures on x86/llvm-4.0+.Tim Rowley2016-12-051-0/+15
| | | | | | | Use llvm provided API based on cpuid rather than our own manually mantained list of mattr enabling/disabling. Reviewed-by: Roland Scheidegger <[email protected]>
* gallium: add PIPE_SHADER_CAP_LOWER_IF_THRESHOLDMarek Olšák2016-11-151-0/+1
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallivm: limit use of setFastMathFlags to LLVM 3.8 and laterMarek Olšák2016-11-151-0/+2
| | | | Reviewed-by: Brian Paul <[email protected]>
* gallivm: add lp_create_builder with an unsafe_fpmath optionMarek Olšák2016-11-152-0/+17
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallivm: fix [IU]MUL_HI regression harderNicolai Hähnle2016-11-101-8/+12
| | | | | | | | | | The fix in commit 88f791db75e9f065bac8134e0937e1b76600aa36 was insufficient for radeonsi because the vector case was not handled properly. It seems piglit only covers the scalar case, unfortunately. Fixes GL45-CTS.shader_bitfield_operation.[iu]mulExtended.* Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: Fix build after removal of deprecated attribute API v3Tom Stellard2016-11-093-4/+87
| | | | | | | | | | | | v2: Fix adding parameter attributes with LLVM < 4.0. v3: Fix typo. Fix parameter index. Add a gallivm enum for function attributes. Reviewed-by: Nicolai Hähnle <[email protected]>
* gallivm: fix [IU]MUL_HI regressionNicolai Hähnle2016-11-083-28/+90
| | | | | | | | | | | | | | | | This patch does two things: 1. It separates the host-CPU code generation from the generic code generation. This guards against accidently breaking things for radeonsi in the future. 2. It makes sure we actually use both arguments and don't just compute a square :-p Fixes a regression introduced by commit 29279f44b3172ef3b84d470e70fc7684695ced4b Cc: Roland Scheidegger <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: introduce 32x32->64bit lp_build_mul_32_lohi functionRoland Scheidegger2016-11-083-38/+172
| | | | | | | | | | | | This is used by shader umul_hi/imul_hi functions (and soon by draw). It's actually useful separating this out on its own, however the real reason for doing it is because we're using an optimized sse2 version, since the code llvm generates is atrocious (since there's no widening mul in llvm, and it does not recognize the widening mul pattern, so it generates code for real 64x64->64bit mul, which the cpu can't do natively, in contrast to 32x32->64bit mul which it could do). Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: try to fix build with LLVM <= 3.4 due to missing CallSite.hMarek Olšák2016-10-201-1/+5
| | | | | Reviewed-by: Brian Paul <[email protected]> Tested-by: Brian Paul <[email protected]>
* gallivm: add wrappers for missing functions in LLVM <= 3.8Marek Olšák2016-10-202-0/+27
| | | | | | radeonsi needs these. Reviewed-by: Nicolai Hähnle <[email protected]>
* draw: improve vertex fetch (v2)Roland Scheidegger2016-10-192-0/+30
| | | | | | | | | | | | | | | | | | | | | | | The per-element fetch has quite some calculations which are constant, these can be moved outside both the per-element as well as the main shader loop (llvm can figure out it's constant mostly on its own, however this can have a significant compile time cost). Similarly, it looks easier swapping the fetch loops (outer loop per attrib, inner loop filling up the per vertex elements - this way the aos->soa conversion also can be done per attrib and not just at the end though again this doesn't really make much of a difference in the generated code). (This would also make it possible to vectorize the calculations leading to the fetches.) There's also some minimal change simplifying the overflow math slightly. All in all, the generated code seems to look slightly simpler (depending on the actual vs), but more importantly I've seen a significant reduction in compile times for some vs (albeit with old (3.3) llvm version, and the time reduction is only really for the optimizations run on the IR). v2: adapt to other draw change. No changes with piglit. Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: print out time for jitting functions with GALLIVM_DEBUG=perfRoland Scheidegger2016-10-191-0/+11
| | | | | | | | Compilation to actual machine code can easily take as much time as the optimization passes on the IR if not more, so print this out too. Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: Use native packs and unpacks for the lerpsRoland Scheidegger2016-10-193-13/+156
| | | | | | | | | | | | | | | | | | | | | | | | For the texturing packs, things looked pretty terrible. For every lerp, we were repacking the values, and while those look sort of cheap with 128bit, with 256bit we end up with 2 of them instead of just 1 but worse, plus 2 extracts too (the unpack, however, works fine with a single instruction, albeit only with llvm 3.8 - the vpmovzxbw). Ideally we'd use more clever pack for llvmpipe backend conversion too since we actually use the "wrong" shuffle (which is more work) when doing the fs twiddle just so we end up with the wrong order for being able to do native pack when converting from 2x8f -> 1x16b. But this requires some refactoring, since the untwiddle is separate from conversion. This is only used for avx2 256bit pack/unpack for now. Improves openarena scores by 8% or so, though overall it's still pretty disappointing how much faster 256bit vectors are even with avx2 (or rather, aren't...). And, of course, eliminating the needless packs/unpacks in the first place would eliminate most of that advantage (not quite all) from this patch. Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: Use AVX2 gather instrinsics.Jose Fonseca2016-10-041-0/+95
| | | | | | v2: Use AVX2 gather for non aligned loads too. Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: Use 8 wide AoS sampling on AVX2.Roland Scheidegger2016-10-041-5/+6
| | | | | | | | | | v2: Make sure that with num_lods > 1 and min_filter != mag_filter we still enter the splitting path. So this case would still use 4-wide aos path (as a side note, the 4-wide aos sampling path could actually be improved quite a bit if we have avx2, by just doing the filtering with 256bit vectors). Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: Basic AVX2 support.José Fonseca2016-10-044-28/+98
| | | | | | v2: pblendb -> pblendvb Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: support negation on 64-bit integersNicolai Hähnle2016-09-211-0/+4
| | | | | | | This should be analogous to 32-bit integers. Reviewed-by: Edward O'Callaghan <[email protected]> Signed-off-by: Nicolai Hähnle <[email protected]>
* gallivm/llvmpipe: prepare support for ARB_gpu_shader_int64.Dave Airlie2016-09-214-4/+498
| | | | | | | | | | | | | | | | This enables 64-bit integer support in gallivm and llvmpipe. v2: add conversion opcodes. v3: - PIPE_CAP_INT64 is not there yet - restrict DIV/MOD defaults to the CPU, as for 32 bits - TGSI_OPCODE_I2U64 becomes TGSI_OPCODE_U2I64 Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]> Signed-off-by: Dave Airlie <[email protected]> Signed-off-by: Nicolai Hähnle <[email protected]>
* gallivm: add lp_build_alloca_undefNicolai Hähnle2016-08-172-0/+24
| | | | | Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallivm: add create_builder_at_entry helper functionNicolai Hähnle2016-08-171-23/+22
| | | | | | | Reduces code duplication. Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* util: Move format_rgb9e5.h to src/utilJason Ekstrand2016-08-051-1/+1
| | | | | | | It's used from both mesa main and gallium. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: add helper lp_add_attr_dereferenceableMarek Olšák2016-07-132-0/+14
| | | | | | | | | Not sure if this is the right way to do it, but it seems to work. v2: make it a no-op on LLVM <= 3.5 Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* gallivm: set LLVMNoUnwindAttribute on all intrinsicsMarek Olšák2016-07-111-2/+4
| | | | | | | | | RadeonSI stats: Mostly 0% difference, but Valley shows a small improvement: Application Files SGPRs VGPRs SpillSGPR SpillVGPR Code Size LDS Max Waves Waits unigine_valley 278 0.00 % -0.29 % 0.00 % 0.00 % 0.01 % 0.00 % 0.17 % 0.00 % Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: don't use integer min/max sse intrinsics with llvm >= 3.9Roland Scheidegger2016-06-201-2/+4
| | | | | | | | | | | | | | | | | | | Apparently, these are deprecated. There's some AutoUpgrade feature which is supposed to promote these to cmp/select, which apparently doesn't work with jit code. It is possible it's not actually even meant to work (see the bug filed against llvm which couldn't provide an answer neither) but in any case this is meant to be only temporary unless the intrinsics are really illegal. So, just use the fallback code (which should be cmp/select, we're actually doing cmp/sext/trunc/select, but in any case llvm 3.9 manages to optimize this back to pmin/pmax in the end). This addresses https://llvm.org/bugs/show_bug.cgi?id=28176 CC: <[email protected]> Reviewed-by: Jose Fonseca <[email protected]> Tested-by: Vinson Lee <[email protected]> Tested-by: Aaron Watry <[email protected]>
* gallivm: Fix trivial sign warningsJan Vesely2016-06-138-21/+22
| | | | | | | v2: include whitespace fixes Signed-off-by: Jan Vesely <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: more 64-bit integer prep work.Dave Airlie2016-06-111-8/+8
| | | | | | | This converts one other place to using the new helper. Reviewed-by: Nicolai Hähnle <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* gallivm: make non-float return code bitcast consistent.Dave Airlie2016-06-111-12/+6
| | | | | | | | This just uses the same form across the fetches. Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* gallium/gallivm: use 64-bit test instead of doubles.Dave Airlie2016-06-111-37/+36
| | | | | | | | | This just makes some generic code that currently emits double suitable for emitting 64-bit values. Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* gallivm: Never emit llvm.fmuladd on LLVM 3.3.Jose Fonseca2016-06-102-1/+7
| | | | | | | | Besides the old JIT bug, it seems the X86 backend on LLVM 3.3 doesn't handle llvm.fmuladd and instead it fall backs to a C function. Which in turn causes a segfault on Windows. Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: Use llvm.fmuladd.*.Jose Fonseca2016-06-105-47/+87
| | | | Reviewed-by: Roland Scheidegger <[email protected]>
* util,gallivm: Explicitly enable/disable fma attribute.Jose Fonseca2016-06-102-0/+11
| | | | | | | | | | As suggested by Roland Scheidegger. Use the same logic as f16c, since fma requires VEX encoding. But disable FMA on LLVM 3.3 without MCJIT. Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: initialize init_native_targets_once_flag correctlyFrederic Devernay2016-05-301-1/+1
| | | | Signed-off-by: Marek Olšák <[email protected]>
* gallivm: eliminate a unnecessary AND with unorm lerpsRoland Scheidegger2016-05-271-10/+35
| | | | | | | | | Instead of doing a add and then mask out the upper bits, we can simply do a add with a half wide type (this, of course, assumes the hw can actually do it...), so we'll get the required zero in the upper bits automatically. Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: improve dumping of bitcodeRoland Scheidegger2016-05-112-4/+9
| | | | | | | | | Use GALLIVM_DEBUG=dumpbc for dumping of modules as bitcode. Instead of a fixed llvmpipe.bc name, use ir_<modulename>.bc so multiple modules can be dumped (albeit it might still overwrite previous modules, particularly the modules from draw tend to always have the same name). Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: print declarations of intrinsics with GALLIVM_DEBUG=irRoland Scheidegger2016-05-101-0/+5
| | | | | | | Those aren't really interesting, however outputting them is helpful when trying to feed the IR to llvm llc (or opt) for debugging. Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: use InternalLinkage instead of PrivateLinkage for texture functionsRoland Scheidegger2016-05-101-1/+1
| | | | | | | At least with MCJIT the disassembler will crash otherwise when trying to disassemble such functions. Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: disable avx512 featuresRoland Scheidegger2016-05-101-0/+12
| | | | | | | | | | | | | | | We don't target this yet, and some llvm versions incorrectly enable it based on cpu string, causing crashes. (Albeit this is a losing battle, it is pretty much guaranteed when the next new feature comes along llvm will mistakenly enable it on some future cpu, thus we would have to proactively disable all new features as llvm adds them.) This should fix https://bugs.freedesktop.org/show_bug.cgi?id=94291 (untested) Tested-by: Timo Aaltonen <[email protected]> Reviewed-by: Jose Fonseca <[email protected] CC: <[email protected]>
* gallium: enable intel jitevents profilingTim Rowley2016-05-091-0/+9
| | | | | | | | LLVM when configured with "intel jitevents" enabled can inform VTune about dynamic code, so individual shaders are attributed profiling data and the resulting assembly can be examined. Acked-by: Roland Scheidegger <[email protected]>
* gallivm: s/Elements/ARRAY_SIZE/Brian Paul2016-04-279-29/+29
| | | | Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: make sampling more robust against bogus coordinatesRoland Scheidegger2016-04-263-13/+43
| | | | | | | | | | | | | | | | | | | | | | Some cases (especially these using fract for coord wrapping) did not handle NaNs (or Infs) correctly - the following code assumed the fract result could not be outside [0,1], but if the input is a NaN (or +-Inf) the fract result was NaN - which then could produce out-of-bound offsets. (Note that the explicit NaN behavior changes for min/max on x86 sse don't result in actual changes in the generated jit code, but may on other architectures. Found by looking through all the wrap functions.) This fixes https://bugs.freedesktop.org/show_bug.cgi?id=94955 No piglit changes. (v2: fix min/max typo in coord_mirror, add comment) Cc: "11.1 11.2" <[email protected]> Tested-by: Bruce Cherniak <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* gallium: use unreachable instead of assertsGrazvydas Ignotas2016-04-251-1/+1
| | | | | | | Avoids warnings in release builds. Signed-off-by: Grazvydas Ignotas <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* gallium: use PIPE_SHADER_* everywhere, remove TGSI_PROCESSOR_*Marek Olšák2016-04-221-3/+3
| | | | Acked-by: Jose Fonseca <[email protected]>
* gallium: merge PIPE_SWIZZLE_* and UTIL_FORMAT_SWIZZLE_*Marek Olšák2016-04-226-45/+45
| | | | | | | | Use PIPE_SWIZZLE_* everywhere. Use X/Y/Z/W/0/1 instead of RED, GREEN, BLUE, ALPHA, ZERO, ONE. The new enum is called pipe_swizzle. Acked-by: Jose Fonseca <[email protected]>
* gallivm: fix bogus argument order to lp_build_sample_mipmap functionRoland Scheidegger2016-04-211-2/+2
| | | | | | | | | | | | | | | Screwed up since 0753b135f6e83b171d8a1b08aea967374f3542bc. (Only an issue with different min/mag filters, and then only in some cases, which is probably why it went unnoticed for quite a while. The effect should have simply been nearest mip filter instead of linear, iff min was nearest, mag was linear, and all pixels hit the mignifying path.) Fixes a bunch of dEQP failures. Reviewed-by: Jose Fonseca <[email protected]> Cc: "11.1 11.2" <[email protected]>
* gallivm: Avoid llvm::sys::getProcessTriple().Jose Fonseca2016-04-191-3/+3
| | | | | | | Just use LLVM_HOST_TRIPLE, which is available at least from LLVM 3.3 onwards, and is pretty much what llvm::sys::getProcessTriple() does anyway, Reviewed-by: Roland Scheidegger <[email protected]>