aboutsummaryrefslogtreecommitdiffstats
path: root/src/gallium/auxiliary/gallivm
Commit message (Collapse)AuthorAgeFilesLines
* gallivm: remove unused float coord wrapping for aos samplingRoland Scheidegger2018-12-121-507/+23
| | | | | | | | | | | | | | | | | | | | | | | AoS sampling tries to use integers for coord wrapping when possible, as it should be faster. However, for AVX, this was suboptimal, because only floats can use 8x32bit vectors, whereas integers have to be split into 4x32bit vectors. (I believe part of why it was slower was also that at least earlier llvm versions had trouble optimizing it properly, since you can still do simple bit ops with 8x32bit vectors, so a sequence of int add / and / int add / and with such vectors would actually end up doing 128bit inserts/extracts between the operations instead of just doing the cheap 128bit ands.) Hence, a special float coord wrapping path was added to AoS sampling. But this path was actually disabled for a long time already, since we found that just splitting everything before entering the AoS path was still sligthly faster usually, so none of this float coord wrapping code was used anymore (AoS sampling code, when avx2 isn't supported, never sees vectors with length > 4). I thought it might be useful some day again, but I'm not interested anymore in optimizing for very weird instruction sets which have support for 256bit vectors for floats but not for ints, so just drop it. Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: Use nextafterf(0.5, 0.0) as rounding constantMatt Turner2018-11-281-1/+1
| | | | | | | | | | | The common truncf(x + 0.5) fails for the floating-point value just less than 0.5 (nextafterf(0.5, 0.0)). nextafterf(0.5, 0.0) + 0.5, after rounding is 1.0, thus truncf does not produce the desired value. The solution is to add nextafterf(0.5, 0.0) instead of 0.5 before truncating. This works for all values. Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: fix improper clamping of vertex index when fetching gs inputsRoland Scheidegger2018-11-091-10/+31
| | | | | | | | | | | | | | | | | | | | Because we only have one file_max for the (2d) gs input file, the value actually represents the max of attrib and vertex index (although I'm not entirely sure if we really want the max, since the max valid value of the vertex dimension can be easily deduced from the input primitive). Thus in cases where the number of inputs is higher than the number of vertices per prim, we did not properly clamp the vertex index, which would result in out-of-bound fetches, potentially causing segfaults (the segfaults seemed actually difficult to trigger, but valgrind certainly wasn't happy). This might have happened even if the shader did not actually try to fetch bogus vertices, if the fetching happened in non-active conditional clauses. To fix simply use the correct max vertex index value (derived from the input prim type) instead when clamping for this case. Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: Make it possible to disable some optimization shortcuts in release ↵Gert Wollny2018-10-064-21/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | builds For testing it is of interest that all tests of dEQP pass, e.g. to test virglrenderer on a host only providing software rendering like in a CI. Hence make it possible to disable certain optimizations that make tests fail. While we are there also add some documentation to the flags to make it clear that this is opt-out. Setting the environment variable "GALLIVM_PERF=no_filter_hacks" can be used to make the following tests pass in release mode: dEQP-GLES2.functional.texture.mipmap.2d.affine.*_linear_* dEQP-GLES2.functional.texture.mipmap.cube.generate.* dEQP-GLES2.functional.texture.vertex.2d.filtering.*_mipmap_linear_* dEQP-GLES2.functional.texture.vertex.2d.wrap.* Related: https://bugs.freedesktop.org/show_bug.cgi?id=94957 v2: rename optimization disabling flag to 'safemath' and also move the nopt flag to the perf flags. v3: rename flag "safemath" to "no_filter_hacks" since safemath is usually associated with floating point operations (Roland) Signed-off-by: Gert Wollny <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: ensure string is null-terminated instead of assert()ingEric Engestrom2018-09-251-3/+1
| | | | | Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Dylan Baker <[email protected]>
* gallivm: Detect VSX separately from AltivecVicki Pfau2018-08-301-18/+3
| | | | | | | | | | Previously gallivm would attempt to use VSX instructions on all systems where it detected that Altivec is supported; however, VSX was added to POWER long after Altivec, causing lots of crashes on older POWER/PPC hardware, e.g. PPC Macs. By detecting VSX separately from Altivec we can automatically disable it on hardware that supports Altivec but not VSX Signed-off-by: Vicki Pfau <[email protected]>
* gallivm: allow to pass two swizzles into fetches.Dave Airlie2018-08-302-30/+65
| | | | | | | | | | | | This hijacks the top 16-bits of swizzle, to pass in the swizzle for the second channel. This fixes handling .yx swizzles of 64-bit values. This should fixup radeonsi and llvmpipe. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107524 Reviewed-by: Marek Olšák <[email protected]>
* gallivm: don't use saturated unsigned add/sub intrinsics for llvm 8.0Roland Scheidegger2018-08-241-27/+60
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These have been removed. Unfortunately auto-upgrade doesn't work for jit. (Worse, it seems we don't get a compilation error anymore when compiling the shader, rather llvm will just do a call to a null function in the jitted shaders making it difficult to detect when intrinsics vanish.) Luckily the signed ones are still there, I helped convincing llvm removing them is a bad idea for now, since while the unsigned ones have sort of agreed-upon simplest patterns to replace them with, this is not the case for the signed ones, and they require _significantly_ more complex patterns - to the point that the recognition is IMHO probably unlikely to ever work reliably in practice (due to other optimizations interfering). (Even for the relatively trivial unsigned patterns, llvm already added test cases where recognition doesn't work, unsaturated add followed by saturated add may produce atrocious code.) Nevertheless, it seems there's a serious quest to squash all cpu-specific intrinsics going on, so I'd expect patches to nuke them as well to resurface. Adapt the existing fallback code to match the simple patterns llvm uses and hope for the best. I've verified with lp_test_blend that it does produce the expected saturated assembly instructions. Though our cmp/select build helpers don't use boolean masks, but it doesn't seem to interfere with llvm's ability to recognize the pattern. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106231 Reviewed-by: Jose Fonseca <[email protected]>
* gallium: add scalar isa shader capChristian Gmeiner2018-06-201-0/+2
| | | | | | | | | | | | | | | | v1 -> v2: - nv30 is _NOT_ scalar as suggested by Ilia Mirkin. - Change from a screen cap to a shader cap as suggested by Eric Anholt. - radeonsi is scalar as suggested by Marek Olšák. - Change missing ones to be scalar. v2 -> v3: - r600 prefers vec4 as suggested by Marek Olšák. Signed-off-by: Christian Gmeiner <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallivm: Use alloca_undef with array type instead of alloca_arrayRoland Scheidegger2018-05-161-28/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use a single allocation of array type instead of the old-style array allocation for the temp and immediate arrays. Probably only makes a difference if they aren't used indirectly (so, if we used them solely because there's too many temps or immediates). In this case the sroa and early-cse passes can sometimes do some optimizations which they otherwise cannot. (As a side note, for the temp reg array, we actually really should use one allocation per array id, not just one for everything.) Note that the instcombine pass would actually promote such allocations to single alloc of array type as well, but it's too late for some artificial shaders we've seen to help (we don't want to run instcombine at the beginning due to its cost, hence would need another sroa/cse pass after instcombine). sroa/early-cse help there because they can actually eliminate all of the huge shader, reducing it to a single const output (don't ask...). (Interestingly, instcombine also removes all the bitcasts we do on that allocation for single-value gathering, and in the end directly indexes into the single vector elements, which according to spec is only semi-valid, but this happens regardless. Another thing instcombine also does is use inbound GEPs, which is probably something we should do manually as well - for indirectly indexed reg files llvm may not be able to figure it out on its own, but we should be able to guarantee all pointers are always inbound. In any case, by the looks of it using single allocation with array type seems to be the right thing to do even for ordinary shaders.) No piglit change. Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: dump bitcode before optimizationRoland Scheidegger2018-04-241-13/+20
| | | | | | | | | | | | | | | | | | If we dump the bitcode for off-line debug purposes, we really want the pre-optimized bitcode, otherwise it's useless in identifying problems with IR optimization (if you have a shader which takes an hour to do IR optimization, it's also nice you don't have to wait that hour...). Also, print out the function passes for opt which correspond to what was used for jit compilation (and also the opt level for codegen). Using opt/llc this way should then pretty much mimic what was done for jit. (When specifying something like -time-passes -debug-pass=[Structure|Arguments] (for either opt or llc) that also gives very useful information in which passes all the time was spent, and which passes are really run along with the order - llvm will add passes due to dependencies on its own, and of course -O2 for llc comes with a ~100 pass list.) Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: (trivial) do division by 1000 with int64Roland Scheidegger2018-04-241-1/+1
| | | | | | | Conversion to int can otherwise overflow if compile times are over ~71min. (Yes this can happen...) Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: remove LICM passRoland Scheidegger2018-04-241-1/+9
| | | | | | | | | | | | | | LICM is simply too expensive, even though it presumably can help quite a bit in some cases. It was definitely cheaper in llvm 3.3, though as far as I can tell with llvm 3.3 it failed to do anything in most cases. early-cse also actually seems to cause licm to be able to move things when it previously couldn't, which causes noticeable compile time increases. There's more loop passes in llvm, but I'm not sure which ones are helpful, and I couldn't find anything which would roughly do what the old licm in llvm 3.3 did, so ditch it. Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: add early cse passRoland Scheidegger2018-04-241-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This pass is quite cheap, and can simplify the IR quite a bit for our generated IR. In particular on a variety of shaders I've found the time saved by other passes due to the simplified IR more than makes up for the cost of this pass, and on top of that the end result is actually better. The only downside I've found is this enables the LICM pass to move some things out of the main shader loop (in the case I've seen, instanced vertex fetch (which is constant within the jit shader) plus the derived instructions in the shader) which it couldn't do before for some reason. This would actually be desirable but can increase compile time considerably (licm seems to have considerable cost when it actually can move things out of loops, due to alias analysis). But blaming early cse for this seems inappropriate. (Note that the first two sroa / earlycse passes are similar to what a standard llvm opt -O1/-O2 pipeline would do, albeit this has some more passes even before but I don't think they'd do much for us.) It also in particular helps some crazy shader used for driver verification (don't ask...) a lot (about factor of 6 faster in compile time) (due to simplfiying the ir before LICM is run). While here, also move licm behind simplifycfg. For some shaders there seems to be very significant compile time gains (we've seen a factor of 10000 albeit that was a really crazy shader you'd certainly never see in a real app), beause LICM is quite expensive and there's cases where running simplifycfg (along with sroa and early-cse) before licm reduces IR complexity significantly. (I'm not entirely sure if it would make sense to also run it afterwards.) Reviewed-by: Jose Fonseca <[email protected]>
* radeonsi: generate image load/store/atomic ops using ac_build_image_opcodeNicolai Hähnle2018-04-201-1/+1
| | | | | | In preparation of dimension-aware LLVM image intrinsics. Acked-by: Marek Olšák <[email protected]>
* amd/common: pass address components individually to ac_build_image_intrinsicNicolai Hähnle2018-04-201-1/+1
| | | | | | This is in preparation for the new image intrinsics. Acked-by: Marek Olšák <[email protected]>
* gallivm: Fix include for LLVMAddPromoteMemoryToRegisterPassMike Lothian2018-04-021-0/+3
| | | | | | | | | Include llvm-c/Transforms/Utils.h with the newest LLVM 7 Signed-of-by: Mike Lothian <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> Tested-by: Dieter Nützel <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* util: Move util_is_power_of_two to bitscan.h and rename to ↵Ian Romanick2018-03-296-12/+12
| | | | | | | | | | | util_is_power_of_two_or_zero The new name make the zero-input behavior more obvious. The next patch adds a new function with different zero-input behavior. Signed-off-by: Ian Romanick <[email protected]> Suggested-by: Matt Turner <[email protected]> Reviewed-by: Alejandro Piñeiro <[email protected]>
* gallivm: use enum tgis_opcodeBrian Paul2018-03-232-8/+12
| | | | Reviewed-by: Eric Anholt <[email protected]>
* gallium/llvmpipe: Fix compiler warnings about ddx/ddy/ddmax.Eric Anholt2018-02-121-1/+1
| | | | | | | My gcc doesn't figure out that dims >= 1 (seems reasonable), and doesn't notice that ddmax is used from the same no_rho_opt as its initialization. Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm/llvmpipe: add const qualifiers on sampler variablesBrian Paul2018-02-013-6/+6
| | | | | | | Once a lp_build_sampler_soa or lp_build_sampler_aos object is created, it should never be modified. Found by inspection. Reviewed-by: Roland Scheidegger <[email protected]>
* ac: don't use byval LLVM qualifier in shadersMarek Olšák2018-01-272-3/+0
| | | | | | | shader-db doesn't show any regression and 32-bit pointers with byval are declared as VGPRs for some reason. Reviewed-by: Samuel Pitoiset <[email protected]>
* gallivm: fix crash with seamless cube filtering with different min/mag filterRoland Scheidegger2018-01-251-17/+21
| | | | | | | | | | | | | | We are not allowed to modify the incoming coords values, or things may crash (as we may be inside a llvm conditional and the values may be used in another branch). I recently broke this when fixing an issue with NaNs and seamless cube map filtering, and it causes crashes when doing cubemap filtering if the min and mag filters are different. Add const to the pointers passed in to prevent this mishap in the future. Fixes: a485ad0bcd ("gallivm: fix an issue with NaNs with seamless cube filtering") Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: support avx512 (16x32) in interleave2_halfGeorge Kyriazis2018-01-181-2/+38
| | | | | | | | | | | | | | lp_build_interleave2_half was not doing the right thing for avx512-style 16-wide loads. This path is hit in the swr driver with a 16-wide vertex shader. It is called from lp_build_transpose_aos, when doing texel fetches and the fetched data needs to be transposed to one component per output register. Special-case the post-load swizzle operations for avx512 16x32 (16-wide 32-bit values) so that we move the xyzw components correctly to the outputs. Reviewed-by: Roland Scheidegger <[email protected]>
* ac: import lp_create_builder() from gallivmSamuel Pitoiset2018-01-162-38/+0
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* gallivm: implement accurate corner behavior for textureGather with cube mapsRoland Scheidegger2017-12-141-103/+201
| | | | | | | | | | | | | | | | The spec says the missing texel (when we wrap around both x and y axis) should be synthesized as the average of the 3 other texels. For bilinear filtering however we instead adjusted the filter weights (because, while the complexity looks similar, there would be 4 times as many color values to fix up than weights). Obviously this could not work for gather (hence accurate corner filtering was disabled with gather). Implement this by just doing it as the spec implies - calculate the 4th texel as the average of the other 3. With gather of course there's only one color to worry about, so it's not all that many instructions neither (albeit surely the whole cube map filtering is hilariously complex). Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: fix an issue with NaNs with seamless cube filteringRoland Scheidegger2017-12-141-0/+11
| | | | | | | | | | | | | | | | | | | Cube texture wrapping is a bit special since the values (post face projection) always are within [0,1], so we took advantage of that and omitted some clamps. However, we can still get NaNs (either because the coords already had NaNs, or the face projection generated them), and in fact we didn't handle them quite safely. I've seen -INT_MAX + 1 been propagated through as the final int coord value, albeit I didn't observe a crash. (Not quite a coincidence, since any stride mul with -INT_MAX or -INT_MAX+1 will turn up as a small positive number - nevertheless, I'd rather not try my luck, I'm not entirely sure it can't really turn up negative neither due to seamless coord swapping, plus ifloor of a NaN is not guaranteed to return -INT_MAX by any standard. And we kill off NaNs similarly with ordinary texture wrapping too.) So kill off the NaNs by using the common max against zero method. Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: fix texture wrapping for texture gather for mirror modesRoland Scheidegger2017-12-121-74/+171
| | | | | | | | | | | | | | | | | | | Care must be taken that all coords end up correct, the tests are very sensitive that everything is correctly rounded. This doesn't matter for bilinear filter (since picking a wrong texel with weight zero is ok), and we could also switch the per-sample coords mistakenly. While here, also optimize the coord_mirror helper a bit (we can do the mirroring directly by exploiting float rounding, no need for fixing up odd/even manually). I did not touch the mirror_clamp and mirror_clamp_to_border modes. In contrast to mirror_clamp_to_edge and mirror_repeat these are legacy modes. They are specified against old gl rules, which actually does the mirroring not per sample (so you get swapped order if the coord is in the mirrored section). I think the idea though is that they should follow the respecified mirror_clamp_to_edge rules so the order would be correct. Reviewed-by: Jose Fonseca <[email protected]>
* llvmpipe: fix snorm blendingRoland Scheidegger2017-11-212-25/+32
| | | | | | | | | | | | | | | | | | | The blend math gets a bit funky due to inverse blend factors being in range [0,2] rather than [-1,1], our normalized math can't really cover this. src_alpha_saturate blend factor has a similar problem too. (Note that piglit fbo-blending-formats test is mostly useless for anything but unorm formats, since not just all src/dst values are between [0,1], but the tests are crafted in a way that the results are between [0,1] too.) v2: some formatting fixes, and fix a fairly obscure (to debug) issue with alpha-only formats (not related to snorm at all), where blend optimization would think it could simplify the blend equation if the blend factors were complementary, however was using the completely unrelated rgb blend factors instead of the alpha ones... Reviewed-by: Jose Fonseca <[email protected]>
* gallium: add CAPs to support HW atomic counters. (v3)Dave Airlie2017-11-101-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | This looks like an evergreen specific feature, but with atomic counters AMD have hw specific counters they use instead of operating on buffers directly. These are separate to the buffer atomics, so require different limits and code paths. I've left the CAP for atomic type extensible in case someone else has a variant on this sort of thing (freedreno maybe?) and needs to change it. This adds all the CAPs required to add support for those atomic counters, along with a related CAP for limiting the number of output resources. I'd like to land this and the st patch then I can start to upstream the evergreen support for these and other GL4.x features. v2: drop the ATOMIC_COUNTER_MODE cap, just use the return from the HW counters. If 0 we use the current mode. v3: fix some rebase errors (Gert Wollny) Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Tested-By: Gert Wollny <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* util: move os_time.[ch] to src/utilNicolai Hähnle2017-11-091-1/+1
| | | | Reviewed-by: Marek Olšák <[email protected]>
* gallivm: Use new LLVM fast-math-flags APITobias Droste2017-11-081-0/+4
| | | | | | | | | | | | | LLVM 6 changed the API on the fast-math-flags: https://reviews.llvm.org/rL317488 NOTE: This also enables the new flag 'ApproxFunc' to allow for approximations for library functions (sin, cos, ...). I'm not completly convinced, that this is something mesa should do. Signed-off-by: Tobias Droste <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-and-Tested-by: Michel Dänzer <[email protected]>
* gallivm: allow arch rounding with avx512Tim Rowley2017-11-021-1/+2
| | | | | | Fixes piglit vs-roundeven-{float,vec[234]} with simd16 VS. Reviewed-by: Roland Scheidegger <[email protected]>
* gallium: allow 512-bit vectorsTim Rowley2017-10-112-9/+9
| | | | | | | | | Increase the max allowed vector size from 256 to 512. No piglit llvmpipe regressions running on avx2. Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: don't use pabs intrinsic with llvm version >= 6Roland Scheidegger2017-10-071-9/+4
| | | | | | | | | | | | The intrinsic is gone, causing shader compilation to crash. While here, also change the fallback code to match what llvm's auto-updater of these intrinsics would do (except that there will still be zext/trunc instructions in there), which should ensure that the sequence gets recognized and fused back into a pabs in the end (I didn't test this, and it's possible even the old sequence would get recognized, but I don't see a reason why we shouldn't use the same sequence in any case). Tested-by: Vinson Lee <[email protected]>
* gallivm/ppc64le: adjust VSX code generation control.Ben Crocker2017-10-051-7/+30
| | | | | | | | | | | | | | | | | | | | | | | In lp_build_create_jit_compiler_for_module(), advance the minimum version of LLVM for VSX code generation to 4.0; this is the minimum revision at which several known VSX code generation bugs are fixed: https://llvm.org/bugs/show_bug.cgi?id=25503 (fixed in 3.8.1) https://llvm.org/bugs/show_bug.cgi?id=26775 (fixed in 3.8.1) https://llvm.org/bugs/show_bug.cgi?id=33531 (fixed in 4.0) An llc performance bug introduced in LLVM 4.0, https://llvm.org/bugs/show_bug.cgi?id=34647 is still pending as of LLVM 5.0, but only has a pronounced effect on one of the Piglit tests: ext_transform_feedback-max-varyings. All changes tested via Piglit. Cc: "17.2" <[email protected]> Signed-off-by: Ben Crocker <[email protected]> Acked-by: Roland Scheidegger <[email protected]>
* gallivm: allow additional llc optionsBen Crocker2017-10-051-0/+23
| | | | | | | | | | | | | | | | In init_native_targets, allow the passing of additional options to the LLC compiler via new GALLIVM_LLC_OPTIONS environmental control. This option is available only #ifdef DEBUG, initially. At top, add #include <llvm-c/Support.h> for LLVMParseCommandLineOptions() declaration. v2: Fix compile error with old llvm versions (sroland) Cc: "17.2" <[email protected]> Signed-off-by: Ben Crocker <[email protected]> Acked-by: Nicolai Hähnle <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: fix typo in debug_printf messageBen Crocker2017-10-051-1/+1
| | | | | | | | | | In gallivm_compile_module, fix a typo in the debug_printf("Invoke as \"llc ..." message. Cc: "17.2" <[email protected]> Signed-off-by: Ben Crocker <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* llvmpipe: silence 'variable may be used uninitialized' warningsBrian Paul2017-10-031-1/+1
| | | | Reviewed-by: Charmaine Lee <[email protected]>
* gallium: add new LOD opcodeRoland Scheidegger2017-09-301-0/+14
| | | | | | | | | | The operation performed is all the same as LODQ, but with the usual differences between dx10 and GL texture opcodes, that is separate resource and sampler indices (plus result swizzling, and setting z/w channels to zero). Reviewed-by: Jose Fonseca <[email protected]> Acked-by: Nicolai Hähnle <[email protected]>
* gallium: add LDEXP TGSI instruction and corresponding capNicolai Hähnle2017-09-291-0/+1
| | | | | Reviewed-by: Marek Olšák <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* tgsi: infer that dst[1] of DFRACEXP is an integerNicolai Hähnle2017-09-292-3/+3
| | | | | Reviewed-by: Marek Olšák <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* gallivm: add support for TGSI instructions with two outputsNicolai Hähnle2017-09-292-1/+22
| | | | | Reviewed-by: Marek Olšák <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* gallivm: add dst register index to lp_build_tgsi_context::emit_storeNicolai Hähnle2017-09-293-9/+9
| | | | | Reviewed-by: Marek Olšák <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* tgsi: infer that DLDEXP's second source has an integer typeNicolai Hähnle2017-09-291-2/+2
| | | | | Reviewed-by: Marek Olšák <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* gallium: Add PIPE_SHADER_CAP_INT64_ATOMICSJan Vesely2017-09-211-0/+1
| | | | | | | Denotes availability of 64bit int atomic instructions Signed-off-by: Jan Vesely <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* llvmpipe, gallivm: implement lod queries (LODQ opcode)Roland Scheidegger2017-09-204-57/+143
| | | | | | | | | | | | | | | | | | | | | | This uses all the existing code to calculate lod values for mip linear filtering. Though we'll have to disable the simplifications (if we know some parts of the lod calculation won't actually matter for filtering purposes due to mip clamps etc.). For better or worse, we'll also disable lod calculation hacks (mostly should make a difference for cube maps) always - the issue with per-pixel lod being difficult is mostly because we then have different mipmaps needed for the actual texel fetch, which isn't a problem with lodq. We still use approximation for the log2 - for that reason I believe the float part of the lod is only accurate to about 4-5 bits (and one bit less with 1d textures actually) which is hopefully good enough (though d3d10 technically requires 6 bits - could use quadratic interpolation instead of linear to get 8 bits or so). Since lodq requires unclamped lod, we also have to move some sampler key calculations to texture sampling code - even if we know we're going to access mipmap 0 we still have to calculate lod and apply lod_bias for lodq. Passes piglit ARB_texture_query_lod tests (after having fixed the test). Reviewed-by: Jose Fonseca <[email protected]>
* gallium: Add PIPE_SHADER_CAP_FP16Jan Vesely2017-09-181-0/+2
| | | | | | | | | Denotes native half precision float operations capability v2: PIPE_CAP_HALFS -> PIPE_SHADER_CAP_FP16 fix indentation Signed-off-by: Jan Vesely <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* gallivm: fix gather implementation a bitRoland Scheidegger2017-09-091-10/+48
| | | | | | | | | | | | | | | | | | gather is defined in terms of bilinear filtering, just without the filtering part. However, there's actually some subtle differences required in our implementation, because we use some tricks to simplify coord wrapping for the two coords per direction. For bilinear filtering, we don't care if we end up with an incorrect texel, as long as the filter weight is 0.0 for it. Likewise, the order of the texels doesn't actually matter (as long as they still have the correct filter weight). But for gather, these tricks lead to incorrect results. Fix this for CLAMP_TO_EDGE, and add some comments to the other wrap functions which look broken (the 3 mirror_clamp plus mirror_repeat) (too complex to fix right now, and noone really seems to care...). Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* llvmpipe, tgsi: hook up dx10 gather4 opcodeRoland Scheidegger2017-09-071-7/+21
| | | | | | | | | Trivial. We already support tg4 for legacy tex opcodes, so the actual texture sampling code already handles it. (Just like TG4, we don't handle additional capabilities and always sample red channel.) Reviewed-by: Jose Fonseca <[email protected]>