AoS sampling tries to use integers for coord wrapping when possible,
as it should be faster. However, for AVX, this was suboptimal, because
only floats can use 8x32bit vectors, whereas integers have to be split
into 4x32bit vectors. (I believe part of why it was slower was also
that at least earlier llvm versions had trouble optimizing it properly,
since you can still do simple bit ops with 8x32bit vectors, so a
sequence of int add / and / int add / and with such vectors would
actually end up doing 128bit inserts/extracts between the operations
instead of just doing the cheap 128bit ands.)
Hence, a special float coord wrapping path was added to AoS sampling.
But this path was actually disabled for a long time already, since we
found that just splitting everything before entering the AoS path was
still slightly faster usually, so none of this float coord wrapping
code was used anymore (AoS sampling code, when avx2 isn't supported,
never sees vectors with length > 4). I thought it might be useful some
day again, but I'm not interested anymore in optimizing for very weird
instruction sets which have support for 256bit vectors for floats but
not for ints, so just drop it.
Reviewed-by: Jose Fonseca <[email protected]>
The common truncf(x + 0.5) idiom fails for the largest floating-point
value just below 0.5, i.e. nextafterf(0.5, 0.0): the sum
nextafterf(0.5, 0.0) + 0.5 rounds to exactly 1.0, so truncf does not
produce the desired value.
The solution is to add nextafterf(0.5, 0.0) instead of 0.5 before
truncating. This works for all values.
Reviewed-by: Roland Scheidegger <[email protected]>
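A standalone C illustration of the failure and the fix (just a sketch, not
the actual gallivm code, which emits the equivalent IR):

#include <math.h>
#include <stdio.h>

/* Adding 0.5f fails for the largest float below 0.5f: the sum lands
 * exactly halfway between 1.0f - 2^-24 and 1.0f and rounds (ties-to-even)
 * up to 1.0f, so truncf returns 1.0f instead of 0.0f. */
static float round_old(float x) { return truncf(x + 0.5f); }
static float round_new(float x) { return truncf(x + nextafterf(0.5f, 0.0f)); }

int main(void)
{
   float x = nextafterf(0.5f, 0.0f);
   printf("old: %f\n", round_old(x)); /* 1.000000 (wrong) */
   printf("new: %f\n", round_new(x)); /* 0.000000 */
   return 0;
}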
Because we only have one file_max for the (2d) gs input file, the value
actually represents the max of attrib and vertex index (although I'm
not entirely sure if we really want the max, since the max valid value
of the vertex dimension can be easily deduced from the input primitive).
Thus in cases where the number of inputs is higher than the number of
vertices per prim, we did not properly clamp the vertex index, which
would result in out-of-bounds fetches, potentially causing segfaults
(the segfaults seemed actually difficult to trigger, but valgrind
certainly wasn't happy). This might have happened even if the shader
did not actually try to fetch bogus vertices, if the fetching happened
in non-active conditional clauses.
To fix this, simply use the correct max vertex index value (derived from
the input prim type) instead when clamping for this case.
Reviewed-by: Jose Fonseca <[email protected]>
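A sketch of the fix in plain C (the helper name and scalar form are
illustrative; gallivm builds the equivalent clamp on LLVM IR values):

/* Clamp the vertex index against the vertex count of the input
 * primitive, rather than against the shared (2d) gs input file_max,
 * which also covers the attribute dimension. */
static unsigned
clamp_gs_vertex_index(unsigned vertex_index, unsigned verts_per_prim)
{
   return vertex_index < verts_per_prim ? vertex_index : verts_per_prim - 1;
}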
For testing it is desirable that all dEQP tests pass, e.g. when testing
virglrenderer on a host that only provides software rendering, as in a CI.
Hence, make it possible to disable certain optimizations that make tests fail.
While at it, also add some documentation to the flags to make it clear
that this is opt-out.
Setting the environment variable GALLIVM_PERF=no_filter_hacks makes
the following tests pass in release mode:
dEQP-GLES2.functional.texture.mipmap.2d.affine.*_linear_*
dEQP-GLES2.functional.texture.mipmap.cube.generate.*
dEQP-GLES2.functional.texture.vertex.2d.filtering.*_mipmap_linear_*
dEQP-GLES2.functional.texture.vertex.2d.wrap.*
Related:
https://bugs.freedesktop.org/show_bug.cgi?id=94957
v2: rename optimization disabling flag to 'safemath' and also move the
nopt flag to the perf flags.
v3: rename flag "safemath" to "no_filter_hacks" since safemath is usually
associated with floating point operations (Roland)
Signed-off-by: Gert Wollny <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Dylan Baker <[email protected]>
Previously gallivm would attempt to use VSX instructions on all systems
where it detected that Altivec is supported; however, VSX was added to
POWER long after Altivec, causing lots of crashes on older POWER/PPC
hardware, e.g. PPC Macs. By detecting VSX separately from Altivec we can
automatically disable it on hardware that supports Altivec but not VSX.
Signed-off-by: Vicki Pfau <[email protected]>
This hijacks the top 16 bits of the swizzle to pass in the swizzle
for the second channel.
This fixes handling of .yx swizzles of 64-bit values.
This should fix up radeonsi and llvmpipe.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107524
Reviewed-by: Marek Olšák <[email protected]>
These have been removed. Unfortunately auto-upgrade doesn't work for
jit. (Worse, it seems we don't get a compilation error anymore when
compiling the shader, rather llvm will just do a call to a null
function in the jitted shaders, making it difficult to detect when
intrinsics vanish.)
Luckily the signed ones are still there; I helped convince the llvm
folks that removing them is a bad idea for now, since while the unsigned ones have
sort of agreed-upon simplest patterns to replace them with, this is not
the case for the signed ones, and they require _significantly_ more
complex patterns - to the point that the recognition is IMHO probably
unlikely to ever work reliably in practice (due to other optimizations
interfering). (Even for the relatively trivial unsigned patterns, llvm
already added test cases where recognition doesn't work, unsaturated
add followed by saturated add may produce atrocious code.)
Nevertheless, it seems there's a serious quest to squash all
cpu-specific intrinsics going on, so I'd expect patches to nuke them as
well to resurface.
Adapt the existing fallback code to match the simple patterns llvm uses
and hope for the best. I've verified with lp_test_blend that it does
produce the expected saturated assembly instructions. Our
cmp/select build helpers don't use boolean masks, but that doesn't seem
to interfere with llvm's ability to recognize the pattern.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106231
Reviewed-by: Jose Fonseca <[email protected]>
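For reference, a scalar C rendition of the simple unsigned pattern llvm
recognizes (the gallivm code builds the equivalent IR with its cmp/select
helpers; this is just a sketch):

#include <stdint.h>

/* Unsigned saturating add via compare/select: if the addition wraps,
 * the result is smaller than either operand, so clamp to the maximum. */
static uint8_t uadd_sat_u8(uint8_t a, uint8_t b)
{
   uint8_t sum = a + b;              /* may wrap */
   return sum < a ? UINT8_MAX : sum; /* saturate on wrap */
}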
v1 -> v2:
- nv30 is _NOT_ scalar as suggested by Ilia Mirkin.
- Change from a screen cap to a shader cap as suggested
by Eric Anholt.
- radeonsi is scalar as suggested by Marek Olšák.
- Change missing ones to be scalar.
v2 -> v3:
- r600 prefers vec4 as suggested by Marek Olšák.
Signed-off-by: Christian Gmeiner <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Use a single allocation of array type instead of the old-style array
allocation for the temp and immediate arrays.
Probably only makes a difference if they aren't used indirectly (so,
if we used them solely because there are too many temps or immediates).
In this case the sroa and early-cse passes can sometimes do some
optimizations which they otherwise cannot.
(As a side note, for the temp reg array, we actually really should
use one allocation per array id, not just one for everything.)
Note that the instcombine pass would actually promote such
allocations to a single alloc of array type as well, but it runs too
late to help some artificial shaders we've seen (we don't want to run
instcombine at the beginning due to its cost, hence we would need
another sroa/cse pass after instcombine). sroa/early-cse help there
because they can actually eliminate all of the huge shader, reducing
it to a single const output (don't ask...).
(Interestingly, instcombine also removes all the bitcasts we do on that
allocation for single-value gathering, and in the end directly indexes
into the single vector elements, which according to spec is only
semi-valid, but this happens regardless. Another thing instcombine also
does is use inbound GEPs, which is probably something we should do
manually as well - for indirectly indexed reg files llvm may not be
able to figure it out on its own, but we should be able to guarantee
all pointers are always inbound. In any case, by the looks of it
using single allocation with array type seems to be the right thing
to do even for ordinary shaders.)
No piglit change.
Reviewed-by: Jose Fonseca <[email protected]>
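A hedged LLVM-C sketch of the two allocation styles (the wrapper
functions are illustrative, not the actual gallivm code):

#include <llvm-c/Core.h>

/* Old style: array allocation (alloca of vec_type with a count). */
static LLVMValueRef
alloc_old_style(LLVMBuilderRef b, LLVMTypeRef vec_type, unsigned n)
{
   LLVMValueRef count = LLVMConstInt(LLVMInt32Type(), n, 0);
   return LLVMBuildArrayAlloca(b, vec_type, count, "temps");
}

/* New style: a single allocation of array type, which sroa and
 * early-cse can reason about much better. */
static LLVMValueRef
alloc_new_style(LLVMBuilderRef b, LLVMTypeRef vec_type, unsigned n)
{
   return LLVMBuildAlloca(b, LLVMArrayType(vec_type, n), "temps");
}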
If we dump the bitcode for off-line debug purposes, we really want the
pre-optimized bitcode, otherwise it's useless in identifying problems
with IR optimization (if you have a shader which takes an hour to do
IR optimization, it's also nice you don't have to wait that hour...).
Also, print out the function passes for opt which correspond to what
was used for jit compilation (and also the opt level for codegen).
Using opt/llc this way should then pretty much mimic what was done
for jit. (When specifying something like -time-passes
-debug-pass=[Structure|Arguments] (for either opt or llc), this also
gives very useful information about which passes all the time was spent
in, and which passes are really run, along with their order - llvm will add
passes due to dependencies on its own, and of course -O2 for llc
comes with a ~100 pass list.)
Reviewed-by: Jose Fonseca <[email protected]>
Conversion to int can otherwise overflow if compile times are over
~71min, which is about 2^32 microseconds. (Yes, this can happen...)
Reviewed-by: Jose Fonseca <[email protected]>
LICM is simply too expensive, even though it presumably can help quite
a bit in some cases.
It was definitely cheaper in llvm 3.3, though as far as I can tell with
llvm 3.3 it failed to do anything in most cases. early-cse also
seems to enable licm to move things it previously couldn't,
which causes noticeable compile time increases.
There are more loop passes in llvm, but I'm not sure which ones are helpful,
and I couldn't find anything which would roughly do what the old licm in
llvm 3.3 did, so ditch it.
Reviewed-by: Jose Fonseca <[email protected]>
This pass is quite cheap, and can simplify the IR quite a bit for our
generated IR.
In particular on a variety of shaders I've found the time saved by
other passes due to the simplified IR more than makes up for the cost
of this pass, and on top of that the end result is actually better.
The only downside I've found is this enables the LICM pass to move some
things out of the main shader loop (in the case I've seen, instanced
vertex fetch (which is constant within the jit shader) plus the derived
instructions in the shader) which it couldn't do before for some reason.
This would actually be desirable but can increase compile time
considerably (licm seems to have considerable cost when it actually can
move things out of loops, due to alias analysis). But blaming early cse
for this seems inappropriate. (Note that the first two sroa / earlycse
passes are similar to what a standard llvm opt -O1/-O2 pipeline would
do, albeit this has some more passes even before but I don't think
they'd do much for us.)
In particular, it also helps a lot with some crazy shader used for driver
verification (don't ask...) - about a factor of 6 faster in compile
time, due to simplifying the ir before LICM is run.
While here, also move licm behind simplifycfg. For some shaders there
are very significant compile time gains (we've seen a factor
of 10000, albeit that was a really crazy shader you'd certainly never
see in a real app), because LICM is quite expensive and there are cases
where running simplifycfg (along with sroa and early-cse) before licm
reduces IR complexity significantly. (I'm not entirely sure if it would
make sense to also run it afterwards.)
Reviewed-by: Jose Fonseca <[email protected]>
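A hedged sketch of the resulting ordering using the LLVM-C legacy
pass-manager API (illustrative; the actual gallivm pipeline has more
passes around these):

#include <llvm-c/Transforms/Scalar.h>

/* Run the cheap cleanup passes first so the expensive LICM run sees
 * already-simplified IR. */
static void add_shader_passes(LLVMPassManagerRef pm)
{
   LLVMAddScalarReplAggregatesPass(pm); /* sroa */
   LLVMAddEarlyCSEPass(pm);             /* early-cse */
   LLVMAddCFGSimplificationPass(pm);    /* simplifycfg */
   LLVMAddLICMPass(pm);                 /* licm, after simplification */
}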
In preparation of dimension-aware LLVM image intrinsics.
Acked-by: Marek Olšák <[email protected]>
This is in preparation for the new image intrinsics.
Acked-by: Marek Olšák <[email protected]>
Include llvm-c/Transforms/Utils.h with the newest LLVM 7
Signed-off-by: Mike Lothian <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
Signed-off-by: Marek Olšák <[email protected]>
util_is_power_of_two_or_zero
The new name makes the zero-input behavior more obvious. The next
patch adds a new function with different zero-input behavior.
Signed-off-by: Ian Romanick <[email protected]>
Suggested-by: Matt Turner <[email protected]>
Reviewed-by: Alejandro Piñeiro <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
My gcc doesn't figure out that dims >= 1 (which seems reasonable), and doesn't
notice that ddmax is only used under the same no_rho_opt condition as its
initialization.
Reviewed-by: Roland Scheidegger <[email protected]>
Once a lp_build_sampler_soa or lp_build_sampler_aos object is created,
it should never be modified. Found by inspection.
Reviewed-by: Roland Scheidegger <[email protected]>
shader-db doesn't show any regression and 32-bit pointers with byval
are declared as VGPRs for some reason.
Reviewed-by: Samuel Pitoiset <[email protected]>
We are not allowed to modify the incoming coords values, or things may
crash (as we may be inside an llvm conditional and the values may be used
in another branch).
I recently broke this when fixing an issue with NaNs and seamless cube
map filtering, and it causes crashes when doing cubemap filtering
if the min and mag filters are different.
Add const to the pointers passed in to prevent this mishap in the future.
Fixes: a485ad0bcd ("gallivm: fix an issue with NaNs with seamless cube filtering")
Reviewed-by: Jose Fonseca <[email protected]>
lp_build_interleave2_half was not doing the right thing for avx512-style
16-wide loads.
This path is hit in the swr driver with a 16-wide vertex shader. It is
called from lp_build_transpose_aos, when doing texel fetches and the
fetched data needs to be transposed to one component per output register.
Special-case the post-load swizzle operations for avx512 16x32 (16-wide
32-bit values) so that we move the xyzw components correctly to the outputs.
Reviewed-by: Roland Scheidegger <[email protected]>
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
The spec says the missing texel (when we wrap around both x and y axis)
should be synthesized as the average of the 3 other texels. For bilinear
filtering however we instead adjusted the filter weights (because, while
the complexity looks similar, there would be 4 times as many color values
to fix up than weights). Obviously this could not work for gather (hence
accurate corner filtering was disabled with gather).
Implement this by just doing what the spec implies - calculate the 4th
texel as the average of the other 3. With gather, of course, there's only
one color to worry about, so it's not all that many instructions either
(albeit surely the whole cube map filtering is hilariously complex).
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
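A plain scalar sketch of the corner synthesis (gallivm does this per
channel on SoA vectors; this is just the idea):

/* Synthesize the texel missing at a cube corner as the average of the
 * three texels that actually exist, as the spec implies. */
static float
corner_texel(float t0, float t1, float t2)
{
   return (t0 + t1 + t2) * (1.0f / 3.0f);
}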
Cube texture wrapping is a bit special since the values (post face
projection) always are within [0,1], so we took advantage of that and
omitted some clamps.
However, we can still get NaNs (either because the coords already had NaNs,
or the face projection generated them), and in fact we didn't handle them
quite safely. I've seen -INT_MAX + 1 being propagated through as the final int
coord value, albeit I didn't observe a crash. (Not quite a coincidence, since
any stride mul with -INT_MAX or -INT_MAX+1 will turn up as a small positive
number - nevertheless, I'd rather not try my luck: I'm not entirely sure it
can't really turn up negative either due to seamless coord swapping, plus
ifloor of a NaN is not guaranteed to return -INT_MAX by any standard. And
we kill off NaNs similarly with ordinary texture wrapping too.)
So kill off the NaNs by using the common max against zero method.
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
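A scalar sketch of why max-against-zero kills NaNs (the vector code uses
a cmp/select with these semantics):

/* Any ordered comparison with NaN is false, so the select below maps
 * NaN to 0.0f while leaving ordinary in-range values untouched. */
static float
kill_nan_max_zero(float x)
{
   return x > 0.0f ? x : 0.0f;
}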
Care must be taken that all coords end up correct, as the tests are very
sensitive to everything being correctly rounded. This doesn't matter
for bilinear filtering (since picking a wrong texel with weight zero is
ok, and even mistakenly swapping the per-sample coords was harmless).
While here, also optimize the coord_mirror helper a bit (we can do the
mirroring directly by exploiting float rounding, no need for fixing up
odd/even manually).
I did not touch the mirror_clamp and mirror_clamp_to_border modes.
In contrast to mirror_clamp_to_edge and mirror_repeat these are legacy
modes. They are specified against old gl rules, which actually does
the mirroring not per sample (so you get swapped order if the coord
is in the mirrored section). I think the idea though is that they should
follow the respecified mirror_clamp_to_edge rules so the order would be
correct.
Reviewed-by: Jose Fonseca <[email protected]>
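A hedged scalar sketch of the rounding-based mirror (illustrative; the
real code works on coord vectors):

#include <math.h>

/* Mirror repeat as a triangle wave: rounding x/2 to the nearest integer
 * picks the center of the containing 2-wide period, and the absolute
 * difference then mirrors every other unit interval into [0,1]. */
static float
coord_mirror(float x)
{
   return fabsf(x - 2.0f * rintf(0.5f * x));
}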
The blend math gets a bit funky due to inverse blend factors being
in range [0,2] rather than [-1,1], our normalized math can't really
cover this.
src_alpha_saturate blend factor has a similar problem too.
(Note that piglit fbo-blending-formats test is mostly useless for
anything but unorm formats, since not just all src/dst values are
between [0,1], but the tests are crafted in a way that the results
are between [0,1] too.)
v2: some formatting fixes, and a fix for a fairly obscure (to debug)
issue with alpha-only formats (not related to snorm at all), where
the blend optimization would think it could simplify the blend equation
if the blend factors were complementary, while actually using the
completely unrelated rgb blend factors instead of the alpha ones...
Reviewed-by: Jose Fonseca <[email protected]>
This looks like an evergreen-specific feature, but for atomic
counters AMD have hw-specific counters they use instead of operating
on buffers directly. These are separate from the buffer atomics,
so they require different limits and code paths.
I've left the CAP for atomic type extensible in case someone
else has a variant on this sort of thing (freedreno maybe?)
and needs to change it.
This adds all the CAPs required to add support for those atomic
counters, along with a related CAP for limiting the number of
output resources.
I'd like to land this and the st patch, so I can then start to
upstream the evergreen support for these and other GL4.x features.
v2: drop the ATOMIC_COUNTER_MODE cap, just use the return
from the HW counters. If 0 we use the current mode.
v3: fix some rebase errors (Gert Wollny)
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Tested-By: Gert Wollny <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
LLVM 6 changed the API on the fast-math-flags:
https://reviews.llvm.org/rL317488
NOTE: This also enables the new flag 'ApproxFunc' to allow
approximations of library functions (sin, cos, ...). I'm not completely
convinced that this is something mesa should do.
Signed-off-by: Tobias Droste <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-and-Tested-by: Michel Dänzer <[email protected]>
Fixes piglit vs-roundeven-{float,vec[234]} with simd16 VS.
Reviewed-by: Roland Scheidegger <[email protected]>
Increase the max allowed vector size from 256 to 512.
No piglit llvmpipe regressions running on avx2.
Reviewed-by: Jose Fonseca <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
The intrinsic is gone, causing shader compilation to crash.
While here, also change the fallback code to match what llvm's auto-updater
of these intrinsics would do (except that there will still be zext/trunc
instructions in there), which should ensure that the sequence gets recognized
and fused back into a pabs in the end (I didn't test this, and it's possible
even the old sequence would get recognized, but I don't see a reason why we
shouldn't use the same sequence in any case).
Tested-by: Vinson Lee <[email protected]>
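The replacement sequence boils down to a compare/select; a scalar C
sketch (illustrative only):

#include <stdint.h>

/* Integer abs as the cmp/select pattern llvm can fuse back into pabs:
 * select(x < 0, -x, x). (The sketch ignores the x == INT32_MIN edge
 * case, which is UB in C but simply wraps in the vector IR.) */
static int32_t
iabs_pattern(int32_t x)
{
   return x < 0 ? -x : x;
}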
In lp_build_create_jit_compiler_for_module(), advance the minimum
version of LLVM for VSX code generation to 4.0; this is the minimum
revision at which several known VSX code generation bugs are fixed:
https://llvm.org/bugs/show_bug.cgi?id=25503 (fixed in 3.8.1)
https://llvm.org/bugs/show_bug.cgi?id=26775 (fixed in 3.8.1)
https://llvm.org/bugs/show_bug.cgi?id=33531 (fixed in 4.0)
An llc performance bug introduced in LLVM 4.0,
https://llvm.org/bugs/show_bug.cgi?id=34647
is still pending as of LLVM 5.0, but only has a pronounced effect on
one of the Piglit tests: ext_transform_feedback-max-varyings.
All changes tested via Piglit.
Cc: "17.2" <[email protected]>
Signed-off-by: Ben Crocker <[email protected]>
Acked-by: Roland Scheidegger <[email protected]>
In init_native_targets, allow passing additional options to
the LLC compiler via the new GALLIVM_LLC_OPTIONS environment variable.
This option is available only #ifdef DEBUG, initially.
At top, add #include <llvm-c/Support.h> for LLVMParseCommandLineOptions()
declaration.
v2: Fix compile error with old llvm versions (sroland)
Cc: "17.2" <[email protected]>
Signed-off-by: Ben Crocker <[email protected]>
Acked-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
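A hedged sketch of the mechanism (simplified: a real implementation must
tokenize the string into separate argv entries, whereas a single flag is
assumed here):

#include <stdlib.h>
#include <llvm-c/Support.h>

static void
parse_llc_options(void)
{
#ifdef DEBUG
   const char *opts = getenv("GALLIVM_LLC_OPTIONS");
   if (opts) {
      /* argv[0] is a dummy program name for LLVM's option parser */
      const char *const argv[] = { "gallivm", opts };
      LLVMParseCommandLineOptions(2, argv, NULL);
   }
#endif
}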
In gallivm_compile_module, fix a typo in the
debug_printf("Invoke as \"llc ..." message.
Cc: "17.2" <[email protected]>
Signed-off-by: Ben Crocker <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
Reviewed-by: Charmaine Lee <[email protected]>
The operation performed is the same as for LODQ, but with the usual
differences between dx10 and GL texture opcodes, that is, separate resource
and sampler indices (plus result swizzling, and setting z/w channels
to zero).
Reviewed-by: Jose Fonseca <[email protected]>
Acked-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
Denotes availability of 64-bit int atomic instructions.
Signed-off-by: Jan Vesely <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
This uses all the existing code to calculate lod values for mip linear
filtering. Though we'll have to disable the simplifications (which apply when
we know some parts of the lod calculation won't actually matter for filtering
purposes due to mip clamps etc.). For better or worse, we'll also always
disable lod calculation hacks (these mostly make a difference for cube maps) -
per-pixel lod is difficult mostly because different pixels then need different
mipmaps for the actual texel fetch, which isn't a problem with lodq.
We still use an approximation for the log2 - for that reason I believe the float
part of the lod is only accurate to about 4-5 bits (and one bit less with 1d
textures actually) which is hopefully good enough (though d3d10 technically
requires 6 bits - could use quadratic interpolation instead of linear to get
8 bits or so).
Since lodq requires unclamped lod, we also have to move some sampler key
calculations to texture sampling code - even if we know we're going to access
mipmap 0 we still have to calculate lod and apply lod_bias for lodq.
Passes piglit ARB_texture_query_lod tests (after having fixed the test).
Reviewed-by: Jose Fonseca <[email protected]>
Denotes native half-precision float operations capability.
v2: PIPE_CAP_HALFS -> PIPE_SHADER_CAP_FP16
fix indentation
Signed-off-by: Jan Vesely <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
gather is defined in terms of bilinear filtering, just without the filtering
part. However, there's actually some subtle differences required in our
implementation, because we use some tricks to simplify coord wrapping for the
two coords per direction.
For bilinear filtering, we don't care if we end up with an incorrect
texel, as long as the filter weight is 0.0 for it. Likewise, the order of
the texels doesn't actually matter (as long as they still have the correct
filter weight).
But for gather, these tricks lead to incorrect results.
Fix this for CLAMP_TO_EDGE, and add some comments to the other wrap functions
which look broken (the 3 mirror_clamp variants plus mirror_repeat; too complex
to fix right now, and no one really seems to care...).
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
Trivial. We already support tg4 for legacy tex opcodes, so the actual
texture sampling code already handles it.
(Just like TG4, we don't handle additional capabilities and always sample
the red channel.)
Reviewed-by: Jose Fonseca <[email protected]>