The bug only happens on the AOS / fixed-pt path.
This is relying on lp_build_pack2 using the sse2 pack intrinsics which
handle clamping.
(Alternatively, we could have made it use lp_build_packs2, but that might
not even produce more efficient code than not using the fast path
in the first place.)
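As a rough illustration of why the clamping comes for free: SSE2 pack instructions such as `packuswb` (signed 16-bit lanes to unsigned 8-bit) saturate out-of-range values instead of wrapping them. The scalar sketch below emulates one lane of such a pack next to a plain truncating narrow; both helper names are hypothetical and not part of gallivm.

```c
#include <stdint.h>

/* One lane of an unsigned-saturating 16 -> 8 bit pack, in the style of
 * SSE2 packuswb: out-of-range values are clamped, not wrapped. */
static uint8_t pack_lane_saturate_u8(int16_t x)
{
   if (x < 0)
      return 0;
   if (x > 255)
      return 255;
   return (uint8_t)x;
}

/* A plain narrowing conversion keeps only the low 8 bits, so values
 * outside [0, 255] wrap around instead of clamping. */
static uint8_t pack_lane_truncate_u8(int16_t x)
{
   return (uint8_t)x;
}
```

With saturation, 300 becomes 255 and -5 becomes 0; the truncating version would produce 44 and 251.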
There's no apparent reason for the former to exist. And they didn't
even have the same value.
|
| |
|
|
|
|
|
| |
There seems to be no reason for it, so do the same math for both
(except the scale mul, of course).
Has similar use cases to the S8X24 and X24S8 formats.
These formats are needed for hw that can sample and write stencil values.
Signed-off-by: Dave Airlie <[email protected]>
This adds the capability, a stencil semantic id, and tgsi scan support.
Signed-off-by: Dave Airlie <[email protected]>
|
| |
|
| |
|
| |
|
|
|
|
| |
To allow more optimizations, in particular for direct textures.
Useful for giving human-readable names in other cases.
SSE support for 32-bit and 16-bit unsigned arithmetic is not complete, and
can easily result in inefficient code.
In most cases signed/unsigned doesn't make a difference, such as for
integer texture coordinates.
So remove uint_coord_type and uint_coord_bld to keep inefficient
operations from creeping in later.
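For example, SSE2 only provides a signed 32-bit integer greater-than compare (`pcmpgtd`), so every unsigned comparison must be emulated, commonly by flipping the sign bit of both operands first. The scalar sketch below shows that extra step; it illustrates the general cost and is not code taken from gallivm (the function names are hypothetical).

```c
#include <stdint.h>

/* Signed greater-than: the only 32-bit integer ordering compare SSE2
 * provides (pcmpgtd). */
static int cmpgt_signed(int32_t a, int32_t b)
{
   return a > b;
}

/* Unsigned greater-than built from the signed compare: XOR-ing the sign
 * bit maps [0, 2^32) monotonically onto [-2^31, 2^31). Assumes the usual
 * two's complement conversion from uint32_t to int32_t. */
static int cmpgt_unsigned_via_signed(uint32_t a, uint32_t b)
{
   int32_t sa = (int32_t)(a ^ 0x80000000u);
   int32_t sb = (int32_t)(b ^ 0x80000000u);
   return cmpgt_signed(sa, sb);
}
```

Every unsigned compare therefore costs two extra XORs per vector, which is the kind of overhead that is easy to introduce unintentionally when a type is declared unsigned.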
We end up treating them as scalars in the end, and it saves some
instructions.
With this commit all explicit Phi emission is now gone.
GALLIVM_DEBUG=no_brilinear runtime option
We can't patch true-block at end-if time, as there is no guarantee that
the block at the beginning of the true stanza is the same at the end of
the true stanza; other control flow elements may have been emitted halfway
through the true stanza.
Although this bug only surfaced recently, with the commit to skip mip filtering
when lod is an integer, it was always there; it was probably avoided until
now because, e.g., cubemap selection nests if-then-else in the
else stanza, which does not suffer from the same problem.
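A toy model makes the failure mode concrete. The sketch below is a hypothetical stand-in for an IR builder (not the gallivm or LLVM API) that only tracks which block new code goes into: emitting a nested if/endif inside the true stanza leaves the builder in the nested construct's merge block, so the block saved when the true stanza began is no longer the one that needs the branch to the merge block.

```c
/* Hypothetical toy model of an IR builder: it only tracks which basic
 * block newly emitted code is appended to. Not the gallivm API. */
struct toy_builder {
   int current_block;  /* block that receives the next instruction */
   int next_block_id;  /* counter used to create fresh blocks */
};

static int toy_new_block(struct toy_builder *b)
{
   return b->next_block_id++;
}

/* Emitting a nested if/endif inside the true stanza leaves the builder
 * in the nested construct's merge block, not where the stanza began. */
static void toy_emit_nested_if(struct toy_builder *b)
{
   int nested_then = toy_new_block(b);
   int nested_merge = toy_new_block(b);
   (void)nested_then;
   b->current_block = nested_merge;
}

/* The block that must receive the branch to the merge block is the one
 * current when the true stanza ends, so query it at that point. */
static int toy_block_ending_true_stanza(const struct toy_builder *b)
{
   return b->current_block;
}
```

Starting in block 0, one nested if moves the builder to block 2, so patching block 0 at end-if time would attach the branch to the wrong block.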
|
| |
|
|
|
|
| |
No need for a flow stack anymore.
|
| |
|
|
|
|
| |
Simply rely on mem2reg pass. It's easier and more reliable.
Stop disassembling on unconditional backwards jumps.
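The stop condition can be sketched over an already-decoded instruction list. Everything below is hypothetical: the record layout, the opcode enum, and in particular the extra rule that a pending forward branch target keeps the scan going (an assumption, so code reachable only by a forward jump past the backwards jump is still shown).

```c
#include <stddef.h>

/* Hypothetical decoded-instruction record; a real disassembler would fill
 * this in from machine code. */
enum toy_op { TOY_OTHER, TOY_RET, TOY_JMP, TOY_JCC };

struct toy_insn {
   size_t pc;      /* address of the instruction */
   enum toy_op op;
   size_t target;  /* branch target, valid for TOY_JMP and TOY_JCC */
};

/* Returns how many instructions to disassemble. Stops after a ret or an
 * unconditional backwards jump, unless an earlier branch targets code
 * further ahead (assumption: such code must still be shown). */
static size_t toy_disasm_length(const struct toy_insn *insns, size_t n)
{
   size_t max_target = 0;
   for (size_t i = 0; i < n; i++) {
      const struct toy_insn *in = &insns[i];
      if ((in->op == TOY_JMP || in->op == TOY_JCC) && in->target > max_target)
         max_target = in->target;
      if (in->pc >= max_target) {
         if (in->op == TOY_RET)
            return i + 1;
         if (in->op == TOY_JMP && in->target <= in->pc)
            return i + 1;
      }
   }
   return n;
}
```

In a loop body whose backwards jump is followed by unrelated bytes, this stops at the jump instead of decoding the trailing data as instructions.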
Don't branch more than once in quick succession. Don't branch at the
end of the shader.
LLVM seems to find it easier to reason about these than our
mantissa-manipulation code.
|
| |
|
|
|
|
| |
Fixes slowdown in isosurf with earlier versions of llvm.
Operate simultaneously on the <width, height, depth> vector as much as possible,
instead of doing the operations on vectors with broadcast scalars.
Also do the 24.8 fixed point scalar with integer shift of the texture size,
for unnormalized coordinates.
AoS path only for now; the same thing can be done for SoA.
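For reference, a 24.8 fixed-point coordinate keeps the integer texel index in the upper 24 bits and the filter weight in the low 8 bits. The scalar sketch below, with hypothetical helper names, forms the scale factor with an integer shift of the texture size for normalized coordinates and uses a constant 256 for coordinates already in texels; the commit applies the same idea to a whole <width, height, depth> vector at once.

```c
#include <stdint.h>

/* Scale factor that turns a coordinate into 24.8 fixed point.
 * Normalized coords are multiplied by size * 256, formed here with an
 * integer shift; unnormalized coords are already in texels. */
static float fixed_24_8_scale(int32_t size, int normalized)
{
   return normalized ? (float)(size << 8) : 256.0f;
}

/* Assumes a non-negative coordinate, so truncation equals floor. */
static int32_t to_fixed_24_8(float coord, int32_t size, int normalized)
{
   return (int32_t)(coord * fixed_24_8_scale(size, normalized));
}

/* Integer texel index: the upper 24 bits. */
static int32_t fixed_int_part(int32_t fx) { return fx >> 8; }

/* Filter weight in 1/256 units: the lower 8 bits. */
static int32_t fixed_frac_part(int32_t fx) { return fx & 0xff; }
```

For a 64-texel row, the normalized coordinate 0.5 becomes 8192 (texel 32, weight 0), and the unnormalized coordinate 2.25 becomes 576 (texel 2, weight 64/256).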
Only requires sse2 now.
Clamp against 0 instead of -0.5, which simplifies things.
The former version would have resulted in both int coords being zero
(in case of coord being smaller than 0) and some "unused" weight value,
whereas now the int coords will be 0 and 1, but weight will be 0, hence the
lerp should produce the same value.
Still not happy about differences between normalized and non-normalized...
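The argument about the lerp can be checked with a small 1D sketch of linear filtering under clamp-to-edge. The helper below is hypothetical and simplified (a single float channel, normalized input coordinate): once the texel-space coordinate is clamped against 0, an out-of-range coordinate gives int coords 0 and 1 with weight 0, so the result is exactly texel 0.

```c
/* Hypothetical helper: linear filtering of a 1D texel row with
 * clamp-to-edge, clamping the texel-space coordinate against 0. */
static float sample_linear_clamp0(const float *texels, int size, float s)
{
   float u = s * (float)size - 0.5f;  /* normalized -> texel space */
   if (u < 0.0f)
      u = 0.0f;                        /* clamp against 0, not -0.5 */
   int i0 = (int)u;                    /* u >= 0, so truncation == floor */
   float w = u - (float)i0;            /* weight is 0 whenever u was clamped */
   int i1 = i0 + 1;
   if (i0 > size - 1)
      i0 = size - 1;
   if (i1 > size - 1)
      i1 = size - 1;
   return texels[i0] + w * (texels[i1] - texels[i0]);
}
```

With texels {10, 20, 30, 40}, a coordinate of -0.2 yields 10 (i0 = 0, i1 = 1, w = 0), and 0.25 yields 15.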
Haven't looked at exactly what code this generates, but URem can't be fast.
Instead of using two URems, use only one and replace the second with
select/add (this is what the corresponding AoS code already does).
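The replacement works because the second texel index is at most one past the first, already-wrapped index. A scalar sketch of REPEAT wrapping for linear filtering (hypothetical helper; it assumes the integer coordinate has already been made non-negative, since URem treats operands as unsigned):

```c
#include <stdint.h>

/* Hypothetical helper: texel indices for linear filtering with REPEAT
 * wrapping. Only i0 needs a remainder; i1 is at most one past it. */
static void wrap_repeat_linear(uint32_t i, uint32_t size,
                               uint32_t *i0, uint32_t *i1)
{
   *i0 = i % size;                    /* the single URem */
   uint32_t next = *i0 + 1;           /* add */
   *i1 = (next == size) ? 0 : next;   /* select instead of a second URem */
}
```

For size 8, i = 7 gives indices 7 and 0, and i = 9 gives 1 and 2.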
Rearrange order of operations a bit to make some clamps easier.
All calculations should be equivalent.
Note there seems to be some inconsistency in the clamp to edge case
wrt normalized/non-normalized coords, could potentially simplify this too.
Sometimes coords are clamped to positive numbers before conversion
to int, or clamped to 0 afterwards; in these cases itrunc can be used
instead of ifloor, which is simpler. Unfortunately this only applies to nearest
calculations, except for linear MIRROR_CLAMP_TO_EDGE, which
for the same reason can use an unsigned float build context so the
ifloor_fract helper can reduce this to itrunc in the ifloor helper itself.
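The difference between the two conversions only shows up for negative non-integers, which is why a prior clamp to a non-negative range makes truncation sufficient. A scalar sketch with hypothetical names standing in for the vector helpers:

```c
#include <stdint.h>

/* Float -> int conversion in C rounds toward zero (truncation), like the
 * single SSE2 cvttps2dq instruction. */
static int32_t itrunc_scalar(float x)
{
   return (int32_t)x;
}

/* Floor needs an extra compare and fix-up for negative non-integers. */
static int32_t ifloor_scalar(float x)
{
   int32_t t = (int32_t)x;
   return ((float)t > x) ? t - 1 : t;
}
```

For 2.7 both return 2, but for -1.5 truncation gives -1 while floor gives -2; when the input is known to be non-negative the fix-up can be dropped.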
|
| |
|
|
|
|
|
|
| |
SSE2 supports round to nearest directly (or rather, assuming the default
round-to-nearest mode in MXCSR). Use the intrinsic for this rather than round
(SSE4.1) or bit manipulation whenever possible.
trunc of -1.5 is -1.0 not 1.0...
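One simple way to get a correctly signed truncation, assuming the input fits in a 32-bit int, is a round trip through an integer conversion. This is a sketch with a hypothetical name, not necessarily how the fixed code computes it.

```c
#include <stdint.h>

/* Truncate toward zero by converting to int and back; the sign survives,
 * so -1.5 becomes -1.0. Only valid for |x| < 2^31, and unlike truncf it
 * returns +0.0 rather than -0.0 for inputs in (-1, 0). */
static float trunc_via_int(float x)
{
   return (float)(int32_t)x;
}
```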
|
| |
|
|
|
|
| |
Doesn't change generated code quality, but saves some typing.
Also, pass more stuff through the sample build context, instead of
as arguments.