| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
These degenerate instructions can often be emitted by state trackers
when the semantics of instructions don't match precisely.
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
| |
Mere syntactical change.
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
D3D9 Shader Model 2 restricted the fog register to one component,
http://msdn.microsoft.com/en-us/library/windows/desktop/bb172945.aspx ,
but that restriction no longer exists in Shader Model 3, and several
WHCK tests enforce that.
So this change:
- lifts the single-component restriction on TGSI_SEMANTIC_FOG
from the Gallium interface
- updates the Mesa state tracker to enforce output fog has (f, 0, 0, 1)
- draw module was updated to leave TGSI_SEMANTIC_FOG output registers
alone
Several gallium drivers that are going out of their way to clear
TGSI_SEMANTIC_FOG components could be simplified in the future.
Thanks to Si Chen and Michal Krol for identifying the problem.
Testing done: piglit fogcoord-*.vpfp tests
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Same as Si Chen's commit e7a5905d8a3960b0981750f8131e3af9acbfcdb8 for
tgsi_exec module.
Not actually tested, because softpipe is failing the test that caught
this bug due to unrelated issues.
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
| |
Fixes "Uninitialized pointer read" defect reported by Coverity.
Signed-off-by: Vinson Lee <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It's not necessary to scale down cubemap texture coords when generating
mipmaps: we are doing a 2x minification therefore it's guaranteed that
the texture coords will always be at least 1 texel away from the edges.
Scaling down can actually be harmful, as it may cause artefacts when
generating mipmaps with nearest filtering. Sample points will lie
exactly in the middle of each 2x2 texels, so the scaling factor was causing
different texels to be taken on each quadrant of the cube face. This is
apparent with a 1x1 checkerboard pattern in the base mipmap level:
instead of next mipmap level receiving a constant color throughout the
face, it will have different colors for each quadrant of the face.
The behaviour for blits is left untouched for now, but the cubemap
texture coord scaling hack should be reconsidered eventually.
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The exec_mask must be taken into consideration, just like emit_kill above.
The tgsi_exec module has the same bug and should be fixed in a future
change.
Reviewed-by: Roland Scheidegger <[email protected]>
Reviewed-by: José Fonseca <[email protected]>
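A minimal sketch of the masking rule described above, assuming one bit per
pixel lane. The function and parameter names are hypothetical and are not
taken from the llvmpipe or tgsi_exec sources.

```c
#include <stdint.h>

/* Hypothetical helper: returns the updated mask of live pixels after a
 * conditional kill. A lane is killed only if its condition bit is set AND
 * the lane is currently executing (its bit is set in exec_mask). Lanes
 * that are masked off by control flow keep their previous state. */
static uint32_t
apply_kill_if(uint32_t live_mask, uint32_t cond_mask, uint32_t exec_mask)
{
   return live_mask & ~(cond_mask & exec_mask);
}
```

Dropping the "& exec_mask" term would kill lanes that sit in a branch not
taken, which is the kind of bug this change addresses.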
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
They're not needed in postprocess.h
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Move private data structures and function prototypes out of the
public postprocess.h header file.
Create a pp_private.h for the shared, private data structures, functions.
Remove pp_program.h header.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
To match the pp_ namespace convention.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Reviewed-by: José Fonseca <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Tom Stellard <[email protected]>
CC: "10.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It is similar to tgsi_exec.c's DEBUG_EXECUTION compile flag.
I had prototyped this for a while while debugging an issue, but finally
cleaned this up and added a few more bells and whistles.
v2: Use '$' as marker; better output. Thanks to Brian, Zack and Roland
reviews.
Here is a sample output.
CONST[0].x = 0.00625000009 0.00625000009 0.00625000009 0.00625000009
CONST[0].y = -0.00714285718 -0.00714285718 -0.00714285718 -0.00714285718
CONST[0].z = -1 -1 -1 -1
CONST[0].w = 1 1 1 1
IN[0].x = 143.5 175.5 175.5 143.5
IN[0].y = 123.5 123.5 155.5 155.5
IN[0].z = 0 0 0 0
IN[0].w = 1 1 1 1
$ 1: RCP TEMP[0].w, IN[0].wwww
TEMP[0].w = 1 1 1 1
$ 2: MAD TEMP[0].xy, IN[0], CONST[0], CONST[0].zwzw
TEMP[0].x = -0.103124976 0.0968750715 0.0968750715 -0.103124976
TEMP[0].y = 0.117857158 0.117857158 -0.110714316 -0.110714316
$ 3: MUL OUT[0].xy, TEMP[0], TEMP[0].wwww
OUT[0].x = -0.103124976 0.0968750715 0.0968750715 -0.103124976
OUT[0].y = 0.117857158 0.117857158 -0.110714316 -0.110714316
$ 4: MUL OUT[0].z, IN[0].zzzz, TEMP[0].wwww
OUT[0].z = 0 0 0 0
$ 5: MOV OUT[0].w, TEMP[0]
OUT[0].w = 1 1 1 1
$ 6: END
OUT[0].x = -0.103124976 0.0968750715 0.0968750715 -0.103124976
OUT[0].y = 0.117857158 0.117857158 -0.110714316 -0.110714316
OUT[0].z = 0 0 0 0
OUT[0].w = 1 1 1 1
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
d3d10 requires us to convert NaNs to zero for any float->int conversion.
We don't really do that but it mostly seems to work. In particular I suspect the
very common float->unorm8 path only really passes because it relies on sse2
pack intrinsics which just happen to work by luck for NaNs (float->int
conversion in hw gives integer indeterminate value, which just happens to be
-0x80000000 hence gets converted to zero in the end after pack intrinsics).
However, float->srgb didn't get so lucky, because we need to clamp before
blending and clamping resulted in NaN behavior being undefined (and actually
got converted to 1.0 by clamping with sse2). Fix this by using a zero/one clamp
with defined nan behavior as we can handle the NaN for free this way.
I suspect there are more bugs lurking in this area (e.g. converting floats to
snorm) as we don't really use defined NaN behavior everywhere but this seems
to be good enough.
While here respecify nan behavior modes a bit, in particular the return_second
mode didn't really do what we wanted. From the caller's perspective, we really
wanted to say we need the non-nan result, but we already know the second arg
isn't a NaN. So we use this now instead, which means that cpu architectures
which actually implement min/max by always returning non-nan (that is adhering
to ieee754-2008 rules) don't need to bend over backwards for nothing.
Reviewed-by: Jose Fonseca <[email protected]>
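A scalar sketch of a zero/one clamp with defined NaN behavior, as described
above. It assumes plain C comparisons rather than the gallivm min/max
builders; the function name is hypothetical.

```c
/* Clamp x to [0, 1]. Every ordered comparison with NaN is false, so the
 * first select falls through to 0.0f for NaN input, giving NaN -> 0
 * without a separate NaN test. */
static float
clamp01_nan_to_zero(float x)
{
   float lo = (x > 0.0f) ? x : 0.0f;   /* NaN > 0 is false, so NaN -> 0 */
   return (lo < 1.0f) ? lo : 1.0f;
}
```

The operand order matters: writing the first select as "(x < 0.0f) ? 0.0f : x"
would pass the NaN through instead.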
|
|
|
|
|
|
|
|
|
|
| |
Since we explicitly require an integer input we should avoid using exp2 math
(even if we were using optimized versions), which turns the exp2 into a int
sub (plus some casts).
v2: fix bogus uint (needs to be int) math spotted by Matthew, fix comments
Reviewed-by: Jose Fonseca <[email protected]>
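One way to read "turns the exp2 into an int sub (plus some casts)" is
sketched below: for an integer level, 2^-level can be built directly from
the IEEE 754 bit pattern. This is an illustration under that assumption,
not the gallivm code, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Build the float 2^-level from its bit pattern: the biased exponent is
 * 127 - level (the integer subtraction) and the mantissa bits are zero. */
static float
exp2_neg_int(int level)
{
   assert(level >= 0 && level <= 126);   /* stay within normalised floats */
   uint32_t bits = (uint32_t)(127 - level) << 23;
   float f;
   memcpy(&f, &bits, sizeof f);          /* bit cast without aliasing issues */
   return f;
}
```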
|
|
|
|
|
|
|
|
| |
Thanks to Pino Toscano. Patch from Debian package.
Cc: "10.0" <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
| |
This helps fix an issue in the svga driver, and is just safer all-around.
Reviewed-by: José Fonseca <[email protected]>
|
|
|
|
|
|
|
|
| |
This makes VDPAU thread safe again.
v2: fix some memory leaks reported by Aaron Watry.
Signed-off-by: Christian König <[email protected]>
|
|
|
|
|
|
|
|
|
| |
There's only one minor functional change, for immediates the pixel offsets
are no longer added since the values are all the same for all elements in
any case (it might be better if those weren't stored as soa vectors in the
first place maybe).
Reviewed-by: Zack Rusin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With this patch, the llvmpipe and draw modules will calculate the depth bias
according to floating point depth buffer semantics described in the
arb_depth_buffer_float specification, when the driver has a z buffer bound
with a format type of UTIL_FORMAT_TYPE_FLOAT.
By default, the driver will use the existing UNORM calculation for depth bias.
A new function, draw_set_zs_format, was added to calculate the Minimum
Resolvable Depth value and floating point depth sense for the draw module.
Reviewed-by: Jose Fonseca <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
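A rough sketch of the floating point case, following the
ARB_depth_buffer_float definition r = 2^(e - n), where e is the exponent of
the largest z value in the primitive and n is the number of mantissa bits.
The helper names are hypothetical and this is not the draw module's
implementation; it assumes a 32-bit float z buffer and a normal, positive
z_max.

```c
#include <stdint.h>
#include <string.h>

/* Minimum resolvable depth for a 32-bit float z buffer: r = 2^(e - 23). */
static float
mrd_float32(float z_max)
{
   uint32_t bits;
   memcpy(&bits, &z_max, sizeof bits);
   int e = (int)((bits >> 23) & 0xff) - 127;   /* unbiased exponent of z_max */
   int biased = e - 23 + 127;                  /* biased exponent of r */
   if (biased < 1)
      biased = 1;                              /* clamp to the smallest normal float */
   bits = (uint32_t)biased << 23;
   float r;
   memcpy(&r, &bits, sizeof r);
   return r;
}

/* Polygon offset: o = r * units + m * factor, with m the maximum depth slope. */
static float
depth_bias(float mrd, float units, float max_slope, float slope_scale)
{
   return mrd * units + max_slope * slope_scale;
}
```

For a UNORM z buffer the same depth_bias() formula applies, but r is a
constant that depends only on the format rather than on the primitive.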
|
|
|
|
|
|
|
| |
Patch from Debian package
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Andreas Boll <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
We weren't adding the soa offsets when constructing the indices
for the gather functions. That meant that we were always returning
the data in the first element.
(Copied straight from the same fix for temps.)
While here fix up a couple of broken comments in the fetch functions,
plus don't name a straight float type float4 which is just confusing.
Reviewed-by: Jose Fonseca <[email protected]>
Reviewed-by: Zack Rusin <[email protected]>
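The missing offset can be illustrated with a small sketch. It assumes a
register file stored as SoA vectors of LANES floats, so that channel chan of
register reg for a given lane lives at (reg * 4 + chan) * LANES + lane. The
layout and names are assumptions for illustration only.

```c
#define LANES 4

/* Build per-lane gather indices for indirectly addressed registers.
 * reg[lane] is the register index each lane wants to read. */
static void
build_gather_indices(const int reg[LANES], int chan, int out[LANES])
{
   for (int lane = 0; lane < LANES; lane++) {
      /* The trailing "+ lane" is the soa offset. Without it every lane
       * would read element 0 of the selected vector. */
      out[lane] = (reg[lane] * 4 + chan) * LANES + lane;
   }
}
```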
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SSE can't handle true vector shifts (with variable shift count),
so llvm is turning them into a mess of extracts, scalar shifts and inserts.
It is however possible to emulate them in lp_build_minify with float muls,
which should be way faster (saves over 20 instructions per 8-wide
lp_build_minify). This wouldn't work for "generic" 32bit shifts though
since we've got only 24bits of mantissa (actually for left shifts it would
work by using sse41 int mul instead of float mul but not for right shifts).
Note that this has very limited scope for now, since this is only used with
per-pixel lod (otherwise we're avoiding the non-constant shift count by doing
per-quad shifts manually), and only 1d textures even then (though the latter
should change).
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
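A scalar sketch of the idea, assuming sizes below 2^24 so that they are
exactly representable in a float. The function name is hypothetical; the
real lp_build_minify operates on LLVM vectors.

```c
#include <assert.h>

/* Emulate max(1, size >> level) with a float multiply instead of a
 * variable shift. Multiplying by 2^-level is exact for these inputs, and
 * the float->int conversion truncates, which equals the shift result for
 * positive values. */
static int
minify_float_mul(int size, int level)
{
   assert(size >= 1 && size < (1 << 24));   /* must fit in the float mantissa */
   assert(level >= 0 && level < 32);
   float scale = 1.0f;
   for (int i = 0; i < level; i++)
      scale *= 0.5f;                         /* scale = 2^-level */
   int shifted = (int)((float)size * scale);
   return shifted > 1 ? shifted : 1;
}
```

In vector code the loop would be replaced by a per-lane scale factor, so the
whole operation becomes one convert, one multiply and one convert back.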
|
|
|
|
|
|
| |
util_format_is_rgba8_variant
Just happened to notice it was missing while looking at it.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
LLVM 3.4 r193971 removed llvm::DisablePrettyStackTrace and made the
pretty stack trace opt-in rather than opt-out.
The default value of DisablePrettyStackTrace has changed to true in LLVM
3.4 and newer.
Signed-off-by: Vinson Lee <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=60929
Reviewed-by: Tom Stellard <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
| |
|
|
|
|
|
|
| |
We can create clip_ptr_type once instead of n times inside the loop.
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
A convenient front end to indices generate/translate code, for emulating
primitives which are not supported natively by the driver.
This handles saving/restoring index buffer state, etc.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
| |
Add 'start' parameter to generator/translator.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
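To illustrate what a 'start' parameter means for an index generator, here is
a hypothetical generator that emulates a triangle fan with a triangle list.
The function name, signature and vertex ordering are assumptions, not the
u_indices code.

```c
/* Emit indices for nr_tris triangles of a fan whose first vertex is at
 * 'start'. Without a start parameter the generator could only describe
 * fans that begin at vertex 0. */
static void
generate_trifan_as_tris(unsigned start, unsigned nr_tris, unsigned *out)
{
   for (unsigned i = 0; i < nr_tris; i++) {
      out[3 * i + 0] = start;           /* fan centre */
      out[3 * i + 1] = start + i + 1;
      out[3 * i + 2] = start + i + 2;
   }
}
```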
|
|
|
|
|
| |
Reviewed-by: Roland Scheidegger <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
d3d10 requires that cube corners are filtered with accurate weights (that
is, the weight of the non-existing corner texel should be evenly distributed
to the other 3 texels). OpenGL does not require this (but recommends it).
This requires us to use different filtering code, since we need per-texel
weights which our 2d lerp doesn't (and can't) do. And of course the (now
per element) weights need to be adjusted too for it to work.
Invoke the new filtering code whenever there's an edge to keep things simpler,
as it will work for edges too not just corners but of course it's only needed
with corners.
More ugly code for not much gain but at least a hacked up cubemap demo
shows very nice corners now... Not sure yet if and how this should be
configurable...
v2: incorporate feedback from Jose, only use special corner filtering code
when there's a corner not when there's only an edge (as corner filtering code
is slower, though a perf difference was only measurable when always
forcing edge code). Plus some minor style fixes.
Reviewed-by: Jose Fonseca <[email protected]>
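The weight adjustment described above can be sketched for a single sample as
follows. It assumes the four bilinear weights are already computed and that
the index of the non-existing corner texel is known; the names are
hypothetical.

```c
/* w[] holds the bilinear weights of the 2x2 footprint and sums to 1.
 * The weight of the non-existing corner texel is split evenly among the
 * three texels that do exist, so the result still sums to 1. */
static void
redistribute_corner_weight(float w[4], int missing)
{
   float share = w[missing] / 3.0f;
   for (int i = 0; i < 4; i++)
      w[i] = (i == missing) ? 0.0f : w[i] + share;
}
```

This is why an ordinary 2d lerp is not enough: after the adjustment the four
weights are no longer the product of two 1d lerp factors.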
|
|
|
|
|
|
|
|
|
|
|
|
| |
The new function replaces four old functions: set_fragment/vertex/
geometry/compute_sampler_views().
Note: at this time, it's expected that the 'start' parameter will
always be zero.
Reviewed-by: Roland Scheidegger <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Tested-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For seamless cube filtering it is necessary to determine new faces and new
coords per sample. The logic for this is _seriously_ complex (what needs
to happen is very "asymmetric" wrt face, x/y under/overflow), further
complicated by the fact that if the 4 samples are in a corner (meaning we
only have actually 3 samples, and all 3 are on different faces) then
falling off the edge is happening _both_ on x and y axis simultaneously.
There was a noticeable performance hit in mesa's cubemap demo when seamless
filtering was forced on (just below 10 percent or so in a debug build, when
disabling all filtering hacks, otherwise it would probably be a bit more) and
when always doing the logic, hence use a branch so that it is only done if any
of the pixels in a quad (or in two quads) actually hit this. With that there
was no measurable performance hit in the cubemap demo (neither in a debug nor
release build), but this will vary (cubemap demo very rarely hits edges).
Might also be different on other cpus, as this forces SoA sampling path which
potentially can be quite a bit slower.
Note that as for corners, this code gets all the 3 samples which actually
exist right, and the 4th texel will simply be the same as one of the others,
meaning that filter weights will be a bit wrong. This however should be
enough for full OpenGL (but not d3d10) compliance.
Reviewed-by: Jose Fonseca <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
translate_sse.c contains code for msabi on x86_64, but it appears to be
untested.
Currently arguments 1 and 2 passed to the generated code are moved as 32-bit
quantities into the registers used by sysvabi, irrespective of the architecture.
Since these may be pointers, they must be moved as 64-bit quantities to avoid
truncation.
Commit f4dd0991719ef3e2606920c5100b372181c60899 disabled translate_sse.c on MinGW
x86_64; I don't know if that was due to this issue or a different one...
Signed-off-by: Jon TURNEY <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Cygwin also uses the msabi calling convention on x86_64, not the sysvabi calling
convention
Signed-off-by: Jon TURNEY <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
implementation which uses mmap()
The heap is NX on 64-bit Cygwin, so use the rtasm_exec_malloc() implementation
which uses mmap() to allocate an anonymous page with execute permission, rather
than the one which just uses malloc().
Signed-off-by: Jon TURNEY <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 94d05bf87a21bd364e84f699a0064e5fba58a6f9 as it has a
few problems:
- it breaks windows builds because env[LLVM_CXXFLAGS] is never set there
- it is merging not only rtti, but the whole cxxflags (defines etc)
which has proven to be a source of troubles (breaks debugging etc.)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
During the recent bind_sampler_states() interface change in gallium
we changed the CSO single_sampler_done() function so that if we were
decreasing the number of sampler states bound in the driver, we'd
null-out the "extra/old" sampler states to unbind them. See commit
1e2fbf265.
However, we didn't make the corresponding fix for sampler views.
This caused an assertion to fail in the svga driver which checked
that the number of sampler views matched the number of sampler states.
This patch fixes cso_restore_sampler_views() so that it nulls-out
the extra/old sampler views if the number of new views is less than
the number of current/old views.
Reviewed-by: Jose Fonseca <[email protected]>
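A simplified sketch of the nulling-out logic, assuming the views are kept in
a plain pointer array. The struct, constant and function names are
hypothetical stand-ins, not the cso_context implementation.

```c
#include <stddef.h>

#define MAX_VIEWS 16

struct view { int id; };   /* stand-in for a sampler view object */

/* Fill 'bind' with the views to pass to the driver and return how many
 * slots must be set. Slots that were in use before but are not covered by
 * the new views are set to NULL so the driver unbinds them. */
static unsigned
build_view_bind_list(struct view *const *new_views, unsigned new_count,
                     unsigned old_count, struct view *bind[MAX_VIEWS])
{
   unsigned i;
   for (i = 0; i < new_count; i++)
      bind[i] = new_views[i];
   for (; i < old_count; i++)
      bind[i] = NULL;                    /* extra/old views */
   return new_count > old_count ? new_count : old_count;
}
```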
|
|
|
|
|
|
|
|
|
|
|
|
| |
* The rtti fix actually dug up a bug in the scons build scripts.
* Autotools took the LLVM cpp and cxx flags, while scons only took
the cpp flags.
* This grabs the cxx flags and applies them where needed. We may
want to make the same change for the llvm cpp flags in scons.
* The only linux platform I can find with LLVM no-rtti is Ubuntu.
* Fixes bug #70471
Tested-by: Vinson Lee <[email protected]>
|
|
|
|
|
|
| |
Otherwise (vs_slot < 0) will never be true.
Trivial.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* As discussed on the mailing list,
forced no-rtti breaks C++ public
API's such as the Haiku C++ libGL.so
* -fno-rtti *can* be still set however
instead of blindly forcing -fno-rtti,
we can rely on the llvm-config
--cppflags output.
If the system llvm is built without
rtti (default), the no-rtti flag will be
present in llvm-config --cppflags
(which we pick up on)
If llvm is built with rtti
(REQUIRES_RTTI=1), then -fno-rtti is
removed from llvm-config --cppflags.
* We could selectively add / remove rtti
from various components, however mixing
rtti and non-rtti code is tricky and
could introduce missing symbols.
* This needs impact tested.
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
| |
_GNU_SOURCE appears to not be used reliably. Use _MSC_VER instead so
that MSVC alone is affected.
|
|
|
|
|
|
|
|
|
|
| |
Not used for ages, and it wouldn't work at all with explicit derivatives now
(not that it did before as it ignored them but now the code would just use
the derivs pre-projected which would be quite random numbers).
v2: also get rid of 3 helper functions no longer used.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
They need some special handling. Quite complicated.
Additionally, use the same code for implicit derivatives too if no_rho_approx
and no_quad_lod is set, because it seems while generally it should be ok
to use per quad lod for implicit derivatives there's at least some test which
insists that in case of cubemaps the shared lod value MUST come from a pixel
inside the primitive (due to the derivatives becoming different if a different
larger major axis is chosen).
v2: based on Brian's feedback, clean up code a bit.
And use sign bit of major axis instead of pre-select s/t/r sign for coord
mirroring (which should be the same in the end, saves 2 ands).
Also fix two bugs with select/mirror of derivatives, the minor axes need to
use major axis sign as well (instead of major derivative axis sign), and
don't mistakenly use absolute values of major derivative and inverse major
values.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are two reasons for this:
1) even when ignoring rho approximation for cube maps, the result is still
not correct, but it's better as the max error at edges is now sqrt(2) instead
of 2 (which was a full mip level), same as it is for ordinary 2d maps when
doing rho approximations (so the error actually goes from factor 2 at edges and
sqrt(2) completely inside a face to sqrt(2) at edges and 0 inside a face).
2) I want to repurpose rho_no_approx for cubemaps for fully correct cubemap
derivatives (so don't need yet another debug var).
Reviewed-by: Jose Fonseca <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
| |
GNU C++ compiler declares the C99 lrint, etc. when _GNU_SOURCE is
defined, but MSVC does not.
Trivial.
|
|
|
|
|
|
|
|
|
| |
Both the imul_hi and umul_hi are working with this patch.
Signed-off-by: Zack Rusin <[email protected]>
Reviewed-by: José Fonseca <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The code introduces two new 32bit integer multiplication opcodes which
can be used to produce correct 64 bit results. GLSL, OpenCL and D3D10+
require them. We use two separate opcodes, because they match the
behavior of GLSL and OpenCL, are a lot easier to add than a single
opcode with multiple destinations and because there's not much (any)
difference wrt code-generation.
Signed-off-by: Zack Rusin <[email protected]>
Reviewed-by: José Fonseca <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
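For reference, the intended semantics can be sketched in scalar C: each
opcode returns the upper 32 bits of the full 64-bit product, while the
ordinary multiply already provides the lower 32 bits. The function names
below simply mirror the imul_hi and umul_hi names; this is an illustration,
not the TGSI interpreter code.

```c
#include <stdint.h>

/* High 32 bits of the unsigned 32x32 -> 64 bit product. */
static uint32_t
umul_hi(uint32_t a, uint32_t b)
{
   return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* High 32 bits of the signed 32x32 -> 64 bit product. The shift is done on
 * the unsigned representation to avoid relying on right shifts of negative
 * values; the final conversion assumes two's complement wraparound. */
static int32_t
imul_hi(int32_t a, int32_t b)
{
   int64_t p = (int64_t)a * b;
   return (int32_t)(uint32_t)((uint64_t)p >> 32);
}
```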
|