| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
| |
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
| |
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
| |
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
| |
There are some undefined behavior subtleties, so having a function to match
the u_bit_scan_consecutive_range makes sense.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
We want to use the SysSemanticToIndex to tell if we've seen
the semantics at all.
Acked-by: Roland Scheidegger <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This lets us restart the machine at a PC value, and exits
the machine when we hit a barrier.
Compute shaders will then execute all the threads up to the
barrier, then restart the machines after the barrier once
all are done.
v2: comment the code a bit, change return types.
Acked-by: Roland Scheidegger <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
| |
compute shaders don't need input/outputs so don't bother
allocating memory for these.
Acked-by: Roland Scheidegger <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
| |
This implements basic load/store/atomic ops on MEMORY types
for compute shaders.
Acked-by: Roland Scheidegger <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
| |
This is just a cleanup that will make later changes easier
to make.
Acked-by: Roland Scheidegger <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
| |
This will be used later to restart barriered execution
threads in compute, for now we just want to change the API.
Acked-by: Roland Scheidegger <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
For compute support some of the system values are .xyz types,
so move to using a vector instead of a single channel.
[airlied: squash swizzle fix from compute series].
Reviewed-by: Brian Paul <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
a) SysSemanticToIndex needs to be indexed with the semantic name
not the decl->Declaration.Semantic.
b) doing this in run is too late, as the mappings are all setup
prior to run in the execs.
Reviewed-by: Brian Paul <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
| |
More portable, particularly when building with Clang, which implements
all MSVC intrisincs in its own intrin.h, but doesn't actually support
`#pragma instrinsic`.
Reviewed-by: Roland Scheidegger <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There was definitely bugs here mixing up the PIPE_ and TGSI_ defines,
hopefully they didn't cause any problems, since mostly it was special
cases for GEOMETRY.
This clarifies at shader machine create what type of shader this
machine will execute. This is needed also for compute shaders where
we don't want to allocate inputs/outputs.
Reviewed-by: Brian Paul <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
| |
It gets annoying that changing the tgsi exec rebuilds the state
tracker unnecessarily. Putting this include into draw_gs.h which
uses it causes a lot less rebuilds.
Reviewed-by: Brian Paul <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some cases (especially these using fract for coord wrapping) did not handle
NaNs (or Infs) correctly - the following code assumed the fract result
could not be outside [0,1], but if the input is a NaN (or +-Inf) the fract
result was NaN - which then could produce out-of-bound offsets.
(Note that the explicit NaN behavior changes for min/max on x86 sse don't
result in actual changes in the generated jit code, but may on other
architectures. Found by looking through all the wrap functions.)
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=94955
No piglit changes.
(v2: fix min/max typo in coord_mirror, add comment)
Cc: "11.1 11.2" <[email protected]>
Tested-by: Bruce Cherniak <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
| |
And remove local definition of Elements() macro.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
| |
To silence a valgrind uninitialized memory warning.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94955
Cc: "11.1 11.2" <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
| |
|
|
|
|
|
|
|
| |
Avoids warnings in release builds.
Signed-off-by: Grazvydas Ignotas <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
| |
Mark variables MAYBE_UNUSED to avoid unused-but-set-variable warnings
in release build.
Signed-off-by: Grazvydas Ignotas <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
| |
Acked-by: Jose Fonseca <[email protected]>
|
|
|
|
| |
Acked-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
| |
Use PIPE_SWIZZLE_* everywhere.
Use X/Y/Z/W/0/1 instead of RED, GREEN, BLUE, ALPHA, ZERO, ONE.
The new enum is called pipe_swizzle.
Acked-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Screwed up since 0753b135f6e83b171d8a1b08aea967374f3542bc.
(Only an issue with different min/mag filters, and then only in some cases,
which is probably why it went unnoticed for quite a while.
The effect should have simply been nearest mip filter instead of linear, iff
min was nearest, mag was linear, and all pixels hit the mignifying path.)
Fixes a bunch of dEQP failures.
Reviewed-by: Jose Fonseca <[email protected]>
Cc: "11.1 11.2" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Provide an improved lowering for LRP, which can be implemented in two
MAD instructions with a bit of rearranging of the equation, rather
than the literal implementation of two multiplies, an add and a
subtract.
Signed-off-by: Russell King <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Improve XPD lowering to consume less instructions by using the
MAD instruction to perform the multiply and subtraction together.
Signed-off-by: Russell King <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add support for lowering TRUNC using the following sequence:
FRC tmpA, |src|
SUB tmpA, |src|, tmpA
CMP dst, -tmpA, tmpA
Note that this is incompatible with FRC lowering.
Signed-off-by: Russell King <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add support for lowering FLR and CEIL to FRC/SUB and FRC/ADD
instructions for GPUs that support FRC but not FLR or CEIL. Since
these uses FRC, it is invalid to ask for FLR or CEIL to be lowered
along with FRC, so add an assert to catch this invalid configuration.
We also need to deal with FLR instructions emitted by the lowering
code. Fix these up with the FRC+SUB equivalent when FLR lowering is
enabled.
Signed-off-by: Russell King <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
For use by radeonsi.
v2: Make sure that it works for all 64 bits set.
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
Just use LLVM_HOST_TRIPLE, which is available at least from LLVM 3.3
onwards, and is pretty much what llvm::sys::getProcessTriple() does anyway,
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
| |
Just keep a copy of the module_name in gallivm.
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
One needs to call setJITMemoryManager for LLVM 3.3, instead of
setMCJITMemoryManager.
This regressed in commits 065256df/75ad4fe7 when trying to make the
code to build with LLVM 3.6.
Tested MCJIT with LLVM 3.3 to 3.6.
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On the LLVM versions that support it, so we can easily switch between
MCJIT/old-jit for testing.
The new option is GALLIVM_MCJIT.
Unfortunately setting GALLIVM_MCJIT=1 for LLVM 3.3 or 3.4 causes
segfault, both on Linux and Windows. I'm almost certain this used to
work, so there probably is a regression somewhere.
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
| |
Instead of LLVM C++ interfaces.
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
And llvm::raw_string_ostream where not (LLVM 3.3).
Thereby eliminating yet another dependency on unstable LLVM interfaces.
As a bonus this also gets LLVM IR on OutputDebugMessageA on MSVC (which
was disabled, probably due to C++ issues.)
Tested `lp_test_arit -v -v` on LLVM 3.3, 3.4 and 3.8.
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
| |
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This isn't currently that easy to expand, so fix it up
before expanding it later to include dynamic samplers.
[airlied: use some local variables (Roland)]
Reviewed-by: Roland Scheidegger <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
| |
The second ffs returns 0, yielding count == -1.
v2: change 1 to 1u
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
llvm 3.7 sometimes simply miscompiles vector selects.
See https://bugs.freedesktop.org/show_bug.cgi?id=94972
This was fixed in llvm r249669
(https://llvm.org/bugs/show_bug.cgi?id=24532).
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Emil Velikov <[email protected]>
Tested-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The credit for finding and isolating this bug goes to Vinson and Roland.
The buggy LLVM versions were found by doing
opt -instcombine llvm-pr27332.ll > /dev/null
where llvm-pr27332.ll is the IR from
https://llvm.org/bugs/show_bug.cgi?id=27332#c3
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We used to use sse roundps intrinsic directly, but switched to use the llvm
intrinsics for rounding with e4f01da15d8c6ce3e8c77ff3ff3d2ce2574a3f7b.
However, llvm semantics follows standard math lib round function which is
specced to do roundNearestAwayFromZero but we really want roundNearestEven
(moreoever, using round generates atrocious code since the cpu can't do it
directly and it results in scalar calls to libm __roundf).
So, use llvm.nearbyint instead, which does exactly the right thing, and even
has the advantage of being available with llvm 3.3 too. (I've verified it
actually generates a roundps instruction with llvm 3.3.)
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=94909
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
| |
Increase r to four channels as rgba is written to it
Reviewed-by: Edward O'Callaghan <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Eric Engestrom <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
| |
v2: fail in radeon_winsys_bo_create if size > 32 bits
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
being able to allocate more than 4 GB may be useful
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Roland Scheidegger <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|