For adding .v4f32-like suffixes to intrinsics, taking special care of the
scalar case, which was often neglected.
This fixes invalid IR when doing mipmap filtering on SSE2 (the only
case where we'd use intrinsics with scalars.)
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
| |
Exactly the same code.
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We could unconditionally use these intrinsics, but performance with SSE2
would suck, as LLVM falls back to calling libm.
lp_test_arit.
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
| |
For simulating less capable machines.
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
| |
LLVM often can't determine the mask elements are all ones/zeros, and
there doesn't seem to be a good way to hint that.
Thanks to Roland Scheidegger for spotting and analyzing the issue.
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
| |
No longer needed.
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Only provide a fallback for LLVM 3.3.
One less dependency on LLVM C++ interface.
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
By using os_log_message directly, as _debug_vprintf truncates messages
to 4K.
Also cleanup the disassemble interface.
Spotted by Roland.
Trivial.
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is an old patch I had around.
Vector selects seem to work well from LLVM 3.3. Using them should
improve code quality, as it might make constant propagation pass more
effective.
Tested lp_test_*
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
| |
This instruction has the resource (buffer or image) as a destination to
represent the writemask for SSBO writes. However, this is obviously not
a "real" destination for the purpose of emitting LLVM IR.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
The swr driver, which is written in C++, needs access to more
gallium utility functions than are currently exposed.
Reviewed-by: Roland Scheidegger <[email protected]>
Acked-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Because the if statement that checks whether we have a return
statement is valid only on x86, surround it with X86 or X86-64
arch defines.
Signed-off-by: Oded Gabbay <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, disassemble() prints directly to stdout. This has broken the
profiling support for llvmpipe JIT code.
This patch redirects the output to an sstream object, which then
either gets printed to stdout (for assembly debugging) or gets written
to a file in /tmp/ (for profiling support).
Signed-off-by: Oded Gabbay <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
| |
Just like the rest of the msaa "implementation" it's just fake for now...
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
| |
This functionality is not exposed via the LLVM C API.
Tested-by: Michel Dänzer <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
LLVM removed LLVMAddTargetData for the 3.9 release in r260919. For the two
places in mesa where this is called, only enable the lines when compiling
for less than 3.9.
For the radeon driver, I'm not sure how to check if any other LLVM calls need
to be adjusted. I think since the target data used is extracted from the
LLVMModule, it isn't necessary to pass it back to LLVM again.
The code does compile, and at least for radeonsi does run OpenGL games.
[ Michel Dänzer: Move #if closer to LLVMAddTargetData in lp_bld_init.c,
and add HAVE_LLVM < 0x0309 guards around now unused occurrences of TD
and data_layout ]
Signed-off-by: Matthew Dawson <[email protected]>
Reviewed-and-Tested-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This cap indicates the supported representations of programs. It should
be a mask of pipe_shader_ir bits. It will allow enabling
ARB_compute_shader if the underlying driver supports TGSI.
Changes from v2:
- improve description of PIPE_SHADER_CAP_SUPPORTED_IRS
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
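A cap that is "a mask of pipe_shader_ir bits" can be queried with a plain bit test. The sketch below is only an illustration under assumptions: the enum is a hypothetical local stand-in for the gallium IR identifiers, and supports_ir() is not a function from the patch.

```c
/* Hypothetical local stand-ins for the IR identifiers; the real ones
 * live in the gallium headers and may have different values. */
enum shader_ir {
    IR_TGSI = 0,
    IR_NATIVE = 1,
    IR_OTHER = 2
};

/* Assumes the cap value carries one bit per IR, at position (1 << ir). */
static int supports_ir(unsigned supported_irs, enum shader_ir ir)
{
    return (supported_irs & (1u << ir)) != 0;
}
```

With this layout, a state tracker could enable a feature only when the bit for the IR it emits is set in the value the driver reports.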
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add support for these opcodes. The conversion functions were already
there, albeit they need some new packing stuff.
Just like the tgsi version, piglit won't like it for all the same
reasons, so it's disabled (UP2H passes piglit arb_shader_language_packing
tests, albeit since PK2H won't due to those rounding differences I don't
know if that one works or not as the piglit test is rather difficult to
deal with).
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Add support for these opcodes. The conversion functions were already
there, albeit they need some new packing stuff.
Just like the tgsi version, piglit won't like it for all the same
reasons, so it's disabled (UP2H passes piglit arb_shader_language_packing
tests, albeit since PK2H won't due to those rounding differences I don't
know if that one works or not as the piglit test is rather difficult to
deal with).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes a bug when building a pack instruction.
For POWER (altivec), in case the destination is signed and the
src width is 32, we need to use vpkswss. The original code used vpkuwus,
which emits an unsigned result.
This fixes the following piglit tests on ppc64le:
- spec@arb_color_buffer_float@gl_rgba8-drawpixels
- shaders@glsl-fs-fogscale
I've also corrected some coding style issues in the function.
v2: Returned else statements to vmware style
Signed-off-by: Oded Gabbay <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
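The difference between the two pack flavours can be modelled per lane in scalar C. This is a sketch, not the patch: the function names are hypothetical, and mapping the signed variant to vpkswss and the unsigned variant to vpkuwus follows the description above rather than anything checked against the ISA manual here.

```c
#include <stdint.h>

/* Signed saturation: clamp a signed 32-bit value into the int16_t range. */
static int16_t pack_s32_to_s16_sat(int32_t v)
{
    if (v > INT16_MAX)
        return INT16_MAX;
    if (v < INT16_MIN)
        return INT16_MIN;
    return (int16_t)v;
}

/* Unsigned saturation: the same bits read as unsigned, clamped to 0xFFFF. */
static uint16_t pack_u32_to_u16_sat(uint32_t v)
{
    return v > UINT16_MAX ? (uint16_t)UINT16_MAX : (uint16_t)v;
}
```

A small negative input such as -5 survives the signed pack unchanged, while the unsigned pack sees a huge positive number and saturates, which is the kind of wrong result a signed destination cannot tolerate.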
|
|
|
|
|
|
|
| |
This adds code that is basically the same as the code in umod, udiv and idiv.
However, unlike idiv we return -1.
Reviewed-by: Roland Scheidegger <[email protected]>
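A guarded modulo in the spirit of this description might look as follows in scalar C. This is a sketch under assumptions: guarded_imod() is a hypothetical name, the reading that -1 is what a zero divisor yields is an interpretation of the message above, and the extra check for a divisor of -1 is an addition needed only to keep the C example free of overflow.

```c
/* Hypothetical scalar model of a modulo that never traps. */
static int guarded_imod(int a, int b)
{
    if (b == 0)
        return -1;   /* assumed result for a zero divisor, per the message */
    if (b == -1)
        return 0;    /* addition for this sketch: avoids INT_MIN % -1 in C */
    return a % b;
}
```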
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In lp_build_conv() and lp_build_conv_auto(), there is a special case of
conversion when sse2 is present. That code path is suitable without any
changes to altivec, because all the functions that are called in that
code path already support altivec.
This patch increases the FPS on the POWER arch across the board
by between 10% and 25%.
I checked ipers, glxgears, glxspheres64, openarena, xonotic and glmark2.
Signed-off-by: Oded Gabbay <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
| |
Found-by: Coccinelle
Signed-off-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Fix silly issue with MSVC case fall-through support needing
an extra 'break;'
Found-by: Coccinelle
Signed-off-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes the fetching of fp64 inputs to the geometry shader,
which fixes the recently posted piglit tests
arb_gpu_shader_fp64/execution/gs-fs-vs-double-array.shader_test
arb_vertex_attrib_64bit/execution/gs-fs-vs-attrib-double-array.shader_test
Reviewed-by: Roland Scheidegger <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Use NULL tests of the form `if (ptr)' or `if (!ptr)'.
They do not depend on the definition of the symbol NULL.
Further, they avoid the opportunity for accidental
assignment, and are clear and succinct.
Signed-off-by: Edward O'Callaghan <[email protected]>
Signed-off-by: Marek Olšák <[email protected]>
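The two test forms recommended above look like this in practice. A minimal sketch; safe_strlen() and has_value() are hypothetical helpers written for illustration, not code from the tree.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of a string, treating NULL as empty. */
static size_t safe_strlen(const char *s)
{
    if (!s)            /* instead of: if (s == NULL) */
        return 0;
    return strlen(s);
}

/* Hypothetical helper: returns 1 when a usable pointer was passed. */
static int has_value(const int *p)
{
    if (p)             /* instead of: if (p != NULL) */
        return 1;
    return 0;
}
```

Neither form contains a comparison operator, so there is no `==` that could be mistyped as `=`.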
|
|
|
|
|
| |
Signed-off-by: Edward O'Callaghan <[email protected]>
Signed-off-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
texel fetches don't use any samplers. Previously we just set the same
number for both texture and sampler unit (as per "ordinary" gl style
sampling where the numbers are always the same) however this would trigger
some assertions checking that the sampler index isn't over PIPE_MAX_SAMPLERS
limit elsewhere with d3d10, so just set to 0.
(Fixing the assertion instead isn't really an option, the sampler isn't
really used but might still pass an out-of-bound pointer around and even
copy some things from it.)
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch disables the use of VSX instructions, as they cause some
piglit tests to fail.
For more details, see: https://llvm.org/bugs/show_bug.cgi?id=25503#c7
With this patch, ppc64le reaches parity with x86-64 as far as piglit test
suite is concerned.
v2:
- Added check that we have at least LLVM 3.4
- Added the LLVM bug URL as a comment in the code
v3:
- Only disable VSX if Altivec is supported, because if Altivec support
is missing, then VSX support doesn't exist anyway.
- Change original patch description.
Signed-off-by: Oded Gabbay <[email protected]>
Cc: "11.0" <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This actually stored the values as 8bit linear values in the cache,
then did another srgb->linear conversion...
We don't want to do the former (decoding 8bit srgb values to 8bit linear
completely defeats the purpose of srgb in the first place), so just decode
to 8bit srgb.
Fixes piglit.spec.ext_texture_srgb.texwrap formats-s3tc tests.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Compressed textures are very slow because decoding is rather complex
(and also because there's no jit code to decode them, for non-technical
reasons).
Thus, add some texture cache which holds a couple of decoded blocks.
Right now this handles only s3tc format albeit it could be extended to work
with other formats rather trivially as long as the result of decode fits into
32bit per texel (ideally, rgtc actually would decode to more than 8 bits
per channel, but even then making it work for it shouldn't be too difficult).
This can improve performance noticeably but don't expect wonders (uncompressed
is unsurprisingly still faster). It's also possible it might be slower in
some cases (using nearest filtering for example or if there's otherwise not
many cache hits, the cache is only direct mapped which isn't great).
Also, the actual decode of a block relies on util code, thus even though
full blocks are always decoded it is done texel by texel - this could obviously
benefit greatly from simd-optimized code decoding full blocks at once...
Note the cache is per (raster) thread, and currently only used for fragment
shaders.
Reviewed-by: Jose Fonseca <[email protected]>
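The direct-mapped scheme mentioned above can be sketched in a few lines of C. Everything here is an assumption made for illustration: the slot count, the struct layout, the placeholder decoder and the names do not come from the patch, which generates its lookup in jit code.

```c
#include <stdint.h>

#define CACHE_SLOTS 64          /* hypothetical size; must be a power of two */
#define TEXELS_PER_BLOCK 16     /* one 4x4 block */

struct cache_entry {
    uint64_t tag;                        /* identifies the cached block */
    int valid;
    uint32_t texels[TEXELS_PER_BLOCK];   /* decoded, 32 bits per texel */
};

struct block_cache {
    struct cache_entry slots[CACHE_SLOTS];
    unsigned decodes;                    /* how often a decode was needed */
};

/* Placeholder decoder: fills the block with values derived from the tag. */
static void decode_block(uint64_t tag, uint32_t *out)
{
    for (int i = 0; i < TEXELS_PER_BLOCK; i++)
        out[i] = (uint32_t)tag + (uint32_t)i;
}

/* Direct mapped: each block maps to exactly one slot, so a colliding
 * block simply evicts whatever was stored there before. */
static const uint32_t *cache_lookup(struct block_cache *c, uint64_t tag)
{
    struct cache_entry *e = &c->slots[tag & (CACHE_SLOTS - 1)];
    if (!e->valid || e->tag != tag) {
        decode_block(tag, e->texels);
        e->tag = tag;
        e->valid = 1;
        c->decodes++;
    }
    return e->texels;
}
```

Two blocks whose tags differ by a multiple of the slot count keep evicting each other, which is the weakness of a direct-mapped cache that the message alludes to.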
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
f16c intrinsic can only be emitted when AVX is used. So when we disable AVX
due to forcing 128bit vectors we must not use this intrinsic (depending on
llvm version, this worked previously because llvm used AVX even when we didn't
tell it to, however I've seen this fail with llvm 3.3 since
718249843b915decf8fccec92e466ac1a6219934 which seems to have the side effect
of disabling avx in llvm albeit it only touches sse flags really, but
with ea421e919ae6e72e1319fb205c42a6fb53ca2f82 it's now really disabled).
Albeit being able to use AVX with 128bit vectors also would have its uses, the
code as is really was meant to emulate jit code creation for less capable cpus.
v2: add some (ifdefed out) missing de-featuring options for simulating
less capable cpus.
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
| |
Can't see why anyone would ever want to use this, but it was clearly broken.
This fixes the piglit texwrap offset test using this combination.
Reviewed-by: Dave Airlie <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When using nearest filtering and clamp / clamp to edge wrapping results could
be wrong for negative offsets. Fix this by adding the offset before doing
the conversion to int coords (could also use floor instead of trunc int
conversion but probably more complex on "typical" cpu).
This fixes the piglit texwrap offset failures with this filter/wrap combo
(which only leaves the linear/mirror repeat combination broken).
Reviewed-by: Dave Airlie <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
| |
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92214
CC: "10.6 11.0" <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This should prevent disparity between features Mesa and LLVM
believe are supported by the CPU.
http://lists.freedesktop.org/archives/mesa-dev/2015-October/thread.html#96990
Tested on a i7-3720QM w/ LLVM 3.3 and 3.6.
v2: Increase SmallVector initial size as suggested by Gustaw Smolarczyk.
Reviewed-by: Roland Scheidegger <[email protected]>
CC: "10.6 11.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This avoids a serious r600g bug leading to a GPU hang.
The chances this bug will get fixed are pretty low now.
I deeply regret listening to others and not pushing this patch, leaving
other users with a GPU-crashing driver. Yes, it should be fixed
in the compiler and it's ugly, but users couldn't care less about that.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86720
Cc: 11.0 10.6 <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The previous version has precision issues. This can be a problem
with tessellation. Sadly, I can't find the article where I read it
anymore. I'm not sure if the unsafe-fp-math flag would be enough to revert
this.
v2: added the comment
|
|
|
|
|
|
|
|
| |
and clear the emit_data structure.
The new radeonsi min/max opcode implementation requires this.
(it looks good according to Roland S.)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Drivers and state trackers that use LLVM for generating code must
register the targets they use with LLVM's global TargetRegistry.
The TargetRegistry is not thread-safe, so all targets must be added
to the registry before it can be queried for target information.
When drivers and state trackers initialize their own targets, they need
a way to force gallivm to initialize its targets at the same time.
Otherwise, there can be a race condition in some multi-threaded
applications (e.g. glx-multithreaded-shader-compile in piglit),
when one thread creates a context for a driver that uses LLVM (e.g.
radeonsi) and another thread creates a gallivm context (glxContextCreate
does this).
The race happens when the driver thread initializes its LLVM targets and
then starts using the registry before the gallivm thread has a chance to
register its targets.
This patch allows users to force gallivm to register its targets by
calling the gallivm_init_llvm_targets() function.
v2:
- Use call_once and remove mutexes and static initializations.
- Replace gallivm_init_llvm_{begin,end}() with
gallivm_init_llvm_targets().
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Mathias Fröhlich <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
CC: "10.6 11.0" <[email protected]>
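The once-only registration pattern from v2 can be sketched in plain C. This is an illustration under assumptions: it uses POSIX pthread_once() as a stand-in for the call_once the message refers to, and init_targets_once() and register_targets() are hypothetical names, not the gallivm functions.

```c
#include <pthread.h>

static pthread_once_t targets_once = PTHREAD_ONCE_INIT;
static int targets_registered = 0;   /* counts how often registration ran */

/* Placeholder for the real target registration work. */
static void register_targets(void)
{
    targets_registered++;
}

/* Hypothetical analogue of gallivm_init_llvm_targets(): callable from any
 * thread, any number of times; the registration body runs exactly once. */
static void init_targets_once(void)
{
    pthread_once(&targets_once, register_targets);
}
```

A driver that registers its own targets would call the init function first, so that every target is in the registry before any thread starts querying it.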
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
round(val*dscale) produces a double result, as val and dscale are double.
However, LLVMConstInt receives unsigned long long, so there is an
implicit conversion from double to unsigned long long.
This is undefined behavior when the value is negative. Therefore, we need to first explicitly
convert the round result to long long, and then let the compiler handle
conversion from that to unsigned long long.
This bug manifests itself on POWER, where all IMM values of -1 are being
converted to 0 implicitly, causing a wrong LLVM IR output.
Signed-off-by: Oded Gabbay <[email protected]>
CC: "10.6 11.0" <[email protected]>
Reviewed-by: Tom Stellard <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
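The two-step conversion described above can be sketched in standalone C. This is not the code from the patch: imm_to_bits() and round_half_away() are hypothetical names, and the hand-written rounding helper only stands in for round() so that the example needs no math library.

```c
/* Hypothetical stand-in for round(): rounds half away from zero.
 * Only meant for values that comfortably fit in a long long. */
static long long round_half_away(double x)
{
    return (long long)(x >= 0.0 ? x + 0.5 : x - 0.5);
}

/* Go through a signed type first: double -> long long is defined for
 * in-range negative values, and long long -> unsigned long long is a
 * well-defined modular conversion. */
static unsigned long long imm_to_bits(double val, double dscale)
{
    long long rounded = round_half_away(val * dscale);
    return (unsigned long long)rounded;
}
```

With this ordering an immediate of -1 ends up as the all-ones bit pattern on every platform, instead of depending on how a particular compiler and CPU treat the out-of-range double-to-unsigned conversion.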
|
|
|
|
|
|
|
|
|
|
| |
lp_bld_tgsi_soa.c: In function 'lp_emit_immediate_soa':
lp_bld_tgsi_soa.c:3065:18: warning: unused variable 'size' [-Wunused-variable]
const uint size = imm->Immediate.NrTokens - 1;
^
Signed-off-by: Vinson Lee <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
| |
This will help remove some duplicated code from radeon.
Reviewed-by: Dave Airlie <[email protected]>
|
| |
|
|
|
|
|
|
|
| |
Its only use is to implement a custom version of LLVMDumpValue
on some Windows and embedded platforms.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
| |
All LLVM API calls that require an ostream object have been removed from
the disassemble() function, so we don't need to use this class to wrap
_debug_printf(); we can just call this function directly.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Generated by running:
git grep -l INLINE src/gallium/ | xargs sed -i 's/\bINLINE\b/inline/g'
git grep -l INLINE src/mesa/state_tracker/ | xargs sed -i 's/\bINLINE\b/inline/g'
git checkout src/gallium/state_trackers/clover/Doxyfile
and manual edits to
src/gallium/include/pipe/p_compiler.h
src/gallium/README.portability
to remove mentions of the inline define.
Signed-off-by: Ilia Mirkin <[email protected]>
Acked-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes crashes in llvmpipe with LLVM 3.8 and also some piglit tests
on radeonsi that use the draw module.
This is just a temporary solution. The correct solution will require
creating a TargetMachine during gallivm initialization and pulling the
DataLayout from there. This will be a somewhat invasive change, and it
will need to be validated on multiple LLVM versions.
https://llvm.org/bugs/show_bug.cgi?id=24172
Reviewed-by: Roland Scheidegger <[email protected]>
|