| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
MCJIT uses an ObjectCache object to implement the cache,
this creates and instances of it and adds it to the MCJIT
instances, it stores the cached object for later use by
the outer layers.
Reviewed-by: Roland Scheidegger <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5049>
|
|
|
|
|
|
|
|
| |
If the object is loaded from the cache, a bunch of gallivm/llvm
interactions can be skipped.
Reviewed-by: Roland Scheidegger <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5049>
|
|
|
|
|
|
|
|
| |
This plumbs the cache object into the gallivm API, nothing uses
it yet.
Reviewed-by: Roland Scheidegger <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5049>
|
|
|
|
|
|
|
|
|
| |
Cached shaders require relinking, so hardcoding the pointer
can't work. This switches out the printf code to use new
proper API.
Reviewed-by: Roland Scheidegger <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5049>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
the virgl CI code was using the noopt path and crashing with a
wierd can't select llvm.coro.subfn.addr error, turns out we have
to call the cleanup pass no matter what.
This enable a lot more virgl gles31 passes, but we have
to disable tessellation shaders as now they executed, they
crash due to missing OES_gpu_shader5, I should try and reenable
them when llvmpipe is further along
Fixes: d32690b43c91d ("gallivm: add coroutine pass manager support")
Reviewed-by: Roland Scheidegger <[email protected]>
Acked-by: Elie Tournier <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5320>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The change enables using full 256-bit AVX and AVX2 instructions
on newer platforms.
Reviewed-by: Alok Hota <[email protected]>
Reviewed-by: Adam Jackson <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4225>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4225>
|
|
|
|
|
|
|
|
|
|
|
| |
llvm 8 has some missing pass dependencies, fix the s390 case
as well.
v2: add ARM also (Michel)
Reviewed-by: Michel Dänzer <[email protected]>
Reviewed-by: Adam Jackson <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3805>
|
|
|
|
|
|
|
|
|
|
| |
To improve the robustness of the code, we want to better
detect issues in testing (using asserts) and use more
secure techniques.
Reviewed-by: Krzysztof Raszkowski <[email protected]>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3710>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3710>
|
|
|
|
|
|
|
|
|
|
|
| |
As requested by Tim.
This was generated with:
grep 'PIPE_ARCH_.*_ENDIAN' -rIl | xargs sed -ie 's@PIPE_ARCH_\(.*\)_ENDIAN@UTIL_ARCH_\1_ENDIAN@'g
v2: - add this patch
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This will allow it to be used as a drop in replacement for
_mesa_little_endian in a number of cases.
v2: - Always define PIPE_ARCH_LITTLE_ENDIAN and PIPE_ARCH_BIG_ENDIAN,
define the one that reflects the host system to 1 and the other to 0
- replace all uses of #ifdef, #ifndef, and #if defined() with #if
and #if ! with PIPE_ARCH_*_ENDIAN
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The coroutine split pass is missing a dependency before LLVM 9.0,
and fails to initialise properly if the CallGraphWrapperPass hasn't
be initialised earlier (x86 does it due to some of it's passes
requiring it).
This is a workaround for llvm 8 (coroutines are only supported in 8
and higher). It adds another pass that has a dependency on the pass
the coroutines split requires. This pass shouldn't have any raal
effects.
Fixes: d32690b43c9 (gallivm: add coroutine pass manager support)
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
| |
To go any further than this would be to break the current version of
Android.
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
| |
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
[ Michel Dänzer: Dropped jessie line from debian-install.sh again ]
|
|
|
|
|
| |
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
| |
Suggested-by: Michel Dänzer <[email protected]>
Signed-off-by: Eric Engestrom <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Acked-by: Michel Dänzer <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Acked-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
|
| |
coroutines require a proper pass manager, so add the passes
to the correct places
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
| |
This forces using general sampling and should improve precision and
performance in some cases.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
builds
For testing it is of interest that all tests of dEQP pass, e.g. to test
virglrenderer on a host only providing software rendering like in a CI.
Hence make it possible to disable certain optimizations that make tests fail.
While we are there also add some documentation to the flags to make it clear
that this is opt-out.
Setting the environment variable "GALLIVM_PERF=no_filter_hacks" can be used to make
the following tests pass in release mode:
dEQP-GLES2.functional.texture.mipmap.2d.affine.*_linear_*
dEQP-GLES2.functional.texture.mipmap.cube.generate.*
dEQP-GLES2.functional.texture.vertex.2d.filtering.*_mipmap_linear_*
dEQP-GLES2.functional.texture.vertex.2d.wrap.*
Related:
https://bugs.freedesktop.org/show_bug.cgi?id=94957
v2: rename optimization disabling flag to 'safemath' and also move the
nopt flag to the perf flags.
v3: rename flag "safemath" to "no_filter_hacks" since safemath is usually
associated with floating point operations (Roland)
Signed-off-by: Gert Wollny <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If we dump the bitcode for off-line debug purposes, we really want the
pre-optimized bitcode, otherwise it's useless in identifying problems
with IR optimization (if you have a shader which takes an hour to do
IR optimization, it's also nice you don't have to wait that hour...).
Also, print out the function passes for opt which correspond to what
was used for jit compilation (and also the opt level for codegen).
Using opt/llc this way should then pretty much mimic what was done
for jit. (When specifying something like -time-passes
-debug-pass=[Structure|Arguments] (for either opt or llc) that also
gives very useful information in which passes all the time was spent,
and which passes are really run along with the order - llvm will add
passes due to dependencies on its own, and of course -O2 for llc
comes with a ~100 pass list.)
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
| |
Conversion to int can otherwise overflow if compile times are over
~71min. (Yes this can happen...)
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
LICM is simply too expensive, even though it presumably can help quite
a bit in some cases.
It was definitely cheaper in llvm 3.3, though as far as I can tell with
llvm 3.3 it failed to do anything in most cases. early-cse also actually
seems to cause licm to be able to move things when it previously couldn't,
which causes noticeable compile time increases.
There's more loop passes in llvm, but I'm not sure which ones are helpful,
and I couldn't find anything which would roughly do what the old licm in
llvm 3.3 did, so ditch it.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This pass is quite cheap, and can simplify the IR quite a bit for our
generated IR.
In particular on a variety of shaders I've found the time saved by
other passes due to the simplified IR more than makes up for the cost
of this pass, and on top of that the end result is actually better.
The only downside I've found is this enables the LICM pass to move some
things out of the main shader loop (in the case I've seen, instanced
vertex fetch (which is constant within the jit shader) plus the derived
instructions in the shader) which it couldn't do before for some reason.
This would actually be desirable but can increase compile time
considerably (licm seems to have considerable cost when it actually can
move things out of loops, due to alias analysis). But blaming early cse
for this seems inappropriate. (Note that the first two sroa / earlycse
passes are similar to what a standard llvm opt -O1/-O2 pipeline would
do, albeit this has some more passes even before but I don't think
they'd do much for us.)
It also in particular helps some crazy shader used for driver
verification (don't ask...) a lot (about factor of 6 faster in compile
time) (due to simplfiying the ir before LICM is run).
While here, also move licm behind simplifycfg. For some shaders there
seems to be very significant compile time gains (we've seen a factor
of 10000 albeit that was a really crazy shader you'd certainly never
see in a real app), beause LICM is quite expensive and there's cases
where running simplifycfg (along with sroa and early-cse) before licm
reduces IR complexity significantly. (I'm not entirely sure if it would
make sense to also run it afterwards.)
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Include llvm-c/Transforms/Utils.h with the newest LLVM 7
Signed-of-by: Mike Lothian <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
Signed-off-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
In gallivm_compile_module, fix a typo in the
debug_printf("Invoke as \"llc ..." message.
Cc: "17.2" <[email protected]>
Signed-off-by: Ben Crocker <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The datalayout for modules was purposely not being set in order to work around
the fact that the ExecutionEngine requires that the module's datalayout
matches the datalayout of the TargetMachine that the ExecutionEngine is
using.
When the pass manager runs on a module with no datalayout, it uses
the default datalayout which is little-endian. This causes problems
on big-endian targets, because some optimizations that are legal on
little-endian or illegal on big-endian.
To resolve this, we set the datalayout prior to running the pass
manager, and then clear it before creating the ExectionEngine.
This patch fixes a lot of piglit tests on big-endian ppc64.
Cc: [email protected]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Improve debug output from gallivm_compile_module and
lp_build_create_jit_compiler_for_module, printing the
-mcpu and -mattr options passed to LLC.
V2: enclose MAttrs debug_printf block and llc -mcpu debug_printf
in "if (gallivm_debug & <flags>)..."
Signed-off-by: Ben Crocker <[email protected]>
Cc: 12.0 13.0 17.0 <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]> (v2)
[Emil Velikov: rebase]
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
| |
Split USE_MCJIT macro dual nature into a separate constant time define
and a run-time variable.
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
| |
Compilation to actual machine code can easily take as much time as the
optimization passes on the IR if not more, so print this out too.
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
| |
v2: pblendb -> pblendvb
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
As suggested by Roland Scheidegger.
Use the same logic as f16c, since fma requires VEX encoding.
But disable FMA on LLVM 3.3 without MCJIT.
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Use GALLIVM_DEBUG=dumpbc for dumping of modules as bitcode.
Instead of a fixed llvmpipe.bc name, use ir_<modulename>.bc so multiple
modules can be dumped (albeit it might still overwrite previous modules,
particularly the modules from draw tend to always have the same name).
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
| |
Just keep a copy of the module_name in gallivm.
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On the LLVM versions that support it, so we can easily switch between
MCJIT/old-jit for testing.
The new option is GALLIVM_MCJIT.
Unfortunately setting GALLIVM_MCJIT=1 for LLVM 3.3 or 3.4 causes
segfault, both on Linux and Windows. I'm almost certain this used to
work, so there probably is a regression somewhere.
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
| |
For simulating less capable machines.
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
LLVM removed LLVMAddTargetData for the 3.9 release in r260919. For the two
places in mesa where this is called, only enable the lines when compiling
for less then 3.9.
For the radeon driver, I'm not sure how to check if any other LLVM calls need
to be adjusted. I think since the target data used is extracted from the
LLVMModule, it isn't necessary to pass it back to LLVM again.
The code does compile, and at least for radeonsi does run OpenGL games.
[ Michel Dänzer: Move #if closer to LLVMAddTargetData in lp_bld_init.c,
and add HAVE_LLVM < 0x0309 guards around now unused occurrences of TD
and data_layout ]
Signed-off-by: Matthew Dawson <[email protected]>
Reviewed-and-Tested-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
f16c intrinsic can only be emitted when AVX is used. So when we disable AVX
due to forcing 128bit vectors we must not use this intrinsic (depending on
llvm version, this worked previously because llvm used AVX even when we didn't
tell it to, however I've seen this fail with llvm 3.3 since
718249843b915decf8fccec92e466ac1a6219934 which seems to have the side effect
of disabling avx in llvm albeit it only touches sse flags really, but
with ea421e919ae6e72e1319fb205c42a6fb53ca2f82 it's now really disabled).
Albeit being able to use AVX with 128bit vectors also would have its uses, the
code as is really was meant to emulate jit code creation for less capable cpus.
v2: add some (ifdefed out) missing de-featuring options for simulating
less capable cpus.
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes crashes in llvmpipe with LLVM 3.8 and also some piglit tests
on radeonsi that use the draw module.
This is just a temporary solution. The correct solution will require
creating a TargetMachine during gallivm initialization and pulling the
DataLayout from there. This will be a somewhat invasive change, and it
will need to be validatated on multiple LLVM versions.
https://llvm.org/bugs/show_bug.cgi?id=24172
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
| |
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are issues with inlining everything, most notably llvm will use much
more memory (and be slower) when compiling. Ideally we'd probably use
functions for shader functions too but texture sampling usually is responsible
for quite some IR (it can easily reach 80% of total IR instructions) so this
seems like a good start.
This still generates a different function for all different combinations just
like before, however it is possible llvm is missing some optimization
opportunities - it is believed though such opportunities should be somewhat
rare, but at least for now it can still be switched off (at compile time only).
It should probably make compiled code also smaller because the same function
should be used for different variants in the same module (so for the
opaque/partial or linear/elts variants).
No piglit change (though it does indeed speed up unrealistic tests like
fp-indirections2 by a factor of 30 or so).
Has a small negative performance impact in openarena - I suspect this could
be fixed by running some IPO passes (despite the private linkage, llvm right
now does NO optimization at all wrt anything going past the call, even if
there's just one caller - so things like values stored before the call and then
always written by the function etc. will not be optimized away, nor will dead
arguments (which we mostly shouldn't have) be eliminated, always constant
arguments promoted etc.).
v2: use proper return values instead of pointer function arguments.
llvm supports aggregate return values, which do wonders here eliminating
unnecessary stack variables - everything in fact will be returned in registers
even without any IPO optimizations. It makes the code simpler too.
With this I could not measure a peformance impact in openarena any longer
(though since there's still no constant value propagation etc. into the tex
functions this does not mean it couldn't have a negative impact elsewhere).
v3: fix some minor issues suggested by Jose, and do disassembly (and the
profiling) without hacks.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
| |
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
| |
The code was exactly the same, except util/ has c++ guards and a struct
simple_node declaration.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
JITMemoryManager was removed in LLVM 3.6, and replaced by its base class
RTDyldMemoryManager.
This change fixes our JIT memory managers specializations to derive from
RTDyldMemoryManager in LLVM 3.6 instead of JITMemoryManager.
This enables llvmpipe to run with LLVM 3.6.
However, lp_free_generated_code is basically a no-op because there are
not enough hook points in RTDyldMemoryManager to track and free the code
of a module. In other words, with MCJIT, code once created, stays
forever allocated until process destruction. This is not speicfic to
LLVM 3.6 -- it will happen whenever MCJIT is used regardless of version.
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We'll need to update gallivm for the interface changes in LLVM 3.6, and
the fewer the number of older LLVM versions we support the less hairy that
will be.
As consequence HAVE_AVX define can disappear. (Note HAVE_AVX meant
whether LLVM version supports AVX or not. Runtime support for AVX is
always checked and enforced independently.)
Verified llvmpipe builds and runs with with LLVM 3.3, 3.4, and 3.5.
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Do not rely on LLVMMCJITMemoryManagerRef being available.
The c binding to the memory manager objects only appeared
on llvm-3.4.
The change is based on an initial patch of Brian Paul.
Reviewed-by: Brian Paul <[email protected]>
Tested-by: Brian Paul <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
Signed-off-by: Mathias Froehlich <[email protected]>
|