| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
Add support for providing an emulation callback function for arch/width
combinations that don't map cleanly to an x86 intrinsic.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Move x86 intrinsic lowering to a separate pass. Builder now instantiates
generic intrinsics for features not supported by llvm. The separate x86
lowering pass is responsible for lowering to valid x86 for the target
SIMD architecture. Currently it's a port of existing code to get it
up and running quickly. Will eventually support optimized x86 for AVX,
AVX2 and AVX512.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Removed preprocessor defines from structures passed to LLVM jitted code.
The python scripts do not understand the preprocessor defines and ignores
them. So for fields that are compiled out due to a preprocessor define
the LLVM script accounts for them anyway because it doesn't know what
the defines are set to. The sanitize defines for open source are fine
in that they're safely used.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
| |
Needed work for jit code debug.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
| |
Hook up archrast counters for shader stats: instructions executed.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
| |
Removing some code that doesn't seem to do anything meaningful.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
| |
Added a SWR_SHADER_STATS structure which is passed to each shader. The
stats pass will instrument the shader to populate this.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
|
| |
mem[offset] += value
This function will be heavily used by all stats intrinsics.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
|
| |
Fix slow permutes in PA tri lists under SIMD16 emulation on AVX
Added missing permute (interlane, immediate) to SIMDLIB
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Finish up the remaining explicit intrinsic uses. At this point all
explicit Intrinsic::getDeclaration() usage has been replaced with auto
generated macros generated with gen_llvm_ir_macros.py. Going forward,
make sure to only use the intrinsics here, adding new ones as needed.
Next step is to remove all references to x86 intrinsics to keep the
builder target-independent. Any x86 lowering will be handled by a
separate pass.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
|
| |
Replace sqrt, maskload, fp min/max, cttz, ctlz with llvm equivalent.
Replace AVX maskedstore intrinsic with LLVM intrinsic. Add helper llvm
macros for stacksave, stackrestore, popcnt.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
| |
Start removing avx2 macros for functionality that exists in llvm.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
| |
for getting masked gather intrinsic (also compatible with LLVM 4)
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Add stats for degenerate and backfacing primitive counts
Wire archrast stats for alpha blend and alpha test.
pass value to jitter, upon return have archrast event increment a value
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
| |
Help support debug info in 16 wide shaders.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
| |
Stuff parameters into a blend context struct before passing down through
the PFN_BLEND_JIT_FUNC function pointer. Needed for stat changes.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
| |
Add assert for correct usage of memory accesses
v2: reworded commit message; renamed enum more appropriately
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
| |
VPHADDD, PMAXUD, PMINUD
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
| |
This is for parity with autotools.
Signed-off-by: Jan Alexander Steffens (heftig) <[email protected]>
Acked-by: Dylan Baker <[email protected]>
|
|
|
|
|
| |
Tested-by: Benedikt Schemmer <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(This patch doesn't enable the behavior. It will be enabled in a later
commit.)
Draw calls from multiple IBs can be executed in parallel.
v2: do emit partial flushes on SI
v3: invalidate all shader caches at the beginning of IBs
v4: don't call si_emit_cache_flush in si_flush_gfx_cs if not needed,
only do this for flushes invoked internally
v5: empty IBs should wait for idle if the flush requires it
v6: split the commit
If we artificially limit the number of draw calls per IB to 5, we'll get
a lot more IBs, leading to a lot more partial flushes. Let's see how
the removal of partial flushes changes GPU utilization in that scenario:
With partial flushes (time busy):
CP: 99%
SPI: 86%
CB: 73:
Without partial flushes (time busy):
CP: 99%
SPI: 93%
CB: 81%
Tested-by: Benedikt Schemmer <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Fixes: 918b798668c "radeonsi: make sure CP DMA is idle at the end of IBs"
|
|
|
|
| |
which also simplifies the build scripts.
|
| |
|
|
|
|
|
|
|
|
|
| |
so that the draw is started as soon as possible.
v2: only prefetch the API VS and VBO descriptors
Reviewed-by: Samuel Pitoiset <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
|
|
|
|
| |
This code was written with the constant engine in mind.
We can simplify it now.
Reviewed-by: Samuel Pitoiset <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
|
|
|
| |
it doesn't seem to be needed.
Acked-by: Samuel Pitoiset <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
|
| |
Acked-by: Samuel Pitoiset <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
|
|
|
| |
just pass the flag that indicates it.
Reviewed-by: Samuel Pitoiset <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Samuel Pitoiset <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
|
|
|
| |
and other cleanups
Reviewed-by: Samuel Pitoiset <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Samuel Pitoiset <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
|
|
|
| |
HAVE_LLVM > 0 is a tautology.
Reviewed-by: Samuel Pitoiset <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
| |
This is the ABI I'm hoping to stabilize for merging the driver. seqnos
are eliminated, which allows for the GPU scheduler to task-switch between
DRM fds even after submission to the kernel. In/out sync objects are
introduced, to allow the Android fencing extension (not yet implemented,
but should be trivial), and to also allow the driver to tell the kernel to
not start a bin until a previous render is complete.
|
|
|
|
|
| |
With the DRM scheduler changes, I'm about to remove all seqnos from the
UABI.
|
|
|
|
|
| |
Since I'll be using the DRM scheduler, we won't run into the problem of a
runaway client starving other clients of GPU time.
|
|
|
|
|
|
|
| |
This should avoid mistakes with not flushing as we change the series of
loads. Already, it fixes a hopefully unreachable case where we were
emitting just the TILE_COORDINATES and not the dummy store that needs to
go with it.
|
|
|
|
|
| |
This is a more obvious name for what the variable means, and matches what
it's called for stores.
|
|
|
|
|
| |
Since I just fixed a bug due to forgetting to do these right, do it once
in the helper func.
|
|
|
|
|
| |
Fixes a simulator assertion failure in
KHR-GLES3.packed_depth_stencil.blit.depth32f_stencil8
|
|
|
|
|
|
|
| |
This was dying in the simulator on
GTF-GLES3.gtf.GL3Tests.packed_depth_stencil.packed_depth_stencil_blit.
We'll need to do basically the same thing as Z32F/S8 does in the MSAA
Z24S8 case.
|
|
|
|
|
|
|
| |
The v3dX(get_internal_type_bpp_for_output_format)() call only handles
color output formats (which overlap in enum numbers with depth output
formats), so for depth we just need to take the normal cpp times the
number of samples.
|
|
|
|
|
| |
Signed-off-by: Leo Liu <[email protected]>
Acked-by: Christian König <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Leo Liu <[email protected]>
Acked-by: Christian König <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Leo Liu <[email protected]>
Acked-by: Christian König <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Leo Liu <[email protected]>
Acked-by: Christian König <[email protected]>
|