| Commit message | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
| |
Move x86 intrinsic lowering to a separate pass. Builder now instantiates
generic intrinsics for features not supported by llvm. The separate x86
lowering pass is responsible for lowering to valid x86 for the target
SIMD architecture. Currently it's a port of existing code to get it
up and running quickly. Will eventually support optimized x86 for AVX,
AVX2 and AVX512.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Removed preprocessor defines from structures passed to LLVM jitted code.
The python scripts do not understand preprocessor defines and ignore
them, so for fields that are compiled out by a preprocessor define,
the LLVM script accounts for them anyway because it doesn't know what
the defines are set to. The sanitize defines for open source are fine
in that they're safely used.
Reviewed-by: Bruce Cherniak <[email protected]>
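The layout problem described above can be illustrated with a minimal C sketch. Every name in it is hypothetical and does not come from the SWR headers: one struct models what the compiler sees once a guarded field is compiled out, the other models what a generator that ignores the defines would emit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout as the compiler sees it when a (made-up) define is
 * off: the guarded field has been compiled out. */
struct example_ctx_compiled {
    uint32_t flags;
    uint32_t width;
};

/* Hypothetical layout as a generator that ignores the defines sees it:
 * the guarded field is still counted. */
struct example_ctx_generated {
    uint32_t flags;
    uint32_t stats;   /* would sit behind an #ifdef in the header */
    uint32_t width;
};

/* Byte distance between where jitted code would look for `width` and
 * where the compiled structure actually stores it. */
static size_t example_width_offset_mismatch(void)
{
    size_t generated = offsetof(struct example_ctx_generated, width);
    size_t compiled = offsetof(struct example_ctx_compiled, width);
    assert(generated >= compiled);
    return generated - compiled;
}
```

Any non-zero result means the jitted code and the compiled code disagree about the structure, which is presumably why the defines were dropped from these structures.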
|
|
|
|
|
|
| |
Needed work for jit code debug.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
| |
Hook up archrast counters for shader stats: instructions executed.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
| |
Removing some code that doesn't seem to do anything meaningful.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
| |
Added a SWR_SHADER_STATS structure which is passed to each shader. The
stats pass will instrument the shader to populate this.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
|
| |
mem[offset] += value
This function will be heavily used by all stats intrinsics.
Reviewed-by: Bruce Cherniak <[email protected]>
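As a rough scalar illustration of `mem[offset] += value`, the sketch below does the same read-modify-write in plain C. It is not the builder code from this commit, which would emit LLVM IR instead; the name `example_mem_add_u32`, the 32-bit counter width, and the byte-based offset are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Adds `value` to the 32-bit counter stored `offset` bytes into `mem`.
 * Hypothetical helper: width and offset units are assumptions. */
static void example_mem_add_u32(void *mem, size_t offset, uint32_t value)
{
    uint32_t current;

    assert(mem != NULL);
    /* memcpy avoids alignment and aliasing problems on the raw buffer. */
    memcpy(&current, (uint8_t *)mem + offset, sizeof current);
    current += value;
    memcpy((uint8_t *)mem + offset, &current, sizeof current);
}
```

A stats intrinsic could then be expressed as one call per counter, with the offset selecting the field inside the stats structure.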
|
|
|
|
|
|
|
|
| |
Fix slow permutes in PA tri lists under SIMD16 emulation on AVX
Added missing permute (interlane, immediate) to SIMDLIB
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Finish up the remaining explicit intrinsic uses. At this point all
explicit Intrinsic::getDeclaration() usage has been replaced with macros
auto-generated by gen_llvm_ir_macros.py. Going forward,
make sure to only use the intrinsics here, adding new ones as needed.
Next step is to remove all references to x86 intrinsics to keep the
builder target-independent. Any x86 lowering will be handled by a
separate pass.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
|
| |
Replace sqrt, maskload, fp min/max, cttz, ctlz with llvm equivalent.
Replace AVX maskedstore intrinsic with LLVM intrinsic. Add helper llvm
macros for stacksave, stackrestore, popcnt.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
| |
Start removing avx2 macros for functionality that exists in llvm.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
| |
for getting masked gather intrinsic (also compatible with LLVM 4)
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Add stats for degenerate and backfacing primitive counts
Wire archrast stats for alpha blend and alpha test.
pass value to jitter, upon return have archrast event increment a value
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
| |
Help support debug info in 16 wide shaders.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
| |
Stuff parameters into a blend context struct before passing down through
the PFN_BLEND_JIT_FUNC function pointer. Needed for stat changes.
Reviewed-by: Bruce Cherniak <[email protected]>
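The pattern of bundling arguments into a context struct before calling through a function pointer might look roughly like the sketch below. The struct fields, the `example_` names, and the additive blend are invented for illustration; the real blend context and the PFN_BLEND_JIT_FUNC signature are not shown in this log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical context: the real structure's fields are not shown here. */
struct example_blend_context {
    const float *src;       /* incoming color, 4 floats */
    float *dst;             /* render target color, 4 floats, updated in place */
    uint32_t *blend_count;  /* optional stat slot the callee may bump */
};

/* One pointer parameter, so new fields do not change the signature. */
typedef void (*example_blend_fn)(struct example_blend_context *ctx);

/* Stand-in for a jitted blend function: additive blend plus a stat update. */
static void example_additive_blend(struct example_blend_context *ctx)
{
    assert(ctx != NULL);
    for (size_t i = 0; i < 4; i++)
        ctx->dst[i] += ctx->src[i];
    if (ctx->blend_count)
        (*ctx->blend_count)++;
}

/* Caller side: everything travels through the single context pointer. */
static void example_run_blend(example_blend_fn fn,
                              struct example_blend_context *ctx)
{
    fn(ctx);
}
```

Adding a stats pointer to such a struct leaves every call site's signature unchanged, which is one plausible reason the change was needed for the stat work.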
|
|
|
|
|
|
|
| |
Add assert for correct usage of memory accesses
v2: reworded commit message; renamed enum more appropriately
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
| |
VPHADDD, PMAXUD, PMINUD
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
| |
Signed-off-by: Juan A. Suarez Romero <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Juan A. Suarez Romero <[email protected]>
(cherry picked from commit a1c421c638fd9ff2810b2a59f1ccd0a3a03657b1)
|
|
|
|
|
| |
Signed-off-by: Juan A. Suarez Romero <[email protected]>
(cherry picked from commit 8bd719e3faee8cb0054f51cf1fe9d372a9eea0ea)
|
|
|
|
| |
Signed-off-by: Juan A. Suarez Romero <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Juan A. Suarez Romero <[email protected]>
(cherry picked from commit cf0864dc63caf1285bdede364e9a39b22bac5938)
|
|
|
|
|
| |
Signed-off-by: Juan A. Suarez Romero <[email protected]>
(cherry picked from commit 6d88ea9dd46e630ee861e773dfe4a49f5d1c1fbd)
|
|
|
|
|
|
|
|
| |
This reverts commit 6217eedc9bac86856d5048c43b5f5a3f6976c13e.
I was using this for testing and accidentally put it on master
Signed-off-by: Dylan Baker <[email protected]>
|
|
|
|
|
|
|
|
| |
This reverts commit 21e2e73f71096fd4607051c060cf82c593663d50.
I was using this for testing and accidentally put it on master
Signed-off-by: Dylan Baker <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This is for parity with autotools. It names the library
libMesaOpenCL.so.1.0.0 and points mesa.icd to the .1 symlink.
opencl_version now matches configure.ac's OPENCL_VERSION.
Signed-off-by: Jan Alexander Steffens (heftig) <[email protected]>
Tested-By: Aaron Watry <[email protected]>
Reviewed-by: Dylan Baker <[email protected]>
|
|
|
|
|
|
|
| |
This is for parity with autotools.
Signed-off-by: Jan Alexander Steffens (heftig) <[email protected]>
Acked-by: Dylan Baker <[email protected]>
|
|
|
|
|
| |
Currently this requires libdrm from git, since the version reported by
meson is wrong.
|
|
|
|
| |
For meson wraps.
|
|
|
|
|
|
|
|
|
|
|
| |
'scale[i]' can be non-integer.
Original patch by Philip Rebohle.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106074
Fixes: 0f3de89a56a ("radv: Use the guard band.")
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The SPIR-V spec doesn’t specify a size requirement for these and the
equivalent functions in the GLSL spec have explicit alternatives for
doubles. Refract is a little bit more complicated due to the fact that
the final argument is always supposed to be a scalar 32- or 16-bit
float regardless of the other operands. However in practice it seems
there is a bug in glslang that makes it convert the argument to 64-bit
if you actually try to pass it a 32-bit value while the other
arguments are 64-bit. This adds an optional conversion of the final
argument in order to support any type.
These have been tested against the automatically generated tests of
glsl-4.00/execution/built-in-functions using the ARB_gl_spirv branch
which tests it with quite a large range of combinations.
The issue with glslang has been filed here:
https://github.com/KhronosGroup/glslang/issues/1279
v2: Convert the eta operand of Refract from any size in order to make
it eventually cope with 16-bit floats.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The only change necessary is to change the type of the constant used
to compare against.
This has been tested against the arb_gpu_shader_fp64/execution/
fs-isinf-dvec tests using the ARB_gl_spirv branch.
v2: Use nir_imm_floatN_t for the constant.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
There is an existing macro that is used to choose between either a
float or a double immediate constant based on the bit size of the
first operand to the builtin. This is now changed to use the new
nir_imm_floatN_t helper function to reduce the number of places that
make this decision.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
This lets you easily build float immediates just given the bit size.
If we have this single place here to handle this then it will be
easier to add support for 16-bit floats later.
Reviewed-by: Jason Ekstrand <[email protected]>
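The idea of choosing a float immediate's encoding from the bit size alone can be sketched in standalone C as follows. This is not the NIR helper itself: `example_imm` and `example_imm_float_n` are hypothetical names, only 32 and 64 bits are handled, and IEEE 754 float and double are assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Raw bits of a float constant at a given bit size (hypothetical type). */
struct example_imm {
    unsigned bit_size;
    uint64_t bits;
};

/* Encodes `x` as a 32-bit or 64-bit float immediate, chosen by bit size.
 * A 16-bit case would be added here, in this one place. */
static struct example_imm example_imm_float_n(double x, unsigned bit_size)
{
    struct example_imm imm = { bit_size, 0 };

    assert(bit_size == 32 || bit_size == 64);
    if (bit_size == 32) {
        float f = (float)x;   /* narrow first, then take the bit pattern */
        uint32_t u;
        memcpy(&u, &f, sizeof u);
        imm.bits = u;
    } else {
        memcpy(&imm.bits, &x, sizeof imm.bits);
    }
    return imm;
}
```

Callers pass the operand's bit size and never branch on it themselves, so the decision lives in a single function.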
|
|
|
|
|
|
|
| |
Otherwise we create unused conditional return flags and things
get unnecessarily ugly fast when lowering nested functions.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
The extra params were unused by the drivers that used DrawBuffers.
Tested-by: Dieter Nützel <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
fixes warnings like this:
[184/1137] Compiling C++ object 'src/compiler/glsl/glsl@sta/lower_jumps.cpp.o'.
In file included from ../src/mesa/main/mtypes.h:48,
from ../src/compiler/glsl_types.h:149,
from ../src/compiler/glsl/lower_jumps.cpp:59:
../src/compiler/glsl/lower_jumps.cpp: In member function '{anonymous}::block_record {anonymous}::ir_lower_jumps_visitor::visit_block(exec_list*)':
../src/compiler/glsl/list.h:650:17: warning: unnecessary parentheses in declaration of 'node' [-Wparentheses]
for (__type *(__inst) = (__type *)(__list)->head_sentinel.next; \
^
../src/compiler/glsl/lower_jumps.cpp:510:7: note: in expansion of macro 'foreach_in_list'
foreach_in_list(ir_instruction, node, list) {
^~~~~~~~~~~~~~~
Signed-off-by: Marc Dietrich <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
A couple spots were missed for handling of the new INT8/UINT8 base type.
Also de-duplicate get_base_type(): get_scalar_type() had nearly the
same switch statement, with the exception that anything with base_type
that was not scalar would return error_type. So just handle that one
special case in get_scalar_type().
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Tested-by: Benedikt Schemmer <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
| |
There is a kernel patch that adds the new flag.
Reviewed-by: Samuel Pitoiset <[email protected]>
Tested-by: Benedikt Schemmer <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(This patch doesn't enable the behavior. It will be enabled in a later
commit.)
Draw calls from multiple IBs can be executed in parallel.
v2: do emit partial flushes on SI
v3: invalidate all shader caches at the beginning of IBs
v4: don't call si_emit_cache_flush in si_flush_gfx_cs if not needed,
only do this for flushes invoked internally
v5: empty IBs should wait for idle if the flush requires it
v6: split the commit
If we artificially limit the number of draw calls per IB to 5, we'll get
a lot more IBs, leading to a lot more partial flushes. Let's see how
the removal of partial flushes changes GPU utilization in that scenario:
With partial flushes (time busy):
CP: 99%
SPI: 86%
CB: 73%
Without partial flushes (time busy):
CP: 99%
SPI: 93%
CB: 81%
Tested-by: Benedikt Schemmer <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
ir_binop_gequal needs to be converted to nir_op_sge when native integers
are not supported in the driver.
Otherwise it becomes no different than ir_binop_less after the
conversion.
Signed-off-by: Erico Nunes <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
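A small C model of the distinction, assuming that without native integers a comparison yields a float 1.0 or 0.0, might look like this. The enum and function names are hypothetical stand-ins, not the GLSL IR or NIR definitions.

```c
#include <assert.h>

/* Hypothetical stand-ins for the two source comparison operators. */
enum example_cmp { EXAMPLE_LESS, EXAMPLE_GEQUAL };

/* Float-result comparisons: 1.0f for true, 0.0f for false. */
static float example_slt(float a, float b) { return a < b ? 1.0f : 0.0f; }
static float example_sge(float a, float b) { return a >= b ? 1.0f : 0.0f; }

/* Each source operator needs its own lowering; sending both to the
 * "less than" form would make a >= b behave like a < b. */
static float example_lower_cmp(enum example_cmp op, float a, float b)
{
    switch (op) {
    case EXAMPLE_LESS:
        return example_slt(a, b);
    case EXAMPLE_GEQUAL:
        return example_sge(a, b);
    }
    assert(!"unknown comparison");
    return 0.0f;
}
```

With equal operands the two lowerings disagree (1.0 for greater-or-equal, 0.0 for less-than), which is the case a mistaken mapping would get wrong.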
|
|
|
|
| |
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
| |
Acked-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
| |
To handle the source color image transitions in the same place.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
That looks useless, and I think radv_handle_image_transition()
will do a fast-clear eliminate because it's called after the
resolve.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
| |
DCC implies a fast-clear eliminate, so I think this sounds
reasonable.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|