Commit message log
Reviewed-by: Marek Olšák <[email protected]>
This takes any color or backcolor that has unspecified
shading and converts it to flat shading.
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
anywhere else
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Dylan Baker <[email protected]>
Lowering samplers is needed to produce NIR that can actually be
consumed by some gallium drivers, so it doesn't make sense to
keep it only in the GLSL code.
This commit introduces nir_lower_samplers to compiler/nir,
while maintaining the GL-specific function too.
Signed-off-by: Timur Kristóf <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
This pass expects the shader to be in LCSSA form.
The algorithm is based on 'The Simple Divergence Analysis' from
Diogo Sampaio, Rafael de Souza, Sylvain Collange and Fernando Magno
Quintão Pereira, "Divergence Analysis", ACM Transactions on
Programming Languages and Systems (TOPLAS).
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
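A minimal standalone sketch (not from the commit itself) of the forward propagation idea on a hypothetical toy IR: a value becomes divergent if it is itself a source of divergence (e.g. it reads the invocation ID) or if any of its sources is divergent, iterated to a fixed point. The real pass works on NIR and additionally handles control-flow-induced divergence.

   #include <stdbool.h>
   #include <stddef.h>

   struct toy_instr {
      bool source_of_divergence;  /* e.g. reads the subgroup invocation ID */
      int src[2];                 /* indices of source instructions, -1 if unused */
      bool divergent;
   };

   /* Propagate divergence until nothing changes (simple fixed-point loop). */
   static void
   toy_divergence_analysis(struct toy_instr *instrs, size_t count)
   {
      bool progress = true;
      while (progress) {
         progress = false;
         for (size_t i = 0; i < count; i++) {
            bool d = instrs[i].source_of_divergence;
            for (int s = 0; s < 2; s++) {
               if (instrs[i].src[s] >= 0 && instrs[instrs[i].src[s]].divergent)
                  d = true;
            }
            if (d && !instrs[i].divergent) {
               instrs[i].divergent = true;
               progress = true;
            }
         }
      }
   }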
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
The OpenGL and OpenGL ES specs require that implementations clamp the
value of gl_PointSize to an implementation-dependent range. This pass
is useful for any GPU hardware that doesn't do this automatically
for either one or both sides of the range, such as V3D.
v2:
- Turn into a generic NIR pass (Eric).
- Make the pass work before lowering I/O so we can use the deref variable
to inspect if we are writing to gl_PointSize (Eric).
- Make the pass take the range to clamp as parameter and allow it
to clamp to both sides of the range or just one side.
- Make the pass report progress.
v3:
- Fix copyright header (Eric)
- use fmin/fmax instead of bcsel to clamp (Eric)
Reviewed-by: Eric Anholt <[email protected]>
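As a rough illustration of the fmin/fmax clamp (a hypothetical helper, not the commit's actual code), something along these lines could be emitted with nir_builder; the function and parameter names here are made up:

   /* Clamp a gl_PointSize value to [min_size, max_size] with fmin/fmax,
    * given a builder positioned at the gl_PointSize store. */
   static nir_ssa_def *
   clamp_psiz_to_range(nir_builder *b, nir_ssa_def *psiz,
                       float min_size, float max_size)
   {
      psiz = nir_fmax(b, psiz, nir_imm_float(b, min_size));
      psiz = nir_fmin(b, psiz, nir_imm_float(b, max_size));
      return psiz;
   }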
v2: add to series
v3: update Makefile.sources
v4: don't remove a comment and break statement
v4: use nir_can_move_instr
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
This is mostly the same as nir_move_load_const() but can also move
undef instructions, comparisons and some intrinsics (being careful with
loops).
v2: actually delete nir_move_load_const.c
v3: fix nir_opt_sink() usage in freedreno
v3: update Makefile.sources
v4: replace get_move_def with nir_can_move_instr and nir_instr_ssa_def
v4: handle if uses
v4: fix handling of nested loops
v5: re-write adjust_block_for_loops
v5: re-write setting of use_block for if uses
Signed-off-by: Rhys Perry <[email protected]>
Co-authored-by: Daniel Schürmann <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
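Typical usage would be a single call from a driver's optimization loop. The option flags shown below are assumptions from memory rather than a verified interface, so treat this strictly as a sketch:

   /* Let the pass sink constants/undefs and comparisons towards their uses. */
   bool progress = nir_opt_sink(shader, nir_move_const_undef | nir_move_comparisons);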
Most integer operations are omitted because dealing with integer
overflow is hard. There are a few things that could be smarter if there
were a bit more tracking of ranges for integer types (e.g.,
operands are Boolean, operand values fit in 16 bits, etc.).
The changes to nir_search_helpers.h are included in this patch to
simplify reordering the changes to nir_opt_algebraic.py.
v2: Memoize range analysis results. Without this, some shaders appear
to get stuck in infinite loops.
v3: Rebase on many months of Mesa changes, including 1-bit Boolean
changes.
v4: Rebase on "nir: Drop imov/fmov in favor of one mov instruction".
v5: Use nir_alu_srcs_equal for detecting (a*a). Previously just the SSA
value was compared, and this incorrectly matched (a.x*a.y).
v6: Many code improvements including (but not limited to) better names,
more comments, and better use of helper functions. All suggested by
Caio. Rework the handling of several opcodes to use a table for mapping
source ranges to a result range. This change fixed a bug that caused
fmax(gt_zero, ge_zero) to be incorrectly recognized as ge_zero.
Slightly tighten the range of fmul by recognizing that x*x is gt_zero if
x is gt_zero. Add similar handling for -x*x.
v7: Use _______ in the tables as an alias for unknown. Suggested by
Caio.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
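To make the table-driven idea concrete, here is a tiny standalone sketch (not the pass itself) of a source-to-result mapping for fmul over an assumed range lattice, ignoring NaN and flush-to-zero corner cases that the real analysis has to consider:

   enum range { UNKNOWN, LT_ZERO, LE_ZERO, EQ_ZERO, GE_ZERO, GT_ZERO };

   /* Combine the ranges of the two fmul sources.  'same_source' covers the
    * x*x case mentioned above: the result cannot be negative, and it is
    * strictly positive when x is known to be strictly positive or negative. */
   static enum range
   fmul_range(enum range a, enum range b, bool same_source)
   {
      if (same_source)
         return (a == GT_ZERO || a == LT_ZERO) ? GT_ZERO : GE_ZERO;
      if (a == GT_ZERO && b == GT_ZERO)
         return GT_ZERO;
      if ((a == GT_ZERO || a == GE_ZERO) && (b == GT_ZERO || b == GE_ZERO))
         return GE_ZERO;
      if ((a == GT_ZERO && b == LT_ZERO) || (a == LT_ZERO && b == GT_ZERO))
         return LT_ZERO;
      return UNKNOWN;
   }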
This automates the include_directories and dependencies tracking so that
all users of libmesa_util don't need to add them manually.
Next commit will remove the ones that were only added for that reason.
Signed-off-by: Eric Engestrom <[email protected]>
Acked-by: Eric Anholt <[email protected]>
Tested-by: Vinson Lee <[email protected]>
This version uses fewer ops for the same precision.
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Vasily Khoruzhick <[email protected]>
Acked-by: Matt Turner <[email protected]>
Each test has a comment with the expected before and after NIR. The
tests don't actually check this. The tests only check whether or not
the optimization pass reported progress. I couldn't think of a robust,
future-proof way to check the before and after code.
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
This effectively does the opposite of nir_lower_alu_to_scalar, trying
to combine per-component ALU operations with the same sources but
different swizzles into one larger ALU operation. It uses a similar
model to CSE, where we take a depth-first approach and keep around a hash
set of instructions to be combined, but there are a few major
differences:
1. For now, we only support entirely per-component ALU operations.
2. Since it's not always guaranteed that we'll be able to combine
equivalent instructions, we keep a stack of equivalent instructions
around, trying to combine new instructions with instructions on the
stack.
The pass is far from comprehensive; it can't handle operations where
some of the sources are per-component and others aren't, and it can't
handle phi nodes. But it should handle the more common cases, and it
should be reasonably efficient.
[Alyssa: Rebase on latest master, updating with respect to typeless
moves]
Acked-by: Alyssa Rosenzweig <[email protected]>
Acked-by: Jason Ekstrand <[email protected]>
On some architectures, Boolean values used to control conditional
branches or conditional selection must be propagated into a flag. This
generally means that a stored Boolean value must be compared with zero.
Rather than force the generation of extra compares with zero, re-emit
the original comparison instruction. This can save register pressure by
not needing to store the Boolean value.
There are several possible areas for future improvement to this pass:
1. Be more conservative. If both sources to the comparison instruction
are non-constants, it may be better for register pressure to emit the
extra compare. The current shader-db results on Intel GPUs (next
commit) lead me to believe that this is not currently a problem.
2. Be less conservative. Currently the pass requires that all users of
the comparison match the pattern. The idea is that after the pass is
complete, no instruction will use the resulting Boolean value. The only
uses will be of the flag value. It may be beneficial to relax this
requirement in some cases.
3. Be less conservative. Also try to rematerialize comparisons used for
discard_if intrinsics. After changing the way the Intel compiler
generates code for discard_if (see MR!935), I tried implementing this
already. The changes were pretty small. Instructions were helped in 19
shaders, but, overall, cycles were hurt. A commit "nir: Rematerialize
comparisons for nir_intrinsic_discard_if too" is on my fd.o cgit.
4. Copy the preceding ALU instruction. If the comparison is a
comparison with zero, and it is the only user of a particular ALU
instruction (e.g., (a+b) != 0.0), it may be a further improvement to also
copy the preceding ALU instruction. On Intel GPUs, this may enable
cmod propagation to make additional progress.
v2: Use much simpler method to get the prev_block for an if-statement.
Suggested by Tim.
Reviewed-by: Matt Turner <[email protected]>
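In plain C terms (a hypothetical illustration, not the pass's code), the transformation amounts to re-evaluating the comparison right at the branch instead of keeping the Boolean live across unrelated code:

   #include <stdbool.h>

   extern void do_something(void);

   /* Before: 'c' must stay live in a register until the branch. */
   void before(float a, float b)
   {
      bool c = a < b;
      /* ... unrelated code that raises register pressure ... */
      if (c)
         do_something();
   }

   /* After: the comparison is re-materialized at its use, so a
    * flag-generating compare can feed the branch directly. */
   void after(float a, float b)
   {
      /* ... unrelated code ... */
      if (a < b)
         do_something();
   }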
Lower sin and cos using Nick's fast sin/cos approximation from
https://web.archive.org/web/20180105155939/http://forum.devmaster.net/t/fast-and-accurate-sine-cosine/9648
It's suitable for GLES2, but it throws warnings in dEQP GLES3 precision tests.
Reviewed-by: Connor Abbott <[email protected]>
Reviewed-by: Qiang Yu <[email protected]>
Tested-by: Qiang Yu <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
Signed-off-by: Vasily Khoruzhick <[email protected]>
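For reference, a standalone C version of that approximation (the constants are the ones from the linked article; whether the pass uses exactly these values is an assumption, and the input is assumed to be pre-wrapped to [-pi, pi]):

   #include <math.h>

   /* Parabolic sine approximation plus the extra-precision step.
    * cos(x) can be obtained as fast_sin(x + pi/2) after re-wrapping. */
   static float fast_sin(float x)
   {
      const float PI_F = 3.14159265358979f;
      const float B = 4.0f / PI_F;
      const float C = -4.0f / (PI_F * PI_F);
      const float P = 0.225f;

      float y = B * x + C * x * fabsf(x);   /* first parabola */
      return P * (y * fabsf(y) - y) + y;    /* precision refinement */
   }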
This pass will soon grow to include some optimizations that are
difficult or impossible to implement correctly within nir_opt_algebraic.
It also includes the ability to generate strictly correct code, which the
current nir_opt_algebraic lowering lacks (though that could be changed).
v2: Document the parameters to nir_lower_flrp. Rebase on top of
3766334923e ("compiler/nir: add lowering for 16-bit flrp")
Reviewed-by: Matt Turner <[email protected]>
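A small standalone example of the exactness issue at stake (assuming this is the kind of case the "strictly correct" wording refers to): the cheap a + t*(b - a) form does not return exactly b at t == 1, while a*(1 - t) + b*t does:

   #include <stdio.h>

   static float flrp_cheap(float a, float b, float t) { return a + t * (b - a); }
   static float flrp_exact(float a, float b, float t) { return a * (1.0f - t) + b * t; }

   int main(void)
   {
      float a = 1.0e20f, b = 1.0f;
      /* At t == 1.0 the result should be exactly b; the cheap form loses it. */
      printf("cheap: %g  exact: %g\n", flrp_cheap(a, b, 1.0f), flrp_exact(a, b, 1.0f));
      return 0;   /* prints something like "cheap: 0  exact: 1" */
   }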
This new pass lowers ints and bools to floats. It allows hardware
that doesn't have native integers (e.g. Mali4x0) to use the same
code paths as modern hardware.
It uses the newly introduced pass that gathers SSA types, and should be
run as late as possible.
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
Signed-off-by: Vasily Khoruzhick <[email protected]>
This new pass (which isn't even compile-tested) attempts to determine
the ALU type of all the SSA values in a function impl. It takes a
greedy approach and assigns intness or floatness to everything it thinks
can possibly contain an int or a float. Some values will be labeled as
both int and float, some will be labeled as neither, and it is up to
the caller to decide what to do with this information. However, for a
"nice" shader where the original source contained no bit-casts and no
implicit bit-casts were introduced by optimizations, there shouldn't be
any overlap in the two sets save for the odd CSEd zero constant.
Reviewed-by: Vasily Khoruzhick <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
On some hardware (e.g. Mali400) the shader needs to apply some
transformations for correct gl_FragCoord handling. The lowering
actions look like the following in pseudocode:

   gl_FragCoord.xyz = gl_FragCoord_orig.xyz
   gl_FragCoord.w = 1.0 / gl_FragCoord_orig.w

Add this lowering as a nir pass in preparation for using it in the driver.
Signed-off-by: Andreas Baierl <[email protected]>
Reviewed-by: Qiang Yu <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
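A hypothetical nir_builder sketch of the core rewrite (names made up, not the commit's code): keep .xyz and replace .w with its reciprocal.

   static nir_ssa_def *
   lower_fragcoord_w(nir_builder *b, nir_ssa_def *coord)
   {
      return nir_vec4(b,
                      nir_channel(b, coord, 0),
                      nir_channel(b, coord, 1),
                      nir_channel(b, coord, 2),
                      nir_frcp(b, nir_channel(b, coord, 3)));
   }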
On Mali hardware (supported by Panfrost and Lima), the fixed-function
transformation from world-space to screen-space coordinates is done in
the vertex shader prior to writing out the gl_Position varying, rather
than in dedicated hardware. This commit adds a shared NIR pass for
implementing coordinate transformation and lowering gl_Position writes
into screen-space gl_Position writes.
v2: Run directly on derefs before io/vars are lowered to cleanup the
code substantially. Thank you to Qiang for this suggestion!
v3: Bikeshed continues.
v4: Add to Makefile.sources (per Jason's comment). Bikeshed comment.
Ian and Qiang's reviews are from v3, but there are no real functional
changes in v4. Rob's review is from v4.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Suggested-by: Qiang Yu <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Qiang Yu <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
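For intuition, the math being folded into the vertex shader is the usual viewport transform. A standalone sketch follows; the exact conventions (depth range, Y orientation, what ends up in .w) are hardware-specific assumptions here, and the real pass emits the equivalent with nir_builder at the gl_Position store:

   struct vec4 { float x, y, z, w; };

   /* Perspective divide followed by the viewport scale/offset, with
    * vp_scale/vp_offset taken from glViewport/glDepthRange state. */
   static struct vec4
   viewport_transform(struct vec4 pos,
                      const float vp_scale[3], const float vp_offset[3])
   {
      float inv_w = 1.0f / pos.w;
      struct vec4 out;
      out.x = pos.x * inv_w * vp_scale[0] + vp_offset[0];
      out.y = pos.y * inv_w * vp_scale[1] + vp_offset[1];
      out.z = pos.z * inv_w * vp_scale[2] + vp_offset[2];
      out.w = pos.w;   /* whether w is kept or replaced by 1/w is hardware-specific */
      return out;
   }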
This commit adds new nir_load/store_scratch opcodes which read and write
a virtual scratch space. It's up to the back-end to figure out what to
do with it and where to put the actual scratch data.
v2: Drop const_index comments (by anholt)
Reviewed-by: Eric Anholt <[email protected]>
We have a pass to lower global registers to locals and many drivers
dutifully call it. However, no one ever creates a global register,
so it's all dead code. It's time we bury it.
Acked-by: Karol Herbst <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
This pass attempts to detect code sequences like

   if (x < y) {
      z = y - x;
      ...
   }

and replace them with sequences like

   t = x - y;
   if (t < 0) {
      z = -t;
      ...
   }
On architectures where the subtract can generate the flags used by the
if-statement, this saves an instruction. It's also possible that moving
an instruction out of the if-statement will allow
nir_opt_peephole_select to convert the whole thing to a bcsel.
Currently only floating point compares and adds are supported. Adding
support for integer will be a challenge due to integer overflow. There
are a couple possible solutions, but they may not apply to all
architectures.
v2: Fix a typo in the commit message and a couple typos in comments.
Fix possible NULL pointer deref from result of push_block(). Add
missing (-A + B) case. Suggested by Caio.
v3: Fix is_not_const_zero to work correctly with types other than
nir_type_float32. Suggested by Ken.
v4: Add some comments explaining how this works. Suggested by Ken.
Reviewed-by: Kenneth Graunke <[email protected]>
v2: Rebase on 1-bit Boolean changes.
Reviewed-by: Thomas Helland <[email protected]> [v1]
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
This lowering isn't needed for RADV because AMDGCN has two
instructions for this. It will be disabled for RADV in an upcoming series.
While we are at it, do a little refactoring.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
This pass was originally written for lowering TCS output reads and
writes, but it is also applicable to just about anything, including UBOs,
SSBOs, and shared variables.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
v2: (all from Jason)
Reuse existing function for the end of the block combinations.
Check the SSA values are coming from the right place in tests.
Document the case when the store to array_deref is reused.
Reviewed-by: Jason Ekstrand <[email protected]>
This pass tries to turn scalar and array-of-scalar IO variables into
vector IO variables whenever possible.
Reviewed-by: Connor Abbott <[email protected]>
Cc: "19.0" <[email protected]>
Oftentimes various nir shaders after lowering will be the same, or
almost the same. For example, this can happen when the same shader is
linked with different shaders to form different pipelines and
cross-stage optimizations don't kick in to change it. We want to avoid
running the backend twice on these shaders. We were already doing this
with radeonsi, but we were storing a few extra pieces of information
that made this much less effective compared to TGSI. The worst offender
by far was the program name, which caused most of the cache misses. This
pass strips out these pieces of information, controlled by the NIR_STRIP
debug env variable.
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Not complete, mostly just adding things as I encounter them in CTS. But
not getting far enough yet to hit most of the OpenCL.std instructions.
Anyway, this is better than nothing and covers the most common builtins.
v2: add hadd proof from Jason
move some of the lowering into opt_algebraic and create new nir opcodes
simplify nextafter lowering
fix normalize lowering for inf
rework upsample to use nir_pack_bits
add missing files to build systems
v3: split lines of iadd/sub_sat expressions
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
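As an example of the kind of lowering involved, the overflow-free formulation of hadd referred to above ("hadd proof from Jason") can be written in plain C as:

   #include <stdint.h>

   /* hadd(a, b) = (a + b) >> 1 without the intermediate overflow:
    * a + b == (a ^ b) + 2*(a & b), so the halved sum is the form below. */
   static uint32_t uhadd32(uint32_t a, uint32_t b)
   {
      return (a & b) + ((a ^ b) >> 1);
   }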
The nir_lower_uniforms_to_ubo function is useful outside of
mesa/state_tracker, and in fact is needed to produce NIR for
drivers that have the PIPE_CAP_PACKED_UNIFORMS capability.
Signed-Off-By: Timur Kristóf <[email protected]>
Tested-by: Andre Heider <[email protected]>
Tested-by: Rob Clark <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
From @jekstrand's nir-1-bit-bool branch, with improved ior/inot lowering.
ior: fmax instead of fadd allows removing the fsat.
inot: seq(x, 0) can be better than fsub(1, x). On a2xx, it works better
with the scalar instruction set.
Reviewed-by: Jonathan Marek <[email protected]>
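Spelled out in C, assuming the usual 0.0/1.0 float encoding of Booleans implied by the fsat/seq discussion above:

   #include <math.h>

   static float f_ior(float a, float b) { return fmaxf(a, b); }             /* no fsat needed */
   static float f_inot(float a)         { return a == 0.0f ? 1.0f : 0.0f; } /* seq(a, 0) */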
We also enable it in all of the NIR drivers.
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Tested-by: Bas Nieuwenhuizen <[email protected]>
It's a reasonably well-known fact in the world of compilers that integer
divisions by constants can be replaced by a multiply, an add, and some
shifts. This commit adds such an optimization to NIR for the easiest case
of udiv. Other division operations will be added in following commits.
In order to provide some additional driver control, the pass takes a
minimum bit size to optimize.
Reviewed-by: Ian Romanick <[email protected]>
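The underlying trick, shown standalone for a divisor of 3 (0xAAAAAAAB = ceil(2^33 / 3) is the well-known 32-bit magic constant for this divisor; the pass computes such constants for arbitrary divisors):

   #include <stdint.h>

   /* n / 3 as a multiply by a precomputed reciprocal plus a shift. */
   static uint32_t udiv3(uint32_t n)
   {
      return (uint32_t)(((uint64_t)n * 0xAAAAAAABu) >> 33);
   }
   /* e.g. udiv3(100) == 33; the identity holds for every 32-bit n. */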
Just to make it easier to run the nir tests together.
Fixes: a0ae12ca91a45f81897e774019cde9bd081f03a0
("nir/algebraic: Add unit tests for bitsize validation")
Reviewed-by: Eric Engestrom <[email protected]>
The non-failure path can be tested by just compiling mesa and then
testing it, but the failure paths won't be hit unless you make a mistake,
so it's best to test them with some unit tests.
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Dylan Baker <[email protected]>
Meson test has a concept of suites, which allows tests to be grouped
together. This allows a subset of the tests to be run on their own (say,
only the tests for nir). A test can be added to more than one suite, but for
the most part I've only added a test to a single suite, though I've
added a compiler group that includes the nir, glsl, and glcpp tests.
To use this you'll need to invoke meson test directly, instead of ninja
test (which always runs all targets). It can be invoked as:
`meson test -C builddir --suite $suitename` (meson test has additional
options that are pretty useful).
Tested-By: Gert Wollny <[email protected]>
Acked-by: Eric Engestrom <[email protected]>
This is different from the GL_ARB_spirv pass because it generates a much
simpler data structure that isn't tied to OpenGL and mtypes.h.
Reviewed-by: Samuel Pitoiset <[email protected]>
The dead write removal is done as a separate pass instead of as part of
the existing copy_prop_vars pass.
Separation makes it easier to expand the scope of both passes beyond
per-block operation. For copy propagation, the information about valid
copies comes from previous instructions, while dead write removal
depends on information from later instructions ("has any instruction
used this deref before it is overwritten?").
Also change the tests to use this pass (instead of copy prop vars).
Note that the disabled tests continue to fail, since the standalone
pass is still per-block.
v2: Remove entries from dynarray instead of marking items as
deleted. Use foreach_reverse. (Caio)
(all from Jason)
Do not cache nir_deref_path. Not worthy for this patch.
Clear unused writes when hitting a call instruction.
Clean up enumeration of modes for barriers.
Move metadata calls to the inner function.
v3: For copies, use the vector length to calculate the mask.
(all from Jason)
Use nir_component_mask_t when applicable.
Rename functions for clarity.
To be conservative, consider local vars used by a call (SPIR-V has
such cases).
Comment and assert the assumption that stores and copies are
always to a deref that ends with a vector or scalar.
Reviewed-by: Jason Ekstrand <[email protected]>
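A toy standalone sketch of the per-block idea (hypothetical data structures, not the NIR pass): walk the block, remember the last unread write per variable, and mark it dead when another write to the same variable arrives before any read.

   #include <stdbool.h>
   #include <stddef.h>

   struct toy_op {
      bool is_write;   /* otherwise a read */
      int var;         /* which variable it touches, 0..63 */
      bool dead;       /* set below when the write is provably dead */
   };

   static void
   remove_dead_writes(struct toy_op *ops, size_t count)
   {
      int pending[64];   /* index of the last unread write per var, -1 if none */
      for (int v = 0; v < 64; v++)
         pending[v] = -1;

      for (size_t i = 0; i < count; i++) {
         int v = ops[i].var;
         if (ops[i].is_write) {
            if (pending[v] >= 0)
               ops[pending[v]].dead = true;   /* overwritten before any read */
            pending[v] = (int)i;
         } else {
            pending[v] = -1;                  /* the pending write was used */
         }
      }
   }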
Add basic helpers for doing tests on the vars related optimization
passes. The main goal is to lower the barrier to create tests during
development and debugging of the passes. Full coverage is not a
requirement.
v2: Make find_next_intrinsic() skip blocks before 'after'. (Jason)
Move nir_imm_ivec2() to nir_builder.h. (Jason)
Reviewed-by: Jason Ekstrand <[email protected]>
This peephole optimization looks for a series of load/store_deref or
copy_deref instructions that copy an array from one variable to another
and turns it into a copy_deref that copies the entire array. The
pattern it looks for is extremely specific but it's good enough to pick
up on the input array copies in DXVK and should also be able to pick up
the sequence generated by spirv_to_nir for an OpLoad of a large composite
followed by OpStore. It can always be improved later if needed.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
This pass doesn't really do much now because nir_lower_vars_to_ssa can
already see through structures and considers them to be "split". This
pass exists to help other passes more easily see through structure
variables. If a back-end does implement arrays using scratch or
indirects on registers, having more, smaller arrays is likely to give
better memory efficiency.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Now that all the build scripts are compatible with both Python 2 and 3,
we can flip the switch and tell Meson to use the latter.
Since Meson already depends on Python 3 anyway, this means we don't need
two different Python stacks to build Mesa.
Signed-off-by: Mathieu Bridon <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
Reviewed-by: Dylan Baker <[email protected]>
Fixes: d1992255bb29054fa51763376d125183a9f602f3
("meson: Add build Intel "anv" vulkan driver")
Signed-off-by: Dylan Baker <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>