| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
Signed-off-by: Emil Velikov <[email protected]>
Acked-by: Vedran Miletić <[email protected]>
Acked-by: Juha-Pekka Heikkila <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fix build with Python < 2.7.
File "src/compiler/nir/nir_builder_opcodes_h.py", line 46, in <module>
from nir_opcodes import opcodes
File "src/compiler/nir/nir_opcodes.py", line 178, in <module>
unop_convert("{}2{}{}".format(src_t[0], dst_t[0], bit_size),
ValueError: zero length field name in format
Fixes: 762a6333f21f ("nir: Rework conversion opcodes")
Signed-off-by: Vinson Lee <[email protected]>
|
|
|
|
|
|
|
|
| |
Apart from avoiding some unneeded size cases, this shouldn't have any
actual functional impact.
Reviewed-by: Dylan Baker <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The NIR story on conversion opcodes is a mess. We've had way too many
of them, naming is inconsistent, and which ones have explicit sizes was
sort-of random. This commit re-organizes things and makes them all
consistent:
- All non-bool conversion opcodes now have the explicit size in the
destination and are named <src_type>2<dst_type><size>.
- Integer <-> integer conversion opcodes now only come in i2i and u2u
forms (i2u and u2i have been removed) since the only difference
between the different integer conversions is whether or not they
sign-extend when up-converting.
- Boolean conversion opcodes all have the explicit size on the bool and
are named <src_type>2<dst_type>.
Making things consistent also allows nir_type_conversion_op to be moved
to nir_opcodes.c and auto-generated using mako. This will make adding
int8, int16, and float16 versions much easier when the time comes.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The original version was very convoluted and tried way too hard to not
just have the nested switch statement that it needs. Let's just write
the obvious code and then we know it's correct. This fixes a bunch of
missing cases particularly with int64.
Reviewed-by: Plamena Manolova <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
The original bit-size validation wasn't capable of properly dealing with
instructions with variable bit sizes. An attempt was made to handle it
by looking at source and destinations but, because the validation was
done in validate_alu_(src|dest), it didn't really have the needed
information. The new validation code is much more straightforward and
should be more correct.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We've always required bit sizes to match but the rules for number of
components have been a bit loose. You've never been allowed to source
from something with less components than you consume, but more has
always been fine. This changes the validator to require that they match
exactly. The fact that they don't always match has been a source of
confusion in NIR for quite some time and it's time we got rid of it.
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Using coord_components of the source texture is correct for everything
except cube maps where it's off by one.
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the near future we are going to require that the num_components in a
src dereference match the num_components of the SSA value being
dereferenced. To do that, we need copy_prop to not remove our MOVs from
a larger SSA value into an instruction that uses fewer channels.
Because we suddenly have to know how many components each source has,
this makes the pass a bit more complicated. Fortunately, copy
propagation is the only pass that cares about the number of components
are read by any given source so it's fairly contained.
Shader-db results on Sky Lake:
total instructions in shared programs: 13318947 -> 13320265 (0.01%)
instructions in affected programs: 260633 -> 261951 (0.51%)
helped: 324
HURT: 1027
Looking through the hurt programs, about a dozen are hurt by 3
instructions and the rest are all hurt by 2 instructions. From a
spot-check of the shaders, the story is always the same: They get a
vec4 from somewhere (frequently an input) and use the first two or three
components as a texture coordinate. Because of the vector component
mismatch, we have a mov or, more likely, a vecN sitting between the
texture instruction and the input. This means that the back-end inserts
a bunch of MOVs and split_virtual_grfs() goes to town. Because the
texture coordinate is also used by some other calculation, register
coalesce can't combine them back together and we end up with an extra 2
MOV instructions in our shader.
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
Cc: "17.0 13.0" <[email protected]>
|
|
|
|
|
|
|
| |
Analogous to earlier commit(s).
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
The previous implementation was fine for GLSL which doesn't really have
a signed modulus/remainder. They just leave the behavior undefined
whenever either source is negative. However, in SPIR-V, there is a
defined behavior for negative arguments. This commit beefs up the pass
so that it handles both correctly. Tested using a hacked up version of
the Vulkan CTS test to get 64-bit support.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
| |
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
| |
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The algorithms used by this pass, especially for division, are heavily
based on the work Ian Romanick did for the similar int64 lowering pass
in the GLSL compiler.
v2: Properly handle vectors
v3: Get rid of log2_denom stuff. Since we're using bcsel, we do all the
calculations anyway and this is just extra instructions.
v4:
- Add back in the log2_denom stuff since it's needed for ensuring that
the shifts don't overflow.
- Rework the looping part of the pass to be easier to expand.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
| |
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
|
|
|
|
| |
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Each of the pop functions (and push_else) take a control flow parameter as
their second argument. If NULL, it assumes that the builder is in a block
that's a direct child of the control-flow node you want to pop off the
virtual stack. This is what 90% of consumers will want. The SPIR-V pass,
however, is a bit more "creative" about how it walks the CFG and it needs
to be able to pop multiple levels at a time, hence the argument.
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
|
|
|
|
|
|
|
| |
nir_const_value is not needed in get_iteration
Signed-off-by: Elie Tournier <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Elie Tournier <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
It's a problem waiting to happen. Individual headers should be annotated
if needed.
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
| |
This reduces the instruction count in some fp64 and int64 piglit tests
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
There's nothing "double" about it other than, perhaps, the fact that it
packs two 32-bit values.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
NIR is a typeless IR and the two opcodes, when considered bitwise, do
exactly the same thing. There's no reason to have two versions.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
These are enough for the spir-v generator to handle UConvert
and SConvert operations, and fix the 4 tests in CTS.
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99660
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
MSVC warns about different const qualifiers. Add the extra const to
silence it.
nir_phi_builder.c(244) : warning C4090: 'initializing' : different 'const' qualifiers
nir_phi_builder.c(245) : warning C4090: 'initializing' : different 'const' qualifiers
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
MSVC warns about implicit conversion as below. Annotate the literal
appropriately to silence the warning.
nir_gather_info.c(249) : warning C4334: '<<' : result of 32-bit shift
implicitly converted to 64 bits (was 64-bit shift intended?)
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The original number was chosen in an attempt to match the limits applied to
GLSL IR.
A look at the git history of the why these limits were chosen for GLSL IR
shows it was more to do with the slow speed of unrolling large loops in
GLSL IR than anything else. The speed of loop unrolling in NIR is not a
problem so we may wish to bump this even higher in future.
No shader-db change, however a furture change will disbale the GLSL IR
optimisation loop in the i965 backend results in 4 loops from The Talos
Principle failing to unroll. Bumping the limit allows them to unroll which
results in the instruction count matching the previous output from when the
GLSL IR opts were still enabled.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous code always compared integers as 64-bit. Due to variations
in sign-extension in the code generated by nir_opt_algebraic.py, this
meant that nir_search doesn't always do what you want. Instead, 32-bit
values should be matched as 32-bit and 64-bit values should be matched
as 64-bit. While we're here we unify the unsigned and signed paths.
Now that we're using the right bit size, they should be the same since
the only difference we had before was sign extension.
This gets the UE4 bitfield_extract optimization working again. It had
stopped working due to the constant 0xff00ff00 getting sign-extended
when it shouldn't have.
Reviewed-by: Iago Toral Quiroga <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Cc: "17.0 13.0" <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously both sources were unsized. This caused problems when the
thing being shifted was 64-bit but the shift count was 32-bit. The
expectation in NIR is that all unsized sources (and destination) will
ultimately have the same size.
The changes in nir_opt_algebraic.py are to prevent errors like:
Failed to parse transformation:
03:12:25 (('extract_i8', 'a', 'b'), ('ishr', ('ishl', 'a', ('imul', ('isub', 3, 'b'), 8)), 24), 'options->lower_extract_byte')
03:12:25 Traceback (most recent call last):
03:12:25 File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 610, in __init__
03:12:25 xform = SearchAndReplace(xform)
03:12:25 File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 495, in __init__
03:12:25 BitSizeValidator(varset).validate(self.search, self.replace)
03:12:25 File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 311, in validate
03:12:25 validate_dst_class = self._validate_bit_class_up(replace)
03:12:25 File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 414, in _validate_bit_class_up
03:12:25 src_class = self._validate_bit_class_up(val.sources[i])
03:12:25 File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 420, in _validate_bit_class_up
03:12:25 assert src_class == src_type_bits
03:12:25 AssertionError
Signed-off-by: Ian Romanick <[email protected]>
Suggested-by: Connor Abbott <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
Cc: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This change makes me wonder whether double packing should be
reimplemented as int64BitsToDouble(packInt2x32(v)). I'm a little on the
fence since not all platforms that support fp64 natively support int64.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
v2 (idr): "cut them down later" => Remove ir_unop_b2u64 and
ir_unop_u642b. Handle these with extra i2u or u2i casts just like
uint(bool) and bool(uint) conversion is done.
v3 (idr): Make the "from" type in a cast unsized. This reduces the
number of required cast operations at the expensive slightly more
complex code. However, this will be a dramatic improvement when other
sized integer types are added. Suggested by Connor.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
| |
v2: Rebase on 19a541f (nir: Get rid of nir_constant_data)
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Connor Abbott <[email protected]> [v1]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add the following optimisations:
min(x, -x) = -abs(x)
min(x, -abs(x)) = -abs(x)
min(x, abs(x)) = x
max(x, -abs(x)) = x
max(x, abs(x)) = abs(x)
max(x, -x) = abs(x)
shader-db:
total instructions in shared programs: 13067779 -> 13067775 (-0.00%)
instructions in affected programs: 249 -> 245 (-1.61%)
helped: 4
HURT: 0
total cycles in shared programs: 252054838 -> 252054806 (-0.00%)
cycles in affected programs: 504 -> 472 (-6.35%)
helped: 2
HURT: 0
Signed-off-by: Elie Tournier <[email protected]>
Reviewed-by: Plamena Manolova <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
We were including it once per value, so probably around 10k times.
Let's not cause the compiler any more work than we have to.
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
We were using impl->num_blocks, but that isn't guaranteed to be
up-to-date until after the block_index metadata is required. If we were
unlucky, this could lead to overwriting memory.
Noticed by inspection.
Signed-off-by: Connor Abbott <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
shader-db results BDW:
total instructions in shared programs: 13060410 -> 13060313 (-0.00%)
instructions in affected programs: 24533 -> 24436 (-0.40%)
helped: 88
HURT: 0
total cycles in shared programs: 256585692 -> 256586698 (0.00%)
cycles in affected programs: 647290 -> 648296 (0.16%)
helped: 35
HURT: 30
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
| |
Reported-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes a bug in code motion that occurred when the best block is the
same as the schedule early block. In this case, because we're checking
(lca != def->parent_instr->block) at the top of the loop, we never get to
the check for loop depth so we wouldn't move it out of the loop. This
commit reworks the loop to be a simple for loop up the dominator chain and
we place the (lca != def->parent_instr->block) check at the end of the
loop.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Otherwise we will end up with an extra instruction to compare the
result of the inot.
On BDW:
total instructions in shared programs: 13060620 -> 13060481 (-0.00%)
instructions in affected programs: 103379 -> 103240 (-0.13%)
helped: 127
HURT: 0
total cycles in shared programs: 256590950 -> 256587408 (-0.00%)
cycles in affected programs: 11324730 -> 11321188 (-0.03%)
helped: 114
HURT: 21
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We turn these from bcsel into inot/b2f combos in order for other
optimisation passes to get further. Once we have finished turn
the ones that remain and are used in more than a single expression
back into a bcsel.
On BDW:
total instructions in shared programs: 13060965 -> 13060297 (-0.01%)
instructions in affected programs: 835701 -> 835033 (-0.08%)
helped: 670
HURT: 2
total cycles in shared programs: 256599536 -> 256598006 (-0.00%)
cycles in affected programs: 114655488 -> 114653958 (-0.00%)
helped: 419
HURT: 240
LOST: 0
GAINED: 1
The 2 HURT is because inserting bcsel creates the only use of
const 1.0 in two shaders from tri-of-friendship-and-madness.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On BDW:
total instructions in shared programs: 13061890 -> 13061877 (-0.00%)
instructions in affected programs: 2441 -> 2428 (-0.53%)
helped: 13
HURT: 0
total cycles in shared programs: 256612254 -> 256611784 (-0.00%)
cycles in affected programs: 16418 -> 15948 (-2.86%)
helped: 10
HURT: 2
V2: don't use ffma directly
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This tries to move comparisons (a common source of boolean values)
closer to their first use. For GPUs which use condition codes,
this can eliminate a lot of temporary booleans and comparisons
which reload the condition code register based on a boolean.
V2: (Timothy Arceri)
- fix move comparision for phis so we dont end up with:
vec1 32 ssa_227 = phi block_34: ssa_1, block_38: ssa_240
vec1 32 ssa_235 = feq ssa_227, ssa_1
vec1 32 ssa_230 = phi block_34: ssa_221, block_38: ssa_235
- add nir_op_i2b/nir_op_f2b to the list of comparisons.
V3: (Timothy Arceri)
- tidy up suggested by Jason.
- add inot/fnot to move comparison list
V4: (Jason Ekstrand)
- clean up move_comparison_source
- get rid of the tuple
- rework phi handling
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]> [v1]
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
This is more correct and should also be a tiny bit faster since we're
just comparing pointers instead of calling nir_src_equal.
Reviewed-by: Timothy Arceri <[email protected]>
Cc: "13.0" <[email protected]>
|