| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, we walked through a given deref_node's copies and, after
lowering the copy away, removed it from both the source and destination
copy sets. This commit changes this to only remove it from the other
node's copy set (not the one we're lowering). At the end of the loop, we
just throw away the copy set for the node we're lowering since that node no
longer has any copies. This has two advantages:
1) It's more efficient because we're doing potentially half as many set
search operations.
2) It now properly handles copies from a node to itself. Perviously, it
would delete the copy from the set when processing the destinatioon and
then assert-fail when we couldn't find it for the source.
Cc: "11.0" <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92588
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
(cherry picked from commit 226ba889a0f820b9f4b1132e379620d2688c96e7)
|
|
|
|
|
|
|
|
| |
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Eduardo Lima Mitev <[email protected]>
Cc: [email protected]
(cherry picked from commit 59bbe2681b73c3795b7298e2486d5fde7c464ed5)
|
|
|
|
|
|
|
|
| |
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Eduardo Lima Mitev <[email protected]>
Cc: [email protected]
(cherry picked from commit bc3942e2970c60a816cf954b1fa4d416d0852bd9)
|
|
|
|
|
|
|
|
|
|
|
|
| |
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
(cherry picked from commit 9f5e7ae9d83ce6de761936b95cd0b7ba4c1219c4)
[Emil Velikov] Correctly derive nir_shader from vec_to_movs_state
Signed-off-by: Emil Velikov <[email protected]>
Conflicts:
src/glsl/nir/nir.h
src/glsl/nir/nir_lower_vec_to_movs.c
|
|
|
|
|
|
|
|
|
| |
Previously, we were passing the shader around, we were just calling it
"mem_ctx". However, the nir_shader is (and must be for the purposes of
mark-and-sweep) the mem_ctx so we might as well pass it around explicitly.
Reviewed-by: Eduardo Lima Mitev <[email protected]>
(cherry picked from commit b7eeced3c724bf5de05290551ced8621ce2c7c52)
|
|
|
|
|
|
|
|
| |
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Eduardo Lima Mitev <[email protected]>
Cc: [email protected]
(cherry picked from commit 0f037bd71ffe083c05cd0867ef54bce91ff84243)
|
|
|
|
|
|
|
|
| |
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Eduardo Lima Mitev <[email protected]>
Cc: [email protected]
(cherry picked from commit 8bb44510fca5315bbdd61502c72c22c7198c0daf)
|
|
|
|
|
|
| |
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
(cherry picked from commit dc18b9357b553a972ea439facfbc55e376f1179f)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As of a10d4937, we would really like things associated with an instruction
to be allocated out of that instruction and not out of the shader. In
particular, you should be passing the instruction that will ultimately be
holding the source into nir_src_copy rather than an arbitrary memory
context.
We also change the prototypes of nir_dest_copy and nir_alu_src/dest_copy to
explicitly take an instruction so we catch this earlier in the future.
Cc: "11.0" <[email protected]>
Reviewed-by: Thomas Helland <[email protected]>
(cherry picked from commit 8c8fc5f8336c8c79e5890265ae6c03271aa94075)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
v2, review from Francisco Jerez:
- make the destination variable as large as what the nir instrinsic
defines (4) instead of the size of the return variable of glsl. This
is still safe for the already existing code because all the intrinsics
affected returned the same amount of components as expected by glsl IR.
In the case of image_size, it is not possible to do so because the
returned number of component depends on the image type and this case
is not well handled by nir.
v3:
- Style fix
Signed-off-by: Martin Peres <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
| |
Much more readable.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
Makes the function a bit smaller.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The positive and negative value of a float can only
be equal to each other if it is -0.0f and 0.0f.
This is safe for Nan and Inf, as -Nan != Nan, and -Inf != Inf
This gives no changes in my shader-db
Signed-off-by: Thomas Helland <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
-NaN != NaN, and -Inf != Inf, so this should be safe.
Found while working on my VRP pass.
Shader-db results on my IVB:
total instructions in shared programs: 1698267 -> 1698067 (-0.01%)
instructions in affected programs: 15785 -> 15585 (-1.27%)
helped: 36
HURT: 0
GAINED: 0
LOST: 0
Some shaders was found to have the following pattern in NIR:
vec1 ssa_26 = fneg ssa_21
vec1 ssa_27 = fne ssa_21, ssa_26
Make that:
vec1 ssa_27 = fne ssa_21, 0.0f
This is found in Dota2 and Brutal Legend.
One shader is cut by 8%, from 323 -> 296 instructons in SIMD8
Signed-off-by: Thomas Helland <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
NIR instruction count results on i965:
total instructions in shared programs: 1261954 -> 1261937 (-0.00%)
instructions in affected programs: 455 -> 438 (-3.74%)
One in yofrankie, two in tropics. Apparently i965 had also optimized all
of these out anyway.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
There are so many flags in textures, that the CSE pass would have a hard
time referencing the correct set when figuring out if two texture ops are
the same. By zeroing, we can avoid that fragility.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Avoids regressions in vc4 when trying to do our blending in NIR.
v2: Add the other unpack ops I meant to when writing the original commit
message.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We may find a cause to do more undef optimization in the future, but for
now this fixes up things after if flattening. vc4 was handling this
internally most of the time, but a GLB2.7 shader that did a conditional
discard and assign gl_FragColor in the else was still emitting some extra
code.
total instructions in shared programs: 100809 -> 100795 (-0.01%)
instructions in affected programs: 37 -> 23 (-37.84%)
v2: Use nir_instr_rewrite_src() to update def/use on src[0] (by Thomas
Helland).
v3: Make sure to flag metadata dirties, and copy the swizzle and abs/neg
over to src[0], too (by anholt).
Reviewed-by: Thomas Helland <[email protected]> (v2)
Tested-by: Thomas Helland <[email protected]> (v2)
|
|
|
|
| |
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
| |
This is useful to increase the CSE opportunities for a scalar backend. It
avoids regressions when dropping vc4's custom CSE implementation.
v2: Cleanups by Matt (decl in the for loop, and unreachable()).
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
| |
I lazily generated some of these in VC4 NIR lowering.
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
| |
This reverts commit ab5b7a0fe659ff6f9c1885d5cb047b6531959506. We use more
than one bit of value in tgsi_to_nir.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Avoid copying an overwritten swizzle, use the original values.
Example:
Former swizzle[] = xyzw
src->swizzle[] = zyxx
The expected output swizzle = zyxx but if we reuse swizzle in the loop,
then output swizzle would be zyzz.
Signed-off-by: Samuel Iglesias Gonsalvez <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The current implementation operates in scalar mode only, so add a vec4
mode where types are padded to vec4 sizes.
This will be useful in the i965 driver for its vec4 nir backend
(and possbly other drivers that have vec4-based shaders).
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The only values allowed are 0 and 1, and the value is checked before
assigning.
This is a copy of 8eeca7a56c that seems to have been made to the glsl
ir type after it was copied for use in nir but before nir landed.
Reviewed-by: Tapani Pälli <[email protected]>
|
|
|
|
| |
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
| |
Non-Gallium parts of Mesa require MSVC 2013 which provides these.
|
|
|
|
|
|
|
|
| |
This just adds some missing pieces to nir/i965,
it is lightly tested on my Haswell.
Reviewed-by: Kenneth Graunke <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This type will be used to store the name of subroutine types
as in subroutine void myfunc(void);
will store myfunc into a subroutine type.
This is required to the parser can identify a subroutine
type in a uniform decleration as a valid type, and also for
looking up the type later.
Also add contains_subroutine method.
v2: handle subroutine to int comparisons, needed
for lowering pass.
v3: do subroutine to int with it's own IR
operation to avoid hacking on asserts (Kayden)
v3.1: fix warnings in this patch, fix nir,
fix tgsi
v3.2: fixup tests
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
tests: fix warnings
|
|
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Signed-off-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Signed-off-by: Connor Abbott <[email protected]>
|
|
|
|
| |
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
| |
Connor renamed the parameter, inverting the sense.
Update the comment accordingly.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
| |
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
| |
It's no longer used.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
| |
It's now unused.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We already don't convert constants out of SSA, and in our backend we'd
like to have only one way of saying something is still in SSA.
The one tricky part about this is that we may now leave some undef
instructions around if they aren't part of a phi-web, so we have to be
more careful about deleting them.
v2: rename and flip meaning of flag (Jason)
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We already recognize min(max(a, 0.0), 1.0) as a saturate, but neglected
this variant (which is also handled by the GLSL IR pass).
shader-db results on Broadwell:
total instructions in shared programs: 7363046 -> 7362788 (-0.00%)
instructions in affected programs: 11928 -> 11670 (-2.16%)
helped: 64
HURT: 0
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
| |
Suggested by Jason Ekstrand.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These are basically just moves, so they should be safe as well.
When disabling i965's GLSL IR level scalarizer (channel expressions)
pass, I started seeing NIR code like this:
if ssa_21 {
block block_1:
/* preds: block_0 */
vec4 ssa_120 = vec4 ssa_82, ssa_83, ssa_84, ssa_30
/* succs: block_3 */
} else {
block block_2:
/* preds: block_0 */
/* succs: block_3 */
}
block block_3:
/* preds: block_1 block_2 */
vec4 ssa_33 = phi block_1: ssa_120, block_2: ssa_2
Previously, the GLSL IR scalarizer pass would break the vec4 into a
series of fmovs, which were allowed by the peephole pass. But with
the vec4 operation, they were not. We want to keep getting selects.
Normal i965 on Broadwell:
instructions in affected programs: 200 -> 176 (-12.00%)
helped: 4
With brw_fs_channel_expressions() disabled:
instructions in affected programs: 1832 -> 1646 (-10.15%)
helped: 30
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
Reviewed-by: Ben Widawsky <[email protected]>
|
|
|
|
|
|
|
|
|
| |
v2:
* Changes suggested by mattst88
[[email protected]: Add nir support]
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Ben Widawsky <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Thomas Helland <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Thomas Helland <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
lower_phis_to_scalar() pass recurses the instruction dependence graph to
determine if all the sources of a given instruction are scalarizable.
To prevent cycles, it temporary marks the phi instruction before recursing in,
then updates the entry with the resulting value. However, it does not consider
that the entry value may have changed after a recursion pass, hence causing
a use-after-free situation and a crash.
This patch fixes this by reloading the entry corresponding to the 'phi'
after recursing and before updating its value.
The crash can be reproduced ~20% of times with the dEQP test:
dEQP-GLES3.functional.shaders.loops.while_constant_iterations.nested_sequence_fragment
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
When we compute the output swizzle we want to consider the number of
components in the add operation. So far we were using the writemask
of the multiplication for this instead, which is not correct.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some shaders in Civilization V and Beyond Earth do
pow(pow(x, 2.2), 0.454545)
which is converting to and from sRGB colorspace.
A more general rule that replaces pow(pow(a, b), c) with pow(a, b * c)
actually regresses two shaders in Sun Temple in which the result of the
inner pow is used twice, once by another pow and once by another
instruction. Also, since 2.2 * 0.454545 isn't exactly one, the more
general pattern would have still left us with a pow, and I'm 2.2 *
0.454545 percent sure that's not what they want.
instructions in affected programs: 934 -> 886 (-5.14%)
helped: 16
|