| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
| |
Non-Gallium parts of Mesa require MSVC 2013 which provides these.
|
|
|
|
|
|
|
|
| |
This just adds some missing pieces to nir/i965,
it is lightly tested on my Haswell.
Reviewed-by: Kenneth Graunke <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This type will be used to store the name of subroutine types
as in subroutine void myfunc(void);
will store myfunc into a subroutine type.
This is required to the parser can identify a subroutine
type in a uniform decleration as a valid type, and also for
looking up the type later.
Also add contains_subroutine method.
v2: handle subroutine to int comparisons, needed
for lowering pass.
v3: do subroutine to int with it's own IR
operation to avoid hacking on asserts (Kayden)
v3.1: fix warnings in this patch, fix nir,
fix tgsi
v3.2: fixup tests
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
tests: fix warnings
|
|
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Signed-off-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Signed-off-by: Connor Abbott <[email protected]>
|
|
|
|
| |
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
| |
Connor renamed the parameter, inverting the sense.
Update the comment accordingly.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
| |
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
| |
It's no longer used.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
| |
It's now unused.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We already don't convert constants out of SSA, and in our backend we'd
like to have only one way of saying something is still in SSA.
The one tricky part about this is that we may now leave some undef
instructions around if they aren't part of a phi-web, so we have to be
more careful about deleting them.
v2: rename and flip meaning of flag (Jason)
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We already recognize min(max(a, 0.0), 1.0) as a saturate, but neglected
this variant (which is also handled by the GLSL IR pass).
shader-db results on Broadwell:
total instructions in shared programs: 7363046 -> 7362788 (-0.00%)
instructions in affected programs: 11928 -> 11670 (-2.16%)
helped: 64
HURT: 0
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
| |
Suggested by Jason Ekstrand.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These are basically just moves, so they should be safe as well.
When disabling i965's GLSL IR level scalarizer (channel expressions)
pass, I started seeing NIR code like this:
if ssa_21 {
block block_1:
/* preds: block_0 */
vec4 ssa_120 = vec4 ssa_82, ssa_83, ssa_84, ssa_30
/* succs: block_3 */
} else {
block block_2:
/* preds: block_0 */
/* succs: block_3 */
}
block block_3:
/* preds: block_1 block_2 */
vec4 ssa_33 = phi block_1: ssa_120, block_2: ssa_2
Previously, the GLSL IR scalarizer pass would break the vec4 into a
series of fmovs, which were allowed by the peephole pass. But with
the vec4 operation, they were not. We want to keep getting selects.
Normal i965 on Broadwell:
instructions in affected programs: 200 -> 176 (-12.00%)
helped: 4
With brw_fs_channel_expressions() disabled:
instructions in affected programs: 1832 -> 1646 (-10.15%)
helped: 30
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
Reviewed-by: Ben Widawsky <[email protected]>
|
|
|
|
|
|
|
|
|
| |
v2:
* Changes suggested by mattst88
[[email protected]: Add nir support]
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Ben Widawsky <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Thomas Helland <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Thomas Helland <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
lower_phis_to_scalar() pass recurses the instruction dependence graph to
determine if all the sources of a given instruction are scalarizable.
To prevent cycles, it temporary marks the phi instruction before recursing in,
then updates the entry with the resulting value. However, it does not consider
that the entry value may have changed after a recursion pass, hence causing
a use-after-free situation and a crash.
This patch fixes this by reloading the entry corresponding to the 'phi'
after recursing and before updating its value.
The crash can be reproduced ~20% of times with the dEQP test:
dEQP-GLES3.functional.shaders.loops.while_constant_iterations.nested_sequence_fragment
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
When we compute the output swizzle we want to consider the number of
components in the add operation. So far we were using the writemask
of the multiplication for this instead, which is not correct.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some shaders in Civilization V and Beyond Earth do
pow(pow(x, 2.2), 0.454545)
which is converting to and from sRGB colorspace.
A more general rule that replaces pow(pow(a, b), c) with pow(a, b * c)
actually regresses two shaders in Sun Temple in which the result of the
inner pow is used twice, once by another pow and once by another
instruction. Also, since 2.2 * 0.454545 isn't exactly one, the more
general pattern would have still left us with a pow, and I'm 2.2 *
0.454545 percent sure that's not what they want.
instructions in affected programs: 934 -> 886 (-5.14%)
helped: 16
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, we used intrinsic->const_index[1] to represent "the number of
array elements to load" for load/store intrinsics. However, this set to 1
by every pass that ever creates a load/store intrinsic. Also, while it
might make some sense for registers, it makes no sense whatsoever in SSA.
On top of that, the i965 backend was the only backend to ever support it;
freedreno and vc4 just assert that it's always 1. Let's just delete it.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Connor Abbott <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
v2: Undefine coordinate components not applicable to the target.
Reviewed-by: Connor Abbott <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Connor Abbott <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Connor Abbott <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
v2: Undefine coordinate components not applicable to the target.
Reviewed-by: Connor Abbott <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes bugs with special cases where we have arrays of
structures containing samplers or arrays of samplers.
I've verified that patch results in calculating same index value as
returned by _mesa_get_sampler_uniform_value for IR. Patch makes
following ES3 conformance test pass:
ES3-CTS.shaders.struct.uniform.sampler_array_fragment
v2: remove unnecessary comment (Topi)
simplify changes and the overall code (Jason)
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90114
|
|
|
|
|
|
| |
s/agressive/aggressive/g
Trivial.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Shader-db results on Broadwell:
total instructions in shared programs: 7152330 -> 7137006 (-0.21%)
instructions in affected programs: 1330548 -> 1315224 (-1.15%)
helped: 5797
HURT: 76
GAINED: 0
LOST: 8
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
| |
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Previously, this case was being handled in match_expression prior to
calling match_value. However, there is really no good reason for this
given that match_value has all of the information it needs. Also, they
weren't being handled properly in the commutative case and putting it in
match_value gives us that for free.
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit switches us from the current setup of using hash sets for
use/def sets to using linked lists. Doing so should save us quite a bit of
memory because we aren't carrying around 3 hash sets per register and 2 per
SSA value. It should also save us CPU time because adding/removing things
from use/def sets is 4 pointer manipulations instead of a hash lookup.
Running shader-db 50 times with USE_NIR=0, NIR, and NIR + use/def lists:
GLSL IR Only: 586.4 +/- 1.653833
NIR with hash sets: 675.4 +/- 2.502108
NIR + use/def lists: 641.2 +/- 1.557043
I also ran a memory usage experiment with Ken's patch to delete GLSL IR and
keep NIR. This patch cuts an aditional 42.9 MiB of ralloc'd memory over
and above what we gained by deleting the GLSL IR on the same dota trace.
On the code complexity side of things, some things are now much easier and
others are a bit harder. One of the operations we perform constantly in
optimization passes is to replace one source with another. Due to the fact
that an instruction can use the same SSA value multiple times, we had to
iterate through the sources of the instruction and determine if the use we
were replacing was the only one before removing it from the set of uses.
With this patch, uses are per-source not per-instruction so we can just
remove it safely. On the other hand, trying to iterate over all of the
instructions that use a given value is more difficult. Fortunately, the
two places we do that are the ffma peephole where it doesn't matter and GCM
where we already gracefully handle duplicates visits to an instruction.
Another aspect here is that using linked lists in this way can be tricky to
get right. With sets, things were quite forgiving and the worst that
happened if you didn't properly remove a use was that it would get caught
in the validator. With linked lists, it can lead to linked list corruption
which can be harder to track. However, we do just as much validation of
the linked lists as we did of the sets so the validator should still catch
these problems. While working on this series, the vast majority of the
bugs I had to fix were caught by assertions. I don't think the lists are
going to be that much worse than the sets.
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
| |
We were rolling our own rewrite_src variant in copy-propagation. Let's
stop doing that and use the ones in core NIR.
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
| |
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
| |
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The out-of-SSA pass was one of the first passes written when getting SSA
up-and-going (for obvious reasons). As such, it came before a lot of the
nifty SSA-based helpers were introduced. This commit modernizes it so that
we're no longer doing nearly as much manual banging on use/def sets.
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
| |
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
|
| |
Nothing produces it, and nothing can consume it.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Acked-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
All paths that produce GLSL IR for NIR lower ir_unop_log. All paths
that consume NIR will explode if they geta nir_op_flog.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Acked-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
Nothing produces it, and nothing can consume it.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Acked-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
All paths that produce GLSL IR for NIR lower ir_unop_exp. All paths
that consume NIR will explode if they geta nir_op_fexp.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Acked-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
instructions in affected programs: 380 -> 376 (-1.05%)
helped: 2
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Glenn Kennard <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
... and (a >= c) || (b >= c) as max(a, b) >= c.
Similar to commit 97e6c1b9.
total instructions in shared programs: 6182276 -> 6182180 (-0.00%)
instructions in affected programs: 6400 -> 6304 (-1.50%)
helped: 68
HURT: 4
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Glenn Kennard <[email protected]>
|
|
|
|
|
|
|
| |
No changes, but does prevent some regressions in the next commit.
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Glenn Kennard <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Helps the same set of programs as the previous commit.
instructions in affected programs: 4490 -> 4346 (-3.21%)
helped: 8
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Glenn Kennard <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Four shaders in Unreal 4's Sun Temple are helped, and gain SIMD16
because we avoid an integer multiplication.
instructions in affected programs: 2353 -> 2245 (-4.59%)
helped: 4
GAINED: 4
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Glenn Kennard <[email protected]>
|