| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Unlike the current CSE pass, global value numbering is capable of detecting
common values even if one does not dominate the other. For instance, in
you have
if (...) {
ssa_1 = ssa_0 + 7;
/* use ssa_1 */
} else {
ssa_2 = ssa_0 + 7;
/* use ssa_2 */
}
Global value numbering doesn't care about dominance relationships so it
figures out that ssa_1 and ssa_2 are the same and converts this to
if (...) {
ssa_1 = ssa_0 + 7;
/* use ssa_1 */
} else {
/* use ssa_1 */
}
Obviously, we just broke SSA form which is bad. Global code motion,
however, will repair this for us by turning this into
ssa_1 = ssa_0 + 7;
if (...) {
/* use ssa_1 */
} else {
/* use ssa_1 */
}
This intended to eventually mostly replace CSE. However, conventional CSE
may still be useful because it's less of a scorched-earth approach and
doesn't require GCM. This makes it a bit more appropriate for use as a
clean-up in a late optimization run.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Found by inspection. Untested beyond compilation. This also matches the
logic used in nir_lower_alu_to_scalar.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
|
|
| |
In aad4f1550, we removed the concept of "fake" edges from NIR. Now, if you
have a block at the end of an infinite loop it really has no predecessors.
This updates the unit tests to match.
Signed-off-by: Jason Ekstrand <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97587
Tested-by: Aaron Watry <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
| |
This was let over from aad4f15506c
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
I accidentally added these with 0dc4cab. Oops!
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In 144cbf8 ("nir: Make nir_opt_remove_phis see through moves."), Ken
made nir_opt_remove_phis able to coalesce phi nodes whose sources are
all moves with the same swizzle. However, he didn't add the logic
necessary for handling the fact that the phi may now have multiple
different sources, even though the sources point to the same thing. For
example, if we had something like:
if (...)
a1 = b.yx;
else
a2 = b.yx;
a = phi(a1, a2)
... = a
then we would rewrite it to
if (...)
a1 = b.yx;
else
a2 = b.yx;
... = a1
by picking a random phi source, which in this case is invalid because
the source doesn't dominate the phi. Instead, we need to change it to:
if (...)
a1 = b.yx;
else
a2 = b.yx;
a3 = b.yx;
... = a3;
Fixes 12 CTS tests:
ES31-CTS.functional.tessellation.invariance.outer_edge_symmetry.quads*
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
And re-implement nir_after_cf_node_and_phis() using it.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When NIR was first introduced, Connor added this fake-edge hack to work
around issues related to unreachable blocks. Thanks to GLSL IR's jump
lowering code, the only unreachable code you can have is a block after an
infinite loop. With SPIR-V, we didn't have the jump lowering code so we
could also end up with the "if (...) { break; } else { continue; }" case
which generates an unreachable block after the if. Because of this, most
of NIR had to be fixed up for handling unreachable blocks. The only
remaining case of not handling unreachable blocks was specifically the
block-after-infinite-loop case in dead_cf which was fixed by the previous
commit. We can now delete the fake edge hack.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
| |
vc4 is about to start using the shader info field to set up discard
handling.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Jason suggested adding an assert(function->impl) here. All callers
of this function actually want ->impl, so I decided just to change
the API.
We also change the nir_lower_io_to_temporaries API here. All but one
caller passed nir_shader_get_entrypoint(), and with the previous commit,
it now uses a nir_function_impl internally. Folding this change in
avoids the need to change it and change it back.
v2: Fix one call I missed in ir3_compiler (caught by Eric).
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This changes the pass internals to work with a nir_function_impl
directly rather than a nir_function. The next patch will change
the API.
v2: Rebase after framebuffer fetch landed.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This requires emitting a series of copies at the top of the program
from each output variable to the corresponding temporary. The initial
copy can be skipped for non-framebuffer fetch outputs whose initial
value is undefined, and the final copy needs to be skipped for
read-only outputs (i.e. gl_LastFragData), since it would be illegal to
emit a store output intrinsic for it.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The NIR representation of framebuffer fetch is the same as the GLSL
IR's until interface variables are lowered away, at which point it
will be translated to load output intrinsics. The GLSL-to-NIR pass
just needs to copy the bits over to the NIR program.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In some programs, we can have very deep dominance trees and the recursion
can cause us to risk stack overflows. Instead, we replace the recursion
with a pair of loops, one at the start and one at the end. This is
functionally equivalent to what we had before and it's actually a bit
easier to read in the new form without the recursion.
Signed-off-by: Jason Ekstrand <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97225
Reviewed-by: Connor Abbott <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Prior to this commit rename_variables_block() is recursively called,
performing a depth-first traversal of the control flow graph. The
function uses a non-trivial amount of stack space for local variables,
which puts us in danger of smashing the stack, given a sufficiently deep
dominance tree.
XCOM: Enemy Within contains a shader with such a dominance tree (1574
nir_blocks in total, depth of at least 143).
Jason tells me that he believes that any walk over the nir_blocks that
respects dominance is sufficient (a DFS might have been necessary prior
to the introduction of nir_phi_builder).
In fact, the introduction of nir_phi_builder made the problem worse:
rename_variables_block(), walks to the bottom of the dominance tree
before calling nir_phi_builder_value_get_block_def() which walks back to
the top of the dominance tree...
In any case, this patch ensures we avoid that problem as well.
Cc: [email protected]
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97225
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
| |
Without this the following line will segfault and we don't get to
see the results of the validate_assert() above.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
Generally you'd see the gl_Color reference first and get some cursor set.
However, in piglit draw-pixel-with-texture we're now seeing the TexCoord
dereferenced first.
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
| |
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the GLSL-to-NIR conversion of VC4, I had a bit of trouble with what I
was calling the "state uniforms" that I was putting into the NIR fighting
with its other lowering passes. Instead of using magic uniform base
numbers in the backend, follow the lead of load_user_clip_plane and just
define system values for them.
v2: Fix unintended change to channel_num, drop unspecified const_index
value on blend_const_color_r_float.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
vc4 wants to have per-scalar IO load/stores so that dead code elimination
can happen on a more granular basis, which it has been doing in the
backend using a multiplication by 4 of the intrinsic's driver_location.
We can represent it properly in the NIR using the first_component field,
though.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The previous nir_load_system_value(b, nir_intrinsic_load_whatever), 0) was
rather verbose, when system values should be easy to generate.
The index is left out because only one system value had an index included
in it.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
I wanted to include this from nir_builder as well, so it also needed the
undefs.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
GLSL-to-NIR generates system value usage, and vc4/freedreno would both
like the system value instead of the varying, so switch this pass over to
it.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
According to Connor, it's safe to assume that the first operand of
bcsel, as well as the operand of b2f and b2i, must be well formed
booleans.
https://lists.freedesktop.org/archives/mesa-dev/2016-August/125658.html
With the previous improvements to a@bool handling, this now has no
change in shader-db instruction counts on Broadwell.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
load_front_face and load_helper_invocation produce booleans.
On Broadwell:
total instructions in shared programs: 11638956 -> 11638011 (-0.01%)
instructions in affected programs: 115093 -> 114148 (-0.82%)
helped: 628
HURT: 14
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
I don't want src_is_bool() and src_is_type(x, nir_type_bool) to behave
differently. Having the logic spread out over three functions makes it
harder to decide where to put new logic, as well.
So, combine them all. It's a bit simpler because there's now only one
recursive function rather than a pair of mutually recursive functions.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, 'a@type' can only match if 'a' is produced by an ALU
instruction. This is rather limited - there are other cases we
can easily detect which we should handle.
Extending the code in-place would be fairly messy, so we introduce
a new src_is_type() helper.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The first simply picks the bany_inequal[234] opcodes based on the SSA
def's number of components. The latter implicitly compares with zero
to achieve the same semantics of GLSL's any().
Cc: [email protected]
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Alejandro Piñeiro <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some shaders include code that looks like:
uniform int i;
uniform vec4 bones[...];
foo(bones[i * 3], bones[i * 3 + 1], bones[i * 3 + 2]);
CSE would do some work on this:
x = i * 3
foo(bones[x], bones[x + 1], bones[x + 2]);
The compiler may then add '<< 4 + base' to the index calculations.
This results in expressions like
x = i * 3
foo(bones[x << 4], bones[(x + 1) << 4], bones[(x + 2) << 4]);
Just rearranging the math to produce (i * 48) + 16 saves an
instruction, and it allows CSE to do more work.
x = i * 48;
foo(bones[x], bones[x + 16], bones[x + 32]);
So, ~6 instructions becomes ~3.
Some individual shader-db results look pretty bad. However, I have a
really, really hard time believing the change in estimated cycles in,
for example, 3dmmes-taiji/51.shader_test after looking that change in
the generated code.
G45
total instructions in shared programs: 4020840 -> 4010070 (-0.27%)
instructions in affected programs: 177460 -> 166690 (-6.07%)
helped: 894
HURT: 0
total cycles in shared programs: 98829000 -> 98784990 (-0.04%)
cycles in affected programs: 3936648 -> 3892638 (-1.12%)
helped: 894
HURT: 0
Ironlake
total instructions in shared programs: 6418887 -> 6408117 (-0.17%)
instructions in affected programs: 177460 -> 166690 (-6.07%)
helped: 894
HURT: 0
total cycles in shared programs: 143504542 -> 143460532 (-0.03%)
cycles in affected programs: 3936648 -> 3892638 (-1.12%)
helped: 894
HURT: 0
Sandy Bridge
total instructions in shared programs: 8357887 -> 8339251 (-0.22%)
instructions in affected programs: 432715 -> 414079 (-4.31%)
helped: 2795
HURT: 0
total cycles in shared programs: 118284184 -> 118207412 (-0.06%)
cycles in affected programs: 6114626 -> 6037854 (-1.26%)
helped: 2478
HURT: 317
Ivy Bridge
total instructions in shared programs: 7669390 -> 7653822 (-0.20%)
instructions in affected programs: 388234 -> 372666 (-4.01%)
helped: 2795
HURT: 0
total cycles in shared programs: 68381982 -> 68263684 (-0.17%)
cycles in affected programs: 1972658 -> 1854360 (-6.00%)
helped: 2458
HURT: 307
Haswell
total instructions in shared programs: 7082636 -> 7067068 (-0.22%)
instructions in affected programs: 388234 -> 372666 (-4.01%)
helped: 2795
HURT: 0
total cycles in shared programs: 68282020 -> 68164158 (-0.17%)
cycles in affected programs: 1891820 -> 1773958 (-6.23%)
helped: 2459
HURT: 261
Broadwell
total instructions in shared programs: 9002466 -> 8985875 (-0.18%)
instructions in affected programs: 658784 -> 642193 (-2.52%)
helped: 2795
HURT: 5
total cycles in shared programs: 78503092 -> 78450404 (-0.07%)
cycles in affected programs: 2873304 -> 2820616 (-1.83%)
helped: 2275
HURT: 415
Skylake
total instructions in shared programs: 9156978 -> 9140387 (-0.18%)
instructions in affected programs: 682625 -> 666034 (-2.43%)
helped: 2795
HURT: 5
total cycles in shared programs: 75591392 -> 75550574 (-0.05%)
cycles in affected programs: 3192120 -> 3151302 (-1.28%)
helped: 2271
HURT: 425
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Thomas Helland <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously we would not print a swizzle on ssa_52 when only its .x
component is used (as seen in the definition of ssa_53):
vec3 ssa_52 = fadd ssa_51, ssa_51
vec1 ssa_53 = flog2 ssa_52
vec1 ssa_54 = flog2 ssa_52.y
vec1 ssa_55 = flog2 ssa_52.z
But this makes the interpretation of the RHS of the definition difficult
to understand and dependent on the size of the LHS. Just print swizzles
when they are not the identity swizzle, so the previous example is now
printed as:
vec3 ssa_52 = fadd ssa_51.xyz, ssa_51.xyz
vec1 ssa_53 = flog2 ssa_52.x
vec1 ssa_54 = flog2 ssa_52.y
vec1 ssa_55 = flog2 ssa_52.z
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I found a shader in Tales of Maj'Eyal that contains:
if ssa_21 {
block block_1:
/* preds: block_0 */
...instructions that prevent the select peephole...
vec1 32 ssa_23 = imov ssa_4
vec1 32 ssa_24 = imov ssa_4.y
vec1 32 ssa_25 = imov ssa_4.z
/* succs: block_3 */
} else {
block block_2:
/* preds: block_0 */
vec1 32 ssa_26 = imov ssa_4
vec1 32 ssa_27 = imov ssa_4.y
vec1 32 ssa_28 = imov ssa_4.z
/* succs: block_3 */
}
block block_3:
/* preds: block_1 block_2 */
vec1 32 ssa_29 = phi block_1: ssa_23, block_2: ssa_26
vec1 32 ssa_30 = phi block_1: ssa_24, block_2: ssa_27
vec1 32 ssa_31 = phi block_1: ssa_25, block_2: ssa_28
Here, copy propagation will bail because phis cannot perform swizzles,
and CSE won't do anything because there is no dominance relationship
between the imovs. By making nir_opt_remove_phis handle identical moves,
we can eliminate the phis and rewrite everything to use ssa_4 directly,
so all the moves become dead and get eliminated.
I don't think we need to check "exact" - just the alu sources.
Presumably phi sources should match in their exactness.
On Broadwell:
total instructions in shared programs: 11639872 -> 11638535 (-0.01%)
instructions in affected programs: 134222 -> 132885 (-1.00%)
helped: 338
HURT: 0
v2: Fix return value to be NULL, not false (caught by Iago).
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On Broadwell:
total instructions in shared programs: 11640214 -> 11639872 (-0.00%)
instructions in affected programs: 17744 -> 17402 (-1.93%)
helped: 78
HURT: 0
total spills in shared programs: 2924 -> 2922 (-0.07%)
spills in affected programs: 104 -> 102 (-1.92%)
helped: 1
HURT: 0
total fills in shared programs: 4394 -> 4389 (-0.11%)
fills in affected programs: 237 -> 232 (-2.11%)
helped: 1
HURT: 0
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
nir_opt_peephole_select has the job of removing IF statements with no side
effects. However, if the IF statement's successor didn't have any
instructions in it, we were skipping it, which occurred in mupen64 on vc4
with glsl_to_nir enabled:
instructions in affected programs: 6134 -> 4120 (-32.83%)
total uniforms in shared programs: 38268 -> 38219 (-0.13%)
No changes on Haswell shader-db.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
Looks like a copy and paste error from f752effa087
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Alejandro Piñeiro <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
I do appreciate the cleverness, but unfortunately it prevents a lot more
cleverness in the form of additional compiler optimizations brought on
by -fstrict-aliasing.
No difference in OglBatch7 (n=20).
Co-authored-by: Davin McCall <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
"flat centroid" and "flat sample" both just mean "flat", so we should
ignore interpolateAtCentroid/Sample and just return the flat value.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97032
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
On i965, we can't support coordinate offsets for texelFetch or rectangle
textures. Previously, we were doing this with a GLSL pass but we need to
do it in NIR if we want those workarounds for SPIR-V.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Cc: "12.0" <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Cc: "12.0" <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Cc: "12.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 52e75dcb8c04c0dde989970c4c587cbe8313f7cf made nir_lower_io
start using nir_intrinsic_set_base instead of writing const_index[0]
directly. However, those intrinsics apparently don't /have/ a base,
so this caused assert failures.
However, the old code was happily setting non-existent const_index
fields, so it was pretty bogus too.
Jason pointed out that load_shared and store_shared have a base,
and that the i965 driver uses that field. So presumably atomics
should have one as well, so that loads/stores/atomics all refer
to variables with consistent addressing.
Cc: "12.0" <[email protected]>
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This makes sure we give the correct driver location
for doubles when using component packing. Specifically
it handles packing a dvec3 with a double which is the
only packing scenario allowed which spans across two
locations.
Acked-by: Kenneth Graunke <[email protected]>
Reviewed-by: Alejandro Piñeiro <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Cc: "12.0" <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Cc: "12.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now nir_lower_io can optionally produce load_interpolated_input
and load_barycentric_* intrinsics for fragment shader inputs.
flat inputs continue using regular load_input.
v2: Use a nir_shader_compiler_options flag rather than ad-hoc boolean
passing (in response to review feedback from Chris Forbes).
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Backends can normally handle shader inputs solely by looking at
load_input intrinsics, and ignore the nir_variables in nir->inputs.
One exception is fragment shader inputs. load_input doesn't capture
the necessary interpolation information - flat, smooth, noperspective
mode, and centroid, sample, or pixel for the location. This means
that backends have to interpolate based on the nir_variables, then
associate those with the load_input intrinsics (say, by storing a
map of which variables are at which locations).
With GL_ARB_enhanced_layouts, we're going to have multiple varyings
packed into a single vec4 location. The intrinsics make this easy:
simply load N components from location <loc, component>. However,
working with variables and correlating the two is very awkward; we'd
much rather have intrinsics capture all the necessary information.
Fragment shader input interpolation typically works by producing a
set of barycentric coordinates, then using those to do a linear
interpolation between the values at the triangle's corners.
We represent this by introducing five new load_barycentric_* intrinsics:
- load_barycentric_pixel (ordinary variable)
- load_barycentric_centroid (centroid qualified variable)
- load_barycentric_sample (sample qualified variable)
- load_barycentric_at_sample (ARB_gpu_shader5's interpolateAtSample())
- load_barycentric_at_offset (ARB_gpu_shader5's interpolateAtOffset())
Each of these take the interpolation mode (smooth or noperspective only)
as a const_index, and produce a vec2. The last two also take a sample
or offset source.
We then introduce a new load_interpolated_input intrinsic, which
is like a normal load_input intrinsic, but with an additional
barycentric coordinate source.
The intention is that flat inputs will still use regular load_input
intrinsics. This makes them distinguishable from normal inputs that
need fancy interpolation, while also providing all the necessary data.
This nicely unifies regular inputs and interpolateAt functions.
Qualifiers and variables become irrelevant; there are just
load_barycentric intrinsics that determine the interpolation.
v2: Document the interp_mode const_index value, define a new
BARYCENTRIC() helper rather than using SYSTEM_VALUE() for
some of them (requested by Jason Ekstrand).
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|