| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5280>
|
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5048>
|
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5048>
|
|
|
|
|
|
|
|
|
|
| |
Similar to what we do in postsched. It is useful for pre-RA sched to be
a bit aware of things that would cause syncs. In particular for the tex
fetches, since the vecN src/dst tends to limit postsched's ability to
re-order them.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4923>
|
|
|
|
|
|
|
|
|
|
|
|
| |
If an instruction's only use is as an output, and it increases register
pressure, then try to avoid scheduling it until there are no other
options.
A semi-common pattern is `fragcolN.a = 1.0`, this pushes all these
immed loads to the end of the shader.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4923>
|
|
|
|
|
|
|
|
|
| |
Realize that certain instructions make a vecN live, and account for
this, in hopes of scheduling the remaining components of the vecN
sooner.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4440>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This replaces the depth-first search scheduler with a more traditional
ready-list scheduler. It primarily tries to reduce register pressure
(number of live values), with the exception of trying to schedule kills
as early as possible. (Earlier iterations of this scheduler had a
tendency to push kills later, and in particular moving texture fetches
which may not be necessary ahead of kills.)
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4440>
|
|
|
|
|
|
|
|
| |
This will need to be used in some cases for the upcoming bindless
support, plus ldc.k instructions which push data from a UBO to const
registers.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4358>
|
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4272>
|
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4071>
|
|
|
|
|
|
|
|
|
| |
They were inserting a nop between back to back SFU instrucions. But
that doesn't actually appear to be required. And they get stripped out
later anyways before legalize.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4071>
|
|
|
|
|
|
|
|
|
| |
In schedule live value tracking, differentiate between half vs full
precision. Half-precision live values are less costly than full
precision.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3989>
|
|
|
|
|
|
|
|
|
| |
Current scheduler thresholds try to ensure there are warps available to
switch to when hiding texture fetch latency. But if there is none to
hide, we should allow scheduler to use more registers to reduce nops.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3989>
|
|
|
|
|
|
|
|
|
| |
We want to do this only once. If we have post-RA sched pass, then we
don't want to do it pre-RA. Since legalize is where we resolve the
branch/jumps, we might as well move this into legalize.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
|
|
|
|
|
|
|
|
|
|
|
| |
This way we can deal with it in one place, *after* all the blocks have
been scheduled. Which will simplify life for a post-RA sched pass.
This has the benefit of already taking into account nop's that legalize
has to insert for non-delay related reasons.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
|
|
|
|
|
|
|
|
| |
We're going to want these also for a post-RA sched pass. And also to
split nop stuffing out into it's own pass.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
|
|
|
|
|
|
| |
So many open coded list iterators were getting annoying.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Add some infrastructure to trace scheduler decisions. The next patch
will add some more traces, just splitting this out to reduce clutter.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For pre-fs-dispatch texture fetch, we need to assign bary_ij to r0.x,
even if it is not used in the shader (ie. only varying use is for tex
coords). But if, for example, gl_FragCoord is used, it could get
assigned on top of bary_ij, resulting in a GPU hang.
The solution to this is two-fold: (1) the inputs/outputs rework has the
benefit of making RA realize bary_ij is a vec2, even if there are no
split/collect instructions (due to no varying fetches in the shader
itself). And (2) extend the live ranges of meta:input instructions to
the first non-input, to prevent RA from assigning the same register to
multiple inputs.
Backport note: because of (1) above, a better solution for 19.3 would be
to revert f30c256ec05.
Fixes: f30c256ec05 ("freedreno/ir3: enable pre-fs texture fetch for a6xx")
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
| |
We can at least get rid of the if-not-NULL check in a bunch of places.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
If I'm going to refactor a bit to use these meta instructions to also
handle input/output, then might as well cleanup the names first.
Nouveau also uses collect/split for names of these meta instructions,
and I like those names better.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
This makes it clear that it's a boolean test and not an action
(eg. "empty the list").
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a placeholder instruction to track texture fetches made prior to FS
shader dispatch. These, like meta:input instructions are scheduled
before any real instructions, so that RA realizes their result values
are live before the first real instruction. And to give legalize a way
to track usage of fetched sample requiring (sy) sync flags.
There is some related special handling for varying texcoord inputs used
for pre-fs-fetch, so that they are not DCE'd and remain in linkage
between FS and previous stage. Note that we could almost avoid this
special handling by giving meta:tex_prefetch real src arguments, except
that in the FS stage, inputs are actual bary.f/ldlv instructions.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
An instruction can reference only a single address register value.
Add an assert to catch bugs.
Also, address value should also be local to the same block as the
instruction.
(The one spot where changing the instruction address is actually legit
needs to clear the address first.)
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The live_values and use_count was not being properly updated. This
starts triggering problems with the next patch, where we allow copy
propagation for RELATIV access.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
| |
The aren't real instructions, and don't change # of live values, so no
point in them competing with real instructions.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For instructions that increase the # of live values, apply a threshold
to avoid scheduling them too early. And factor the net change of # of
live values that would result from scheduling an instruction, to
prioritize instructions that reduce number of live values as the number
of live values increases.
For manhattan:
total instructions in shared programs: 27869 -> 28413 (1.95%)
instructions in affected programs: 26756 -> 27300 (2.03%)
helped: 102
HURT: 87
total full in shared programs: 1903 -> 1719 (-9.67%)
full in affected programs: 1390 -> 1206 (-13.24%)
helped: 124
HURT: 9
The reduction in register usage nets ~20% gain in manhattan. (So
getting mediump support should be a huge win for gles gfxbench.)
Also significantly helps some of the more complex shadertoy shaders,
like IQ's Piano (32 to 18 regs, doubles fps).
The effect is less pronounced on smaller shaders.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
| |
Account for shader outputs and values live in any direct/indirect
successor block.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Not sure why new-style frag inputs start triggering this. But we
probably shouldn't consider src's from other blocks.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Not a perfect solution, and the "pressure" target is hard-coded. But it
doesn't really seem to much in the common case, and avoids exploding
register usage in dEQP ssbo tests.
So this should serve as a stop-gap solution until I have time to re-
write the scheduler.
Hurts slightly in instruction count, but gains (reduces) slightly the
register usage in shader-db. Fixes ~150 dEQP-GLES31.functional.ssbo.*
that were failing due to RA fail.
Signed-off-by: Rob Clark <[email protected]>
|
|
Move (most of) the ir3 compiler to src/freedreno/ir3 so that it can be
re-used by some future vulkan driver. The parts that are gallium
specific have been refactored out and remain in the gallium driver.
Getting the move done now so that it can happen before further
refactoring to support a6xx specific instructions.
NOTE also removes ir3_cmdline compiler tool from autotools build since
that was easier than fixing it and I normally use meson build. Waiting
patiently for the day that we can remove *everything* from the autotools
build.
Signed-off-by: Rob Clark <[email protected]>
|