| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Add one member to the existed ir3_stream_output so that we could
assign location information from nir_xfb_info, rather than defining
new struct.
- Redefine maximum of so buffers, streams and outputs, which will be
used for turnip.
- Also enable caps for transform feedback for spirv_to_nir.
v2. Remove redefined maximums and use IR3_MAX_SO_* and add
IR3_MAX_SO_STREAMS.
v3. Remove the newly added location field so that we could keep aligned
with 32 bytes. Instead we create an array mapping between the location
and consecutive index, which is GL driver is doing.
Signed-off-by: Hyunjun Ko <[email protected]>
Reviewed-by: Jonathan Marek <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3942>
|
|
|
|
|
|
|
|
|
|
| |
Update postsched to be better aware of where costly (ss) syncs would
result. Sometimes it is better to allow a nop or two, to avoid a
sync quickly after an SFU.
Signed-off-by: Rob Clark <[email protected]>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4071>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4071>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the second (scalar pass) use the information about # of registers
used in the first pass as the target max, and round-robin within that
range. This generally gives the post-RA sched pass more opportunities
to re-order instructions to remove nop's.
Also, we can be a bit clever when assigning dest registers for SFU
instructions, by picking the register used for it's src (if available
and already assigned). This avoids some (ss) syncs caused by write
after read hazards. (Ie. the SFU instruction will read it's own src
before writing dest.)
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4071>
|
|
|
|
|
|
|
|
| |
We'll use the feedback from the first pass to select a target register
usage in the second pass.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4071>
|
|
|
|
|
|
|
|
| |
Also count tex-prefetch instructions. And only let the no-latency rule
kick in for frag shaders.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4071>
|
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4071>
|
|
|
|
|
|
|
|
|
| |
Doesn't take into account stalls that result from a register written in
a different block, etc. But this should be more useful than just using
number of (ss)'s by trying to estimate how costly a given sync is.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4071>
|
|
|
|
|
|
|
|
|
| |
They were inserting a nop between back to back SFU instrucions. But
that doesn't actually appear to be required. And they get stripped out
later anyways before legalize.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4071>
|
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4071>
|
|
|
|
|
|
| |
Not supported.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3929>
|
|
|
|
|
|
|
|
| |
Catch problems earlier when inputs are not setup correctly.
Signed-off-by: Rob Clark <[email protected]>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3989>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3989>
|
|
|
|
|
|
|
|
| |
Fixes:
dEQP-VK.glsl.texture_functions.query.texturesamples.sampler2dms_fixed_vertex
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3989>
|
|
|
|
|
|
|
|
| |
Fixes crash seen in:
dEQP-VK.glsl.conversions.matrix_to_matrix.mat4_to_mat3x4_vertex
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3989>
|
|
|
|
|
|
|
|
|
| |
Need to list_delinit() before we clone the instruction to split it into
individual samgpN instructions, otherwise we get list corruption.
Tested-by: Eduardo Lima Mitev <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3989>
|
|
|
|
|
|
|
|
|
|
| |
1) emperically, 10 seems like a more accurate # than 4
2) push "soft" delay handling into ir3_delayslots(), as
we should also be using it to calculate the costs
that the schedulers use
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3989>
|
|
|
|
|
|
|
|
|
| |
In schedule live value tracking, differentiate between half vs full
precision. Half-precision live values are less costly than full
precision.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3989>
|
|
|
|
|
|
|
|
|
| |
Current scheduler thresholds try to ensure there are warps available to
switch to when hiding texture fetch latency. But if there is none to
hide, we should allow scheduler to use more registers to reduce nops.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3989>
|
|
|
|
|
|
|
|
|
|
|
| |
To avoid spurious sync flags, we want to, for a6xx+, operate in terms of
half-regs, with a full precision register testing the corresponding two
half-regs that it conflicts with.
And while we are at it, stop open-coding BITSET
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3989>
|
|
|
|
|
|
|
| |
No longer used.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3989>
|
|
|
|
|
|
|
|
| |
No longer used, other than in ir3 cmdline compiler, where it can be
replaced with a local variable.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3989>
|
|
|
|
|
|
|
|
|
| |
This way we can also use ir3_print from computerator, which mostly
bypasses the ir3_block construct (since it doesn't need to do
scheduling, etc)
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3926>
|
|
|
|
|
|
|
|
|
| |
Updates for differences between fdre-a3xx's early version of ir3, and
what we have now in mesa. And updates for instruction name and syntax
changes.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3926>
|
|
|
|
|
|
|
|
|
| |
This cuts out a bunch of deref chain walking that the compiler can do for
us.
Reviewed-by: Kenneth Graunke <[email protected]>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3728>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3728>
|
|
|
|
|
|
|
|
| |
GLuint worked fine for storing our enum, but it should be an enum
pipe_format since the image-formats merge.
Reviewed-by: Kenneth Graunke <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3728>
|
|
|
|
|
|
|
|
| |
Now that we have access to the interior switch statement not going through
the txs special case for coord_components, we can just use it.
Reviewed-by: Kenneth Graunke <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3728>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This lowers mediump FS outputs to fp16 in the ir3 backend. For now
this is a modest improvement, which mostly helps us whittle down the
full mediump work. Once the GLSL level support lands, then right hand
side of the store output intrinsics will be fp16 expressions and we'll
cancel out the fp16 -> fp32 -> fp 16 round trip here.
We've had different attempts at implementing this: rewriting stores in
the GLSL IR, lowering GLSL IR outputs to temporaries and inserting
conversions when writing the temporaries to the outputs. In the end,
GLSL ends up getting in the way a lot and doing it at the nir level is
easier and still possible since we have the output var precisions.
This part of the fp16 work is more of a step on the way towards full
fp16 support and will add a few extra conversion instructions:
total instructions in shared programs: 8151 -> 8163 (0.15%)
instructions in affected programs: 1187 -> 1199 (1.01%)
helped: 4
HURT: 10
total nops in shared programs: 3146 -> 3152 (0.19%)
nops in affected programs: 563 -> 569 (1.07%)
helped: 5
HURT: 10
total non-nops in shared programs: 5005 -> 5011 (0.12%)
non-nops in affected programs: 92 -> 98 (6.52%)
helped: 0
HURT: 3
total dwords in shared programs: 12832 -> 12800 (-0.25%)
dwords in affected programs: 96 -> 64 (-33.33%)
helped: 1
HURT: 0
total last-baryf in shared programs: 118 -> 115 (-2.54%)
last-baryf in affected programs: 21 -> 18 (-14.29%)
helped: 1
HURT: 0
total full in shared programs: 424 -> 417 (-1.65%)
full in affected programs: 15 -> 8 (-46.67%)
helped: 7
HURT: 0
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3822>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3822>
|
|
|
|
|
|
|
|
| |
So far we only handle full regs of arrays during pre-allocation.
This patch is to handle half regs of arrays and also consider the size
of half regs when finding out conflicts.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3822>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This pass tries to fold f2f16 conversion into alu instructions.
This will be useful to help reduce the number of instructions once
mesa starts supporting precision lowering. For example:
add.f r0.w, r0.w, c0.x
cov.f32f16 hr2.x, r0.w
to
add.f hr2.x, r0.w, c0.x
Additionally this pass also tries to fold f2f16 conversion into load_input
instruction:
bary.f r0.x, 3, r0.w
cov.f32f16 hr0.x, r0.x
to
bary.f hr1.x, 3, r0.x
v2: Edit to not fold OPC_MAX_F and OPC_MIN_F, since that's not valid.
v3: Add OPC_ABSNEG_F to the blacklist as well.
v4: Don't remove dead cov instructions, DCE will do that later; don't
iterate through sources when a cov only has one; remove special
handling of IR3_REG_ARRAY and IR3_REG_RELATIV.
v5: Handle folding into u32.u32 movs of floats correctly, don't bail
out on IR3_REG_RELATIV or IR3_REG_ARRAY movs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3822>
|
|
|
|
|
| |
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3737>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3737>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous commit leads to match immed values unexpectedly.
This makes constlen for each shader including bvert wrong.
Also fixes atan2 for mediump deqp tests.
Fixes: cbd1f47433b ("freedreno/ir3: convert back to 32-bit values for half constant registers.")
v2: Move conversion up above fabs/fneg modifier handling as well.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3737>
|
|
|
|
|
|
| |
v2: Reworked to assign half-opcodes in ir3_ra.c (krh).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3737>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A sequence like:
(nop3)cov.f32f16 hr0.x, c0.x
mul.f hr4.y, hr1.z, hr0.x
can be turned into:
mul.f hr4.y, hr1.z, hc0.x
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3737>
|
|
|
|
| |
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3737>
|
|
|
|
|
|
|
|
| |
This lets is_same_type_reg() recognize that the dst and src of the
immediate MOV are the same and unblocks fp16 constant propagation.
Signed-off-by: Kristian H. Kristensen <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3737>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This means you can directly use format utils on it without having to have
your own GL enum to number-of-components switch statement (or whatever) in
your vulkan backend.
Thanks to imirkin for fixing up the nouveau driver (and a couple of core
details).
This fixes the computed qualifiers for EXT_shader_image_load_store's
non-integer sizeNxM qualifiers, which we don't have tests for.
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]> (v3d)
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3355>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3355>
|
|
|
|
|
|
|
|
|
|
|
| |
Lies, damn lies, and leftover hacks!
We no longer hard-code these two, so fix the disasm to print the correct
values.
Signed-off-by: Rob Clark <[email protected]>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
|
|
|
|
|
|
|
|
|
|
|
| |
In some cases we need to split components out from what was already a
collect. That was making it hard to DCE unused components of the
collect. (Ie. unused components of fragcoord, etc)
So just detect this case and skip the chained collect+split.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was somehow working to create the instructions in a random block,
and use the value in other blocks, by dumb luck. But two-pass-RA's
better choice of register assignment causes a couple dEQPs to start
failing without this fix:
dEQP-GLES3.functional.shaders.metamorphic.bubblesort_flag.variant_1
dEQP-GLES3.functional.shaders.metamorphic.bubblesort_flag.variant_2
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
|
|
|
|
|
|
|
|
| |
Just killing the SSA link isn't enough. It confuses RA, legalize,
and postsched to see a bogus unused reg.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
|
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
|
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
|
|
|
|
|
|
|
|
| |
This apparently can happen with gs/tess. And will cause problems with
two-pass-ra, so lets just skip them.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some of the aspects of tex prefetch are in common with normal tex
instructions, such as having a wrmask to control which components
are written. Add a helper for this.
This should result in actually using the prefetch wrmask to avoid
fetching unneeded components.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
|
|
|
|
|
|
|
|
| |
ra_block_compute_live_ranges() treats zero as "not yet defined", so
probably best to not let this be a valid instruction #
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
After RA, we can schedule to increase parallelism (reduce nop's) without
worrying about increasing register pressure. This pass lets us cut down
the instruction count ~10%, and prioritize bary.f, kill, etc, which
would tend to increase register pressure if we tried to do that before
RA.
It should be more useful if RA round-robin'd register choices.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
kill (and other cat0/flow instructions) do not have a dst register.
Which was mostly harmless before, other than RA thinking it would need
a free register to write. (But nothing consumed it, so the value would
be immediately dead.) But this would cause more problems with postsched
which would see a bogus dependency.
Also, post-RA sched *does* need to see the dependency on the predicate
register.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Originally these were nested functions, which worked nicely, giving us
the function of a local macro that was actual 'c' syntax (ie. not token
pasted macro). But these were converted to macros because clang doesn't
let us have nice gcc extensions.
Extract these back out into functions, before adding more things and
making the macros even more cumbersome.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
|
|
|
|
|
|
|
| |
Also dump where arrays are allocated. This was useful for debugging.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
|
|
|
|
|
|
|
|
| |
A post-RA sched pass will move the extra mov's to the wrong place, so
rework the fixup so it can run after RA (and therefore after postsched)
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
|
|
|
|
|
|
|
|
|
| |
We want to do this only once. If we have post-RA sched pass, then we
don't want to do it pre-RA. Since legalize is where we resolve the
branch/jumps, we might as well move this into legalize.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
|