| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
| |
Similiar to iadd, we can fold an added constant value from an imad24_ir3
into the load_uniform's constant offset. This avoids some cases where
the addition of imad24_ir3 could otherwise be a regression in instr
count.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Eduardo Lima Mitev <[email protected]>
|
|
|
|
|
|
|
|
| |
This maps to mul.s24
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Eduardo Lima Mitev <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We can't encode immed sources for cat3 (mad) instructions, but we can
use const in first or third src. We handled this case already, but we
weren't considering that we could lower immed to const.
For manhattan:
total instructions in shared programs: 35202 -> 34718 (-1.37%)
instructions in affected programs: 14931 -> 14447 (-3.24%)
helped: 90
HURT: 0
total full in shared programs: 2451 -> 2359 (-3.75%)
full in affected programs: 653 -> 561 (-14.09%)
helped: 69
HURT: 2
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Eduardo Lima Mitev <[email protected]>
|
|
|
|
|
|
|
| |
Lower amul to either imul or imul24, depending on whether 24b is enough
bits to calculate an offset within the thing being dereferenced.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
Simply emit an ir3_MAD_S24 instruction in the backend.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Eduardo Lima Mitev <[email protected]>
|
|
|
|
|
|
|
|
| |
to mul.s24/mul.u24, to better reflect that these are 24b multiply.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Eduardo Lima Mitev <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Eduardo Lima Mitev <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The pass should run once at the end of shader compilation, for a4xx
onwards. It iterates texture sampling instructions and mark those
eligibile for pre-dispatch by changing the tex op from 'tex' to
'tex_prefetch'. An instruction is eligibile if:
* The coordinate is a vector where all its components come from a
shader input.
* The order of the components match exactly that of the input (no
swizzles).
* The instruction is in the 'main' function, and in the outer
most-block.
The first two restrictions were arrived to empirically, so more
testing could tighten or loosen it.
The 3rd restriction is there to allow moving the instructions
eligible for pre-dispatch to the beginning of the shader, so
that we don't block the registers holding the result for too
long.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
It seems that pre-fs texture fetch only works if ij_pix ends up in r0.x.
I've tried unknown zero bits, to no avail, and blob also seems to force
r0.x when this feature is used.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
| |
Useful to see in disassembly listing texture fetches that were moved to
pre-dispatch.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
If the only use of varyings is a pre-shader texture-fetch, we still need
to issue a bary.f with the end-input flag, otherwise we'll block further
VS invocations, as the hw will think varying storage is still busy.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
It is possible that the result of a pre-fs texture fetch is an output
(or partially an output) of the FS. Sine the meta:tex_prefetch
instructions are dropped before the assembler, we need to account for
this when we fixup the register footprint.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a placeholder instruction to track texture fetches made prior to FS
shader dispatch. These, like meta:input instructions are scheduled
before any real instructions, so that RA realizes their result values
are live before the first real instruction. And to give legalize a way
to track usage of fetched sample requiring (sy) sync flags.
There is some related special handling for varying texcoord inputs used
for pre-fs-fetch, so that they are not DCE'd and remain in linkage
between FS and previous stage. Note that we could almost avoid this
special handling by giving meta:tex_prefetch real src arguments, except
that in the FS stage, inputs are actual bary.f/ldlv instructions.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
When we enable pre-dispatch texture fetch, we could have a scenario
where the barycentric i/j coord sysval is not used in the shader, but
only used for the varying fetch for the pre-dispatch texture fetch.
In this case we need to take care not to DCE this sysval.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
| |
Will be needed for special handling of SYSTEM_VALUE_BARYCENTRIC_PIXEL
(ij_pix) when pre-fs texture fetch is enabled.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
| |
Not sure I remember how long this has been unused for. But it's unused
now.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Eduardo Lima Mitev <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
| |
When we don't have streamout enabled, we have to read this register to
get the number of primitives emitted.
Signed-off-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
When used in a GS pipeline, the VS doesn't end with the END
instruction. Instead it chains to the GS, which continues running with
the same register allocation. The intended use cases seems to be that
you can compile a regular VS (ie outputs in registers and ending with
END) but then tack on link-time generated code past the END to write
the outputs using STLW, in case the VS is used with GS.
Signed-off-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
| |
We don't know what kind of loads we might have to wait on when coming
in from chsh in the VS so set both sync flags.
Signed-off-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
These sysvals have to be unclobbered by VS and in the same registers
in both VS and GS, since the chsh from VS to GS doesn't reload the
values. We use the pre-color argument to ir3_ra() to always place
these values in r0.x and r0.y.
Signed-off-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Inputs are the GS header, which contains vertex ID, local primitive ID
and thread ID as well as primitive ID. The setup is a little different
from other sysvals, since we always have to receive them in the VS so
that it can pass them on into the GS.
The vertex flag outputs from GS is set up as a proper nir output in
the lowering pass and doesn't need special handling here.
Signed-off-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
| |
This implements the load_vs_primitive_stride_ir3,
load_vs_vertex_stride_ir3 and load_primitive_location_ir3 intrinsics,
used for getting the primitive layout strides and locations.
Signed-off-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
| |
This introduces two new lowering passes. One to lower VS to explicit
outputs using STLW and one to lower GS to load input using LDLW and
implement the GS specific functionality.
Signed-off-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
| |
Since the presence of GS changes how the VS operates we need to track
that in the shader key.
Signed-off-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
| |
These intrinsics will let us do all the offset calculations in nir,
which is nicer to work with and lets nir_opt_algebraic eat it all up.
Signed-off-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
| |
These access memory used for passing data between geometry stages.
Signed-off-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
| |
We'll need to pre-color certain input registers betwee VS and GS
shaders.
Signed-off-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
| |
Before, offset held the offset, which can be either immediate or a
register. Use a third register to hold the offset so that we can use
a register.
Signed-off-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
| |
Just add the constructors for now and special case similar to END so
we don't remove them.
Signed-off-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
| |
Signed-off-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
This allows us to make sure clipdist is emitted as a scalar array rather
than two vec4s. This matches SPIR-V semantics, and will be useful for
Zink.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
| |
Not perfect but gets through some tests.
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
| |
Variables with same location should use the same driver_location.
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
| |
Avoids assert fails in ir3.
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
| |
Taken from nir_lower_samplers. Sampler arrays don't work though, this is
just to avoid an assert fail in ir3.
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
| |
Fix some mistakes in previous series.
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
| |
Notably includes centroid varying bits that were missing.
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
| |
This is what we do in freedreno.
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
| |
Noticed while debugging a tiling-looking issue by comparing our gmem
blit setup to freedreno's.
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
| |
Fixes ir3 compiler failure failure in
dEQP-VK.renderpass.dedicated_allocation.formats.r8g8b8a8_unorm.clear.clear_draw
(now just a rendering failure where the subpass clear isn't happening)
Reviewed-by: Kristian H. Kristensen <[email protected]>
|