| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
| |
The `ir3_shader` is the root mem ctx, with `ir3_shader_variant` hanging
off that, and various variant specific allocations hanging off the
variant.
This lets us delete a bunch of cleanup code.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5508>
|
|
|
|
|
|
|
| |
Prep to convert over to ralloc.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5508>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It seems a lot of the lowerings being run the second time were
unnecessary. In addition, when const_state is moved to the variant,
then it will become impossible to know ahead of time whether a variant
needs additional optimizing, which means that ir3_key_lowers_nir() needs
to go away. The new approach should have the same effect, since it skips
running lowerings that are unnecessary and then skips the opt loop if no
optimizations made progress, but it will work better when we move
ir3_nir_analyze_ubo_ranges() to be after variant creation.
The one maybe controversial thing I did is to make
nir_opt_algebraic_late() always happen during variant lowering. I wanted
to avoid code duplication, and it seems to me that we should push the
_late variants as far back as possible so that later opt_algebraic runs
don't miss out on optimization opportunities.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5508>
|
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5508>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The only difference between this and `const_state->num_ubos` was that
the latter is counting # of ubos loaded via `ldg` (based on UBO addrs
in push-consts). But turns out there isn't really any reason to care.
Instead just add an early return in the one code-path that cares about
the number of `ldg` UBOs.
This gets rid of one more thing we need to move from `ir3_shader` to
`ir3_shader_variant`.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5508>
|
|
|
|
|
|
|
|
|
| |
As with const_state, this will also need to move into the variant. To
simplify that, just move it into the const_state itself, since after all
it is related.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5508>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We are going to want to move this back to the variant, and come up with
a different strategy for binning/nonbinning to share the same constant
layout, in order to implement shader-cache support. (Since then we
can have a mix of dynamically compiled variants and cache hits, so there
is no good place to serialize the const-state.)
To reduce the churn as we re-arrange things, move direct access to the
const-state to a helper fxn. This patch is the boring churny part.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5508>
|
|
|
|
|
|
|
|
|
| |
Deduplicate a bit of hand-building of ir3_shader/_variant from
computerator and delay test. This also removes the need for
external things to depend on generated ir3_parser header.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5508>
|
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5458>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Rather than assuming a6xx+ means mergedregs. We can actually (mostly?)
do splitregs on a6xx as well. And GS/DS/HS currently require it, which
might be papering over a bug, or might be something to do with how
chaining shaders works. At any rate, we should at least be consistent,
and not have the compiler thinking we are doing mergedregs when we are
actually doing splitregs.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5458>
|
|
|
|
|
|
|
|
| |
Just pass thru the variant, since it has everything we need. And
will be needed in the next patch.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5458>
|
|
|
|
|
|
|
| |
Prep for the next patch.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5458>
|
|
|
|
|
|
|
|
| |
Allow different regset's to coexist, so we can make mergedregs vs split
reg file a variant property.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5458>
|
|
|
|
|
|
|
| |
It is only needed one place, let's move it there.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5458>
|
|
|
|
|
|
|
|
| |
Prep to make merged/split register file mode a property of the regmask,
rather than the ir3_register.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5458>
|
|
|
|
|
|
|
| |
In addition to duplicating what core NIR does better, this was wrong for
Vulkan, where it should be 0 as there are no non-bindless samplers.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5519>
|
|
|
|
|
|
|
|
|
|
| |
ir3_shader_from_nir() calls ir3_optimize_nir(), which currently sets up
the const state. However, we need to know the number of user consts
reserved by the driver before setting up the const state, which means
that this information needs to be passed into ir3_shader_from_nir()
somehow rather than being set in the shader.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5500>
|
|
|
|
|
|
|
|
|
|
| |
A pass to eliminate extra mov's from an array. We need to do this after
scheduling so we know that there are not any potentially conflicting
array writes between the original `mov` and it's use(s).
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2124
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5280>
|
|
|
|
|
|
|
| |
We'll also need this in the postsched-cp pass.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5280>
|
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5280>
|
|
|
|
|
|
|
|
|
| |
A bit cleaner than open coding the list manipulation. Plus I want to
use it in the next patch, rather than adding more open coded list
futzing.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5280>
|
|
|
|
|
|
|
|
|
| |
When a sequence of same instruction is encoded with repeat flag,
destination registers are written on successive cycles. Teach the
delay calculation about this.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5280>
|
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5280>
|
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5280>
|
|
|
|
|
|
|
|
|
| |
These two encodings are mutually exclusive. If the instruction is a
vector(ish) `(rptN)` instruction, then we can't fold a `(nopN)` post-
delay into it.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5280>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the `try_swap_mad_two_srcs()` case, valid_flags() gets called both
for the src that we want to try to fold, and for the other src that we
are trying to swap to make that possible. It can happen in the 2nd case
that a RELATIV src has already been folded. Since `ssa()` returns non-
null in both the `IR3_REG_SSA` and `IR3_REG_ARRAY` cases (in the later
case, it is the dependent array access that the current instruction
cannot be moved ahead of), we need to explicitly check that the src
reg we are looking at is still an SSA src.
Reported-by: Jonathan Marek <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5280>
|
|
|
|
|
|
|
|
| |
Verify that instructions which have a relative src and/or dest, have
`instr->address`.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5280>
|
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5280>
|
|
|
|
|
|
|
|
|
|
|
|
| |
It is better to use `nir_intrinsic_dest_components()` which also handles
the case of intrinsics with a fixed number of dest components.
Somehow this starts showing up with a nir_serialize round-trip with
shader-cache. But we really shouldn't have been relying on
`intr->num_components` directly.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5371>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The only reason for this dependency was the fd_bo used for the uploaded
shader. But this isn't used by turnip. Now that we've unified the
cleanup path from gallium, it isn't hard to pull the fd_bo upload/free
parts into ir3_gallium.
This cleanup has the added benefit that the shader disk-cache will not
have to deal with it.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5476>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ir3_nir_move_varying_inputs is broken when there a load input outside of
the first block which depends on the result of a previous load input.
This simplification/rework avoids the problem, and should also be faster.
Fixes this dEQP-VK test:
dEQP-VK.pipeline.multisample_interpolation.offset_interpolate_at_pixel_center.128_128_1.samples_2
Signed-off-by: Jonathan Marek <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5465>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Teach RA to setup additional interference to prevent textures fetched
before the FS starts from ending up in a register that is too high to
encode.
Fixes mis-rendering in multiple playcanv.as webgl apps.
Note that the regression was not actually 733bee57eb8's fault, but
that was the commit that exposed the problem.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3108
Fixes: 733bee57eb8 ("glsl: lower samplers with highp coordinates correctly")
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5431>
|
|
|
|
|
|
|
|
|
|
| |
This is mainly the "piglit optimization" (ie, since piglit launches an
separate process for for each test). It was never wired up for a6xx,
and makes register class setup unnecessarily complicated. Remove it to
simplify the next patch.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5431>
|
|
|
|
|
|
|
|
|
| |
Refactor a bit the limit checking in the bindless case, and add tex/samp
limit checking for the non-bindless case, to ensure we do not try to
prefetch textures which cannot be encoded in the # of bits available.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5431>
|
|
|
|
|
|
|
|
| |
I keep re-typing this from time to time when debugging various things.
Which is dumb.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5431>
|
|
|
|
|
|
|
|
| |
We advertize 4096 vec4s of GL uniform storage, but the HW can only store
512 vec4s in the const buffer.
Closes: #3049
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5273>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The GLES blob on the p3a limits constlen to 512 between VS and FS across
a6xx gpu ids (615, 630, 640, and 650). Experimentally, exceeding that
limit in any one stage results in rendering corruption or GPU hangs
(though my most detailed testing had a loop limit in a uniform, so that
may the cause of the hang). Clamp the limit we use inside of a shader so
we don't exceed it within a stage.
This commit doesn't resovle limiting inter-stage. Experimentally, I've
found that I can push up to a total of ~768 vec4s between VS and FS on
a630, with or without uniform updates between each draw. We'll need to do
some shader key-based limiting of constlen at draw time to respect that
limit, but that's left for future work, and this commit is enough for the
google earth case that initiated this work.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5273>
|
|
|
|
|
|
|
| |
The const state setup needs to be able to push its driver params, so
account for them in the analyze_ubo_ranges.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5273>
|
|
|
|
|
|
|
|
|
|
|
| |
It turns out the GL uniforms file is larger than the hardware constant
file, so we need to limit how many UBOs we lower to constbuf loads. To do
actual UBO loads, we'll need to be able to upload UBO 0's pointer or
descriptor.
No difference on nohw 1 UBO update drawoverhead case (n=35).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5273>
|
|
|
|
|
|
|
| |
The analysis pass gives us vec4-aligned size, and all of our other
constbuf allocations here are in vec4 units, so we can just divide by 16.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5273>
|
|
|
|
|
|
|
| |
If we filled the constbuf up with UBOs, we may need to avoid generating
more immediate push constants.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5273>
|
|
|
|
|
|
| |
There was duplicated handling in the callers that we can just move inside.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5273>
|
|
|
|
|
|
|
|
|
| |
Unlike other conditions which prevent early-discard of fragments, kill
does not prevent early LRZ test. Split `has_kill` from `no_earlyz` so
we can take advantage of this.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5298>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This allows us to do API specific checks before removing variable
without filling nir_remove_dead_variables() with API specific code.
In the following patches we will use this to support the removal
of dead uniforms in GLSL.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4797>
|
|
|
|
|
|
|
|
|
|
| |
This uses a meson builtin to handle -fvisibility=hidden. This is nice
because we don't need to track which languages are used, if C++ is
suddenly added meson just does the right thing.
Acked-by: Matt Turner <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4740>
|
|
|
|
|
|
|
|
|
|
|
| |
First element is not a scalar. Just initialize the struct like we do
elsewhere.
src/freedreno/ir3/disasm-a3xx.c:958:33: warning: suggest braces around
initialization of subobject [-Wmissing-braces]
Reviewed-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5174>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The closed GL driver uses resinfo on images with the writeonly flag (using
the texture-path's getsize only for readonly images). The closed vulkan
driver seems to use resinfo regardless. Using resinfo doesn't need any
fixups after the instruction. It also avoids one of the needs for the
TEX_CONST state for the image, which is awkward to set up in the GL
driver.
The new handler goes into ir3_a6xx to be next to the other current image
code, but the a4xx version is left in place because it wants a bunch of
sampler helpers.
Fixes assertion failure in dEQP-VK.image.image_size.buffer.readonly_32.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3501>
|
|
|
|
|
|
|
| |
There was an open coded version for ldc, and now we can drop that. I
needed to do it for resinfo as well.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3501>
|
|
|
|
|
|
|
|
| |
All the users of the unsigned result just wanted an ir3_instruction to
reference. Move a6xx's helpers to ir3_image.c and inline the old unsigned
results version.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3501>
|
|
|
|
|
|
|
|
| |
Noticed comparing our RESINFO asm to qcom's for the same test, and if I
drop this bit their disasm switches from immediate to reg. ldgb seems to
have the same behavior.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3501>
|