| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A while ago, we started deferring GEM object closure and VMA release
until buffers were idle. This had some unforeseen interactions with
external buffers.
We keep imported buffers in hash tables, so if we have repeated imports
of the same GEM object, we map those to the same iris_bo structure.
This is critical for several reasons. Unfortunately, we broke this
assumption. When freeing a non-idle external buffer, we would drop it
from the hash tables, then move it to the zombie list. If someone
reimported the same GEM object, we would not find it in the hash tables,
and go ahead and make a second iris_bo for that GEM object. But the old
iris_bo would still be in the zombie list, and so we would eventually
call GEM_CLOSE on it - closing a BO that should have still been live.
To work around this, we defer removing a BO from the hash tables until
it's actually fully closed. This has the strange effect that an
external BO may be on the zombie list, and yet be resurrected before
it can be properly cleaned up. In this case, we remove it from the
list so it won't be freed.
Fixes severe instability in Weston, which was hitting EINVALs and
ENOENTs from execbuf2, due to batches referring to a GEM object that
had been closed, or at least had its VMA torched.
Fixes: 457a55716ea ("iris: Defer closing and freeing VMA until buffers are idle.")
|
|
|
|
|
|
| |
Everybody importing an external buffer was looking it up in the hash
table, then referencing it. We can just do that in the helper instead,
which also gives us a convenient spot to stash extra code shortly.
|
|
|
|
|
|
|
|
|
|
| |
The STM32MP157 features a Vivante GC400 GPU supported by etnaviv.
Add a DRM entry point for the STM display controller, so mesa
can be used with it.
Signed-off-by: Ahmad Fatoum <[email protected]>
Signed-off-by: Lucas Stach <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
| |
This issue was found by cppcheck
Signed-off-by: Andrii Simiklit <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
Reviewed-by: Lucas Stach <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
results from radeonsi NIR:
Totals from affected shaders:
SGPRS: 704 -> 464 (-34.09 %)
VGPRS: 2056 -> 672 (-67.32 %)
Spilled SGPRs: 24 -> 0 (-100.00 %)
Spilled VGPRs: 28406 -> 0 (-100.00 %)
Private memory VGPRs: 0 -> 3182 (0.00 %)
Scratch size: 1064 -> 3228 (203.38 %) dwords per thread
Code Size: 935260 -> 40180 (-95.70 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 28 -> 70 (150.00 %)
Wait states: 0 -> 0 (0.00 %)
results from radv:
Totals from affected shaders:
SGPRS: 80 -> 48 (-40.00 %)
VGPRS: 204 -> 108 (-47.06 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 256 (0.00 %) dwords per thread
Code Size: 15792 -> 9504 (-39.82 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 1 -> 2 (100.00 %)
Wait states: 0 -> 0 (0.00 %)
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
| |
This adds an additional work around for the game to fix the blocky
shadows as reported in bug 105282
Acked-by: Eric Engestrom <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105282
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The load uniform/temporary operations output only to a pipeline
register, which must be consumed by another op in the same instruction
later.
The current implementation delays the decision of who will consume this
result to until the scheduling step. If the consumer node is not able to
use the pipeline register, a mov node may have to be created, during the
scheduler step.
As part of the ppir scheduler simplification, and now that the ppir
scheduler supports pipeline register dependencies, this can be
simplified by always creating a single mov node outputting to a normal
register that can be used directly by all consumers.
Signed-off-by: Erico Nunes <[email protected]>
Reviewed-by: Vasily Khoruzhick <[email protected]>
Reviewed-by: Qiang Yu <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The select operation relies on the select condition coming from the
result of the the alu scalar mult slot, in the same instruction.
The current implementation creates a mov node to be the predecessor of
select, and then relies on an exception during scheduling to ensure that
both ops are inserted in the same instruction.
Now that the ppir scheduler supports pipeline register dependencies,
this can be simplified by making the mov explicitly output to the fmul
pipeline register, and the scheduler can place it without an exception.
Since the select condition can only be placed in the scalar mult slot,
differently than a regular mov, define a separate op for it.
Signed-off-by: Erico Nunes <[email protected]>
Reviewed-by: Vasily Khoruzhick <[email protected]>
Reviewed-by: Qiang Yu <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The ppir scheduler grew to be rather complicated and containing many
exceptions as it also has to take care of inserting additional nodes
when it is mandatory for nodes to be in the same instruction.
As such, the lima lowering and scheduling process can be difficult to
understand and maintain.
The ppir lowering step created nodes hoping that the scheduler would
notice the exception and do the right thing.
This proposal adds a simple refactor to the scheduler so that it places
nodes with pipeline registers in the same instruction.
With the scheduler handling this in a general way, it is possible to
create same-instruction dependencies by using pipeline registers during
the lowering stage.
This is simpler to maintain because now we can make these dependencies
explicit in a single place (lowering), and we can drop exceptions from
scheduling.
Reducing the complexity of the scheduler is also useful as preparatory
work to support control flow in ppir.
Signed-off-by: Erico Nunes <[email protected]>
Reviewed-by: Vasily Khoruzhick <[email protected]>
Reviewed-by: Qiang Yu <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Adam Jackson <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Adam Jackson <[email protected]>
|
|
|
|
|
|
|
|
| |
Right now, all it does is provide the new standard `static_assert()` name.
Fixes: fbf7c38da35afe7f1de0 ("egl/wayland: use bitset.h for `formats` bit set")
Signed-off-by: Eric Engestrom <[email protected]>
Tested-by: Bhushan Shah <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Utgard PP is vec4, but some operations are scalar, utilize
NIR vec to scalar lowering pass and indicate operations that we
want to lower.
Reviewed-by: Qiang Yu <[email protected]>
Signed-off-by: Vasily Khoruzhick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The brw_wm_prog_data_dispatch_grf_start_reg and _prog_offset helpers
read the _NPixelDispatchEnable fields from 3DSTATE_PS to figure out
which bits to pull out of the prog data and stuff where. Therefore,
they need to be called with the final set of _NPixelDispatchEnable bits
after we've done the workaround for SIMD32 and 16x MSAA. Otherwise, if
you end up with a somewhat odd combination of enables, the GRF start reg
and KSP data ends up in the wrong slots. In particular, running
SIMD32-only is broken but several other combinations are as well.
Fixes: 5445c176e27ba "iris: Disable SIMD32 when using a 16x MSAA..."
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
These days it is not GLX only and it does not work with all TLS
implementations.
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Christian Gmeiner <[email protected]>
Reviewed-by: Philipp Zabel <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Andreas Baierl <[email protected]>
Reviewed-by: Qiang Yu <[email protected]>
|
|
|
|
| |
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
| |
We already had one in the vec4 code, we just had move it.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
NetBSD expects a `void *` argument [1] as the printf-style arguments to
the formatting string, so we need to cast the `const` away.
[1] https://netbsd.gw.com/cgi-bin/man-cgi?pthread_setname_np++NetBSD-current
Suggested-by: Kamil Rytarowski <[email protected]>
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Acked-by: Eric Anholt <[email protected]>
Tested-by: Vinson Lee <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Acked-by: Eric Anholt <[email protected]>
Tested-by: Vinson Lee <[email protected]>
|
|
|
|
|
|
|
|
| |
Unused as of last commit.
Signed-off-by: Eric Engestrom <[email protected]>
Acked-by: Eric Anholt <[email protected]>
Tested-by: Vinson Lee <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This automates the include_directories and dependencies tracking so that
all users of libmesa_util don't need to add them manually.
Next commit will remove the ones that were only added for that reason.
Signed-off-by: Eric Engestrom <[email protected]>
Acked-by: Eric Anholt <[email protected]>
Tested-by: Vinson Lee <[email protected]>
|
|
|
|
|
|
| |
I have no idea who thought this was a good idea.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that register spilling is in place, this is reasonable. It turns out
for some shaders, it's actually better to cap at 8 work registers and
extra >8 uniform reigsters and tolerate the spilling, since the extra
resulting threads make up for the spillage. So incidentally, the shader
that spills here is in -bterrain, which jumps from 19fps to 21fps as a
result of this change.
total instructions in shared programs: 3513 -> 3448 (-1.85%)
instructions in affected programs: 776 -> 711 (-8.38%)
helped: 20
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 3.25 x̃: 2
helped stats (rel) min: 3.57% max: 16.00% x̄: 8.37% x̃: 7.19%
95% mean confidence interval for instructions value: -4.28 -2.22
95% mean confidence interval for instructions %-change: -10.02% -6.73%
Instructions are helped.
total bundles in shared programs: 2067 -> 2024 (-2.08%)
bundles in affected programs: 515 -> 472 (-8.35%)
helped: 19
HURT: 1
helped stats (abs) min: 1 max: 6 x̄: 2.37 x̃: 2
helped stats (rel) min: 2.13% max: 17.86% x̄: 10.19% x̃: 11.11%
HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel) min: 3.23% max: 3.23% x̄: 3.23% x̃: 3.23%
95% mean confidence interval for bundles value: -3.01 -1.29
95% mean confidence interval for bundles %-change: -12.13% -6.91%
Bundles are helped.
total quadwords in shared programs: 3468 -> 3426 (-1.21%)
quadwords in affected programs: 764 -> 722 (-5.50%)
helped: 19
HURT: 1
helped stats (abs) min: 1 max: 5 x̄: 2.26 x̃: 2
helped stats (rel) min: 1.41% max: 12.50% x̄: 6.76% x̃: 7.14%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 1.08% max: 1.08% x̄: 1.08% x̃: 1.08%
95% mean confidence interval for quadwords value: -2.83 -1.37
95% mean confidence interval for quadwords %-change: -8.08% -4.65%
Quadwords are helped.
total registers in shared programs: 383 -> 360 (-6.01%)
registers in affected programs: 112 -> 89 (-20.54%)
helped: 19
HURT: 0
helped stats (abs) min: 1 max: 3 x̄: 1.21 x̃: 1
helped stats (rel) min: 12.50% max: 27.27% x̄: 20.63% x̃: 20.00%
95% mean confidence interval for registers value: -1.47 -0.95
95% mean confidence interval for registers %-change: -22.39% -18.87%
Registers are helped.
total threads in shared programs: 432 -> 451 (4.40%)
threads in affected programs: 19 -> 38 (100.00%)
helped: 11
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.73 x̃: 2
helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
95% mean confidence interval for threads value: 1.41 2.04
95% mean confidence interval for threads %-change: 100.00% 100.00%
Threads are [helped].
total loops in shared programs: 4 -> 4 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0
total spills in shared programs: 0 -> 4
spills in affected programs: 0 -> 4
helped: 0
HURT: 2
total fills in shared programs: 0 -> 7
fills in affected programs: 0 -> 7
helped: 0
HURT: 2
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
| |
No functional changes, just breaks out a megamonster function and fixes
the indentation.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We need three independent sources to support indirect SSBO writes (as
well as textures with both LOD/bias and offsets). Now is a good time to
make sources just an array so we don't have to rewrite a ton of code if
we ever needed a fourth source for some reason.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As far as I know, there's no such thing as a load/store op that only
takes its argument in r27. We just need to set the appropriate arg_1
field in the RA to specify other registers if we want them.
To facilitate this, various RA-related changes are needed across the
compiler ; this should also fix indirect offsets which were implicitly
interpreted as "r27-only" despite not even passing through RA yet. One
ripple effect change is switching the move insertion point and adjusting
the liveness analysis accordingly, so while this was intended as a
purely functional change, there are some shader-db changes:
total instructions in shared programs: 3511 -> 3498 (-0.37%)
instructions in affected programs: 563 -> 550 (-2.31%)
helped: 12
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.08 x̃: 1
helped stats (rel) min: 0.93% max: 5.00% x̄: 2.58% x̃: 2.33%
95% mean confidence interval for instructions value: -1.27 -0.90
95% mean confidence interval for instructions %-change: -3.23% -1.93%
Instructions are helped.
total bundles in shared programs: 2067 -> 2067 (0.00%)
bundles in affected programs: 398 -> 398 (0.00%)
helped: 7
HURT: 4
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 1.54% max: 10.00% x̄: 5.04% x̃: 5.56%
HURT stats (abs) min: 1 max: 2 x̄: 1.75 x̃: 2
HURT stats (rel) min: 2.13% max: 4.26% x̄: 3.72% x̃: 4.26%
95% mean confidence interval for bundles value: -0.95 0.95
95% mean confidence interval for bundles %-change: -5.21% 1.50%
Inconclusive result (value mean confidence interval includes 0).
total quadwords in shared programs: 3464 -> 3454 (-0.29%)
quadwords in affected programs: 1199 -> 1189 (-0.83%)
helped: 18
HURT: 4
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 1.03% max: 5.26% x̄: 2.44% x̃: 1.79%
HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel) min: 2.56% max: 2.82% x̄: 2.63% x̃: 2.56%
95% mean confidence interval for quadwords value: -0.98 0.07
Inconclusive result (value mean confidence interval includes 0).
total registers in shared programs: 383 -> 373 (-2.61%)
registers in affected programs: 56 -> 46 (-17.86%)
helped: 12
HURT: 2
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 9.09% max: 33.33% x̄: 29.58% x̃: 33.33%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 20.00% max: 50.00% x̄: 35.00% x̃: 35.00%
95% mean confidence interval for registers value: -1.13 -0.29
95% mean confidence interval for registers %-change: -35.07% -5.63%
Registers are helped.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
| |
Load/store's main "argument 0" already has its swizzle handled
correctly (for stores, that is). But the tinier arguments, the compact
ones with a component select but not a full swizzle, those are not yet
handled. Let's do something about that!
|
|
|
|
|
|
|
|
|
| |
Rather than an ersatz thing that sort of looks like successors but is in
fact just the source order traversal with some backward jumps hacked in
for loops... construct an actual flow graph so we can do analysis
sanely.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
| |
3-bits out of 8 down!
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
| |
r27 isn't the special one, usually.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The 16-bit field can be decomposed to two independent 8-bit fields, each
representing a single (additional) argument to the load/store op,
generally used for encoding registers. Addressable registers here are
substantially limited compared to the main register in a load/store op.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
| |
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
| |
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
| |
So we can use it with imageless framebuffers.
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
| |
That way we can use it for imageless framebuffers.
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Rather than alloacting a huge (64MB) polygon list on context creation
and sharing it across framebuffers, we instead allocate polygon lists as
BOs (which consistently hit the cache) sized appropriately; for about a
month, we've known how to calculate the polygon list size so this has
only recently become possible.
The good news is we can render to truly massive framebuffers without
crashing and, more importantly, we eliminate the 64MB upfront overhead.
If a list that size isn't actually needed, it's not allocated.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Signed-off-by: Boris Brezillon <[email protected]>
|
|
|
|
|
|
|
| |
Allows us to pass BOs without checking if they're NULL or not.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
| |
The only user of this function always passes true.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
| |
The FB desc will be emitted/attached on the first draw targetting
this new FB.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
The wallpaper blit is a bit special in that the operation is targetting
the current FB, but the u_blitter logic creates a new surface for it
which makes util_framebuffer_state_equal() return false. In that case
we don't want a new FB descriptor to be emitted/attached, so let's just
copy the new state into ctx->pipe_framebuffer and exit the function.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
| |
If the current FB matches the new one there's nothing to be done in
panfrost_set_framebuffer_state(). By bailing out early in that case we
avoid emitting new FB descriptors (the old ones are still valid).
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
| |
No need to emit SFBD/MFBD at frame invalidation. They can be emitted
when the framebuffer is attached, which saves us a potential FB desc
re-allocation if a new FB is bound after the swap.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
| |
This guarantees that new draws targetting the same framebuffer will
get a new job instance.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ctx->job is supposed to serve as a cache to avoid an hash table lookup
everytime we access the job attached to the currently bound FB, except
it was never assigned to anything but NULL.
Fix that by adding the missing assignment in panfrost_get_job_for_fbo().
Also add a missing NULL assignment in the ->set_framebuffer_state()
path.
While at it, add extra assert()s to make sure ctx->job is consistent.
Fixes: 59c9623d0a75 ("panfrost: Import job data structures from v3d")
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
| |
When the application does not ask for robust buffer access.
Only implemented the check in radv.
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These versions still need wrapper but already have both success and
failure ordering.
(Compile tested on llvm 3.3, 3.7, 3.8.)
v2: don't duplicate whole function (suggested by Brian).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111102
Reviewed-by: Charmaine Lee <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|