| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
This reverts commit 4f857423b3c095516e553b976b41969c2b9721fa.
It caused GPU hangs on all affected platforms, in e.g.
Piglit bin/stencil-twoside -auto -fbo.
|
|
|
|
|
|
|
| |
This reverts commit 19546108d3dd5541a189e36df4ea83b3f519e48f.
This commit breaks the build because lima implements
->set_damage_region(). I guess we'll need more discussion before
removing the ->set_damage_region() hook.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 492ffbed63a2a62759224b1c7d45aa7923d8f542.
BACK_LEFT attachment can be outdated when the user calls
KHR_partial_update(), leading to a damage region update on the
wrong pipe_resource object.
Let's not expose the ->set_damage_region() method until the core is
fixed to handle that properly.
Cc: [email protected]
Signed-off-by: Boris Brezillon <[email protected]>
Acked-by: Daniel Stone <[email protected]>
|
|
|
|
|
|
|
| |
In preparation for testing drivers other than Panfrost in LAVA labs.
Signed-off-by: Tomeu Vizoso <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Include Panfrost's gitlab.ci.yml file from Mesa's main .gitlab-ci.yml so
we test on devices with Panfrost.
This uses LAVA to schedule jobs in the devices and will be the base for
testing Etnaviv, Lima, etc.
Signed-off-by: Tomeu Vizoso <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This is a port of Nanley's 904c2a617d86944fbdc2c955f327aacd0b3df318
from i965 to iris.
One concern is that iris uses larger batches, and also emits far fewer
commands, so we may come closer to the 500 limit within a batch, and
could need to supplement this with actual counting. Manhattan 3.0 had
239 3DSTATE_CONSTANT_PS packets in a batch, Unigine Valley had 155.
So it seems like we're still in the realm of safety.
|
|
|
|
|
| |
We'll need this for a workaround shortly. While refactoring, also
improve the comment slightly.
|
|
|
|
|
|
|
| |
This should improve texture sampling performance on GC3000.
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
|
|
|
|
|
|
|
| |
Update to etna_viv commit 7ff8029.
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
If the instruction interpolateAtCentroid is used the extra interpolator
must also be enabled in the state.
Fixes: fs-interpolateatcentroid-block
Signed-off-by: Gert Wollny <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Now that we have track inter-batch dependencies, the flush done in
panfrost_set_framebuffer_state() is no longer needed. Let's get rid of
it.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Now that we have all the pieces in place to support pipelining batches
we can get rid of the drmSyncobjWait() at the end of
panfrost_batch_submit().
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We don't have to flush all batches when we're only interested in
reading/writing a specific BO. Thanks to the
panfrost_flush_batches_accessing_bo() and panfrost_bo_wait() helpers
we can now flush only the batches touching the BO we want to access
from the CPU.
This fixes the dEQP-GLES2.functional.fbo.render.texsubimage.* tests.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
| |
This is needed if we want to free the panfrost_batch object at submit
time in order to not have to GC the batch on the next job submission.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Will be useful to make the ioctl(WAIT_BO) call conditional on BOs that
are not exported/imported (meaning that all GPU accesses are known
by the context).
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
| |
This will allow us to only flush batches touching a specific resource,
which is particularly useful when the CPU needs to access a BO.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
And use it in panfrost_flush() to flush all batches, and not only the
one currently bound to the context.
We also replace all internal calls to panfrost_flush() by
panfrost_flush_all_batches() ones.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The panfrost_fence logic currently waits on the last submitted batch,
but the batch serialization that was enforced in
panfrost_batch_submit() is about to go away, allowing for several
batches to be pipelined, and the last submitted one is not necessarily
the one that will finish last.
We need to make sure the fence logic waits on all flushed batches, not
only the last one.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The idea is to track which BO are being accessed and the type of access
to determine when a dependency exists. Thanks to that we can build a
dependency graph that will allow us to flush batches in the correct
order.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We'll soon need to freeze a batch not only when it's flushed, but also
when another batch depends on us, so let's add a helper to avoid
duplicating the logic.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We just replace the per-context out_sync object by a pointer to the
the fence of the last last submitted batch. Pipelining of batches will
come later.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
| |
So we can implement fine-grained dependency tracking between batches.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
| |
So we can store the flags as data and keep the BO as a key. This way
we keep track of the type of access done on BOs.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The type of access being done on a BO has impacts on job scheduling
(shared resources being written enforce serialization while those
being read only allow for job parallelization) and BO lifetime (the
fragment job might last longer than the vertex/tiler ones, if we can,
it's good to release BOs earlier so that others can re-use them
through the BO re-use cache).
Let's pass extra access flags to panfrost_batch_add_bo() and
panfrost_batch_create_bo() so the batch submission logic can take the
appropriate when submitting batches. Note that this information is not
used yet, we're just patching callers to pass the correct flags here.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
We know a shader will be used by a batch when
panfrost_patch_shader_state() is called, so let's add the shader BO at
that time.
Suggested-by: Alyssa Rosenzweig <[email protected]>
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The winsys might supply dimensions that are different than
those we calculate. In additional, it may supply virtualized
modifiers.
In practice, a stride != bpp * width and virtualized modifiers don't
happen yet, but the plan is to move in that direction.
Also make virgl_resource_layout static.
Reviewed by: Robert Tarasov <[email protected]>
|
|
|
|
|
|
|
| |
This commit makes no functional changes, just adds the revelant
plumbing.
Reviewed by: Robert Tarasov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some hardware has a bug with triangle strips and it is signalled by the
flag BUG_FIXED8 whether this bug has been fixed. So only enable triangle
strips when this flag is set.
Thanks: Jonathan Marek and Christian Gmeiner for the pointers
v2: Add TODO to indicate that the handling should be refined
(Jonathan & Christian)
Signed-off-by: Gert Wollny <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
We can't just check for the BO base address, we need to check for the
full address including any offset we may have applied. When updating
the address, we need to include the offset again.
Fixes: 5ad0c88dbe3 ("iris: Replace buffer backing storage and rebind to update addresses.")
|
|
|
|
| |
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A while back, Michael Larabel noticed that Paraview's Wavelet Volume
case runs significantly slower on iris than i965. It turns out this
is because we enable CCS_E for 32-bit floating point formats, while
i965 disables it, with an oblique comment saying that we benchmarked
it (on what exactly?) and determined that it was a loss.
Paraview uses both R32_FLOAT and R32G32B32A32_FLOAT, and I observed
large framerate drops when enabling CCS_E for either format. However,
several other benchmarks (Aztec Ruins, many Synmark cases) use 16-bit
floating point formats, with no apparent ill effects.
So, disable compression for 32-bit float formats for now, but leave it
enabled for 16-bit float formats as they seem to be working fine.
Improves performance in Paraview's Wavelet Volume test by 62% on a
Skylake GT4e.
Fixes: 3cfc6a207bd ("iris: Fill out res->aux.possible_usages")
|
|
|
|
|
| |
Cc: 19.2 <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
| |
Cc: 19.2 <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
| |
Fixes: d92689c46f0d2da05ae6 ("etnaviv: nir: add native integers (HALTI2+)")
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Jonathan Marek <[email protected]>
|
|
|
|
|
|
| |
Subtractions are already implemented as additions anyway.
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Upcoming changes to sub optimization will make this pass required. Over
the course of that series, we see uniforms +.46%, instructions -.24%
(seems like a fine tradeoff -- uniforms are 1/2 the size of instructions
as far as cache occupancy)
Reviewed-by: Daniel Schürmann <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reviewed-by: Jose Fonseca <[email protected]>
This patch is based on https://github.com/msys2/MINGW-packages/blob/28e3f85e09b6947ea80036c49f6c38f1394f93ca/mingw-w64-mesa/link-ole32.patch but with tweaks to avoid MSVC build break when applied.
v2: Create Mingw platform alias pointing to windows host platform define to avoid spurious crosscompilation;
v3: Fix obviously wrong compiler flags for swr driver;
v4: Update original patch URL because it has been relocated;
v5: Don't bother patching autools stuff as it's not used by MSYS2 Mingw-w64 build and it's days are numbered anyway;
v6: After Mingw posix flag fix in 295851eb things are far simpler as we don't need more linking of uuid, ole32, version and shell32 than what is already in place.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Looks like blob uses following values for uniforms buffer:
0 for 8 bytes
1 for 16 bytes
2 for 24 bytes
2 for 32 bytes
3 for 40 bytes
3 for 48 bytes
3 for 56 bytes
3 for 64 bytes
4 for 72 bytes
It all looks like log2(size / 8) rounded up, so let's do the same.
Fixes: 931fc2a7b3f9("lima: do not set the PP uniforms address lowest bits")
Reviewed-by: Icenowy Zheng <[email protected]>
Signed-off-by: Vasily Khoruzhick <[email protected]>
|
|
|
|
|
|
|
| |
Fixes the following piglit test: fragdepth_gles2 (for ETNA_MESA_DEBUG=nir)
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
|
|
|
|
|
|
|
| |
Fixes the following piglit test: fragdepth_gles2
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
|
|
|
|
|
|
|
| |
It just makes thing more complicated for no reason.
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
|
|
|
|
|
|
|
| |
Allows some simplification.
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Note it can still be improved a bit:
* Use alu swizzle to determine if src is scalar
* Take into account new immediates in the multiple uniform src lowering
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This can improve performance by allowing the LAST_VARYING_2X bit to be
set when possible (and possibility more benefits on HALTI5 where the
number of components is set for each varying).
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
|
|
|
|
|
|
|
|
|
| |
LOAD starts reading into the first enabled destination component, and
doesn't skip disabled components, so we need to allocate a destination with
contiguous components.
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
|