| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
| |
Include Panfrost's gitlab.ci.yml file from Mesa's main .gitlab-ci.yml so
we test on devices with Panfrost.
This uses LAVA to schedule jobs in the devices and will be the base for
testing Etnaviv, Lima, etc.
Signed-off-by: Tomeu Vizoso <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This is a port of Nanley's 904c2a617d86944fbdc2c955f327aacd0b3df318
from i965 to iris.
One concern is that iris uses larger batches, and also emits far fewer
commands, so we may come closer to the 500 limit within a batch, and
could need to supplement this with actual counting. Manhattan 3.0 had
239 3DSTATE_CONSTANT_PS packets in a batch, Unigine Valley had 155.
So it seems like we're still in the realm of safety.
|
|
|
|
|
| |
We'll need this for a workaround shortly. While refactoring, also
improve the comment slightly.
|
|
|
|
|
|
|
|
| |
Just following the spec. Somewhat unclear whether this applies to NULL
surfaces.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
This doesn't seem to affect anything.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It appears we never had a test in piglit or deqp sampling from a null
surface...
It turns out this triggers a hang on IVB only. Updating the null
surface format to R32_UINT fixes the hang on ivb and doesn't affect
other platforms, so set it by default for all platforms.
Signed-off-by: Lionel Landwerlin <[email protected]>
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/issues/1872
Cc: <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
This should improve texture sampling performance on GC3000.
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
|
|
|
|
|
|
|
| |
Update to etna_viv commit 7ff8029.
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
We're missing the offset of the slice in the subslice mask...
This worked for most platforms that don't have first slice fused off
because we would reread the same mask from slice0 again and again...
Signed-off-by: Lionel Landwerlin <[email protected]>
Fixes: c1900f5b0f ("intel: devinfo: add helper functions to fill fusing masks values")
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/issues/1869
Reviewed-by: Mark Janes <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
We were supplying __DRI2_THROTTLE_SWAPBUFFER, rather than the obvious
choice of __DRI2_THROTTLE_COPYSUBBUFFER. This meant that we hit the
swap-based frame throttling. glXCopySubBuffer doesn't seem like it's
intended to be a frame boundary, so we'd like to avoid this throttling.
Tested-by: Michel Dänzer <[email protected]> # DRI3 only
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
glXCopySubBufferMESA copies data from the back buffer to the front,
so it needs to perform a MSAA downsampling operation just like
glXSwapBuffers would.
Currently, the CopySubBuffer implementations supply a throttle reason
of __DRI2_THROTTLE_SWAPBUFFERS, so they hit this path and work today.
But we'd like to avoid swapbuffer throttling in this case, so the next
patch will change that reason.
Tested-by: Michel Dänzer <[email protected]> # DRI3 only
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
|
|
| |
Signed-off-by: Prodea Alexandru-Liviu <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
Cc: <[email protected]>
When building in a MSYS2 Mingw-w64 environment Mesa3D sets wrong default build options which inevitably lead to build failure.
|
|
|
|
|
|
|
|
| |
Not much to do to enable this, just make sure to always write to the
GGTT :)
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
I thought I fixed this, but I guess I must have broken it again.
Fixes various dEQP-VK.draw.* tests
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Daniel Schürmann <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Working on the algebraic implementation, I was being driven nuts by my
editor not highlighting and handling indentation for the C code. It turns
out that it's basically not pass-specific code, and we can move it over to
the relevant .c file. Replaces 30KB of code with 34KB of data on my i965
build. No perf diff on shader-db (n=3)
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This lets us memoize range analysis work across instructions. Reduces
runtime of shader-db on Intel by -30.0288% +/- 2.1693% (n=3).
Fixes: 405de7ccb6cb ("nir/range-analysis: Rudimentary value range analysis pass")
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Having passes generate these is just making more work for copy
propagation (and thus probably calling more optimization passes)
later. Noticed while trying to debug nir_opt_algebraic()
top-to-bottom having O(n^2) behavior due to not finding new matches in
replacement code.
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
| |
Feature parity with the drm, x11, and wayland platforms.
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/issues/1870
Tested-by: Pekka Paalanen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
Similar to the previous commit, we should also initialize
needs_helper_invocations here.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
This matches what we do for uses_sample_qualifier, and what we
do in ir_set_program_inouts.cpp as well.
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This simplifies ACO and allows the lowered code to be optimized (in
particular, constant folded).
Totals from affected shaders:
SGPRS: 1776 -> 1776 (0.00 %)
VGPRS: 1436 -> 1436 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 203452 -> 203564 (0.06 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 103 -> 103 (0.00 %)
At least some of the code size increase seems to be from literals being
applied to instructions as a result of constant folding.
v2: remove fmod/frem handling in init_context()
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Daniel Schürmann <[email protected]>
|
|
|
|
|
|
|
| |
Fixes eebe091d292609de8aaf5b5c537d867b23908947
("scons/windows: Enable compute shaders when possible.")
Signed-off-by: Prodea Alexandru-Liviu <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
0 is __DRI2_THROTTLE_SWAPBUFFER, which doesn't really make sense here.
Avoids dri_flush() throttling twice for the same glFlush call with front
buffer rendering, as described in
https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2057 .
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
If the instruction interpolateAtCentroid is used the extra interpolator
must also be enabled in the state.
Fixes: fs-interpolateatcentroid-block
Signed-off-by: Gert Wollny <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
| |
Fixes: 3b265f61f5f61f08718fe5bb4b2726f9b8e016cc
("meson: gallium media state trackers require libdrm with x11")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1878
Tested-by: Vinson Lee <[email protected]>
|
|
|
|
|
|
|
|
| |
Now that we have live_out calculated per block as metadata, calculating
liveness of an instruction at a given point in the program becomes O(n)
to the size of the block worst-case, rather than O(n) the program.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
| |
This needs to be correct or the analysis fails.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Callers should have liveness info ready. Ideally we'd have a nice
metadata tracking framework like NIR to handle this automatically, but
for now this will allow us to make forward progress... when we're about
to do something with liveness, invalidate everything ahead to force a
clean calculation.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
| |
This will allow us to explicitly invalidate liveness analysis results so
we can cache liveness results.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
| |
By definition, once liveness analysis has occurred:
live_out = OR {succ} succ->live_in
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
There are unfortunately two distinct liveness analysis passes in the
compiler right now -- one good (but complex) pass used by RA based on
solving data flow equations, and one awful (but simple) pass used for
dead code elimination and bundling based on an abstract walk of the AST.
Let's move RA's pass into shared code so we can work on unifying.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
| |
This allows us to fill in ctx->temp_count explicitly, even if we haven't
squished down the MIR.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
| |
We already enforce this with the SSA/register distinction in the
backend. There is no need to duplicate this logic merely for an assert.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
| |
Reviewed-by: Jakob Bornecrantz <[email protected]>
|
|
|
|
| |
Reviewed-by: Tomeu Vizoso <[email protected]>
|
|
|
|
| |
Reviewed-by: Gert Wollny <[email protected]>
|
|
|
|
| |
Reviewed-by: Alexandros Frantzis <[email protected]>
|
|
|
|
| |
Reviewed-by: Elie Tournier <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Now that we have track inter-batch dependencies, the flush done in
panfrost_set_framebuffer_state() is no longer needed. Let's get rid of
it.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Now that we have all the pieces in place to support pipelining batches
we can get rid of the drmSyncobjWait() at the end of
panfrost_batch_submit().
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We don't have to flush all batches when we're only interested in
reading/writing a specific BO. Thanks to the
panfrost_flush_batches_accessing_bo() and panfrost_bo_wait() helpers
we can now flush only the batches touching the BO we want to access
from the CPU.
This fixes the dEQP-GLES2.functional.fbo.render.texsubimage.* tests.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
| |
This is needed if we want to free the panfrost_batch object at submit
time in order to not have to GC the batch on the next job submission.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Will be useful to make the ioctl(WAIT_BO) call conditional on BOs that
are not exported/imported (meaning that all GPU accesses are known
by the context).
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
| |
This will allow us to only flush batches touching a specific resource,
which is particularly useful when the CPU needs to access a BO.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
And use it in panfrost_flush() to flush all batches, and not only the
one currently bound to the context.
We also replace all internal calls to panfrost_flush() by
panfrost_flush_all_batches() ones.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The panfrost_fence logic currently waits on the last submitted batch,
but the batch serialization that was enforced in
panfrost_batch_submit() is about to go away, allowing for several
batches to be pipelined, and the last submitted one is not necessarily
the one that will finish last.
We need to make sure the fence logic waits on all flushed batches, not
only the last one.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|