| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
Reviewed-by: Alexandros Frantzis <[email protected]>
|
|
|
|
| |
Reviewed-by: Elie Tournier <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Now that we have track inter-batch dependencies, the flush done in
panfrost_set_framebuffer_state() is no longer needed. Let's get rid of
it.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Now that we have all the pieces in place to support pipelining batches
we can get rid of the drmSyncobjWait() at the end of
panfrost_batch_submit().
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We don't have to flush all batches when we're only interested in
reading/writing a specific BO. Thanks to the
panfrost_flush_batches_accessing_bo() and panfrost_bo_wait() helpers
we can now flush only the batches touching the BO we want to access
from the CPU.
This fixes the dEQP-GLES2.functional.fbo.render.texsubimage.* tests.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
| |
This is needed if we want to free the panfrost_batch object at submit
time in order to not have to GC the batch on the next job submission.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Will be useful to make the ioctl(WAIT_BO) call conditional on BOs that
are not exported/imported (meaning that all GPU accesses are known
by the context).
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
| |
This will allow us to only flush batches touching a specific resource,
which is particularly useful when the CPU needs to access a BO.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
And use it in panfrost_flush() to flush all batches, and not only the
one currently bound to the context.
We also replace all internal calls to panfrost_flush() by
panfrost_flush_all_batches() ones.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The panfrost_fence logic currently waits on the last submitted batch,
but the batch serialization that was enforced in
panfrost_batch_submit() is about to go away, allowing for several
batches to be pipelined, and the last submitted one is not necessarily
the one that will finish last.
We need to make sure the fence logic waits on all flushed batches, not
only the last one.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The idea is to track which BO are being accessed and the type of access
to determine when a dependency exists. Thanks to that we can build a
dependency graph that will allow us to flush batches in the correct
order.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We'll soon need to freeze a batch not only when it's flushed, but also
when another batch depends on us, so let's add a helper to avoid
duplicating the logic.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We just replace the per-context out_sync object by a pointer to the
the fence of the last last submitted batch. Pipelining of batches will
come later.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
| |
So we can implement fine-grained dependency tracking between batches.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
| |
So we can store the flags as data and keep the BO as a key. This way
we keep track of the type of access done on BOs.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The type of access being done on a BO has impacts on job scheduling
(shared resources being written enforce serialization while those
being read only allow for job parallelization) and BO lifetime (the
fragment job might last longer than the vertex/tiler ones, if we can,
it's good to release BOs earlier so that others can re-use them
through the BO re-use cache).
Let's pass extra access flags to panfrost_batch_add_bo() and
panfrost_batch_create_bo() so the batch submission logic can take the
appropriate when submitting batches. Note that this information is not
used yet, we're just patching callers to pass the correct flags here.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
We know a shader will be used by a batch when
panfrost_patch_shader_state() is called, so let's add the shader BO at
that time.
Suggested-by: Alyssa Rosenzweig <[email protected]>
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The CTS finally has agreed to drop the requirement for a
565-no-depth-no-stencil config for ES 3.0. Hence we can now remove the
code to satisfy this requirement using a pbuffer-only visual with
whatever other buffers the driver happens to have given us.
This reverts commit 82607f8a900796871470ac4f1a04e154392e4898,
commit 6ad31c4ff33d92f6359b196a94ace99682272111 and
commit dacb11a585face5ca179c34cfc588a71a425c1e0.
v2:
- Reference the VK-GL-CTS issue (Eric E.).
v3:
- Don't revert
fc21394bc4d ("egl: Quiet warning about front buffer rendering for pixmaps/pbuffers")
(Kenneth).
References: VK-GL-CTS issue 1601.
Cc: [email protected]
Signed-off-by: Andres Gomez <[email protected]>
Acked-by: Eric Engestrom <[email protected]>
Acked-by: Eric Anholt <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Acked-by: Eric Engestrom <[email protected]>
Acked-by: Juan A. Suarez <[email protected]>
|
|
|
|
|
|
|
| |
This script is for updating post version bump.
Acked-by: Eric Engestrom <[email protected]>
Acked-by: Juan A. Suarez <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This script is responsible for generating an entire page in the
docs/relnotes/ directory. It includes a template for the page, and uses
mako to fill in the necessary bits. It is designed to be purely fire and
forget, calculating previous versions, shortlogs, bug fixes, and dates.
Acked-by: Eric Engestrom <[email protected]>
Acked-by: Juan A. Suarez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The next patch is going to introduce a tool that creates the entire
release html page for us, without any user intervention. As such we
can't be editing it. To that end the script will read the
new_features.txt file to get a list of new features.
This is a flat text file, one entry per line.
Acked-by: Eric Engestrom <[email protected]>
Acked-by: Juan A. Suarez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On 64 bits platforms, some atomic operations like __sync_fetch_and_add()
have constant time, but on 32 bits platforms they are implemented with a
loop and might take much longer.
Additionally, it seems like if their operands are not aligned to 64
bits, they also require extra memory accesses. From the Intel
Architecture's Developer Manual Vol. 1, 4.1.1:
"A word or doubleword operand that crosses a 4-byte boundary or a
quadword operand that crosses an 8-byte boundary is considered
unaligned and requires two separate memory bus cycles for access."
Forcing the u64 field to be aligned to 64 bits seems to make the unit
tests that are stressing this finish much faster.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Michel Dänzer <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
| |
This replaces to old Bugzilla: tag, which no longer makes sense because
we don't use bugzilla anymore.
Reviewed-by: Eric Anholt <[email protected]> (v1)
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
v1 by Topi Pohjolainen
v2,v3 by Anuj Phogat:
- Apply for gen >= 11
- Remove wa_bug_xxx function
- Use helper functions
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
RADV and RadeonSI both lower these two NIR instructions.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This lowers fmod and frem at NIR level like RadeonSI. fmod is
already lowered directly in NIR->LLVM, and frem will be lowered by
LLVM anyways.
This fixes a LLVM crash with:
dEQP-VK.glsl.builtin.precision_fp16_storage32b.frem.compute.scalar.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
... because it's wrong to do so. The error path out of
dri2_initialize_drm ends with dri2_display_destroy, which calls
functions in the vtable we're trying to set up, so if we dlclose the
driver then those function pointers will point off into space and things
crash.
Noticed this because after !1923 eglinfo would crash when setting up the
GBM platform. This was something of a cascade failure, because my kernel
is too old for DRM_IOCTL_I915_GETPARAM to work without DRM_AUTH, so i965
wouldn't load. platform_drm.c then got very confused when it tries to
load swrast as a dri2 driver.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
| |
uintptr_t is 32 bits in a 32-bits build, resulting in shifting out
of bounds.
Reviewed-by: Eric Engestrom <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We need the continue CS for referencing the tess/GDS/sample position BOs.
Fixes: 46e52df34d3 "radv: add tessellation ring allocation support. (v2)"
Fixes: e1dc3ab7534 "radv/gfx10: allocate GDS/OA buffer objects for NGG streamout"
Fixes: 1171b304f30 "radv: overhaul fragment shader sample positions."
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of a single cache shared between all jobs, but reduce the
maximum cache size to 1.5G (from 5G).
Rationale for smaller cache:
Pulling & pushing a 5G cache could take a long time. Consider
https://gitlab.freedesktop.org/mesa/mesa/-/jobs/684010 (click the "Show
complete raw" button to see timestamps): Pulling the cache took
1569927241-1569927194 = 47 seconds, pushing it 1569927671-1569927519
= 152, for a total of 199 seconds. The actual build took comparable
1569927518-1569927243 = 275 seconds, despite no cache hits from ccache.
In other words, the cache transfers almost doubled the job duration,
and they would have negated any build time benefits from ccache even
with a high cache hit rate.
Also, the smaller caches avoid blowing up storage requirements for them
too much.
Rationale for per-job caches:
Making a single cache significantly smaller might result in cached
build products from one job getting evicted by another job, reducing
the likelihood of cache hits from previous pipelines.
v2:
* Move up "ccache --max-size=1500M" call (Eric Engestrom)
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
To truly to do this correctly, we'll have to fix the discrepancy between
drm_virtgpu_3d_transfer_to_host and virtio_gpu_transfer_host_3d. However,
this is a good starting point.
Since virtio-gpu only supports self-import and export, this should be fine.
Let's only do WINSYS_HANDLE_TYPE_FD for this currently.
Reviewed by: Robert Tarasov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The winsys might supply dimensions that are different than
those we calculate. In additional, it may supply virtualized
modifiers.
In practice, a stride != bpp * width and virtualized modifiers don't
happen yet, but the plan is to move in that direction.
Also make virgl_resource_layout static.
Reviewed by: Robert Tarasov <[email protected]>
|
|
|
|
|
|
|
| |
This commit makes no functional changes, just adds the revelant
plumbing.
Reviewed by: Robert Tarasov <[email protected]>
|
|
|
|
|
|
|
| |
It's not used anywhere, and stride isn't really an intrinsic
property of a GEM buffer.
Reviewed by: Robert Tarasov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
i915 will report ENODEV on generations prior to Haswell because there
is no point in reporting values on those. This is prior any fusing
could happen on parts with identical PCI ids.
This query call was previously only triggered on generations that
support performance queries, which happens to match generation for
which i915 reports topology, but the commit pointed below started
using it on all generations.
Signed-off-by: Lionel Landwerlin <[email protected]>
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/issues/1860
Cc: <[email protected]>
Fixes: 96e1c945f2 ("i965: Move device info initialization to common code")
Reviewed-by: Mark Janes <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
|
| |
Random hangs no longer happen, I'm actually not sure if they were
related to this.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
| |
Forgot to amend the commit before updating the MR.
Signed-off-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
| |
This was actually the wrong fix.
This reverts commit 0a313cc285c2939de9cac07f045b0b699bc208ca.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Make sure to export the expected clear values to the depth
stencil attachment.
This fixes dEQP-VK.pipeline.depth_range_unrestricted.* on GFX10.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The number of vertices has to be adjusted with the output primitive
type.
This fixes dEQP-VK.transform_feedback.simple.triangle_strip_*.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
The GS outputs are stored differently in the LDS storage, they
are indexed by out_idx which is incremented for each stored DWORD.
Thus, we need a different path for exporting the stream outputs.
This fixes a bunch of CTS failures when NGG GS is force enabled.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
| |
It's unnecessary to store/load more components that needed.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The LDS storage allocated for stream outputs is 4 * N, where N
is the number of outputs. So, we have to store/load with N as index
and not with the output location as index.
This doesn't fix anything known but it should fix out-of-bounds
access and it also reduces the number of outputs written to the
LDS storage.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
| |
The buffer isn't necessarily used before.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
| |
Trivial.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
| |
The last of these was deleted in 44a8e5135470fa51ae36 ("d3d1x: Remove.")
over 6 years ago.
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some hardware has a bug with triangle strips and it is signalled by the
flag BUG_FIXED8 whether this bug has been fixed. So only enable triangle
strips when this flag is set.
Thanks: Jonathan Marek and Christian Gmeiner for the pointers
v2: Add TODO to indicate that the handling should be refined
(Jonathan & Christian)
Signed-off-by: Gert Wollny <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
|