| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Mark Janes <[email protected]>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3113>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3113>
|
|
|
|
|
|
|
|
| |
For the sake of our testing infrastructure, disable this extension
for TGL until we can sort out a hang in Vulkan CTS.
Acked-by: Jason Ekstrand <[email protected]>
Acked-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Existing code was ignoring whether the type of the immediate source
was signed or not. If the source was signed, it would ignore small
negative values but it also would wrongly accept values between
INT16_MAX and UINT16_MAX, causing the atual value to later be
reinterpreted as a negative number (under 16-bits).
Fixes tests/shaders/glsl-mul-const.shader_test in Piglit for older
platforms that don't support MUL with 32x32 types and use vec4.
Cc: <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Existing code was ignoring whether the type of the immediate source
was signed or not. If the source was signed, it would ignore small
negative values but it also would wrongly accept values between
INT16_MAX and UINT16_MAX, causing the atual value to later be
reinterpreted as a negative number (under 16-bits).
Fixes tests/shaders/glsl-mul-const.shader_test in Piglit for platforms
that don't support MUL with 32x32 types, including ICL and TGL.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2186
Cc: <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
Fixes: 1b6991ba1d8 ("anv: Implement VK_KHR_buffer_device_address")
Reviewed-by: Lionel Landwerlin <[email protected]>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3071>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3071>
|
|
|
|
|
|
|
| |
Fixes: bea4d4c78c3 ("anv: add VK_EXT_sampler_filter_minmax support")
Reviewed-by: Lionel Landwerlin <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3071>
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
Was hoping to find potential issues but nothing. Still probably a good
idea.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Jason Ekstrand <[email protected]>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3070>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3070>
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Acked-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Our current implementation of performance queries is fairly harsh
because it completely flushes and invalidates the 3d pipeline caches
at the beginning and end of each query. An argument can be made that
this is how performance should be measured but it probably doesn't
reflect what the application is actually doing and the actual cost of
draw calls.
A more appropriate approach is to just stall the pipeline at
scoreboard, so that we measure the effect of a draw call without
having the pipeline in a completely pristine state for every draw
call.
v2: Use end of pipe PIPE_CONTROL instruction for Iris (Ken)
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was initially intended to fix issues with the query timings going
occassionally high.
It turns out there was a bug in the attribution of OA reports to our
context when parsing the OA data. This led to reports flagged with
other context IDs to be included in our queries results.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since f9a3d9738b12 temporary BO_WSI are definitely a thing so we have
an assert wrong.
Take that opportunity to expand a bit on an existing comment.
Signed-off-by: Lionel Landwerlin <[email protected]>
Fixes: f9a3d9738b12 ("anv: Use BO fences/semaphores for AcquireNextImage")
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Ivan Briano <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We appear to have got lucky that the only type of temporary fence
payload we could have was a syncobj and that would only happen when
the type of the permanent payload was also a syncobj.
This code was broken if that assumption changed and it did in commit
f9a3d9738b12.
Signed-off-by: Lionel Landwerlin <[email protected]>
Cc: <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Ivan Briano <[email protected]>
|
|
|
|
|
|
| |
We've been keeping up with the spec updates.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Vulkan 1.1 requires VK_KHR_external_fence which requires syncobj support
to be actually usable. However, it doesn't strictly require that we
support any external handle types. We should be able to advertise 1.1
even on old kernels that don't have syncobj support.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
When we have syncobj_wait, we can trust in WAIT_FOR_SUBMIT but when we
don't, we only have BO waits and those aren't quite as nice. This
commit adds a flag to _anv_queue_submit to wait for the queue to drain
before returning. This gives us the behavior we need to implement
DeviceWaitIdle.
Fixes: 246261f0add "anv: prepare the driver for delayed submissions"
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
i965 wants to use an offset from a base because everything is in a
single buffer whose address may be relocated, and all base addresses
are set to the start of that buffer.
iris wants to use a full 64-bit address, because state lives in separate
buffers which may be in the shader, surface, and dynamic memory zones,
where addresses grow downward from the top of a 4GB zone, So it's very
possible for a 32-bit offset to exist relative to multiple bases,
leading to the wrong state size.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
TCCNTLREG contains additional L3 cache write merging optimizations.
The default value on my system appears to be:
- URB Partial Write Merging (bit 0)
- L3 Data Partial Write Merging (bit 2)
- TC Disable (bit 3)
Windows drivers appear to set bit 1 as well to enable "Color/Z Partial
Write Merging". This should solve an issue we were seeing where MRT
benchmarks were using substantially more bandwidth than they ought.
However, we have not observed it to cause measurable FPS gains.
It is unclear whether we should be setting bit 0 or bit 3, so for now
we leave those at the hardware default value.
Acked-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
TCCNTLREG contains additional cache programming settings. In
particular, there are several write combining controls we'd like to use.
Acked-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
Reviewed-by: Ivan Briano <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Maybe finer way of dealing with this requirement would be to increase
the number of pdevice->memory.types[] to add a category for special
alignment cases.
Meanwhile this fixes the problem of CCS surface alignment and it's
probably not going to cause issues given the size of our address
space.
Signed-off-by: Lionel Landwerlin <[email protected]>
Fixes: 6af8a4acc4a4 ("anv: Add aux-map translation for gen12+")
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Fixes: 181be14d4303 ("anv: Build for gen12")
Reviewed-by: Eric Engestrom <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
| |
Also removing the FIXME comments after matching the numbers with
updated documentation.
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
It's a very odd case to hit in the real world. However, there are some
CTS tests which switch back and forth between dispatch and clear without
changing the pipeline.
Fixes: bc612536eb2f "anv: Emit a dummy MEDIA_VFE_STATE before switching..."
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
|
| |
When we moved from allocating BOs directly to using the BO cache, we
lost the EXEC_OBJECT_CAPTURE flag on all our state buffers.
Fixes: 3119b96bdf57 "anv: Allocate block pool BOs from the cache"
Fixes: ee77938733cd "anv: Allocate batch and fence buffers from..."
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of doing a dummy submit on the command buffer for the fence or a
dummy semaphore and trusting in implicit sync, this commit moves us to
take advantage of implicit sync and just use the WSI image BO as the
fence. Both semaphores and fences require a tiny bit of extra plumbing
to do this but the result is that we can get rid of a bunch of the extra
synchronization we're doing today.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In 83b943cc2f24, we started making all VkDeviceMemory BOs resident all
the time. One unfortunate side-effect of this is that every
vkQueueSubmit sets EXEC_OBJECT_WRITE on every WSI memory object which
means that X server or Wayland compositor, instead of waiting on the
last vkQueueSubmit to actually write the buffer, now waits on the last
vkQueueSubmit to from that driver instance relative to whenever the
compositor's GL driver instance calls execbuf. This potentially leads
to a lot of extra synchronization that we didn't intend to have.
Instead, this commit makes it so that we leave WSI memory objects with
EXEC_OBJECT_ASYNC most of the time and only unset EXEC_OBJECT_ASYNC and
set EXEC_OBJECT_WRITE in the dummy execbuf that we do as part of
vkQueuePresent. This should hopefully result in tighter integration
with the compositor, lower latency, and better performance.
Testing with DOOM 2016, this seems to reduce latency by at least a frame
if not two and makes the game much more responsive. Testing was,
however, subjective, so we don't have any hard data on that.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
| |
Otherwise, we're trusting in the execbuf_add_bo which sets
EXEC_OBJECT_WRITE to to always be the first one that gets called. This
is likely true for fences but it seems somewhat fragile.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The primary difference between the KHR and EXT versions of the extension
is that the KHR provides the address at AllocateMemory time for replay
so we can replay it safely without moving to a sparse address model.
Reviewed-by: Ivan Briano <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This function has a lot of possible extensions and some of them we can
easily handle on-the-fly so it's easier to just have a loop than to find
each structure manually.
Reviewed-by: Ivan Briano <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
When a BO is flagged as having a client visible address, we put it in
its own heap. We also support the client explicitly specifying an
address in said heap. If an address collision happens, we return false
from anv_vma_alloc which turns into a VK_ERROR_OUT_OF_DEVICE_MEMORY.
Reviewed-by: Ivan Briano <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
We already have a mechanism for specifying that we want a fixed address
provided by the driver internals. We're about to let the client start
specifying addresses in some very special scenarios as well so we want
to pass this through to the allocation function.
Reviewed-by: Ivan Briano <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Ivan Briano <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Our VMA allocations are really independent from the memory heaps we
expose via the API. The only thing that really matters is the GTT size
so we can make the high heap the right size.
Reviewed-by: Ivan Briano <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
util_vma_heap_alloc will already return 0 if it doesn't have enough
space. The only thing the vma_*_available tracking was doing was
preventing us from allocating too much on any given heap. Now that
we're tracking that in the heap itself, we can drop these.
Reviewed-by: Ivan Briano <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We're already tracking the amount of memory used in each heap. This
commit just makes us start rejecting memory allocations if the heap
would grow too large.
Reviewed-by: Ivan Briano <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
| |
Fixes: a44744e01d73 "anv: Require a dedicated allocation for..."
Reviewed-by: Ivan Briano <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Ivan Briano <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
I think the reason why we only do this for primaries is that we didn't
expect to have blorp calls in secondaries. However, you are allowed to
have a full render pass in a secondary command buffer so resolves and
clears can end up in there. We should just always invalidate.
Reviewed-by: Ivan Briano <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Ivan Briano <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
| |
This separates "has" from "use" which will make the next commit a bit
cleaner.
Reviewed-by: Ivan Briano <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
In ee77938733cd, we started using the BO cache for anv_bo_pool and
stopped using the bo_flags parameter. However, we never dropped it from
the struct or the init function.
Reviewed-by: Ivan Briano <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|