| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
| |
Suggested-by: Jason Ekstrand <[email protected]>
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
| |
All these are actually always used.
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We can rely on only one kind of synchronization object (drm-syncobj)
when it is available. This reduces the number of file descriptors we
use in our implementation.
This will be required later for timeline semaphores implementation, at
this point we won't ever want to use anything else but syncobjs.
v2: Only use has_syncobj for semaphores (Jason)
v3: Only has_syncobj in assert on semaphores in QueueSubmit (Jason)
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We spend a lot of time in the driver adding things to hash sets to track
residency. The reality is that a properly built Vulkan app uses large
memory objects and sub-allocates from them. In a typical frame, most of
if not all of those allocations are going to be resident for the entire
frame so we're really not saving ourselves much by tracking fine-grained
residency. Just throwing everything in the validation list does make it
a little bit more expensive inside the kernel to walk the list and
ensure that all our VA is in order. However, without relocations, the
overhead of that is pretty small.
If we ever do run into a memory pressure situation where the fine-
grained residency could even potentially help, we would likely be
swapping one page out to make room for another within the draw call and
performance is totally lost at that point. We're better off swapping
out other apps and just letting ours run a whole frame.
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
|
| |
An MI_BATCH_BUFFER_START in the ring buffer acts as a second level
batchbuffer (aka jump back to ring buffer when running into a
MI_BATCH_BUFFER_END).
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As requested by Ken ;)
v2: Also decode simple batches (Caio)
Fix u_vector usage issues (Lionel)
v3: Make binding/instruction/state/surface available (Lionel)
v4: Going through device pools for simple batches (Lionel)
Centralize search BO callbacks into anv_device.c (Lionel)
v5: Clear decoded batch buffer var after use (Caio)
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
| |
This reverts commit e4d88396d259c4ec6032d2834d1c9073d55e9b45.
Apologies, I pushed the wrong commit.
|
|
|
|
|
|
| |
As requested by Ken ;)
Signed-off-by: Lionel Landwerlin <[email protected]>
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We now have multiple BOs in the block pool, but sometimes we still
reference only the first one in some instructions, and use relative
offsets in others. So we must be sure to add all the BOs from the block
pool to the validation list when submitting commands.
v2:
- Don't add block pool BOs to the dependency list right before
execbuf (Jason)
- Call anv_execbuf_add_bo() to each BO in the block pools (Jason)
- Use anv_execbuf_add_bo_set() to add surface state dependencies to
execbuf.
v3:
- Add comment to the non-softpin case (Jason).
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This part of the anv_execbuf_add_bo() code is totally independent of the
BO being added. Let's split it out, so we can reuse it later.
v3: rename to anv_execbuf_add_bo_set (Jason).
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Change block_pool->bo to be a pointer, and update its usage everywhere.
This makes it simpler to switch it later to a list of BOs.
v3:
- Use a static "bos" field in the struct, instead of malloc'ing it.
This will be later changed to a fixed length array of BOs.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
We will need the anv_block_pool_map to find the map relative to some BO
that is not at the start of the block pool.
v2: just return a pointer instead of a struct (Jason)
v4: Update comment (Jason)
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Replace calls to create hash tables and sets that use
_mesa_hash_pointer/_mesa_key_pointer_equal with the helpers
_mesa_pointer_hash_table_create() and _mesa_pointer_set_create().
Reviewed-by: Jason Ekstrand <[email protected]>
Acked-by: Eric Engestrom <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, we just went ahead and emitted MI_BATCH_BUFFER_START as
normal. If we are near enough to the end, this can cause us to start a
new BO just for the MI_BATCH_BUFFER_START which messes up chaining. We
always reserve enough space at the end for an MI_BATCH_BUFFER_START so
we can just increment cmd_buffer->batch.end prior to emitting the
command.
Fixes: a0b133286a3 "anv/batch_chain: Simplify secondary batch return..."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107926
Tested-by: Alex Smith <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On Broadwell and above, we have to use different MOCS settings to allow
the kernel to take over and disable caching when needed for external
buffers. On Broadwell, this is especially important because the kernel
can't disable eLLC so we have to do it in userspace. We very badly
don't want to do that on everything so we need separate MOCS for
external and internal BOs.
In order to do this, we add an anv-specific BO flag for "external" and
use that to distinguish between buffers which may be shared with other
processes and/or display and those which are entirely internal. That,
together with an anv_mocs_for_bo helper lets us choose the right MOCS
settings for each BO use.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99507
Cc: [email protected]
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
The batch decoder looks for a field with a particular name to decide
whether an MI_BB_START leads into a second batch buffer level. Because
the names are different between Gen7.5/8 and the newer generation we
fail that test and keep on reading (invalid) instructions.
Signed-off-by: Lionel Landwerlin <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107544
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
Reviewed-by: Scott D Phillips <[email protected]>
|
|
|
|
|
|
|
| |
This just separates the reloc list vs. BO set cases and lets us avoid an
allocation if relocs->deps->entries == 0.
Reviewed-by: Scott D Phillips <[email protected]>
|
|
|
|
|
|
| |
Co-authored-by: Jason Ekstrand <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Scott D Phillips <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, we did this weird thing where we left space and an empty
relocation for use in a hypothetical MI_BATCH_BUFFER_START that would be
added to the secondary later. Then, when it came time to chain it into
the primary, we would back that out and emit an MI_BATCH_BUFFER_START.
This worked well but it was always a bit hacky, fragile and ugly. This
commit instead adds a helper for rewriting the MI_BATCH_BUFFER_START at
the end of an anv_batch_bo and we use that helper for both batch bo list
cloning and handling returns from secondaries. The new helper doesn't
actually modify the batch in any way but instead just adjusts the
relocation as needed.
Reviewed-by: Scott D Phillips <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The only reason we were calling it in the middle was that one of the
cases for figuring out the secondary command buffer execution type
wanted batch_bo->length which gets set by batch_bo_finish. It's easy
enough to recalculate and now batch_bo_finish is called in a sensible
location.
Reviewed-by: Scott D Phillips <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
References to pinned BOs won't need to be relocated at a later
point, so just write the final value of the reference into the bo
directly.
Add a `set` to the relocation lists for tracking dependencies that
were previously tracked by relocations. When a batch is executed, we
add the referenced pinned BOs to the exec list.
v2: - visit bos from the dependency set in a deterministic order (Jason)
v3: - compar => compare, drat (Jason)
- Reworded commit message, provided by (Jordan)
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Soft pinning lets us satisfy the binding table address
requirements without using both sides of a growing state_pool.
If you do use both sides of a state pool, then you need to read
the state pool's center_bo_offset (with the device mutex held) to
know the final offset of relocations that target the state pool
bo.
By having a separate pool for binding tables that only grows in
the forward direction, the center_bo_offset is always 0 and
relocations don't need an update pass to adjust relocations with
the mutex held.
v2: - don't introduce a separate state flag for separate binding tables (Jason)
- replace bo and map accessors with a single binding_table_pool accessor (Jason)
v3: - assert bt_block->offset >= 0 for the separate binding table (Jason)
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
| |
v2 (Jason Ekstrand):
- Split the blorp bit into it's own patch and re-order a bit
- Use anv_address helpers
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
A later patch will make use of this in other places. Also, remove
dependency on undefined behavior of left-shifting a signed value.
v2: - move function into a separate header (Chris)
v3: (by Ken) Add new header to the various build systems.
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The kernel used to have execbuf parameters to program the INSTPM bit
for whether 3DSTATE_CONSTANT_* should be relative to dynamic state
base address or an absolute address. However, they never worked in
the presence of hardware contexts, so I deleted them a while back.
It doesn't make sense to set this flag, as it doesn't exist anymore.
It also never did anything anyway - the flag is zero, so |'ing it in
did nothing. The default is relative anyway.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Unless you have data, the compiler knows better than you whether a
function should be inlined.
No difference in the resulting binary with gcc-6.3.0 or clang-4.0.
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
In order to implement VK_KHR_external_fence, we need to back our fences
with something that's shareable. Since the kernel wait interface for
sync objects already supports waiting for multiple fences in one go, it
makes anv_WaitForFences much simpler if we only have one type of fence.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
| |
It only applies to legacy BO fences.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
| |
This is just a refactor, similar to what we did for semaphores, in
preparation for handling VK_KHR_external_fence.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This commit changes fences to work a bit more like BO semaphores.
Instead of the fence being a batch, it's simply a BO that gets added
to the validation list for the last execbuf call in the QueueSubmit
operation. It's a bit annoying finding the last submit in the execbuf
but this allows us to avoid the dummy execbuf.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Probably harmless, but will overwrite errno with a failure status
code. Reported by coverity.
CID 1416600: Argument cannot be negative (NEGATIVE_RETURNS)
Fixes: 5c4e4932e02 (anv: Implement support for exporting semaphores as FENCE_FD)
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The anv_execbuf_add_bo() call can actually fail in practice, which
should cause the QueueSubmit operation to fail. Reported by Coverity.
CID: 1416606: Unchecked return value (CHECKED_RETURN)
Fixes: 017cdb10cf (anv: Submit a dummy batch when only semaphores are provided.)
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
| |
Vulkan allows you to do a submit whose only job is to wait on and
trigger semaphores. The easiest way for us to support that right
now is to insert a dummy execbuf.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
| |
This patch adds an implementation based on DRM BOs. We don't actually
advertise the extension yet because we want to add a couple more paths
first.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes 32-bit builds of the driver. Commit 08413a81b93dc537fb0c3
changed things so that we now put struct anv_states in the u_vector for
binding tables. On 64-bit builds, sizeof(struct anv_state) is a power
of two but it isn't on 32-bit builds.
Fixes: 08413a81b93dc537fb0c34327ad162f07e8c3427
Reviewed-by: Kenneth Graunke <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
| |
I want to use these in the OpenGL driver as well.
v2: Add to COMMON_FILES in Makefile.sources (caught by Emil)
Reviewed-by: Daniel Vetter <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The reason we were doing this was to ensure that the kernel did the
appropriate cross-ring synchronization and flushing. However, the
kernel only looks at EXEC_OBJECT_WRITE to determine whether or not to
insert a fence. It only cares about the domain for determining whether
or not it needs to clflush the BO before using it for scanout but the
domain automatically gets set to RENDER internally by the kernel if
EXEC_OBJECT_WRITE is set.
Reviewed-by: Chris Wilson <[email protected]>
|
|
|
|
|
|
|
| |
CID: 1405919 (Error handling issues)
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
| |
Now that the state stream is allocating off of the state pool, there's
no reason why we need the block pool to be separate.
Reviewed-by: Juan A. Suarez Romero <[email protected]>
|
|
|
|
| |
Reviewed-by: Juan A. Suarez Romero <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since the state_stream is now pulling from a state_pool, the only thing
pulling directly off the block pool is the state pool so we can just
move the block_size there. The one exception is when we allocate
binding tables but we can just reference the state pool there as well.
The only functional change here is that we no longer grow the block pool
immediately upon creation so no BO gets allocated until our first state
allocation.
Reviewed-by: Juan A. Suarez Romero <[email protected]>
|
|
|
|
|
|
|
|
| |
This implementation allocates a 4k BO for each semaphore that can be
exported using OPAQUE_FD and uses the kernel's already-existing
synchronization mechanism on BOs.
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
|