path: root/src/intel/vulkan
Commit message, Author, Age, Files, Lines
...
* anv: Add allocator support for client-visible addressesJason Ekstrand2019-12-056-10/+107
| | | | | | | | | | When a BO is flagged as having a client visible address, we put it in its own heap. We also support the client explicitly specifying an address in said heap. If an address collision happens, we return false from anv_vma_alloc which turns into a VK_ERROR_OUT_OF_DEVICE_MEMORY. Reviewed-by: Ivan Briano <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
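A minimal sketch of the collision behaviour described above, not the driver's code. It assumes a toy heap that records reserved ranges in a fixed array; `toy_vma_heap` and `toy_vma_alloc_at` are hypothetical names, and the real allocator sits on top of util_vma_heap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_RANGES 16

struct toy_range { uint64_t start, size; };

/* Hypothetical stand-in for the client-visible address heap. */
struct toy_vma_heap {
   uint64_t base, size;                  /* address window owned by the heap */
   struct toy_range used[TOY_MAX_RANGES];
   size_t used_count;
};

/* Try to reserve [addr, addr + size) at an address picked by the client.
 * Returns false on a collision or if the range leaves the heap, which the
 * caller would turn into VK_ERROR_OUT_OF_DEVICE_MEMORY. */
bool
toy_vma_alloc_at(struct toy_vma_heap *heap, uint64_t addr, uint64_t size)
{
   assert(heap != NULL);

   if (size == 0 || heap->used_count == TOY_MAX_RANGES)
      return false;

   /* The range must lie entirely inside the heap (overflow-safe form). */
   if (addr < heap->base || size > heap->size ||
       addr - heap->base > heap->size - size)
      return false;

   for (size_t i = 0; i < heap->used_count; i++) {
      const struct toy_range *r = &heap->used[i];
      if (addr < r->start + r->size && r->start < addr + size)
         return false;                   /* address collision */
   }

   heap->used[heap->used_count++] = (struct toy_range){ addr, size };
   return true;
}
```

A linear scan is only reasonable for a sketch; it keeps the overlap test visible.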
* anv: Add an explicit_address parameter to anv_device_alloc_boJason Ekstrand2019-12-056-7/+26
| | | | | | | | | | We already have a mechanism for specifying that we want a fixed address provided by the driver internals. We're about to let the client start specifying addresses in some very special scenarios as well so we want to pass this through to the allocation function. Reviewed-by: Ivan Briano <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Stop advertising two heaps just for the VF cache WAJason Ekstrand2019-12-052-67/+6
| | | | | Reviewed-by: Ivan Briano <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Set up VMA heaps independently from memory heapsJason Ekstrand2019-12-052-31/+16
| | | | | | | | | Our VMA allocations are really independent from the memory heaps we expose via the API. The only thing that really matters is the GTT size so we can make the high heap the right size. Reviewed-by: Ivan Briano <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Stop tracking VMA allocationsJason Ekstrand2019-12-052-13/+5
| | | | | | | | | | util_vma_heap_alloc will already return 0 if it doesn't have enough space. The only thing the vma_*_available tracking was doing was preventing us from allocating too much on any given heap. Now that we're tracking that in the heap itself, we can drop these. Reviewed-by: Ivan Briano <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Disallow allocating above heap sizesJason Ekstrand2019-12-051-9/+27
| | | | | | | | | We're already tracking the amount of memory used in each heap. This commit just makes us start rejecting memory allocations if the heap would grow too large. Reviewed-by: Ivan Briano <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
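The check this commit describes can be sketched roughly as follows. This is a single-threaded illustration with hypothetical names (`toy_heap`, `toy_heap_reserve`); the actual driver code and its locking or atomics are not shown in this log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-heap accounting: total size and bytes already in use. */
struct toy_heap {
   uint64_t size;
   uint64_t used;
};

/* Account for a new allocation, rejecting it if the heap would grow past
 * its advertised size. The comparison is written so it cannot overflow. */
bool
toy_heap_reserve(struct toy_heap *heap, uint64_t alloc_size)
{
   assert(heap->used <= heap->size);

   if (alloc_size > heap->size - heap->used)
      return false;

   heap->used += alloc_size;
   return true;
}
```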
* anv: Don't leak when set_tiling failsJason Ekstrand2019-12-051-3/+4
| | | | | | Fixes: a44744e01d73 "anv: Require a dedicated allocation for..." Reviewed-by: Ivan Briano <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Use PIPE_CONTROL flushes to implement the gen8 VF cache WAJason Ekstrand2019-12-056-20/+245
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Apply cache flushes after setting index/draw VBsJason Ekstrand2019-12-051-2/+35
| | | | | Reviewed-by: Ivan Briano <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Always invalidate the VF cache in BeginCommandBufferJason Ekstrand2019-12-051-2/+1
| | | | | | | | | | I think the reason why we only do this for primaries is that we didn't expect to have blorp calls in secondaries. However, you are allowed to have a full render pass in a secondary command buffer so resolves and clears can end up in there. We should just always invalidate. Reviewed-by: Ivan Briano <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* blorp: Pass the VB size to the VF cache workaroundJason Ekstrand2019-12-051-0/+1
| | | | | Reviewed-by: Ivan Briano <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Add a has_softpin booleanJason Ekstrand2019-12-052-3/+6
| | | | | | | | This separates "has" from "use" which will make the next commit a bit cleaner. Reviewed-by: Ivan Briano <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Drop bo_flags from anv_bo_poolJason Ekstrand2019-12-053-14/+3
| | | | | | | | | In ee77938733cd, we started using the BO cache for anv_bo_pool and stopped using the bo_flags parameter. However, we never dropped it from the struct or the init function. Reviewed-by: Ivan Briano <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Fix error message format stringIan Romanick2019-12-041-5/+2
| | | | | | | | See also 246261f0addf Reviewed-by: Eric Engestrom <[email protected]> CID: 1455892 Fixes: 246261f0add ("anv: prepare the driver for delayed submissions")
* anv: Use 3DSTATE_CONSTANT_ALL when possible.Rafael Antognolli2019-12-041-3/+90
| | | | | | | | | | | | | | | | | | | | | | | | Use this new instruction introduced in Gen12. The instruction itself is smaller, and it also allows us to emit a single instruction to all stages that have the same push constant buffers (e.g. when they don't have constant buffers). There's one restriction to use this instruction, though: the length field is only 5 bits long, so we need to check whether we can use it, and fall back to the old 3DSTATE_CONSTANT_XS if that field is >= 32. v2: - Rebased on top of the latest changes from Jason. - Added review suggestions by Caio. - Removed struct push_bos and merged some code into anv_nir_compute_push_layout(). v3: - Remove code churn due to gen8+ workaround in anv_nir_compute_push_layout(). This code has been removed in an earlier commit, and implemented in cmd_buffer_emit_push_constant(). Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
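The 5-bit length restriction can be illustrated with a small check. This is a sketch under assumptions: `toy_push_range` and `toy_can_use_constant_all` are hypothetical names, and the real packet-selection code may test more than the length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The length field of 3DSTATE_CONSTANT_ALL is 5 bits wide per the commit
 * message, so only values 0..31 fit. */
#define TOY_CONSTANT_ALL_LENGTH_BITS 5

/* Hypothetical push range; only the length matters for this check. */
struct toy_push_range {
   uint32_t start;
   uint32_t length;
};

/* Returns true if every range length fits in the 5-bit field; otherwise
 * the caller would fall back to the older 3DSTATE_CONSTANT_XS packets. */
bool
toy_can_use_constant_all(const struct toy_push_range *ranges, size_t count)
{
   assert(ranges != NULL || count == 0);

   const uint32_t limit = 1u << TOY_CONSTANT_ALL_LENGTH_BITS;   /* 32 */
   for (size_t i = 0; i < count; i++) {
      if (ranges[i].length >= limit)
         return false;
   }
   return true;
}
```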
* anv: Move code for emitting push constants into its own function.Rafael Antognolli2019-12-041-43/+57
| | | | Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* anv: Add get_push_range_address() helper.Rafael Antognolli2019-12-041-59/+70
| | | | | | | | | | | Add a helper function to get the push range address. Once we have a separate function for emitting gen12 push constants, we can use this helper and avoid duplicating code. v3: Do not add range->start to the address in gen7 (Caio). v4: Do not drop range->start from gen7 (Caio, Jason). Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* anv: Move gen8+ push constant packet workaround.Rafael Antognolli2019-12-042-21/+31
| | | | | | | | | | | | | | | | | | | Store push_ranges in ascending order, and only "shift" them to the end of the array during state packet emission. We don't need this workaround with the new 3DSTATE_CONSTANT_ALL packet. So instead of applying the workaround here just for GEN < 12 (which requires an extra loop through all the ranges to figure out if we should shift them or not), we simply move the whole logic to the state emission code. At that point, in a later commit, we are already looping through all of the ranges anyway to check which packet we will be using, so we might as well implement the workaround there, where it is going to be used. v3: Move gen8+ workaround to the state emission code (Caio). v4: Add explanation of why we moved the workaround (Caio). Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
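One way to picture the "shift" is sketched below: ranges are kept in ascending order and moved into the highest slots only when the packet is built. The slot count of 4 and the names `toy_push_range` and `toy_shift_ranges_to_end` are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed number of buffer slots in the constant packet. */
#define TOY_MAX_PUSH_RANGES 4

struct toy_push_range {
   uint32_t start;
   uint32_t length;
};

/* Copy `count` ranges, stored in ascending order, into the last `count`
 * slots of `out`; the leading slots are left empty (zero length). */
void
toy_shift_ranges_to_end(const struct toy_push_range *in, size_t count,
                        struct toy_push_range out[TOY_MAX_PUSH_RANGES])
{
   assert(count <= TOY_MAX_PUSH_RANGES);

   const size_t shift = TOY_MAX_PUSH_RANGES - count;
   memset(out, 0, sizeof(out[0]) * TOY_MAX_PUSH_RANGES);
   for (size_t i = 0; i < count; i++)
      out[i + shift] = in[i];
}
```

Doing this at emission time means the stored layout stays the same whether or not the workaround applies.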
* anv: Respect the always_flush_cache driconf optionJason Ekstrand2019-12-033-0/+12
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Set up SBE_SWIZ properly for gl_ViewportJason Ekstrand2019-12-031-2/+2
| | | | | | | | | gl_Viewport is also in the VUE header so we need to whack the read offset to 0 and emit a default (no overrides) SBE_SWIZ entry in that case as well. Cc: [email protected] Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Push constants are relative to dynamic state on IVBJason Ekstrand2019-11-261-0/+17
| | | | | | Fixes: aecde2351 "anv: Pre-compute push ranges for graphics pipelines" Closes: #2136 Reviewed-by: Lionel Landwerlin <[email protected]>
* anv/entrypoints: Better handle promoted extensionsJason Ekstrand2019-11-261-9/+25
| | | | | | | | | | | In the case of promoted extensions we can end up with an entrypoint that we support being an alias of an entrypoint we do not support. For instance, if an extension gets promoted from EXT to KHR, the EXT entrypoints may be aliases of the KHR ones. We want to leave everything as EXT until we get around to advertising the KHR so that we don't break things when we update the XML and headers. Reviewed-by: Lionel Landwerlin <[email protected]>
* nir: move data.image.access to data.accessMarek Olšák2019-11-191-2/+2
| | | | | | The size of the data structure doesn't change. Reviewed-by: Connor Abbott <[email protected]>
* anv: add missing "fall-through" annotationEric Engestrom2019-11-191-0/+1
| | | | | | | CoverityID: 1455884 Fixes: c1c346f1667375e9330a ("anv: implement VK_KHR_separate_depth_stencil_layouts") Signed-off-by: Eric Engestrom <[email protected]> Acked-by: Jason Ekstrand <[email protected]>
* intel: Add workaround for stencil state.Rafael Antognolli2019-11-191-0/+14
| | | | | Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Sagar Ghuge <[email protected]>
* anv: Emit a NULL vertex for zero base_vertex/instanceJason Ekstrand2019-11-181-11/+16
| | | | | | | | | | If both are zero (the common case), we can emit a null vertex buffer rather than emitting a vertex buffer with zeros in it. The packing of the VERTEX_BUFFER_STATE is faster because no relocation is emitted and we can avoid creating the vertex buffer which means one less anv_state_stream_alloc. Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Use an anv_state for the next binding tableJason Ekstrand2019-11-182-12/+15
| | | | | | | | | | This is a bit more natural because we're already getting an anv_state most places in the pipeline. The important part here, however, is that we're no longer calling anv_block_pool_map on every alloc_binding_table call. While it's probably pretty cheap, it is potentially a linear walk over the list of BOs and it was showing up in profiles. Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: More carefully dirty state in BindPipelineJason Ekstrand2019-11-187-25/+101
| | | | | | | | | | | | | | | Instead of blindly dirtying descriptors and push constants the moment we see a pipeline change, check to see if it actually changes the bind layout or push constant layout. This doubles the runtime performance of one CPU-limited example running with the Dawn WebGPU implementation when running on my laptop. NOTE: This effectively reverts beca63c6c07. While it was a nice optimization, it was based on prog_data and we can't do that anymore once we start allowing the same binding table to be used with multiple different pipelines. Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: More carefully dirty state in BindDescriptorSetsJason Ekstrand2019-11-184-22/+51
| | | | | | | | | | | | Instead of dirtying all graphics or all compute based on binding point, we're now much more careful. We first check to see if the actual descriptor set changed and then only dirty the stages used by that descriptor set. For dynamic offsets, we keep a bitfield per-stage of which offsets are actually used in that stage and we only dirty push constants and descriptors if that stage has dynamic offsets AND those offsets actually change. Reviewed-by: Lionel Landwerlin <[email protected]>
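The dirtying rule described above might look roughly like this. It is a sketch with hypothetical types (`toy_set`, `toy_binding`) and a stage count chosen for brevity, not the driver's data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_STAGES 3
#define TOY_MAX_DYNAMIC 8

/* Hypothetical descriptor set description. */
struct toy_set {
   uint32_t stages;                     /* bitmask of stages using the set */
   uint32_t dynamic_used[TOY_STAGES];   /* per-stage bitfield of dynamic
                                         * offsets that the stage reads */
};

/* Hypothetical state of one descriptor set binding point. */
struct toy_binding {
   const struct toy_set *set;
   uint32_t offsets[TOY_MAX_DYNAMIC];
};

/* Bind `set` with the given dynamic offsets and return the bitmask of
 * stages whose descriptors and push constants need re-emitting. */
uint32_t
toy_bind(struct toy_binding *b, const struct toy_set *set,
         const uint32_t *offsets, uint32_t offset_count)
{
   assert(offset_count <= TOY_MAX_DYNAMIC);
   uint32_t dirty = 0;

   /* Only a different set dirties the stages that use it. */
   if (b->set != set) {
      dirty |= set->stages;
      b->set = set;
   }

   /* Record which dynamic offsets actually changed value. */
   uint32_t changed = 0;
   for (uint32_t i = 0; i < offset_count; i++) {
      if (b->offsets[i] != offsets[i]) {
         changed |= 1u << i;
         b->offsets[i] = offsets[i];
      }
   }

   /* A stage is dirtied only if it reads one of the changed offsets. */
   for (uint32_t s = 0; s < TOY_STAGES; s++) {
      if (set->dynamic_used[s] & changed)
         dirty |= 1u << s;
   }
   return dirty;
}
```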
* anv: Use a switch statement for binding table setupJason Ekstrand2019-11-181-117/+127
| | | | | | | | | It theoretically could be more efficient but the real point here is that it's no longer really a matter of dealing with special cases and then the "real" thing. The way we're handling binding tables, it's more of a multi-step process and a switch is more natural. Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Rework push constant handlingJason Ekstrand2019-11-1810-227/+176
| | | | | | | | | | | | | | | | | | This substantially reworks both the state setup side of push constant handling and the pipeline compile side. The fundamental change here is that we're no longer respecting the prog_data::param array and instead are just instructing the back-end compiler to leave the array alone. This makes the state setup side substantially simpler because we can now just memcpy the whole block of push constants and don't have to upload one DWORD at a time. This also means that we can compute the full push constant layout up-front and just trust the back-end compiler to not mess with it. Maybe one day we'll decide that the back-end compiler can do useful things there again but for now, this is functionally no different from what we had before this commit and makes the NIR handling cleaner. Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Re-arrange push constant data a bitJason Ekstrand2019-11-183-23/+46
| | | | | | | | This moves the compute stuff into an anv_push_constants::cs sub-struct. It also moves dynamic offsets into the push constants. This means we have to duplicate the data per-stage but that doesn't seem like the end of the world and one day we may wish to make dynamic offsets per-stage anyway. Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/compiler: Add a flag to avoid compacting push constantsJason Ekstrand2019-11-181-0/+1
| | | | | | | In vec4, we can just not run the pass. In fs, things are a bit more deeply intertwined. Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Pre-compute push ranges for graphics pipelinesJason Ekstrand2019-11-187-64/+136
| | | | | | | | | It turns out that emitting push constants is one of the hottest paths in the driver and ANY work we do there costs us. By pre-computing things a bit ahead of time, we shave 5% off the runtime of a CPU-limited example running with the Dawn WebGPU implementation. Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Stop bounds-checking pushed UBOsJason Ekstrand2019-11-181-28/+10
| | | | | | | | | | | | | | | The bounds checking is actually less safe than just pushing the data. If the bounds checking actually ever kicks in and it's not on the last UBO push range, then the shrinking will cause all subsequent ranges to be pushed to the wrong place in the GRF. One of the behaviors we definitely don't want is for OOB UBO access to result in completely unrelated UBOs returning garbage values. It's safer to just push the UBOs as-requested. If we're really concerned about robustness, we can emit shader code to do bounds checking which should be stupid cheap (a CMP followed by SEL). Cc: [email protected] Reviewed-by: Lionel Landwerlin <[email protected]>
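The misplacement argument can be made concrete with a short worked example. Assume, for illustration only, that push ranges are packed back to back so each range starts where the previous one ends; `toy_range_starts` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compute where each push range lands when ranges are packed back to
 * back: range i starts at the sum of the lengths of ranges 0..i-1. */
void
toy_range_starts(const uint32_t *lengths, size_t count, uint32_t *starts)
{
   assert(lengths != NULL && starts != NULL);

   uint32_t next = 0;
   for (size_t i = 0; i < count; i++) {
      starts[i] = next;
      next += lengths[i];
   }
}
```

With requested lengths {4, 2, 1} the ranges start at 0, 4 and 6. If bounds checking shrinks the first range to 2, the later ranges start at 2 and 4 while the shader still reads them at 4 and 6, so unrelated UBOs return wrong data.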
* anv: Delete dead shader constant pushing codeJason Ekstrand2019-11-182-13/+7
| | | | | | | | As of 2d78e55a8c5481, nir_intrinsic_load_constant with a constant offset is constant-folded so we should never end up with any that trigger brw_nir_analyze_ubo_ranges. Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Flatten descriptor bindings in anv_nir_apply_pipeline_layoutJason Ekstrand2019-11-186-76/+54
| | | | | | | | This lets us stop tracking the pipeline layout. It also means less indirection on a very hot path. As an extra bonus, we can make some of our data structures smaller. No measurable CPU overhead improvement. Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Input attachments are always single-planeJason Ekstrand2019-11-181-2/+3
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* anv/pipeline: Assume layout != NULLJason Ekstrand2019-11-181-21/+19
| | | | | | | | | In the early days of the driver we allowed layout to be VK_NULL_HANDLE and used that for some internal pipelines when we wanted to be lazy. Vulkan doesn't actually allow NULL layouts, however, so there's no reason to have this check. Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Initialize depth_bounds_test_enable when not explicitly setCaio Marcelo de Oliveira Filho2019-11-131-2/+1
| | | | | | | | | This was causing an uninitialized value to end up propagated to the 3DSTATE_DEPTH_BOUNDS packet, leading to asserts on packet building due to the value being greater than 1. Fixes: 939ddccb7a5 ("anv: Add support for depth bounds testing.") Reviewed-by: Plamena Manolova <[email protected]>
* anv: Use mocs settings from isl_dev.Rafael Antognolli2019-11-126-74/+15
| | | | | | | v2: Remove device->default_mocs and external_mocs (Jason). Reviewed-by: Jordan Justen <[email protected]> Acked-by: Lionel Landwerlin <[email protected]>
* anv: implement VK_KHR_timeline_semaphoreLionel Landwerlin2019-11-115-72/+734
| | | | | | | | | | | | | | | | | v2: Fix inverted condition in vkGetPhysicalDeviceExternalSemaphoreProperties() v3: Add anv_timeline_* helpers (Jason) v4: Avoid variable shadowing (Jason) Split timeline wait/signal device operations (Jason/Lionel) v5: s/point/signal_value/ (Jason) Drop piece of drm-syncobj timeline code (Jason) v6: Add missing sync_fd semaphore signaling (Jason) Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv: Plumb timeline semaphore signal/wait values through from the APIJason Ekstrand2019-11-112-3/+22
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* anv/wsi: signal the semaphore in the acquireNextImageLionel Landwerlin2019-11-111-4/+20
| | | | | | | | | | | We seem to have forgotten about the semaphore in the acquireNextImageInfo. v2: Signal semaphore/fence regardless of presentation status (Jason) Signed-off-by: Lionel Landwerlin <[email protected]> Cc: <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv: Lock around fetching sync file FDs from semaphoresJason Ekstrand2019-11-111-13/+26
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: prepare the driver for delayed submissionsLionel Landwerlin2019-11-114-376/+616
| | | | | | | | | | | | | | | | | | | | | | | | Timeline semaphores introduce support for wait before signal behavior, which means that it is now allowed to call vkQueueSubmit() with wait semaphores not yet submitted for execution. Our kernel driver requires all of the wait primitives to be created before calling the execbuf ioctl. As a result, we must delay submissions in the userspace driver. This change stores the necessary information to be able to delay a VkSubmitInfo submission to the kernel driver. v2: Fold count++ into array access (Jason) Move queue list to another patch (Jason) v3: Document cleanup of temporary semaphores (Jason) v4: Track semaphores of SYNC_FD type that need updating after delayed submission v5: Don't forget to update sync_fd in signaled semaphores after submission (Jason) Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
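The idea of delaying a submission until its waits have been submitted can be sketched as below. This is a simplified model with hypothetical names (`toy_timeline`, `toy_submit`, `toy_flush_pending`); it ignores per-queue ordering and the kernel interface entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical timeline: highest signal value handed to the kernel. */
struct toy_timeline {
   uint64_t highest_submitted;
};

/* Hypothetical pending submission with at most one wait and one signal. */
struct toy_submit {
   const struct toy_timeline *wait_timeline;   /* NULL: nothing to wait on */
   uint64_t wait_value;
   struct toy_timeline *signal_timeline;       /* NULL: nothing to signal */
   uint64_t signal_value;
   bool submitted;
};

/* A submission may go to the kernel once the value it waits on has
 * itself been submitted. */
bool
toy_submit_ready(const struct toy_submit *s)
{
   return s->wait_timeline == NULL ||
          s->wait_timeline->highest_submitted >= s->wait_value;
}

/* Submit everything that is ready, repeating while progress is made,
 * since each submission may unblock others. Returns how many went out. */
size_t
toy_flush_pending(struct toy_submit *pending, size_t count)
{
   assert(pending != NULL || count == 0);

   size_t flushed = 0;
   bool progress = true;
   while (progress) {
      progress = false;
      for (size_t i = 0; i < count; i++) {
         struct toy_submit *s = &pending[i];
         if (s->submitted || !toy_submit_ready(s))
            continue;
         s->submitted = true;
         if (s->signal_timeline != NULL &&
             s->signal_value > s->signal_timeline->highest_submitted)
            s->signal_timeline->highest_submitted = s->signal_value;
         flushed++;
         progress = true;
      }
   }
   return flushed;
}
```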
* anv: refcount semaphoresLionel Landwerlin2019-11-112-6/+26
| | | | | | | | | | | | Delayed submissions required by timeline semaphores mean we need to be able to update the sync fd backed semaphores in a delayed fashion. This could mean a race between the application destroying the semaphore and the submission code trying to update it with the new sync fd. This change prepares semaphores to be refcounted; we'll most likely only take a reference for cases where we signal a sync fd semaphore. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
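A generic reference-counting sketch shows how the race above can be avoided: the object is freed only when the last holder drops its reference. It uses C11 atomics and hypothetical names (`toy_semaphore_*`); the driver's own helpers are not reproduced here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted semaphore; the payload is omitted. */
struct toy_semaphore {
   atomic_uint refcount;
};

/* Create a semaphore holding one reference, owned by the application. */
struct toy_semaphore *
toy_semaphore_create(void)
{
   struct toy_semaphore *sem = malloc(sizeof(*sem));
   if (sem == NULL)
      return NULL;
   atomic_init(&sem->refcount, 1);
   return sem;
}

/* Take an extra reference, e.g. for a delayed submission that will
 * later update the semaphore. */
struct toy_semaphore *
toy_semaphore_ref(struct toy_semaphore *sem)
{
   assert(sem != NULL);
   atomic_fetch_add(&sem->refcount, 1);
   return sem;
}

/* Drop a reference; frees the semaphore and returns true if this was
 * the last one. */
bool
toy_semaphore_unref(struct toy_semaphore *sem)
{
   assert(sem != NULL);
   if (atomic_fetch_sub(&sem->refcount, 1) == 1) {
      free(sem);
      return true;
   }
   return false;
}
```

With this shape, destroying the semaphore from the application only drops the application's reference, and the memory stays valid until the submission code drops its own.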
* anv: prepare driver to report submission error through queuesLionel Landwerlin2019-11-115-24/+60
| | | | | | | | | | | | | | | | | When we submit to i915 from a submission thread, we won't be able to directly report the error to the user (in particular through the debug report callbacks). So prepare 2 paths to report errors: device -> notifying the user immediately, queue -> notifying the user the next time an entry point is called. In this change we still report directly for both paths; this will change in the next commit. v2: Split NULL batch parameter handling in anv_queue_submit_simple_batch() in a different commit Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv: allow NULL batch parameter to anv_queue_submit_simple_batchLionel Landwerlin2019-11-112-19/+17
| | | | | | | We can reuse device->trivial_batch_bo Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv: move queue init/finish to anv_queue.cLionel Landwerlin2019-11-113-22/+30
| | | | | | | | Prepare the queue initialization to take on more responsibilities and possibly fail. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>