path: root/src/intel
Commit message | Author | Age | Files | Lines
* i965/iris/perf: factor out frequency register capture | Lionel Landwerlin | 2019-12-18 | 2 | -14/+24
| | | | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Mark Janes <[email protected]> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3113> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3113>
* anv/gen12: Temporarily disable VK_KHR_buffer_device_address (and EXT) | Caio Marcelo de Oliveira Filho | 2019-12-17 | 1 | -2/+4
| | | | | | | | For the sake of our testing infrastructure, disable this extension for TGL until we can sort out a hang in Vulkan CTS. Acked-by: Jason Ekstrand <[email protected]> Acked-by: Lionel Landwerlin <[email protected]>
* intel/vec4: Fix lowering of multiplication by 16-bit constant | Caio Marcelo de Oliveira Filho | 2019-12-17 | 1 | -2/+14
| | | | | | | | | | | | | | Existing code was ignoring whether the type of the immediate source was signed or not. If the source was signed, it would ignore small negative values but it also would wrongly accept values between INT16_MAX and UINT16_MAX, causing the actual value to later be reinterpreted as a negative number (under 16-bits). Fixes tests/shaders/glsl-mul-const.shader_test in Piglit for older platforms that don't support MUL with 32x32 types and use vec4. Cc: <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* intel/fs: Fix lowering of dword multiplication by 16-bit constant | Caio Marcelo de Oliveira Filho | 2019-12-17 | 1 | -2/+4
| | | | | | | | | | | | | | | Existing code was ignoring whether the type of the immediate source was signed or not. If the source was signed, it would ignore small negative values but it also would wrongly accept values between INT16_MAX and UINT16_MAX, causing the actual value to later be reinterpreted as a negative number (under 16-bits). Fixes tests/shaders/glsl-mul-const.shader_test in Piglit for platforms that don't support MUL with 32x32 types, including ICL and TGL. Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2186 Cc: <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
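The signedness check described in the two commits above can be sketched roughly as follows. This is a minimal illustration, not the Mesa code: the helper name `imm_fits_in_16_bits` and its parameters are hypothetical, and it only shows why the accepted range has to depend on whether the immediate is signed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not the Mesa implementation): returns true when a
 * 32-bit immediate can be narrowed to a 16-bit operand of the given
 * signedness without changing its value. */
static bool
imm_fits_in_16_bits(uint32_t bits, bool is_signed)
{
   if (is_signed) {
      /* Reinterpret the raw bits as a signed value. Small negative values
       * are fine; values above INT16_MAX are not, because they would read
       * back as negative once truncated to 16 bits. */
      int32_t v = (int32_t)bits;
      return v >= INT16_MIN && v <= INT16_MAX;
   }

   /* Unsigned immediates may use the full 0..UINT16_MAX range. */
   return bits <= UINT16_MAX;
}
```

With this split, a value such as 40000 is accepted only for an unsigned source, and -5 only for a signed one.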
* anv: Export VK_KHR_buffer_device_address only when really supported | Iván Briano | 2019-12-16 | 1 | -1/+1
| | | | | | | | Fixes: 1b6991ba1d8 ("anv: Implement VK_KHR_buffer_device_address") Reviewed-by: Lionel Landwerlin <[email protected]> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3071> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3071>
* anv: Export filter_minmax support only when it's really supported | Iván Briano | 2019-12-16 | 1 | -1/+1
| | | | | | | Fixes: bea4d4c78c3 ("anv: add VK_EXT_sampler_filter_minmax support") Reviewed-by: Lionel Landwerlin <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3071>
* anv: drop unused parameter from apply layout pass | Lionel Landwerlin | 2019-12-16 | 3 | -4/+1
| | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv: constify pipeline layout in nir passes | Lionel Landwerlin | 2019-12-16 | 3 | -10/+10
| | | | | | | | Was hoping to find potential issues but nothing. Still probably a good idea. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* intel/fs: Lower 64-bit MOVs after lower_load_payload() | Caio Marcelo de Oliveira Filho | 2019-12-14 | 1 | -0/+5
| | | | | | Reviewed-by: Jason Ekstrand <[email protected]> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3070> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3070>
* anv: drop unused #include | Eric Engestrom | 2019-12-13 | 1 | -1/+0
| | | | | Signed-off-by: Eric Engestrom <[email protected]> Acked-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/compiler: replace `0` pointer with `NULL` | Eric Engestrom | 2019-12-13 | 1 | -1/+1
| | | | | Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/compiler: add ASSERTED annotation to avoid "unused variable" warning | Eric Engestrom | 2019-12-13 | 1 | -3/+4
| | | | | Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* i965/iris: perf-queries: don't invalidate/flush 3d pipeline | Lionel Landwerlin | 2019-12-13 | 2 | -12/+6
| | | | | | | | | | | | | | | | | | | Our current implementation of performance queries is fairly harsh because it completely flushes and invalidates the 3d pipeline caches at the beginning and end of each query. An argument can be made that this is how performance should be measured but it probably doesn't reflect what the application is actually doing and the actual cost of draw calls. A more appropriate approach is to just stall the pipeline at scoreboard, so that we measure the effect of a draw call without having the pipeline in a completely pristine state for every draw call. v2: Use end of pipe PIPE_CONTROL instruction for Iris (Ken) Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* intel/perf: drop batchbuffer flushing at query begin | Lionel Landwerlin | 2019-12-13 | 1 | -8/+0
| | | | | | | | | | | | This was initially intended to fix issues with the query timings occasionally going high. It turns out there was a bug in the attribution of OA reports to our context when parsing the OA data. This led to reports flagged with other context IDs being included in our query results. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* anv: fix assumptions about temporary fence payload | Lionel Landwerlin | 2019-12-12 | 1 | -9/+14
| | | | | | | | | | | | Since f9a3d9738b12, temporary BO_WSI payloads are definitely a thing, so one of our asserts is wrong. Take that opportunity to expand a bit on an existing comment. Signed-off-by: Lionel Landwerlin <[email protected]> Fixes: f9a3d9738b12 ("anv: Use BO fences/semaphores for AcquireNextImage") Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Ivan Briano <[email protected]>
* anv: fix fence underlying primitive checks | Lionel Landwerlin | 2019-12-12 | 1 | -3/+13
| | | | | | | | | | | | | | We appear to have got lucky that the only type of temporary fence payload we could have was a syncobj and that would only happen when the type of the permanent payload was also a syncobj. This code was broken if that assumption changed and it did in commit f9a3d9738b12. Signed-off-by: Lionel Landwerlin <[email protected]> Cc: <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Ivan Briano <[email protected]>
* anv: Bump the advertised patch version to 129 | Jason Ekstrand | 2019-12-11 | 1 | -1/+1
| | | | | | We've been keeping up with the spec updates. Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Unconditionally advertise Vulkan 1.1 | Jason Ekstrand | 2019-12-11 | 1 | -4/+1
| | | | | | | | | Vulkan 1.1 requires VK_KHR_external_fence which requires syncobj support to be actually usable. However, it doesn't strictly require that we support any external handle types. We should be able to advertise 1.1 even on old kernels that don't have syncobj support. Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Flush the queue on DeviceWaitIdle | Jason Ekstrand | 2019-12-11 | 1 | -3/+16
| | | | | | | | | | | When we have syncobj_wait, we can trust in WAIT_FOR_SUBMIT but when we don't, we only have BO waits and those aren't quite as nice. This commit adds a flag to _anv_queue_submit to wait for the queue to drain before returning. This gives us the behavior we need to implement DeviceWaitIdle. Fixes: 246261f0add "anv: prepare the driver for delayed submissions" Reviewed-by: Lionel Landwerlin <[email protected]>
* intel: add mi_builder_test for gen12 | Eric Engestrom | 2019-12-11 | 1 | -1/+1
| | | | | Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/decoder: Make get_state_size take a full 64-bit address and a base | Kenneth Graunke | 2019-12-10 | 2 | -11/+20
| | | | | | | | | | | | i965 wants to use an offset from a base because everything is in a single buffer whose address may be relocated, and all base addresses are set to the start of that buffer. iris wants to use a full 64-bit address, because state lives in separate buffers which may be in the shader, surface, and dynamic memory zones, where addresses grow downward from the top of a 4GB zone. So it's very possible for a 32-bit offset to exist relative to multiple bases, leading to the wrong state size.
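The ambiguity described above can be illustrated with a small lookup table keyed by full 64-bit address. This is only a sketch under assumptions: `struct state_entry`, `state_size_by_address`, and `state_size_by_offset` are hypothetical names and do not reflect the decoder's actual callback signature or storage.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of a piece of GPU state and its size in bytes. */
struct state_entry {
   uint64_t address; /* full 64-bit GPU address */
   uint32_t size;
};

/* Look up a state size by its full 64-bit address. Returns 0 if unknown. */
static uint32_t
state_size_by_address(const struct state_entry *entries, size_t count,
                      uint64_t address)
{
   for (size_t i = 0; i < count; i++) {
      if (entries[i].address == address)
         return entries[i].size;
   }
   return 0;
}

/* Offset-style callers supply the base explicitly, so the same 32-bit
 * offset relative to two different bases resolves to two different keys. */
static uint32_t
state_size_by_offset(const struct state_entry *entries, size_t count,
                     uint64_t base, uint32_t offset)
{
   return state_size_by_address(entries, count, base + offset);
}
```

Two states at offset 0x1000 in different 4GB zones would share a key if only the offset were used; carrying the base keeps them apart.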
* anv: Enable Gen11 Color/Z write merging optimization | Kenneth Graunke | 2019-12-10 | 1 | -0/+12
| | | | | | | | | | | | | | | | | | | TCCNTLREG contains additional L3 cache write merging optimizations. The default value on my system appears to be: - URB Partial Write Merging (bit 0) - L3 Data Partial Write Merging (bit 2) - TC Disable (bit 3) Windows drivers appear to set bit 1 as well to enable "Color/Z Partial Write Merging". This should solve an issue we were seeing where MRT benchmarks were using substantially more bandwidth than they ought. However, we have not observed it to cause measurable FPS gains. It is unclear whether we should be setting bit 0 or bit 3, so for now we leave those at the hardware default value. Acked-by: Jason Ekstrand <[email protected]>
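The bit layout listed above suggests a read-modify-write that sets only bit 1 and leaves the other bits at whatever the hardware reports. The sketch below assumes that layout; the macro and function names are hypothetical, and the actual driver presumably packs the register through its generated genxml definition rather than raw masks.

```c
#include <stdint.h>

/* Bit positions as listed in the commit message; macro names are made up
 * for this illustration. */
#define TCCNTLREG_URB_PARTIAL_WRITE_MERGING      (1u << 0)
#define TCCNTLREG_COLOR_Z_PARTIAL_WRITE_MERGING  (1u << 1)
#define TCCNTLREG_L3_DATA_PARTIAL_WRITE_MERGING  (1u << 2)
#define TCCNTLREG_TC_DISABLE                     (1u << 3)

/* Enable Color/Z partial write merging while preserving bits 0, 2 and 3
 * at their current (hardware default) values. */
static uint32_t
tccntlreg_enable_color_z_merging(uint32_t current)
{
   return current | TCCNTLREG_COLOR_Z_PARTIAL_WRITE_MERGING;
}
```

Starting from the default observed in the commit message (bits 0, 2 and 3 set, i.e. 0xD), the result is 0xF.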
* intel/genxml: Add a partial TCCNTLREG definition | Kenneth Graunke | 2019-12-10 | 1 | -0/+7
| | | | | | | TCCNTLREG contains additional cache programming settings. In particular, there are several write combining controls we'd like to use. Acked-by: Jason Ekstrand <[email protected]>
* ANV: Stop advertising smoothLines support on gen10+ | Jason Ekstrand | 2019-12-10 | 1 | -1/+9
| | | | Reviewed-by: Ivan Briano <[email protected]>
* anv: fix incorrect VMA alignment for CCS main surfaces | Lionel Landwerlin | 2019-12-10 | 1 | -3/+14
| | | | | | | | | | | | | | Maybe a finer way of dealing with this requirement would be to increase the number of pdevice->memory.types[] to add a category for special alignment cases. Meanwhile this fixes the problem of CCS surface alignment and it's probably not going to cause issues given the size of our address space. Signed-off-by: Lionel Landwerlin <[email protected]> Fixes: 6af8a4acc4a4 ("anv: Add aux-map translation for gen12+") Reviewed-by: Jason Ekstrand <[email protected]>
* anv: fix missing gen12 handling | Lionel Landwerlin | 2019-12-10 | 1 | -0/+3
| | | | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Fixes: 181be14d4303 ("anv: Build for gen12") Reviewed-by: Eric Engestrom <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* intel: Add pci-ids for Jasper Lake | Anuj Phogat | 2019-12-09 | 1 | -0/+1
| | | | | Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* intel: Add device info for 1x4x6 Jasper Lake | Anuj Phogat | 2019-12-09 | 1 | -4/+21
| | | | | | | | Also removing the FIXME comments after matching the numbers with updated documentation. Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Re-emit all compute state on pipeline switch | Jason Ekstrand | 2019-12-07 | 1 | -0/+7
| | | | | | | | | It's a very odd case to hit in the real world. However, there are some CTS tests which switch back and forth between dispatch and clear without changing the pipeline. Fixes: bc612536eb2f "anv: Emit a dummy MEDIA_VFE_STATE before switching..." Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* anv: Re-capture all batch and state buffers | Jason Ekstrand | 2019-12-07 | 1 | -6/+3
| | | | | | | | | When we moved from allocating BOs directly to using the BO cache, we lost the EXEC_OBJECT_CAPTURE flag on all our state buffers. Fixes: 3119b96bdf57 "anv: Allocate block pool BOs from the cache" Fixes: ee77938733cd "anv: Allocate batch and fence buffers from..." Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* anv: Return VK_ERROR_OUT_OF_DEVICE_MEMORY for too-large buffers | Jason Ekstrand | 2019-12-06 | 1 | -0/+9
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Use BO fences/semaphores for AcquireNextImage | Jason Ekstrand | 2019-12-06 | 3 | -41/+86
| | | | | | | | | | | Instead of doing a dummy submit on the command buffer for the fence or a dummy semaphore and trusting in implicit sync, this commit moves us to take advantage of implicit sync and just use the WSI image BO as the fence. Both semaphores and fences require a tiny bit of extra plumbing to do this but the result is that we can get rid of a bunch of the extra synchronization we're doing today. Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Add a fence_reset_reset_temporary helper | Jason Ekstrand | 2019-12-06 | 2 | -2/+14
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Use submit-time implicit sync instead of allocate-time | Jason Ekstrand | 2019-12-06 | 2 | -15/+18
| | | | | | | | | | | | | | | | | | | | | | | In 83b943cc2f24, we started making all VkDeviceMemory BOs resident all the time. One unfortunate side-effect of this is that every vkQueueSubmit sets EXEC_OBJECT_WRITE on every WSI memory object which means that the X server or Wayland compositor, instead of waiting on the last vkQueueSubmit to actually write the buffer, now waits on the last vkQueueSubmit from that driver instance relative to whenever the compositor's GL driver instance calls execbuf. This potentially leads to a lot of extra synchronization that we didn't intend to have. Instead, this commit makes it so that we leave WSI memory objects with EXEC_OBJECT_ASYNC most of the time and only unset EXEC_OBJECT_ASYNC and set EXEC_OBJECT_WRITE in the dummy execbuf that we do as part of vkQueuePresent. This should hopefully result in tighter integration with the compositor, lower latency, and better performance. Testing with DOOM 2016, this seems to reduce latency by at least a frame if not two and makes the game much more responsive. Testing was, however, subjective, so we don't have any hard data on that. Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Always add in EXEC_OBJECT_WRITE when specified in extra_flags | Jason Ekstrand | 2019-12-06 | 1 | -0/+5
| | | | | | | | Otherwise, we're trusting in the execbuf_add_bo which sets EXEC_OBJECT_WRITE to always be the first one that gets called. This is likely true for fences but it seems somewhat fragile. Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Implement VK_KHR_buffer_device_address | Jason Ekstrand | 2019-12-05 | 2 | -7/+53
| | | | | | | | | The primary difference between the KHR and EXT versions of the extension is that the KHR provides the address at AllocateMemory time for replay so we can replay it safely without moving to a sparse address model. Reviewed-by: Ivan Briano <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Use a pNext loop in AllocateMemory | Jason Ekstrand | 2019-12-05 | 1 | -25/+45
| | | | | | | | | This function has a lot of possible extensions and some of them we can easily handle on-the-fly so it's easier to just have a loop than to find each structure manually. Reviewed-by: Ivan Briano <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
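The single-loop approach described above can be sketched as follows. To stay self-contained this does not use the Vulkan headers: `struct ext_header`, the `EXT_TYPE_*` values, `struct ext_opaque_address`, and `parse_alloc_chain` are all hypothetical stand-ins for the real structure types and are not taken from the driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for Vulkan structure types. */
enum ext_type {
   EXT_TYPE_EXPORT = 1,
   EXT_TYPE_DEDICATED = 2,
   EXT_TYPE_OPAQUE_ADDRESS = 3,
};

/* Every extension struct starts with this header, as in Vulkan. */
struct ext_header {
   enum ext_type sType;
   const void *pNext;
};

struct ext_opaque_address {
   struct ext_header header;
   uint64_t address;
};

struct alloc_options {
   int wants_export;
   int is_dedicated;
   uint64_t client_address;
};

/* Walk the chain once and handle each known extension on the fly,
 * instead of searching the chain separately for every structure. */
static struct alloc_options
parse_alloc_chain(const void *pNext)
{
   struct alloc_options opts = { 0, 0, 0 };

   for (const struct ext_header *ext = pNext; ext != NULL; ext = ext->pNext) {
      switch (ext->sType) {
      case EXT_TYPE_EXPORT:
         opts.wants_export = 1;
         break;
      case EXT_TYPE_DEDICATED:
         opts.is_dedicated = 1;
         break;
      case EXT_TYPE_OPAQUE_ADDRESS:
         /* The header is the first member, so this cast is valid. */
         opts.client_address =
            ((const struct ext_opaque_address *)ext)->address;
         break;
      default:
         /* Unknown structures are skipped. */
         break;
      }
   }

   return opts;
}
```

Adding support for another extension then only needs one more `case`, which is the convenience the commit message points at.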
* anv: Add allocator support for client-visible addresses | Jason Ekstrand | 2019-12-05 | 6 | -10/+107
| | | | | | | | | | When a BO is flagged as having a client visible address, we put it in its own heap. We also support the client explicitly specifying an address in said heap. If an address collision happens, we return false from anv_vma_alloc which turns into a VK_ERROR_OUT_OF_DEVICE_MEMORY. Reviewed-by: Ivan Briano <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Add an explicit_address parameter to anv_device_alloc_bo | Jason Ekstrand | 2019-12-05 | 6 | -7/+26
| | | | | | | | | | We already have a mechanism for specifying that we want a fixed address provided by the driver internals. We're about to let the client start specifying addresses in some very special scenarios as well so we want to pass this through to the allocation function. Reviewed-by: Ivan Briano <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Stop advertising two heaps just for the VF cache WA | Jason Ekstrand | 2019-12-05 | 2 | -67/+6
| | | | | Reviewed-by: Ivan Briano <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Set up VMA heaps independently from memory heaps | Jason Ekstrand | 2019-12-05 | 2 | -31/+16
| | | | | | | | | Our VMA allocations are really independent from the memory heaps we expose via the API. The only thing that really matters is the GTT size so we can make the high heap the right size. Reviewed-by: Ivan Briano <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Stop tracking VMA allocations | Jason Ekstrand | 2019-12-05 | 2 | -13/+5
| | | | | | | | | | util_vma_heap_alloc will already return 0 if it doesn't have enough space. The only thing the vma_*_available tracking was doing was preventing us from allocating too much on any given heap. Now that we're tracking that in the heap itself, we can drop these. Reviewed-by: Ivan Briano <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Disallow allocating above heap sizes | Jason Ekstrand | 2019-12-05 | 1 | -9/+27
| | | | | | | | | We're already tracking the amount of memory used in each heap. This commit just makes us start rejecting memory allocations if the heap would grow too large. Reviewed-by: Ivan Briano <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
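Rejecting an allocation that would push a heap past its size might look roughly like this. It is a single-threaded sketch with hypothetical names (`struct mem_heap`, `heap_try_reserve`); the driver's own accounting may well differ, for example by using atomic updates.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical heap accounting: total size and bytes currently in use. */
struct mem_heap {
   uint64_t size;
   uint64_t used;
};

/* Reserve alloc_size bytes from the heap, or fail without changing the
 * accounting if the heap would grow beyond its size. */
static bool
heap_try_reserve(struct mem_heap *heap, uint64_t alloc_size)
{
   /* Compare against the remaining space so the check cannot overflow. */
   if (heap->used > heap->size || alloc_size > heap->size - heap->used)
      return false;

   heap->used += alloc_size;
   return true;
}
```

A caller would translate a `false` return into an out-of-memory error for the application.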
* anv: Don't leak when set_tiling fails | Jason Ekstrand | 2019-12-05 | 1 | -3/+4
| | | | | | Fixes: a44744e01d73 "anv: Require a dedicated allocation for..." Reviewed-by: Ivan Briano <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Use PIPE_CONTROL flushes to implement the gen8 VF cache WA | Jason Ekstrand | 2019-12-05 | 6 | -20/+245
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Apply cache flushes after setting index/draw VBs | Jason Ekstrand | 2019-12-05 | 1 | -2/+35
| | | | | Reviewed-by: Ivan Briano <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Always invalidate the VF cache in BeginCommandBuffer | Jason Ekstrand | 2019-12-05 | 1 | -2/+1
| | | | | | | | | | I think the reason why we only do this for primaries is that we didn't expect to have blorp calls in secondaries. However, you are allowed to have a full render pass in a secondary command buffer so resolves and clears can end up in there. We should just always invalidate. Reviewed-by: Ivan Briano <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* blorp: Pass the VB size to the VF cache workaround | Jason Ekstrand | 2019-12-05 | 2 | -6/+8
| | | | | Reviewed-by: Ivan Briano <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Add a has_softpin boolean | Jason Ekstrand | 2019-12-05 | 2 | -3/+6
| | | | | | | | This separates "has" from "use" which will make the next commit a bit cleaner. Reviewed-by: Ivan Briano <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Drop bo_flags from anv_bo_pool | Jason Ekstrand | 2019-12-05 | 3 | -14/+3
| | | | | | | | | In ee77938733cd, we started using the BO cache for anv_bo_pool and stopped using the bo_flags parameter. However, we never dropped it from the struct or the init function. Reviewed-by: Ivan Briano <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>