path: root/src/intel
Commit message | Author | Age | Files | Lines
* anv: Use mocs settings from isl_dev. | Rafael Antognolli | 2019-11-12 | 6 | -74/+15
    v2: Remove device->default_mocs and external_mocs (Jason).

    Reviewed-by: Jordan Justen <[email protected]>
    Acked-by: Lionel Landwerlin <[email protected]>
* intel/isl: Add MOCS settings to isl_device. | Rafael Antognolli | 2019-11-12 | 2 | -0/+57
    Centralize mocs settings into isl.

    Reviewed-by: Jordan Justen <[email protected]>
    Acked-by: Lionel Landwerlin <[email protected]>
* intel/blorp: Fix usage of uninitialized memory in key hashing | Danylo Piliaiev | 2019-11-12 | 1 | -1/+6
    The automatically generated padding in structs contains undefined
    values, so force pack the structs to eliminate the padding. Otherwise
    structs with the same values may generate different hashes.

    Valgrind output:
      Conditional jump or move depends on uninitialised value(s)
        util_fast_urem32 (fast_urem_by_const.h:71)
        hash_table_search (hash_table.c:262)
        _mesa_hash_table_search (hash_table.c:296)
        anv_pipeline_cache_search_locked (anv_pipeline_cache.c:318)
        anv_pipeline_cache_search (anv_pipeline_cache.c:335)
        lookup_blorp_shader (anv_blorp.c:38)
        blorp_params_get_mcs_partial_resolve_kernel (blorp_clear.c:1112)
        blorp_mcs_partial_resolve (blorp_clear.c:1205)
        anv_image_mcs_op (anv_blorp.c:1742)
        anv_cmd_predicated_mcs_resolve (genX_cmd_buffer.c:774)
        transition_color_buffer (genX_cmd_buffer.c:1159)
        cmd_buffer_end_subpass (genX_cmd_buffer.c:4840)
      Uninitialised value was created by a stack allocation
        blorp_params_get_mcs_partial_resolve_kernel (blorp_clear.c:1103)

    Signed-off-by: Danylo Piliaiev <[email protected]>
    Reviewed-by: Lionel Landwerlin <[email protected]>
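The padding problem this commit describes can be sketched as follows. This is only an illustration: the struct layout, the field names and the FNV-1a hash are assumptions and are not taken from blorp, and `__attribute__((packed))` is a GCC/Clang extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical key layout. Without packing, most ABIs insert three padding
 * bytes after 'op' so that 'format' is 4-byte aligned; those bytes are never
 * written by field assignments. */
struct padded_key {
   uint8_t  op;
   uint32_t format;
};

/* Packed variant (GCC/Clang extension): every byte belongs to a field. */
struct __attribute__((packed)) packed_key {
   uint8_t  op;
   uint32_t format;
};

/* FNV-1a over the raw bytes, standing in for the real hash function. */
static uint32_t
hash_bytes(const void *data, size_t size)
{
   assert(data != NULL);
   const uint8_t *bytes = data;
   uint32_t hash = 2166136261u;
   for (size_t i = 0; i < size; i++) {
      hash ^= bytes[i];
      hash *= 16777619u;
   }
   return hash;
}

/* Pre-fill the storage with 'garbage' to imitate stale stack contents,
 * then assign the fields and hash the whole object. */
static uint32_t
hash_packed_key(uint8_t garbage)
{
   struct packed_key key;
   memset(&key, garbage, sizeof(key));
   key.op = 1;
   key.format = 42;
   return hash_bytes(&key, sizeof(key));
}
```

With the packed layout the hash depends only on the field values, whatever the storage held before. Packing also changes the object size (5 bytes here instead of the usual 8 for `struct padded_key`), which matters to any code that expects key sizes to be a multiple of 4.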
* anv: implement VK_KHR_timeline_semaphore | Lionel Landwerlin | 2019-11-11 | 5 | -72/+734
    v2: Fix inverted condition in vkGetPhysicalDeviceExternalSemaphoreProperties()

    v3: Add anv_timeline_* helpers (Jason)

    v4: Avoid variable shadowing (Jason)
        Split timeline wait/signal device operations (Jason/Lionel)

    v5: s/point/signal_value/ (Jason)
        Drop piece of drm-syncobj timeline code (Jason)

    v6: Add missing sync_fd semaphore signaling (Jason)

    Signed-off-by: Lionel Landwerlin <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
* anv: Plumb timeline semaphore signal/wait values through from the API | Jason Ekstrand | 2019-11-11 | 2 | -3/+22
    Reviewed-by: Lionel Landwerlin <[email protected]>
* anv/wsi: signal the semaphore in the acquireNextImage | Lionel Landwerlin | 2019-11-11 | 1 | -4/+20
    We seem to have forgotten about the semaphore in the
    acquireNextImageInfo.

    v2: Signal semaphore/fence regardless of presentation status (Jason)

    Signed-off-by: Lionel Landwerlin <[email protected]>
    Cc: <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
* anv: Lock around fetching sync file FDs from semaphores | Jason Ekstrand | 2019-11-11 | 1 | -13/+26
    Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: prepare the driver for delayed submissions | Lionel Landwerlin | 2019-11-11 | 4 | -376/+616
    Timeline semaphores introduce support for wait-before-signal behavior,
    which means that it is now allowed to call vkQueueSubmit() with wait
    semaphores not yet submitted for execution. Our kernel driver requires
    all of the wait primitives to be created before calling the execbuf
    ioctl. As a result, we must delay submissions in the userspace driver.

    This change stores the information necessary to delay a VkSubmitInfo
    submission to the kernel driver.

    v2: Fold count++ into array access (Jason)
        Move queue list to another patch (Jason)

    v3: Document cleanup of temporary semaphores (Jason)

    v4: Track semaphores of SYNC_FD type that need updating after delayed
        submission

    v5: Don't forget to update sync_fd in signaled semaphores after
        submission (Jason)

    Signed-off-by: Lionel Landwerlin <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
* anv: refcount semaphores | Lionel Landwerlin | 2019-11-11 | 2 | -6/+26
    Delayed submissions required by timeline semaphores mean we need to be
    able to update the sync fd backed semaphores in a delayed fashion. This
    could mean a race between the application destroying the semaphore and
    the submission code trying to update it with the new sync fd.

    This change prepares semaphores to be refcounted. We'll most likely
    only take a reference for cases where we signal a sync fd semaphore.

    Signed-off-by: Lionel Landwerlin <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
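The destroy-versus-update race described here is the usual motivation for reference counting. Below is a minimal sketch using C11 atomics. The `sketch_semaphore` type and its functions are hypothetical names for illustration and are not the anv implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a driver semaphore object. */
struct sketch_semaphore {
   atomic_uint refcount;
   int sync_fd;
};

/* A new object starts with one reference, owned by the application. */
static struct sketch_semaphore *
sketch_semaphore_create(void)
{
   struct sketch_semaphore *sem = malloc(sizeof(*sem));
   if (sem == NULL)
      return NULL;
   atomic_init(&sem->refcount, 1);
   sem->sync_fd = -1;
   return sem;
}

/* Taken by the submission code before it updates the object later on. */
static struct sketch_semaphore *
sketch_semaphore_ref(struct sketch_semaphore *sem)
{
   unsigned old = atomic_fetch_add(&sem->refcount, 1);
   assert(old >= 1);
   (void)old;
   return sem;
}

/* Returns true when this call dropped the last reference and freed the
 * object; both the application's destroy and the submission code call it. */
static bool
sketch_semaphore_unref(struct sketch_semaphore *sem)
{
   unsigned old = atomic_fetch_sub(&sem->refcount, 1);
   assert(old >= 1);
   if (old != 1)
      return false;
   free(sem);
   return true;
}
```

Whichever side drops its reference last frees the object, so a delayed update never touches freed memory.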
* anv: prepare driver to report submission error through queues | Lionel Landwerlin | 2019-11-11 | 5 | -24/+60
    When we submit to i915 from a submission thread, we won't be able to
    directly report the error to the user (in particular through the debug
    report callbacks).

    So prepare 2 paths to report errors:
      device -> notifying the user immediately
      queue  -> notifying the user the next time an entry point is called

    In this change we still report directly for both paths; this will
    change in the next commit.

    v2: Split NULL batch parameter handling in
        anv_queue_submit_simple_batch() in a different commit

    Signed-off-by: Lionel Landwerlin <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
* anv: allow NULL batch parameter to anv_queue_submit_simple_batch | Lionel Landwerlin | 2019-11-11 | 2 | -19/+17
    We can reuse device->trivial_batch_bo

    Signed-off-by: Lionel Landwerlin <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
* anv: move queue init/finish to anv_queue.c | Lionel Landwerlin | 2019-11-11 | 3 | -22/+30
    Prepare the queue initialization to take on more responsibilities and
    possibly fail.

    Signed-off-by: Lionel Landwerlin <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
* anv: expose timeout helpers outside of anv_queue.c | Lionel Landwerlin | 2019-11-11 | 2 | -50/+51
    Signed-off-by: Lionel Landwerlin <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
* anv: detach batch emission allocation from device | Lionel Landwerlin | 2019-11-11 | 1 | -56/+40
    In the future we'll have 2 different allocations depending on whether
    we're using threaded submission or not.

    Signed-off-by: Lionel Landwerlin <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
* anv: remove list items on batch fini | Lionel Landwerlin | 2019-11-11 | 1 | -1/+4
    This doesn't seem to fix anything because those destroy() calls happen
    right before the command buffer object & its list of batch_bo is also
    destroyed. Still looks a bit cleaner.

    v2: Found a second occurrence

    Signed-off-by: Lionel Landwerlin <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]> (v2)
    Fixes: 26ba0ad54d ("vk: Re-name command buffer implementation files")
    Cc: <[email protected]>
* anv: invalidate file descriptor of semaphore sync fd at vkQueueSubmit | Lionel Landwerlin | 2019-11-11 | 1 | -2/+4
    We always close the in_fence at the end of anv_cmd_buffer_execbuf(), so
    when we take it from the semaphore, let's not forget to invalidate it.

    Note that the code leaks the fence_in if we get any error before
    reaching the close(). Let's fix that in another patch or better,
    rewrite the whole thing!

    v2: drop redundant fd = -1 (Jason)

    v3: Update commit message (Jason)

    Signed-off-by: Lionel Landwerlin <[email protected]>
    Cc: <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
* intel/fs: Lower large local arrays to scratch | Jason Ekstrand | 2019-11-11 | 1 | -0/+19
    Shader-db results on Kaby Lake:

        total instructions in shared programs: 14929212 -> 14880028 (-0.33%)
        instructions in affected programs: 72428 -> 23244 (-67.91%)
        helped: 6
        HURT: 2
        helped stats (abs) min: 2165 max: 15981 x̄: 8590.00 x̃: 7624
        helped stats (rel) min: 56.06% max: 74.52% x̄: 67.55% x̃: 72.08%
        HURT stats (abs)   min: 1178 max: 1178 x̄: 1178.00 x̃: 1178
        HURT stats (rel)   min: 350.60% max: 361.35% x̄: 355.97% x̃: 355.97%
        95% mean confidence interval for instructions value: -11947.03 -348.97
        95% mean confidence interval for instructions %-change: -125.72% 202.37%
        Inconclusive result (%-change mean confidence interval includes 0).

        total cycles in shared programs: 368585300 -> 342557344 (-7.06%)
        cycles in affected programs: 28144921 -> 2116965 (-92.48%)
        helped: 6
        HURT: 2
        helped stats (abs) min: 1404978 max: 7766106 x̄: 4353922.00 x̃: 3890682
        helped stats (rel) min: 82.01% max: 95.57% x̄: 89.95% x̃: 92.28%
        HURT stats (abs)   min: 47778 max: 47798 x̄: 47788.00 x̃: 47788
        HURT stats (rel)   min: 278.20% max: 282.98% x̄: 280.59% x̃: 280.59%
        95% mean confidence interval for cycles value: -5900438.73 -606550.27
        95% mean confidence interval for cycles %-change: -140.79% 146.16%
        Inconclusive result (%-change mean confidence interval includes 0).

        total spills in shared programs: 9243 -> 8901 (-3.70%)
        spills in affected programs: 2718 -> 2376 (-12.58%)
        helped: 4
        HURT: 4

        total fills in shared programs: 21831 -> 10141 (-53.55%)
        fills in affected programs: 11804 -> 114 (-99.03%)
        helped: 6
        HURT: 2

        total sends in shared programs: 815912 -> 815912 (0.00%)
        sends in affected programs: 0 -> 0
        helped: 0
        HURT: 0

        LOST:   1
        GAINED: 3

    The helped shaders are all compute shaders in Aztec Ruins. There is
    also a compute shader in synmark2 OglCSDof that's helped, but it
    doesn't show up in the above shader-db results because it went from
    SIMD8 to SIMD16. That shader improves enough to yield a 15-20%
    performance boost to the benchmark as a whole on my KBL laptop. The
    hurt shaders are a couple of shaders in Kerbal Space Program and a
    couple in Aztec Ruins.

    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/fs: Implement the new load/store_scratch intrinsics | Jason Ekstrand | 2019-11-11 | 5 | -17/+241
    This commit fills in a number of different pieces:

     1. We add support to brw_nir_lower_mem_access_bit_sizes to handle the
        new intrinsics. This involves simple plumbing work as well as a
        tiny bit of extra logic to always scalarize scratch intrinsics

     2. Add code to brw_fs_nir.cpp to turn nir_load/store_scratch
        intrinsics into byte/dword scattered read/write messages which use
        the A32 stateless model.

     3. Add code to lower_surface_logical_send to handle dword scattered
        messages and the A32 stateless model.

    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/nir: Plumb devinfo through lower_mem_access_bit_sizes | Jason Ekstrand | 2019-11-11 | 3 | -9/+14
    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/fs: refactor surface header setup | Jason Ekstrand | 2019-11-11 | 1 | -23/+16
    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/fs: Add DWord scattered read/write opcodes | Jason Ekstrand | 2019-11-11 | 5 | -0/+66
    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/nir: Use nir_extract_bits in lower_mem_access_bit_sizes | Jason Ekstrand | 2019-11-11 | 1 | -37/+15
    The new helper solves most of the annoying problems with data wrangling
    in brw_nir_lower_mem_access_bit_sizes.

    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* anv: Unify GetDeviceQueue and GetDeviceQueue2 | Ricardo Garcia | 2019-11-11 | 1 | -4/+8
    Avoid duplicating some checks and code by making anv_GetDeviceQueue a
    subcase of anv_GetDeviceQueue2, like radv does.

    Signed-off-by: Ricardo Garcia <[email protected]>
    Reviewed-by: Lionel Landwerlin <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
* Revert "intel/blorp: Fix usage of uninitialized memory in key hashing" | Kenneth Graunke | 2019-11-07 | 1 | -6/+1
    This reverts commit 4432a2d14d80081d062f7939a950d65ea3a16eed.

    Pretty much every SKQP test dies with this assertion:

    skqp: ../src/mesa/drivers/dri/i965/brw_program_cache.c:102: hash_key: Assertion `item->key_size % 4 == 0' failed.
* intel/blorp: Fix usage of uninitialized memory in key hashing | Danylo Piliaiev | 2019-11-07 | 1 | -1/+6
    The automatically generated padding in structs contains undefined
    values, so force pack the structs to eliminate the padding. Otherwise
    structs with the same values may generate different hashes.

    Valgrind output:
      Conditional jump or move depends on uninitialised value(s)
        util_fast_urem32 (fast_urem_by_const.h:71)
        hash_table_search (hash_table.c:262)
        _mesa_hash_table_search (hash_table.c:296)
        anv_pipeline_cache_search_locked (anv_pipeline_cache.c:318)
        anv_pipeline_cache_search (anv_pipeline_cache.c:335)
        lookup_blorp_shader (anv_blorp.c:38)
        blorp_params_get_mcs_partial_resolve_kernel (blorp_clear.c:1112)
        blorp_mcs_partial_resolve (blorp_clear.c:1205)
        anv_image_mcs_op (anv_blorp.c:1742)
        anv_cmd_predicated_mcs_resolve (genX_cmd_buffer.c:774)
        transition_color_buffer (genX_cmd_buffer.c:1159)
        cmd_buffer_end_subpass (genX_cmd_buffer.c:4840)
      Uninitialised value was created by a stack allocation
        blorp_params_get_mcs_partial_resolve_kernel (blorp_clear.c:1103)

    Cc: <[email protected]>
    Signed-off-by: Danylo Piliaiev <[email protected]>
    Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/gen_decoder: Fix unused-but-set-variable warning | Kai Wasserbäch | 2019-11-07 | 1 | -2/+2
    This commit fixes the following warning:
      ../src/intel/common/gen_decoder.c: In function ‘gen_spec_load_from_path’:
      ../src/intel/common/gen_decoder.c:741:11: warning: variable ‘len’ set but not used [-Wunused-but-set-variable]
        741 |    size_t len, filename_len = strlen(path) + 20;
            |           ^~~

    Signed-off-by: Kai Wasserbäch <[email protected]>
    Acked-by: Lionel Landwerlin <[email protected]>
* anv: implement VK_KHR_separate_depth_stencil_layouts | Lionel Landwerlin | 2019-11-06 | 6 | -27/+90
    v2: Use ternary to simplify code (Jason)

    v3: Reorder switch cases to follow existing section ordering (Nanley)
        Add missing comment in cmd_buffer_end_subpass() about new layout
        (Nanley)

    v4: Fix layout comparison for stencil case (Nanley)
        Update a few more comments (Nanley)
        Move VK_IMAGE_LAYOUT_STENCIL_ATTACHMENT_OPTIMAL_KHR in color
        attachment case for future stencil-CCS support (Nanley)

    v5: Missed comments update (Nanley)
        Updated relnotes.txt (Lionel)

    Signed-off-by: Lionel Landwerlin <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
    Reviewed-by: Nanley Chery <[email protected]>
* meson: move the generic symbols check arguments to a common variable | Eric Engestrom | 2019-11-05 | 1 | -1/+1
    Signed-off-by: Eric Engestrom <[email protected]>
    Reviewed-by: Dylan Baker <dylan@pnwbakers>
* meson: add variable to control the symbols checks | Eric Engestrom | 2019-11-05 | 1 | -1/+1
    Signed-off-by: Eric Engestrom <[email protected]>
    Reviewed-by: Dylan Baker <dylan@pnwbakers>
* intel/compiler: remove the operand restriction for src1 on GLK | Paulo Zanoni | 2019-11-05 | 1 | -2/+1
    Commit 5847de6e9afe implemented a restriction that applies to ICL, but
    wrongly marked it as also applying to GLK. Reviewers of MR !1125
    pointed this out, and the commit history shows removal of GLK from
    parts of the patch, but it turns out there was still a left-over GLK
    check in the code.

    This code was breaking some of the i8vec2 tests on GLK, for example:
      dEQP-VK.subgroups.arithmetic.compute.subgroupadd_i8vec2

    Removing the GLK check solves the issue for GLK. I don't see a reason
    why implementing this restriction would actually break GLK, so there's
    still more to investigate here since this bug may be affecting ICL+,
    but let's apply the real GLK fix while we analyze and discuss the other
    possible issues.

    Fixes: 5847de6e9afe ("intel/compiler: don't use byte operands for src1 on ICL")
    BSpec: 3017
    Reviewed-by: Lionel Landwerlin <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
    Signed-off-by: Paulo Zanoni <[email protected]>
* anv: Properly handle host query reset of performance queries | Lionel Landwerlin | 2019-11-04 | 1 | -32/+20
    The host query reset entry point didn't use the availability offset for
    performance queries.

    To fix this, reorder the availability of performance queries to match
    other queries.

    Signed-off-by: Lionel Landwerlin <[email protected]>
    Fixes: 2b5f30b1d9 ("anv: implement VK_INTEL_performance_query")
    Reviewed-by: Jason Ekstrand <[email protected]>
* anv: remove incorrect polygonMode=point early-out | Erik Faye-Lund | 2019-11-01 | 1 | -6/+0
    This is incorrect, because polygonMode only applies if the final
    primitive type is a polygon; polygonMode doesn't apply to
    line-primitives as the comment suggests.

    The Vulkan 1.1 spec, section 26.11, "Polygons" defines that polygons
    are separate from points and line segments:

      "A polygon results from the decomposition of a triangle strip,
       triangle fan or a series of independent triangles. Like points and
       line segments, polygon rasterization is controlled by several
       variables in the VkPipelineRasterizationStateCreateInfo structure."

    Further, section 26.11.2, "Polygon Mode", only defines polygonMode to
    apply to polygons:

      "Possible values of the
       VkPipelineRasterizationStateCreateInfo::polygonMode property of the
       currently active pipeline, specifying the method of rasterization
       for polygons, are:"

    This seems to clearly define that polygonMode doesn't apply to points
    and lines, so let's make sure that we don't early out with the wrong
    value.

    Signed-off-by: Erik Faye-Lund <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
* anv: Move the RT BTI flush workaround to begin_subpass | Jason Ekstrand | 2019-10-31 | 1 | -23/+18
    Now that we're no longer compacting binding table entries, the only
    time they can possibly change is when we actually switch subpasses.

    Reviewed-by: Rafael Antognolli <[email protected]>
* anv: Stop compacting render targets in the binding table | Jason Ekstrand | 2019-10-31 | 1 | -88/+62
    Instead, always emit one entry for every color attachment in the
    subpass or one NULL if there are no color attachments. This will let us
    adjust an Ice Lake workaround so we don't get a stall on every draw
    call.

    Reviewed-by: Rafael Antognolli <[email protected]>
* anv: Don't claim the null RT as a valid color target | Jason Ekstrand | 2019-10-31 | 1 | -6/+6
    If it's NULL, we can let the compiler go ahead and delete it or flag it
    as NULL.

    Reviewed-by: Rafael Antognolli <[email protected]>
* anv: Don't delete fragment shaders that write sample mask | Jason Ekstrand | 2019-10-31 | 1 | -1/+3
    Also, use color_outputs_valid rather than nr_color_outputs since it
    should be a bit more accurate.

    Reviewed-by: Rafael Antognolli <[email protected]>
* anv: Use the new BO alloc API for Android | Jason Ekstrand | 2019-10-31 | 1 | -28/+15
    Fixes: a44f5ee0d8b "anv: Rework the internal BO allocation API"
    Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: add missing xmlconfig headers dependency | Eric Engestrom | 2019-10-31 | 1 | -0/+1
    Signed-off-by: Eric Engestrom <[email protected]>
    Acked-by: Dylan Baker <[email protected]>
* anv: Zero released anv_bo structs | Jason Ekstrand | 2019-10-31 | 1 | -1/+12
    Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Use a bitset for tracking residency | Jason Ekstrand | 2019-10-31 | 2 | -79/+87
    Now that we can conveniently map between GEM handles and struct anv_bo
    pointers, we can use a simple bitset for residency tracking instead of
    the complex hash set. This shaves about 3% off of a CPU-limited example
    running with the Dawn WebGPU implementation.

    Reviewed-by: Lionel Landwerlin <[email protected]>
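The idea of tracking residency with one bit per GEM handle can be illustrated with a small growable bitset. This is a sketch under assumptions: the `sketch_bitset` type and its functions are hypothetical, and the driver's own bitset helpers may look quite different.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_WORD_BITS (sizeof(uint32_t) * CHAR_BIT)

/* Hypothetical growable bitset indexed by a small integer handle. */
struct sketch_bitset {
   uint32_t *words;
   size_t num_words;
};

/* Handles beyond the current size are simply not in the set. */
static bool
sketch_bitset_test(const struct sketch_bitset *set, uint32_t handle)
{
   size_t word = handle / SKETCH_WORD_BITS;
   if (word >= set->num_words)
      return false;
   return (set->words[word] >> (handle % SKETCH_WORD_BITS)) & 1u;
}

/* Marks 'handle' as resident, growing the array with zeroed words when
 * needed. Returns false only if the allocation fails. */
static bool
sketch_bitset_set(struct sketch_bitset *set, uint32_t handle)
{
   assert(set != NULL);
   size_t word = handle / SKETCH_WORD_BITS;
   if (word >= set->num_words) {
      uint32_t *words = realloc(set->words, (word + 1) * sizeof(*words));
      if (words == NULL)
         return false;
      for (size_t i = set->num_words; i <= word; i++)
         words[i] = 0;
      set->words = words;
      set->num_words = word + 1;
   }
   set->words[word] |= UINT32_C(1) << (handle % SKETCH_WORD_BITS);
   return true;
}
```

Membership checks become a shift and a mask with no hashing, which is where a CPU-side saving of the kind the commit reports would plausibly come from.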
* anv: Set the batch allocator for compute pipelines | Jason Ekstrand | 2019-10-31 | 1 | -2/+5
    Otherwise relocations just up and crash.

    Fixes: a3153162a9b "anv: Delay allocation of relocation lists"
    Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Add a device parameter to anv_execbuf_add_bo | Jason Ekstrand | 2019-10-31 | 1 | -19/+32
    We're about to start needing to lookup BO pointers by GEM handle so we
    need access to the device.

    Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Drop anv_bo_init and anv_bo_init_new | Jason Ekstrand | 2019-10-31 | 3 | -49/+35
    BOs are now only ever allocated through the BO cache so there's no need
    to have these exposed.

    Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Allocate misc BOs from the cache | Jason Ekstrand | 2019-10-31 | 8 | -63/+52
    Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Allocate scratch BOs from the cache | Jason Ekstrand | 2019-10-31 | 2 | -41/+18
    While we're here, we get rid of the locking and use a lock-free
    algorithm. The chances of spilling contention are low and this is
    actually a bit simpler in some ways.

    Reviewed-by: Lionel Landwerlin <[email protected]>
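One common way to replace a lock around lazy allocation is to allocate optimistically and publish the result with a compare-and-swap. The sketch below shows that general pattern with C11 atomics. It is an assumption that the commit uses something similar; `sketch_scratch_slot` and `sketch_get_scratch` are hypothetical names, and plain `malloc` stands in for BO allocation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cache slot for a lazily allocated scratch buffer; static
 * storage starts out as a null pointer. */
static _Atomic(void *) sketch_scratch_slot;

static void *
sketch_get_scratch(size_t size)
{
   assert(size > 0);

   /* Fast path: somebody already published a buffer. */
   void *current = atomic_load(&sketch_scratch_slot);
   if (current != NULL)
      return current;

   /* Slow path: allocate without holding any lock. */
   void *fresh = malloc(size);
   if (fresh == NULL)
      return NULL;

   /* Publish only if the slot is still empty. */
   void *expected = NULL;
   if (atomic_compare_exchange_strong(&sketch_scratch_slot, &expected, fresh))
      return fresh;

   /* Lost the race: 'expected' now holds the winner's buffer, so drop ours. */
   free(fresh);
   return expected;
}
```

The cost of this pattern is an occasional wasted allocation when two threads race, which is acceptable when contention is rare.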
* anv: Allocate batch and fence buffers from the cache | Jason Ekstrand | 2019-10-31 | 5 | -200/+125
    Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Allocate descriptor buffers from the BO cache | Jason Ekstrand | 2019-10-31 | 3 | -36/+14
    Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Set more flags on descriptor pool buffers | Jason Ekstrand | 2019-10-31 | 1 | -1/+8
    The ASYNC flag, in particular, has the potential to help performance
    because it means less sync tracking in the kernel.

    Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Allocate query pool BOs from the cache | Jason Ekstrand | 2019-10-31 | 2 | -26/+16
    Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Use the query_slot helper in vkResetQueryPoolEXT | Jason Ekstrand | 2019-10-31 | 1 | -1/+1
    Reviewed-by: Lionel Landwerlin <[email protected]>