summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* gitlab-ci: Move artifact preparation to separate scriptMichel Dänzer2019-11-123-32/+36
| | | | | | | | | | | | | | | | | It's currently only needed for the meson-main and meson-arm64 jobs, not the other meson build jobs. Also remove MESON_SHADERDB, just run .gitlab-ci/run-shader-db.sh directly from the meson-main job. v2: * Also run prepare-artifacts.sh in meson-arm64 script v3: * Move tarball creation into the new script as well, as it prevented ccache --show-stats from running in after_script Reviewed-by: Eric Engestrom <[email protected]> # v1 Reviewed-by: Eric Anholt <[email protected]>
* gitlab-ci: Use ninja -j4 for building dEQPMichel Dänzer2019-11-121-1/+1
| | | | | | | | By default, ninja tries to saturate all cores of the runner host machine, which could overload it due to other jobs running in parallel. Reviewed-by: Eric Engestrom <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* spirv: Fix the MSVC buildJason Ekstrand2019-11-121-1/+1
| | | | | | Fixes: 9cc4c2c91649b "spirv: Add a vtn_decorate_pointer helper" Tested-by: Erik Faye-Lund <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir: patch up deref-vars when lowering clip-planesErik Faye-Lund2019-11-121-0/+1
| | | | | | | | Otherwise, we fail validation and potentially generate invalid code. Let's fix up the mode of the accesses to the variable. Signed-off-by: Erik Faye-Lund <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* ac: handle pointer types to LDS in ac_get_elem_bits()Samuel Pitoiset2019-11-121-0/+5
| | | | | | | | This fixes crashes with some dEQP-VK.spirv_assembly.instruction.spirv1p4.* tests. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* freedreno: add Adreno 640 IDJonathan Marek2019-11-113-0/+11
| | | | | | | A640 seems to work without any other changes (glmark and vkcube). Signed-off-by: Jonathan Marek <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* radv: fix radv secure compile feature breaks compilation on armhf EABI and ↵Luis Mendes2019-11-121-0/+8
| | | | | | | | | | | | | | | aarch64 __NR_select is not defined the same way across architectures, sometimes is not even defined, like in armhf EABI and aarch64. Signed-off-by: Luis Mendes <[email protected]> Acked-by: Timothy Arceri <[email protected]> Acked-by: Samuel Pitoiset <[email protected]> Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2042
* st/mesa: remove unused TGSI-only debug printing functionsMarek Olšák2019-11-117-68/+0
| | | | Reviewed-by: Timothy Arceri <[email protected]>
* st/mesa: add ST_DEBUG=nir to print NIR shadersMarek Olšák2019-11-112-1/+11
| | | | Reviewed-by: Timothy Arceri <[email protected]>
* st/mesa: print TCS/TES/GS/CS TGSI in the right place & keep disk cache enabledMarek Olšák2019-11-112-6/+5
| | | | | | | The old place only printed on a disk cache miss, which is why the disk cache was disabled. Reviewed-by: Timothy Arceri <[email protected]>
* st/mesa: remove \n being only printed in debug builds after printed TGSIMarek Olšák2019-11-111-12/+4
| | | | Reviewed-by: Timothy Arceri <[email protected]>
* st/mesa: rename DEBUG_TGSI -> DEBUG_PRINT_IRMarek Olšák2019-11-114-7/+7
| | | | Reviewed-by: Timothy Arceri <[email protected]>
* st/mesa: fix Sanctuary and Tropics by disabling ARB_gpu_shader5 for themMarek Olšák2019-11-116-1/+12
| | | | | | | They use the "sample" keyword as a variable name. Cc: 19.2 19.3 <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* anv: implement VK_KHR_timeline_semaphoreLionel Landwerlin2019-11-115-72/+734
| | | | | | | | | | | | | | | | | v2: Fix inverted condition in vkGetPhysicalDeviceExternalSemaphoreProperties() v3: Add anv_timeline_* helpers (Jason) v4: Avoid variable shadowing (Jason) Split timeline wait/signal device operations (Jason/Lionel) v5: s/point/signal_value/ (Jason) Drop piece of drm-syncobj timeline code (Jason) v6: Add missing sync_fd semaphore signaling (Jason) Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv: Plumb timeline semaphore signal/wait values through from the APIJason Ekstrand2019-11-112-3/+22
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* anv/wsi: signal the semaphore in the acquireNextImageLionel Landwerlin2019-11-111-4/+20
| | | | | | | | | | | We seem to have forgotten about the semaphore in the acquireNextImageInfo. v2: Signal semaphore/fence regardless of presentation status (Jason) Signed-off-by: Lionel Landwerlin <[email protected]> Cc: <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv: Lock around fetching sync file FDs from semaphoresJason Ekstrand2019-11-111-13/+26
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: prepare the driver for delayed submissionsLionel Landwerlin2019-11-114-376/+616
| | | | | | | | | | | | | | | | | | | | | | | | Timeline semaphore introduce support for wait before signal behavior, which means that it is now allowed to call vkQueueSubmit() with wait semaphores not yet submitted for execution. Our kernel driver requires all of the wait primitives to be created before calling the execbuf ioctl. As a result, we must delay submissions in the userspace driver. This change store the necessary information to be able to delay a VkSubmitInfo submission to the kernel driver. v2: Fold count++ into array access (Jason) Move queue list to another patch (Jason) v3: Document cleanup of temporary semaphores (Jason) v4: Track semaphores of SYNC_FD type that needs updating after delayed submission v5: Don't forget to update sync_fd in signaled semaphores after submission (Jason) Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv: refcount semaphoresLionel Landwerlin2019-11-112-6/+26
| | | | | | | | | | | | | | Delayed submissions required by timeline semaphores mean we need to be able to update the sync fd backed semaphores in a delayed fashion. This could mean a race between the application destroying the semaphore and the submission code trying to update it with the new sync fd. This change prepares semaphores to be refcounted, we'll most likely only take a reference for cases where we signal a sync fd semaphore. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv: prepare driver to report submission error through queuesLionel Landwerlin2019-11-115-24/+60
| | | | | | | | | | | | | | | | | When we will submit to i915 from a submission thread, we won't be able to directly report the error to the user (in particular through the debug report callbacks). So prepare 2 paths to report errors device -> notifying the user immediately, queue -> notifying the user the next time an entry point is called. In this change we still report directly for both paths, this will change in the next commit. v2: Split NULL batch parameter handling in anv_queue_submit_simple_batch() in a different commit Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv: allow NULL batch parameter to anv_queue_submit_simple_batchLionel Landwerlin2019-11-112-19/+17
| | | | | | | We can reuse device->trivial_batch_bo Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv: move queue init/finish to anv_queue.cLionel Landwerlin2019-11-113-22/+30
| | | | | | | | Prepare the queue initialization to take on more responsabilities and possibly fail. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv: expose timeout helpers outside of anv_queue.cLionel Landwerlin2019-11-112-50/+51
| | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv: detach batch emission allocation from deviceLionel Landwerlin2019-11-111-56/+40
| | | | | | | | In the future we'll have 2 different allocations depending on whether we're using threaded submission or not. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv: remove list items on batch finiLionel Landwerlin2019-11-111-1/+4
| | | | | | | | | | | | | This doesn't seem to fix anything because those destroy() calls happen right before the command buffer object & its list of batch_bo is also destroyed. Still looks a bit cleaner. v2: Found a second occurence Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> (v2) Fixes: 26ba0ad54d ("vk: Re-name command buffer implementation files") Cc: <[email protected]>
* anv: invalidate file descriptor of semaphore sync fd at vkQueueSubmitLionel Landwerlin2019-11-111-2/+4
| | | | | | | | | | | | | | | | | | We always close the in_fence at the end the anv_cmd_buffer_execbuf() so when we take it from the semaphore, let's not forget to invalidate it. Note that the code leaks the fence_in if we get any error before reaching the close(). Let's fix that in another patch or better, rewrite the whole thing! v2: drop redundant fd = -1 (Jason) v3: Update commit message (Jason) Signed-off-by: Lionel Landwerlin <[email protected]> Cc: <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* radv: fix radv_nir_get_max_workgroup_size when nir=NULLRhys Perry2019-11-111-1/+4
| | | | | | Signed-off-by: Rhys Perry <[email protected]> Fixes: 84a1a2578 ('compiler: pack shader_info from 160 bytes to 96 bytes') Reviewed-by: Samuel Pitoiset <[email protected]>
* mesa: check framebuffer completeness only after state updateLionel Landwerlin2019-11-111-6/+6
| | | | | | | | | | | | The change made in 88d665830f27 ("mesa: check draw buffer completeness on glClearBufferfi/glClearBufferiv") correctly updated the state prior to checking the framebuffer completeness on glClearBufferiv but not in glClearBufferfi. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Anuj Phogat <[email protected]> Fixes: 88d665830f27 ("mesa: check draw buffer completeness on glClearBufferfi/glClearBufferiv") Gitlab: https://gitlab.freedesktop.org/mesa/mesa/issues/2072
* glsl: Check earlier for MaxTextureImageUnits and MaxImageUniformsCaio Marcelo de Oliveira Filho2019-11-112-12/+24
| | | | | | | | | | | | | | Currently the linker do all the work then check for the limits, which means num_textures and num_images in shader_info may have to store more than the limit. This breaks down now since shader_info was packed and doesn't expect to store larger invalid values. To fix this, pull the check before we set the counts in shader_info. Add necessary plumbing to make sure we bail once those errors are found. Fixes: 84a1a2578da ("compiler: pack shader_info from 160 bytes to 96 bytes") Reviewed-by: Timothy Arceri <[email protected]>
* glsl: Check earlier for MaxShaderStorageBlocks and MaxUniformBlocksCaio Marcelo de Oliveira Filho2019-11-111-16/+16
| | | | | | | | | | | | | | | Currently the linker do all the work then check for the limits, which means num_ssbos and num_ubos in shader_info may have to store more than the limit. This breaks down now since shader_info was packed and doesn't expect to store larger invalid values. To fix this, pull the check before we set the counts in shader_info. One drawback of this approach is that for some cases we might not see the collected errors from various stages, but bail as soon as a stage breaks the limits. Fixes: 84a1a2578da ("compiler: pack shader_info from 160 bytes to 96 bytes") Reviewed-by: Timothy Arceri <[email protected]>
* util: Use ZSTD for shader cache if possibleDylan Baker2019-11-114-1/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This allows ZSTD instead of ZLIB to be used for compressing the shader cache. On a 72 core system emulating skl with a full shader-db (with i965): ZSTD: 1915.10s user 229.27s system 5150% cpu 41.632 total (cold cache) 225.40s user 10.87s system 3810% cpu 6.201 total (warm cache) 154M (235M on disk) ZLIB: 2231.33s user 194.24s system 1899% cpu 2:07.72 total (cold cache) 229.15s user 10.63s system 3906% cpu 6.139 total (warm cache) 163M (244M on disk) Tim Arceri sees (8 core ryzen and a full shader-db): ZSTD: 2505.22 user 40.50 system 3:18.73 elapsed 1280% CPU (cold cache) 418.71 user 14.93 system 0:46.53 elapsed 931% CPU (warm cache) 454.3 MB (681.7 MB on disk) ZLIB: 3069.83 user 40.02 system 4:20.13 elapsed 1195% CPU (cold cache) 425.50 user 15.17 system 0:46.80 elapsed 941% CPU (warm cache) 470.3 MB (701.4 MB on disk) Reviewed-by: Eric Engestrom <[email protected]> (v1) Reviewed-by: Eric Anholt <[email protected]>
* egl: avoid local modifications for eglext.h Khronos standard header fileLaurent Carlier2019-11-112-11/+11
| | | | | | | | | | Move differences in eglextchromium.h header file, then provide the same header than libglvnd-1.2 So program that omit to include eglextchromium.h will fail to build with both mesa and libglvnd headers. Fixes: a0a8109f "include: add the definition of EGL_EXT_image_flush_external" Cc: [email protected] Reviewed-by: Eric Engestrom <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* egl: move #include of local headers out of Khronos headersEric Engestrom2019-11-113-3/+4
| | | | | | Cc: [email protected] Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* intel/fs: Lower large local arrays to scratchJason Ekstrand2019-11-111-0/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Shader-db results on Kaby Lake: total instructions in shared programs: 14929212 -> 14880028 (-0.33%) instructions in affected programs: 72428 -> 23244 (-67.91%) helped: 6 HURT: 2 helped stats (abs) min: 2165 max: 15981 x̄: 8590.00 x̃: 7624 helped stats (rel) min: 56.06% max: 74.52% x̄: 67.55% x̃: 72.08% HURT stats (abs) min: 1178 max: 1178 x̄: 1178.00 x̃: 1178 HURT stats (rel) min: 350.60% max: 361.35% x̄: 355.97% x̃: 355.97% 95% mean confidence interval for instructions value: -11947.03 -348.97 95% mean confidence interval for instructions %-change: -125.72% 202.37% Inconclusive result (%-change mean confidence interval includes 0). total cycles in shared programs: 368585300 -> 342557344 (-7.06%) cycles in affected programs: 28144921 -> 2116965 (-92.48%) helped: 6 HURT: 2 helped stats (abs) min: 1404978 max: 7766106 x̄: 4353922.00 x̃: 3890682 helped stats (rel) min: 82.01% max: 95.57% x̄: 89.95% x̃: 92.28% HURT stats (abs) min: 47778 max: 47798 x̄: 47788.00 x̃: 47788 HURT stats (rel) min: 278.20% max: 282.98% x̄: 280.59% x̃: 280.59% 95% mean confidence interval for cycles value: -5900438.73 -606550.27 95% mean confidence interval for cycles %-change: -140.79% 146.16% Inconclusive result (%-change mean confidence interval includes 0). total spills in shared programs: 9243 -> 8901 (-3.70%) spills in affected programs: 2718 -> 2376 (-12.58%) helped: 4 HURT: 4 total fills in shared programs: 21831 -> 10141 (-53.55%) fills in affected programs: 11804 -> 114 (-99.03%) helped: 6 HURT: 2 total sends in shared programs: 815912 -> 815912 (0.00%) sends in affected programs: 0 -> 0 helped: 0 HURT: 0 LOST: 1 GAINED: 3 The helped shaders are all compute shaders in Aztec Ruins. There is also a compute shader in synmark2 OglCSDof that's helped but it doesn't show up in above shader-db results because it went from SIMD8 to SIMD16. That shader improves enough to yield an 15-20% performance boost to the benchmark as a whole on my KBL laptop. The hurt shaders are a couple shaders in Kerbal Space Program and a couple in Aztec Ruins. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/fs: Implement the new load/store_scratch intrinsicsJason Ekstrand2019-11-115-17/+241
| | | | | | | | | | | | | | | | | This commit fills in a number of different pieces: 1. We add support to brw_nir_lower_mem_access_bit_sizes to handle the new intrinsics. This involves simple plumbing work as well as a tiny bit of extra logic to always scalarize scratch intrinsics 2. Add code to brw_fs_nir.cpp to turn nir_load/store_scratch intrinsics into byte/dword scattered read/write messages which use the A32 stateless model. 3. Add code to lower_surface_logical_send to handle dword scattered messages and the A32 stateless model. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/nir: Plumb devinfo through lower_mem_access_bit_sizesJason Ekstrand2019-11-113-9/+14
| | | | Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/fs: refactor surface header setupJason Ekstrand2019-11-111-23/+16
| | | | Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/fs: Add DWord scattered read/write opcodesJason Ekstrand2019-11-115-0/+66
| | | | Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/nir: Use nir_extract_bits in lower_mem_access_bit_sizesJason Ekstrand2019-11-111-37/+15
| | | | | | | The new helper solves most of the annoying problems with data wrangling in brw_nir_lower_mem_access_bit_sizes. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir: Add tests for nir_extract_bitsJason Ekstrand2019-11-112-0/+167
|
* nir/builder: Add a nir_extract_bits helperJason Ekstrand2019-11-111-37/+80
| | | | | | | | | | This new helper is better than nir_bitcast_vector because it's able to take a (mostly) arbitrary range from the source vector. The only requirement is that first_bit has to be aligned to the smaller of the two bit sizes. It wouldn't be hard to lift that requirement but it's reasonable for now. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* egl: fix _EGL_NATIVE_PLATFORM fallbackEric Engestrom2019-11-111-9/+0
| | | | | | | | | When the X11 or Haiku platforms were compiled in, they would bypass the `_EGL_NATIVE_PLATFORM` fallback by always returning themselves instead. Cc: [email protected] Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* anv: Unify GetDeviceQueue and GetDeviceQueue2Ricardo Garcia2019-11-111-4/+8
| | | | | | | | | Avoid duplicating some checks and code by making anv_GetDeviceQueue a subcase of anv_GetDeviceQueue2, like radv does. Signed-off-by: Ricardo Garcia <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* panfrost: Select format-specific blending intrinsicsAlyssa Rosenzweig2019-11-113-9/+41
| | | | | | | | | | | If we have an accelerated path for a particular framebuffer format, let's use it to save a bunch of instructions in a blend shader. [Tomeu: Only use the faster intrinsic on >T760] Signed-off-by: Alyssa Rosenzweig <[email protected]> Signed-off-by: Tomeu Vizoso <[email protected]> Reviewed-by: Tomeu Vizoso <[email protected]>
* pan/midgard: Pack load/store masksAlyssa Rosenzweig2019-11-111-2/+30
| | | | | | | | | While most load/store operations on 32-bit/vec4 intriniscally, some are not and have special type-size-dependent semantics for the mask. We need to convert into this native format. Signed-off-by: Alyssa Rosenzweig <[email protected]> Reviewed-by: Tomeu Vizoso <[email protected]>
* pan/midgard: Implement nir_intrinsic_load_output_u8_as_fp16_panAlyssa Rosenzweig2019-11-111-0/+20
| | | | | | | | We can use the native Midgard ops for this, depending what chip we're on. Signed-off-by: Alyssa Rosenzweig <[email protected]> Reviewed-by: Tomeu Vizoso <[email protected]>
* pan/midgard: Identify ld_color_buffer_u8_as_fp16*Alyssa Rosenzweig2019-11-112-2/+7
| | | | | | | | | | There are two versions of this opcode, depending what version of the ISA you're using. I'm not sure if there's a semantic difference; I think there might be some slight subtleties but it's too early to know at this stage. Signed-off-by: Alyssa Rosenzweig <[email protected]> Reviewed-by: Tomeu Vizoso <[email protected]>
* nir: Add load_output_u8_as_fp16_pan intrinsicAlyssa Rosenzweig2019-11-111-0/+6
| | | | | | | | | This is a single opcode, at least on newer Midgard chips. It's easier to have this represented in NIR rather than trying to optimize out the conversions, so let's add the intrinsic. Signed-off-by: Alyssa Rosenzweig <[email protected]> Reviewed-by: Tomeu Vizoso <[email protected]>
* panfrost: Set depth and stencil for SFBD based on the formatTomeu Vizoso2019-11-114-21/+36
| | | | | Signed-off-by: Tomeu Vizoso <[email protected]> Reviewed-by: Alyssa Rosenzweig <[email protected]>
* zink: correct depth-stencil formatErik Faye-Lund2019-11-111-1/+1
| | | | | | | | | | | | | | | | | | When using packed vulkan-formats on little-endian systems, we need to swap the components for the gallium formats. And since Zink isn't big-endian safe yet, little-endian is the only endianess we care about right now. This fixes a bunch of piglit tests, amongs others: - spec@arb_depth_texture@depth-level-clamp - spec@arb_depth_texture@depthstencil-render-miplevels * d=z24 - spec@arb_depth_texture@fbo-depth-gl_depth_component24-blit - spec@arb_depth_texture@fbo-depth-gl_depth_component24-copypixels - spec@arb_depth_texture@fbo-depth-gl_depth_component24-drawpixels - spec@arb_depth_texture@fbo-depth-gl_depth_component24-readpixels Signed-off-by: Erik Faye-Lund <[email protected]> Fixes: 8d46e35d16e ("zink: introduce opengl over vulkan")