Commit message, Author, Age, Files, Lines
* winsys/svga: Limit the maximum DMA hardware buffer sizeThomas Hellstrom2019-10-241-1/+4
| | | | | | | | | | | | | | | | | | | The kernel's total GMR/DMA size is limited. The kernel may still allow a larger buffer allocation to succeed, but command submission using that buffer as a GMR would then fail, typically causing an application crash. So have the winsys limit the size of GMR/DMA buffers. The pipe driver will then resort to allocating smaller buffers and perform the DMA transfer in multiple bands, also allowing for the pre-flush mechanism to kick in. This avoids the related application crashes. Fixes: e7843273fae ("winsys/svga: Update to vmwgfx kernel module 2.1") Signed-off-by: Thomas Hellstrom <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Charmaine Lee <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
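The banding fallback described in this commit can be pictured with a small sketch. It is not the svga driver's code: `rows_per_band` and `band_count` are hypothetical helpers, and the 1 MiB cap used in the note below is an assumed value chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: how many whole rows fit in one size-capped DMA buffer. */
static unsigned rows_per_band(size_t row_bytes, unsigned total_rows,
                              size_t max_buf_bytes)
{
    /* A single row must fit, otherwise banding by rows cannot work. */
    assert(row_bytes > 0 && row_bytes <= max_buf_bytes);
    size_t rows = max_buf_bytes / row_bytes;
    return rows < total_rows ? (unsigned)rows : total_rows;
}

/* Hypothetical helper: number of bands needed to cover all rows. */
static unsigned band_count(unsigned total_rows, unsigned band_rows)
{
    assert(band_rows > 0);
    return (total_rows + band_rows - 1) / band_rows;
}
```

With 4096-byte rows, 1000 rows, and the assumed 1 MiB cap, each band holds 256 rows and the upload is split into 4 bands.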
* svga: Fix banded DMA upload unmapThomas Hellstrom2019-10-241-1/+1
| | | | | | | | | | | | | | | | Even with banded DMA uploads, st->hwbuf is always non-NULL, but when we've allocated a software buffer to hold the full upload, unmapping of the hardware buffer has already been done before svga_texture_transfer_unmap_dma(), and the code was performing an unmap of an already unmapped buffer. Fix this by testing that no software buffer is present. Fixes: a9c4a861d5d ("svga: refactor svga_texture_transfer_map/unmap functions") Signed-off-by: Thomas Hellstrom <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Charmaine Lee <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* gitlab-ci: Update kernel for LAVA jobs to 5.4-rc4Tomeu Vizoso2019-10-242-2/+2
| | | | | | | | | | | Update to 5.4-rc4 so we can test Panfrost on devices with Mali T720 and T820. A bug was found that prevented things working at all on RK3288 devices, so we carry a patch for now in my personal fork. Signed-off-by: Tomeu Vizoso <[email protected]> Acked-by: Daniel Stone <[email protected]>
* glsl: remove propagate_invariance() call from the linkerTimothy Arceri2019-10-241-2/+0
| | | | | | This was added in 586f4a42e78f and became redundant with 34ab9b0947cd Reviewed-by: Marek Olšák <[email protected]>
* nir: improve nir_variable packingTimothy Arceri2019-10-241-1/+3
| | | | | | | | | | | | | Before: /* size: 136, cachelines: 3, members: 10 */ After: /* size: 128, cachelines: 2, members: 10 */ Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* nir: fix nir_variable_data packingTimothy Arceri2019-10-241-8/+8
| | | | | | | | | | | | | Before: /* size: 60, cachelines: 1, members: 29 */ After: /* size: 56, cachelines: 1, members: 29 */ Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Rob Clark <[email protected]>
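The size and cacheline figures quoted in the two packing commits above are in the format printed by the pahole tool. The effect of member ordering on struct size can be sketched as follows. The structs are hypothetical and are not the real `nir_variable` layout; exact sizes depend on the ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: a pointer between two small members forces padding
 * after each of them on ABIs where pointers need wider alignment. */
struct loose {
    uint8_t mode;
    void   *name;
    uint8_t flags;
};

/* Same members, widest first: the two bytes now share one padded tail. */
struct tight {
    void   *name;
    uint8_t mode;
    uint8_t flags;
};

/* Reordering should never make the struct larger on common ABIs. */
static_assert(sizeof(struct tight) <= sizeof(struct loose),
              "reordered struct should not grow");

static size_t bytes_saved(void)
{
    return sizeof(struct loose) - sizeof(struct tight);
}
```

On a typical ABI with 8-byte pointers this reordering saves 8 bytes per instance, the same order of saving as the 136 to 128 byte change reported for `nir_variable`.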
* radeonsi/nir: implement pipe_screen::finalize_nirMarek Olšák2019-10-235-19/+32
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* st/mesa: use pipe_screen::finalize_nirMarek Olšák2019-10-236-25/+89
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* tgsi_to_nir: use pipe_screen::finalize_nirMarek Olšák2019-10-231-4/+9
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* gallium: add pipe_screen::finalize_nirMarek Olšák2019-10-235-0/+43
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* st/mesa: update VS shader_info for NIR after lowering passesMarek Olšák2019-10-231-0/+4
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* st/mesa: assign driver locations for VS inputs for NIR before cachingMarek Olšák2019-10-235-9/+15
| | | | | | | fix up edge flags in the NIR pass, because st/mesa doesn't touch the inputs after caching Reviewed-by: Kenneth Graunke <[email protected]>
* st/mesa: don't lower_global_vars_to_local for VS if there are no dead inputsMarek Olšák2019-10-231-2/+7
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* st/mesa: move some NIR lowering before shader cachingMarek Olšák2019-10-233-14/+23
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* util/u_queue: skip util_queue_finish if num_threads is 0Marek Olšák2019-10-231-0/+7
| | | | | | | This fixes a deadlock in pthread_barrier_destroy. Cc: 19.1 19.2 <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* util/disk_cache: finish all queue jobs in destroy instead of killing themMarek Olšák2019-10-231-0/+1
| | | | | | | If there are queued shaders to be written to disk, wait for that. Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Rework edgeflag handlingKenneth Graunke2019-10-232-7/+28
| | | | | | | | | | We were relying on specific pass ordering in st to avoid setting inputs_read/outputs_written for edge flags. Instead, just assume that it happens and throw out the results we don't want. We should probably revisit this and try and add a vertex element property like I originally wanted so we can avoid having it be associated with the VS altogether.
* gallium/noop: implement get_disk_shader_cache and get_compiler_optionsMarek Olšák2019-10-231-0/+18
| | | | trivial
* aco: take LDS into account when calculating num_wavesRhys Perry2019-10-234-7/+42
| pipeline-db (Vega):
| SGPRS: 344 -> 344 (0.00 %)
| VGPRS: 424 -> 524 (23.58 %)
| Spilled SGPRs: 84 -> 80 (-4.76 %)
| Spilled VGPRs: 0 -> 0 (0.00 %)
| Private memory VGPRs: 0 -> 0 (0.00 %)
| Scratch size: 0 -> 0 (0.00 %) dwords per thread
| Code Size: 52812 -> 52484 (-0.62 %) bytes
| LDS: 135 -> 135 (0.00 %) blocks
| Max Waves: 56 -> 53 (-5.36 %)
|
| v2: consider WGP, rework to be clearer and apply the "maximum 16 workgroups per CU" limit properly
| v2: use "SIMD" instead of "EU"
| v2: fix spiller by introducing "Program::max_waves"
| v2: rename "lds_size" to "lds_limit"
| v3: make max_waves actually independent of register usage
| v3: fix issue where max_waves was way too high
| v3: use DIV_ROUND_UP(a, b) instead of max(a / b, 1)
| v3: rename "workgroups_per_cu" to "workgroups_per_cu_wgp"
| v4: fix typo from "workgroups_per_cu" rename
|
| Signed-off-by: Rhys Perry <[email protected]>
| Reviewed-by: Daniel Schürmann <[email protected]> (v3)
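The v3 note about replacing `max(a / b, 1)` with `DIV_ROUND_UP(a, b)` is easier to follow with numbers. The sketch below contrasts the two forms and adds a deliberately simplified LDS limit. `workgroups_limited_by_lds` and its units are assumptions for illustration, not ACO's actual calculation; only the cap of 16 workgroups per CU comes from the commit text.

```c
#include <assert.h>

/* Ceiling division for positive integers, the form adopted in v3. */
#define DIV_ROUND_UP(a, b) (((a) + (b) - 1) / (b))

static unsigned max_u(unsigned a, unsigned b) { return a > b ? a : b; }
static unsigned min_u(unsigned a, unsigned b) { return a < b ? a : b; }

/* Older form from the changelog: floor the quotient, then clamp to >= 1. */
static unsigned floor_at_least_one(unsigned a, unsigned b)
{
    assert(b > 0);
    return max_u(a / b, 1);
}

/* Hypothetical, simplified model: workgroups that fit in the LDS budget,
 * capped at the 16-workgroups-per-CU limit mentioned in the commit. */
static unsigned workgroups_limited_by_lds(unsigned lds_limit,
                                          unsigned lds_per_workgroup)
{
    if (lds_per_workgroup == 0)
        return 16; /* no LDS use, so only the fixed cap applies */
    return min_u(lds_limit / lds_per_workgroup, 16);
}
```

For 5 divided by 4, the floored form gives 1 while `DIV_ROUND_UP` gives 2. The two agree only when the division is exact or the floored quotient is zero.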
* aco: increase accuracy of SGPR limitsRhys Perry2019-10-236-28/+95
| SGPRs are allocated in groups of 16 on GFX8/GFX9. GFX10 allocates a fixed number of SGPRs and has 106 addressable SGPRs.
|
| pipeline-db (Vega):
| SGPRS: 5912 -> 6232 (5.41 %)
| VGPRS: 1772 -> 1780 (0.45 %)
| Spilled SGPRs: 0 -> 0 (0.00 %)
| Spilled VGPRs: 0 -> 0 (0.00 %)
| Private memory VGPRs: 0 -> 0 (0.00 %)
| Scratch size: 0 -> 0 (0.00 %) dwords per thread
| Code Size: 88228 -> 87904 (-0.37 %) bytes
| LDS: 0 -> 0 (0.00 %) blocks
| Max Waves: 559 -> 571 (2.15 %)
|
| pipeline-db (Navi):
| SGPRS: 341256 -> 363384 (6.48 %)
| VGPRS: 171536 -> 170960 (-0.34 %)
| Spilled SGPRs: 832 -> 581 (-30.17 %)
| Spilled VGPRs: 0 -> 0 (0.00 %)
| Private memory VGPRs: 0 -> 0 (0.00 %)
| Scratch size: 0 -> 0 (0.00 %) dwords per thread
| Code Size: 14207332 -> 14190872 (-0.12 %) bytes
| LDS: 33 -> 33 (0.00 %) blocks
| Max Waves: 18072 -> 18251 (0.99 %)
|
| v2: unconditionally count vcc as an extra sgpr on GFX10+
| v3: pass SGPRs rounded to 8
|
| Signed-off-by: Rhys Perry <[email protected]>
| Reviewed-by: Daniel Schürmann <[email protected]>
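Allocation in groups of 16 means a shader's SGPR usage is rounded up to the next multiple of 16 before occupancy is computed. A minimal sketch of that rounding follows. The function names are illustrative and not taken from ACO or radv; only the granule of 16 for GFX8/GFX9 comes from the commit text.

```c
#include <assert.h>

/* Round n up to a multiple of granule; granule must be a power of two. */
static unsigned align_up(unsigned n, unsigned granule)
{
    assert(granule != 0 && (granule & (granule - 1)) == 0);
    return (n + granule - 1) & ~(granule - 1);
}

/* Hypothetical wrapper applying the GFX8/GFX9 granule of 16 SGPRs. */
static unsigned sgprs_allocated_gfx8_gfx9(unsigned sgprs_used)
{
    return align_up(sgprs_used, 16);
}
```

A shader that uses 17 SGPRs is therefore charged for 32, which is why accurate rounding can change the reported Max Waves figure.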
* radv: round vgprs/sgprs before calculating max_wavesRhys Perry2019-10-231-4/+8
| Note that ACO doesn't correctly round SGPR counts on GFX8/GFX9.
|
| pipeline-db (ACO/Vega):
| SGPRS: 11000 -> 11000 (0.00 %)
| VGPRS: 3120 -> 3120 (0.00 %)
| Spilled SGPRs: 0 -> 0 (0.00 %)
| Spilled VGPRs: 0 -> 0 (0.00 %)
| Private memory VGPRs: 0 -> 0 (0.00 %)
| Scratch size: 0 -> 0 (0.00 %) dwords per thread
| Code Size: 164328 -> 164328 (0.00 %) bytes
| LDS: 0 -> 0 (0.00 %) blocks
| Max Waves: 1125 -> 1000 (-11.11 %)
|
| v2: consider wave32
|
| Signed-off-by: Rhys Perry <[email protected]>
| Reviewed-by: Daniel Schürmann <[email protected]>
* docs: Add new Intel extensionLionel Landwerlin2019-10-231-0/+1
| | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Acked-by: Caio Marcelo de Oliveira Filho <[email protected]>
* Revert "vc4: do not report alpha-test as supported"Erik Faye-Lund2019-10-232-4/+14
| | | | | | This reverts commit a79b93269cf340ce4d23b5b34100039bcaafc841. Reviewed-by: Jose Maria Casanova <[email protected]>
* Revert "v3d: do not report alpha-test as supported"Erik Faye-Lund2019-10-233-3/+11
| | | | | | | This reverts commit 9d0523b569bb7208c6e74cafc0f3945415d94336. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Jose Maria Casanova <[email protected]>
* Revert "nir: drop support for using load_alpha_ref_float"Erik Faye-Lund2019-10-231-11/+14
| | | | | | | This reverts commit 5af272b47469398762e984e27f65fc4ecc293d28. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Jose Maria Casanova <[email protected]>
* Revert "nir: drop unused alpha_ref_float"Erik Faye-Lund2019-10-232-0/+2
| | | | | | | This reverts commit e8095f2af0736b5937674ca319f29cc9dabb17d4. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Jose Maria Casanova <[email protected]>
* radv: fix a performance regression with graphics depth/stencil clearsSamuel Pitoiset2019-10-232-25/+123
| | | | | | | | | | | | | | | | | | | | I recently changed the slow depth/stencil clear path to make sure depth values are explicitly exported by the fragment shader. This is actually only useful when VK_EXT_depth_range_unrestricted is enabled. While this path is correct, it introduced a performance regression with Heroes of the Storm, Shadow of Mordor (Vulkan beta) and probably more titles. This is because it prevents the hardware from doing some optimizations, such as discarding fragments. This commit re-introduces the previous (slightly faster) slow depth/stencil clear path and selects the unrestricted path only if VK_EXT_depth_range_unrestricted is enabled. Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/863 Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: fix vkUpdateDescriptorSets with inline uniform blocksSamuel Pitoiset2019-10-231-0/+8
| | | | | | | | | | | | | | | descriptorCount is the number of bytes into the descriptor, so it shouldn't be used as an index. srcArrayElement/dstArrayElement specify the starting byte offset within the binding to copy from/to. This fixes new CTS tests: dEQP-VK.binding_model.descriptor_copy.*.inline_uniform_block_* dEQP-VK.binding_model.descriptor_copy.*.mix_3 dEQP-VK.binding_model.descriptor_copy.*.mix_array1 Fixes: 8d2654a4197 ("radv: Support VK_EXT_inline_uniform_block.") Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
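The byte semantics described in this commit can be sketched without any Vulkan headers. For inline uniform blocks, the count is a byte length and the source and destination "array elements" are byte offsets, so a copy reduces to a bounds-checked `memcpy`. The `inline_block` struct and `copy_inline_block` function are hypothetical stand-ins, not radv's data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the storage behind an inline uniform block. */
struct inline_block {
    uint8_t *data;
    size_t   size; /* in bytes */
};

/* Copy byte_count bytes from src at src_offset to dst at dst_offset.
 * Offsets and count are bytes, never descriptor indices.
 * Returns 0 on success, -1 if either range falls outside its block. */
static int copy_inline_block(struct inline_block *dst, size_t dst_offset,
                             const struct inline_block *src, size_t src_offset,
                             size_t byte_count)
{
    assert(dst != NULL && src != NULL);
    if (src_offset > src->size || byte_count > src->size - src_offset)
        return -1;
    if (dst_offset > dst->size || byte_count > dst->size - dst_offset)
        return -1;
    memcpy(dst->data + dst_offset, src->data + src_offset, byte_count);
    return 0;
}
```

Treating the count as a number of descriptors to iterate over, as the pre-fix code apparently did, would index past the single block that actually exists.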
* radv/gfx10: fix 3D imagesSamuel Pitoiset2019-10-233-17/+17
| | | | | | | | | | | GFX10 does act like GFX9 actually. This fixes dEQP-VK.glsl.texture_functions.query.texturesize.*sampler3d_*. Cc: 19.2 <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/gfx10: re-enable fast depth/stencil clears with separate aspectsSamuel Pitoiset2019-10-231-3/+2
| | | | | | | | | It used to cause weird issues on GFX10 in the past with vkmark and Wreckfest, and they can't be reproduced now. Shadow Of Mordor (Vulkan beta) hits that path and it works fine. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: do not emit rbplus if attachments are undefinedSamuel Pitoiset2019-10-231-0/+3
| | | | | | | | | Fixes some crashes with dEQP-VK.geometry.layered.*.secondary_cmd_buffer on Raven and other chips that allow rbplus. This just prevents a crash and rbplus probably needs more work. Cc: 19.2 <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add an assertion in radv_gfx10_compute_bin_size()Samuel Pitoiset2019-10-231-0/+1
| | | | | | | To prevent out of bounds access. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: do not create meta pipelines with 16 samplesSamuel Pitoiset2019-10-232-5/+5
| The driver only supports up to 8 samples, so it's useless to create more pipelines than needed.
|
| This fixes a conditional jump reported by Valgrind on GFX10:
|
| ==194282== Conditional jump or move depends on uninitialised value(s)
| ==194282== at 0xDBF925A: radv_gfx10_compute_bin_size (radv_pipeline.c:3242)
| ==194282== by 0xDBF95A6: radv_pipeline_generate_binning_state (radv_pipeline.c:3334)
| ==194282== by 0xDBFC1A0: radv_pipeline_generate_pm4 (radv_pipeline.c:4440)
| ==194282== by 0xDBFD15E: radv_pipeline_init (radv_pipeline.c:4764)
| ==194282== by 0xDBFD23E: radv_graphics_pipeline_create (radv_pipeline.c:4788)
| ==194282== by 0xDBB95A3: create_pipeline (radv_meta_clear.c:114)
| ==194282== by 0xDBB9AC5: create_color_pipeline (radv_meta_clear.c:297)
| ==194282== by 0xDBBCF05: radv_device_init_meta_clear_state (radv_meta_clear.c:1277)
| ==194282== by 0xDB9ACD9: radv_device_init_meta (radv_meta.c:363)
| ==194282== by 0xDB7FE3A: radv_CreateDevice (radv_device.c:2080)
|
| This is caused by an out-of-bounds access of 'fmask_array' (i.e. the index is 4 for 16 samples).
|
| Cc: <[email protected]>
| Signed-off-by: Samuel Pitoiset <[email protected]>
| Reviewed-by: Bas Nieuwenhuizen <[email protected]>
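The out-of-bounds index follows from indexing a per-sample-count table by log2 of the sample count. The sketch below assumes a table with four entries (for 1, 2, 4 and 8 samples). That size is an assumption consistent with the commit text, which says index 4 is out of bounds, and the names are illustrative rather than radv's.

```c
#include <assert.h>

/* Assumed table size: one entry each for 1, 2, 4 and 8 samples. */
#define NUM_SAMPLE_ENTRIES 4u

/* log2 of a power-of-two sample count: 1 -> 0, 2 -> 1, 4 -> 2, 8 -> 3. */
static unsigned samples_log2(unsigned samples)
{
    assert(samples != 0 && (samples & (samples - 1)) == 0);
    unsigned shift = 0;
    while (samples > 1) {
        samples >>= 1;
        shift++;
    }
    return shift;
}

/* Returns 1 when the derived index stays inside the assumed table. */
static int sample_index_in_bounds(unsigned samples)
{
    return samples_log2(samples) < NUM_SAMPLE_ENTRIES;
}
```

With 16 samples the derived index is 4, one past the last valid entry, which matches the access Valgrind flagged.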
* anv: implement VK_INTEL_performance_queryLionel Landwerlin2019-10-239-18/+535
| v2: Introduce the appropriate pipe controls
|     Properly deal with changes in metric sets (using execbuf parameter)
|     Record marker at query end
|
| v3: Fill out PerfCntr1&2
|
| v4: Introduce vkUninitializePerformanceApiINTEL
|
| v5: Use new execbuf extension mechanism
|
| v6: Fix comments in genX_query.c (Rafael)
|     Use PIPE_CONTROL workarounds (Rafael)
|     Refactor on the last kernel series update (Lionel)
|
| v7: Only I915_PERF_IOCTL_CONFIG when perf stream is already opened (Lionel)
|
| Signed-off-by: Lionel Landwerlin <[email protected]>
| Reviewed-by: Rafael Antognolli <[email protected]>
* intel/perf: add mdapi writes for register perf countersLionel Landwerlin2019-10-231-0/+36
| | | | | | | Those are not part of the OA reports. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Rafael Antognolli <[email protected]>
* intel/genxml: add RPSTAT register for core frequencyLionel Landwerlin2019-10-238-0/+40
| | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Rafael Antognolli <[email protected]>
* intel/genxml: add generic perf counters registersLionel Landwerlin2019-10-234-0/+72
| | | | | | | | | | We have 2 of those we can configure to source programmable events. Those are not part of the OA reports. Configuration happens in i915 through the metric set selected by the application. On the Mesa side we'll just sample those and do a diff. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Rafael Antognolli <[email protected]>
* intel/perf: add support for querying kernel loaded configurationsLionel Landwerlin2019-10-232-27/+181
| | | | | | | We use this as a communication mechanism between MDAPI & Anv. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Rafael Antognolli <[email protected]>
* drm-uapi: Update headers from drm-nextLionel Landwerlin2019-10-234-9/+141
| Pull new updates from drm-next as of the following commit:
|
| commit f1b4a9217efd61d0b84c6dc404596c8519ff6f59
| Merge: 400e91347e1d f3a36d469621
| Author: Dave Airlie <[email protected]>
| Date: Tue Oct 22 15:04:00 2019 +1000
|
| Merge tag 'du-next-20191016' of git://linuxtv.org/pinchartl/media into drm-next
|
| Signed-off-by: Lionel Landwerlin <[email protected]>
* intel/perf: move registers to their own headerLionel Landwerlin2019-10-236-26/+58
| | | | | | | Will conflict with the genxml RPSTAT register. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Rafael Antognolli <[email protected]>
* intel/perf: extract register configurationLionel Landwerlin2019-10-233-16/+24
| | | | | | | | We want to query the content of register configurations from the kernel. Let's pull this out of the query. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Rafael Antognolli <[email protected]>
* intel/perf: expose some utility functionsLionel Landwerlin2019-10-232-31/+55
| | | | | | | | | The Vulkan performance query extension is a bit lower level than the GL one. Expose some of the functions to do the result accumulation directly in the Anv driver. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Rafael Antognolli <[email protected]>
* intel/perf: add mdapi maker helperLionel Landwerlin2019-10-231-0/+28
| | | | | | | A simple utility to put the marker at the right location. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Rafael Antognolli <[email protected]>
* st/mesa: Silence chatty debug printfKenneth Graunke2019-10-221-2/+4
| | | | | | Other debug_printf's in this file are in if (0) blocks. Trivial.
* st/mesa: Map MESA_FORMAT_RGB_UNORM8 <-> PIPE_FORMAT_R8G8B8_UNORMChris Wilson2019-10-222-25/+15
| | | | | | | | This is useful for PBO texture upload with GL_RGB and GL_UNSIGNED_BYTE. v2: Vasily Khoruzhick provided an update for the Lima CI expectations. Reviewed-by: Marek Olšák <[email protected]>
* anv: fix unwind of vkCreateDevice failLionel Landwerlin2019-10-221-3/+3
| | | | | | | | | | We're skipping the context destruction in some cases, which in the grand scheme of things is not that important because closing device->fd will destroy the associated context as well. Signed-off-by: Lionel Landwerlin <[email protected]> Reported-by: Jordan Justen <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Cc: <[email protected]> Fixes: b30e01aef56 ("anv: fix memory leak on device destroy")
* Revert "aco: only emit waitcnt on loop continues if we there was some load or export"Rhys Perry2019-10-221-1/+1
| We don't properly pass on ctx.lgkm_cnt/ctx.barrier_imm/etc, so this waitcnt was necessary for barriers and correctly waiting for SMEM before s_dcache_wb on GFX10.
|
| Totals from affected shaders:
| SGPRS: 33200 -> 33200 (0.00 %)
| VGPRS: 31376 -> 31376 (0.00 %)
| Spilled SGPRs: 0 -> 0 (0.00 %)
| Spilled VGPRs: 0 -> 0 (0.00 %)
| Private memory VGPRs: 0 -> 0 (0.00 %)
| Scratch size: 0 -> 0 (0.00 %) dwords per thread
| Code Size: 2431804 -> 2433956 (0.09 %) bytes
| LDS: 316 -> 316 (0.00 %) blocks
| Max Waves: 1609 -> 1609 (0.00 %)
|
| This reverts commit 2c050b49b3d776f054f1265d5523cabb61f22fc3.
|
| Signed-off-by: Rhys Perry <[email protected]>
| Reviewed-by: Daniel Schürmann <[email protected]>
* aco: add missing bld.scc()Rhys Perry2019-10-221-1/+1
| | | | | Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Daniel Schürmann <[email protected]>
* aco: keep can_reorder/barrier when combining addition into SMEMRhys Perry2019-10-221-0/+2
| Affects 30 shaders in the pipeline-db (all youngblood).
|
| Totals from affected shaders:
| SGPRS: 2656 -> 2456 (-7.53 %)
| VGPRS: 2260 -> 2260 (0.00 %)
| Spilled SGPRs: 0 -> 0 (0.00 %)
| Spilled VGPRs: 0 -> 0 (0.00 %)
| Private memory VGPRs: 0 -> 0 (0.00 %)
| Scratch size: 0 -> 0 (0.00 %) dwords per thread
| Code Size: 240680 -> 240944 (0.11 %) bytes
| LDS: 0 -> 0 (0.00 %) blocks
| Max Waves: 90 -> 90 (0.00 %)
|
| Signed-off-by: Rhys Perry <[email protected]>
| Reviewed-by: Daniel Schürmann <[email protected]>
* aco: add a few missing checks in value numberingRhys Perry2019-10-221-1/+4
| | | | | Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Daniel Schürmann <[email protected]>