Commit message | Author | Date | Files | Lines (-removed/+added)
* radv: do not create meta pipelines with 16 samples
  Samuel Pitoiset | 2019-10-23 | 2 files | -5/+5

  The driver only supports up to 8 samples, so there is no point in creating more pipelines than needed. This fixes a conditional jump reported by Valgrind on GFX10:

    ==194282== Conditional jump or move depends on uninitialised value(s)
    ==194282==    at 0xDBF925A: radv_gfx10_compute_bin_size (radv_pipeline.c:3242)
    ==194282==    by 0xDBF95A6: radv_pipeline_generate_binning_state (radv_pipeline.c:3334)
    ==194282==    by 0xDBFC1A0: radv_pipeline_generate_pm4 (radv_pipeline.c:4440)
    ==194282==    by 0xDBFD15E: radv_pipeline_init (radv_pipeline.c:4764)
    ==194282==    by 0xDBFD23E: radv_graphics_pipeline_create (radv_pipeline.c:4788)
    ==194282==    by 0xDBB95A3: create_pipeline (radv_meta_clear.c:114)
    ==194282==    by 0xDBB9AC5: create_color_pipeline (radv_meta_clear.c:297)
    ==194282==    by 0xDBBCF05: radv_device_init_meta_clear_state (radv_meta_clear.c:1277)
    ==194282==    by 0xDB9ACD9: radv_device_init_meta (radv_meta.c:363)
    ==194282==    by 0xDB7FE3A: radv_CreateDevice (radv_device.c:2080)

  This is caused by an out-of-bounds access of 'fmask_array' (i.e. the index is 4 for 16 samples).

  Cc: <[email protected]>
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
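The layout of fmask_array is not shown in the commit. As a minimal sketch, assuming a table indexed by log2(samples) with one entry per supported count (1, 2, 4 and 8), a 16-sample pipeline computes index 4, one past the end. The table contents and all helper names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical table with one entry per supported sample count:
 * index 0 = 1 sample, 1 = 2, 2 = 4, 3 = 8. Values are placeholders. */
#define MAX_SAMPLES_LOG2 4
static const uint32_t fmask_array[MAX_SAMPLES_LOG2] = { 10, 11, 12, 13 };

/* log2 of a power-of-two sample count (1 -> 0, 16 -> 4). */
static unsigned samples_log2(unsigned samples)
{
    unsigned shift = 0;
    while ((1u << shift) < samples)
        shift++;
    return shift;
}

/* A sample count is usable only if its index falls inside the table. */
static bool sample_count_supported(unsigned samples)
{
    return samples_log2(samples) < MAX_SAMPLES_LOG2;
}

/* Bounds-checked lookup: 16 samples (index 4) trips the assert instead
 * of reading past the end of the array. */
static uint32_t lookup_fmask_entry(unsigned samples)
{
    assert(sample_count_supported(samples));
    return fmask_array[samples_log2(samples)];
}

/* Create meta pipelines only for the supported counts (1, 2, 4, 8). */
static unsigned count_meta_pipelines(void)
{
    unsigned n = 0;
    for (unsigned i = 0; i < MAX_SAMPLES_LOG2; i++) {
        (void)lookup_fmask_entry(1u << i);
        n++;
    }
    return n;
}
```

Bounding the creation loop by the table size, rather than by the largest sample count the API can express, is what keeps the index in range.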
* anv: implement VK_INTEL_performance_query
  Lionel Landwerlin | 2019-10-23 | 9 files | -18/+535

  v2: Introduce the appropriate pipe controls
      Properly deal with changes in metric sets (using an execbuf parameter)
      Record a marker at query end
  v3: Fill out PerfCntr1&2
  v4: Introduce vkUninitializePerformanceApiINTEL
  v5: Use the new execbuf extension mechanism
  v6: Fix comments in genX_query.c (Rafael)
      Use PIPE_CONTROL workarounds (Rafael)
      Refactor based on the latest kernel series update (Lionel)
  v7: Only use I915_PERF_IOCTL_CONFIG when the perf stream is already open (Lionel)

  Signed-off-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Rafael Antognolli <[email protected]>
* intel/perf: add mdapi writes for register perf counters
  Lionel Landwerlin | 2019-10-23 | 1 file | -0/+36

  These counters are not part of the OA reports.

  Signed-off-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Rafael Antognolli <[email protected]>
* intel/genxml: add RPSTAT register for core frequency
  Lionel Landwerlin | 2019-10-23 | 8 files | -0/+40

  Signed-off-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Rafael Antognolli <[email protected]>
* intel/genxml: add generic perf counters registers
  Lionel Landwerlin | 2019-10-23 | 4 files | -0/+72

  There are two of these registers, and they can be configured to source programmable events. They are not part of the OA reports.

  Configuration happens in i915 through the metric set selected by the application. On the Mesa side, we just sample the registers and compute the difference.

  Signed-off-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Rafael Antognolli <[email protected]>
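"Sample those and do a diff" can be sketched as follows, assuming both counters are read at query begin and query end. The snapshot struct, the helper names and the counter-width parameter are hypothetical; the sketch does not claim any particular hardware width.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical snapshot of the two configurable counters, taken once at
 * query begin and once at query end. */
struct perf_cntr_snapshot {
    uint64_t cntr[2];
};

/* end - begin, masked to the counter width, so a counter that wrapped
 * around once between the two samples still yields the right delta. */
static uint64_t counter_delta(uint64_t begin, uint64_t end, unsigned bits)
{
    uint64_t mask = bits >= 64 ? UINT64_MAX : ((UINT64_C(1) << bits) - 1);
    return (end - begin) & mask;
}

/* Per-counter difference between the end and begin snapshots. */
static void accumulate_perf_cntrs(const struct perf_cntr_snapshot *begin,
                                  const struct perf_cntr_snapshot *end,
                                  unsigned bits, uint64_t result[2])
{
    for (unsigned i = 0; i < 2; i++)
        result[i] = counter_delta(begin->cntr[i], end->cntr[i], bits);
}
```

Relying on unsigned modular arithmetic keeps the diff correct across a single wrap without a separate overflow branch.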
* intel/perf: add support for querying kernel loaded configurations
  Lionel Landwerlin | 2019-10-23 | 2 files | -27/+181

  We use this as a communication mechanism between MDAPI & Anv.

  Signed-off-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Rafael Antognolli <[email protected]>
* drm-uapi: Update headers from drm-next
  Lionel Landwerlin | 2019-10-23 | 4 files | -9/+141

  Pull new updates from drm-next as of the following commit:

      commit f1b4a9217efd61d0b84c6dc404596c8519ff6f59
      Merge: 400e91347e1d f3a36d469621
      Author: Dave Airlie <[email protected]>
      Date:   Tue Oct 22 15:04:00 2019 +1000

          Merge tag 'du-next-20191016' of git://linuxtv.org/pinchartl/media into drm-next

  Signed-off-by: Lionel Landwerlin <[email protected]>
* intel/perf: move registers to their own header
  Lionel Landwerlin | 2019-10-23 | 6 files | -26/+58

  Otherwise these definitions will conflict with the genxml RPSTAT register.

  Signed-off-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Rafael Antognolli <[email protected]>
* intel/perf: extract register configuration
  Lionel Landwerlin | 2019-10-23 | 3 files | -16/+24

  We want to query the content of register configurations from the kernel. Let's pull this out of the query.

  Signed-off-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Rafael Antognolli <[email protected]>
* intel/perf: expose some utility functions
  Lionel Landwerlin | 2019-10-23 | 2 files | -31/+55

  The Vulkan performance query extension is a bit lower level than the GL one. Expose some of the functions so that result accumulation can be done directly in the Anv driver.

  Signed-off-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Rafael Antognolli <[email protected]>
* intel/perf: add mdapi maker helper
  Lionel Landwerlin | 2019-10-23 | 1 file | -0/+28

  A simple utility to put the marker at the right location.

  Signed-off-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Rafael Antognolli <[email protected]>
* st/mesa: Silence chatty debug printf
  Kenneth Graunke | 2019-10-22 | 1 file | -2/+4

  The other debug_printf calls in this file are in if (0) blocks. Trivial.
* st/mesa: Map MESA_FORMAT_RGB_UNORM8 <-> PIPE_FORMAT_R8G8B8_UNORM
  Chris Wilson | 2019-10-22 | 2 files | -25/+15

  This is useful for PBO texture upload with GL_RGB and GL_UNSIGNED_BYTE.

  v2: Vasily Khoruzhick provided an update for the Lima CI expectations.

  Reviewed-by: Marek Olšák <[email protected]>
* anv: fix unwind of vkCreateDevice fail
  Lionel Landwerlin | 2019-10-22 | 1 file | -3/+3

  We were skipping the context destruction in some cases. In the grand scheme of things this is not that important, because closing device->fd destroys the associated context as well.

  Signed-off-by: Lionel Landwerlin <[email protected]>
  Reported-by: Jordan Justen <[email protected]>
  Reviewed-by: Jordan Justen <[email protected]>
  Cc: <[email protected]>
  Fixes: b30e01aef56 ("anv: fix memory leak on device destroy")
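The commit does not show the code, but this class of bug is typical of goto-based unwind ladders in C, where a wrong label skips a cleanup step. A minimal sketch of the pattern, with a hypothetical fake_device whose flags stand in for the fd, the context and a later resource; fail_step is a hypothetical test hook that forces a given step to fail.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device: each flag records whether a resource is alive. */
struct fake_device {
    bool fd_open;
    bool context_created;
    bool queue_created;
};

/* fail_step selects which creation step fails (0 = none).
 * Returns 0 on success, -1 on failure. */
static int create_device(struct fake_device *dev, int fail_step)
{
    dev->fd_open = dev->context_created = dev->queue_created = false;

    if (fail_step == 1)
        goto fail;
    dev->fd_open = true;

    if (fail_step == 2)
        goto fail_fd;
    dev->context_created = true;

    if (fail_step == 3)
        goto fail_context;
    dev->queue_created = true;

    return 0;

    /* Unwind in reverse creation order. Each label falls through to the
     * next, so everything created so far is released exactly once;
     * jumping to fail_fd from step 3 would skip the context cleanup. */
fail_context:
    dev->context_created = false;
fail_fd:
    dev->fd_open = false;
fail:
    return -1;
}
```

Each failure jumps to the label for the last resource that was successfully created, which is the invariant to check when reviewing such ladders.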
* Revert "aco: only emit waitcnt on loop continues if we there was some load or export"
  Rhys Perry | 2019-10-22 | 1 file | -1/+1

  We don't properly pass on ctx.lgkm_cnt/ctx.barrier_imm/etc., so this waitcnt was necessary for barriers and for correctly waiting for SMEM before s_dcache_wb on GFX10.

  Totals from affected shaders:
  SGPRS: 33200 -> 33200 (0.00 %)
  VGPRS: 31376 -> 31376 (0.00 %)
  Spilled SGPRs: 0 -> 0 (0.00 %)
  Spilled VGPRs: 0 -> 0 (0.00 %)
  Private memory VGPRs: 0 -> 0 (0.00 %)
  Scratch size: 0 -> 0 (0.00 %) dwords per thread
  Code Size: 2431804 -> 2433956 (0.09 %) bytes
  LDS: 316 -> 316 (0.00 %) blocks
  Max Waves: 1609 -> 1609 (0.00 %)

  This reverts commit 2c050b49b3d776f054f1265d5523cabb61f22fc3.

  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* aco: add missing bld.scc()
  Rhys Perry | 2019-10-22 | 1 file | -1/+1

  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* aco: keep can_reorder/barrier when combining addition into SMEM
  Rhys Perry | 2019-10-22 | 1 file | -0/+2

  Affects 30 shaders in the pipeline-db (all youngblood).

  Totals from affected shaders:
  SGPRS: 2656 -> 2456 (-7.53 %)
  VGPRS: 2260 -> 2260 (0.00 %)
  Spilled SGPRs: 0 -> 0 (0.00 %)
  Spilled VGPRs: 0 -> 0 (0.00 %)
  Private memory VGPRs: 0 -> 0 (0.00 %)
  Scratch size: 0 -> 0 (0.00 %) dwords per thread
  Code Size: 240680 -> 240944 (0.11 %) bytes
  LDS: 0 -> 0 (0.00 %) blocks
  Max Waves: 90 -> 90 (0.00 %)

  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* aco: add a few missing checks in value numbering
  Rhys Perry | 2019-10-22 | 1 file | -1/+4

  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* aco: use ds_read2_b64/ds_write2_b64
  Rhys Perry | 2019-10-22 | 1 file | -7/+24

  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* aco: properly combine additions into ds_write2_b64/ds_read2_b64
  Rhys Perry | 2019-10-22 | 1 file | -1/+2

  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* aco: fix sparse store_lds()
  Rhys Perry | 2019-10-22 | 1 file | -8/+40

  p_extract_vector's second operand is in units of the definition size, not dwords.

  v2: move extract_subvector() to right before ds_write_helper

  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
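To make the unit mismatch concrete: extracting a 64-bit (2-dword) element that starts at dword 2 uses index 1, not 2. The helper below is a hypothetical sketch of that conversion, assuming the offset is aligned to the element size; it is not ACO's code.

```c
#include <assert.h>

/* Convert a dword offset into a p_extract_vector index, which counts
 * elements of the definition's size rather than dwords. */
static unsigned extract_index_from_dwords(unsigned dword_offset,
                                          unsigned def_size_dwords)
{
    assert(def_size_dwords > 0);
    /* Assumes the offset is aligned to the element size. */
    assert(dword_offset % def_size_dwords == 0);
    return dword_offset / def_size_dwords;
}
```

For 32-bit definitions the two units coincide, which is why a mix-up like this only shows up with wider elements.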
* aco: create load_lds/store_lds helpers
  Rhys Perry | 2019-10-22 | 1 file | -176/+195

  We'll want these for GS, since VS->GS IO on Vega is done using LDS.

  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* aco: fix 64-bit p_extract_vector on 32-bit p_create_vector
  Rhys Perry | 2019-10-22 | 1 file | -1/+2

  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* aco: small stage corrections
  Rhys Perry | 2019-10-22 | 2 files | -11/+13

  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* st/mesa: replace pipe_shader_state with tgsi_token* in st_vp_variant
  Marek Olšák | 2019-10-22 | 4 files | -36/+48

  We don't need more than that.

  Reviewed-by: Kenneth Graunke <[email protected]>
* nir: allow nir_lower_uniforms_to_ubo to be run repeatedly
  Marek Olšák | 2019-10-22 | 2 files | -1/+7

  This is needed for st/mesa.

  Reviewed-by: Kenneth Graunke <[email protected]>
* freedreno/ir3: fixup register footprint fixup
  Rob Clark | 2019-10-22 | 1 file | -1/+1

  A small typo meant the footprint was not converted to vec4 units, so we could potentially ask for quite a few more registers than required.

  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Kristian H. Kristensen <[email protected]>
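A minimal sketch of the conversion, assuming the footprint is tracked in scalar components and registers are allocated in vec4 units (both assumptions; the helper name is hypothetical). Skipping the round-up division overstates the request by up to four times.

```c
#include <assert.h>

/* Convert a footprint in scalar components to vec4 registers,
 * rounding up so a partially used vec4 still counts as one. */
static unsigned footprint_to_vec4(unsigned scalar_components)
{
    return (scalar_components + 3) / 4;
}
```

For example, 32 scalar components need 8 vec4 registers, whereas using the unconverted value would request 32.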
* freedreno/ir3: handle scalarized varying inputs
  Rob Clark | 2019-10-22 | 1 file | -9/+12

  If load_interpolated_input is scalarized, we were too conservative and decided that the tex instruction was not a candidate for pre-fetch:

    vec1 32 ssa_0 = load_const (0x00000000 /* 0.000000 */)
    vec2 32 ssa_1 = intrinsic load_barycentric_pixel () (0) /* interp_mode=0 */
    vec1 32 ssa_2 = intrinsic load_interpolated_input (ssa_1, ssa_0) (0, 0) /* base=0 */ /* component=0 */ /* packed:v_uv,v_uv1 */
    vec1 32 ssa_3 = intrinsic load_interpolated_input (ssa_1, ssa_0) (0, 1) /* base=0 */ /* component=1 */ /* packed:v_uv,v_uv1 */
    vec2 32 ssa_8 = vec2 ssa_2, ssa_3
    vec4 32 ssa_9 = tex ssa_8 (coord), 0 (texture), 0 (sampler)

  We don't actually care that the texcoord components come from different load_interpolated_input instructions, only that they have consecutive varying offsets.

  Reported-by: Eduardo Lima Mitev <[email protected]>
  Reviewed-by: Eduardo Lima Mitev <[email protected]>
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Kristian H. Kristensen <[email protected]>
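The relaxed check can be sketched as a comparison of flattened varying offsets (base * 4 + component) across the coordinate's components. The coord_src struct is hypothetical, and requiring all components to share the same barycentrics is an assumption added for this sketch, not something the commit states.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of one scalar load_interpolated_input that feeds
 * a texture coordinate component. */
struct coord_src {
    int barycentric_id;   /* which load_barycentric_* value it uses */
    unsigned base;        /* varying slot (vec4 index) */
    unsigned component;   /* component within the slot */
};

/* Prefetchable if all components occupy consecutive varying offsets,
 * regardless of how many load instructions produced them. Matching
 * barycentrics is an extra assumption of this sketch. */
static bool coords_prefetchable(const struct coord_src *srcs, unsigned n)
{
    if (n == 0)
        return false;
    unsigned first = srcs[0].base * 4 + srcs[0].component;
    for (unsigned i = 1; i < n; i++) {
        if (srcs[i].barycentric_id != srcs[0].barycentric_id)
            return false;
        if (srcs[i].base * 4 + srcs[i].component != first + i)
            return false;
    }
    return true;
}
```

For the NIR dump above, ssa_2 and ssa_3 map to offsets 0 and 1, so the coordinate would qualify.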
* aco: refactor value numbering
  Daniel Schürmann | 2019-10-22 | 1 file | -55/+53

  Previously, we used one hashset per basic block, so that we could always initialize the current hashset from the immediate dominator. This patch switches to a single hashmap that records the block index of each instruction and uses it to resolve dominance.

  Reviewed-by: Rhys Perry <[email protected]>
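A minimal sketch of resolving dominance from a stored block index, assuming blocks are numbered so that a dominator always has a smaller index than the blocks it dominates, and that idom[] holds immediate dominators with block 0 as the entry. A linear scan stands in for the hashmap to keep the sketch short; all names are hypothetical, not ACO's.

```c
#include <assert.h>
#include <stdbool.h>

/* Walk b's dominator chain upward; since indices strictly decrease
 * along the chain, a dominates b iff the walk lands exactly on a. */
static bool block_dominates(const unsigned *idom, unsigned a, unsigned b)
{
    while (b > a)
        b = idom[b];
    return b == a;
}

/* One entry of the single table: an expression key plus the block
 * index of the instruction that computes it. */
struct vn_entry {
    unsigned key;     /* stands in for a hash of opcode + operands */
    unsigned block;   /* block index of the defining instruction */
    unsigned value;   /* id of the defining instruction */
};

/* Reuse an existing value only if its block dominates the current one. */
static bool vn_lookup(const struct vn_entry *table, unsigned count,
                      const unsigned *idom, unsigned key,
                      unsigned cur_block, unsigned *value)
{
    for (unsigned i = 0; i < count; i++) {
        if (table[i].key == key &&
            block_dominates(idom, table[i].block, cur_block)) {
            *value = table[i].value;
            return true;
        }
    }
    return false;
}
```

In a diamond CFG (0 branches to 1 and 2, which join at 3), a value computed in block 1 is not reused in block 3, while one from block 0 is reused everywhere.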
* mesa/st: assert that lowering is supported
  Erik Faye-Lund | 2019-10-22 | 1 file | -0/+12

  Some of these lowerings aren't supported for drivers that support tessellation and geometry shaders. Add a couple of asserts to make it obvious if they have been enabled when that isn't possible.

  Signed-off-by: Erik Faye-Lund <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* gitlab-ci: Enable llvmpipe in ARM build jobs
  Michel Dänzer | 2019-10-22 | 2 files | -2/+5

  v2:
  * Use LLVM 8 from buster-backports
  v3:
  * Use LLVM 7 again for armhf, since llvmpipe is still broken there with LLVM 8

  Acked-by: Eric Engestrom <[email protected]>
* gitlab-ci: Update the meson cross file for LLVM_VERSION as well
  Michel Dänzer | 2019-10-22 | 1 file | -3/+6

  Cross builds don't use the llvm-config path from the native file.
* gitlab-ci: Use native aarch64 runner for ARM build jobs
  Michel Dänzer | 2019-10-22 | 3 files | -41/+59

  This allows running the regression tests.

  One downside is that we can't easily build the Vulkan overlay layer, because only x86 binaries of the glslang validator are available. If that's important, we could either use those binaries via qemu or build it from source.

  v2:
  * Add :amd64 suffix to existing debian-9/10 job names (Eric Engestrom)

  Acked-by: Eric Engestrom <[email protected]> # v1
* gitlab-ci: Explicitly list debian-10 in needs: for .deqp-test template
  Michel Dänzer | 2019-10-22 | 1 file | -1/+3

  Apparently, needs: in a job definition overrides inherited ones, so .deqp-test effectively did not declare needs: for debian-10. As a result, any job based on .deqp-test could spuriously run after the debian-10 job had failed or been cancelled.
* gitlab-ci: Bring ARM docker image install script in line with x86_64
  Michel Dänzer | 2019-10-22 | 1 file | -2/+5

  Use https:// URLs in the APT configuration.

  Drop --no-install-recommends; the image generation template already disables installation of recommended packages in /etc/apt/apt.conf.

  Run apt-get autoremove at the end, cleaning up packages that were installed to satisfy dependencies but are no longer needed.

  Reviewed-by: Eric Engestrom <[email protected]>
* gitlab-ci: Sort ARM docker image packages in alphabetical order
  Michel Dänzer | 2019-10-22 | 1 file | -20/+20

  No functional change.

  Reviewed-by: Eric Engestrom <[email protected]>
* radv: fix updating bound fast ds clear values with different aspects
  Samuel Pitoiset | 2019-10-22 | 1 file | -3/+13

  On GFX9, the driver can do an optimized fast depth/stencil clear of only one aspect (e.g. clear only the stencil part of a depth/stencil image). When this happens, the driver should only update the clear values of that aspect.

  Note that this is currently only supported on GFX9, but I have some local patches that extend this optimized path to other generations.

  Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1967
  Cc: 19.2 <[email protected]>
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
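The fix amounts to a masked update: only the aspects that were actually cleared overwrite the bound clear value. A minimal sketch with hypothetical, locally defined aspect bits (not Vulkan's headers) and hypothetical struct and function names:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical aspect bits, defined locally for this sketch. */
enum ds_aspect {
    ASPECT_DEPTH = 1u << 0,
    ASPECT_STENCIL = 1u << 1,
};

struct ds_clear_value {
    float depth;
    uint32_t stencil;
};

/* Overwrite only the cleared aspects, so a stencil-only fast clear
 * keeps the previously bound depth clear value intact. */
static void update_bound_ds_clear_value(struct ds_clear_value *bound,
                                        const struct ds_clear_value *clear,
                                        unsigned aspects)
{
    if (aspects & ASPECT_DEPTH)
        bound->depth = clear->depth;
    if (aspects & ASPECT_STENCIL)
        bound->stencil = clear->stencil;
}
```

Writing both fields unconditionally, the behavior this commit fixes, would clobber the depth value whenever only stencil was cleared.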
* intel/compiler: Refactor disassembly of sources in 3src instruction
  Sagar Ghuge | 2019-10-21 | 1 file | -19/+10

  Signed-off-by: Sagar Ghuge <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* intel/compiler: Don't move immediate in register
  Sagar Ghuge | 2019-10-21 | 1 file | -0/+38

  On Gen12, mixed-mode HF/F operands are supported and 3-source instructions support immediate values, so keep the immediate as is if it fits in the 16-bit field.

  Signed-off-by: Sagar Ghuge <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* intel/compiler: Set bits according to source file
  Sagar Ghuge | 2019-10-21 | 1 file | -2/+12

  On Gen >= 12, if src0 or src2 holds an immediate value, we need to set the src[0/2]_is_imm bits instead of the register file.

  Signed-off-by: Sagar Ghuge <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* intel/compiler: Add Immediate support for 3 source instruction
  Sagar Ghuge | 2019-10-21 | 1 file | -21/+32

  On Gen >= 10, either src0 or src2 can use a 16-bit immediate value, but not both.

  Signed-off-by: Sagar Ghuge <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
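The operand rule can be written as a small validation function. This is a sketch under stated assumptions: the operand struct is hypothetical, treating src1 as never immediate is inferred from "either src0 or src2", and accepting both signed and unsigned 16-bit integer ranges is a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical operand description for a 3-source instruction. */
struct src3_operand {
    bool is_imm;
    int64_t imm;   /* only meaningful when is_imm is true */
};

/* Fits in a 16-bit field as either a signed or unsigned integer
 * (a simplifying assumption of this sketch). */
static bool imm_fits_16bit(int64_t v)
{
    return v >= INT16_MIN && v <= UINT16_MAX;
}

/* src0 or src2 may be a 16-bit immediate, but not both; src1 is
 * assumed never to be an immediate. */
static bool src3_immediates_valid(const struct src3_operand src[3])
{
    if (src[1].is_imm)
        return false;
    if (src[0].is_imm && src[2].is_imm)
        return false;
    for (unsigned i = 0; i < 3; i += 2) {
        if (src[i].is_imm && !imm_fits_16bit(src[i].imm))
            return false;
    }
    return true;
}
```

A backend would run a check like this before deciding whether an immediate can stay in place or must first be moved into a register.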
* ci: Disable lima until its farm can get fixed.
  Eric Anholt | 2019-10-21 | 1 file | -2/+2

  It's been throwing the following error today:

    "<Fault -32603: 'Internal Server Error (contact server administrator for details): could not extend file "base/17952/18226": No space left on device\nHINT: Check free disk space.\n'>"

  Reviewed-by: Daniel Stone <[email protected]>
* intel: Add missing entry for brw_nir_lower_alpha_to_coverage in Makefile
  Sagar Ghuge | 2019-10-21 | 1 file | -0/+1

  Fixes: 7ecfbd4f6d4 ("nir: Add alpha_to_coverage lowering pass")
  Signed-off-by: Sagar Ghuge <[email protected]>
  Reviewed-by: Jordan Justen <[email protected]>
* llvmpipe: handle compute shader launch with 0 threads
  Dave Airlie | 2019-10-21 | 1 file | -0/+9

  If LP_NUM_THREADS=0 is set, compute shaders hang. Just execute the workloads sequentially if there are no threads in the pool.

  Fixes: 1b24e3ba75 ("llvmpipe: add compute threadpool + mutex")
  Reviewed-by: Roland Scheidegger <[email protected]>
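The fallback can be sketched as: if the pool has no workers, queuing work and waiting for it would never finish, so run every workgroup inline on the calling thread. The pool struct, callback type and function names are hypothetical, and the threaded path is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical work callback: processes one workgroup. */
typedef void (*work_fn)(void *data, unsigned index);

/* Hypothetical pool handle; only the thread count matters here. */
struct thread_pool {
    unsigned num_threads;
};

/* Returns true if the work was executed inline, in order; false means
 * the caller should hand it to the pool's worker threads (not shown). */
static bool run_inline_if_no_threads(const struct thread_pool *pool,
                                     work_fn fn, void *data,
                                     unsigned num_groups)
{
    if (pool != NULL && pool->num_threads > 0)
        return false;
    for (unsigned i = 0; i < num_groups; i++)
        fn(data, i);
    return true;
}

/* Example callback: sums the workgroup indices into *data. */
static void sum_indices(void *data, unsigned index)
{
    *(unsigned *)data += index;
}
```

With a zero-thread pool and four workgroups, sum_indices runs for indices 0 to 3 in sequence and leaves 6 in the accumulator.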
* freedreno/ir3: Add missing ir3_nir_lower_tex_prefetch.c to Android.mk
  Marijn Suijten | 2019-10-21 | 1 file | -0/+1

  This file was added in 2a0d45ae6cf09d60c048d7854e3d082bf15e374f, but it was not added to the Android makefiles. This breaks the build with undefined references to symbols defined in this file. List the file in ir3_SOURCES so that the build succeeds.

  Signed-off-by: Marijn Suijten <[email protected]>
* ac/llvm: fix ac_to_integer_type() for 32-bit const addr space pointers
  Samuel Pitoiset | 2019-10-21 | 1 file | -0/+1

  This fixes some crashes with dEQP-VK.descriptor_indexing.* when the source of read_first_invocation comes from a descriptor.

  Most of these tests still fail because of an LLVM bug (they work with ACO).

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* aco: run opt_algebraic in a loop
  Rhys Perry | 2019-10-21 | 1 file | -3/+8

  Totals from affected shaders:
  SGPRS: 13920 -> 13656 (-1.90 %)
  VGPRS: 12972 -> 12960 (-0.09 %)
  Spilled SGPRs: 0 -> 0 (0.00 %)
  Spilled VGPRs: 0 -> 0 (0.00 %)
  Private memory VGPRs: 0 -> 0 (0.00 %)
  Scratch size: 0 -> 0 (0.00 %) dwords per thread
  Code Size: 1005680 -> 1000648 (-0.50 %) bytes
  LDS: 91 -> 91 (0.00 %) blocks
  Max Waves: 688 -> 688 (0.00 %)

  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Connor Abbott <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
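Running a pass "in a loop" usually means iterating to a fixed point: repeat while the pass reports progress, the convention NIR passes follow by returning a bool. The toy pass below is hypothetical and only stands in for opt_algebraic; each call removes one redundant layer and reports whether it changed anything.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy expression: a count of redundant "+ 0" layers that a
 * simplification pass can strip one at a time. */
struct toy_expr {
    unsigned add_zero_layers;
};

/* One pass: removes a single layer and returns whether it made progress. */
static bool toy_opt_algebraic(struct toy_expr *e)
{
    if (e->add_zero_layers == 0)
        return false;
    e->add_zero_layers--;
    return true;
}

/* Run the pass until it stops making progress; returns the number of
 * invocations, including the final one that changed nothing. */
static unsigned optimize_until_fixed_point(struct toy_expr *e)
{
    unsigned iterations = 0;
    bool progress;
    do {
        progress = toy_opt_algebraic(e);
        iterations++;
    } while (progress);
    return iterations;
}
```

The extra final iteration is the cost of this pattern; the gain is catching simplifications that only become visible after an earlier rewrite, consistent with the SGPR and code-size reductions above.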
* aco: use nir_lower_idiv_precise
  Rhys Perry | 2019-10-21 | 1 file | -1/+1

  v7: rename _nv50/_llvm to _fast/_precise

  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* nir/lower_idiv: add new llvm-based path
  Rhys Perry | 2019-10-21 | 8 files | -17/+136

  v2: make variable names snake_case
  v2: minor cleanups in emit_udiv()
  v2: fix Panfrost build failure
  v3: use an enum instead of a boolean flag in nir_lower_idiv()'s signature
  v4: remove nir_op_urcp
  v5: drop nv50 path
  v5: rebase
  v6: add back nv50 path
  v6: add comment for nir_lower_idiv_path enum
  v7: rename _nv50/_llvm to _fast/_precise
  v8: fix etnaviv build failure

  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* intel/compiler: Remove emit_alpha_to_coverage workaround from backend
  Sagar Ghuge | 2019-10-21 | 2 files | -84/+13

  Remove the emit_alpha_to_coverage workaround from the backend compiler and use the workaround ported to NIR instead.

  v2: Copy comment from brw_fs_visitor (Caio Marcelo de Oliveira Filho)

  Fixes piglit test on HSW:
  - arb_sample_shading-builtin-gl-sample-mask-mrt-alpha-to-coverage-combinations

  Signed-off-by: Sagar Ghuge <[email protected]>
  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>