path: root/src/amd/vulkan
Commit message | Author | Age | Files | Lines
* radv: implement VK_EXT_external_memory_host | Fredrik Höglund | 2018-02-08 | 6 | -8/+137
  Ported from the radeonsi GL_AMD_pinned_memory implementation.
  Signed-off-by: Fredrik Höglund <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: run nir_opt_shrink_load | Samuel Pitoiset | 2018-02-06 | 1 | -0/+2
  LLVM can't shrink loads.
  Polaris10:
  Totals from affected shaders:
  SGPRS: 62528 -> 59955 (-4.11 %)
  VGPRS: 44708 -> 44616 (-0.21 %)
  Spilled SGPRs: 16 -> 8 (-50.00 %)
  Code Size: 1355504 -> 1355172 (-0.02 %) bytes
  Max Waves: 11710 -> 11670 (-0.34 %)
  Vega10:
  Totals from affected shaders:
  SGPRS: 51448 -> 50371 (-2.09 %)
  VGPRS: 39140 -> 39048 (-0.24 %)
  Spilled SGPRs: 16 -> 16 (0.00 %)
  Code Size: 1307188 -> 1304296 (-0.22 %) bytes
  Max Waves: 11312 -> 11292 (-0.18 %)
  This reduces SGPR spilling in MadMax, and it also reduces the number of SGPRs in DOW3 and F12017. The number of waves slightly decreases in F1 but I don't see any performance changes after benchmarking it. Talos and Serious Sam are not affected because they don't use any push constants.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: don't support tc-compat on multisample d32s8 at all. | Dave Airlie | 2018-02-06 | 1 | -2/+2
  RX550 fails dEQP-VK.renderpass.suballocation.multisample.d32_sfloat_s8_uint.samples_2, so increase the range of the workaround.
  Fixes: f4c534ef6 (radv: don't enable tc compat for d32s8 + 4/8 samples (v1.1))
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* amd: remove support for LLVM 3.9 | Marek Olšák | 2018-02-02 | 1 | -4/+0
  Only these are supported:
  - LLVM 4.0
  - LLVM 5.0
  - LLVM 6.0
  - master (7.0)
  Reviewed-by: Dylan Baker <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Don't expose VK_KHX_multiview on android. | Bas Nieuwenhuizen | 2018-02-01 | 1 | -1/+1
  deqp does not allow any KHX extensions, and since deqp is included in android-cts, android does not allow any KHX extensions. So disable VK_KHX_multiview on android.
  Reviewed-by: Samuel Pitoiset <[email protected]>
  CC: 18.0 <[email protected]>
* radv: do not insert shaders in cache when it's disabled | Samuel Pitoiset | 2018-02-01 | 1 | -5/+24
  When the application doesn't provide its own pipeline cache, the driver uses an in-memory cache, but it shouldn't insert any entries when the cache is explicitly disabled by the user.
  Found while running my experimental pipeline-db tool with a ton of shaders: the memory footprint was just huge, and sometimes the process was even killed...
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: use separate bindings for graphics and compute descriptors | Samuel Pitoiset | 2018-02-01 | 3 | -53/+125
  The Vulkan spec says:
  "pipelineBindPoint is a VkPipelineBindPoint indicating whether the descriptors will be used by graphics pipelines or compute pipelines. There is a separate set of bind points for each of graphics and compute, so binding one does not disturb the other."
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104732
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: store the bind point when creating descriptors with templates | Samuel Pitoiset | 2018-02-01 | 2 | -0/+2
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: do not dump meta shader stats | Samuel Pitoiset | 2018-01-31 | 2 | -21/+18
  That's quite useless and that pollutes the output.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: remove predication on cache flushes | Matthew Nicholls | 2018-01-31 | 4 | -18/+13
  This can lead to a situation where cache flushes could get conditionally disabled while still clearing the flush_bits, and thus flushes due to application pipeline barriers may never get executed.
  Fixes: a6c2001ace (radv: add support for cmd predication.)
  Signed-off-by: Dave Airlie <[email protected]>
* radv: Merge raster state with PM4 generation. | Bas Nieuwenhuizen | 2018-01-30 | 2 | -75/+50
  Reviewed-by: Dave Airlie <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Move gs state out of pipeline. | Bas Nieuwenhuizen | 2018-01-30 | 2 | -43/+43
  Reviewed-by: Dave Airlie <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Split out cliprect rule generation. | Bas Nieuwenhuizen | 2018-01-30 | 2 | -25/+33
  Reviewed-by: Dave Airlie <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Merge VGT_GS_MODE computation with PM4 generation. | Bas Nieuwenhuizen | 2018-01-30 | 2 | -28/+25
  Reviewed-by: Dave Airlie <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Split out processing the vertex input state. | Bas Nieuwenhuizen | 2018-01-30 | 1 | -35/+43
  Reviewed-by: Dave Airlie <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Move tessellation state out of pipeline. | Bas Nieuwenhuizen | 2018-01-30 | 2 | -50/+58
  Reviewed-by: Dave Airlie <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Move blend state out of pipeline. | Bas Nieuwenhuizen | 2018-01-30 | 2 | -67/+72
  Reviewed-by: Dave Airlie <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Split out generating VGT_SHADER_STAGES_EN. | Bas Nieuwenhuizen | 2018-01-30 | 2 | -24/+27
  Reviewed-by: Dave Airlie <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Split out the ia_multi_vgt_param precomputation. | Bas Nieuwenhuizen | 2018-01-30 | 3 | -91/+106
  Also moved everything into a struct that the helper function returns, so it is clear in the caller what part of the pipeline gets modified.
  Reviewed-by: Dave Airlie <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Split out db_shader_control computation. | Bas Nieuwenhuizen | 2018-01-30 | 2 | -22/+22
  Reviewed-by: Dave Airlie <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Compute shader_z_format when emitting it. | Bas Nieuwenhuizen | 2018-01-30 | 2 | -8/+3
  Reviewed-by: Dave Airlie <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Merge depth stencil state with PM4 generation. | Bas Nieuwenhuizen | 2018-01-30 | 2 | -73/+58
  Reviewed-by: Dave Airlie <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Merge ps_input_cntl computation with PM4 generation. | Bas Nieuwenhuizen | 2018-01-30 | 2 | -83/+79
  Reviewed-by: Dave Airlie <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Merge vtx_reuse_depth computation with PM4 generation. | Bas Nieuwenhuizen | 2018-01-30 | 2 | -8/+6
  Reviewed-by: Dave Airlie <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Merge vs state computation with PM4 generation. | Bas Nieuwenhuizen | 2018-01-30 | 2 | -58/+34
  Reviewed-by: Dave Airlie <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Merge binning state generation with pm4 emission. | Bas Nieuwenhuizen | 2018-01-30 | 2 | -35/+19
  We don't need the pipeline state struct anymore.
  Reviewed-by: Dave Airlie <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Constify some pipeline helpers. | Bas Nieuwenhuizen | 2018-01-30 | 2 | -6/+6
  Reviewed-by: Dave Airlie <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Add PM4 pregeneration for compute pipelines. | Bas Nieuwenhuizen | 2018-01-30 | 2 | -58/+68
  Reviewed-by: Dave Airlie <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Record a PM4 sequence for graphics pipeline switches. | Bas Nieuwenhuizen | 2018-01-30 | 3 | -451/+483
  This gives about 2% performance improvement on dota2 for me.
  This is mostly a mechanical copy and replacement, but at bind time we still do:
  1) Some stuff that is only based on num_samples changes.
  2) Some command buffer state setting.
  Reviewed-by: Dave Airlie <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Determine unneeded dynamic states. | Bas Nieuwenhuizen | 2018-01-30 | 3 | -38/+64
  Which avoids setting or emitting them.
  Reviewed-by: Dave Airlie <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* nir: add vs_inputs_dual_locations compiler option | Timothy Arceri | 2018-01-30 | 1 | -0/+1
  Allows nir drivers to use either single or dual locations for vs double inputs. i965 uses dual locations for both OpenGL and Vulkan drivers; for now gallium OpenGL drivers only use a single location.
  The following patch will also make use of this option when calling nir_shader_gather_info().
  Reviewed-by: Karol Herbst <[email protected]>
* radv/gfx9: fix block compression texture views. (v2) | Dave Airlie | 2018-01-30 | 1 | -4/+49
  This ports a fix from amdvlk, to fix the sizing for mip levels when block compressed images are viewed using uncompressed views.
  My original fix didn't port the clamping, but it looks like the clamping is required to stop the sizing going too large.
  Fixes: dEQP-VK.image.texel_view_compatible.graphic.extended*bc*
  Doesn't crash DOW3 anymore.
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Fixes: e38685cc62e 'Revert "radv: disable support for VEGA for now."'
  Signed-off-by: Dave Airlie <[email protected]>
* radv: Signal fence correctly after sparse binding. | Bas Nieuwenhuizen | 2018-01-29 | 1 | -14/+32
  It did not signal syncobjs in the fence, and also signalled too early if there was work on the queue already, as we have to wait till that work is done.
  Fixes: d27aaae4d2 "radv: Add external fence support."
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: fix RADV_DEBUG=syncshaders on GFX9 | Samuel Pitoiset | 2018-01-26 | 1 | -1/+10
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: fix a GPU hang with RADV_DEBUG=syncshaders | Samuel Pitoiset | 2018-01-26 | 1 | -8/+7
  The GPU hangs when the driver forces a PS_PARTIAL_FLUSH after a dispatch call (and vice versa for graphics). Something has changed in the kernel driver because it used to work.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/shader: scan if fragment shaders write memory | Samuel Pitoiset | 2018-01-26 | 1 | -3/+3
  It's better to do that in ac_shader_info.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: don't enable tc compat for d32s8 + 4/8 samples (v1.1) | Dave Airlie | 2018-01-26 | 1 | -1/+2
  This seems to be broken, at least the cts tests fail.
  This fixes:
  dEQP-VK.renderpass.suballocation.multisample.d32_sfloat_s8_uint.samples_4
  dEQP-VK.renderpass.suballocation.multisample.d32_sfloat_s8_uint.samples_8
  2 samples seems to pass fine, amdvlk doesn't appear to enable TC for possibly some other reasons here.
  This is most likely a hack.
  v1.1: add a bit of explanation text. (Samuel)
  Fixes: ad3d98da9 (radv: enable tc compatible htile for d32s8 also.)
  Signed-off-by: Dave Airlie <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: add multisample Z optimisation from amdvlk | Dave Airlie | 2018-01-25 | 1 | -0/+3
  This was found while reading src/core/hw/gfxip/gfx6/gfx6DepthStencilView.cpp for other reasons.
  Reviewed-by: Samuel Pitoiset <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* radv: move spi_baryc_cntl to pipeline | Dave Airlie | 2018-01-25 | 3 | -5/+5
  We need to enable the pos float location 2 mode anytime we have persample, not just when forced by the frag shader.
  This fixes: dEQP-VK.pipeline.multisample.min_sample_shading*
  Fixes: 58c97a079 (radv: enable location at sample when persample is forced.)
  Reviewed-by: Samuel Pitoiset <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* radv: fix sample_mask_in loading. (v3.1) | Dave Airlie | 2018-01-24 | 2 | -5/+26
  This is ported from radeonsi and fixes:
  dEQP-VK.pipeline.multisample_shader_builtin.sample_mask.bit_*
  v2: don't call this path for radeonsi, it does it in the epilog. use the radeonsi code path.
  v3: handle NULL pCreateInfo->pMultisampleState properly (Samuel)
  v3.1: set ps_iter_samples default to 1 (Bas)
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Fixes: bdcbe7c76 (radv: add sample mask input support)
  Signed-off-by: Dave Airlie <[email protected]>
* radv: don't use hw resolves for r16g16 norm formats. | Dave Airlie | 2018-01-24 | 1 | -1/+4
  radeonsi has a workaround for this, but it uses a R16A16 format, which vulkan doesn't have. We could probably come up with a workaround, but for now just avoid hw resolves.
  Fixes: dEQP-VK.renderpass.suballocation.multisample.r16g16_*norm*
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Fixes: 2a04f5481d (radv/meta: select resolve paths)
  Signed-off-by: Dave Airlie <[email protected]>
* radv: don't use hw resolve for integer image formats | Dave Airlie | 2018-01-24 | 1 | -0/+5
  From reading AMDVLK it currently never uses hw resolve paths. This patch follows radeonsi, which doesn't use hw resolve for integer formats, and does the same for radv.
  This fixes: dEQP-VK.renderpass.suballocation.multisample*uint tests.
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Fixes: 2a04f5481d (radv/meta: select resolve paths)
  Signed-off-by: Dave Airlie <[email protected]>
* radv: add fs_key meta format support to resolve passes. | Dave Airlie | 2018-01-24 | 2 | -30/+61
  Some of the hw resolve passes need the SPI color format setup correctly.
  This fixes lots of 16-bit and 32-bit format tests in dEQP-VK.renderpass.suballocation.multisample*
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Fixes: f4e499ec7914 "radv: add initial non-conformant radv vulkan driver"
  Signed-off-by: Dave Airlie <[email protected]>
* radv: add an option that allows to dump pre-optimization ir | Samuel Pitoiset | 2018-01-22 | 3 | -0/+4
  With RADV_DEBUG=preoptir.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Timothy Arceri <[email protected]>
* radv: restore previous stencil reference after depth-stencil clear | Matthew Nicholls | 2018-01-22 | 1 | -0/+6
  Cc: [email protected]
  Reviewed-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Alex Smith <[email protected]>
* radv: Don't allow 3d or 1d depth/stencil textures. | Bas Nieuwenhuizen | 2018-01-22 | 1 | -0/+3
  addrlib asserts when that happens, and supporting it is not required, so let's not allow this for now.
  It also asserts on fmask, but we don't have the number of samples here.
  CC: <[email protected]>
  Reviewed-by: Dave Airlie <[email protected]>
* radv: Init variant entry with memset. | Bas Nieuwenhuizen | 2018-01-22 | 1 | -0/+1
  This gets memcpy'd and written directly, and due to alignment, this resulted in uninitialized gaps. This makes those gaps go away.
  CC: <[email protected]>
  Reviewed-by: Dave Airlie <[email protected]>
* radv: Fix bufimage failure deallocation. | Bas Nieuwenhuizen | 2018-01-22 | 1 | -4/+6
  The individual init parts don't clean up their own stuff on failure.
  CC: <[email protected]>
  Reviewed-by: Dave Airlie <[email protected]>
* radv: Fix fragment resolve init memory allocation failure paths. | Bas Nieuwenhuizen | 2018-01-22 | 1 | -8/+6
  CC: <[email protected]>
  Reviewed-by: Dave Airlie <[email protected]>
* radv: Fix freeing meta state if the device pipeline cache fails to allocate. | Bas Nieuwenhuizen | 2018-01-22 | 1 | -1/+3
  CC: <[email protected]>
  Reviewed-by: Dave Airlie <[email protected]>