path: root/src/amd
Commit message | Author | Age | Files | Lines
* radv: enable AMD_shader_ballot with RADV_PERFTEST_SHADER_BALLOT ('shader_ballot') | Daniel Schürmann | 2019-06-13 | 5 files | -1/+9
    Reviewed-by: Connor Abbott <[email protected]>
* amd/common: add support for AMD_shader_ballot functions | Daniel Schürmann | 2019-06-13 | 1 file | -0/+20
    Reviewed-by: Connor Abbott <[email protected]>
* spirv/nir: add support for AMD_shader_ballot and Groups capability | Daniel Schürmann | 2019-06-13 | 1 file | -2/+3
    This commit also renames existing AMD capabilities:
    - gcn_shader -> amd_gcn_shader
    - trinary_minmax -> amd_trinary_minmax
    Reviewed-by: Connor Abbott <[email protected]>
* radv: enable shader_subgroup_vote & shader_subgroup_ballot extensions | Daniel Schürmann | 2019-06-13 | 1 file | -0/+2
    Reviewed-by: Connor Abbott <[email protected]>
* radv: flush and invalidate CB before resetting query pools on GFX9 | Samuel Pitoiset | 2019-06-13 | 1 file | -0/+4
    We have to emit a CACHE_FLUSH_AND_INV_TS_EVENT to be sure all prior GPU work is done. While we are at it, also flush and invalidate DB.
    This fixes the following CTS (when the small hint is disabled): dEQP-VK.query_pool.statistics_query.reset_before_copy.*
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-By: Bas Nieuwenhuizen <[email protected]>
* radv: Always disable DCC on shareable images. | Bas Nieuwenhuizen | 2019-06-13 | 1 file | -3/+1
    Do not want it for perf reasons. Always have to disable DCC when transferring to external queue.
    Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Skip transitions coming from external queue. | Bas Nieuwenhuizen | 2019-06-13 | 1 file | -0/+3
    Transitions to external queue should do the transition & make sure it works on all queues.
    Fixes: 8ebc7dcb59a "radv: Allow fast clears with concurrent queue mask for some layouts."
    Reviewed-by: Samuel Pitoiset <[email protected]>
* amd/rtld: layout and relocate LDS symbols | Nicolai Hähnle | 2019-06-12 | 2 files | -19/+235
    Upcoming changes to LLVM will emit LDS objects as symbols in the ELF symbol table, with relocations that will be resolved with this change.
    Callers will also be able to define LDS symbols that are shared between shader parts. This will be used by radeonsi for the ESGS ring in gfx9+ merged shaders.
    Reviewed-by: Marek Olšák <[email protected]>
* amd/common: use ARRAY_SIZE for the LLVM command line options | Nicolai Hähnle | 2019-06-12 | 1 file | -2/+2
    This is more convenient for changing it around during debug.
    Reviewed-by: Marek Olšák <[email protected]>
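The convenience mentioned in this commit can be illustrated with a minimal sketch. The macro definition, the array name `example_argv`, and the option strings below are assumptions made for illustration only; they are not taken from the Mesa source and are not real LLVM flags.

```c
#include <stddef.h>

/* Local stand-in for an ARRAY_SIZE macro (assumed for illustration). */
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Hypothetical option list. Entries can be added or removed while
 * debugging without updating a separately maintained count. */
static const char *const example_argv[] = {
    "mesa",
    "-example-option-a",
    "-example-option-b",
};

/* Returns the number of options, derived from the array itself. */
static size_t example_argc(void)
{
    return ARRAY_SIZE(example_argv);
}
```

Because the count is computed from the array, commenting out an entry changes the value returned by `example_argc()` automatically.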
* amd/common: add ac_compile_module_to_elf | Nicolai Hähnle | 2019-06-12 | 2 files | -7/+83
    A new variant of ac_compile_module_to_binary that allows us to keep the entire ELF around.
    Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: use ac_shader_config | Nicolai Hähnle | 2019-06-12 | 1 file | -0/+2
    Reviewed-by: Marek Olšák <[email protected]>
* amd/common: add a more powerful runtime linker | Nicolai Hähnle | 2019-06-12 | 5 files | -0/+655
    Using an explicit linker instead of just concatenating .text sections will allow us to start using .rodata sections and explicit descriptions of data on LDS that is shared between stages.
    Reviewed-by: Marek Olšák <[email protected]>
* amd/common: clarify ac_shader_binary::lds_size | Nicolai Hähnle | 2019-06-12 | 1 file | -1/+1
    Reviewed-by: Marek Olšák <[email protected]>
* amd/common: extract ac_parse_shader_binary_config | Nicolai Hähnle | 2019-06-12 | 2 files | -34/+47
    Reviewed-by: Marek Olšák <[email protected]>
* radv: fix VK_EXT_memory_budget if one heap isn't available | Samuel Pitoiset | 2019-06-12 | 1 file | -27/+33
    When the visible VRAM size is equal to the VRAM size only two heaps are exposed.
    This fixes dEQP-VK.api.info.device.memory_budget.
    Cc: 19.0 19.1 <[email protected]>
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-By: Bas Nieuwenhuizen <[email protected]>
* radv: fix occlusion queries on VegaM | Samuel Pitoiset | 2019-06-12 | 1 file | -21/+27
    The number of render backends is 16 but the enabled mask is 0xaaaa.
    As noticed by Bas, allowing disabled render backends might break the OCCLUSION_QUERY packet. We don't use it yet but keep this in mind.
    This fixes dEQP-VK.query_pool.* and dEQP-VK.multiview.*.
    Cc: 19.0 19.1 <[email protected]>
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-By: Bas Nieuwenhuizen <[email protected]>
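To make the situation in this commit concrete, the sketch below shows one way to honor an enabled-backend mask when walking per-backend results. It is only an illustration under stated assumptions: the function names and the one-counter-per-slot result layout are hypothetical and do not describe the actual radv change.

```c
#include <stdint.h>

/* Counts the render backends whose bit is set in enabled_mask,
 * looking only at the first num_rb bits (num_rb must be <= 32). */
static unsigned count_enabled_rbs(uint32_t enabled_mask, unsigned num_rb)
{
    unsigned count = 0;
    for (unsigned i = 0; i < num_rb; i++) {
        if (enabled_mask & (1u << i))
            count++;
    }
    return count;
}

/* Sums per-backend counters while skipping disabled backends.
 * Hypothetical layout: results[i] belongs to backend slot i. */
static uint64_t sum_enabled_results(const uint64_t *results,
                                    uint32_t enabled_mask, unsigned num_rb)
{
    uint64_t total = 0;
    for (unsigned i = 0; i < num_rb; i++) {
        if (enabled_mask & (1u << i))
            total += results[i];
    }
    return total;
}
```

With 16 slots and a mask of 0xaaaa, only the odd-numbered slots are visited, so code that assumes every slot holds a valid result would read data that a disabled backend never wrote.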
* radeonsi: use the ac helper for index buffer stores in the culling shader | Marek Olšák | 2019-06-11 | 3 files | -3/+5
* radv: assert on inline uniform blocks in radv_CmdPushDescriptorSetKHR() | Samuel Iglesias Gonsálvez | 2019-06-11 | 1 file | -0/+8
    According to the Vulkan spec, inline uniform blocks are not allowed to be updated through vkCmdPushDescriptorSetKHR().
    These are the spec quotes from "13.2.1. Descriptor Set Layout" that are relevant for this case:
    "VK_DESCRIPTOR_SET_LAYOUT_CREATE_PUSH_DESCRIPTOR_BIT_KHR specifies that descriptor sets must not be allocated using this layout, and descriptors are instead pushed by vkCmdPushDescriptorSetKHR."
    "If flags contains VK_DESCRIPTOR_SET_LAYOUT_CREATE_PUSH_DESCRIPTOR_BIT_KHR, then all elements of pBindings must not have a descriptorType of VK_DESCRIPTOR_TYPE_INLINE_UNIFORM_BLOCK_EXT".
    There is no explicit mention in vkCmdPushDescriptorSetKHR() to forbid this case but it is implied in the creation of the descriptor set layout as aforementioned.
    Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
    Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: remove extra assignment in radv_decompress_resolve_subpass_src() | Samuel Pitoiset | 2019-06-11 | 1 file | -1/+0
    baseArrayLayer is defined twice, trivial.
    Signed-off-by: Samuel Pitoiset <[email protected]>
* radv: add radv_get_resolve_pipeline() helper in the graphics path | Samuel Pitoiset | 2019-06-11 | 1 file | -12/+29
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-By: Bas Nieuwenhuizen <[email protected]>
* radv: do not decompress all image layers before resolving inside a subpass | Samuel Pitoiset | 2019-06-11 | 1 file | -3/+9
    When decompressing resolve source images, we should rely on the framebuffer layer count instead of resolving all image layers.
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-By: Bas Nieuwenhuizen <[email protected]>
* radv: initialize the aspect mask when decompressing resolve source images | Samuel Pitoiset | 2019-06-11 | 1 file | -0/+1
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-By: Bas Nieuwenhuizen <[email protected]>
* radv: perform proper layout transitions before resolving | Samuel Pitoiset | 2019-06-11 | 1 file | -19/+19
    Use an explicit pipeline barrier for doing layout transitions instead of duplicating some code.
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-By: Bas Nieuwenhuizen <[email protected]>
* radv: do not resolve all image layers with compute inside a subpass | Samuel Pitoiset | 2019-06-11 | 1 file | -4/+8
    When resolving inside a subpass, we should rely on the framebuffer layer count instead of resolving all image layers. This should improve performance of layered resolves a bit.
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-By: Bas Nieuwenhuizen <[email protected]>
* radv: Handle UNDEFINED format in image format list. | Bas Nieuwenhuizen | 2019-06-10 | 1 file | -0/+6
    Was watching a presentation on YT where this was used and it turns out it is not invalid.
    The only case where it is actually valid as a format in the creation of an image or image view is with Android Hardware Buffers, which have their format specified externally. So we can just ignore all entries with VK_FORMAT_UNDEFINED.
    Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Prevent out of bound shift on 32-bit builds. | Bas Nieuwenhuizen | 2019-06-10 | 1 file | -2/+2
    uintptr_t is 32 bits wide on those builds, and shifting it by 32 bits results in undefined behavior IIRC.
    Fixes: b3c8de1c55c "radv: save all descriptor pointers into the trace BO"
    Reviewed-by: Samuel Pitoiset <[email protected]>
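The hazard described here is general C behavior: a shift count equal to or larger than the width of the promoted operand is undefined. A common way to avoid it, sketched below, is to widen to a 64-bit type before shifting. The function name `split_address` is hypothetical, and this is not claimed to be the exact change made in the commit.

```c
#include <stdint.h>

/* Splits an address into 32-bit halves. Widening to uint64_t first keeps
 * the shift count below the operand width even where uintptr_t is only
 * 32 bits wide; shifting a 32-bit value by 32 would be undefined. */
static void split_address(uintptr_t addr, uint32_t *lo, uint32_t *hi)
{
    uint64_t va = (uint64_t)addr;

    *lo = (uint32_t)(va & 0xffffffffu);
    *hi = (uint32_t)(va >> 32);
}
```

On a 32-bit build the high half simply comes out as zero, which is the well-defined result the unwidened shift could not guarantee.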
* radv: fix setting CB_SHADER_MASK for dual source blending | Samuel Pitoiset | 2019-06-10 | 1 file | -2/+5
    CB_SHADER_MASK was computed without the second color buffer format which looks totally wrong to me. While we are at it, copy a comment from RadeonSI.
    Cc: 19.0 19.1 <[email protected]>
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-By: Bas Nieuwenhuizen <[email protected]>
* radv: fix alpha-to-coverage when there is unused color attachments | Samuel Pitoiset | 2019-06-10 | 1 file | -1/+1
    When alphaToCoverage is enabled, we should always write the alpha channel of MRT0 if it's unused. This now matches RadeonSI.
    This fixes the new CTS: dEQP-VK.pipeline.multisample.alpha_to_coverage_unused_attachment.samples_*.alpha_invisible
    Cc: 19.0 19.1 <[email protected]>
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-By: Bas Nieuwenhuizen <[email protected]>
* radv: enable VK_EXT_sample_locations | Samuel Pitoiset | 2019-06-07 | 2 files | -9/+1
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-By: Bas Nieuwenhuizen <[email protected]>
* radv: enable HTILE for images that might need variable sample locations | Samuel Pitoiset | 2019-06-07 | 1 file | -7/+0
    This is now supported.
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-By: Bas Nieuwenhuizen <[email protected]>
* radv: handle sample locations during automatic layout transitions | Samuel Pitoiset | 2019-06-07 | 2 files | -18/+168
    From the Vulkan spec 1.1.109:
    "Some implementations may need to evaluate depth image values while performing image layout transitions. To accommodate this, instances of the VkSampleLocationsInfoEXT structure can be specified for each situation where an explicit or automatic layout transition has to take place. [...] and VkRenderPassSampleLocationsBeginInfoEXT can be chained from VkRenderPassBeginInfo to provide sample locations for layout transitions performed implicitly by a render pass instance."
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-By: Bas Nieuwenhuizen <[email protected]>
* radv: determine the first subpass id for every attachments | Samuel Pitoiset | 2019-06-07 | 2 files | -1/+20
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-By: Bas Nieuwenhuizen <[email protected]>
* radv: handle sample locations during explicit depth/stencil transitions | Samuel Pitoiset | 2019-06-07 | 1 file | -7/+28
    From the Vulkan spec 1.1.109:
    "Some implementations may need to evaluate depth image values while performing image layout transitions. To accommodate this, instances of the VkSampleLocationsInfoEXT structure can be specified for each situation where an explicit or automatic layout transition has to take place. VkSampleLocationsInfoEXT can be chained from VkImageMemoryBarrier structures to provide sample locations for layout transitions performed by vkCmdWaitEvents and vkCmdPipelineBarrier calls."
    This handles explicit depth/stencil layout transitions performed with CmdWaitEvents() or CmdPipelineBarrier().
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-By: Bas Nieuwenhuizen <[email protected]>
* radv: allow the depth decompress pass to emit dynamic sample locations | Samuel Pitoiset | 2019-06-07 | 3 files | -7/+31
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-By: Bas Nieuwenhuizen <[email protected]>
* radv: allow to set dynamic sample locations to the depth decompress pass | Samuel Pitoiset | 2019-06-07 | 1 file | -1/+8
    If VK_EXT_sample_locations is used, the driver might need to emit the sample locations specified during layout transitions.
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-By: Bas Nieuwenhuizen <[email protected]>
* radv: allow to save/restore sample locations during meta operations | Samuel Pitoiset | 2019-06-07 | 2 files | -0/+14
    This will be used for the depth decompress pass that might need to emit variable sample locations during layout transitions.
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-By: Bas Nieuwenhuizen <[email protected]>
* ac/nir: Remove stale TODO | Connor Abbott | 2019-06-06 | 1 file | -1/+7
    While we're here, copy the comment explaining this from radeonsi.
    Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: set the subpass before any initial subpass transitions | Samuel Pitoiset | 2019-06-06 | 1 file | -2/+3
    This might fix initial subpass transitions when multiview is used. Noticed while implementing sample locations during layout transitions.
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-By: Bas Nieuwenhuizen <[email protected]>
* radv: use only one descriptor in the fmask expand pass | Samuel Pitoiset | 2019-06-05 | 1 file | -24/+3
    This removes one useless SMEM load operation which pointed to the same descriptor anyway.
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-By: Bas Nieuwenhuizen <[email protected]>
* radv: set ACCESS_NON_READABLE on the fmask expand pass output image | Samuel Pitoiset | 2019-06-05 | 1 file | -0/+1
    The driver will emit GLC=1.
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-By: Bas Nieuwenhuizen <[email protected]>
* radv: remove one useless image type in the fmask expand shader | Samuel Pitoiset | 2019-06-05 | 1 file | -6/+3
    Both input and output images use the same type.
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-By: Bas Nieuwenhuizen <[email protected]>
* ac: rename LLVM <= 7 helpers for readability | Marek Olšák | 2019-06-04 | 1 file | -37/+37
    Reviewed-by: Samuel Pitoiset <[email protected]>
* ac: fix a typo in ac_build_wg_scan_bottom | Marek Olšák | 2019-06-04 | 1 file | -1/+1
    Cc: 19.1 <[email protected]>
    Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Use bo metadata for imported image tiling on Android. | Bas Nieuwenhuizen | 2019-06-04 | 3 files | -14/+61
    This way we handle linear images etc. correctly.
    Acked-by: Samuel Pitoiset <[email protected]>
* ac/nir: mark some texture intrinsics as convergent | Rhys Perry | 2019-06-04 | 1 file | -0/+18
    Otherwise LLVM can sink them and their texture coordinate calculations into divergent branches.
    v2: simplify the conditions on which the intrinsic is marked as convergent
    v3: only mark as convergent in FS and CS with derivative groups
    Cc: <[email protected]>
    Signed-off-by: Rhys Perry <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* radv: fix some compiler warnings | Rhys Perry | 2019-06-04 | 1 file | -4/+4
    Fixes -Woverflow warnings with GCC 9.1.1
    v2: use a cast instead of a bitwise and
    Signed-off-by: Rhys Perry <[email protected]>
    Reviewed-By: Bas Nieuwenhuizen <[email protected]>
* radv: do not use gfx fast depth clears for layered depth/stencil images | Samuel Pitoiset | 2019-06-04 | 1 file | -0/+1
    The driver should only use fast depth clears with the graphics path when the view covers all image layers, otherwise this might corrupt layers when HTILE is enabled.
    Cc: 19.0 19.1 [email protected]
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac,radv: do not emit vec3 for raw load/store on SI | Samuel Pitoiset | 2019-06-04 | 4 files | -8/+20
    It's unsupported; only load/store format with vec3 is supported.
    Fixes: 6970a9a6ca9 ("ac,radv: remove the vec3 restriction with LLVM 9+")
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* ac/registers: don't use the si, cik, vi names, use gfxN | Marek Olšák | 2019-06-03 | 6 files | -1405/+1405
    trivial
* amd/common: use generated register header | Nicolai Hähnle | 2019-06-03 | 16 files | -16351/+19