path: root/src/amd/vulkan
Commit message (Author, Age, Files, Lines)
* radv: Derive android usage from create flags. (Bas Nieuwenhuizen, 2019-10-10, 3 files, -0/+43)
| Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Disallow sparse shared images. (Bas Nieuwenhuizen, 2019-10-10, 1 file, -8/+7)
| Since we really cannot share them ever. Also remove an unused switch.
|
| Fixes: b70829708ac "radv: Implement VK_KHR_external_memory"
| Reviewed-by: Samuel Pitoiset <[email protected]>
* radv/android: Add android hardware buffer queries. (Bas Nieuwenhuizen, 2019-10-10, 2 files, -0/+182)
| Derived from the Intel code.
|
| For the internal format we just use the internal Vulkan format, as we have Vulkan formats for all android formats we care about.
|
| For the ycbcr properties we just do something. I do not have a real clue what would be recommended.
|
| Reviewed-by: Samuel Pitoiset <[email protected]>
* radv/android: Add android hardware buffer field to device memory. (Bas Nieuwenhuizen, 2019-10-10, 2 files, -0/+13)
| You cannot go from BO to Android hardware buffer, so for export we have to remember it.
|
| Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Add VK_ANDROID_external_memory_android_hardware_buffer. (Bas Nieuwenhuizen, 2019-10-10, 2 files, -0/+14)
| Still disabled but now we can add entrypoints.
|
| Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Unset vk_info in radv_image_create_layout. (Bas Nieuwenhuizen, 2019-10-10, 1 file, -4/+8)
| For better test coverage of this corner case.
|
| Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Handle slightly different image dimensions. (Bas Nieuwenhuizen, 2019-10-10, 1 file, -11/+99)
| The minigbm comment really says it all. We should fix minigbm as well, but for now this is the more robust solution.
|
| Note that this only changes width and height for the surface creation, not for the image and hence also not for the sampler, where it would wreak havoc due to the normalized coords.
|
| Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Delay patching for imported images until layout time. (Bas Nieuwenhuizen, 2019-10-10, 1 file, -26/+35)
| We want this flexibility because in GFX10 we lose any stride fields, so we have to make sure our width/height are in alignment with the external image we import.
|
| Furthermore, we need the ability to inject tiling modifiers at import time, which is strictly after create time for Android.
|
| So, with the layout & patch functions being fully independent of pCreateInfo, we can delay it until import/bind time.
|
| Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Split out layout code from image creation. (Bas Nieuwenhuizen, 2019-10-10, 1 file, -61/+77)
| So we can delay the layout until later in some import cases.
|
| Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Handle device memory alloc failure with normal free. (Bas Nieuwenhuizen, 2019-10-10, 1 file, -12/+22)
| Less duplication/complexity.
|
| Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Cleanup buffer_from_fd. (Bas Nieuwenhuizen, 2019-10-10, 3 files, -6/+3)
| Unused stride/offset args.
|
| Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Implement & enable VK_EXT_texel_buffer_alignment. (Bas Nieuwenhuizen, 2019-10-10, 2 files, -0/+16)
| Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: use a compute shader for copying timestamp query results (Samuel Pitoiset, 2019-10-10, 2 files, -30/+227)
| When the timestamp is not ready (i.e. UINT64_MAX), the availability bit should be zero. The previous code used to copy the timestamp value as the availability bit and that's completely wrong.
|
| Because it's not that simple to emit a conditional with the CP, the driver now uses a compute shader for copying timestamp query results.
|
| Fixes dEQP-VK.pipeline.timestamp.misc_tests.reset_query_before_copy.
|
| Signed-off-by: Samuel Pitoiset <[email protected]>
| Reviewed-by: Bas Nieuwenhuizen <[email protected]>
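The availability rule described in the commit above can be sketched on the CPU as follows. This is only an illustration, not the driver's compute shader: `copy_timestamp_result` and `TIMESTAMP_NOT_READY` are hypothetical names, and the result layout (value in `dst[0]`, availability in `dst[1]`) is an assumption made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* The commit message names UINT64_MAX as the "not ready" value. */
#define TIMESTAMP_NOT_READY UINT64_MAX

/* Hypothetical helper: copy one 64-bit timestamp query result.
 * dst[0] is written only when the timestamp is available.
 * dst[1] receives the availability flag (0 or 1) when requested,
 * never the raw timestamp value. */
static void copy_timestamp_result(uint64_t timestamp, bool with_availability,
                                  uint64_t *dst)
{
    bool available = timestamp != TIMESTAMP_NOT_READY;

    if (available)
        dst[0] = timestamp;
    if (with_availability)
        dst[1] = available ? 1 : 0;
}
```

The point of the sketch is the conditional: the availability word is derived from a comparison, which is the part the commit says is awkward to express with the CP alone.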
* radv: sync before resetting query pools if timestamps have been written (Samuel Pitoiset, 2019-10-10, 1 file, -0/+10)
| Otherwise, the GPU might write timestamp queries after the reset operation. This is similar to other query operations.
|
| Signed-off-by: Samuel Pitoiset <[email protected]>
| Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: get the device name from radeon_info::name (Samuel Pitoiset, 2019-10-10, 1 file, -39/+3)
| Signed-off-by: Samuel Pitoiset <[email protected]>
| Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: bump minTexelBufferOffsetAlignment to 4 (Samuel Pitoiset, 2019-10-09, 1 file, -1/+1)
| The spec has probably been misinterpreted during RADV bringup.
|
| This fixes GPU hangs with dEQP-VK.binding_model.*offset_nonzero*.
|
| Fixes: f4e499ec791 ("radv: add initial non-conformant radv vulkan driver")
| Signed-off-by: Samuel Pitoiset <[email protected]>
| Reviewed-by: Bas Nieuwenhuizen <[email protected]>
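From an application's point of view, the limit in the commit above means texel buffer offsets must be multiples of 4 bytes. A minimal sketch of the check and the round-up, assuming the value 4 from the commit title; the helper names are hypothetical and a real application would read the limit from `VkPhysicalDeviceLimits::minTexelBufferOffsetAlignment` rather than hard-coding it:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value, taken from the commit title. */
#define MIN_TEXEL_BUFFER_OFFSET_ALIGNMENT 4u

/* Hypothetical helper: true if the offset respects the alignment limit. */
static bool texel_buffer_offset_is_valid(uint64_t offset)
{
    return (offset % MIN_TEXEL_BUFFER_OFFSET_ALIGNMENT) == 0;
}

/* Hypothetical helper: round an offset up to the next valid value.
 * Relies on the alignment being a power of two. */
static uint64_t align_texel_buffer_offset(uint64_t offset)
{
    const uint64_t a = MIN_TEXEL_BUFFER_OFFSET_ALIGNMENT;
    return (offset + a - 1) & ~(a - 1);
}
```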
* radv: implement VK_KHR_shader_clock (Samuel Pitoiset, 2019-10-09, 3 files, -0/+9)
| NIR->LLVM and ACO already support nir_intrinsic_shader_clock.
|
| Signed-off-by: Samuel Pitoiset <[email protected]>
* amd: Move all amd/common code that depends on LLVM to amd/llvm. (Timur Kristóf, 2019-10-08, 1 file, -2/+2)
| This commit is a step towards the goal of being able to build RADV without LLVM. In the future we would like to offer the option to use RADV solely with ACO. There is still a need for the common AMD code located in amd/common but the LLVM specific parts need to be separated.
|
| Signed-off-by: Timur Kristóf <[email protected]>
| Reviewed-by: Bas Nieuwenhuizen <[email protected]>
| Acked-by: Marek Olšák <[email protected]>
| Acked-by: Samuel Pitoiset <[email protected]>
* radv/aco,aco: set lower_fmod (Rhys Perry, 2019-10-04, 1 file, -0/+1)
| This simplifies ACO and allows the lowered code to be optimized (in particular, constant folded).
|
| Totals from affected shaders:
| SGPRS: 1776 -> 1776 (0.00 %)
| VGPRS: 1436 -> 1436 (0.00 %)
| Spilled SGPRs: 0 -> 0 (0.00 %)
| Spilled VGPRs: 0 -> 0 (0.00 %)
| Private memory VGPRs: 0 -> 0 (0.00 %)
| Scratch size: 0 -> 0 (0.00 %) dwords per thread
| Code Size: 203452 -> 203564 (0.06 %) bytes
| LDS: 0 -> 0 (0.00 %) blocks
| Max Waves: 103 -> 103 (0.00 %)
|
| At least some of the code size increase seems to be from literals being applied to instructions as a result of constant folding.
|
| v2: remove fmod/frem handling in init_context()
|
| Signed-off-by: Rhys Perry <[email protected]>
| Reviewed-by: Daniel Schürmann <[email protected]>
* radv: enable lower_fmod for the LLVM path (Samuel Pitoiset, 2019-10-03, 1 file, -0/+1)
| This lowers fmod and frem at NIR level like RadeonSI. fmod is already lowered directly in NIR->LLVM, and frem will be lowered by LLVM anyways.
|
| This fixes an LLVM crash with: dEQP-VK.glsl.builtin.precision_fp16_storage32b.frem.compute.scalar.
|
| Signed-off-by: Samuel Pitoiset <[email protected]>
| Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Fix warning in 32-bit build. (Bas Nieuwenhuizen, 2019-10-03, 1 file, -2/+3)
| uintptr_t is 32 bits in a 32-bit build, resulting in shifting out of bounds.
|
| Reviewed-by: Eric Engestrom <[email protected]>
| Reviewed-by: Samuel Pitoiset <[email protected]>
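The class of bug described above is easy to reproduce: shifting a 32-bit `uintptr_t` right by 32 is undefined behavior in C. The sketch below shows one common way to avoid it, by widening to `uint64_t` before shifting. The helper names are hypothetical, and this is not necessarily how the commit itself fixed the warning.

```c
#include <stdint.h>

/* Hypothetical helper: upper 32 bits of a pointer-sized value.
 * Widening to uint64_t first keeps the shift count below the operand
 * width on both 32-bit and 64-bit builds. */
static uint32_t address_hi32(uintptr_t addr)
{
    return (uint32_t)((uint64_t)addr >> 32);
}

/* Hypothetical helper: lower 32 bits of a pointer-sized value. */
static uint32_t address_lo32(uintptr_t addr)
{
    return (uint32_t)((uint64_t)addr & 0xffffffffu);
}
```

On a 32-bit build `address_hi32` simply returns 0 for every input, which is the well-defined result the unwidened shift fails to guarantee.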
* radv: Fix condition for skipping the continue CS. (Bas Nieuwenhuizen, 2019-10-03, 1 file, -1/+2)
| We need the continue CS for referencing the tess/GDS/sample position BOs.
|
| Fixes: 46e52df34d3 "radv: add tessellation ring allocation support. (v2)"
| Fixes: e1dc3ab7534 "radv/gfx10: allocate GDS/OA buffer objects for NGG streamout"
| Fixes: 1171b304f30 "radv: overhaul fragment shader sample positions."
| Reviewed-by: Samuel Pitoiset <[email protected]>
* radv/gfx10: fix the ESGS ring size symbol (Samuel Pitoiset, 2019-10-02, 1 file, -19/+1)
| Random hangs no longer happen; I'm actually not sure if they were related to this.
|
| Signed-off-by: Samuel Pitoiset <[email protected]>
| Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: fix build (Samuel Pitoiset, 2019-10-02, 1 file, -1/+1)
| Forgot to amend the commit before updating the MR.
|
| Signed-off-by: Samuel Pitoiset <[email protected]>
* Revert "radv: disable viewport clamping even if FS doesn't write Z" (Samuel Pitoiset, 2019-10-02, 1 file, -1/+3)
| This was actually the wrong fix.
|
| This reverts commit 0a313cc285c2939de9cac07f045b0b699bc208ca.
|
| Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: rework the slow depthstencil clear to write depth from PS (Samuel Pitoiset, 2019-10-02, 1 file, -6/+12)
| Make sure to export the expected clear values to the depth stencil attachment.
|
| This fixes dEQP-VK.pipeline.depth_range_unrestricted.* on GFX10.
|
| Signed-off-by: Samuel Pitoiset <[email protected]>
| Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/gfx10: fix NGG streamout with triangle strips for VS (Samuel Pitoiset, 2019-10-02, 4 files, -1/+13)
| The number of vertices has to be adjusted with the output primitive type.
|
| This fixes dEQP-VK.transform_feedback.simple.triangle_strip_*.
|
| Signed-off-by: Samuel Pitoiset <[email protected]>
| Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/gfx10: fix storing/loading NGG stream outputs for GS (Samuel Pitoiset, 2019-10-02, 1 file, -12/+77)
| The GS outputs are stored differently in the LDS storage, they are indexed by out_idx which is incremented for each stored DWORD. Thus, we need a different path for exporting the stream outputs.
|
| This fixes a bunch of CTS failures when NGG GS is force enabled.
|
| Signed-off-by: Samuel Pitoiset <[email protected]>
| Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/gfx10: use the component mask when storing/loading NGG stream outputs (Samuel Pitoiset, 2019-10-02, 1 file, -0/+6)
| It's unnecessary to store/load more components than needed.
|
| Signed-off-by: Samuel Pitoiset <[email protected]>
| Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/gfx10: fix storing/loading NGG stream outputs for VS and TES (Samuel Pitoiset, 2019-10-02, 1 file, -8/+10)
| The LDS storage allocated for stream outputs is 4 * N, where N is the number of outputs. So, we have to store/load with N as index and not with the output location as index.
|
| This doesn't fix anything known but it should fix out-of-bounds access and it also reduces the number of outputs written to the LDS storage.
|
| Signed-off-by: Samuel Pitoiset <[email protected]>
| Reviewed-by: Bas Nieuwenhuizen <[email protected]>
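The indexing problem described above can be illustrated with a small sketch. If storage is sized as 4 * N DWORDs for N written outputs, indexing by the raw output location can run past the end, while a dense index stays in range. Deriving the dense index from a bitmask of written locations, as done here, is an assumption for the example; the function names are hypothetical and the driver may compute the index differently.

```c
#include <stdint.h>

/* Hypothetical helper: map an output location (must be < 64) to its
 * dense slot, i.e. the number of written outputs with a lower location. */
static unsigned compact_output_index(uint64_t output_mask, unsigned location)
{
    uint64_t below = output_mask & ((UINT64_C(1) << location) - 1);
    unsigned count = 0;

    /* Count set bits by clearing the lowest one each iteration. */
    while (below) {
        below &= below - 1;
        count++;
    }
    return count;
}

/* DWORD offset of one component inside storage sized 4 * N DWORDs. */
static unsigned stream_output_dword_offset(uint64_t output_mask,
                                           unsigned location,
                                           unsigned component)
{
    return 4 * compact_output_index(output_mask, location) + component;
}
```

With outputs written at locations 0, 5 and 31, N is 3 and the storage holds 12 DWORDs. Location 31 maps to slot 2, so its last component lands at offset 11, whereas indexing by location would have produced 4 * 31 + 3.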
* radv/gfx10: add missing counter buffer to the BO list (Samuel Pitoiset, 2019-10-02, 1 file, -0/+2)
| The buffer isn't necessarily used before.
|
| Signed-off-by: Samuel Pitoiset <[email protected]>
| Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/gfx10: add radv_device::use_ngg (Samuel Pitoiset, 2019-10-02, 3 files, -3/+8)
| Trivial.
|
| Signed-off-by: Samuel Pitoiset <[email protected]>
| Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/aco: Don't lower subtractions (Daniel Schürmann, 2019-09-30, 1 file, -1/+0)
| 40228 shaders in 20236 tests
|
| Totals:
| SGPRS: 2045512 -> 2046496 (0.05 %)
| VGPRS: 1430856 -> 1430464 (-0.03 %)
| Spilled SGPRs: 1077 -> 1077 (0.00 %)
| Spilled VGPRs: 0 -> 0 (0.00 %)
| Private memory VGPRs: 0 -> 0 (0.00 %)
| Scratch size: 10348 -> 10348 (0.00 %) dwords per thread
| Code Size: 77202840 -> 77151832 (-0.07 %) bytes
| LDS: 863 -> 863 (0.00 %) blocks
| Max Waves: 260729 -> 260754 (0.01 %)
|
| Reviewed-by: Eric Anholt <[email protected]>
| Reviewed-by: Connor Abbott <[email protected]>
* android: aco: add support for libmesa_aco (Mauro Rossi, 2019-09-28, 1 file, -1/+3)
| Android build rules are added in src/amd/Android.compiler.mk.
|
| The libmesa_aco static library is built conditionally on radeonsi, as is done for the vulkan.radv module. This prevents Android build errors on non-x86 systems.
|
| Filter out the compiler/aco_instruction_selection_setup.cpp source, as it is already included by compiler/aco_instruction_selection.cpp and would otherwise cause multiple-definition linker errors.
|
| NOTE: libLLVM requires the AMDGPU Disassembler to build radv with aco.
|
| Fixes: 93c8ebf ("aco: Initial commit of independent AMD compiler")
| Fixes: a70a998 ("radv/aco: Setup alternate path in RADV to support the experimental ACO compiler")
| Signed-off-by: Mauro Rossi <[email protected]>
* radv: Fix L2 cache rinse programming. (Timur Kristóf, 2019-09-26, 1 file, -5/+9)
| According to radeonsi, GLM doesn't support WB alone, so we have to set INV too when WB is set.
|
| Signed-off-by: Timur Kristóf <[email protected]>
| Reviewed-by: Bas Nieuwenhuizen <[email protected]>
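The rule stated above (write-back implies invalidate for GLM) amounts to a one-line fixup on a set of cache-control bits. The sketch below uses made-up bit positions and a hypothetical function name purely to show the shape of that rule; it does not reproduce the driver's register programming.

```c
#include <stdint.h>

/* Hypothetical bit assignments, for illustration only. */
#define GLM_WB  (1u << 0)
#define GLM_INV (1u << 1)

/* Hypothetical helper: since WB alone is not supported,
 * force INV whenever WB is requested. */
static uint32_t fixup_glm_flags(uint32_t flags)
{
    if (flags & GLM_WB)
        flags |= GLM_INV;
    return flags;
}
```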
* radv: Add debug option to dump meta shaders. (Timur Kristóf, 2019-09-26, 3 files, -2/+6)
| This new option can help debug shader compiler problems when there are issues with the meta shaders.
|
| Signed-off-by: Timur Kristóf <[email protected]>
| Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* amd/common: Introduce ac_get_fs_input_vgpr_cnt. (Timur Kristóf, 2019-09-26, 1 file, -33/+1)
| Add a function called ac_get_fs_input_vgpr_cnt which will return the number of input VGPRs used by an AMD shader. Previously, radv and radeonsi had the same code duplicated, but this commit also allows them to share this code.
|
| Signed-off-by: Timur Kristóf <[email protected]>
| Reviewed-by: Bas Nieuwenhuizen <[email protected]>
| Reviewed-by: Marek Olšák <[email protected]>
* radv: Set shared VGPR count in radv_postprocess_config. (Timur Kristóf, 2019-09-26, 2 files, -2/+18)
| This commit allows RADV to set the shared VGPR count according to the shader config.
|
| Signed-off-by: Timur Kristóf <[email protected]>
| Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* aco,radv: rename record_llvm_ir/llvm_ir_string to record_ir/ir_string (Rhys Perry, 2019-09-26, 5 files, -15/+15)
| Signed-off-by: Rhys Perry <[email protected]>
| Reviewed-by: Daniel Schürmann <[email protected]>
| Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/aco: return a correct name and description for the backend IR (Rhys Perry, 2019-09-26, 3 files, -2/+9)
| Signed-off-by: Rhys Perry <[email protected]>
| Reviewed-by: Daniel Schürmann <[email protected]>
| Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* aco,radv/aco: get disassembly for release builds if requested (Rhys Perry, 2019-09-26, 1 file, -6/+1)
| Signed-off-by: Rhys Perry <[email protected]>
| Reviewed-by: Daniel Schürmann <[email protected]>
| Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/aco: actually disable ACO when unsupported (Rhys Perry, 2019-09-26, 1 file, -1/+0)
| We were setting this twice. The second time, we weren't later disabling it if unsupported.
|
| Signed-off-by: Rhys Perry <[email protected]>
| Reviewed-by: Daniel Schürmann <[email protected]>
| Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: fix s/load/store/ copy-paste typo (Eric Engestrom, 2019-09-24, 1 file, -1/+1)
| Fixes: cdc6efddf918bc07d30d ("radv: implement all depth/stencil resolve modes using graphics")
| Signed-off-by: Eric Engestrom <[email protected]>
| Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Add workaround for hang in The Surge 2. (Bas Nieuwenhuizen, 2019-09-24, 1 file, -0/+8)
| Released today and hangs on RADV. We don't have the root cause yet, but this should unblock people playing the game.
|
| No drirc because the radv debugflags are not usable from drirc and I want this backported.
|
| CC: <[email protected]>
| Reviewed-by: Dave Airlie <[email protected]>
* radv: remove dead shared variables (Daniel Schürmann, 2019-09-19, 1 file, -1/+1)
| LLVM does this anyway, but for ACO we need to do it in NIR.
|
| Reviewed-by: Samuel Pitoiset <[email protected]>
| Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/aco: enable VK_EXT_shader_demote_to_helper_invocation (Daniel Schürmann, 2019-09-19, 3 files, -0/+8)
| For now, this extension will only be enabled for ACO.
|
| Reviewed-by: Samuel Pitoiset <[email protected]>
| Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: enable clustered reductions (Daniel Schürmann, 2019-09-19, 1 file, -0/+1)
| These work with both LLVM and ACO.
|
| Reviewed-by: Samuel Pitoiset <[email protected]>
| Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/aco: Setup alternate path in RADV to support the experimental ACO compiler (Daniel Schürmann, 2019-09-19, 9 files, -103/+201)
| LLVM remains default and ACO can be enabled with RADV_PERFTEST=aco.
|
| Co-authored-by: Daniel Schürmann <[email protected]>
| Co-authored-by: Rhys Perry <[email protected]>
| Reviewed-by: Samuel Pitoiset <[email protected]>
| Reviewed-by: Bas Nieuwenhuizen <[email protected]>
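An opt-in switch like RADV_PERFTEST=aco boils down to looking for a token in an environment variable. The sketch below shows one way to do that check, assuming a comma-separated option list. Both function names are hypothetical and this is not Mesa's own option parser.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: true if `option` appears as a whole entry in a
 * comma-separated list. A NULL or empty list contains nothing. */
static bool option_list_contains(const char *list, const char *option)
{
    size_t len = strlen(option);

    while (list && *list) {
        size_t n = strcspn(list, ","); /* length of the current entry */
        if (n == len && strncmp(list, option, len) == 0)
            return true;
        list += n;
        if (*list == ',')
            list++; /* skip the separator */
    }
    return false;
}

/* Hypothetical helper: has the user asked for ACO via RADV_PERFTEST? */
static bool aco_requested(void)
{
    return option_list_contains(getenv("RADV_PERFTEST"), "aco");
}
```

Matching whole entries rather than substrings means a value such as `acox` does not enable ACO by accident.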
* radv: Add DFSM support. (Bas Nieuwenhuizen, 2019-09-18, 1 file, -5/+17)
| Apparently we already enabled it without having support ...
|
| Not sure if we also need to set disable_start_of_prim when the PS has memory writes, but this mirrors radeonsi.
|
| Doubles fillrate in my dual_quad_bench from ~16 pixels/cycle to ~32 pixels/cycle on a Raven.
|
| Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Disable dfsm by default even on Raven. (Bas Nieuwenhuizen, 2019-09-18, 2 files, -3/+4)
| When actually implementing it, Talos on low is still 3% slower.
|
| Reviewed-by: Samuel Pitoiset <[email protected]>