Commit log for src/amd
Commit message [Author, Date, Files changed, Lines -removed/+added]
* radv: Split out layout code from image creation. [Bas Nieuwenhuizen, 2019-10-10, 1 file, -61/+77]
  So we can delay the layout until later in some import cases.
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Handle device memory alloc failure with normal free. [Bas Nieuwenhuizen, 2019-10-10, 1 file, -12/+22]
  Less duplication/complexity.
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Cleanup buffer_from_fd. [Bas Nieuwenhuizen, 2019-10-10, 3 files, -6/+3]
  Unused stride/offset args.
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Implement & enable VK_EXT_texel_buffer_alignment. [Bas Nieuwenhuizen, 2019-10-10, 2 files, -0/+16]
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: use a compute shader for copying timestamp query results [Samuel Pitoiset, 2019-10-10, 2 files, -30/+227]
  When the timestamp is not ready (i.e. UINT64_MAX), the availability bit should be zero. The previous code copied the timestamp value as the availability bit, which is completely wrong.
  Because it's not that simple to emit a conditional with the CP, the driver now uses a compute shader for copying timestamp query results.
  Fixes dEQP-VK.pipeline.timestamp.misc_tests.reset_query_before_copy.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
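The availability rule described in this commit can be illustrated with a CPU-side sketch in C. This is not RADV's compute shader. The function name and the flat {value, availability} layout of 64-bit words (as when 64-bit results with availability are requested) are assumptions made for illustration. Only the UINT64_MAX "not ready" sentinel comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Per the commit message, an unwritten timestamp slot reads as UINT64_MAX. */
#define TIMESTAMP_NOT_READY UINT64_MAX

/* Hypothetical CPU-side illustration (RADV does this in a compute shader).
 * Copies `count` timestamps into `dst` as {value, availability} pairs of
 * 64-bit words. Returns true only if every query was available. */
bool copy_timestamp_results(const uint64_t *src, uint64_t *dst, uint32_t count)
{
    assert(src != NULL && dst != NULL);
    bool all_available = true;
    for (uint32_t i = 0; i < count; i++) {
        bool available = src[i] != TIMESTAMP_NOT_READY;
        /* Without a wait flag, an unavailable result is left untouched. */
        if (available)
            dst[2 * i] = src[i];
        /* The availability word is 0 or 1, never the raw timestamp value
         * (copying the timestamp here is the bug the commit describes). */
        dst[2 * i + 1] = available ? 1 : 0;
        if (!available)
            all_available = false;
    }
    return all_available;
}
```

The CP cannot easily branch on each slot's value. Per the commit message, this per-slot conditional is why the copy moved into a shader.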
* radv: sync before resetting query pools if timestamps have been written [Samuel Pitoiset, 2019-10-10, 1 file, -0/+10]
  Otherwise, the GPU might write timestamp queries after the reset operation. This is similar to other query operations.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* aco: Clean up usages of PhysReg::reg from aco_assembler. [Timur Kristóf, 2019-10-10, 1 file, -27/+27]
  These are not needed anymore, since PhysReg has an implicit conversion operator to unsigned int, which is equivalent to accessing this field.
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* aco: Add extra assertion for number of FS input VGPRs. [Timur Kristóf, 2019-10-10, 1 file, -0/+7]
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* aco: Fix s_dcache_wb on GFX10. [Timur Kristóf, 2019-10-10, 2 files, -0/+13]
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* aco: Have s_waitcnt_vscnt write to NULL. [Rhys Perry, 2019-10-10, 1 file, -2/+3]
  Not sure if this instruction actually writes anything, but LLVM disassembles a destination and sets it to NULL.
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-By: Timur Kristóf <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* aco: Use the VOP3-only add/sub GFX10 instructions if needed. [Rhys Perry, 2019-10-10, 1 file, -1/+15]
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-By: Timur Kristóf <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* aco: Initial work to avoid GFX10 hazards. [Rhys Perry, 2019-10-10, 2 files, -36/+117]
  Currently just breaks up SMEM groups and fixes FeatureVMEMtoScalarWriteHazard (name from LLVM).
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-By: Timur Kristóf <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* aco: pad code with s_code_end on GFX10 [Rhys Perry, 2019-10-10, 1 file, -2/+13]
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-By: Timur Kristóf <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
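Below is a minimal sketch of padding an assembled code buffer with s_code_end. It rests on two assumptions. First, the GFX10 SOPP encoding of s_code_end is taken to be 0xbf9f0000 (opcode 0x1f); check this against the RDNA ISA documentation. Second, the padding alignment is a hypothetical parameter, because the commit does not say how much padding ACO emits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed GFX10 encoding of s_code_end (SOPP, opcode 0x1f). */
#define GFX10_S_CODE_END 0xbf9f0000u

/* Pads `code` (currently `size` dwords, room for `capacity`) with
 * s_code_end until its length is a multiple of `align_dwords`.
 * `align_dwords` is a hypothetical parameter. Returns the new size. */
size_t pad_with_code_end(uint32_t *code, size_t size, size_t capacity,
                         size_t align_dwords)
{
    assert(align_dwords > 0);
    size_t padded = (size + align_dwords - 1) / align_dwords * align_dwords;
    assert(padded <= capacity);
    while (size < padded)
        code[size++] = GFX10_S_CODE_END;
    return size;
}
```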
* aco: workaround GFX10 0x3f branch bug [Rhys Perry, 2019-10-10, 1 file, -5/+39]
  According to LLVM, branches with an offset of 0x3f are buggy.
  v2: (by Timur Kristóf)
    - extract the GFX10 specific part to its own function
  Signed-off-by: Rhys Perry <[email protected]>
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
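The condition the workaround has to detect can be sketched as follows. SOPP branches encode a signed dword offset relative to the instruction after the branch (PC + 4 bytes). The helper names and the use of dword indices for positions are assumptions made for illustration. This sketch does not show how ACO actually avoids the bad offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Signed dword offset a SOPP branch at `branch_dword` would encode to reach
 * `target_dword`. The offset is relative to the next instruction. */
int32_t sopp_branch_offset(uint32_t branch_dword, uint32_t target_dword)
{
    return (int32_t)target_dword - (int32_t)(branch_dword + 1);
}

/* Hypothetical check: true when the encoded offset hits the buggy 0x3f value. */
bool gfx10_branch_hits_0x3f_bug(uint32_t branch_dword, uint32_t target_dword)
{
    return sopp_branch_offset(branch_dword, target_dword) == 0x3f;
}
```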
* aco: Fix VS input VGPRs on GFX10. [Timur Kristóf, 2019-10-10, 1 file, -1/+5]
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* aco: Assemble opsel in VOP3 instructions. [Rhys Perry, 2019-10-10, 2 files, -2/+3]
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-By: Timur Kristóf <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* aco: Allow literals on VOP3 instructions. [Rhys Perry, 2019-10-10, 2 files, -2/+4]
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
  Reviewed-By: Timur Kristóf <[email protected]>
* aco: Support subvector loops in aco_assembler. [Timur Kristóf, 2019-10-10, 2 files, -1/+26]
  These are currently not used, but could be useful later.
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* aco: Set GFX10 dimensionality on the instructions that need it. [Timur Kristóf, 2019-10-10, 1 file, -0/+21]
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* aco: Use ac_get_sampler_dim, delete duplicate code. [Timur Kristóf, 2019-10-10, 1 file, -44/+5]
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* aco: Set GFX10 DLC bit properly. [Timur Kristóf, 2019-10-10, 2 files, -0/+21]
  The DLC bit is now set to 1 for all loads when GLC is also set, but cleared to 0 for all stores (otherwise it causes issues), and also cleared to 0 for atomics.
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
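The DLC rule in this commit message can be written as a small decision function. The enum and the function name are hypothetical and are not ACO's actual types. The rule itself is taken directly from the commit text.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of memory instructions for this sketch. */
enum mem_op_kind { MEM_LOAD, MEM_STORE, MEM_ATOMIC };

/* GFX10 DLC rule as described in the commit message. */
bool gfx10_dlc_bit(enum mem_op_kind kind, bool glc)
{
    switch (kind) {
    case MEM_LOAD:
        return glc;      /* loads: DLC follows GLC */
    case MEM_STORE:      /* stores: always cleared (setting it causes issues) */
    case MEM_ATOMIC:     /* atomics: always cleared */
        return false;
    }
    assert(!"unknown memory op kind");
    return false;
}
```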
* aco: Support GFX10 VOP3 and VOP1 as VOP3 in aco_assembler. [Timur Kristóf, 2019-10-10, 1 file, -6/+18]
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* aco: Support GFX10 EXP in aco_assembler. [Timur Kristóf, 2019-10-10, 1 file, -1/+7]
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* aco: Fix GFX9 FLAT, SCRATCH, GLOBAL instructions, add GFX10 support. [Timur Kristóf, 2019-10-10, 2 files, -8/+27]
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* aco: Support GFX10 MIMG and GFX9 D16 in aco_assembler. [Timur Kristóf, 2019-10-10, 1 file, -3/+17]
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* aco: Support GFX10 MTBUF in aco_assembler. [Timur Kristóf, 2019-10-10, 2 files, -10/+21]
  Also remove img_format from aco_ir, since it can be calculated from dfmt and nfmt, so only the assembler needs to deal with it.
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* aco: Link ACO with amd/common. [Timur Kristóf, 2019-10-10, 1 file, -0/+3]
  We'd like to use some functions, for example some ac_shader_util functions in ACO, so we need to link ACO to AC.
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* amd/common: Add extern "C" to some headers that were missing it. [Timur Kristóf, 2019-10-10, 3 files, -0/+24]
  We'd like to include some of these in C++ code later. Specifically, ACO is written in C++ and we would like to use some of this code in ACO in order to avoid code duplication.
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
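The conventional shape of a C header that can also be included from C++ is sketched below. The header name `ac_example.h` and the function `ac_example_align` are hypothetical. Only the `#ifdef __cplusplus` / `extern "C"` guard is the technique the commit refers to. The definition appears in the same block only to keep the sketch self-contained; in practice it would live in a separate C source file.

```c
/* ac_example.h (hypothetical header name) */
#ifndef AC_EXAMPLE_H
#define AC_EXAMPLE_H

#include <stdint.h>

#ifdef __cplusplus
extern "C" {
#endif

/* Declared with C linkage, so a C++ caller such as ACO links against the
 * unmangled symbol that the C compiler emitted. */
uint32_t ac_example_align(uint32_t value, uint32_t alignment);

#ifdef __cplusplus
}
#endif

#endif /* AC_EXAMPLE_H */

/* ac_example.c: rounds `value` up to a multiple of `alignment`. */
uint32_t ac_example_align(uint32_t value, uint32_t alignment)
{
    return (value + alignment - 1) / alignment * alignment;
}
```

A plain C compiler skips the guard, so adding it changes nothing for existing C users of the header.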
* aco: Support GFX10 MUBUF in aco_assembler. [Timur Kristóf, 2019-10-10, 1 file, -1/+9]
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* aco: Support GFX10 DS in aco_assembler. [Timur Kristóf, 2019-10-10, 1 file, -2/+7]
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* aco: Support GFX10 VINTRP in aco_assembler. [Timur Kristóf, 2019-10-10, 1 file, -1/+9]
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* aco: Support GFX10 SMEM in aco_assembler. [Timur Kristóf, 2019-10-10, 1 file, -13/+60]
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* aco: Add missing GFX10 specific fields and some README notes. [Timur Kristóf, 2019-10-10, 3 files, -2/+33]
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* aco: Set +wavefrontsize64 for LLVM disassembler in GFX10 wave64 mode. [Timur Kristóf, 2019-10-10, 4 files, -7/+15]
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* radv: get the device name from radeon_info::name [Samuel Pitoiset, 2019-10-10, 1 file, -39/+3]
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* aco: enable nir_opt_sink [Rhys Perry, 2019-10-09, 1 file, -1/+1]
    SGPRS: 880272 -> 838936 (-4.70 %)
    VGPRS: 705316 -> 680988 (-3.45 %)
    Spilled SGPRs: 1032 -> 832 (-19.38 %)
    Spilled VGPRs: 0 -> 0 (0.00 %)
    Private memory VGPRs: 0 -> 0 (0.00 %)
    Scratch size: 252 -> 252 (0.00 %) dwords per thread
    Code Size: 55150788 -> 55172436 (0.04 %) bytes
    LDS: 451 -> 451 (0.00 %) blocks
    Max Waves: 66178 -> 68706 (3.82 %)
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
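The percentages in this shader statistics report are relative changes, (after - before) / before * 100. A small C helper reproduces them. The function name and the handling of 0 -> 0 rows (reported as 0.00 %) are assumptions, not the report tool's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Relative change in percent between two totals.
 * Assumption: a 0 -> 0 row is reported as 0.00 %, so before == 0 is only
 * accepted when after is also 0. */
double percent_change(uint64_t before, uint64_t after)
{
    if (before == 0) {
        assert(after == 0);
        return 0.0;
    }
    return ((double)after - (double)before) * 100.0 / (double)before;
}
```

For example, the SGPR row is (838936 - 880272) / 880272 * 100, about -4.696, which the report rounds to -4.70 %.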
* amd: don't use AMD_FAMILY definitions from amdgpu_drm.h [Marek Olšák, 2019-10-09, 1 file, -8/+8]
  Use the ones from addrlib instead.
  Reviewed-by: Samuel Pitoiset <[email protected]>
* aco: move s_andn2_b64 instructions out of the p_discard_if [Rhys Perry, 2019-10-09, 5 files, -61/+54]
  And use a new p_discard_early_exit instruction. This fixes some cases where a definition having the same register as an operand causes issues.
  v2: rename instruction to p_exit_early_if
  v2: modify the existing instruction instead of creating a new one
  v3: merge the "i == num - 1" IFs
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* aco: don't reorder instructions in order to lower boolean phis [Daniel Schürmann, 2019-10-09, 1 file, -26/+8]
  Reviewed-by: Rhys Perry <[email protected]>
* aco: re-use existing phi instruction when lowering boolean phis [Daniel Schürmann, 2019-10-09, 1 file, -3/+18]
  Reviewed-by: Rhys Perry <[email protected]>
* aco: Cleanup insert_before_logical_end [Michael Schellenberger Costa, 2019-10-09, 1 file, -15/+11]
  Reviewed-by: Daniel Schürmann <[email protected]>
  Reviewed-by: Rhys Perry <[email protected]>
* radv: bump minTexelBufferOffsetAlignment to 4 [Samuel Pitoiset, 2019-10-09, 1 file, -1/+1]
  The spec has probably been misinterpreted during RADV bringup. This fixes GPU hangs with dEQP-VK.binding_model.*offset_nonzero*.
  Fixes: f4e499ec791 ("radv: add initial non-conformant radv vulkan driver")
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
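With the limit raised to 4, an application must use a texel buffer view offset that is a multiple of 4 bytes. Vulkan requires this limit to be a power of two, so the check reduces to a mask test. The macro and helper names are hypothetical. The sketch covers the application-side rule only and ignores the relaxed rules that VK_EXT_texel_buffer_alignment can provide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value this commit reports for minTexelBufferOffsetAlignment (hypothetical macro). */
#define RADV_MIN_TEXEL_BUFFER_OFFSET_ALIGNMENT 4u

/* Hypothetical helper: is `offset` acceptable for a texel buffer view? */
bool texel_buffer_offset_is_valid(uint64_t offset)
{
    const uint64_t align = RADV_MIN_TEXEL_BUFFER_OFFSET_ALIGNMENT;
    assert((align & (align - 1)) == 0); /* power of two, as Vulkan requires */
    return (offset & (align - 1)) == 0;
}
```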
* radv: implement VK_KHR_shader_clock [Samuel Pitoiset, 2019-10-09, 3 files, -0/+9]
  NIR->LLVM and ACO already support nir_intrinsic_shader_clock.
  Signed-off-by: Samuel Pitoiset <[email protected]>
* amd/llvm: Fix warning due to asserted-only variable. [Bas Nieuwenhuizen, 2019-10-08, 1 file, -1/+1]
    [212/893] Compiling C object 'src/amd/llvm/ce8261c@@amd_common_llvm@sta/ac_nir_to_llvm.c.o'.
    ../mesa/src/amd/llvm/ac_nir_to_llvm.c: In function ‘visit_image_atomic’:
    ../mesa/src/amd/llvm/ac_nir_to_llvm.c:2636:17: warning: unused variable ‘format’ [-Wunused-variable]
     2636 | const GLenum format = nir_intrinsic_format(instr);
          |              ^~~~~~
  Reviewed-by: Marek Olšák <[email protected]>
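One common way to silence -Wunused-variable for a value that is used only inside assert() is a (void) cast. The sketch below shows this with hypothetical stand-in functions. The log does not show which fix the one-line change in this commit used, and it may rely on an "unused" attribute macro instead.

```c
#include <assert.h>

/* Hypothetical stand-in for a query like nir_intrinsic_format(). */
static int get_format(int instr) { return instr & 0xff; }

/* Hypothetical visitor: `format` only feeds an assertion. */
int visit_atomic(int instr)
{
    const int format = get_format(instr);
    /* With NDEBUG defined, assert() expands to nothing, which leaves
     * `format` unused in release builds and triggers the warning. */
    assert(format != 0);
    (void)format; /* marks the variable as intentionally unused */
    return instr >> 8;
}
```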
* amd: Move all amd/common code that depends on LLVM to amd/llvm. [Timur Kristóf, 2019-10-08, 17 files, -36/+74]
  This commit is a step towards the goal of being able to build RADV without LLVM. In the future we would like to offer the option to use RADV solely with ACO. There is still a need for the common AMD code located in amd/common but the LLVM specific parts need to be separated.
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Acked-by: Marek Olšák <[email protected]>
  Acked-by: Samuel Pitoiset <[email protected]>
* aco: fix load_constant with multiple arrays [Rhys Perry, 2019-10-04, 1 file, -3/+3]
  I thought I fixed this, but I guess I must have broken it again.
  Fixes various dEQP-VK.draw.* tests
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* radv/aco,aco: set lower_fmod [Rhys Perry, 2019-10-04, 3 files, -31/+1]
  This simplifies ACO and allows the lowered code to be optimized (in particular, constant folded).
  Totals from affected shaders:
    SGPRS: 1776 -> 1776 (0.00 %)
    VGPRS: 1436 -> 1436 (0.00 %)
    Spilled SGPRs: 0 -> 0 (0.00 %)
    Spilled VGPRs: 0 -> 0 (0.00 %)
    Private memory VGPRs: 0 -> 0 (0.00 %)
    Scratch size: 0 -> 0 (0.00 %) dwords per thread
    Code Size: 203452 -> 203564 (0.06 %) bytes
    LDS: 0 -> 0 (0.00 %) blocks
    Max Waves: 103 -> 103 (0.00 %)
  At least some of the code size increase seems to be from literals being applied to instructions as a result of constant folding.
  v2: remove fmod/frem handling in init_context()
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* ac/nir: remove unused code for nir_op_{fmod,frem} [Samuel Pitoiset, 2019-10-03, 1 file, -14/+0]
  RADV and RadeonSI both lower these two NIR instructions.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: enable lower_fmod for the LLVM path [Samuel Pitoiset, 2019-10-03, 1 file, -0/+1]
  This lowers fmod and frem at the NIR level, like RadeonSI does. fmod is already lowered directly in NIR->LLVM, and frem will be lowered by LLVM anyway.
  This fixes an LLVM crash with: dEQP-VK.glsl.builtin.precision_fp16_storage32b.frem.compute.scalar.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
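The sketch below assumes NIR's lowering follows fmod(x, y) = x - y * floor(x / y) and frem(x, y) = x - y * trunc(x / y). These formulas are an assumption based on NIR's algebraic rules and are not stated in the commit. Under that assumption, the two operations differ only in their rounding step, which matters for negative operands. The helper names are hypothetical. floor and trunc are built from an integer cast so the sketch needs no libm, which assumes |x / y| fits in a long long.

```c
#include <assert.h>

/* Truncation toward zero; assumes |v| < 2^63 so the cast is defined. */
static double trunc_d(double v) { return (double)(long long)v; }

/* Floor from truncation: step down by one for negative non-integers. */
static double floor_d(double v)
{
    double t = trunc_d(v);
    return (t > v) ? t - 1.0 : t;
}

/* GLSL-style mod: the result takes the sign of y. */
double lowered_fmod(double x, double y)
{
    assert(y != 0.0);
    return x - y * floor_d(x / y);
}

/* C-style remainder: the result takes the sign of x. */
double lowered_frem(double x, double y)
{
    assert(y != 0.0);
    return x - y * trunc_d(x / y);
}
```

For example, lowered_fmod(-1.0, 3.0) gives 2.0, while lowered_frem(-1.0, 3.0) gives -1.0.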
* radv: Fix warning in 32-bit build. [Bas Nieuwenhuizen, 2019-10-03, 1 file, -2/+3]
  uintptr_t is 32 bits wide in a 32-bit build, so the shift goes out of bounds.
  Reviewed-by: Eric Engestrom <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
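The class of bug described here can be shown by splitting a pointer into 32-bit halves. This is a hypothetical example; the log does not show which expression RADV shifted. Widening to uint64_t before the shift keeps the shift amount below the operand width on both 32-bit and 64-bit builds.

```c
#include <assert.h>
#include <stdint.h>

/* `(uintptr_t)p >> 32` shifts by the full type width when uintptr_t is
 * 32 bits, which is undefined in C. Widening first avoids that. */
uint32_t pointer_high_bits(const void *p)
{
    uint64_t wide = (uint64_t)(uintptr_t)p;
    return (uint32_t)(wide >> 32); /* always 0 on a 32-bit build */
}

uint32_t pointer_low_bits(const void *p)
{
    return (uint32_t)(uint64_t)(uintptr_t)p;
}
```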