path: root/src/amd/llvm
Commit message | Author | Age | Files | Lines
* ac: add a bug workaround for the 100% NGG culling case | Marek Olšák | 2020-03-09 | 1 | -0/+33
  Fixes: 8db00a51f85 - radeonsi/gfx10: implement NGG culling for 4x wave32 subgroups
  Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4079>
* amd: join emit_kill() from radv and radeonsi in ac_nir_to_llvm | Daniel Schürmann | 2020-03-09 | 2 | -3/+1
  Reviewed-by: Marek Olšák <marek.olsak@amd.com>
  Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4047>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4047>
* amd/llvm: implement nir_intrinsic_demote(_if) and nir_intrinsic_is_helper_invocation | Daniel Schürmann | 2020-03-09 | 3 | -11/+132
  The current implementation uses a temporary helper variable to ensure correct behavior until LLVM provides an intrinsic.
  Reviewed-by: Marek Olšák <marek.olsak@amd.com>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4047>
* radeonsi: remove AMD_DEBUG=sisched option | Pierre-Eric Pelloux-Prayer | 2020-03-06 | 2 | -11/+9
  sisched is not maintained anymore in LLVM.
  Reviewed-by: Marek Olšák <marek.olsak@amd.com>
  Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4059>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4059>
* ac/llvm: flush denorms for nir_op_fmed3 on GFX8 and older gens | Samuel Pitoiset | 2020-02-27 | 1 | -0/+5
  The hardware doesn't flush denorms, exactly like fmin/fmax, so we have to do it manually.
  This doesn't fix anything known.
  Fixes: d6a07732c9c ("ac: use llvm.amdgcn.fmed3 intrinsic for nir_op_fmed3")
  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
  Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3962>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3962>
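To illustrate what "flushing denorms manually" means for the commit above, here is a minimal sketch in plain C. It is not the LLVM IR the driver emits; the helper name `flush_denorm_f32` is hypothetical, and preserving the sign of the flushed zero is an assumption made for this example.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical helper: replace a subnormal (denormal) float with a zero
 * of the same sign and pass every other value through unchanged.
 * FLT_MIN is the smallest positive normal float, so any non-zero value
 * strictly between -FLT_MIN and FLT_MIN is subnormal. NaN fails all the
 * comparisons and is returned as-is. */
static float flush_denorm_f32(float x)
{
    if (x != 0.0f && x > -FLT_MIN && x < FLT_MIN)
        return x < 0.0f ? -0.0f : 0.0f;
    return x;
}
```

Applying such a flush to the result of a median-of-three mimics, on the CPU, hardware that would have flushed denormals by itself.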
* ac/llvm: fix 16-bit fmed3 on GFX8 and older gens | Samuel Pitoiset | 2020-02-27 | 1 | -2/+4
  16-bit med3 is only supported on GFX9+.
  Fixes dEQP-VK.spirv_assembly.instruction.amd_trinary_minmax.mid3.f16.*.
  Fixes: d6a07732c9c ("ac: use llvm.amdgcn.fmed3 intrinsic for nir_op_fmed3")
  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3962>
* ac/llvm: fix 64-bit fmed3 | Samuel Pitoiset | 2020-02-27 | 1 | -17/+31
  Lower 64-bit fmed3 because LLVM doesn't expose an intrinsic.
  Fixes dEQP-VK.spirv_assembly.instruction.amd_trinary_minmax.mid3.f64.*.
  Fixes: d6a07732c9c ("ac: use llvm.amdgcn.fmed3 intrinsic for nir_op_fmed3")
  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3962>
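The commit above lowers 64-bit fmed3 because no intrinsic is available. As a rough illustration only, a median of three can be built from min/max operations. The sketch below is plain C on doubles, not the IR-builder code from the commit; the names `min_f64`, `max_f64` and `med3_f64` are hypothetical, and NaN inputs are deliberately out of scope.

```c
#include <assert.h>

/* Hypothetical helpers: two-operand min and max on doubles. */
static double min_f64(double a, double b) { return a < b ? a : b; }
static double max_f64(double a, double b) { return a > b ? a : b; }

/* Median of three expressed with min/max only:
 *   med3(a, b, c) = max(min(a, b), min(max(a, b), c))
 * This sketch assumes none of the inputs is NaN. */
static double med3_f64(double a, double b, double c)
{
    assert(a == a && b == b && c == c); /* x == x is false only for NaN */
    return max_f64(min_f64(a, b), min_f64(max_f64(a, b), c));
}
```

The formula works because `min(max(a, b), c)` discards the largest of the three values, and taking the max with `min(a, b)` then discards the smallest.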
* ac/llvm: implement VK_AMD_shader_explicit_vertex_parameter | Samuel Pitoiset | 2020-01-29 | 1 | -20/+49
  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3578>
* ac/llvm: fix missing casts in ac_build_readlane() | Samuel Pitoiset | 2020-01-24 | 1 | -6/+9
  Because ac_build_optimization_barrier() overwrites the original src_type, we have to keep track of it before emitting that barrier. Otherwise, wrong conversions are expected for pointers or small bitsizes.
  By doing this, we no longer need to do the cast dance in ac_build_readlane_no_opt_barrier(), it was just necessary for ac_build_optimization_barrier().
  This fixes a bunch of crashes with subgroups related tests when RADV_DEBUG=checkir is enabled, and it also fixes a compiler crash with The Surge 2.
  Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2395
  Fixes: 0f45d4dc2b1 ("ac: add ac_build_readlane without optimization barrier")
  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Marek Olšák <marek.olsak@amd.com>
  Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
  Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3535>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3535>
* ac/nir: add support for nir_texop_fragment_{mask}_fetch | Samuel Pitoiset | 2020-01-23 | 1 | -3/+35
  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3304>
* ac: add helper ac_build_triangle_strip_indices_to_triangle | Marek Olšák | 2020-01-20 | 2 | -0/+39
  Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
* ac: add ac_build_readlane without optimization barrier | Marek Olšák | 2020-01-20 | 2 | -4/+17
  Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
* ac: add prefix bitcount functions | Marek Olšák | 2020-01-20 | 2 | -0/+64
  Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
* ac/cull: don't read Position.Z if it's not needed for culling | Marek Olšák | 2020-01-15 | 1 | -1/+1
  It could be NULL.
  Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
* nir/lower_atomics_to_ssbo: Also lower barriers | Jason Ekstrand | 2020-01-13 | 1 | -2/+0
  This is more correct for a pass which is supposed to completely lower away atomic counters. It also lets us stop supporting atomic counter barriers in most of the drivers.
  Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
  Reviewed-by: Eric Anholt <eric@anholt.net>
  Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
* nir: Rename nir_intrinsic_barrier to control_barrier | Jason Ekstrand | 2020-01-13 | 1 | -2/+2
  This is a more explicit name now that we don't want it to be doing any memory barrier stuff for us.
  Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
* nir: Add a new memory_barrier_tcs_patch intrinsic | Jason Ekstrand | 2020-01-13 | 1 | -0/+2
  Right now, it's implemented as a no-op for everyone. For most drivers, it's a switch case in the NIR -> whatever which just breaks. For ir3, they already have code to delete tessellation barriers so we just add a case to also delete memory_barrier_tcs_patch.
  Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
  Reviewed-by: Eric Anholt <eric@anholt.net>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
* ac/llvm: Fix ac_build_reduce in wave32 mode. | Timur Kristóf | 2020-01-10 | 1 | -6/+9
  Previously, when cluster_size was set to 0, it always worked as if the cluster size was 64. This commit fixes it in wave32 mode by changing to work as if the cluster size was set to 32.
  Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
  Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
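The behavior change described above can be sketched in a few lines of C. This is only an illustration of the rule "a cluster size of 0 means the whole wave"; the helper name `effective_cluster_size` is hypothetical and does not appear in the commit.

```c
#include <assert.h>

/* Hypothetical helper: resolve the cluster size a reduction operates on.
 * A cluster_size of 0 stands for "the whole wave", so it resolves to the
 * current wave size (32 or 64) instead of a hard-coded 64. */
static unsigned effective_cluster_size(unsigned cluster_size, unsigned wave_size)
{
    assert(wave_size == 32 || wave_size == 64);
    return cluster_size == 0 ? wave_size : cluster_size;
}
```

With the old behavior the first argument being 0 would always have produced 64, which is wrong when only 32 lanes exist.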
* amd/llvm: handle nir_intrinsic_image_deref_{load,store} with lod | Samuel Pitoiset | 2020-01-09 | 1 | -2/+10
  Use image_load_mip and image_store_mip respectively if the lod parameter isn't zero.
  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
* ac: add ac_build_s_endpgm | Marek Olšák | 2020-01-08 | 2 | -0/+7
  Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
* ac: add 128-bit bitcount | Marek Olšák | 2020-01-08 | 2 | -0/+12
  Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
* ac: unify primitive export code | Marek Olšák | 2020-01-08 | 2 | -0/+66
  Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
* ac: unify build_sendmsg_gs_alloc_req | Marek Olšák | 2020-01-08 | 2 | -0/+23
  Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
* ac: fix the return value in cull_bbox when bbox culling is disabled | Marek Olšák | 2019-12-16 | 1 | -1/+1
  Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3095>
* ac: fix ac_get_i1_sgpr_mask for Wave32 | Marek Olšák | 2019-12-16 | 1 | -2/+11
  Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3095>
* ac/nir: fix out-of-bound access when loading constants from global | Samuel Pitoiset | 2019-12-12 | 1 | -4/+14
  Global load/store instructions can't know if the offset is out-of-bound because they don't use descriptors (no range).
  Fix this by clamping the offset for arrays that are indexed with a non-constant offset that's greater or equal to the array size.
  This fixes VM faults and GPU hangs with Dead Rising 4.
  Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2148
  Fixes: 71a67942003 ("ac/nir: Enable nir_opt_large_constants")
  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
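The clamping idea from the commit above can be sketched on the CPU as follows. This is a simplified illustration, not the driver code: the helper names are hypothetical, it works on element indices rather than byte offsets, and clamping to the last valid element is an assumption made for this example (the commit only states that out-of-range offsets are clamped).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: clamp a runtime index so it always stays inside
 * an array of `array_size` elements. Clamping to the last element is an
 * assumption of this sketch. */
static uint32_t clamp_array_index(uint32_t index, uint32_t array_size)
{
    assert(array_size > 0);
    return index >= array_size ? array_size - 1 : index;
}

/* Hypothetical helper: load from a constant table without any range
 * checking by the "hardware", relying on the clamp for safety. */
static uint32_t load_constant_clamped(const uint32_t *table,
                                      uint32_t array_size, uint32_t index)
{
    return table[clamp_array_index(index, array_size)];
}
```

The point is that a raw pointer access carries no bounds information, so the bound has to be enforced in the index computation itself.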
* ac/llvm: fix atomic var operations if source isn't a deref | Samuel Pitoiset | 2019-12-03 | 1 | -7/+9
  Fixes some CTS regressions.
  Fixes: e61a826f396 ("ac/llvm: fix pointer type for global atomics")
  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
* ac/llvm: improve sync scope for global atomics | Rhys Perry | 2019-12-02 | 1 | -0/+3
  Stronger ordering is implemented in SPIRV->NIR with barriers.
  Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
  Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
* ac/llvm: fix pointer type for global atomics | Rhys Perry | 2019-12-02 | 1 | -0/+6
  Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
  Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
* radv,ac/nir: lower deref operations for shared memory | Samuel Pitoiset | 2019-11-29 | 1 | -22/+26
  This shouldn't introduce any functional changes for RadeonSI when NIR is enabled because these operations are already lowered.
  pipeline-db (NAVI10/LLVM):
    SGPRS: 9043 -> 9051 (0.09 %)
    VGPRS: 7272 -> 7292 (0.28 %)
    Code Size: 638892 -> 621628 (-2.70 %) bytes
    LDS: 1333 -> 1331 (-0.15 %) blocks
    Max Waves: 1614 -> 1608 (-0.37 %)
  Found this while glancing at some F12019 shaders.
  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
* amd/llvm: Refactor ac_build_scan. | Bas Nieuwenhuizen | 2019-11-28 | 1 | -40/+51
  Split out the logic for exclusive scans into a separate function that makes clear what it does instead of having this opaque 60 line if.
  Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
* ac/llvm: convert src operands to pointers if necessary | Samuel Pitoiset | 2019-11-28 | 1 | -0/+11
  To avoid generating invalid LLVM IR when both operands don't have the same type. This might happen when performing pointer comparisons with SPIRV 1.4.
  Fixes invalid LLVM IR for:
    dEQP-VK.spirv_assembly.instruction.spirv1p4.opptrequal.variable_pointers_ssbo_equal
    dEQP-VK.spirv_assembly.instruction.spirv1p4.opptrnotequal.variable_pointers_ssbo_not_equal
  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
* ac/nir: don't rely on data.patch for tess factors | Marek Olšák | 2019-11-27 | 1 | -2/+6
  Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
* ac: add 8-bit and 16-bit supports to ac_build_permlane16() | Samuel Pitoiset | 2019-11-27 | 1 | -8/+16
  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
* radv/gfx10: fix implementation of exclusive scans | Samuel Pitoiset | 2019-11-27 | 1 | -24/+52
  This implementation is loosely based on ROCm.
  https://github.com/RadeonOpenCompute/ROCm-Device-Libs/blob/master/ockl/src/wfredscan.cl
  This fixes dEQP-VK.subgroups.arithmetic.*.subgroupexclusive* on GFX10.
  Fixes: 227c29a80de ("amd/common/gfx10: implement scan & reduce operations")
  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
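For readers unfamiliar with the term, the semantics of an exclusive scan can be shown with a sequential reference in C. This is only a sketch of what the operation computes for addition; it is not the lane-parallel GFX10 implementation from the commit, and the function name `exclusive_scan_add` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference: exclusive add-scan over `n` lanes.
 * Lane i receives the sum of the inputs of lanes 0..i-1, so lane 0
 * receives the identity of the operation (0 for addition). */
static void exclusive_scan_add(const uint32_t *in, uint32_t *out, size_t n)
{
    uint32_t running = 0; /* identity element for addition */
    for (size_t lane = 0; lane < n; lane++) {
        out[lane] = running;   /* excludes the lane's own input */
        running += in[lane];
    }
}
```

An inclusive scan would differ only in adding the lane's own input before writing the result.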
* ac/llvm: fix warning in ac_build_canonicalize() | Samuel Pitoiset | 2019-11-26 | 1 | -1/+1
  ../src/amd/llvm/ac_llvm_build.c: In function ‘ac_build_canonicalize’:
  ../src/amd/llvm/ac_llvm_build.c:4567:9: warning: ‘intr’ may be used uninitialized in this function [-Wmaybe-uninitialized]
   4567 |  return ac_build_intrinsic(ctx, intr, type, params, 1,
        |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   4568 |      AC_FUNC_ATTR_READNONE);
        |      ~~~~~~~~~~~~~~~~~~~~~~
  ../src/amd/llvm/ac_llvm_build.c:4567:9: warning: ‘type’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
* radeonsi/nir: don't run si_nir_opts again if there is no change | Marek Olšák | 2019-11-25 | 2 | -7/+10
  0.3% less overhead
  Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
  Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
* ac: set swizzled bit in cache policy as a hint not to merge loads/stores | Marek Olšák | 2019-11-25 | 3 | -10/+7
  LLVM now merges loads and stores for all opcodes, so this must be set.
  Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
* ac/nir, radv, radeonsi: Switch to using ac_shader_args | Connor Abbott | 2019-11-25 | 3 | -77/+65
  Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Acked-by: Marek Olšák <marek.olsak@amd.com>
* ac: Add a shared interface between radv, radeonsi, LLVM and ACO | Connor Abbott | 2019-11-25 | 3 | -0/+105
  ac_shader_args will be similar to ac_shader_abi, except for being free from LLVM-specific concepts and therefore capable of being shared between LLVM and ACO. This will help us accomplish a few different things:
  - Decouple setting up SGPR and VGPR arguments from translating to LLVM, so that we can reference these arguments in NIR lowering passes, which will let us lower e.g. descriptor sets in NIR.
  - Stop using radv-specific structures for things like determining the chip generation in ACO.
  In the end, we should replace ac_shader_abi with this structure + driver-specific lowering passes.
  Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
* ac/llvm: fix the local invocation index for wave32 | Samuel Pitoiset | 2019-11-25 | 1 | -0/+4
  Fixes dEQP-VK.compute.builtin_var.local_invocation_index with RADV_PERFTEST=cswave32.
  My initial fix was to lower it but Rhys suggested the shift-right and it's much better like this.
  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Marek Olšák <marek.olsak@amd.com>
  Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
* amd/llvm: Add Subgroup Scan functions for SI | Daniel Schürmann | 2019-11-20 | 1 | -6/+75
  The idea of this implementation is taken from the ROCm Device Libs:
  https://github.com/RadeonOpenCompute/ROCm-Device-Libs/blob/master/ockl/src/wfredscan.cl
  Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
* nir: move data.image.access to data.access | Marek Olšák | 2019-11-19 | 1 | -2/+2
  The size of the data structure doesn't change.
  Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
* ac: add 16-bit float support to ac_build_alu_op() | Samuel Pitoiset | 2019-11-19 | 1 | -4/+5
  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
* ac: add 8-bit and 16-bit supports to ac_build_optimization_barrier() | Samuel Pitoiset | 2019-11-19 | 1 | -2/+13
  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
* ac: add 8-bit and 16-bit supports to ac_build_wwm() | Samuel Pitoiset | 2019-11-19 | 1 | -3/+18
  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
* ac: add 8-bit and 16-bit supports to get_reduction_identity() | Samuel Pitoiset | 2019-11-19 | 1 | -1/+33
  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
* ac: add 8-bit and 16-bit supports to ac_build_swizzle() | Samuel Pitoiset | 2019-11-19 | 1 | -6/+13
  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
* ac: add 8-bit and 16-bit supports to ac_build_dpp() | Samuel Pitoiset | 2019-11-19 | 1 | -13/+20
  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
* ac: add 8-bit and 16-bit supports to ac_build_set_inactive() | Samuel Pitoiset | 2019-11-19 | 1 | -0/+9
  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>