path: root/src/amd/compiler/aco_reduce_assign.cpp
Commit message | Author | Age | Files | Lines
* aco/gfx10: Refactor of GFX10 wave64 bpermute. | Timur Kristóf | 2020-06-02 | 1 | -5/+0
  The emulated GFX10 wave64 bpermute no longer needs a linear_vgpr, so we don't consider it a reduction anymore. Additionally, the code is slightly reorganized in preparation for the GFX6 emulated bpermute.
  Signed-off-by: Timur Kristóf
  Reviewed-by: Daniel Schürmann
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5223>
* aco: allocate a temp VGPR for some 8-bit/16-bit reduction ops on GFX10 | Samuel Pitoiset | 2020-05-29 | 1 | -1/+4
  Signed-off-by: Samuel Pitoiset
  Reviewed-by: Rhys Perry
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5148>
* aco: use a temporary SGPR for 8-bit/16-bit literal reduction identities | Samuel Pitoiset | 2020-05-21 | 1 | -3/+5
  Otherwise, the compiler overwrites s0 which contains the exec mask.
  Signed-off-by: Samuel Pitoiset
  Reviewed-by: Rhys Perry
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4494>
* aco: fix gfx10_wave64_bpermute | Rhys Perry | 2020-02-06 | 1 | -1/+6
  Since 9254fb4fc72, the pass replaced the SCC clobber with the scalar identity temporary. Just skip most of the temporary setup, since we don't need it for gfx10_wave64_bpermute. Although shuffles are disabled on GFX10, Detroit: Become Human seems to use them anyway.
  Signed-off-by: Rhys Perry
  Reviewed-By: Timur Kristóf
  Fixes: 9254fb4fc72ed289ffded28ef067b4582973e90c ('aco: don't use a scalar temporary for reductions on GFX10')
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3683>
* aco: implement (clustered) reductions for SI/CI | Daniel Schürmann | 2019-12-07 | 1 | -0/+2
  Reviewed-by: Rhys Perry
* aco: don't use a scalar temporary for reductions on GFX10 | Daniel Schürmann | 2019-12-07 | 1 | -1/+1
  This patch also adds the scalar temporary for scans on SI/CI
  Reviewed-by: Rhys Perry
* aco/wave32: Use lane mask regclass for exec/vcc. | Timur Kristóf | 2019-12-04 | 1 | -1/+1
  Currently all usages of exec and vcc are hardcoded to use s2 regclass. This commit makes it possible to use s1 in wave32 mode and s2 in wave64 mode.
  Signed-off-by: Timur Kristóf
  Reviewed-by: Daniel Schürmann
* aco: implement 64-bit integer reductions | Rhys Perry | 2019-11-19 | 1 | -2/+12
  The multiplication reduction is larger than it could be, but it should be easier to implement this way. No failures with dEQP-VK.subgroups.*int64* except those caused by LLVM being used for other stages.
  v2: don't call setFixed() for v_add carry-out, since setHint sets physReg
  v3: add and use emit_vadd32() helper
  v4: use num_opcodes instead of last_opcode
  Signed-off-by: Rhys Perry
  Reviewed-by: Daniel Schürmann (v3)
* aco: Implement subgroup shuffle in GFX10 wave64 mode. | Timur Kristóf | 2019-10-28 | 1 | -1/+3
  Previously subgroup shuffle was implemented using the bpermute instruction, which only works across half-waves, so by itself it's not suitable for implementing subgroup shuffle when the shader is running in wave64 mode. This commit adds a trick using shared VGPRs that still allows subgroup shuffle to be implemented relatively effectively in this mode.
  Signed-off-by: Timur Kristóf
  Reviewed-by: Daniel Schürmann
* aco: Fix reductions on GFX10. | Rhys Perry | 2019-10-28 | 1 | -7/+12
  Fixes p_reduce (all cluster sizes), p_inclusive_scan and p_exclusive_scan with all reduction operations.
  Signed-off-by: Rhys Perry
  Reviewed-by: Daniel Schürmann
* aco: Initial commit of independent AMD compiler | Daniel Schürmann | 2019-09-19 | 1 | -0/+164
  ACO (short for AMD Compiler) is a new compiler backend with the goal of replacing LLVM for Radeon hardware for the RADV driver. ACO currently supports only VS, PS and CS on VI and Vega. Some optimizations are missing because of unmerged NIR changes, which may decrease performance.
  Full commit history can be found at https://github.com/daniel-schuermann/mesa/commits/backend
  Co-authored-by: Daniel Schürmann
  Co-authored-by: Rhys Perry
  Co-authored-by: Bas Nieuwenhuizen
  Co-authored-by: Connor Abbott
  Co-authored-by: Michael Schellenberger Costa
  Co-authored-by: Timur Kristóf
  Acked-by: Samuel Pitoiset
  Acked-by: Bas Nieuwenhuizen