| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
| |
The emulated GFX10 wave64 bpermute no longer needs a linear_vgpr,
so we don't consider it a reduction anymore. Additionally, the
code is slightly reorganized in preparation for the GFX6 emulated
bpermute.
Signed-off-by: Timur Kristóf <[email protected]>
Reviewed-by: Daniel Schürmann <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5223>
|
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Rhys Perry <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5148>
|
|
|
|
|
|
|
|
| |
Otherwise, the compiler overwrites s0 which contains the exec mask.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Rhys Perry <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4494>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since 9254fb4fc72, the pass replaced the SCC clobber with the scalar
identity temporary. Just skip most of the temporary setup, since we don't
need it for gfx10_wave64_bpermute.
Although shuffles are disabled on GFX10, Detroit: Become Human seems to
use them anyway.
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-By: Timur Kristóf <[email protected]>
Fixes: 9254fb4fc72ed289ffded28ef067b4582973e90c ('aco: don't use a scalar
temporary for reductions on GFX10')
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3683>
|
|
|
|
| |
Reviewed-by: Rhys Perry <[email protected]>
|
|
|
|
|
|
| |
This patch also adds the scalar temporary for scans on SI/CI
Reviewed-by: Rhys Perry <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Currently all usages of exec and vcc are hardcoded to use s2 regclass.
This commit makes it possible to use s1 in wave32 mode and
s2 in wave64 mode.
Signed-off-by: Timur Kristóf <[email protected]>
Reviewed-by: Daniel Schürmann <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The multiplication reduction is larger than it could be, but it should be
easier to implement this way.
No failures with dEQP-VK.subgroups.*int64* except those caused by LLVM
being used for other stages.
v2: don't call setFixed() for v_add carry-out, since setHint sets physReg
v3: add and use emit_vadd32() helper
v4: use num_opcodes instead of last_opcode
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Daniel Schürmann <[email protected]> (v3)
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously subgroup shuffle was implemented using the bpermute
instruction, which only works accross half-waves, so by itself it's
not suitable for implementing subgroup shuffle when the shader is
running in wave64 mode.
This commit adds a trick using shared VGPRs that allows to implement
subgroup shuffle still relatively effectively in this mode.
Signed-off-by: Timur Kristóf <[email protected]>
Reviewed-by: Daniel Schürmann <[email protected]>
|
|
|
|
|
|
|
|
| |
Fixes p_reduce (all cluster sizes), p_inclusive_scan and p_exclusive_scan
with all reduction operations.
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Daniel Schürmann <[email protected]>
|
|
ACO (short for AMD Compiler) is a new compiler backend with the goal to replace
LLVM for Radeon hardware for the RADV driver.
ACO currently supports only VS, PS and CS on VI and Vega.
There are some optimizations missing because of unmerged NIR changes
which may decrease performance.
Full commit history can be found at
https://github.com/daniel-schuermann/mesa/commits/backend
Co-authored-by: Daniel Schürmann <[email protected]>
Co-authored-by: Rhys Perry <[email protected]>
Co-authored-by: Bas Nieuwenhuizen <[email protected]>
Co-authored-by: Connor Abbott <[email protected]>
Co-authored-by: Michael Schellenberger Costa <[email protected]>
Co-authored-by: Timur Kristóf <[email protected]>
Acked-by: Samuel Pitoiset <[email protected]>
Acked-by: Bas Nieuwenhuizen <[email protected]>
|