summaryrefslogtreecommitdiffstats
path: root/src/amd/compiler
Commit message (Collapse)AuthorAgeFilesLines
* amd: Move all amd/common code that depends on LLVM to amd/llvm.Timur Kristóf2019-10-081-1/+1
| | | | | | | | | | | | | This commit is a step towards the goal of being able to build RADV without LLVM. In the future we would like to offer the option to use RADV solely with ACO. There is still a need for the common AMD code located in amd/common but the LLVM specific parts need to be separated. Signed-off-by: Timur Kristóf <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Acked-by: Marek Olšák <[email protected]> Acked-by: Samuel Pitoiset <[email protected]>
* aco: fix load_constant with multiple arraysRhys Perry2019-10-041-3/+3
| | | | | | | | | I thought I fixed this, but I guess I must have broken it again. Fixes various dEQP-VK.draw.* tests Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Daniel Schürmann <[email protected]>
* radv/aco,aco: set lower_fmodRhys Perry2019-10-042-31/+0
| | | | | | | | | | | | | | | | | | | | | | | | This simplifies ACO and allows the lowered code to be optimized (in particular, constant folded). Totals from affected shaders: SGPRS: 1776 -> 1776 (0.00 %) VGPRS: 1436 -> 1436 (0.00 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 203452 -> 203564 (0.06 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 103 -> 103 (0.00 %) At least some of the code size increase seems to be from literals being applied to instructions as a result of constant folding. v2: remove fmod/frem handling in init_context() Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Daniel Schürmann <[email protected]>
* aco: call nir_opt_algebraic_late() exhaustivelyDaniel Schürmann2019-09-301-4/+15
| | | | | | | | | | | | | | | | 57559 shaders in 28980 tests Totals: SGPRS: 2963407 -> 2959935 (-0.12 %) VGPRS: 2014812 -> 2016328 (0.08 %) Spilled SGPRs: 1077 -> 1077 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 10348 -> 10348 (0.00 %) dwords per thread Code Size: 114545436 -> 114498084 (-0.04 %) bytes LDS: 933 -> 933 (0.00 %) blocks Max Waves: 375997 -> 375866 (-0.03 %) Reviewed-by: Connor Abbott <[email protected]>
* android: aco: fix undefined template 'std::__1::array' build errorsMauro Rossi2019-09-285-1/+5
| | | | | | | | | | | | | | Fixes a few building errors similar to the following: In file included from external/mesa/src/amd/compiler/aco_instruction_selection.cpp:26: In file included from external/libcxx/include/algorithm:639: external/libcxx/include/utility:321:9: error: implicit instantiation of undefined template 'std::__1::array<aco::Temp, 4>' _T2 second; ^ Fixes: 93c8ebf ("aco: Initial commit of independent AMD compiler") Signed-off-by: Mauro Rossi <[email protected]>
* aco: don't remove the loop exec mask in transition_to_Exact()Rhys Perry2019-09-271-1/+5
| | | | | | | No pipeline-db changes. Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Daniel Schürmann <[email protected]>
* aco: set loop_info::has_discard for demotesRhys Perry2019-09-273-5/+9
| | | | | | | | We need the loop header phis for the outer exec masks. Needed for dEQP-VK.glsl.demote.dynamic_loop_texture Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Daniel Schürmann <[email protected]>
* aco: CSE readlane/readfirstlane/permute/reduce with the same exec maskRhys Perry2019-09-262-9/+37
| | | | | | | | | | v2: rename pass_temp to pass_flags v2: also CSE reductions v3: add ds_swizzle_b32 support v3: check gds/offset0/offset1 fields Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Daniel Schürmann <[email protected]>
* aco: don't CSE v_readlane_b32/v_readfirstlane_b32Rhys Perry2019-09-261-0/+4
| | | | | Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Daniel Schürmann <[email protected]>
* aco,radv: rename record_llvm_ir/llvm_ir_string to record_ir/ir_stringRhys Perry2019-09-261-3/+3
| | | | | | Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Daniel Schürmann <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* aco: store printed backend IR in binaryRhys Perry2019-09-261-4/+21
| | | | | | Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Daniel Schürmann <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* aco,radv/aco: get dissassembly for release builds if requestedRhys Perry2019-09-261-4/+1
| | | | | | Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Daniel Schürmann <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* aco: check for duplicate opcode numbersRhys Perry2019-09-251-0/+25
| | | | | Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* aco: fix opcode for s_mul_hi_i32Rhys Perry2019-09-251-1/+1
| | | | | | | | Fixes dEQP-VK.glsl.builtin.function.integer.imulextended.*_compute Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Daniel Schürmann <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* aco: fix v_subrev_co_u32_e64 opcodeRhys Perry2019-09-251-1/+1
| | | | | Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* aco: fix GFX9 opcode for v_xad_u32Rhys Perry2019-09-251-1/+1
| | | | | | | Fixes various dEQP-VK.image.store.* tests. Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* aco: implement 64-bit inegRhys Perry2019-09-252-2/+17
| | | | | | | | We currently lower them, but nir_opt_algebraic() can add new ones because lower_sub=true. Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Daniel Schürmann <[email protected]>
* aco: run nir_lower_int64() before nir_lower_idiv()Rhys Perry2019-09-251-3/+3
| | | | | | | nir_lower_idiv() asserts on 64-bit integers. Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Daniel Schürmann <[email protected]>
* aco: only emit waitcnt on loop continues if we there was some load or exportDaniel Schürmann2019-09-231-1/+1
| | | | Reviewed-by: Rhys Perry <[email protected]>
* aco: Initial commit of independent AMD compilerDaniel Schürmann2019-09-1931-0/+25572
ACO (short for AMD Compiler) is a new compiler backend with the goal to replace LLVM for Radeon hardware for the RADV driver. ACO currently supports only VS, PS and CS on VI and Vega. There are some optimizations missing because of unmerged NIR changes which may decrease performance. Full commit history can be found at https://github.com/daniel-schuermann/mesa/commits/backend Co-authored-by: Daniel Schürmann <[email protected]> Co-authored-by: Rhys Perry <[email protected]> Co-authored-by: Bas Nieuwenhuizen <[email protected]> Co-authored-by: Connor Abbott <[email protected]> Co-authored-by: Michael Schellenberger Costa <[email protected]> Co-authored-by: Timur Kristóf <[email protected]> Acked-by: Samuel Pitoiset <[email protected]> Acked-by: Bas Nieuwenhuizen <[email protected]>