path: root/src/amd
Commit message | Author | Age | Files | Lines
* ac/nir: remove unused code for nir_op_{fmod,frem} | Samuel Pitoiset | 2019-10-03 | 1 | -14/+0
| | | | | | | RADV and RadeonSI both lower these two NIR instructions. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: enable lower_fmod for the LLVM path | Samuel Pitoiset | 2019-10-03 | 1 | -0/+1
| | | | | | | | | | | | This lowers fmod and frem at NIR level like RadeonSI. fmod is already lowered directly in NIR->LLVM, and frem will be lowered by LLVM anyways. This fixes a LLVM crash with: dEQP-VK.glsl.builtin.precision_fp16_storage32b.frem.compute.scalar. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Fix warning in 32-bit build. | Bas Nieuwenhuizen | 2019-10-03 | 1 | -2/+3
| | | | | | | | uintptr_t is 32 bits wide in a 32-bit build, so the shift was out of bounds. Reviewed-by: Eric Engestrom <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
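The warning described above comes from a general C rule: shifting a value by a count greater than or equal to the width of its type is undefined behaviour. The sketch below is not the RADV change itself; `addr_hi32` and `addr_lo32` are hypothetical helper names used only to show the usual remedy, which is to widen to a 64-bit type before shifting.

```c
#include <stdint.h>

/* Hypothetical helpers, not taken from RADV. */

/* Upper 32 bits of an address. The value is widened to uint64_t first,
 * so the shift by 32 is well defined even where uintptr_t is 32 bits. */
static inline uint32_t addr_hi32(uintptr_t addr)
{
    uint64_t va = (uint64_t)addr;
    return (uint32_t)(va >> 32);
}

/* Lower 32 bits of an address. */
static inline uint32_t addr_lo32(uintptr_t addr)
{
    return (uint32_t)((uint64_t)addr & UINT64_C(0xffffffff));
}
```

In a 32-bit build `addr_hi32` always returns 0, which is the intended result and needs no special case.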
* radv: Fix condition for skipping the continue CS. | Bas Nieuwenhuizen | 2019-10-03 | 1 | -1/+2
| We need the continue CS for referencing the tess/GDS/sample position BOs.
|
| Fixes: 46e52df34d3 "radv: add tessellation ring allocation support. (v2)"
| Fixes: e1dc3ab7534 "radv/gfx10: allocate GDS/OA buffer objects for NGG streamout"
| Fixes: 1171b304f30 "radv: overhaul fragment shader sample positions."
| Reviewed-by: Samuel Pitoiset <[email protected]>
* radv/gfx10: fix the ESGS ring size symbol | Samuel Pitoiset | 2019-10-02 | 1 | -19/+1
| | | | | | | | Random hangs no longer happen, I'm actually not sure if they were related to this. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: fix build | Samuel Pitoiset | 2019-10-02 | 1 | -1/+1
| | | | | | Forgot to amend the commit before updating the MR. Signed-off-by: Samuel Pitoiset <[email protected]>
* Revert "radv: disable viewport clamping even if FS doesn't write Z" | Samuel Pitoiset | 2019-10-02 | 1 | -1/+3
| | | | | | | | This was actually the wrong fix. This reverts commit 0a313cc285c2939de9cac07f045b0b699bc208ca. Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: rework the slow depthstencil clear to write depth from PS | Samuel Pitoiset | 2019-10-02 | 1 | -6/+12
| | | | | | | | | | Make sure to export the expected clear values to the depth stencil attachment. This fixes dEQP-VK.pipeline.depth_range_unrestricted.* on GFX10. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/gfx10: fix NGG streamout with triangle strips for VS | Samuel Pitoiset | 2019-10-02 | 4 | -1/+13
| | | | | | | | | | The number of vertices has to be adjusted with the output primitive type. This fixes dEQP-VK.transform_feedback.simple.triangle_strip_*. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/gfx10: fix storing/loading NGG stream outputs for GS | Samuel Pitoiset | 2019-10-02 | 1 | -12/+77
| | | | | | | | | | | The GS outputs are stored differently in the LDS storage, they are indexed by out_idx which is incremented for each stored DWORD. Thus, we need a different path for exporting the stream outputs. This fixes a bunch of CTS failures when NGG GS is force enabled. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/gfx10: use the component mask when storing/loading NGG stream outputs | Samuel Pitoiset | 2019-10-02 | 1 | -0/+6
| | | | | | | It's unnecessary to store/load more components than needed. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/gfx10: fix storing/loading NGG stream outputs for VS and TES | Samuel Pitoiset | 2019-10-02 | 1 | -8/+10
| | | | | | | | | | | | | The LDS storage allocated for stream outputs is 4 * N, where N is the number of outputs. So, we have to store/load with N as index and not with the output location as index. This doesn't fix anything known but it should fix out-of-bounds access and it also reduces the number of outputs written to the LDS storage. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
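The indexing rule in this message (storage of 4 * N dwords, addressed by a running output index rather than by output location) can be illustrated with a small sketch. This is not the RADV code: `MAX_LOCATIONS`, `build_compact_slots` and `lds_dword_offset` are hypothetical names, and the 64-entry location limit is an assumption made for the example.

```c
#include <stdint.h>

#define MAX_LOCATIONS 64 /* hypothetical upper bound on output locations */

/* Assign each enabled output location a compact slot 0..N-1.
 * Disabled locations get -1. Returns N, the number of enabled outputs. */
static unsigned build_compact_slots(uint64_t output_mask,
                                    int slot_of_location[MAX_LOCATIONS])
{
    unsigned n = 0;
    for (unsigned loc = 0; loc < MAX_LOCATIONS; loc++) {
        if (output_mask & (UINT64_C(1) << loc))
            slot_of_location[loc] = (int)n++;
        else
            slot_of_location[loc] = -1;
    }
    return n;
}

/* Dword offset of component chan (0..3) of the output stored in slot.
 * With slot < N the result is always below 4 * N. */
static unsigned lds_dword_offset(unsigned slot, unsigned chan)
{
    return 4 * slot + chan;
}
```

With outputs at locations 0, 5 and 31, N is 3 and the storage holds 12 dwords. Indexing by slot keeps the last component at offset 11, whereas indexing by location 31 would address dword 127, far outside the allocation.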
* radv/gfx10: add missing counter buffer to the BO list | Samuel Pitoiset | 2019-10-02 | 1 | -0/+2
| | | | | | | The buffer isn't necessarily used before. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/gfx10: add radv_device::use_ngg | Samuel Pitoiset | 2019-10-02 | 3 | -3/+8
| | | | | | | Trivial. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/nir: fix GLSL imageSamples() | Marek Olšák | 2019-09-30 | 1 | -24/+4
| | | | Reviewed-by: Samuel Pitoiset <[email protected]>
* ac: add ac_build_image_get_sample_count from radeonsi | Marek Olšák | 2019-09-30 | 2 | -0/+21
| | | | Reviewed-by: Samuel Pitoiset <[email protected]>
* ac/surface: don't allocate FMASK if there is no graphics | Marek Olšák | 2019-09-30 | 1 | -2/+3
| | | | Reviewed-by: Samuel Pitoiset <[email protected]>
* ac: reorder and print all radeon_info fields | Marek Olšák | 2019-09-30 | 2 | -19/+53
| | | | Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: set the number of SDPs same as the number of TCCs | Marek Olšák | 2019-09-30 | 1 | -13/+3
| | | | Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: fix num_good_cu_per_sh for harvested chips | Marek Olšák | 2019-09-30 | 1 | -0/+6
| | | | | Cc: 19.2 <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: add radeon_info::tcc_harvested | Marek Olšák | 2019-09-30 | 2 | -0/+5
| | | | | Cc: 19.2 <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: fix incorrect vram_size reported by the kernel | Marek Olšák | 2019-09-30 | 1 | -2/+10
| | | | | Cc: 19.2 <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* aco: call nir_opt_algebraic_late() exhaustively | Daniel Schürmann | 2019-09-30 | 1 | -4/+15
| 57559 shaders in 28980 tests
| Totals:
| SGPRS: 2963407 -> 2959935 (-0.12 %)
| VGPRS: 2014812 -> 2016328 (0.08 %)
| Spilled SGPRs: 1077 -> 1077 (0.00 %)
| Spilled VGPRs: 0 -> 0 (0.00 %)
| Private memory VGPRs: 0 -> 0 (0.00 %)
| Scratch size: 10348 -> 10348 (0.00 %) dwords per thread
| Code Size: 114545436 -> 114498084 (-0.04 %) bytes
| LDS: 933 -> 933 (0.00 %) blocks
| Max Waves: 375997 -> 375866 (-0.03 %)
|
| Reviewed-by: Connor Abbott <[email protected]>
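Running a pass "exhaustively" generally means repeating it until it reports no further progress. The sketch below shows that fixed-point idiom with a toy pass on an integer. It does not use the NIR API and does not reproduce the ACO change: `pass_fn`, `strip_factor_of_two` and `run_to_fixed_point` are hypothetical names invented for illustration.

```c
#include <stdbool.h>

/* A pass rewrites the value in place and reports whether it changed it. */
typedef bool (*pass_fn)(int *value);

/* Toy pass: removes one factor of two per call. */
static bool strip_factor_of_two(int *value)
{
    if (*value != 0 && *value % 2 == 0) {
        *value /= 2;
        return true;
    }
    return false;
}

/* Re-run the pass until it reports no progress.
 * Returns the number of runs that made progress. */
static unsigned run_to_fixed_point(pass_fn pass, int *value)
{
    unsigned iterations = 0;
    while (pass(value))
        iterations++;
    return iterations;
}
```

Starting from 40, the toy pass makes progress three times (40, 20, 10, 5) and then stops. A single call would have left the value at 20, which is the kind of missed opportunity an exhaustive loop avoids.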
* radv/aco: Don't lower subtractions | Daniel Schürmann | 2019-09-30 | 1 | -1/+0
| 40228 shaders in 20236 tests
| Totals:
| SGPRS: 2045512 -> 2046496 (0.05 %)
| VGPRS: 1430856 -> 1430464 (-0.03 %)
| Spilled SGPRs: 1077 -> 1077 (0.00 %)
| Spilled VGPRs: 0 -> 0 (0.00 %)
| Private memory VGPRs: 0 -> 0 (0.00 %)
| Scratch size: 10348 -> 10348 (0.00 %) dwords per thread
| Code Size: 77202840 -> 77151832 (-0.07 %) bytes
| LDS: 863 -> 863 (0.00 %) blocks
| Max Waves: 260729 -> 260754 (0.01 %)
|
| Reviewed-by: Eric Anholt <[email protected]>
| Reviewed-by: Connor Abbott <[email protected]>
* android: aco: add support for libmesa_aco | Mauro Rossi | 2019-09-28 | 4 | -1/+129
| Android building rules are added in src/amd/Android.compiler.mk
|
| libmesa_aco static library is built conditionally to radeonsi
| as done for vulkan.radv module
|
| This will prevent Android build errors for non x86 systems
|
| filter-out compiler/aco_instruction_selection_setup.cpp source,
| as already included by compiler/aco_instruction_selection.cpp
| and would cause several multiple definition linker errors
|
| NOTE: libLLVM requires AMDGPU Disassembler to build radv with aco
|
| Fixes: 93c8ebf ("aco: Initial commit of independent AMD compiler")
| Fixes: a70a998 ("radv/aco: Setup alternate path in RADV to support the experimental ACO compiler")
| Signed-off-by: Mauro Rossi <[email protected]>
* android: aco: fix undefined template 'std::__1::array' build errors | Mauro Rossi | 2019-09-28 | 5 | -1/+5
| Fixes a few building errors similar to the following:
|
| In file included from external/mesa/src/amd/compiler/aco_instruction_selection.cpp:26:
| In file included from external/libcxx/include/algorithm:639:
| external/libcxx/include/utility:321:9: error: implicit instantiation of undefined template 'std::__1::array<aco::Temp, 4>'
|     _T2 second;
|         ^
|
| Fixes: 93c8ebf ("aco: Initial commit of independent AMD compiler")
| Signed-off-by: Mauro Rossi <[email protected]>
* aco: don't remove the loop exec mask in transition_to_Exact() | Rhys Perry | 2019-09-27 | 1 | -1/+5
| | | | | | | No pipeline-db changes. Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Daniel Schürmann <[email protected]>
* aco: set loop_info::has_discard for demotes | Rhys Perry | 2019-09-27 | 3 | -5/+9
| | | | | | | | We need the loop header phis for the outer exec masks. Needed for dEQP-VK.glsl.demote.dynamic_loop_texture Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Daniel Schürmann <[email protected]>
* radv: Fix L2 cache rinse programming. | Timur Kristóf | 2019-09-26 | 1 | -5/+9
| | | | | | | | According to radeonsi, GLM doesn't support WB alone, so we have to set INV too when WB is set. Signed-off-by: Timur Kristóf <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Add debug option to dump meta shaders. | Timur Kristóf | 2019-09-26 | 3 | -2/+6
| | | | | | | | This new option can help debug shader compiler problems when there are issues with the meta shaders. Signed-off-by: Timur Kristóf <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* amd/common: Introduce ac_get_fs_input_vgpr_cnt. | Timur Kristóf | 2019-09-26 | 3 | -33/+60
| | | | | | | | | | | Add a function called ac_get_fs_input_vgpr_cnt which will return the number of input VGPRs used by an AMD shader. Previously, radv and radeonsi had the same code duplicated, but this commit also allows them to share this code. Signed-off-by: Timur Kristóf <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radv: Set shared VGPR count in radv_postprocess_config. | Timur Kristóf | 2019-09-26 | 2 | -2/+18
| | | | | | | | This commit allows RADV to set the shared VGPR count according to the shader config. Signed-off-by: Timur Kristóf <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* amd/common: Add num_shared_vgprs to ac_shader_config for GFX10. | Timur Kristóf | 2019-09-26 | 2 | -0/+20
| | | | | | | | | In GFX10 wave64 mode, shared VGPRs allow the two wave halves to share some data with each other. Signed-off-by: Timur Kristóf <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* amd/common: Extract some helper functions to ac_shader_util. | Timur Kristóf | 2019-09-26 | 5 | -117/+131
| | | | | | | | | | This commit moves ac_get_tbuffer_format, ac_get_sampler_dim and ac_get_image_dim into ac_shader_util, thus enabling them to be used by compilers other than LLVM. Signed-off-by: Timur Kristóf <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* amd/common: Move ac_export_mrt_z to ac_llvm_build. | Timur Kristóf | 2019-09-26 | 4 | -75/+76
| | | | | | | | | The aim of this commit is to keep ac_shader_util LLVM-free, since we would like to use it in ACO later. Signed-off-by: Timur Kristóf <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* aco: CSE readlane/readfirstlane/permute/reduce with the same exec mask | Rhys Perry | 2019-09-26 | 2 | -9/+37
| v2: rename pass_temp to pass_flags
| v2: also CSE reductions
| v3: add ds_swizzle_b32 support
| v3: check gds/offset0/offset1 fields
|
| Signed-off-by: Rhys Perry <[email protected]>
| Reviewed-by: Daniel Schürmann <[email protected]>
* aco: don't CSE v_readlane_b32/v_readfirstlane_b32 | Rhys Perry | 2019-09-26 | 1 | -0/+4
| | | | | Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Daniel Schürmann <[email protected]>
* aco,radv: rename record_llvm_ir/llvm_ir_string to record_ir/ir_string | Rhys Perry | 2019-09-26 | 6 | -18/+18
| | | | | | Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Daniel Schürmann <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/aco: return a correct name and description for the backend IR | Rhys Perry | 2019-09-26 | 3 | -2/+9
| | | | | | Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Daniel Schürmann <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* aco: store printed backend IR in binary | Rhys Perry | 2019-09-26 | 1 | -4/+21
| | | | | | Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Daniel Schürmann <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* aco,radv/aco: get dissassembly for release builds if requested | Rhys Perry | 2019-09-26 | 2 | -10/+2
| | | | | | Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Daniel Schürmann <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/aco: actually disable ACO when unsupported | Rhys Perry | 2019-09-26 | 1 | -1/+0
| | | | | | | | | We were setting this twice. The second time, we weren't later disabling it if unsupported. Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Daniel Schürmann <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* aco: check for duplicate opcode numbers | Rhys Perry | 2019-09-25 | 1 | -0/+25
| | | | | Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* aco: fix opcode for s_mul_hi_i32 | Rhys Perry | 2019-09-25 | 1 | -1/+1
| | | | | | | | Fixes dEQP-VK.glsl.builtin.function.integer.imulextended.*_compute Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Daniel Schürmann <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* aco: fix v_subrev_co_u32_e64 opcode | Rhys Perry | 2019-09-25 | 1 | -1/+1
| | | | | Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* aco: fix GFX9 opcode for v_xad_u32 | Rhys Perry | 2019-09-25 | 1 | -1/+1
| | | | | | | Fixes various dEQP-VK.image.store.* tests. Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* aco: implement 64-bit ineg | Rhys Perry | 2019-09-25 | 2 | -2/+17
| | | | | | | | We currently lower them, but nir_opt_algebraic() can add new ones because lower_sub=true. Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Daniel Schürmann <[email protected]>
* aco: run nir_lower_int64() before nir_lower_idiv() | Rhys Perry | 2019-09-25 | 1 | -3/+3
| | | | | | | nir_lower_idiv() asserts on 64-bit integers. Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Daniel Schürmann <[email protected]>
* radv: fix s/load/store/ copy-paste typo | Eric Engestrom | 2019-09-24 | 1 | -1/+1
| | | | | | Fixes: cdc6efddf918bc07d30d ("radv: implement all depth/stencil resolve modes using graphics") Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Add workaround for hang in The Surge 2. | Bas Nieuwenhuizen | 2019-09-24 | 1 | -0/+8
| | | | | | | | | | | Released today and hangs on RADV. We don't have the root cause yet, but this should unblock people playing the game. No drirc because the radv debugflags are not usable from drirc and I want this backported. CC: <[email protected]> Reviewed-by: Dave Airlie <[email protected]>