path: root/src/amd
Commit message | Author | Age | Files | Lines
* radv: Add debug option to dump meta shaders. | Timur Kristóf | 2019-09-26 | 3 | -2/+6
  This new option can help debug shader compiler problems when there are issues with the meta shaders.
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* amd/common: Introduce ac_get_fs_input_vgpr_cnt. | Timur Kristóf | 2019-09-26 | 3 | -33/+60
  Add a function called ac_get_fs_input_vgpr_cnt which returns the number of input VGPRs used by an AMD shader. Previously, radv and radeonsi each had their own copy of this code; this commit lets them share it.
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* radv: Set shared VGPR count in radv_postprocess_config. | Timur Kristóf | 2019-09-26 | 2 | -2/+18
  This commit allows RADV to set the shared VGPR count according to the shader config.
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* amd/common: Add num_shared_vgprs to ac_shader_config for GFX10. | Timur Kristóf | 2019-09-26 | 2 | -0/+20
  In GFX10 wave64 mode, shared VGPRs allow the two wave halves to share some data with each other.
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* amd/common: Extract some helper functions to ac_shader_util. | Timur Kristóf | 2019-09-26 | 5 | -117/+131
  This commit moves ac_get_tbuffer_format, ac_get_sampler_dim and ac_get_image_dim into ac_shader_util, thus enabling them to be used by compilers other than LLVM.
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* amd/common: Move ac_export_mrt_z to ac_llvm_build. | Timur Kristóf | 2019-09-26 | 4 | -75/+76
  The aim of this commit is to keep ac_shader_util LLVM-free, since we would like to use it in ACO later.
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* aco: CSE readlane/readfirstlane/permute/reduce with the same exec mask | Rhys Perry | 2019-09-26 | 2 | -9/+37
  v2: rename pass_temp to pass_flags
  v2: also CSE reductions
  v3: add ds_swizzle_b32 support
  v3: check gds/offset0/offset1 fields
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* aco: don't CSE v_readlane_b32/v_readfirstlane_b32 | Rhys Perry | 2019-09-26 | 1 | -0/+4
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* aco,radv: rename record_llvm_ir/llvm_ir_string to record_ir/ir_string | Rhys Perry | 2019-09-26 | 6 | -18/+18
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/aco: return a correct name and description for the backend IR | Rhys Perry | 2019-09-26 | 3 | -2/+9
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* aco: store printed backend IR in binary | Rhys Perry | 2019-09-26 | 1 | -4/+21
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* aco,radv/aco: get disassembly for release builds if requested | Rhys Perry | 2019-09-26 | 2 | -10/+2
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/aco: actually disable ACO when unsupported | Rhys Perry | 2019-09-26 | 1 | -1/+0
  We were setting this twice. The second time, we weren't later disabling it if unsupported.
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* aco: check for duplicate opcode numbers | Rhys Perry | 2019-09-25 | 1 | -0/+25
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
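As an illustration of the kind of check the commit title above describes, here is a minimal Python sketch that scans an opcode table for numbers assigned to more than one instruction within the same encoding format. The table layout and the name `find_duplicate_opcodes` are hypothetical, and the numbers in the sample table are placeholders rather than real hardware encodings; this is not the code the commit added to ACO.

```python
from collections import defaultdict


def find_duplicate_opcodes(table):
    """Return {(fmt, number): [names...]} for every opcode number that is
    assigned to more than one instruction within the same format.

    `table` is an iterable of (name, fmt, number) tuples. This layout is an
    assumption made for the sketch, not ACO's real data structure.
    """
    seen = defaultdict(list)
    for name, fmt, number in table:
        seen[(fmt, number)].append(name)
    return {key: names for key, names in seen.items() if len(names) > 1}


# Placeholder names and numbers, chosen only to exercise the check.
sample = [
    ("op_a", "SOP2", 1),
    ("op_b", "SOP2", 2),
    ("op_c", "SOP2", 2),  # clashes with op_b
    ("op_d", "VOP2", 2),  # same number in a different format: not a clash
]
duplicates = find_duplicate_opcodes(sample)
```

Keying on the pair (format, number) rather than on the number alone reflects the assumption that different encoding formats may legitimately reuse the same opcode number.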
* aco: fix opcode for s_mul_hi_i32 | Rhys Perry | 2019-09-25 | 1 | -1/+1
  Fixes dEQP-VK.glsl.builtin.function.integer.imulextended.*_compute
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* aco: fix v_subrev_co_u32_e64 opcode | Rhys Perry | 2019-09-25 | 1 | -1/+1
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* aco: fix GFX9 opcode for v_xad_u32 | Rhys Perry | 2019-09-25 | 1 | -1/+1
  Fixes various dEQP-VK.image.store.* tests.
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* aco: implement 64-bit ineg | Rhys Perry | 2019-09-25 | 2 | -2/+17
  We currently lower them, but nir_opt_algebraic() can add new ones because lower_sub=true.
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* aco: run nir_lower_int64() before nir_lower_idiv() | Rhys Perry | 2019-09-25 | 1 | -3/+3
  nir_lower_idiv() asserts on 64-bit integers.
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* radv: fix s/load/store/ copy-paste typo | Eric Engestrom | 2019-09-24 | 1 | -1/+1
  Fixes: cdc6efddf918bc07d30d ("radv: implement all depth/stencil resolve modes using graphics")
  Signed-off-by: Eric Engestrom <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Add workaround for hang in The Surge 2. | Bas Nieuwenhuizen | 2019-09-24 | 1 | -0/+8
  Released today and hangs on RADV. We don't have the root cause yet, but this should unblock people playing the game.
  No drirc because the radv debugflags are not usable from drirc and I want this backported.
  CC: <[email protected]>
  Reviewed-by: Dave Airlie <[email protected]>
* ac/nir: force unnormalized coordinates for RECT | Marek Olšák | 2019-09-23 | 1 | -1/+3
  This fixes VAAPI.
  Reviewed-by: Connor Abbott <[email protected]>
* ac/nir: port Z compare value clamping from radeonsi | Marek Olšák | 2019-09-23 | 1 | -9/+25
  This fixes some dEQP tests.
  Reviewed-by: Connor Abbott <[email protected]>
* ac: stop using PCI IDs for chip identification | Marek Olšák | 2019-09-23 | 1 | -15/+58
  PCI IDs for amdgpu will be removed from Mesa.
  Acked-by: Pierre-Eric Pelloux-Prayer <[email protected]>
* ac/addrlib: fix chip identification for Vega10, Arcturus, Raven2, Renoir | Marek Olšák | 2019-09-23 | 1 | -10/+5
  Cc: 19.2 <[email protected]>
  Acked-by: Pierre-Eric Pelloux-Prayer <[email protected]>
* aco: only emit waitcnt on loop continues if there was some load or export | Daniel Schürmann | 2019-09-23 | 1 | -1/+1
  Reviewed-by: Rhys Perry <[email protected]>
* amd: Build aco only if radv is enabled | Bas Nieuwenhuizen | 2019-09-21 | 1 | -1/+1
  ACO depends on C++14, but radeonsi/radv with LLVM 8,9 do not. Let us only require it for RADV, since that is the only user.
  Fixes: a70a9987181 "radv/aco: Setup alternate path in RADV to support the experimental ACO compiler"
  Reviewed-by: Marek Olšák <[email protected]>
* radv: remove dead shared variables | Daniel Schürmann | 2019-09-19 | 1 | -1/+1
  LLVM does this anyway, but for ACO we need to do it in NIR.
  Reviewed-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/aco: enable VK_EXT_shader_demote_to_helper_invocation | Daniel Schürmann | 2019-09-19 | 3 | -0/+8
  For now, this extension will only be enabled for ACO.
  Reviewed-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: enable clustered reductions | Daniel Schürmann | 2019-09-19 | 1 | -0/+1
  These work with both LLVM and ACO.
  Reviewed-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/aco: Setup alternate path in RADV to support the experimental ACO compiler | Daniel Schürmann | 2019-09-19 | 11 | -103/+205
  LLVM remains default and ACO can be enabled with RADV_PERFTEST=aco.
  Co-authored-by: Daniel Schürmann <[email protected]>
  Co-authored-by: Rhys Perry <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* aco: Initial commit of independent AMD compiler | Daniel Schürmann | 2019-09-19 | 31 | -0/+25572
  ACO (short for AMD Compiler) is a new compiler backend with the goal to replace LLVM for Radeon hardware for the RADV driver.
  ACO currently supports only VS, PS and CS on VI and Vega. There are some optimizations missing because of unmerged NIR changes which may decrease performance.
  Full commit history can be found at https://github.com/daniel-schuermann/mesa/commits/backend
  Co-authored-by: Daniel Schürmann <[email protected]>
  Co-authored-by: Rhys Perry <[email protected]>
  Co-authored-by: Bas Nieuwenhuizen <[email protected]>
  Co-authored-by: Connor Abbott <[email protected]>
  Co-authored-by: Michael Schellenberger Costa <[email protected]>
  Co-authored-by: Timur Kristóf <[email protected]>
  Acked-by: Samuel Pitoiset <[email protected]>
  Acked-by: Bas Nieuwenhuizen <[email protected]>
* radv: Add DFSM support. | Bas Nieuwenhuizen | 2019-09-18 | 1 | -5/+17
  Apparently we already enabled it without having support ...
  Not sure if we also need to set disable_start_of_prim when the PS has memory writes, but this mirrors radeonsi.
  Doubles fillrate in my dual_quad_bench from ~16 pixels/cycle to ~32 pixels/cycle on a Raven.
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Disable dfsm by default even on Raven. | Bas Nieuwenhuizen | 2019-09-18 | 2 | -3/+4
  When actually implementing it, Talos on low is still 3% slower.
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Only break batch on framebuffer change with dfsm. | Bas Nieuwenhuizen | 2019-09-18 | 1 | -1/+1
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: never kill a NGG GS shader | Rhys Perry | 2019-09-18 | 1 | -1/+3
  Seems to fix a hang with excessive vertex emissions when NGG is used for GS.
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/gfx10: fix VK_KHR_pipeline_executable_properties with NGG GS | Samuel Pitoiset | 2019-09-18 | 1 | -4/+13
  No GS copy shader if a pipeline enables NGG GS.
  This fixes dEQP-VK.pipeline.executable_properties.graphics.*geometry_stage*.
  Fixes: 86864eedd2d ("radv: Implement radv_GetPipelineExecutablePropertiesKHR.")
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: move ac_get_num_physical_vgprs into radeon_info | Marek Olšák | 2019-09-18 | 2 | -10/+2
  Reviewed-by: Timothy Arceri <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* ac: move ac_get_num_physical_sgprs into radeon_info | Marek Olšák | 2019-09-18 | 4 | -15/+15
  Reviewed-by: Timothy Arceri <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* ac: move ac_get_max_wave64_per_simd into radeon_info | Marek Olšák | 2019-09-18 | 3 | -17/+5
  Reviewed-by: Timothy Arceri <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* ac: move num_sdp_interfaces into radeon_info | Marek Olšák | 2019-09-18 | 3 | -15/+16
  Reviewed-by: Timothy Arceri <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* ac: move PBB MAX_ALLOC_COUNT into radeon_info | Marek Olšák | 2019-09-18 | 3 | -31/+34
  Reviewed-by: Timothy Arceri <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: fix loading 64-bit GS inputs | Samuel Pitoiset | 2019-09-18 | 1 | -0/+35
  We have to load two 32-bit integers and cast them correctly.
  This fixes crashes with gs-double-interpolator.vk_shader_test.
  Cc: 19.2 <[email protected]>
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111734
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
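The idea in the commit message above, reading a 64-bit value as two 32-bit integers and then casting, can be sketched outside the driver. The Python snippet below is only an illustration of the bit reinterpretation and is not the RADV code; the function names are hypothetical, and the low-dword-first ordering is an assumption of the sketch.

```python
import struct


def split_double_to_dwords(value):
    """Split a 64-bit float into (low, high) 32-bit unsigned integers.

    Low dword first (little-endian) is an assumption of this sketch.
    """
    lo, hi = struct.unpack("<II", struct.pack("<d", value))
    return lo, hi


def combine_dwords_to_double(lo, hi):
    """Reinterpret two 32-bit unsigned integers as one 64-bit float.

    This is a bit-for-bit cast, not a numeric conversion.
    """
    return struct.unpack("<d", struct.pack("<II", lo, hi))[0]


lo, hi = split_double_to_dwords(1.5)
roundtrip = combine_dwords_to_double(lo, hi)
```

The point of the sketch is that the two halves carry no meaning on their own: only after they are packed together in the right order and reinterpreted does the original 64-bit value come back.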
* radv: fix writing depth/stencil clear values to image | Samuel Pitoiset | 2019-09-18 | 1 | -3/+4
  Use the fastest way only if both aspects are used. Oops.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111728
  Fixes: 218ce34962c ("radv: add mipmap support for the clear depth/stencil values")
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: Remove DEBUG workaround | Michel Dänzer | 2019-09-17 | 1 | -6/+0
  As of version 7, LLVM uses LLVM_DEBUG instead of just DEBUG.
  Reviewed-by: Timothy Arceri <[email protected]>
* radv: always emit a position export in gs copy shaders | Rhys Perry | 2019-09-16 | 1 | -1/+1
  Signed-off-by: Rhys Perry <[email protected]>
  Fixes: f8d0337299f ('radv: add multiple streams support for the GS copy shader')
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: keep GS threads with excessive emissions which could write to memory | Rhys Perry | 2019-09-16 | 3 | -4/+16
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv/gfx10: disable unsupported transform feedback features for NGG | Samuel Pitoiset | 2019-09-16 | 1 | -3/+3
  Mostly multiple streams and queries which have to be fixed/implemented.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/gfx10: implement NGG streamout | Samuel Pitoiset | 2019-09-16 | 1 | -7/+514
  It's still disabled by default because transform feedback randomly hangs and it seems like it's related to GDS (cf. RadeonSI).
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/gfx10: make sure to wait for idle before clearing GDS | Samuel Pitoiset | 2019-09-16 | 1 | -0/+8
  Otherwise the next streamout operation will overwrite GDS.
  This can be improved by tracking if there is a streamout operation in flight. Currently the driver unconditionally flushes but that doesn't matter much as NGG streamout is disabled by default.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>