summaryrefslogtreecommitdiffstats
path: root/src/amd
Commit message (Collapse)AuthorAgeFilesLines
* radv: add VK_NV_compute_shader_derivates supportSamuel Pitoiset2019-04-223-0/+9
| | | | | | | | | Only computeDerivativeGroupLinear is supported for now. All crucible tests pass. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Support VK_EXT_inline_uniform_block.Bas Nieuwenhuizen2019-04-195-15/+124
| | | | | | | | | | | | | | | | | | | Basically just reserve the memory in the descriptor sets. On the shader side we construct a buffer descriptor, since AFAIU VGPR indexing on 32-bit pointers in LLVM is still broken. This fully supports update after bind and variable descriptor set sizes. However, the limits are somewhat arbitrary and are mostly about finding a reasonable division of a 2 GiB max memory size over the set. v2: - rebased on top of master (Samuel) - remove the loading resources rework (Samuel) - only load UBO descriptors if it's a pointer (Samuel) - use LLVMBuildPtrToInt to avoid IR failures (Samuel) Reviewed-by: Bas Nieuwenhuizen <[email protected]> (v2)
* ac/nir: use the new raw/struct SSBO atomic intrisics for comp_swapSamuel Pitoiset2019-04-191-2/+1
| | | | | | | | | | | | This is actually fixed now. This change requires LLVM r358579. Make sure to have it in your tree, otherwise the following piglit will hang: tests/spec/arb_shader_storage_buffer_object/execution/ssbo-atomicCompSwap-int.shader_test Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* ac/nir: only use the new raw/struct SSBO atomic intrinsics with LLVM 9+Samuel Pitoiset2019-04-191-1/+4
| | | | | | | | They are buggy with older LLVM version, see r358579. Fixes: 78c551aca1c ("ac/nir: use new LLVM 8 intrinsics for SSBO atomics except cmpswap") Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* ac/nir: only use the new raw/struct image atomic intrinsics with LLVM 9+Samuel Pitoiset2019-04-191-1/+4
| | | | | | | | | They are buggy with LLVM 8 because they weren't marked as source of divergence, see r358579. Fixes: dd0172e865f ("radv: Use structured intrinsics instead of indexing workaround for GFX9.")" Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* ac: use struct/raw store intrinsics for 8-bit/16-bit int with LLVM 9+Samuel Pitoiset2019-04-171-14/+34
| | | | | | | | This changes requires LLVM r356465. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* ac: use struct/raw load intrinsics for 8-bit/16-bit int with LLVM 9+Samuel Pitoiset2019-04-171-12/+38
| | | | | | | | This changes requires LLVM r356465. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* ac: add support for more types with struct/raw LLVM intrinsicsSamuel Pitoiset2019-04-171-20/+26
| | | | | | | | LLVM 9+ now supports 8-bit and 16-bit types. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radv: add VK_KHR_shader_atomic_int64 but disable it for nowSamuel Pitoiset2019-04-173-0/+12
| | | | | | | No support for 64-bit compare&swap atomic operations. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/nir: add 64-bit SSBO atomic operations supportSamuel Pitoiset2019-04-171-3/+7
| | | | | | | | Except compare&swap which is still buggy. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/nir: use new LLVM 8 intrinsics for SSBO atomics except cmpswapSamuel Pitoiset2019-04-171-13/+18
| | | | | | | | | | Use the raw version (ie. IDXEN=0) because vindex is unused. Use the old intrinsic for compare&swap because the new one hangs the GPU for some reasons. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* compiler/glsl: handle case where we have multiple users for typesTapani Pälli2019-04-161-0/+3
| | | | | | | | | | | | | | | | | | Both Vulkan and OpenGL might be using glsl_types simultaneously or we can also have multiple concurrent Vulkan instances using glsl_types. Patch adds a one time init to track number of users and will release types only when last user calls _glsl_type_singleton_decref(). This change fixes glsl_type memory leaks we have with anv driver. v2: reuse hash_mutex, cleanup, apply fix also to radv driver and rename helper functions (Jason) v3: move init, destroy to happen on GL context init and destroy Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* radv: sort the shader capabilities alphabeticallySamuel Pitoiset2019-04-161-3/+3
| | | | Signed-off-by: Samuel Pitoiset <[email protected]>
* radv: enable shaderInt8 on SI and CIKSamuel Pitoiset2019-04-162-4/+3
| | | | | | | No CTS failures. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* Delete autotoolsDylan Baker2019-04-154-346/+0
| | | | | | | | | | Acked-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Acked-by: Marek Olšák <[email protected]> Acked-by: Jason Ekstrand <[email protected]> Acked-by: Bas Nieuwenhuizen <[email protected]> Acked-by: Matt Turner <[email protected]>
* radv: set ACCESS_NON_READABLE on stores for copy/fill/clear meta shadersSamuel Pitoiset2019-04-152-0/+3
| | | | | | | The compiler will emit GLC=1. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Use local buffers for the global bo list.Bas Nieuwenhuizen2019-04-153-2/+8
| | | | | | | | | | | | | | Even if we don't use local buffers in general. Turns out that even though the performance is not the best the kernel still does it better than our own list. We still have to keep the radv bo list for buffers that are shared externally. This improves Talos on lowest quality setting (so as CPU bound as possible) by ~10% if the global bo list is enabled. Reviewed-by: Samuel Pitoiset <[email protected]>
* ac: Move has_local_buffers disable to radeonsi.Bas Nieuwenhuizen2019-04-151-3/+1
| | | | | | | | | | | | | | In radv we had a separate flag to actually use it + an env option to experimentally use it. The common code setting has_local_buffers to false of course broke that experimental option. Also the "enable on APU" did not make sense for RADV as it is still disabled by default. Fixes: b21a4efb553 "radv/winsys: allow local BOs on APUs" Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Add bolist RADV_PERFTEST flag.Bas Nieuwenhuizen2019-04-152-0/+3
| | | | | | To test global_bo_list performance. Reviewed-by: Samuel Pitoiset <[email protected]>
* ac: fix incorrect bindless atomic code in visit_image_atomicMarek Olšák2019-04-151-3/+3
| | | | | | | | | Coverity: CID 1444664 Fixes: d62d434fe920 ("ac/nir_to_llvm: add image bindless support") Reviewed-by: Samuel Pitoiset <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* nir,ac/nir: fix cube_face_coordRhys Perry2019-04-151-2/+9
| | | | | | | | Seems it was missing the "/ ma + 0.5" and the order was swapped. Fixes: a1a2a8dfda7b9cac7e ('nir: add AMD_gcn_shader extended instructions') Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: enable VK_KHR_shader_float16_int8Samuel Pitoiset2019-04-152-1/+2
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* nir: make nir_const_value scalarKarol Herbst2019-04-141-4/+4
| | | | | | | | | v2: remove & operator in a couple of memsets add some memsets v3: fixup lima Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> (v2)
* radv: use nir constant helpersKarol Herbst2019-04-142-20/+10
| | | | | Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* amd/nir: some cleanupsKarol Herbst2019-04-141-20/+9
| | | | | Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* ac: use the common helper ac_apply_fmask_to_sampleMarek Olšák2019-04-121-64/+5
| | | | Reviewed-by: Samuel Pitoiset <[email protected]>
* radeonsi: set AC_FUNC_ATTR_READNONE for image opcodes where it was missingMarek Olšák2019-04-121-0/+1
| | | | Reviewed-by: Samuel Pitoiset <[email protected]>
* ac/nir: remove some useless integer casts for ALU operationsSamuel Pitoiset2019-04-121-16/+0
| | | | | | | Sources are always casted to integers. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* ac/nir: remove useless integer cast in visit_image_load()Samuel Pitoiset2019-04-121-1/+1
| | | | | | | | ac_build_image_opcode() casts if necessary and buffer images are casted too. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* ac/nir: remove useless integer cast in adjust_sample_index_using_fmask()Samuel Pitoiset2019-04-121-1/+0
| | | | | | | It's already casted if necessary in ac_build_image_opcode(). Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* ac/nir: remove useles LLVMGetUndef for nir_op_pack_64_2x32_splitSamuel Pitoiset2019-04-121-2/+1
| | | | | | | Trivial. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* ac: add ac_build_load_helper_invocation() helperSamuel Pitoiset2019-04-123-12/+14
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* ac: add ac_build_ddxy_interp() helperSamuel Pitoiset2019-04-123-22/+24
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* ac: add ac_build_umax() and use it where possibleSamuel Pitoiset2019-04-123-15/+13
| | | | | | | This changes the predicate from LessThan to Equal. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* ac/nir: make use of ac_build_umin() where possibleSamuel Pitoiset2019-04-121-5/+5
| | | | | | | This changes the predicate from LessThan to Equal. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* ac/nir: make use of ac_build_imin() where possibleSamuel Pitoiset2019-04-121-5/+5
| | | | | | | This changes the predicate from LessThan to Equal. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* ac/nir: make use of ac_build_imax() where possibleSamuel Pitoiset2019-04-121-7/+6
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* ac/nir_to_llvm: add image bindless supportTimothy Arceri2019-04-121-57/+153
| | | | | | With this all piglit bindless image tests pass on radeonsi. Reviewed-by: Marek Olšák <[email protected]>
* ac/nir_to_llvm: make get_sampler_desc() more generic and pass it the image ↵Timothy Arceri2019-04-121-18/+21
| | | | | | | | intrinsic This will be required by the bindless support in the following patches. Reviewed-by: Marek Olšák <[email protected]>
* glsl_to_nir: handle bindless texturesKarol Herbst2019-04-121-2/+10
| | | | | | | | | v2: add support for AMD Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> (v1) Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radv: enable VK_AMD_gpu_shader_half_floatSamuel Pitoiset2019-04-101-0/+1
| | | | | | | | Should be safe to enable as all instructions seem to support 16-bit. Unfortunately, there is no CTS test. Signed-off-by: Samuel Pitoiset <[email protected]> Acked-by: Bas Nieuwenhuizen <[email protected]>
* ac: add 16-bit support to ac_build_ddxy()Rhys Perry2019-04-101-5/+17
| | | | | Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/nir: fix nir_op_b2f16Samuel Pitoiset2019-04-101-3/+9
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Add non-uniform indexing lowering.Bas Nieuwenhuizen2019-04-102-7/+12
| | | | | | | | | This patch does it as late as possible so the potential extra basic blocks don't inhibit other optimizations. Big thanks to Jason for writing the lowering pass. Reviewed-by: Samuel Pitoiset <[email protected]>
* nir/radv: remove restrictions on opt_if_loop_last_continue()Timothy Arceri2019-04-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When I implemented opt_if_loop_last_continue() I had restricted this pass from moving other if-statements inside the branch opposite the continue. At the time it was causing a bunch of spilling in shader-db for i965. However Samuel Pitoiset noticed that making this pass more aggressive significantly improved the performance of Doom on RADV. Below are the statistics he gathered. 28717 shaders in 14931 tests Totals: SGPRS: 1267317 -> 1267549 (0.02 %) VGPRS: 896876 -> 895920 (-0.11 %) Spilled SGPRs: 24701 -> 26367 (6.74 %) Code Size: 48379452 -> 48507880 (0.27 %) bytes Max Waves: 241159 -> 241190 (0.01 %) Totals from affected shaders: SGPRS: 23584 -> 23816 (0.98 %) VGPRS: 25908 -> 24952 (-3.69 %) Spilled SGPRs: 503 -> 2169 (331.21 %) Code Size: 2471392 -> 2599820 (5.20 %) bytes Max Waves: 586 -> 617 (5.29 %) The codesize increases is related to Wolfenstein II it seems largely due to an increase in phis rather than the existing jumps. This gives +10% FPS with Doom on my Vega56. Rhys Perry also benchmarked Doom on his VEGA64: Before: 72.53 FPS After: 80.77 FPS v2: disable pass on non-AMD drivers Reviewed-by: Ian Romanick <[email protected]> (v1) Acked-by: Samuel Pitoiset <[email protected]>
* radv: fix getting the vertex strides if the bindings aren't contiguousSamuel Pitoiset2019-04-081-1/+15
| | | | | | | Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110349 Fixes: a66b186bebf ("radv: use typed buffer loads for vertex input fetches") Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/nir: fix intrinsic names for atomic operations with LLVM 9+Samuel Pitoiset2019-04-081-11/+21
| | | | | | | | | | | | This fixes the following LLVM error when using RADV_DEBUG=checkir: Intrinsic name not mangled correctly for type arguments! Should be: llvm.amdgcn.buffer.atomic.add.i32 i32 (i32, <4 x i32>, i32, i32, i1)* @llvm.amdgcn.buffer.atomic.add The cmpswap operation still uses the old intrinsic. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Erik Faye-Lund <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* simplify LLVM version string printingEric Engestrom2019-04-042-15/+6
| | | | | | | Figure it out once in the build system, then just use that all over the place. Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: enable displayable DCC on RavensMarek Olšák2019-04-042-0/+12
|
* radeonsi: add support for displayable DCC for multi-RB chipsMarek Olšák2019-04-044-10/+134
| | | | A compute shader is used to reorder DCC data from aligned to unaligned.