summaryrefslogtreecommitdiffstats
path: root/src/amd
Commit message (Collapse)AuthorAgeFilesLines
* radv: Interpolate less aggressively.Bas Nieuwenhuizen2019-02-261-9/+12
| | | | | | | | | | | | | | Seems like dxvk used integer builtins without setting the flat interpolation decoration. I believe in the current spec the app is required to set these, but in the meantime to avoid breaking things in stable releases (and so close to release for 19.0), only expand the interpolation to float16 and struct (which cannot be builtins as our spirv parser lowers the builtin block). Fixes: f3247841040 "radv: Allow interpolation on non-float types." Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: don't copy buffer descriptors list for samplersSamuel Pitoiset2019-02-261-1/+5
| | | | | | | | | | | Sampler descriptors don't have a buffer list. This fixes some crashes with new CTS dEQP-VK.binding_model.descriptor_copy.*.sampler_*. Cc: 18.3 19.0 <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: fix out-of-bounds access when copying descriptors BO listSamuel Pitoiset2019-02-261-2/+0
| | | | | | | | | | | We shouldn't increment the buffer list pointers twice. This fixes some crashes with new CTS dEQP-VK.binding_model.descriptor_copy.*. Cc: 18.3 19.0 <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: fix clearing attachments in secondary command buffersSamuel Pitoiset2019-02-251-10/+43
| | | | | | | | | | | If no framebuffer is bound, get the number of samples and the image format from the render pass. This fixes new CTS dEQP-VK.geometry.layered.*.secondary_cmd_buffer. Cc: 18.3 19.0 <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Allow interpolation on non-float types.Bas Nieuwenhuizen2019-02-221-10/+9
| | | | | | | | | | In particular structs containing floats and 16-bit floating point types. Fixes: 62024fa7750 "radv: enable VK_KHR_16bit_storage extension / 16bit storage features" Fixes: da295946361 "spirv: Only split blocks" Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109735 Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Fix float16 interpolation set up.Bas Nieuwenhuizen2019-02-226-16/+94
| | | | | | | | float16 types can have non-flat interpolation so set up the HW correctly for that. Fixes: 62024fa7750 "radv: enable VK_KHR_16bit_storage extension / 16bit storage features" Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Disable depth clamping even without EXT_depth_range_unrestricted.Bas Nieuwenhuizen2019-02-201-2/+1
| | | | Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Implement VK_EXT_depth_clip_enable.Bas Nieuwenhuizen2019-02-203-2/+16
| | | | Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Handle clip+cull distances more generally as compact arrays.Bas Nieuwenhuizen2019-02-204-99/+83
| | | | | | | | | | | | Needed for https://gitlab.freedesktop.org/mesa/mesa/merge_requests/248 . That MR keeps the clip and cull arrays split. So we have to handle - compact arrays with location_frac != 0 - VARYING_SLOT_CLIP_DIST1 Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Clean up a bunch of compiler warnings.Bas Nieuwenhuizen2019-02-203-7/+0
| | | | | | Random unused vars. Reviewed-by: Timothy Arceri <[email protected]>
* radv: Sync ETC2 whitelisted devices.Bas Nieuwenhuizen2019-02-203-5/+11
| | | | | Fixes: 4bb6c49375e "radv: Allow ETC2 on RAVEN and VEGA10 instead of all GFX9." Reviewed-by: Dave Airlie <[email protected]>
* radeonsi: Go back to using llvm.pow intrinsic for nir_op_fpowKenneth Graunke2019-02-191-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ARB_vertex_program and ARB_fragment_program define 0^0 = 1 (while GLSL leaves it undefined). Performing fpow lowering in NIR would break this behavior, preventing us from using prog_to_nir. According to llvm/lib/Target/AMDGPU/SIInstructions.td, POW_common expands to <V_LOG_F32_e32, V_EXP_F32_e32, V_MUL_LEGACY_F32_e32>, which presumably does a zero-wins multiply. Lowering in NIR results in a non-legacy multiply, where: pow(0, 0) = 2^(log2(0) * 0) = 2^(-INF * 0) = 2^(-NaN) = -NaN which isn't the desired result. This reverts: - commit d6b75392067712908bdc372f1007e085439bf9f5 (ac/nir: remove emission of nir_op_fpow) - commit 22430224fec31591432d4a3e65c6f457ba1c1653 (radeonsi/nir: enable lowering of fpow) and prevents a regression in gl-1.0-spot-light with AMD_DEBUG=nir after enabling prog_to_nir in st/mesa later in this series. Reviewed-by: Timothy Arceri <[email protected]>
* ac/nir: implement half-float nir_op_ldexpRhys Perry2019-02-191-1/+3
| | | | | Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* ac/nir: implement half-float nir_op_frsqRhys Perry2019-02-191-2/+1
| | | | | | | v2: don't use ac_get_onef() Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* ac/nir: implement half-float nir_op_frcpRhys Perry2019-02-191-2/+1
| | | | | | | v2: don't use ac_get_onef() Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* ac/nir: make ac_build_fdiv support 16-bit floatsRhys Perry2019-02-191-1/+1
| | | | | | | v2: don't use ac_get_onef() Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* ac/nir: make ac_build_isign work on all bit sizesRhys Perry2019-02-191-23/+4
| | | | | | | v2: don't use ac_get_zero(), ac_get_one() and ac_int_of_size() Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* ac/nir: make ac_build_clamp work on all bit sizesRhys Perry2019-02-191-4/+9
| | | | | | | | v2: don't use ac_get_zerof() and ac_get_onef() v3: rename "intr" to "name" Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* ac/nir: fix 64-bit nir_op_f2f16_rtzRhys Perry2019-02-191-0/+2
| | | | | Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* ac/nir: implement 8-bit nir_load_const_instrRhys Perry2019-02-191-0/+4
| | | | | Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: ensure export arguments are always floatRhys Perry2019-02-191-5/+1
| | | | | | | | | | | | | So that the signature is correct and consistent, the inputs to a export intrinsic should always be 32-bit floats. This and the previous commit fixes a large amount crashes from dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.input_output_int_* tests Fixes: b722b29f10d ('radv: add support for 16bit input/output') Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: bitcast 16-bit outputs to integersRhys Perry2019-02-191-2/+2
| | | | | | | | | 16-bit outputs are stored as 16-bit floats in the outputs array, so they have to be bitcast. Fixes: b722b29f10d ('radv: add support for 16bit input/output') Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: fix writing the alpha channel of MRT0 when alpha coverage is enabledSamuel Pitoiset2019-02-181-7/+8
| | | | | | | | This version is better and safer. Cc: 18.3 19.0 <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: remove unused variable in gather_push_constant_info()Samuel Pitoiset2019-02-181-1/+0
| | | | | | Trivial. Signed-off-by: Samuel Pitoiset <[email protected]>
* radv: write the alpha channel of MRT0 when alpha coverage is enabledSamuel Pitoiset2019-02-181-0/+8
| | | | | | | Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109597 Cc: 18.3 19.0 <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: use new LLVM 8 intrinsic when loading 16-bit valuesSamuel Pitoiset2019-02-181-14/+27
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: add ac_build_llvm8_tbuffer_load() helperSamuel Pitoiset2019-02-182-0/+52
| | | | | | | It uses the new LLVM intrinsics. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: fix invalid element type when filling vertex input default valuesSamuel Pitoiset2019-02-161-1/+3
| | | | | | | | | | The elements added into a vector should have the same type as the first one, otherwise this hits an assertion in LLVM. Fixes: 4b3549c0846 ("radv: reduce the number of loaded channels for vertex input fetches") reported-by: Philip Rebohle <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Use correct num formats to detect whether we should be use 1.0 or 1.Bas Nieuwenhuizen2019-02-151-1/+2
| | | | | | | normalized and scaled formats also return floats. Fixes: 4b3549c0846 ("radv: reduce the number of loaded channels for vertex input fetches") Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: fix radv_fixup_vertex_input_fetches()Samuel Pitoiset2019-02-141-1/+1
| | | | | | | | We should check that num_channels is 4, otherwise that breaks the world. Sorry for the short breakage. Fixes: 4b3549c0846 ("radv: reduce the number of loaded channels for vertex input fetches") Signed-off-by: Samuel Pitoiset <[email protected]>
* radv: reduce the number of loaded channels for vertex input fetchesSamuel Pitoiset2019-02-141-2/+79
| | | | | | | | | | | | | | | | | | | | | | | | | | It's unnecessary to load more channels than the vertex attribute format. The remaining channels are filled with 0 for y and z, and 1 for w. 29077 shaders in 15096 tests Totals: SGPRS: 1321605 -> 1318869 (-0.21 %) VGPRS: 935236 -> 932252 (-0.32 %) Spilled SGPRs: 24860 -> 24776 (-0.34 %) Code Size: 49832348 -> 49819464 (-0.03 %) bytes Max Waves: 242101 -> 242611 (0.21 %) Totals from affected shaders: SGPRS: 93675 -> 90939 (-2.92 %) VGPRS: 58016 -> 55032 (-5.14 %) Spilled SGPRs: 172 -> 88 (-48.84 %) Code Size: 2862740 -> 2849856 (-0.45 %) bytes Max Waves: 15474 -> 15984 (3.30 %) This mostly helps Croteam games (Talos/Sam2017). Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: store vertex attribute formats as pipeline keysSamuel Pitoiset2019-02-143-3/+21
| | | | | | | The formats will be used for reducing the number of loaded channels. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: use MAX_{VBS,VERTEX_ATTRIBS} when defining max vertex input limitsSamuel Pitoiset2019-02-141-2/+2
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: make use of ac_build_expand_to_vec4() in visit_image_store()Samuel Pitoiset2019-02-143-8/+6
| | | | | | | And make ac_build_expand() a static function. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: always export gl_SampleMask when the fragment shader uses itSamuel Pitoiset2019-02-131-4/+4
| | | | | | | | | For some reasons, this breaks trees rendering in Project Cars. Fixes: 85010585cde ("radv: only enable gl_SampleMask if MSAA is enabled too") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109401 Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/winsys: fix BO list creation when RADV_DEBUG=allbos is setSamuel Pitoiset2019-02-131-0/+1
| | | | | | Fixes: 50fd253bd6e ("radv/winsys: Add priority handling during submit.") Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: fix using LOAD_CONTEXT_REG with old GFX ME firmwares on GFX8Samuel Pitoiset2019-02-123-3/+10
| | | | | | | | | This fixes a critical issue. Cc: <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109575 Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add support for push constants inlining when possibleSamuel Pitoiset2019-02-125-28/+152
| | | | | | | | | | | | | | | | | | | | | | | | | | | This removes some scalar loads from shaders, but it increases the number of SET_SH_REG packets. This is currently basic but it could be improved if needed. Inlining dynamic offsets might also help. Original idea from Dave Airlie. 29077 shaders in 15096 tests Totals: SGPRS: 1321325 -> 1357101 (2.71 %) VGPRS: 936000 -> 932576 (-0.37 %) Spilled SGPRs: 24804 -> 24791 (-0.05 %) Code Size: 49827960 -> 49642232 (-0.37 %) bytes Max Waves: 242007 -> 242700 (0.29 %) Totals from affected shaders: SGPRS: 290989 -> 326765 (12.29 %) VGPRS: 244680 -> 241256 (-1.40 %) Spilled SGPRs: 1442 -> 1429 (-0.90 %) Code Size: 8126688 -> 7940960 (-2.29 %) bytes Max Waves: 80952 -> 81645 (0.86 %) Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: keep track of the number of remaining user SGPRsSamuel Pitoiset2019-02-121-0/+4
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: gather if shaders load dynamic offsets separatelySamuel Pitoiset2019-02-122-0/+2
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: gather more info about push constantsSamuel Pitoiset2019-02-124-1/+44
| | | | | | | | This is needed in order to inline some push constants when possible. This also adds a new helper for initializing the pass. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: fix compiler issues with GCC 9Samuel Pitoiset2019-02-121-42/+48
| | | | | | | | | | | | | | | | | "The C standard says that compound literals which occur inside of the body of a function have automatic storage duration associated with the enclosing block. Older GCC releases were putting such compound literals into the scope of the whole function, so their lifetime actually ended at the end of containing function. This has been fixed in GCC 9. Code that relied on this extended lifetime needs to be fixed, move the compound literals to whatever scope they need to accessible in." Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109543 Cc: <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Gustaw Smolarczyk <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: remove alloc parameter from pipeline initDave Airlie2019-02-111-5/+2
| | | | | | clang points out this isn't used. Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/llvm: initialise passes member.Dave Airlie2019-02-111-1/+1
| | | | | | Fixes coverity warning Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: assert that colorAttachment is valid for CmdClearAttachmentLionel Landwerlin2019-02-081-3/+1
| | | | | | | | | | | | | This partially reverts a change from b7a93cbdede05a ("radv: Handle VK_ATTACHMENT_UNUSED in CmdClearAttachment") which fixed actual issues but also started to accept invalid values for the colorAttachment field. This change asserts that the field is valid for the current pass. Signed-off-by: Lionel Landwerlin <[email protected]> Fixes: b7a93cbdede05a ("radv: Handle VK_ATTACHMENT_UNUSED in CmdClearAttachment") Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Implement VK_EXT_buffer_device_address.Bas Nieuwenhuizen2019-02-063-1/+22
| | | | | | v2: Also update the release notes. Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Do not use the bo list for local buffers.Bas Nieuwenhuizen2019-02-061-0/+6
| | | | | | The kernel already does it for us. Reviewed-by: Samuel Pitoiset <[email protected]>
* amd/common: Implement global memory accesses.Bas Nieuwenhuizen2019-02-061-20/+131
| | | | | | | | | | Needed for VK_EXT_buffer_device_address. The pointers are implmemented as i8*, since I could not figure out how to emulate setting struct offsets in LLVM based on the SPIR-V offsets (and more weird stuff like row major matrices). Acked-by: Samuel Pitoiset <[email protected]>
* amd/common: Do not use 32-bit loads for shared memory.Bas Nieuwenhuizen2019-02-061-6/+12
| | | | | | | | | We use a straight glsl->llvm type conversion so types should already be right. Also even though the writemasks were changed we we not actually doing 32-bit things, so this fails miserably. Reviewed-by: Samuel Pitoiset <[email protected]>
* amd/common: handle nir_deref_cast for shared memory from integers.Bas Nieuwenhuizen2019-02-061-68/+82
| | | | | | | Can happen e.g. after a phi. Fixes: a2b5cc3c399 "radv: enable variable pointers" Reviewed-by: Samuel Pitoiset <[email protected]>