summaryrefslogtreecommitdiffstats
path: root/src/amd
Commit message (Collapse)AuthorAgeFilesLines
* nir/xfb: adding varyings on nir_xfb_info and gather_infoAlejandro PiƱeiro2019-03-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | In order to be used for OpenGL (right now for ARB_gl_spirv). This commit adds two new structures: * nir_xfb_varying_info: that identifies each individual varying. For each one, we need to know the type, buffer and xfb_offset * nir_xfb_buffer_info: as now for each buffer, in addition to the stride, we need to know how many varyings are assigned to it. For this patch, the only case where num_outputs != num_varyings is with the case of doubles, that for dvec3/4 could require more than one output. There are more cases though (like aoa), that will be handled on following patches. v2: updated after new nir general XFB support introduced for "anv: Add support for VK_EXT_transform_feedback" v3: compute num_varyings beforehand for allocating, instead of relying on num_outputs as approximate value (Timothy Arceri) Reviewed-by: Timothy Arceri <[email protected]>
* Revert "radv: execute external subpass barriers after ending subpasses"Samuel Pitoiset2019-03-081-2/+2
| | | | | | | | | | | | | | | This changes is actually wrong because we have to sync before doing image layout transitions. This fixes rendering issues in Batman, Path of Exile and probably more titles. This reverts commit 76c17cfd8da017ebd19be33ba6cef888957a6758. Fixes: 76c17cfd8da ("radv: execute external subpass barriers after ending subpasses") Cc: 19.0 <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: enable lower_mul_2x32_64Samuel Pitoiset2019-03-061-0/+1
| | | | | | Fixes: 58bcebd987b ("spirv: Allow [i/u]mulExtended to use new nir opcode") Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: set num_components on vulkan_resource_index intrinsicLionel Landwerlin2019-03-063-10/+20
| | | | | | | | | | In 61e009d2c4e4df we changed the number of components in the vulkan_resource_index intrinsic and forgot the update Radv's code for it. Signed-off-by: Lionel Landwerlin <[email protected]> Fixes: 61e009d2c4e4df ("spirv: Use the same types for resource indices as pointers") Reviewed-by: Samuel Pitoiset [email protected]
* nir: rename glsl_type_is_struct() -> glsl_type_is_struct_or_ifc()Timothy Arceri2019-03-062-2/+2
| | | | | | | | | | Replace done using: find ./src -type f -exec sed -i -- \ 's/glsl_type_is_struct(/glsl_type_is_struct_or_ifc(/g' {} \; Acked-by: Karol Herbst <[email protected]> Acked-by: Jason Ekstrand <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
* glsl: rename record_location_offset() -> struct_location_offset()Timothy Arceri2019-03-061-2/+2
| | | | | | | | | | Replace done using: find ./src -type f -exec sed -i -- \ 's/record_location_offset(/struct_location_offset(/g' {} \; Acked-by: Karol Herbst <[email protected]> Acked-by: Jason Ekstrand <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
* radv: properly align the fence and EOP bug VA on GFX9Samuel Pitoiset2019-03-051-2/+5
| | | | | | | | | | If alignement is 0, offets returned by radv_cmd_buffer_upload_alloc() are always 0. These two virtual addresses were pointing at the same location. Cc: 18.3 19.0 <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: allocate enough space in cmdbuf when starting a subpassSamuel Pitoiset2019-03-051-1/+1
| | | | | | | | | | | | This fixes some CTS crashes with: dEQP-VK.renderpass2.suballocation.attachment_write_mask.attachment_count_8.start_index_* Ideally, we should check cmd_buffer->cs->max_dw because there is likely enough space (the internal clear draws allocate space), but keep that way for consistency. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: use the platform defines in vk.xml instead of hard-coding themEric Engestrom2019-03-051-4/+7
| | | | | Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* rav: use 32_AR instead of 32_ABGR when alpha coverage is requiredSamuel Pitoiset2019-03-041-1/+1
| | | | | | | | This export format is faster. Seems to improve performance in Wreckfest. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Interpolate less aggressively.Bas Nieuwenhuizen2019-02-261-9/+12
| | | | | | | | | | | | | | Seems like dxvk used integer builtins without setting the flat interpolation decoration. I believe in the current spec the app is required to set these, but in the meantime to avoid breaking things in stable releases (and so close to release for 19.0), only expand the interpolation to float16 and struct (which cannot be builtins as our spirv parser lowers the builtin block). Fixes: f3247841040 "radv: Allow interpolation on non-float types." Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: don't copy buffer descriptors list for samplersSamuel Pitoiset2019-02-261-1/+5
| | | | | | | | | | | Sampler descriptors don't have a buffer list. This fixes some crashes with new CTS dEQP-VK.binding_model.descriptor_copy.*.sampler_*. Cc: 18.3 19.0 <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: fix out-of-bounds access when copying descriptors BO listSamuel Pitoiset2019-02-261-2/+0
| | | | | | | | | | | We shouldn't increment the buffer list pointers twice. This fixes some crashes with new CTS dEQP-VK.binding_model.descriptor_copy.*. Cc: 18.3 19.0 <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: fix clearing attachments in secondary command buffersSamuel Pitoiset2019-02-251-10/+43
| | | | | | | | | | | If no framebuffer is bound, get the number of samples and the image format from the render pass. This fixes new CTS dEQP-VK.geometry.layered.*.secondary_cmd_buffer. Cc: 18.3 19.0 <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Allow interpolation on non-float types.Bas Nieuwenhuizen2019-02-221-10/+9
| | | | | | | | | | In particular structs containing floats and 16-bit floating point types. Fixes: 62024fa7750 "radv: enable VK_KHR_16bit_storage extension / 16bit storage features" Fixes: da295946361 "spirv: Only split blocks" Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109735 Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Fix float16 interpolation set up.Bas Nieuwenhuizen2019-02-226-16/+94
| | | | | | | | float16 types can have non-flat interpolation so set up the HW correctly for that. Fixes: 62024fa7750 "radv: enable VK_KHR_16bit_storage extension / 16bit storage features" Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Disable depth clamping even without EXT_depth_range_unrestricted.Bas Nieuwenhuizen2019-02-201-2/+1
| | | | Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Implement VK_EXT_depth_clip_enable.Bas Nieuwenhuizen2019-02-203-2/+16
| | | | Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Handle clip+cull distances more generally as compact arrays.Bas Nieuwenhuizen2019-02-204-99/+83
| | | | | | | | | | | | Needed for https://gitlab.freedesktop.org/mesa/mesa/merge_requests/248 . That MR keeps the clip and cull arrays split. So we have to handle - compact arrays with location_frac != 0 - VARYING_SLOT_CLIP_DIST1 Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Clean up a bunch of compiler warnings.Bas Nieuwenhuizen2019-02-203-7/+0
| | | | | | Random unused vars. Reviewed-by: Timothy Arceri <[email protected]>
* radv: Sync ETC2 whitelisted devices.Bas Nieuwenhuizen2019-02-203-5/+11
| | | | | Fixes: 4bb6c49375e "radv: Allow ETC2 on RAVEN and VEGA10 instead of all GFX9." Reviewed-by: Dave Airlie <[email protected]>
* radeonsi: Go back to using llvm.pow intrinsic for nir_op_fpowKenneth Graunke2019-02-191-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ARB_vertex_program and ARB_fragment_program define 0^0 = 1 (while GLSL leaves it undefined). Performing fpow lowering in NIR would break this behavior, preventing us from using prog_to_nir. According to llvm/lib/Target/AMDGPU/SIInstructions.td, POW_common expands to <V_LOG_F32_e32, V_EXP_F32_e32, V_MUL_LEGACY_F32_e32>, which presumably does a zero-wins multiply. Lowering in NIR results in a non-legacy multiply, where: pow(0, 0) = 2^(log2(0) * 0) = 2^(-INF * 0) = 2^(-NaN) = -NaN which isn't the desired result. This reverts: - commit d6b75392067712908bdc372f1007e085439bf9f5 (ac/nir: remove emission of nir_op_fpow) - commit 22430224fec31591432d4a3e65c6f457ba1c1653 (radeonsi/nir: enable lowering of fpow) and prevents a regression in gl-1.0-spot-light with AMD_DEBUG=nir after enabling prog_to_nir in st/mesa later in this series. Reviewed-by: Timothy Arceri <[email protected]>
* ac/nir: implement half-float nir_op_ldexpRhys Perry2019-02-191-1/+3
| | | | | Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* ac/nir: implement half-float nir_op_frsqRhys Perry2019-02-191-2/+1
| | | | | | | v2: don't use ac_get_onef() Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* ac/nir: implement half-float nir_op_frcpRhys Perry2019-02-191-2/+1
| | | | | | | v2: don't use ac_get_onef() Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* ac/nir: make ac_build_fdiv support 16-bit floatsRhys Perry2019-02-191-1/+1
| | | | | | | v2: don't use ac_get_onef() Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* ac/nir: make ac_build_isign work on all bit sizesRhys Perry2019-02-191-23/+4
| | | | | | | v2: don't use ac_get_zero(), ac_get_one() and ac_int_of_size() Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* ac/nir: make ac_build_clamp work on all bit sizesRhys Perry2019-02-191-4/+9
| | | | | | | | v2: don't use ac_get_zerof() and ac_get_onef() v3: rename "intr" to "name" Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* ac/nir: fix 64-bit nir_op_f2f16_rtzRhys Perry2019-02-191-0/+2
| | | | | Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* ac/nir: implement 8-bit nir_load_const_instrRhys Perry2019-02-191-0/+4
| | | | | Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: ensure export arguments are always floatRhys Perry2019-02-191-5/+1
| | | | | | | | | | | | | So that the signature is correct and consistent, the inputs to a export intrinsic should always be 32-bit floats. This and the previous commit fixes a large amount crashes from dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.input_output_int_* tests Fixes: b722b29f10d ('radv: add support for 16bit input/output') Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: bitcast 16-bit outputs to integersRhys Perry2019-02-191-2/+2
| | | | | | | | | 16-bit outputs are stored as 16-bit floats in the outputs array, so they have to be bitcast. Fixes: b722b29f10d ('radv: add support for 16bit input/output') Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: fix writing the alpha channel of MRT0 when alpha coverage is enabledSamuel Pitoiset2019-02-181-7/+8
| | | | | | | | This version is better and safer. Cc: 18.3 19.0 <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: remove unused variable in gather_push_constant_info()Samuel Pitoiset2019-02-181-1/+0
| | | | | | Trivial. Signed-off-by: Samuel Pitoiset <[email protected]>
* radv: write the alpha channel of MRT0 when alpha coverage is enabledSamuel Pitoiset2019-02-181-0/+8
| | | | | | | Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109597 Cc: 18.3 19.0 <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: use new LLVM 8 intrinsic when loading 16-bit valuesSamuel Pitoiset2019-02-181-14/+27
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: add ac_build_llvm8_tbuffer_load() helperSamuel Pitoiset2019-02-182-0/+52
| | | | | | | It uses the new LLVM intrinsics. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: fix invalid element type when filling vertex input default valuesSamuel Pitoiset2019-02-161-1/+3
| | | | | | | | | | The elements added into a vector should have the same type as the first one, otherwise this hits an assertion in LLVM. Fixes: 4b3549c0846 ("radv: reduce the number of loaded channels for vertex input fetches") reported-by: Philip Rebohle <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Use correct num formats to detect whether we should be use 1.0 or 1.Bas Nieuwenhuizen2019-02-151-1/+2
| | | | | | | normalized and scaled formats also return floats. Fixes: 4b3549c0846 ("radv: reduce the number of loaded channels for vertex input fetches") Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: fix radv_fixup_vertex_input_fetches()Samuel Pitoiset2019-02-141-1/+1
| | | | | | | | We should check that num_channels is 4, otherwise that breaks the world. Sorry for the short breakage. Fixes: 4b3549c0846 ("radv: reduce the number of loaded channels for vertex input fetches") Signed-off-by: Samuel Pitoiset <[email protected]>
* radv: reduce the number of loaded channels for vertex input fetchesSamuel Pitoiset2019-02-141-2/+79
| | | | | | | | | | | | | | | | | | | | | | | | | | It's unnecessary to load more channels than the vertex attribute format. The remaining channels are filled with 0 for y and z, and 1 for w. 29077 shaders in 15096 tests Totals: SGPRS: 1321605 -> 1318869 (-0.21 %) VGPRS: 935236 -> 932252 (-0.32 %) Spilled SGPRs: 24860 -> 24776 (-0.34 %) Code Size: 49832348 -> 49819464 (-0.03 %) bytes Max Waves: 242101 -> 242611 (0.21 %) Totals from affected shaders: SGPRS: 93675 -> 90939 (-2.92 %) VGPRS: 58016 -> 55032 (-5.14 %) Spilled SGPRs: 172 -> 88 (-48.84 %) Code Size: 2862740 -> 2849856 (-0.45 %) bytes Max Waves: 15474 -> 15984 (3.30 %) This mostly helps Croteam games (Talos/Sam2017). Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: store vertex attribute formats as pipeline keysSamuel Pitoiset2019-02-143-3/+21
| | | | | | | The formats will be used for reducing the number of loaded channels. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: use MAX_{VBS,VERTEX_ATTRIBS} when defining max vertex input limitsSamuel Pitoiset2019-02-141-2/+2
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: make use of ac_build_expand_to_vec4() in visit_image_store()Samuel Pitoiset2019-02-143-8/+6
| | | | | | | And make ac_build_expand() a static function. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: always export gl_SampleMask when the fragment shader uses itSamuel Pitoiset2019-02-131-4/+4
| | | | | | | | | For some reasons, this breaks trees rendering in Project Cars. Fixes: 85010585cde ("radv: only enable gl_SampleMask if MSAA is enabled too") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109401 Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/winsys: fix BO list creation when RADV_DEBUG=allbos is setSamuel Pitoiset2019-02-131-0/+1
| | | | | | Fixes: 50fd253bd6e ("radv/winsys: Add priority handling during submit.") Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: fix using LOAD_CONTEXT_REG with old GFX ME firmwares on GFX8Samuel Pitoiset2019-02-123-3/+10
| | | | | | | | | This fixes a critical issue. Cc: <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109575 Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add support for push constants inlining when possibleSamuel Pitoiset2019-02-125-28/+152
| | | | | | | | | | | | | | | | | | | | | | | | | | | This removes some scalar loads from shaders, but it increases the number of SET_SH_REG packets. This is currently basic but it could be improved if needed. Inlining dynamic offsets might also help. Original idea from Dave Airlie. 29077 shaders in 15096 tests Totals: SGPRS: 1321325 -> 1357101 (2.71 %) VGPRS: 936000 -> 932576 (-0.37 %) Spilled SGPRs: 24804 -> 24791 (-0.05 %) Code Size: 49827960 -> 49642232 (-0.37 %) bytes Max Waves: 242007 -> 242700 (0.29 %) Totals from affected shaders: SGPRS: 290989 -> 326765 (12.29 %) VGPRS: 244680 -> 241256 (-1.40 %) Spilled SGPRs: 1442 -> 1429 (-0.90 %) Code Size: 8126688 -> 7940960 (-2.29 %) bytes Max Waves: 80952 -> 81645 (0.86 %) Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: keep track of the number of remaining user SGPRsSamuel Pitoiset2019-02-121-0/+4
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: gather if shaders load dynamic offsets separatelySamuel Pitoiset2019-02-122-0/+2
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>