path: root/src/amd
Commit message | Author | Age | Files | Lines
* radv: Implement buffer stores with less than 4 components. | Bas Nieuwenhuizen | 2019-01-07 | 1 | -5/+14
  We started using it in the btoi paths for r32g32b32, and the LLVM IR checker will complain about it because we end up with intrinsics with the wrong type extension in the name.
  Fixes: 593996bc02 ("radv: implement buffer to image operations for R32G32B32")
  Reviewed-by: Samuel Pitoiset
* nir: rename nir_link_constant_varyings() to nir_link_opt_varyings() | Timothy Arceri | 2019-01-02 | 1 | -2/+2
  The following patches will add support for an additional optimisation, so this function will no longer just optimise varying constants.
  Tested-by: Dieter Nützel
  Reviewed-by: Marek Olšák
  Reviewed-by: Eric Anholt
* ac/nir_to_llvm: add ac_are_tessfactors_def_in_all_invocs() | Timothy Arceri | 2019-01-02 | 2 | -0/+163
  The following patch will use this with the radeonsi NIR backend, but I've added it to ac so we can use it with RADV in future.
  This is a NIR implementation of the tgsi function tgsi_scan_tess_ctrl().
  Tested-by: Dieter Nützel
  Reviewed-by: Marek Olšák
* radv: Do a cache flush if needed before reading predicates. | Bas Nieuwenhuizen | 2018-12-31 | 1 | -0/+2
  This caused random failures for two conditional rendering tests:
    dEQP-VK.conditional_rendering.draw_clear.draw.update_with_rendering_discard
    dEQP-VK.conditional_rendering.draw_clear.draw.update_with_rendering_no_discard
  These wrote the predicate with the vertex shader, did a barrier and then started the conditional rendering. However, the cache flushes for the barrier only happen on the first draw, i.e. after the predicate has already been read.
  Fixes: e45ba51ea45 "radv: add support for VK_EXT_conditional_rendering"
  Reviewed-by: Dave Airlie
* radv: Fix wrongly positioned paren. | Bas Nieuwenhuizen | 2018-12-21 | 1 | -1/+1
  Trivial.
  Fixes: 9f0bfbed11f "radv: Work around non-renderable 128bpp compressed 3d textures on GFX9."
* radv: enable shaderStorageImageMultisample feature on GFX8+ | Samuel Pitoiset | 2018-12-20 | 3 | -4/+4
  Untested on older chips.
  Signed-off-by: Samuel Pitoiset
  Reviewed-by: Bas Nieuwenhuizen
* radv: add support for FMASK expand | Samuel Pitoiset | 2018-12-20 | 7 | -0/+335
  Original patch by Dave Airlie.
  Signed-off-by: Samuel Pitoiset
  Reviewed-by: Bas Nieuwenhuizen
* radv: initialize FMASK for images in fully expanded mode | Samuel Pitoiset | 2018-12-20 | 4 | -0/+39
  The value depends on the number of samples.
  Signed-off-by: Samuel Pitoiset
  Reviewed-by: Bas Nieuwenhuizen
* ac/nir: restrict fmask lookup to image load intrinsics | Samuel Pitoiset | 2018-12-20 | 1 | -1/+1
  We don't ever want to do the fmask lookup on an atomic or store; the fmask should have been decompressed if the surface has been moved to IMAGE_LAYOUT.
  Original patch by Dave Airlie.
  Signed-off-by: Samuel Pitoiset
  Reviewed-by: Bas Nieuwenhuizen
* radv: compute optimal VM alignment for imported buffers | Samuel Pitoiset | 2018-12-20 | 1 | -1/+30
  This fixes GPU hangs on GFX9 with dEQP-VK.memory.external_memory_host.bind_image_memory_and_render.with_zero_offset.*
  Copied from RadeonSI.
  Signed-off-by: Samuel Pitoiset
  Reviewed-by: Bas Nieuwenhuizen
* radv: Work around non-renderable 128bpp compressed 3d textures on GFX9. | Bas Nieuwenhuizen | 2018-12-20 | 5 | -8/+41
  Exactly what the title says: the new addrlib does not allow the above with certain dimensions that the CTS seems to hit. Work around it by not allowing the app to render to it via compat with other 128bpp formats, and by not rendering to it ourselves during copies.
  Fixes: 776b9113656 "amd/addrlib: update Mesa's copy of addrlib"
  Reviewed-by: Nicolai Hähnle
  Reviewed-by: Samuel Pitoiset
* radv: fix subpass image transitions with multiviews | Samuel Pitoiset | 2018-12-20 | 1 | -0/+11
  The driver needs to decompress all image layers if a fast depth/color clear has been performed.
  Signed-off-by: Samuel Pitoiset
  Reviewed-by: Bas Nieuwenhuizen
* radv: drop the amdgpu-skip-threshold=1 workaround for LLVM 8 | Samuel Pitoiset | 2018-12-20 | 1 | -3/+9
  This workaround was introduced by 135e4d434f6 to fix DXVK GPU hangs with many games. It is no longer needed since LLVM r345718.
  Signed-off-by: Samuel Pitoiset
  Reviewed-by: Bas Nieuwenhuizen
* ac/nir: remove the bitfield_extract workaround for LLVM 8 | Samuel Pitoiset | 2018-12-20 | 1 | -9/+15
  This workaround was introduced by 3d41757788a and is no longer needed since LLVM r346422.
  Signed-off-by: Samuel Pitoiset
  Reviewed-by: Marek Olšák
* radv/query: Use 1-bit booleans in query shaders | Jason Ekstrand | 2018-12-19 | 1 | -21/+21
  Fixes: 44227453ec03f "nir: Switch to using 1-bit Booleans for almost..."
  Reviewed-by: Rhys Perry
  Tested-by: Rhys Perry
  Reviewed-by: Bas Nieuwenhuizen
  Tested-by: Bas Nieuwenhuizen
* radv/query: Add a nir_test_flag helper | Jason Ekstrand | 2018-12-19 | 1 | -15/+16
  This is little more than an iadd_imm right now, but it will help in the next commit where we refactor things further.
  Reviewed-by: Rhys Perry
  Tested-by: Rhys Perry
  Reviewed-by: Bas Nieuwenhuizen
  Tested-by: Bas Nieuwenhuizen
* radeonsi/gfx9: use SET_UCONFIG_REG_INDEX packets when available | Nicolai Hähnle | 2018-12-19 | 2 | -0/+3
  Reviewed-by: Marek Olšák
* ac/surface: 3D and cube surfaces are never displayable | Nicolai Hähnle | 2018-12-19 | 1 | -3/+5
  Reviewed-by: Marek Olšák
* amd/common: add i1 special case to ac_build_{inclusive,exclusive}_scan | Nicolai Hähnle | 2018-12-19 | 1 | -2/+25
  Allow for a unified but efficient treatment of adding a bitmask over a wave or an entire threadgroup.
  Reviewed-by: Marek Olšák
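When every lane contributes only a 0 or a 1, an add-scan across a wave reduces to counting set bits in the wave's ballot mask: the exclusive scan counts the lanes strictly below the current one, and the inclusive scan also counts the current lane. The C sketch below models a 64-lane wave on the CPU to illustrate this general ballot-and-popcount technique; the function names are hypothetical and this is not the code in ac_build_{inclusive,exclusive}_scan.

```c
#include <assert.h>
#include <stdint.h>

/* Portable population count (number of set bits). */
static unsigned popcount64(uint64_t x)
{
    unsigned n = 0;
    while (x) {
        x &= x - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Hypothetical helper: exclusive add-scan of 1-bit values, i.e. how many
 * lanes below `lane` have their bit set in `ballot`. */
static unsigned exclusive_scan_i1(uint64_t ballot, unsigned lane)
{
    assert(lane < 64);
    uint64_t below = (UINT64_C(1) << lane) - 1; /* mask of lanes 0..lane-1 */
    return popcount64(ballot & below);
}

/* Hypothetical helper: inclusive scan also counts the current lane's bit. */
static unsigned inclusive_scan_i1(uint64_t ballot, unsigned lane)
{
    return exclusive_scan_i1(ballot, lane) + (unsigned)((ballot >> lane) & 1);
}

/* Hypothetical helper: reduction over the whole wave is the total bit count. */
static unsigned reduce_i1(uint64_t ballot)
{
    return popcount64(ballot);
}
```

For example, with lanes 0, 1 and 3 set (ballot 0xB), lane 3 sees an exclusive sum of 2 and an inclusive sum of 3. On AMD hardware, the "count of set bits below the current lane" step is what the mbcnt instructions compute, which is why a 1-bit scan can avoid a general shuffle-based scan.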
* amd/common: scan/reduce across waves of a workgroup | Nicolai Hähnle | 2018-12-19 | 2 | -4/+227
  Order-aware scan/reduce can trade off LDS traffic for external atomics memory traffic in producer/consumer compute shaders.
  Reviewed-by: Marek Olšák
* amd/common: add ac_build_ifcc | Nicolai Hähnle | 2018-12-19 | 2 | -4/+4
  Reviewed-by: Marek Olšák
* amd/common: whitespace fixes | Nicolai Hähnle | 2018-12-19 | 1 | -10/+8
  Reviewed-by: Marek Olšák
* amd/sid_tables: add additional python3 compatibility imports | Nicolai Hähnle | 2018-12-19 | 1 | -1/+1
  This happened to bite me while doing some experiments.
  Reviewed-by: Marek Olšák
* nir/opt_peephole_select: Don't peephole_select expensive math instructions | Ian Romanick | 2018-12-17 | 1 | -1/+1
  On some GPUs, especially older Intel GPUs, some math instructions are very expensive. On those architectures, don't reduce flow control to a csel if one of the branches contains one of these expensive math instructions.
  This prevents a bunch of cycle count regressions on pre-Gen6 platforms with a later patch (intel/compiler: More peephole select for pre-Gen6).
  v2: Remove stray #if block. Noticed by Thomas.
  Signed-off-by: Ian Romanick
  Reviewed-by: Thomas Helland
  Reviewed-by: Lionel Landwerlin
* nir/opt_peephole_select: Don't try to remove flow control around indirect loads | Ian Romanick | 2018-12-17 | 1 | -1/+1
  That flow control may be trying to avoid invalid loads. On at least some platforms, those loads can also be expensive.
  No shader-db changes on any Intel platform (even with the later patch "intel/compiler: More peephole select").
  v2: Add an 'indirect_load_ok' flag to nir_opt_peephole_select. Suggested by Rob. See also the big comment in src/intel/compiler/brw_nir.c.
  v3: Use nir_deref_instr_has_indirect instead of deref_has_indirect (from nir_lower_io_arrays_to_elements.c).
  v4: Fix inverted condition in brw_nir.c. Noticed by Lionel.
  Signed-off-by: Ian Romanick
  Reviewed-by: Lionel Landwerlin
* radv: Fix multiview depth clears | Bas Nieuwenhuizen | 2018-12-17 | 1 | -8/+21
  We were not using the view mask for depth clears, causing only the first view to be cleared.
  Fixes: 2e86f6b2597 "radv: Add multiview clears."
  Reviewed-by: Samuel Pitoiset
* radv: Remove redundant format check. | Bas Nieuwenhuizen | 2018-12-17 | 1 | -4/+0
  The switch directly after the check has a default case that returns NULL too, so the effective return value is not changed.
  Also, this check is wrong once we start dealing with formats introduced by an extension (e.g. YUV formats).
  Reviewed-by: Samuel Pitoiset
* radv: report Vulkan version 1.1.90 for real | Samuel Pitoiset | 2018-12-17 | 1 | -1/+1
  I thought the value was correctly propagated, but it was not.
  Fixes: 2ac6d55f38c ("radv: bump reported version to 1.1.90")
  Signed-off-by: Samuel Pitoiset
  Reviewed-by: Bas Nieuwenhuizen
* anv,radv: Re-enable VK_EXT_pci_bus_info | Jason Ekstrand | 2018-12-17 | 1 | -1/+1
  Now at version 2 with the fixed header.
  Reviewed-by: Samuel Iglesias Gonsálvez
  Reviewed-by: Lionel Landwerlin
* radv: switch from nir_bcsel to nir_b32csel | Rhys Perry | 2018-12-17 | 1 | -4/+4
  Fixes: 191a1dce928 ('nir: Add 1-bit Boolean opcodes')
  Signed-off-by: Rhys Perry
  Reviewed-by: Samuel Pitoiset
* radv: don't set surf_index for stencil-only images | Rhys Perry | 2018-12-17 | 1 | -1/+1
  Fixes: f8d5b377c8b ('radv: set cb base tile swizzles for MRT speedups (v4)')
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108116
  Signed-off-by: Rhys Perry
  Reviewed-by: Bas Nieuwenhuizen
  Reviewed-by: Samuel Pitoiset
* radv: Fix a stupid if in gather_intrinsic_info | Jason Ekstrand | 2018-12-16 | 1 | -9/+9
  Reviewed-by: Bas Nieuwenhuizen
* nir: Add a bool to int32 lowering pass | Jason Ekstrand | 2018-12-16 | 1 | -0/+4
  We also enable it in all of the NIR drivers.
  Reviewed-by: Eric Anholt
  Reviewed-by: Bas Nieuwenhuizen
  Tested-by: Bas Nieuwenhuizen
* nir: Rename Boolean-related opcodes to include 32 in the name | Jason Ekstrand | 2018-12-16 | 1 | -11/+11
  This is a squash of a bunch of individual changes:
  - nir/builder: Generate 32-bit bool opcodes transparently
  - nir/algebraic: Remap Boolean opcodes to the 32-bit variant
  - Use 32-bit opcodes in the NIR producers and optimizations
    Generated with a little hand-editing and the following sed commands:
      sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c
      sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c
      sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c
      sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c
      sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c
      sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c
      sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c
      sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c
      sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c
      sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c
  - Use 32-bit opcodes in the NIR back-ends
    Generated with a little hand-editing and the following sed commands:
      sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c
      sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c
      sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c
      sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c
      sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c
      sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c
      sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c
      sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c
      sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c
      sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c
  Reviewed-by: Eric Anholt
  Reviewed-by: Bas Nieuwenhuizen
  Tested-by: Bas Nieuwenhuizen
* ac: split 16-bit ssbo loads that may not be dword aligned | Rhys Perry | 2018-12-16 | 1 | -0/+2
  Fixes: 7e7ee826982 ('ac: add support for 16bit buffer loads')
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108114
  Signed-off-by: Rhys Perry
  Reviewed-by: Samuel Pitoiset
* ac: refactor visit_load_buffer | Rhys Perry | 2018-12-16 | 2 | -44/+42
  This is so that we can split different types of loads more easily.
  Signed-off-by: Rhys Perry
  Reviewed-by: Samuel Pitoiset
* radv/xfb: fix counter buffer bounds checks. | Dave Airlie | 2018-12-13 | 1 | -2/+2
  If we gave this function 0 counter buffers, we'd still try to access pCounterBuffers[0], as this check was incorrect.
  Fixes a crash with ext_transform_feedback-pipeline-basic-primgen on zink on radv.
  Fixes: 677b496b6 (radv: fix begin/end transform feedback with 0 counter buffers.)
  Signed-off-by: Dave Airlie
  Reviewed-by: Samuel Pitoiset
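The bug class here is indexing an optional array before checking the index against the array's count. The sketch below shows the guard order in plain C; `struct buffer` and `get_counter_buffer` are hypothetical stand-ins loosely modelled on the (count, pointer) parameter pair of vkCmdBeginTransformFeedbackEXT, not the actual RADV code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a driver buffer object. */
struct buffer {
    uint64_t va;
};

/* Hypothetical helper: return counter buffer `i`, or NULL if none was
 * supplied for that slot. The count is checked before the array is
 * dereferenced: with count == 0 the array may be NULL, so reading
 * counter_buffers[0] unconditionally would crash. */
static const struct buffer *
get_counter_buffer(uint32_t count, const struct buffer *const *counter_buffers,
                   uint32_t i)
{
    if (counter_buffers == NULL || i >= count)
        return NULL;
    return counter_buffers[i]; /* the entry itself may also be NULL */
}
```

Callers then treat a NULL result as "no counter buffer for this slot" instead of reading past the end of the caller's array.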
* radv: don't check if format is depth in radv_image_can_enable_htile() | Samuel Pitoiset | 2018-12-13 | 1 | -1/+0
  This is always TRUE if htile_size is not 0.
  Signed-off-by: Samuel Pitoiset
  Reviewed-by: Bas Nieuwenhuizen
* radv: check if addrlib enabled HTILE in radv_image_can_enable_htile() | Samuel Pitoiset | 2018-12-13 | 1 | -1/+2
  When htile_size is 0, we can't enable HTILE. This doesn't change anything, except not calling radv_image_alloc_htile().
  Signed-off-by: Samuel Pitoiset
  Reviewed-by: Bas Nieuwenhuizen
* radv: switch on EOP when primitive restart is enabled with triangle strips | Samuel Pitoiset | 2018-12-13 | 1 | -2/+1
  Otherwise, Yakuza hangs the GPU with DXVK. We don't know if linestrip and pointlist are affected, so the idea is to do this only for triangle strips.
  Signed-off-by: Samuel Pitoiset
  Reviewed-by: Bas Nieuwenhuizen
* radv: allow to skip DCC decompressions with the new predicate | Samuel Pitoiset | 2018-12-13 | 1 | -6/+13
  Feral games aren't affected because they don't decompress DCC. F1 2018 has one DCC decompression per frame, but I don't see any performance improvements. This new predicate will probably be more useful for DCC/MSAA.
  Signed-off-by: Samuel Pitoiset
  Reviewed-by: Bas Nieuwenhuizen
* radv: add a predicate for reflecting DCC decompression state | Samuel Pitoiset | 2018-12-13 | 5 | -1/+44
  It's somewhat similar to the FCE predicate.
  Signed-off-by: Samuel Pitoiset
  Reviewed-by: Bas Nieuwenhuizen
* radv: bump reported version to 1.1.90 | Samuel Pitoiset | 2018-12-12 | 1 | -1/+1
  After going through the spec changelog, it looks like RADV is up to date. Note that ANV also reports 1.1.90.
  Signed-off-by: Samuel Pitoiset
  Reviewed-by: Bas Nieuwenhuizen
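A reported version such as 1.1.90 is a single packed 32-bit integer. The sketch below re-implements the bit layout of the VK_MAKE_VERSION macro from the Vulkan 1.x headers (major in bits 22 and up, minor in bits 12-21, patch in bits 0-11) as plain C functions so it needs no Vulkan headers; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Same bit layout as VK_MAKE_VERSION(major, minor, patch). */
static uint32_t make_version(uint32_t major, uint32_t minor, uint32_t patch)
{
    assert(minor < (1u << 10) && patch < (1u << 12)); /* field widths */
    return (major << 22) | (minor << 12) | patch;
}

/* Decoders matching VK_VERSION_MAJOR / _MINOR / _PATCH. */
static uint32_t version_major(uint32_t v) { return v >> 22; }
static uint32_t version_minor(uint32_t v) { return (v >> 12) & 0x3ff; }
static uint32_t version_patch(uint32_t v) { return v & 0xfff; }
```

With this layout, `make_version(1, 1, 90)` evaluates to 4198490 (4194304 + 4096 + 90), and decoding that value gives back 1, 1 and 90.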
* anv,radv: Disable VK_EXT_pci_bus_info | Jason Ekstrand | 2018-12-11 | 1 | -1/+1
  The Vulkan working group recently discovered that we made a mistake in assuming that PCI domains are 16-bit even though they can potentially be 32-bit values. To fix this, the next spec update will change the types in the VK_EXT_pci_bus_info struct to be 32 bits, which will be a backwards-incompatible change. Normally, Khronos tries very hard to never make backwards-incompatible changes to specs. Hopefully, the extension is new enough (2 months) that there are no shipping apps which use the extension, so this should be safe.
  This commit disables the extension for both anv and radv in mesa and should be back-ported to 18.3 ASAP so we avoid any potential issues with new apps running on old drivers. I'll send out a commit (which we can also back-port to 18.3 if we really care) to re-enable the extension in both drivers once this week's spec update ships.
  The one known use of this extension is internal to mesa and will continue working with the extension disabled and will naturally update when we get a new header.
  Cc: "18.3"
  Acked-by: Lionel Landwerlin
  Acked-by: Samuel Pitoiset
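The underlying problem is plain integer truncation: a 32-bit PCI domain stored in a 16-bit field silently loses its upper bits, so distinct domains can become indistinguishable. The sketch below uses hypothetical struct names, not the real VkPhysicalDevicePCIBusInfoPropertiesEXT definition, to show two different domains collapsing to the same 16-bit value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layouts: a narrow 16-bit domain field and a widened
 * 32-bit one. */
struct pci_info_narrow { uint16_t domain; uint8_t bus, device, function; };
struct pci_info_wide   { uint32_t domain; uint32_t bus, device, function; };

/* Storing a 32-bit domain in the narrow layout keeps only the low 16 bits. */
static struct pci_info_narrow to_narrow(uint32_t domain)
{
    struct pci_info_narrow info = {0};
    info.domain = (uint16_t)domain; /* truncation: bits 16..31 are lost */
    return info;
}

/* The wide layout preserves the full value. */
static struct pci_info_wide to_wide(uint32_t domain)
{
    struct pci_info_wide info = {0};
    info.domain = domain;
    return info;
}
```

Domains 0x10000 and 0x0 both come out of `to_narrow` as 0, which is why widening the field changes the struct layout and therefore breaks compatibility with binaries built against the old header.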
* amd/addrlib: drop si_ci_vi_merged_enum.h from the list | Emil Velikov | 2018-12-10 | 1 | -1/+0
  Fixes: 776b9113656 ("amd/addrlib: update Mesa's copy of addrlib")
  Signed-off-by: Emil Velikov
* amd: remove support for LLVM 6.0 | Samuel Pitoiset | 2018-12-06 | 14 | -325/+47
  Users are encouraged to switch to LLVM 7.0, released in September 2018.
  Signed-off-by: Samuel Pitoiset
  Reviewed-by: Marek Olšák
* nir: Make boolean conversions sized just like the others | Jason Ekstrand | 2018-12-05 | 1 | -4/+8
  Instead of a single i2b and b2i, we now have i2b32 and b2iN where N is one of 8, 16, 32, or 64. This leads to having a few more opcodes, but now everything is consistent and booleans aren't a weird special case anymore.
  Reviewed-by: Connor Abbott
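As a rough model of the two conversion families: b2iN turns a Boolean into the N-bit integer 0 or 1, and i2b32 turns an integer into a 32-bit Boolean. The C functions below model those semantics on the CPU, assuming NIR's 32-bit Boolean convention of 0 for false and ~0 (all bits set) for true. The function names mirror the opcode names but are hypothetical helpers, not NIR API.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 32-bit Boolean convention: false = 0, true = all bits set. */
#define B32_FALSE UINT32_C(0)
#define B32_TRUE  UINT32_C(0xffffffff)

/* i2b32: any non-zero integer becomes a 32-bit "true". */
static uint32_t i2b32(int64_t x) { return x != 0 ? B32_TRUE : B32_FALSE; }

/* b2iN: a Boolean becomes the integer 0 or 1 of the requested width. */
static int8_t  b2i8(uint32_t b)  { return b != 0 ? 1 : 0; }
static int16_t b2i16(uint32_t b) { return b != 0 ? 1 : 0; }
static int32_t b2i32(uint32_t b) { return b != 0 ? 1 : 0; }
static int64_t b2i64(uint32_t b) { return b != 0 ? 1 : 0; }
```

Giving each destination width its own opcode is what makes b2i behave like the other sized conversions (for example i2f32 or u2u16), instead of leaving the result size implicit.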
* radv: expose VK_EXT_scalar_block_layout | Samuel Pitoiset | 2018-12-05 | 2 | -0/+7
  Nothing to do, the compiler already handles that. All new dEQP.VK.ubo.* and dEQP.VK.ssbo.* pass, except some 16-bit tests that are quite related to fdo bug #108114.
  Only enable the extension on CIK+ because it might not work on SI.
  Signed-off-by: Samuel Pitoiset
  Reviewed-by: Bas Nieuwenhuizen
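VK_EXT_scalar_block_layout relaxes block alignment so that a vector only needs the alignment of its component type. Under std430, by contrast, a three-component vector is aligned like a four-component one. The sketch below computes member offsets for a hypothetical block `{ float a; vec3 b; float c; }` under both rules. The `struct member` description and the functions are illustrative only, not driver or compiler code, and only scalar and vector members are handled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical block member: `components` scalars of `scalar_size` bytes. */
struct member {
    unsigned scalar_size; /* e.g. 4 for float */
    unsigned components;  /* 1 for a scalar, 2-4 for vectors */
};

static unsigned align_up(unsigned offset, unsigned alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1) & ~(alignment - 1);
}

/* std430: scalar -> N, vec2 -> 2N, vec3/vec4 -> 4N.
 * Scalar layout: always N, the component size. */
static unsigned member_alignment(const struct member *m, int scalar_layout)
{
    if (scalar_layout || m->components == 1)
        return m->scalar_size;
    return m->scalar_size * (m->components == 2 ? 2 : 4);
}

/* Writes each member's byte offset into `offsets` and returns the end offset
 * of the last member (final struct padding is not applied). */
static unsigned layout_block(const struct member *members, size_t count,
                             int scalar_layout, unsigned *offsets)
{
    unsigned offset = 0;
    for (size_t i = 0; i < count; i++) {
        offset = align_up(offset, member_alignment(&members[i], scalar_layout));
        offsets[i] = offset;
        offset += members[i].scalar_size * members[i].components;
    }
    return offset;
}
```

For `{ float a; vec3 b; float c; }` this gives offsets 0, 16, 28 under std430 and 0, 4, 16 under scalar layout. The tighter packing is what the compiler must already handle for the extension to be exposed without further driver work.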
* radv: wait on the high 32 bits of timestamp queries | Samuel Pitoiset | 2018-12-05 | 1 | -1/+4
  In case we are unlucky and the low part is 0xffffffff.
  Fixes: 5d6a560a29 ("radv: do not use the availability bit for timestamp queries")
  Signed-off-by: Samuel Pitoiset
  Reviewed-by: Bas Nieuwenhuizen
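The failure can be reproduced on the CPU. For this sketch, assume that a timestamp slot is pre-filled with an all-ones "not ready" sentinel and that the GPU later overwrites it with a 64-bit timestamp. A check that only looks at the low dword mistakes a real timestamp whose low half happens to be 0xffffffff for "not ready". The high dword of a real timestamp does not reach 0xffffffff in practice, because the counter would have to run for an extremely long time first. The sentinel and function names below are hypothetical, not RADV's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sentinel stored in the query slot before the GPU writes it. */
#define TIMESTAMP_NOT_READY UINT64_MAX

/* Fragile check: only inspects the low 32 bits. */
static bool ready_low_dword(uint64_t slot)
{
    return (uint32_t)slot != UINT32_C(0xffffffff);
}

/* Robust check: inspects the high 32 bits instead. */
static bool ready_high_dword(uint64_t slot)
{
    return (uint32_t)(slot >> 32) != UINT32_C(0xffffffff);
}
```

For a real timestamp such as 0x00000001ffffffff, `ready_low_dword` wrongly reports "not ready" (so a wait on it would never finish), while `ready_high_dword` correctly reports it as written. Both still report the sentinel itself as not ready.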
* radv: reset pending_reset_query when flushing caches | Samuel Pitoiset | 2018-12-05 | 2 | -1/+5
  If the driver used a compute shader for resetting a query pool, it should be completed when caches are flushed.
  This might reduce the number of stalls if operations are done between vkCmdResetQueryPool() and vkCmdBeginQuery() (or vkCmdWriteTimestamp()).
  Signed-off-by: Samuel Pitoiset
  Reviewed-by: Bas Nieuwenhuizen
  Reviewed-by: Alex Smith