path: root/src/amd
Commit message | Author | Age | Files | Lines
* radv: Set partial_vs_wave for pipelines with just GS, not tess. | Bas Nieuwenhuizen | 2019-01-15 | 1 | -8/+20
  Looking at -pro we need to enable it for pipelines with just a GS too.

  This seems to reduce the hangs from
  https://bugs.freedesktop.org/show_bug.cgi?id=109242
  on an RX 550 to the point where I can't reproduce them, after the false start with the wd_switch_on_eop patch due to flakiness. (But people are reporting it does not fix the issue completely for them on Polaris 11.)

  CC: <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* ac: add missing 16-bit types to glsl_base_to_llvm_type() | Samuel Pitoiset | 2019-01-14 | 1 | -1/+6
  Fix crashes with
  dEQP-VK.spirv_assembly.instruction.compute.workgroup_memory.*16

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Only use 32 KiB per threadgroup on Stoney. | Bas Nieuwenhuizen | 2019-01-14 | 1 | -1/+10
  Causes hangs on some machines.

  What works for dEQP-VK.tessellation.shader_input_output.barrier:
  - running num_patches = 6 (which limits LDS to 32 KiB)
  - running num_patches = 8, and artificially cutting LDS size at 32 KiB.

  CC: <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* ac/nir: set cache policy when loading/storing buffer images | Samuel Pitoiset | 2019-01-14 | 1 | -14/+11
  This was missing.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/nir: add get_cache_policy() helper and use it | Samuel Pitoiset | 2019-01-14 | 1 | -12/+26
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* amd/common: Restore v4i32 suffix for llvm.SI.load.const intrinsic | Michel Dänzer | 2019-01-14 | 1 | -1/+1
  It was accidentally dropped in commit e4803ab7d2b6 "amd/common: use llvm.amdgcn.s.buffer.load for LLVM 8.0", breaking the universe with LLVM 7.

  Trivial.
* amd/common/vi+: enable SMEM loads with GLC=1 | Nicolai Hähnle | 2019-01-14 | 1 | -3/+7
  Only on LLVM 8.0+, which supports the new intrinsic.

  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* amd/common: use llvm.amdgcn.s.buffer.load for LLVM 8.0 | Nicolai Hähnle | 2019-01-14 | 1 | -4/+8
  llvm.SI.load.const is deprecated.

  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: remove a few more unnecessary KHR suffixes | Eric Engestrom | 2019-01-10 | 3 | -11/+11
  Cc: Bas Nieuwenhuizen <[email protected]>
  Signed-off-by: Eric Engestrom <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]> (v1)
* ac/nir,radv,radeonsi/nir: use correct indices for interpolation intrinsics | Rhys Perry | 2019-01-09 | 3 | -1/+6
  Fixes artifacts in World of Warcraft when Multi-sample Alpha-Test is enabled with DXVK. It also fixes artifacts with Fallout 4's god rays with DXVK. Various piglit interpolateAt*() tests under NIR are also fixed.

  v2: formatting fix
      update commit message to include Fallout 4 and the Fixes tag

  Fixes: f4e499ec791 ('radv: add initial non-conformant radv vulkan driver')
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106595
  Signed-off-by: Rhys Perry <[email protected]>
* radv: skip draws with instance_count == 0 | Samuel Pitoiset | 2019-01-09 | 1 | -0/+13
  Loosely based on RadeonSI.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: enable variable pointers | Samuel Pitoiset | 2019-01-09 | 1 | -1/+1
  The Vulkan spec 1.1.97 says:

     "variablePointers specifies whether the implementation supports the SPIR-V VariablePointers capability. When this feature is not enabled, shader modules must not declare the VariablePointers capability."

  As the SPIR-V feature is enabled, we should turn on the extension feature as well.

  All dEQP-VK.spirv_assembly.instruction.compute.variable_pointers.* pass with the Khronos internal repo. Note that a bunch of them fail with the public repo, but that is expected as they violate the specification.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: get rid of bunch of KHR suffixes | Samuel Pitoiset | 2019-01-09 | 10 | -164/+164
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Acked-by: Eric Engestrom <[email protected]>
  Signed-off-by: Samuel Pitoiset <[email protected]>
* nir: rename global/local to private/function memory | Karol Herbst | 2019-01-08 | 2 | -7/+7
  The naming is a bit confusing no matter how you look at it. Within SPIR-V "global" memory is memory accessible from all threads. GLSL "global" memory normally refers to shader thread private memory declared at global scope. As we already use "shared" for memory shared across all threads of a work group, the solution everybody could be happy with is to rename "global" to "private" and use "global" later for memory usually stored within system accessible memory (be it VRAM or system RAM if keeping SVM in mind). GLSL "local" memory is memory only accessible within a function, while SPIR-V "local" memory is memory accessible within the same workgroup.

  v2: rename local to function as well
  v3: rename vtn_variable_mode_local as well

  Signed-off-by: Karol Herbst <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* spirv: Sort supported capabilities | Jason Ekstrand | 2019-01-07 | 1 | -12/+12
  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* spirv: Add support for using derefs for UBO/SSBO access | Jason Ekstrand | 2019-01-08 | 1 | -0/+1
  For now, it's hidden behind a cap. Hopefully, we can eventually drop that along with all the manual offset code in spirv_to_nir.

  Reviewed-by: Alejandro Piñeiro <[email protected]>
  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
  Tested-by: Bas Nieuwenhuizen <[email protected]>
* spirv: Add explicit pointer types | Jason Ekstrand | 2019-01-08 | 1 | -0/+4
  Instead of baking in uvec2 for UBO and SSBO pointers and uint for push constant and shared memory pointers, make it configurable.

  Reviewed-by: Alejandro Piñeiro <[email protected]>
  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir: Move propagation of cast derefs to a new nir_opt_deref pass | Jason Ekstrand | 2019-01-08 | 1 | -1/+1
  We're going to want to do more deref optimizations going forward and this gives us a central place to do them. Also, cast propagation will get a bit more complicated with the addition of ptr_as_array derefs.

  Reviewed-by: Alejandro Piñeiro <[email protected]>
  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* radv: Fix rasterization precision bits. | Bas Nieuwenhuizen | 2019-01-07 | 1 | -3/+3
  Note that these limits are exact, not a "precision is at least x", as texel coords also get snapped to a multiple of this step size before filtering.

  This fixes CTS tests
  dEQP-VK.texture.explicit_lod.2d.sizes.31x55_nearest_linear_mipmap_nearest_repeat
  dEQP-VK.texture.explicit_lod.2d.sizes.57x35_nearest_linear_mipmap_nearest_repeat

  Fixes: f4e499ec791 "radv: add initial non-conformant radv vulkan driver"
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109151
  Reviewed-by: Samuel Pitoiset <[email protected]>
* amd/common: Add some parentheses to silence warning. | Bas Nieuwenhuizen | 2019-01-07 | 1 | -2/+2
  [1/59] Compiling C object 'src/amd/common/src@amd@common@@amd_common@sta/ac_nir_to_llvm.c.o'.
  ../mesa/src/amd/common/ac_nir_to_llvm.c: In function ‘get_inst_tessfactor_writemask’:
  ../mesa/src/amd/common/ac_nir_to_llvm.c:4089:32: warning: suggest parentheses around ‘+’ inside ‘<<’ [-Wparentheses]
     writemask = ((1 << num_comps + 1) - 1) << first_component;
                        ~~~~~~~~~~^~~
  ../mesa/src/amd/common/ac_nir_to_llvm.c:4091:33: warning: suggest parentheses around ‘+’ inside ‘<<’ [-Wparentheses]
     writemask = (((1 << num_comps + 1) - 1) << first_component) << 4;

  Reviewed-by: Samuel Pitoiset <[email protected]>
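  The warning above comes down to C operator precedence: `+` binds tighter than `<<`, so `1 << num_comps + 1` is parsed as `1 << (num_comps + 1)`. The sketch below spells out both possible readings with explicit parentheses. The function names are hypothetical and are not taken from ac_nir_to_llvm.c; only the shape of the expression comes from the compiler output.

```c
#include <assert.h>
#include <stdint.h>

/* How C parses "1 << num_comps + 1": the addition is evaluated first.
 * Hypothetical helper, for illustration only. */
static uint32_t mask_shift_of_sum(unsigned num_comps, unsigned first_component)
{
    assert(num_comps + 1 < 32 && first_component < 32);
    return ((UINT32_C(1) << (num_comps + 1)) - 1) << first_component;
}

/* The other reading a human might have intended: shift first, then add.
 * Hypothetical helper, for illustration only. */
static uint32_t mask_sum_of_shift(unsigned num_comps, unsigned first_component)
{
    assert(num_comps < 32 && first_component < 32);
    return (((UINT32_C(1) << num_comps) + 1) - 1) << first_component;
}
```

  With num_comps = 2 and first_component = 1 the two readings give different masks, which is why gcc asks for the intent to be written out.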
* radv: Remove unused variable. | Bas Nieuwenhuizen | 2019-01-07 | 1 | -1/+0
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Remove device path. | Bas Nieuwenhuizen | 2019-01-07 | 2 | -3/+0
  Unused, and gcc complains about strncpy. (From what I can see, this is because strncpy does not leave a 0 byte on truncation. That said, we don't use it, so this does not fix a real bug.)

  Reviewed-by: Samuel Pitoiset <[email protected]>
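  The strncpy behaviour mentioned above is standard C: when the source is at least as long as the size argument, the destination receives no terminating 0 byte. A minimal sketch of the usual defensive pattern follows. The helper name `copy_truncating` is hypothetical; the commit itself simply removes the unused field rather than adding anything like this.

```c
#include <assert.h>
#include <string.h>

/* Copy src into dst, truncating if needed, and always terminate.
 * Hypothetical helper, not part of radv. */
static void copy_truncating(char *dst, size_t dst_size, const char *src)
{
    assert(dst_size > 0);
    /* Copy at most dst_size - 1 bytes so one byte is left for the terminator. */
    strncpy(dst, src, dst_size - 1);
    /* strncpy does not write a 0 byte when it truncates, so add it by hand. */
    dst[dst_size - 1] = '\0';
}
```

  Reserving the last byte and writing the terminator explicitly keeps the buffer a valid C string whether or not truncation happened.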
* ac: remove unused variable from ac_build_ddxy | Marek Olšák | 2019-01-07 | 1 | -1/+1
  trivial
* radv: Implement buffer stores with less than 4 components. | Bas Nieuwenhuizen | 2019-01-07 | 1 | -5/+14
  We started using it in the btoi paths for r32g32b32, and the LLVM IR checker will complain about it because we end up with intrinsics with the wrong type extension in the name.

  Fixes: 593996bc02 ("radv: implement buffer to image operations for R32G32B32")
  Reviewed-by: Samuel Pitoiset <[email protected]>
* nir: rename nir_link_constant_varyings() nir_link_opt_varyings() | Timothy Arceri | 2019-01-02 | 1 | -2/+2
  The following patches will add support for an additional optimisation so this function will no longer just optimise varying constants.

  Tested-by: Dieter Nützel <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* ac/nir_to_llvm: add ac_are_tessfactors_def_in_all_invocs() | Timothy Arceri | 2019-01-02 | 2 | -0/+163
  The following patch will use this with the radeonsi NIR backend, but I've added it to ac so we can use it with RADV in future.

  This is a NIR implementation of the TGSI function tgsi_scan_tess_ctrl().

  Tested-by: Dieter Nützel <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* radv: Do a cache flush if needed before reading predicates. | Bas Nieuwenhuizen | 2018-12-31 | 1 | -0/+2
  This caused random failures for two conditional rendering tests:

  dEQP-VK.conditional_rendering.draw_clear.draw.update_with_rendering_discard
  dEQP-VK.conditional_rendering.draw_clear.draw.update_with_rendering_no_discard

  These wrote the predicate with the vertex shader, did a barrier and then started the conditional rendering. However the cache flushes for the barrier only happen on first draw, so after the predicate has been read.

  Fixes: e45ba51ea45 "radv: add support for VK_EXT_conditional_rendering"
  Reviewed-by: Dave Airlie <[email protected]>
* radv: Fix wrongly positioned paren. | Bas Nieuwenhuizen | 2018-12-21 | 1 | -1/+1
  Trivial.

  Fixes: 9f0bfbed11f "radv: Work around non-renderable 128bpp compressed 3d textures on GFX9."
* radv: enable shaderStorageImageMultisample feature on GFX8+ | Samuel Pitoiset | 2018-12-20 | 3 | -4/+4
  Untested on older chips.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add support for FMASK expand | Samuel Pitoiset | 2018-12-20 | 7 | -0/+335
  Original patch by Dave Airlie.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: initialize FMASK for images in fully expanded mode | Samuel Pitoiset | 2018-12-20 | 4 | -0/+39
  The value depends on the number of samples.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/nir: restrict fmask lookup to image load intrinsics | Samuel Pitoiset | 2018-12-20 | 1 | -1/+1
  We don't ever want to do the fmask lookup on an atomic or store; the fmask should have been decompressed if the surface has been moved to IMAGE_LAYOUT.

  Original patch by Dave Airlie.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: compute optimal VM alignment for imported buffers | Samuel Pitoiset | 2018-12-20 | 1 | -1/+30
  This fixes GPU hangs on GFX9 with
  dEQP-VK.memory.external_memory_host.bind_image_memory_and_render.with_zero_offset.*

  Copied from RadeonSI.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Work around non-renderable 128bpp compressed 3d textures on GFX9. | Bas Nieuwenhuizen | 2018-12-20 | 5 | -8/+41
  Exactly what the title says: the new addrlib does not allow the above with certain dimensions that the CTS seems to hit. Work around it by not allowing the app to render to it via compat with other 128bpp formats, and by not rendering to it ourselves during copies.

  Fixes: 776b9113656 "amd/addrlib: update Mesa's copy of addrlib"
  Reviewed-by: Nicolai Hähnle <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: fix subpass image transitions with multiviews | Samuel Pitoiset | 2018-12-20 | 1 | -0/+11
  The driver needs to decompress all image layers if a fast depth/color clear has been performed.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: drop the amdgpu-skip-threshold=1 workaround for LLVM 8 | Samuel Pitoiset | 2018-12-20 | 1 | -3/+9
  This workaround was introduced by 135e4d434f6 to fix DXVK GPU hangs with many games. It is no longer needed since LLVM r345718.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/nir: remove the bitfield_extract workaround for LLVM 8 | Samuel Pitoiset | 2018-12-20 | 1 | -9/+15
  This workaround was introduced by 3d41757788a and is no longer needed since LLVM r346422.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* radv/query: Use 1-bit booleans in query shaders | Jason Ekstrand | 2018-12-19 | 1 | -21/+21
  Fixes: 44227453ec03f "nir: Switch to using 1-bit Booleans for almost..."
  Reviewed-by: Rhys Perry <[email protected]>
  Tested-by: Rhys Perry <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Tested-by: Bas Nieuwenhuizen <[email protected]>
* radv/query: Add a nir_test_flag helper | Jason Ekstrand | 2018-12-19 | 1 | -15/+16
  This is little more than an iadd_imm right now but it will help in the next commit where we refactor things further.

  Reviewed-by: Rhys Perry <[email protected]>
  Tested-by: Rhys Perry <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Tested-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi/gfx9: use SET_UCONFIG_REG_INDEX packets when available | Nicolai Hähnle | 2018-12-19 | 2 | -0/+3
  Reviewed-by: Marek Olšák <[email protected]>
* ac/surface: 3D and cube surfaces are never displayable | Nicolai Hähnle | 2018-12-19 | 1 | -3/+5
  Reviewed-by: Marek Olšák <[email protected]>
* amd/common: add i1 special case to ac_build_{inclusive,exclusive}_scan | Nicolai Hähnle | 2018-12-19 | 1 | -2/+25
  Allow for a unified but efficient treatment of adding a bitmask over a wave or an entire threadgroup.

  Reviewed-by: Marek Olšák <[email protected]>
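  As a rough CPU-side illustration of why 1-bit values allow a cheaper scan: if each lane contributes one bit to a 64-bit mask, the prefix sum for a lane is just a population count of the bits below it. This is a sketch under that assumption, not the LLVM IR that ac_build_{inclusive,exclusive}_scan emits; all names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Count the set bits of a 64-bit mask (portable, no compiler builtins). */
static unsigned popcount64(uint64_t v)
{
    unsigned n = 0;
    for (; v != 0; v &= v - 1)  /* each iteration clears the lowest set bit */
        n++;
    return n;
}

/* Exclusive scan for one lane: number of set flags in lanes strictly below it.
 * "ballot" holds one bit per lane of a (hypothetical) 64-lane wave. */
static unsigned exclusive_scan_i1(uint64_t ballot, unsigned lane)
{
    assert(lane < 64);
    return popcount64(ballot & ((UINT64_C(1) << lane) - 1));
}

/* Inclusive scan: the exclusive result plus the lane's own flag. */
static unsigned inclusive_scan_i1(uint64_t ballot, unsigned lane)
{
    return exclusive_scan_i1(ballot, lane) + (unsigned)((ballot >> lane) & 1);
}
```

  For the mask 0xB (lanes 0, 1 and 3 set), lane 3 sees an exclusive sum of 2 and an inclusive sum of 3.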
* amd/common: scan/reduce across waves of a workgroup | Nicolai Hähnle | 2018-12-19 | 2 | -4/+227
  Order-aware scan/reduce can trade off LDS traffic for external atomics memory traffic in producer/consumer compute shaders.

  Reviewed-by: Marek Olšák <[email protected]>
* amd/common: add ac_build_ifcc | Nicolai Hähnle | 2018-12-19 | 2 | -4/+4
  Reviewed-by: Marek Olšák <[email protected]>
* amd/common: whitespace fixes | Nicolai Hähnle | 2018-12-19 | 1 | -10/+8
  Reviewed-by: Marek Olšák <[email protected]>
* amd/sid_tables: add additional python3 compatibility imports | Nicolai Hähnle | 2018-12-19 | 1 | -1/+1
  This happened to bite me while doing some experiments.

  Reviewed-by: Marek Olšák <[email protected]>
* nir/opt_peephole_select: Don't peephole_select expensive math instructions | Ian Romanick | 2018-12-17 | 1 | -1/+1
  On some GPUs, especially older Intel GPUs, some math instructions are very expensive. On those architectures, don't reduce flow control to a csel if one of the branches contains one of these expensive math instructions.

  This prevents a bunch of cycle count regressions on pre-Gen6 platforms with a later patch (intel/compiler: More peephole select for pre-Gen6).

  v2: Remove stray #if block. Noticed by Thomas.

  Signed-off-by: Ian Romanick <[email protected]>
  Reviewed-by: Thomas Helland <[email protected]>
  Reviewed-by: Lionel Landwerlin <[email protected]>
* nir/opt_peephole_select: Don't try to remove flow control around indirect loads | Ian Romanick | 2018-12-17 | 1 | -1/+1
  That flow control may be trying to avoid invalid loads. On at least some platforms, those loads can also be expensive.

  No shader-db changes on any Intel platform (even with the later patch "intel/compiler: More peephole select").

  v2: Add an 'indirect_load_ok' flag to nir_opt_peephole_select. Suggested by Rob. See also the big comment in src/intel/compiler/brw_nir.c.

  v3: Use nir_deref_instr_has_indirect instead of deref_has_indirect (from nir_lower_io_arrays_to_elements.c).

  v4: Fix inverted condition in brw_nir.c. Noticed by Lionel.

  Signed-off-by: Ian Romanick <[email protected]>
  Reviewed-by: Lionel Landwerlin <[email protected]>
* radv: Fix multiview depth clears | Bas Nieuwenhuizen | 2018-12-17 | 1 | -8/+21
  We were not using the view mask for depth clears, causing only the first view to be cleared.

  Fixes: 2e86f6b2597 "radv: Add multiview clears."
  Reviewed-by: Samuel Pitoiset <[email protected]>
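  To illustrate the bug class described above, the sketch below walks every bit of a view mask and clears the matching view, instead of touching only view 0. It is a simplified CPU model with hypothetical names (a plain array stands in for the layered depth attachment); it does not reproduce radv's clear path.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_VIEWS 32u

/* Clear every view selected by view_mask and return how many were cleared.
 * depth_per_view is a hypothetical stand-in with one depth value per view
 * and must hold MAX_VIEWS entries. */
static unsigned clear_depth_views(float *depth_per_view, uint32_t view_mask,
                                  float clear_value)
{
    unsigned cleared = 0;

    assert(depth_per_view);
    for (unsigned view = 0; view < MAX_VIEWS; view++) {
        /* Skip views that are not part of the current subpass. */
        if (!(view_mask & (UINT32_C(1) << view)))
            continue;
        depth_per_view[view] = clear_value;
        cleared++;
    }
    return cleared;
}
```

  With a view mask of 0x5, views 0 and 2 are cleared and view 1 is left untouched.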
* radv: Remove redundant format check. | Bas Nieuwenhuizen | 2018-12-17 | 1 | -4/+0
  The switch directly after the check has a default case that returns NULL too, so the effective return value is not changed.

  Also this check is wrong once we start dealing with formats introduced by an extension (e.g. YUV formats).

  Reviewed-by: Samuel Pitoiset <[email protected]>