summaryrefslogtreecommitdiffstats
path: root/src/amd
Commit message (Collapse)AuthorAgeFilesLines
* ac: fix possible truncation of intrinsic nameTimothy Arceri2018-06-081-1/+1
| | | | | | | | Fixes the gcc warning: snprintf’ output between 26 and 33 bytes into a destination of size 32 Fixes: d5f7ebda3ec0 ("ac: add LLVM build functions for subgroup instrinsics") Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* amd/common: Fix number of coords for getlod.Bas Nieuwenhuizen2018-06-071-3/+18
| | | | | | | | | | The LLVM 6 code reduced it to a non-array call. We need to do that with the new code too. This fixes dEQP-VK.glsl.texture_functions.query.texturequerylod.*array* for radv. Fixes: a9a79934412 "amd/common: use the dimension-aware image intrinsics on LLVM 7+" Reviewed-by: Dave Airlie <[email protected]>
* radv: fix Coverity no effect control flow issueTimothy Arceri2018-06-071-1/+1
| | | | | swizzle is unsigned so "desc->swizzle[c] < 0" is never true. Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Use correct color format for fast clearsPhilip Rebohle2018-06-051-2/+2
| | | | | | | | | Using the image format is incorrect when the view has a different format than the image. Instead, the view format needs to be used. Reviewed-by: Bas Nieuwenhuizen <[email protected]> CC: 18.1 <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106687
* radv: Do not hardcode fast clear formats.Bas Nieuwenhuizen2018-06-051-180/+73
| | | | | | | | except for the odd one out. This should support many more formats. Reviewed-by: Dave Airlie <[email protected]>
* amd/common: use the dimension-aware image intrinsics on LLVM 7+Nicolai Hähnle2018-06-041-24/+165
| | | | | | Requires LLVM trunk r329166. Acked-by: Marek Olšák <[email protected]>
* radv: fix a GPU hang when MRTs are sparseSamuel Pitoiset2018-06-041-0/+10
| | | | | | | | | | | | | | When the i-th target format is set, all previous target formats must be non-zero to avoid hangs. In other words, without this if a fragment shader exports mrt0, mrt2 and mrt3, the GPU hangs because the target format of mrt1 is zero. This fixes DXVK GPU hangs with "Seven: The Days Long Gone", "GTA V" and probably more games. Cc: "18.0" 18.1" <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Don't pass a TESS_EVAL shader when tesselation is not enabled.Bas Nieuwenhuizen2018-06-041-0/+2
| | | | | | | | | | | | | | | | Otherwise on pre-GFX9, if the constant layout allows both TESS_EVAL and GEOMETRY shaders, but the PIPELINE has only GEOMETRY, it would return the GEOMETRY shader for the TESS_EVAL shader. This would cause the flush_constants code to emit the GEOMETRY constants to the TESS_EVAL registers and then conclude that it did not need to set the GEOMETRY shader registers. Fixes: dfff9fb6f8d "radv: Handle GFX9 merged shaders in radv_flush_constants()" CC: 18.1 <[email protected]> Reviewed-by: Alex Smith <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radv: Handle GFX9 merged shaders in radv_flush_constants()Alex Smith2018-06-011-1/+8
| | | | | | | | | | | | | | | | | | | | This was not previously handled correctly. For example, push_constant_stages might only contain MESA_SHADER_VERTEX because only that stage was changed by CmdPushConstants or CmdBindDescriptorSets. In that case, if vertex has been merged with tess control, then the push constant address wouldn't be updated since pipeline->shaders[MESA_SHADER_VERTEX] would be NULL. Use radv_get_shader() instead of getting the shader directly so that we get the right shader if merged. Also, skip emitting the address redundantly - if two merged stages are set in push_constant_stages this change would have made the address get emitted twice. Signed-off-by: Alex Smith <[email protected]> Cc: "18.1" <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Consolidate GFX9 merged shader lookup logicAlex Smith2018-06-013-35/+26
| | | | | | | | | This was being handled in a few different places, consolidate it into a single radv_get_shader() function. Signed-off-by: Alex Smith <[email protected]> Cc: "18.1" <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Set active_stages the same whether or not shaders were cachedAlex Smith2018-06-011-5/+2
| | | | | | | | | | | | With GFX9 merged shaders, active_stages would be set to the original stages specified if shaders were not cached, but to the stages still present after merging if they were. Be consistent and use the original stages. Signed-off-by: Alex Smith <[email protected]> Cc: "18.1" <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Add startup debug option.Bas Nieuwenhuizen2018-05-314-2/+50
| | | | | | | | | | | | | | | | This adds a RADV_DEBUG=startup option to dump more info about instance creation and device enumeration. A common question end users have is why the direver is not loading for them, and this has two common reasons: 1) They did not install the driver. 2) AMDGPU is not used for the card in the kernel. This adds some info messages so we can easily get a some useful output from end users. Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Add option to print errors even in optimized builds.Bas Nieuwenhuizen2018-05-3114-96/+108
| | | | | | | | | | | Errors are not that common of a case so we can eat a slight perf hit in having to call a function and do a runtime check. In turn this makes debugging random errors happening for end users easier, because they don't have to have a debug build on hand. Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Make the sem_info allocate/free functions static.Bas Nieuwenhuizen2018-05-312-15/+9
| | | | | | | They are only used in 1 file. Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Only expose subgroup shuffles on VI+.Bas Nieuwenhuizen2018-05-301-2/+5
| | | | | | | | | The current implementation depends on bpermute, which is VI+. Fixes: f2c6a550611 "radv: enable subgroup capabilities" Reviewed-by: Daniel Schürmann <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: fix emitting descriptor pointers with LLVM < 7Samuel Pitoiset2018-05-301-2/+4
| | | | | | | | | | This was terribly wrong, I forced use of 32-bit pointers when emitting shader descriptor pointers. This fixes GPU hangs with LLVM 5&6 because 32-bit pointers are only supported with LLVM 7. Fixes: 88d1ed0f81 ("radv: emit shader descriptor pointers consecutively") Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: emit shader descriptor pointers consecutivelySamuel Pitoiset2018-05-291-47/+57
| | | | | | | | | | | | This reduces the number of SET_SH_REG packets which are emitted for applications that use more than one descriptor set per stage. We should be able to emit more SET_SH_REG packets consecutively (like push constants and vertex buffers for the vertex stage), but this will be improved later. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: allow radv_emit_shader_pointer_head() to emit more pointersSamuel Pitoiset2018-05-291-3/+5
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: split radv_emit_shader_pointer()Samuel Pitoiset2018-05-291-5/+20
| | | | | | | | | This will allow to emit consecutive shader pointers for reducing the number of emitted SET_SH_REG packets, which is recommended. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Implement VK_KHR_draw_indirect_count.Bas Nieuwenhuizen2018-05-282-0/+50
| | | | | | | | | Literally the same as the AMD ext. Passes *indirect_draw_count* CTS tests. Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Implement alternate GFX9 scissor workaround.Bas Nieuwenhuizen2018-05-281-33/+47
| | | | | | | | | | | | | | | | | | | | | This improves dota2 performance for me by 11% when I force the GPU DPM level to low (otherwise dota2 is CPU limited for 4k on my threadripper), which should be a large part of the radv-amdvlk gap. (For me with that was radv 60.3 -> 66.6, while AMDVLK does about 68 fps) It looks like dota2 rendered the GUI with a bunch of draws with a SetScissors before almost each draw, causing a lot of pipeline stalls. I'm not really happy with the duplication of code, but overriding radeon_set_context_reg would also be messy since we have the pre-recorded pipelines and a bunch of si_cmd_buffer code, as well as some memory->context reg loads for which things would be more complicated. Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: run the EarlyCSEMemSSA LLVM passSamuel Pitoiset2018-05-251-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | It's recommended by the instruction combining pass, and RadeonSI also runs it. This pass used to segfault with one shader of F12017 in the past, but it no longer crashes. Maybe the LLVM IR generated by RADV has changed. Polaris10: Totals from affected shaders: SGPRS: 441352 -> 441648 (0.07 %) VGPRS: 310888 -> 300784 (-3.25 %) Spilled SGPRs: 13576 -> 12983 (-4.37 %) Code Size: 22560328 -> 22420544 (-0.62 %) bytes Max Waves: 40755 -> 41366 (1.50 %) Vega10: Totals from affected shaders: SGPRS: 442848 -> 442000 (-0.19 %) VGPRS: 310396 -> 300460 (-3.20 %) Spilled SGPRs: 13708 -> 12906 (-5.85 %) Code Size: 22479428 -> 22336216 (-0.64 %) bytes Max Waves: 45783 -> 46506 (1.58 %) Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: fix dumping compute shader on the graphics queueSamuel Pitoiset2018-05-251-5/+8
| | | | | | | The graphics pipeline can be NULL. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add radv_dump_pipeline_state() helperSamuel Pitoiset2018-05-251-6/+11
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: rework how shaders are dumped when generating a hang reportSamuel Pitoiset2018-05-251-26/+15
| | | | | | | Use a flag for the active stages instead. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: remove unused parameter in radv_dump_annotated_shader()Samuel Pitoiset2018-05-251-8/+6
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi: remove some old gfx 9.x registersMarek Olšák2018-05-241-48/+0
| | | | | | | Leftover from bring up. Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* ac/surface/gfx6: don't overallocate mipmapped HTILEMarek Olšák2018-05-241-2/+11
| | | | | Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radv: call nir_lower_io_to_temporaries for VS, GS, TES and FSSamuel Pitoiset2018-05-241-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Do not lower FS inputs because this moves all load_var instructions at beginning of shaders and because interp_var_at_sample (and friends) seem broken. That might be eventually enabled later on if we really want to preload all FS inputs at beginning. Polaris10: Totals from affected shaders: SGPRS: 54072 -> 54264 (0.36 %) VGPRS: 38580 -> 38124 (-1.18 %) Spilled SGPRs: 652 -> 652 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Code Size: 2128116 -> 2127380 (-0.03 %) bytes Max Waves: 8048 -> 8086 (0.47 %) Vega10: Totals from affected shaders: SGPRS: 52616 -> 52656 (0.08 %) VGPRS: 37536 -> 37116 (-1.12 %) Spilled SGPRs: 828 -> 828 (0.00 %) Code Size: 2043756 -> 2042672 (-0.05 %) bytes Max Waves: 9176 -> 9254 (0.85 %) Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: call nir_split_var_copies() before nir_lower_var_copies()Samuel Pitoiset2018-05-241-0/+3
| | | | | | | | This doesn't nothing special currently because we don't create any copy_var instructions, but this is needed for the next patch. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: Use DPP for build_ddxy where possible.Bas Nieuwenhuizen2018-05-231-1/+15
| | | | | | | | | | | | WQM is pretty reliable now on LLVM 7, so let us just use DPP + WQM. This gives approximately a 1.5% performance increase on the vrcompositor built-in benchmark. v2: Use ac_build_quad_swizzle. Reviewed-by: Nicolai Hähnle <[email protected]>
* ac/surface/gfx6: Don't force a tile index for fmask.Bas Nieuwenhuizen2018-05-231-1/+1
| | | | | | | | | | | | | | The bpe of the fmask often differs from the bpe of the main surface. On SI that means it has to get a different tile index. addrlib is capable of figuring this out itself, so just pass -1 instead to let it know that it is not preset. Fixes: 9bf3570fed0 "ac/surface/gfx6: compute FMASK together with the color surface" Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106511 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106499 Reviewed-by: Marek Olšák <[email protected]>
* radv: fix computation of user sgprs for 32-bit pointersSamuel Pitoiset2018-05-221-1/+3
| | | | | | | With 32-bit pointers we only need one user SGPR per desc set. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: drop user_sgpr_info::sgpr_countSamuel Pitoiset2018-05-221-13/+11
| | | | | | | It's only used inside allocate_user_sgprs(). Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add support for 32-bit pointers in user data SGPRsSamuel Pitoiset2018-05-224-21/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | We still use 64-bit GPU pointers for all ring buffers because llvm.amdgcn.implicit.buffer.ptr doesn't seem to support 32-bit GPU pointers for now. This can be improved later anyways. Vega10: Totals from affected shaders: SGPRS: 1008722 -> 1026710 (1.78 %) VGPRS: 706580 -> 707136 (0.08 %) Spilled SGPRs: 22555 -> 22209 (-1.53 %) Spilled VGPRs: 75 -> 75 (0.00 %) Code Size: 34819208 -> 35202140 (1.10 %) bytes Max Waves: 175423 -> 175086 (-0.19 %) Polaris10: Totals from affected shaders: SGPRS: 1029849 -> 1036517 (0.65 %) VGPRS: 709984 -> 708872 (-0.16 %) Spilled SGPRs: 22672 -> 22309 (-1.60 %) Spilled VGPRs: 82 -> 66 (-19.51 %) Scratch size: 76 -> 60 (-21.05 %) dwords per thread Code Size: 34915336 -> 35309752 (1.13 %) bytes Max Waves: 151221 -> 151677 (0.30 %) Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add set_loc_shader_ptr() helperSamuel Pitoiset2018-05-221-7/+13
| | | | | | | This helper will hep for switching to 32-bit GPU pointers. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: allocate descriptor BOs in the 32-bit addr spaceSamuel Pitoiset2018-05-221-1/+2
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: allocate the upload BO in the 32-bit addr spaceSamuel Pitoiset2018-05-221-1/+2
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: set amdgpu-32bit-address-high-bits LLVM attributeSamuel Pitoiset2018-05-223-0/+8
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/winsys: allow to allocate BOs in the 32-bit addr spaceSamuel Pitoiset2018-05-222-1/+3
| | | | | | | This introduces a new flag called RADEON_FLAG_32BIT. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/winsys: request high addressSamuel Pitoiset2018-05-221-4/+6
| | | | | | | This is needed for 32-bit GPU pointers. Ported from RadeonSI. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: fix centroid interpolationSamuel Pitoiset2018-05-211-3/+0
| | | | | | | | | | | | | | | It's legal to set the centroid and sample interpolation modes when MSAA disabled. So, we have to initialize the centroid inputs because the hardware doesn't. This fixes rendering issues with DXVK and The Witness, World of Warcraft, Trackmania and probably more games. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106315 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102390 CC: 18.0 18.1 <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Cleanup unused prime blit path.Bas Nieuwenhuizen2018-05-212-25/+0
| | | | | | | | Since we have the common WSI code, we use vkCmdCopyImageToBuffer instead. Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* radv: Fix SRGB compute copies.Bas Nieuwenhuizen2018-05-212-0/+42
| | | | | | | | | | | | | | | SRGB stores are broken. We had compensation code in the resolve path but none in the copy path. Since we don't want any conversion and it does not matter for DCC, just make everything UNORM instead. This happened to cause wrong colors for the PRIME path, as that uses image->buffer copies which always use the compute path. CC: 18.0 18.1 <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106587 Reviewed-by: Dave Airlie <[email protected]>
* radv: fix VK_EXT_descriptor_indexingChristoph Haag2018-05-201-1/+1
| | | | | | | | GetPhysicalDeviceProperties2KHR() was crashing because features was null Fixes: 0e10790558b "radv: Enable VK_EXT_descriptor_indexing." CC: 18.1 <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/surface: Only align linear power of two fmt textures.Bas Nieuwenhuizen2018-05-201-2/+2
| | | | | | | | We're not sharing 32_32_32 formats between different GPUs, so we do not have to align for vega on pre-vega cards. Fixes: e361970ed73 "radv: Add support for IMG_DATA_FORMAT_32_32_32." Reviewed-by: Marek Olšák <[email protected]>
* amd/addrlib: Use defines in autotools build.Bas Nieuwenhuizen2018-05-201-0/+1
| | | | | | | | Otherwise stuff like NDEBUG would not be passed through. CC: <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106479 Reviewed-by: Marek Olšák <[email protected]>
* radv: pass radv_nir_compiler_options directly to create_llvm_function()Samuel Pitoiset2018-05-181-4/+3
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* radv: add radv_emit_shader_pointer() helperSamuel Pitoiset2018-05-173-13/+18
| | | | | | | For future work (support for 32-bit GPU pointers). Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add some helpers for cleaning up radv_get_preamble_cs()Samuel Pitoiset2018-05-171-86/+128
| | | | | | | Because this function looks a bit ugly to me. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>