summaryrefslogtreecommitdiffstats
path: root/src/amd
Commit message (Collapse)AuthorAgeFilesLines
* radv: remove multisample bit from shader key.Dave Airlie2018-06-153-4/+0
| | | | | | This wasn't being used anywhere inside the shader from what I can see. Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Fix output for sparse MRTs.Bas Nieuwenhuizen2018-06-141-9/+10
| | | | | | | | | | | | We need to init the cb_shader_format correctly with the changed col_format, so this moves the col_format adjustment to before the adjustment to before the cb_shader_mask gets generated. Fixes: 06d3c650980 "radv: fix a GPU hang when MRTs are sparse" Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106903 CC: 18.1 <[email protected]> Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: update the ZRANGE_PRECISION value for the TC-compat bugSamuel Pitoiset2018-06-141-0/+108
| | | | | | | | | | | | | | | | | | | | | | On GFX8+, there is a bug that affects TC-compatible depth surfaces when the ZRange is not reset after LateZ kills pixels. The workaround is to always set DB_Z_INFO.ZRANGE_PRECISION to match the last fast clear value. Because the value is set to 1 by default, we only need to update it when clearing Z to 0.0. We also need to set the depth clear regs and to update ZRANGE_PRECISION when initializing a TC-compat depth image to 0. Original patch from James Legg. This fixes random CTS fails with dEQP-VK.renderpass.suballocation.formats.d32_sfloat_s8_uint.input.* Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105396 CC: <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: handle undefined EQAA samples in ac_apply_fmask_to_sampleMarek Olšák2018-06-131-2/+4
| | | | | | RADV might wanna use this helper too. Tested-by: Dieter Nützel <[email protected]>
* ac/gpu_info: report real total memory sizesMarek Olšák2018-06-131-28/+54
| | | | | | | The change from MIN2 to MAX2 is intentional. Cc: 18.1 <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: don't fast clear HTILE for 16-bit depth surfaces on GFX8Samuel Pitoiset2018-06-131-0/+8
| | | | | | | | | | This causes rendering issues in Shadow Warrior 2 with DXVK. Cc: [email protected] Fixes: ccc64f3133 ("radv: enable TC-compat HTILE for 16-bit depth surfaces on GFX8") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106912 Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add a workaround for DXVK hangs by setting amdgpu-skip-thresholdSamuel Pitoiset2018-06-091-1/+78
| | | | | | | | | | | | | | | | | | | | | | | | | | Workaround for bug in llvm that causes the GPU to hang in presence of nested loops because there is an exec mask issue. The proper solution is to fix LLVM but this might require a bunch of work. This fixes a bunch of GPU hangs that happen with DXVK. Vega10: Totals from affected shaders: SGPRS: 110456 -> 110456 (0.00 %) VGPRS: 122800 -> 122800 (0.00 %) Spilled SGPRs: 7478 -> 7478 (0.00 %) Spilled VGPRs: 36 -> 36 (0.00 %) Code Size: 9901104 -> 9922928 (0.22 %) bytes Max Waves: 7143 -> 7143 (0.00 %) Code size slightly increases because it inserts more branch instructions but that's expected. I don't see any real performance changes. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105613 Cc: [email protected] Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: fix missing ZRANGE_PRECISION(1) for GFX9+Samuel Pitoiset2018-06-091-1/+2
| | | | | | | | | | | | | | | ZRANGE_PRECISION(1) seems to be the default optimal value, but it was only set for VI and older chips. This fixes a rendering issue with Banished through DXVK, and might fix more than that. There is still the ZRANGE_PRECISION bug that we need to handle but that can be fixed later. Cc: [email protected] Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: fix possible truncation of intrinsic nameTimothy Arceri2018-06-081-1/+1
| | | | | | | | Fixes the gcc warning: snprintf’ output between 26 and 33 bytes into a destination of size 32 Fixes: d5f7ebda3ec0 ("ac: add LLVM build functions for subgroup instrinsics") Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* amd/common: Fix number of coords for getlod.Bas Nieuwenhuizen2018-06-071-3/+18
| | | | | | | | | | The LLVM 6 code reduced it to a non-array call. We need to do that with the new code too. This fixes dEQP-VK.glsl.texture_functions.query.texturequerylod.*array* for radv. Fixes: a9a79934412 "amd/common: use the dimension-aware image intrinsics on LLVM 7+" Reviewed-by: Dave Airlie <[email protected]>
* radv: fix Coverity no effect control flow issueTimothy Arceri2018-06-071-1/+1
| | | | | swizzle is unsigned so "desc->swizzle[c] < 0" is never true. Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Use correct color format for fast clearsPhilip Rebohle2018-06-051-2/+2
| | | | | | | | | Using the image format is incorrect when the view has a different format than the image. Instead, the view format needs to be used. Reviewed-by: Bas Nieuwenhuizen <[email protected]> CC: 18.1 <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106687
* radv: Do not hardcode fast clear formats.Bas Nieuwenhuizen2018-06-051-180/+73
| | | | | | | | except for the odd one out. This should support many more formats. Reviewed-by: Dave Airlie <[email protected]>
* amd/common: use the dimension-aware image intrinsics on LLVM 7+Nicolai Hähnle2018-06-041-24/+165
| | | | | | Requires LLVM trunk r329166. Acked-by: Marek Olšák <[email protected]>
* radv: fix a GPU hang when MRTs are sparseSamuel Pitoiset2018-06-041-0/+10
| | | | | | | | | | | | | | When the i-th target format is set, all previous target formats must be non-zero to avoid hangs. In other words, without this if a fragment shader exports mrt0, mrt2 and mrt3, the GPU hangs because the target format of mrt1 is zero. This fixes DXVK GPU hangs with "Seven: The Days Long Gone", "GTA V" and probably more games. Cc: "18.0" 18.1" <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Don't pass a TESS_EVAL shader when tesselation is not enabled.Bas Nieuwenhuizen2018-06-041-0/+2
| | | | | | | | | | | | | | | | Otherwise on pre-GFX9, if the constant layout allows both TESS_EVAL and GEOMETRY shaders, but the PIPELINE has only GEOMETRY, it would return the GEOMETRY shader for the TESS_EVAL shader. This would cause the flush_constants code to emit the GEOMETRY constants to the TESS_EVAL registers and then conclude that it did not need to set the GEOMETRY shader registers. Fixes: dfff9fb6f8d "radv: Handle GFX9 merged shaders in radv_flush_constants()" CC: 18.1 <[email protected]> Reviewed-by: Alex Smith <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radv: Handle GFX9 merged shaders in radv_flush_constants()Alex Smith2018-06-011-1/+8
| | | | | | | | | | | | | | | | | | | | This was not previously handled correctly. For example, push_constant_stages might only contain MESA_SHADER_VERTEX because only that stage was changed by CmdPushConstants or CmdBindDescriptorSets. In that case, if vertex has been merged with tess control, then the push constant address wouldn't be updated since pipeline->shaders[MESA_SHADER_VERTEX] would be NULL. Use radv_get_shader() instead of getting the shader directly so that we get the right shader if merged. Also, skip emitting the address redundantly - if two merged stages are set in push_constant_stages this change would have made the address get emitted twice. Signed-off-by: Alex Smith <[email protected]> Cc: "18.1" <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Consolidate GFX9 merged shader lookup logicAlex Smith2018-06-013-35/+26
| | | | | | | | | This was being handled in a few different places, consolidate it into a single radv_get_shader() function. Signed-off-by: Alex Smith <[email protected]> Cc: "18.1" <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Set active_stages the same whether or not shaders were cachedAlex Smith2018-06-011-5/+2
| | | | | | | | | | | | With GFX9 merged shaders, active_stages would be set to the original stages specified if shaders were not cached, but to the stages still present after merging if they were. Be consistent and use the original stages. Signed-off-by: Alex Smith <[email protected]> Cc: "18.1" <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Add startup debug option.Bas Nieuwenhuizen2018-05-314-2/+50
| | | | | | | | | | | | | | | | This adds a RADV_DEBUG=startup option to dump more info about instance creation and device enumeration. A common question end users have is why the direver is not loading for them, and this has two common reasons: 1) They did not install the driver. 2) AMDGPU is not used for the card in the kernel. This adds some info messages so we can easily get a some useful output from end users. Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Add option to print errors even in optimized builds.Bas Nieuwenhuizen2018-05-3114-96/+108
| | | | | | | | | | | Errors are not that common of a case so we can eat a slight perf hit in having to call a function and do a runtime check. In turn this makes debugging random errors happening for end users easier, because they don't have to have a debug build on hand. Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Make the sem_info allocate/free functions static.Bas Nieuwenhuizen2018-05-312-15/+9
| | | | | | | They are only used in 1 file. Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Only expose subgroup shuffles on VI+.Bas Nieuwenhuizen2018-05-301-2/+5
| | | | | | | | | The current implementation depends on bpermute, which is VI+. Fixes: f2c6a550611 "radv: enable subgroup capabilities" Reviewed-by: Daniel Schürmann <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: fix emitting descriptor pointers with LLVM < 7Samuel Pitoiset2018-05-301-2/+4
| | | | | | | | | | This was terribly wrong, I forced use of 32-bit pointers when emitting shader descriptor pointers. This fixes GPU hangs with LLVM 5&6 because 32-bit pointers are only supported with LLVM 7. Fixes: 88d1ed0f81 ("radv: emit shader descriptor pointers consecutively") Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: emit shader descriptor pointers consecutivelySamuel Pitoiset2018-05-291-47/+57
| | | | | | | | | | | | This reduces the number of SET_SH_REG packets which are emitted for applications that use more than one descriptor set per stage. We should be able to emit more SET_SH_REG packets consecutively (like push constants and vertex buffers for the vertex stage), but this will be improved later. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: allow radv_emit_shader_pointer_head() to emit more pointersSamuel Pitoiset2018-05-291-3/+5
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: split radv_emit_shader_pointer()Samuel Pitoiset2018-05-291-5/+20
| | | | | | | | | This will allow to emit consecutive shader pointers for reducing the number of emitted SET_SH_REG packets, which is recommended. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Implement VK_KHR_draw_indirect_count.Bas Nieuwenhuizen2018-05-282-0/+50
| | | | | | | | | Literally the same as the AMD ext. Passes *indirect_draw_count* CTS tests. Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Implement alternate GFX9 scissor workaround.Bas Nieuwenhuizen2018-05-281-33/+47
| | | | | | | | | | | | | | | | | | | | | This improves dota2 performance for me by 11% when I force the GPU DPM level to low (otherwise dota2 is CPU limited for 4k on my threadripper), which should be a large part of the radv-amdvlk gap. (For me with that was radv 60.3 -> 66.6, while AMDVLK does about 68 fps) It looks like dota2 rendered the GUI with a bunch of draws with a SetScissors before almost each draw, causing a lot of pipeline stalls. I'm not really happy with the duplication of code, but overriding radeon_set_context_reg would also be messy since we have the pre-recorded pipelines and a bunch of si_cmd_buffer code, as well as some memory->context reg loads for which things would be more complicated. Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: run the EarlyCSEMemSSA LLVM passSamuel Pitoiset2018-05-251-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | It's recommended by the instruction combining pass, and RadeonSI also runs it. This pass used to segfault with one shader of F12017 in the past, but it no longer crashes. Maybe the LLVM IR generated by RADV has changed. Polaris10: Totals from affected shaders: SGPRS: 441352 -> 441648 (0.07 %) VGPRS: 310888 -> 300784 (-3.25 %) Spilled SGPRs: 13576 -> 12983 (-4.37 %) Code Size: 22560328 -> 22420544 (-0.62 %) bytes Max Waves: 40755 -> 41366 (1.50 %) Vega10: Totals from affected shaders: SGPRS: 442848 -> 442000 (-0.19 %) VGPRS: 310396 -> 300460 (-3.20 %) Spilled SGPRs: 13708 -> 12906 (-5.85 %) Code Size: 22479428 -> 22336216 (-0.64 %) bytes Max Waves: 45783 -> 46506 (1.58 %) Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: fix dumping compute shader on the graphics queueSamuel Pitoiset2018-05-251-5/+8
| | | | | | | The graphics pipeline can be NULL. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add radv_dump_pipeline_state() helperSamuel Pitoiset2018-05-251-6/+11
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: rework how shaders are dumped when generating a hang reportSamuel Pitoiset2018-05-251-26/+15
| | | | | | | Use a flag for the active stages instead. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: remove unused parameter in radv_dump_annotated_shader()Samuel Pitoiset2018-05-251-8/+6
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi: remove some old gfx 9.x registersMarek Olšák2018-05-241-48/+0
| | | | | | | Leftover from bring up. Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* ac/surface/gfx6: don't overallocate mipmapped HTILEMarek Olšák2018-05-241-2/+11
| | | | | Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radv: call nir_lower_io_to_temporaries for VS, GS, TES and FSSamuel Pitoiset2018-05-241-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Do not lower FS inputs because this moves all load_var instructions at beginning of shaders and because interp_var_at_sample (and friends) seem broken. That might be eventually enabled later on if we really want to preload all FS inputs at beginning. Polaris10: Totals from affected shaders: SGPRS: 54072 -> 54264 (0.36 %) VGPRS: 38580 -> 38124 (-1.18 %) Spilled SGPRs: 652 -> 652 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Code Size: 2128116 -> 2127380 (-0.03 %) bytes Max Waves: 8048 -> 8086 (0.47 %) Vega10: Totals from affected shaders: SGPRS: 52616 -> 52656 (0.08 %) VGPRS: 37536 -> 37116 (-1.12 %) Spilled SGPRs: 828 -> 828 (0.00 %) Code Size: 2043756 -> 2042672 (-0.05 %) bytes Max Waves: 9176 -> 9254 (0.85 %) Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: call nir_split_var_copies() before nir_lower_var_copies()Samuel Pitoiset2018-05-241-0/+3
| | | | | | | | This doesn't nothing special currently because we don't create any copy_var instructions, but this is needed for the next patch. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: Use DPP for build_ddxy where possible.Bas Nieuwenhuizen2018-05-231-1/+15
| | | | | | | | | | | | WQM is pretty reliable now on LLVM 7, so let us just use DPP + WQM. This gives approximately a 1.5% performance increase on the vrcompositor built-in benchmark. v2: Use ac_build_quad_swizzle. Reviewed-by: Nicolai Hähnle <[email protected]>
* ac/surface/gfx6: Don't force a tile index for fmask.Bas Nieuwenhuizen2018-05-231-1/+1
| | | | | | | | | | | | | | The bpe of the fmask often differs from the bpe of the main surface. On SI that means it has to get a different tile index. addrlib is capable of figuring this out itself, so just pass -1 instead to let it know that it is not preset. Fixes: 9bf3570fed0 "ac/surface/gfx6: compute FMASK together with the color surface" Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106511 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106499 Reviewed-by: Marek Olšák <[email protected]>
* radv: fix computation of user sgprs for 32-bit pointersSamuel Pitoiset2018-05-221-1/+3
| | | | | | | With 32-bit pointers we only need one user SGPR per desc set. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: drop user_sgpr_info::sgpr_countSamuel Pitoiset2018-05-221-13/+11
| | | | | | | It's only used inside allocate_user_sgprs(). Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add support for 32-bit pointers in user data SGPRsSamuel Pitoiset2018-05-224-21/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | We still use 64-bit GPU pointers for all ring buffers because llvm.amdgcn.implicit.buffer.ptr doesn't seem to support 32-bit GPU pointers for now. This can be improved later anyways. Vega10: Totals from affected shaders: SGPRS: 1008722 -> 1026710 (1.78 %) VGPRS: 706580 -> 707136 (0.08 %) Spilled SGPRs: 22555 -> 22209 (-1.53 %) Spilled VGPRs: 75 -> 75 (0.00 %) Code Size: 34819208 -> 35202140 (1.10 %) bytes Max Waves: 175423 -> 175086 (-0.19 %) Polaris10: Totals from affected shaders: SGPRS: 1029849 -> 1036517 (0.65 %) VGPRS: 709984 -> 708872 (-0.16 %) Spilled SGPRs: 22672 -> 22309 (-1.60 %) Spilled VGPRs: 82 -> 66 (-19.51 %) Scratch size: 76 -> 60 (-21.05 %) dwords per thread Code Size: 34915336 -> 35309752 (1.13 %) bytes Max Waves: 151221 -> 151677 (0.30 %) Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add set_loc_shader_ptr() helperSamuel Pitoiset2018-05-221-7/+13
| | | | | | | This helper will hep for switching to 32-bit GPU pointers. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: allocate descriptor BOs in the 32-bit addr spaceSamuel Pitoiset2018-05-221-1/+2
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: allocate the upload BO in the 32-bit addr spaceSamuel Pitoiset2018-05-221-1/+2
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: set amdgpu-32bit-address-high-bits LLVM attributeSamuel Pitoiset2018-05-223-0/+8
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/winsys: allow to allocate BOs in the 32-bit addr spaceSamuel Pitoiset2018-05-222-1/+3
| | | | | | | This introduces a new flag called RADEON_FLAG_32BIT. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/winsys: request high addressSamuel Pitoiset2018-05-221-4/+6
| | | | | | | This is needed for 32-bit GPU pointers. Ported from RadeonSI. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: fix centroid interpolationSamuel Pitoiset2018-05-211-3/+0
| | | | | | | | | | | | | | | It's legal to set the centroid and sample interpolation modes when MSAA disabled. So, we have to initialize the centroid inputs because the hardware doesn't. This fixes rendering issues with DXVK and The Witness, World of Warcraft, Trackmania and probably more games. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106315 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102390 CC: 18.0 18.1 <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>