path: root/src/amd
Commit message | Author | Age | Files | Lines
* radv: emit the GLC bit for SSBO loads/stores when needed | Samuel Pitoiset | 2018-10-12 | 3 | -8/+22
| | | | | | | | | This fixes some new memory model tests: dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.device.* Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108112 Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: implement clear operations for R32G32B32 | Samuel Pitoiset | 2018-10-11 | 3 | -1/+284
| | | | | | | | | | This fixes crashes for some CTS: dEQP-VK.api.copy_and_blit.core.blit_image.all_formats.color.*.linear_*_* dEQP-VK.api.copy_and_blit.core.blit_image.all_formats.color.*.*_linear_* Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108113 Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: disallow 3D images and mipmaps/layers for R32G32B32 linear formats | Samuel Pitoiset | 2018-10-11 | 1 | -0/+14
| | | | | | | | R32G32B32 are weird formats and we are only going to support some basic operations for now. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add a workaround for a VGT hang with prim restart and strips | Samuel Pitoiset | 2018-10-11 | 1 | -0/+11
| | | | | | | | | | | Otherwise, Yakuza and The Evil Within hang the GPU with DXVK. This apparently only works on Polaris. Suggested by Marek. Cc: [email protected] Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: remove unsigned comparison against 0 | Dave Airlie | 2018-10-11 | 1 | -1/+1
| | | | | | | | The value is always >= 0 here. Found by coverity Reviewed-by: Samuel Pitoiset <[email protected]>
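A minimal illustration of the kind of check Coverity flags here. This is not the radv code; `example_index_in_range` and its parameters are hypothetical. An unsigned value can never be negative, so only the upper bound carries information.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not taken from radv. For an unsigned index,
 * "idx >= 0" is always true, so the lower-bound test is dropped. */
static bool example_index_in_range(uint32_t idx, uint32_t count)
{
    return idx < count;
}
```

A value that was meant to be negative shows up as a very large unsigned number and is still rejected by the upper-bound test.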
* radv: remove dead code for master_fd close | Dave Airlie | 2018-10-11 | 1 | -2/+0
| | | | | | | | | We have never opened master_fd at this point, so remove code to close it. Found by coverity. Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: don't pass shader key by copy | Dave Airlie | 2018-10-11 | 1 | -7/+6
| | | | | | Coverity pointed out we were copying 168 bytes here unnecessarily. Reviewed-by: Samuel Pitoiset <[email protected]>
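A small sketch of the difference between the two calling conventions. `struct example_key` is a hypothetical stand-in sized to match the 168 bytes mentioned above; the real shader key layout is different.

```c
#include <stdint.h>

/* Hypothetical 168-byte struct (42 * 4 bytes); not the real shader key. */
struct example_key {
    uint32_t words[42];
};

/* By value: the whole struct is copied for every call. */
static uint32_t example_first_word_by_value(struct example_key key)
{
    return key.words[0];
}

/* By const pointer: only an address is passed, and the callee
 * cannot modify the caller's key. */
static uint32_t example_first_word_by_pointer(const struct example_key *key)
{
    return key->words[0];
}
```

Both functions return the same result; only the amount of data moved per call differs.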
* radv: add missing meson c++ visibility arguments | Eric Engestrom | 2018-10-09 | 1 | -0/+1
| | | | | | | | Fixes: 6f3aee40f90d725653b6 "radv: using tls to store llvm related info and speed up compiles (v10)" Cc: Dave Airlie <[email protected]> Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Dylan Baker <[email protected]>
* radv: tidy up radv_pipeline_init_multisample_state() | Samuel Pitoiset | 2018-10-08 | 1 | -19/+16
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: always set PA_SC_MODE_CNTL_1.OUT_OF_ORDER_WATER_MARK | Samuel Pitoiset | 2018-10-08 | 1 | -2/+2
| | | | | | | | It has probably no effect without out of order rasterization anyway. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: set DB_EQAA.INCOHERENT_EQAA_READS | Samuel Pitoiset | 2018-10-08 | 1 | -1/+1
| | | | | | | | My attempt was to set this field instead of duplicating one. Fixes: 6cfa321c39 ("radv: add potential missing fields for DB_EQAA") Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: add ac_build_round | Marek Olšák | 2018-10-06 | 3 | -3/+19
|
* ac: correct PKT3_COPY_DATA definitions | Marek Olšák | 2018-10-06 | 3 | -9/+16
|
* ac: simplify LLVM alloca helpers | Marek Olšák | 2018-10-06 | 1 | -7/+4
|
* ac: define all address spaces properly | Marek Olšák | 2018-10-06 | 4 | -11/+13
|
* radv: fix resetting the pool for timestamp queries | Samuel Pitoiset | 2018-10-04 | 1 | -7/+5
| | | | | | | | | | | Since the driver no longer uses the availability bit for timestamp queries it shouldn't reset it. Instead, it should reset the query values to UINT32_MAX. This fixes VM faults. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108164 Signed-off-by: Samuel Pitoiset <[email protected]> Tested-by: Józef Kucia <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* util: disable cache if we have no build-id and timestamp is zero | Timothy Arceri | 2018-10-02 | 1 | -4/+0
| | | | | | | | | | Timestamp can be zero for example when Flatpak is used. In this case just disable the cache rather than segfaulting when incompatible cache items are loaded. V2: actually return false when mtime is 0. Reviewed-by: Bas Nieuwenhuizen <[email protected]>
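The rule described above can be sketched as a small predicate. This is an illustration only; `example_cache_identity_usable` and its parameters are hypothetical names, not the util code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical predicate: without a build-id, the binary's mtime is the
 * only thing left to tell builds apart, and an mtime of 0 cannot do that,
 * so the cache is treated as unusable in that case. */
static bool example_cache_identity_usable(bool have_build_id, uint32_t mtime)
{
    if (have_build_id)
        return true;
    return mtime != 0;
}
```

Returning false here corresponds to disabling the cache instead of risking a load of incompatible items.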
* radv: do not try to set DCC_CONTROL when image doesn't use DCC | Samuel Pitoiset | 2018-10-01 | 1 | -1/+1
| | | | | | | | Unnecessary. While we are at it, remove the check for pre-VI because it's already checked earlier. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add a sanity check for mutable formats and TC-compat HTILE | Samuel Pitoiset | 2018-10-01 | 1 | -5/+22
| | | | | | | | | | If apps use the MUTABLE bit and the same formats as the image one in the list, we can still enable TC-compat HTILE. I don't think this happens often but given the fact that TC-compat HTILE allows a nice boost in some situations, it's worth checking. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
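A rough sketch of the check described above, with plain `int` values standing in for `VkFormat`. The function name, the parameters and the handling of a missing format list are assumptions made for illustration, not the radv implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch: decide whether a mutable-format image can still
 * be treated as having a single format. */
static bool example_tc_compat_allowed(bool is_mutable, int image_format,
                                      const int *view_formats, size_t count)
{
    if (!is_mutable)
        return true;  /* no MUTABLE bit: views cannot change the format */
    if (view_formats == NULL || count == 0)
        return false; /* assumption: no list means any format may be used */
    for (size_t i = 0; i < count; i++) {
        if (view_formats[i] != image_format)
            return false; /* a differing view format rules it out */
    }
    return true;
}
```

Only a list in which every entry equals the image format keeps the optimization enabled for a mutable image.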
* radv: disable HTILE for very small depth surfaces | Samuel Pitoiset | 2018-10-01 | 1 | -1/+3
| | | | | | | | | | | Like we disable DCC/CMASK for small color surfaces as well. Serious Sam 2017 creates a 1x1 depth surface and I think it should be faster to do slow clears on the graphics queue instead of fast clears on compute, and eventually a depth expand if the surface isn't TC-compatible HTILE. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add potential missing fields for DB_EQAA | Samuel Pitoiset | 2018-10-01 | 1 | -1/+3
| | | | | | | Other drivers set these two as well, just apply the same rule. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: disable complicated point clipping against user clip planes | Samuel Pitoiset | 2018-10-01 | 1 | -1/+0
| | | | | | | | | I don't think this is required by Vulkan either. Ported from RadeonSI (AMDVLK doesn't set it either). Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: do not sync CP DMA when copying buffers | Samuel Pitoiset | 2018-09-28 | 1 | -0/+2
| We already track if the DMA engine is busy/idle with a flag, and we emit a packet that waits for all CP DMA operations to be complete. This is done at end of command buffer because the kernel doesn't wait for them, and also when emitting barriers, so it should be safe.
|
| This improves small copies for both aligned and unaligned sizes.
|
| Aligned sizes:
| BEFORE:
|   1 KB: 59.840000 ms
|   2 KB: 71.200000 ms
| AFTER:
|   1 KB: 31.200000 ms
|   2 KB: 31.040000 ms
|
| Unaligned sizes:
| BEFORE:
|   2 KB: 68.3200 ms
|   3 KB: 79.3600 ms
|   5 KB: 76.6400 ms
|   9 KB: 90.8800 ms
|   17 KB: 116.0000 ms
| AFTER:
|   2 KB: 31.0400 ms
|   3 KB: 32.0000 ms
|   5 KB: 30.8800 ms
|   9 KB: 30.5600 ms
|   17 KB: 29.6000 ms
|
| Signed-off-by: Samuel Pitoiset <[email protected]>
| Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: adjust the CmdUpdateBuffer threshold for optimal performance | Samuel Pitoiset | 2018-09-28 | 2 | -1/+3
| According to my benchmark results, it appears that we should reduce the threshold to 1024.
|
| BEFORE:
|   1 KB: 68.656000 ms
|   2 KB: 118.368000 ms
|
| AFTER:
|   1 KB: 31.760000 ms
|   2 KB: 29.840000 ms
|
| Signed-off-by: Samuel Pitoiset <[email protected]>
| Reviewed-by: Bas Nieuwenhuizen <[email protected]>
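A size threshold like this typically selects between two update paths. The sketch below assumes an inline path for small updates and a copy path for larger ones, with a strict less-than comparison. The names, the nature of the two paths and the comparison are assumptions; only the value 1024 comes from the commit message.

```c
#include <stdint.h>

/* Hypothetical constant; 1024 is the threshold named in the commit. */
#define EXAMPLE_UPDATE_THRESHOLD 1024u

enum example_update_path {
    EXAMPLE_UPDATE_INLINE, /* assumed: data embedded in the command stream */
    EXAMPLE_UPDATE_COPY    /* assumed: data staged and then copied */
};

/* Pick a path from the update size in bytes. */
static enum example_update_path example_pick_update_path(uint64_t size_bytes)
{
    return size_bytes < EXAMPLE_UPDATE_THRESHOLD ? EXAMPLE_UPDATE_INLINE
                                                 : EXAMPLE_UPDATE_COPY;
}
```

Under these assumptions a 1 KB update already takes the copy path, which matches the sizes benchmarked above.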
* radv: do not use the availability bit for timestamp queries | Samuel Pitoiset | 2018-09-28 | 2 | -30/+28
| | | | | | | | | | | It's unnecessary because we can just check if the timestamp differs from the default value written when a pool is created or reset. Instead of waiting for the availability bit to be 1, we have to emit a not equal WAIT_REG_MEM for checking if the timestamp is ready. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
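A CPU-side sketch of the idea: a slot counts as ready once its value is no longer the default. The all-ones default and every name below are assumptions for illustration; the driver performs the equivalent comparison on the GPU with WAIT_REG_MEM.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed default: all bits set. A real timestamp is assumed never to
 * take this value. */
#define EXAMPLE_TIMESTAMP_NOT_READY UINT64_MAX

/* Reset every slot to the default pattern (each byte becomes 0xff). */
static void example_reset_timestamps(uint64_t *slots, size_t count)
{
    memset(slots, 0xff, count * sizeof(*slots));
}

/* A timestamp is available once the slot no longer holds the default. */
static bool example_timestamp_ready(uint64_t value)
{
    return value != EXAMPLE_TIMESTAMP_NOT_READY;
}
```

With this scheme no separate availability word has to be stored or reset per query.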
* radv: Remove garbage comment. | Bas Nieuwenhuizen | 2018-09-27 | 1 | -1/+0
| | | | Trivial.
* radv: Do not use multiple draws for multisample copies. | Bas Nieuwenhuizen | 2018-09-27 | 1 | -57/+5
| | | | | | | | Use sample rate shading instead, should give better locality. Makes Nier with 8x msaa on a Raven go 5 fps -> 7 fps in the menu. Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: only emit ZPASS_DONE for timestamp queries on gfx queues | Andres Rodriguez | 2018-09-25 | 1 | -1/+1
| | | | | | | | | | | | A ZPASS_DONE packet doesn't make sense for the compute queue. It will result in a gpu hang. This change resolves a gpu hang for SteamVR+Vega. Cc: [email protected] Fixes: 1f616a840eac02241c585d28e9dac8f19a297f39 "radv: emit a dummy ..." Signed-off-by: Andres Rodriguez <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radv: make use of nir_lower_load_const_to_scalar() | Timothy Arceri | 2018-09-25 | 1 | -0/+2
| This allows NIR to CSE more operations. LLVM does this also so the impact is limited, however doing this in NIR allows other opts to make progress.
|
| For example in radeonsi more loops are unrolled in Civilization Beyond Earth.
|
| The actual pipeline-db stats are not overwhelming but even in the negatively affected shaders the NIR is clearly better. It just happens that the code shuffling and in some cases calls to max rather than a flt result in the final output from LLVM not giving as good numbers. However this is an incremental opt that further passes build off so the change should be made IMO.
|
| Totals from affected shaders:
|   SGPRS: 20192 -> 20184 (-0.04 %)
|   VGPRS: 19516 -> 19524 (0.04 %)
|   Spilled SGPRs: 437 -> 444 (1.60 %)
|   Spilled VGPRs: 0 -> 0 (0.00 %)
|   Private memory VGPRs: 0 -> 0 (0.00 %)
|   Scratch size: 0 -> 0 (0.00 %) dwords per thread
|   Code Size: 1527444 -> 1522276 (-0.34 %) bytes
|   LDS: 6 -> 6 (0.00 %) blocks
|   Max Waves: 1018 -> 1016 (-0.20 %)
|   Wait states: 0 -> 0 (0.00 %)
|
| Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: use the resolve compute path if dest uses multiple layers | Samuel Pitoiset | 2018-09-21 | 1 | -1/+2
| | | | | | | | | | | | | | The hardware path doesn't support resolving layers, for both source and destination images. This fixes a reflection issue when MSAA is enabled which affects GTA V and probably DIRT3. CC: <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107786 Signed-off-by: Samuel Pitoiset <[email protected]> Tested-by: Gregor Münch <gr.muench_at_gmail.com> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* anv,radv: Implement vkAcquireNextImage2 | Jason Ekstrand | 2018-09-21 | 1 | -10/+25
| | | | | | | | This was added as part of 1.1 but it's very hard to track exactly what extension added it. In any case, we should implement it. Cc: [email protected] Reviewed-by: Dave Airlie <[email protected]>
* radv: only enable shaderInt16 on GFX9+ and LLVM7+ | Samuel Pitoiset | 2018-09-21 | 1 | -1/+1
| | | | | | | | | | | The throughput is similar to 32-bit integers on GFX8 and AMDVLK does not expose 16-bit integers on pre-Vega either. On GFX9+, only LLVM 7+ has support. This fixes a bunch of CTS crashes on GFX9/LLVM 6. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
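The gating condition from the title can be written as a small predicate. The enum and function names are hypothetical, and the LLVM version is modeled as a plain major number; this is not the driver's actual check.

```c
#include <stdbool.h>

/* Hypothetical generation enum; only the ordering matters here. */
enum example_gfx_level {
    EXAMPLE_GFX6 = 6,
    EXAMPLE_GFX7,
    EXAMPLE_GFX8,
    EXAMPLE_GFX9
};

/* Expose shaderInt16 only when both conditions from the commit hold:
 * GFX9 or newer hardware, and LLVM 7 or newer. */
static bool example_expose_shader_int16(enum example_gfx_level level,
                                        unsigned llvm_major)
{
    return level >= EXAMPLE_GFX9 && llvm_major >= 7;
}
```

GFX9 with LLVM 6, the combination reported as crashing in CTS, is rejected by this predicate.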
* radv: Fix driver UUID SHA1 init. | Bas Nieuwenhuizen | 2018-09-20 | 1 | -0/+2
| | | | | | | | | Was missing the init, found by Emil. Fixes: d17443a4593 "radv: Use build ID if available for cache UUID." CC: <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: use a 64-bit unsigned integer when allocating a descriptor pool | Samuel Pitoiset | 2018-09-19 | 1 | -1/+1
| | | | | | | pool->size is a 64-bit unsigned integer too. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
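A generic illustration of why an intermediate size should be as wide as the field it is stored in. The commit does not say an overflow was observed; the functions below are hypothetical and only show how a 32-bit product can wrap where a 64-bit one does not.

```c
#include <stdint.h>

/* 32-bit accumulator: the product wraps modulo 2^32. */
static uint32_t example_pool_size_32(uint32_t count, uint32_t bytes_each)
{
    return count * bytes_each;
}

/* 64-bit accumulator: widen before multiplying so no bits are lost. */
static uint64_t example_pool_size_64(uint32_t count, uint32_t bytes_each)
{
    return (uint64_t)count * bytes_each;
}
```

For 2^20 entries of 2^12 bytes each, the 32-bit version yields 0 while the 64-bit version yields 2^32.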
* radv: enable VK_SUBGROUP_FEATURE_ARITHMETIC_BIT | Samuel Pitoiset | 2018-09-19 | 2 | -0/+2
| | | | | | | | All CTS pass on Polaris/Vega with LLVM 6, 7 and master, so I think it's safe to enable the feature. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: do not support blitting surfaces with depth and stencil | Samuel Pitoiset | 2018-09-19 | 1 | -0/+4
| | | | | | | | | | | Fixes: dEQP-VK.api.copy_and_blit.core.blit_image.all_formats.depth_stencil.d32_sfloat_s8_uint_d32_sfloat_s8_uint.optimal_optimal_nearest And all friends that try to blit a surface with different depth and stencil formats. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* Revert "radv: fix descriptor pool allocation size" | Bas Nieuwenhuizen | 2018-09-18 | 1 | -2/+1
| | | | | | | | | | This reverts commit 90819abb56f6b1a0cd4946b13b6caf24fb46e500. This logic was wrong, the original code is correct. The direct impact is that we allocate up to approximately a squared amount of memory compared to what we should allocate. Acked-by: Samuel Pitoiset <[email protected]>
* radv: implement VK_EXT_conservative_rasterization | Samuel Pitoiset | 2018-09-18 | 3 | -1/+63
| | | | | | | | | Only supported by GFX9+. The conservativeraster Sascha demo seems to work as expected. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: do not re-create the sampler for every blit in CmdBlitImage() | Samuel Pitoiset | 2018-09-18 | 1 | -15/+17
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: allow to force anisotropy via RADV_TEX_ANISO | Samuel Pitoiset | 2018-09-18 | 2 | -2/+47
| | | | | | | Ported from RadeonSI. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Use build ID if available for cache UUID. | Bas Nieuwenhuizen | 2018-09-17 | 1 | -8/+35
| | | | | | | | | | | To get a useful UUID for systems that have a non-useful mtime for the binaries. I started using SHA1 to ensure we get reasonable mixing in the various possibilities and the various build id lengths. CC: <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* radv: enable shaderInt16 capability | Samuel Pitoiset | 2018-09-17 | 2 | -1/+2
| | | | | | | | | Not sure if this is all wired up. CTS does pass and the Tangrams demo works fine on Vega. There are corruption issues on Polaris but not sure if that is related to 16-bit support. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: add 16-bit support to ac_build_bitfield_reverse() | Samuel Pitoiset | 2018-09-17 | 1 | -0/+5
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: add 16-bit support to ac_build_bit_count() | Samuel Pitoiset | 2018-09-17 | 1 | -0/+5
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
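For reference, the 16-bit operations named in the two commits above can be modeled on the CPU as follows. These scalar functions are hypothetical reference models of the arithmetic only; the ac helpers emit LLVM IR and are not shown here.

```c
#include <stdint.h>

/* Reverse the order of the 16 bits in v (bit 0 becomes bit 15, etc.). */
static uint16_t example_bitreverse16(uint16_t v)
{
    uint16_t r = 0;
    for (int i = 0; i < 16; i++) {
        r = (uint16_t)((r << 1) | (v & 1u)); /* shift in the lowest bit */
        v >>= 1;
    }
    return r;
}

/* Count the bits set in v. */
static unsigned example_popcount16(uint16_t v)
{
    unsigned n = 0;
    while (v) {
        v &= (uint16_t)(v - 1); /* clear the lowest set bit */
        n++;
    }
    return n;
}
```

Models like these are handy as expected values when testing a narrower variant of an existing 32-bit operation.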
* ac: add 16-bit support to ac_find_lsb() | Samuel Pitoiset | 2018-09-17 | 1 | -2/+13
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
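A scalar model of a 16-bit find-LSB, assuming the GLSL `findLSB` convention of returning -1 for a zero input. Whether the ac helper follows that convention for 16-bit values is an assumption, and `example_find_lsb16` is a hypothetical name.

```c
#include <stdint.h>

/* Return the index of the lowest set bit, or -1 if no bit is set
 * (assumed convention, matching GLSL findLSB). */
static int example_find_lsb16(uint16_t v)
{
    if (v == 0)
        return -1;
    int index = 0;
    while ((v & 1u) == 0) {
        v >>= 1;
        index++;
    }
    return index;
}
```

The zero case needs explicit handling because a count-trailing-zeros operation on its own has no meaningful result for a zero input.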
* ac: add 16-bit support to ac_build_umsb() | Samuel Pitoiset | 2018-09-17 | 1 | -2/+16
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: add 16-bit support to ac_build_isign() | Samuel Pitoiset | 2018-09-17 | 1 | -5/+16
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: add 16-bit constant values for zero and one | Samuel Pitoiset | 2018-09-17 | 2 | -0/+4
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: add ac_build_bitfield_reverse() helper | Samuel Pitoiset | 2018-09-17 | 3 | -1/+26
| | | | | | | Are we missing 64-bit support? Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: add ac_build_bit_count() helper | Samuel Pitoiset | 2018-09-17 | 3 | -6/+31
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>