path: root/src/amd
Commit message | Author | Age | Files | Lines
* amd/common: use llvm.amdgcn.wqm for explicit derivatives | Nicolai Hähnle | 2018-05-04 | 1 | -0/+7
  To comply with an upcoming change in LLVM, see https://reviews.llvm.org/D46051
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* compiler/lower_64bit_packing: rename the pass to be more generic | Iago Toral Quiroga | 2018-05-03 | 1 | -1/+1
  It can do 32-bit packing too now.
  Reviewed-by: Jason Ekstrand <[email protected]>
* radv: Use EnumerateInstanceVersion for the default version. | Bas Nieuwenhuizen | 2018-05-02 | 1 | -1/+1
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Don't check the incoming apiVersion on CreateInstance. | Bas Nieuwenhuizen | 2018-05-02 | 1 | -9/+0
  This fixes dEQP-VK.api.device_init.create_instance_invalid_api_version
  CC: 18.1 <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Allow vkEnumerateInstanceVersion ProcAddr without instance. | Bas Nieuwenhuizen | 2018-05-02 | 1 | -1/+1
  Apparently somewhere between 1.1.70 and 1.1.73 the loader started depending on this. The loader then creates a 1.0 instance, which gets into a funny situation because we have a 1.1 device.
  No idea how to do line wrapping in Mako though; my random guesses did not work.
  CC: 18.1 <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
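The lookup rule behind the commit above can be sketched as follows. This is a simplified model, not the RADV entrypoint code: `get_instance_proc_addr`, `GLOBAL_COMMANDS` and the `entrypoints` dictionary are hypothetical names. The set of commands that may be queried with a NULL instance is taken from the Vulkan 1.1 specification.

```python
# Commands the Vulkan 1.1 specification lets a caller look up with a NULL instance.
GLOBAL_COMMANDS = {
    "vkCreateInstance",
    "vkEnumerateInstanceExtensionProperties",
    "vkEnumerateInstanceLayerProperties",
    "vkEnumerateInstanceVersion",
}


def get_instance_proc_addr(instance, name, entrypoints):
    """Hypothetical lookup: return entrypoints[name], or None when not allowed."""
    if name not in entrypoints:
        return None
    # Without an instance, only the global commands may be resolved.
    if instance is None and name not in GLOBAL_COMMANDS:
        return None
    return entrypoints[name]
```

A loader that calls this with `instance=None` and `"vkEnumerateInstanceVersion"` gets a valid entrypoint, which is the case the commit enables.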
* radv: fix multisample image copies | Matthew Nicholls | 2018-05-02 | 2 | -110/+196
  Before fb077b0728, the LOD parameter was being used in place of the sample index, which would only copy the first sample to all samples in the destination image. After that commit, multisample image copies wouldn't copy anything, from my observations.
  Fix this properly by copying each sample in a separate radv_CmdDraw and using a pipeline with the correct rasterizationSamples for the destination image.
  This fixes some copy_and_blit CTS tests.
  v3.1:
  - set lod to 0 for nir_txf_ms (Samuel)
  v2:
  - use GLSL_SAMPLER_DIM_MS instead of 2D (Samuel)
  - updated commit description (Samuel)
  Cc: 18.0 18.1 <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: enable out-of-order rasterization by default | Samuel Pitoiset | 2018-05-02 | 2 | -2/+3
  As the implementation is conservative, we can now enable it by default. It can be disabled with RADV_DEBUG=nooutoforder.
  Don't expect much more than a 1% improvement, but the gain seems consistent.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
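A default-on feature with an opt-out debug flag, as described in the commit above, might be modelled like this. The helper names are hypothetical, and treating RADV_DEBUG as a comma-separated list of flags is an assumption of this sketch. It covers only the environment switch, not the other conditions under which the driver disables out-of-order rasterization.

```python
def parse_debug_flags(value):
    """Split a comma-separated flag string into a set, ignoring empty items."""
    return {flag.strip() for flag in value.split(",") if flag.strip()}


def out_of_order_allowed_by_env(env):
    """Enabled by default; switched off only by the 'nooutoforder' flag."""
    return "nooutoforder" not in parse_debug_flags(env.get("RADV_DEBUG", ""))
```

Passing `os.environ` as `env` would apply the same check to the real process environment.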
* radv: only disable out-of-order rast for perfect occlusion queries | Samuel Pitoiset | 2018-05-02 | 2 | -10/+12
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: compute the number of subpass attachments correctly | Samuel Pitoiset | 2018-05-01 | 1 | -2/+2
  Only count color attachments twice if resolves are used, also account for the depth stencil attachment if present.
  Cc: 18.0 18.1 <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Signed-off-by: Samuel Pitoiset <[email protected]>
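The counting rule in the commit above can be written out as a small function. This is an illustrative sketch rather than the driver code: the function name and parameters are hypothetical, and including input attachments in the total is an assumption not stated in the commit message.

```python
def count_subpass_attachments(input_count, color_count, has_resolve, has_depth_stencil):
    """Number of attachment references one subpass needs (illustrative only)."""
    return (
        input_count                              # assumption: inputs are counted too
        + color_count                            # color attachments, counted once
        + (color_count if has_resolve else 0)    # counted again only with resolves
        + (1 if has_depth_stencil else 0)        # depth stencil, if present
    )
```

For example, two color attachments with resolves and a depth stencil attachment need five references; the same subpass without resolves or depth stencil needs two.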
* radv: set fmask_surf_index on fmask surfaces. | Dave Airlie | 2018-05-02 | 1 | -1/+3
  This is needed on gfx9 and later for all fmask surface indices. (Mentioned by Marek on irc)
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv/winsys: fix leaking resources from bo's imported by fd | Andres Rodriguez | 2018-04-30 | 1 | -0/+1
  A bo's ref_count was not being initialized when imported from an fd. Therefore, we would fail to free the resource during vkFreeMemory().
  This patch fixes applications like hifi VR in threaded mode, which perform frequent imports/releases of IPC shared memory.
  Signed-off-by: Andres Rodriguez <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  CC: 18.0 18.1 <[email protected]>
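Why a missing ref_count initialization turns into a leak can be seen in a toy model. The class and functions below are hypothetical and are not the winsys structures; the sketch assumes the uninitialized counter starts at 0 and that the object is freed only when the counter reaches exactly 0 after a decrement.

```python
class BufferObject:
    """Toy reference-counted buffer object; not the winsys structure."""

    def __init__(self, ref_count):
        self.ref_count = ref_count
        self.freed = False


def import_from_fd(initialize_ref_count):
    # The bug corresponds to initialize_ref_count=False:
    # the counter starts at 0 instead of 1.
    return BufferObject(1 if initialize_ref_count else 0)


def release(bo):
    # Free only when the last reference goes away.
    bo.ref_count -= 1
    if bo.ref_count == 0:
        bo.freed = True
```

With the counter starting at 0, the first release drops it to -1, the free is skipped, and the imported memory is never returned.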
* ac/nir: expand 64-bit vec3 loads to fix shuffling. | Dave Airlie | 2018-05-01 | 1 | -0/+5
  If loading 64-bit vec3 values, a 4-component load would be followed by a 2-component load, and the resulting shuffle would fail as it requires two 4-component vectors.
  This just expands the second result vector out to 4 components.
  This fixes 100 CTS tests: dEQP-VK.spirv_assembly.type.vec3.*64*
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
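The width mismatch described above can be modelled with plain lists. This is a sketch of the idea only: `expand_vector` and `shuffle` are hypothetical helpers, and the equal-width check stands in for the requirement that both operands of a two-input vector shuffle have the same type.

```python
def expand_vector(vec, width, undef=None):
    """Pad vec with undefined lanes until it has `width` components."""
    return list(vec) + [undef] * (width - len(vec))


def shuffle(a, b, mask):
    """Model of a two-operand shuffle: both inputs must have equal width."""
    if len(a) != len(b):
        raise ValueError("shuffle operands must have the same width")
    combined = list(a) + list(b)
    return [combined[i] for i in mask]


first = [0, 1, 2, 3]   # result of the 4-component load
second = [4, 5]        # result of the 2-component load

# A 64-bit vec3 occupies six 32-bit components, so pick lanes 0..5.
dwords = shuffle(first, expand_vector(second, 4), [0, 1, 2, 3, 4, 5])
```

Calling `shuffle(first, second, ...)` directly raises an error, which mirrors the failure the commit fixes by widening the second vector first.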
* radeonsi: add triple into si_compiler | Marek Olšák | 2018-04-27 | 3 | -3/+9
  Reviewed-by: Timothy Arceri <[email protected]>
  Tested-by: Benedikt Schemmer <ben at besd.de>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* ac/surface: handle DCC subresource fast clear restriction on VI | Marek Olšák | 2018-04-27 | 1 | -1/+20
  v2: require the previous level to be clearable for determining whether the last unaligned level is clearable
  Reviewed-by: Nicolai Hähnle <[email protected]>
* ac: fix texture query LOD for 1D textures on GFX9 | Samuel Pitoiset | 2018-04-27 | 1 | -0/+8
  1D textures are allocated as 2D which means we only need one coordinate for texture query LOD.
  Fixes: 625dcbbc456 ("amd/common: pass address components individually to ac_build_image_intrinsic")
  Cc: 18.1 <[email protected]>
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radv: set ac_surf_info::num_channels correctly | Samuel Pitoiset | 2018-04-26 | 2 | -1/+8
  num_channels was introduced by "ac/surface: don't set the display flag for obviously unsupported cases".
  Based on RadeonSI.
  Fixes: e29facff315 ("ac/surface: don't set the display flag for obviously unsupported cases (v2)")
  Cc: 18.1 <[email protected]>
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: fix DCC enablement since partial MSAA implementation | Samuel Pitoiset | 2018-04-26 | 1 | -6/+6
  dcc_msaa_allowed is always false on GFX9+ and only true on VI if RADV_PERFTEST=dccmsaa is set. This means DCC was disabled in some situations where it should not have been.
  This is likely going to fix a performance regression.
  Fixes: 2f63b3dd09 ("radv: enable DCC for MSAA 2x textures on VI under an option")
  Cc: 18.1 <[email protected]>
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/radv/radeonsi: refactor harvest config register getters. | Dave Airlie | 2018-04-24 | 3 | -101/+124
  This refactors the code out to share it between radv and radeonsi.
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Acked-by: Nicolai Hähnle <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: only set raster_config_1 outside the index registers. | Dave Airlie | 2018-04-24 | 1 | -15/+16
  This follows what radeonsi does.
  Ported from radeonsi: radeonsi: emit PA_SC_RASTER_CONFIG_1 only once
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* ac/radv/radeonsi: refactor max simd waves into common code. | Dave Airlie | 2018-04-24 | 2 | -11/+17
  Reviewed-by: Nicolai Hähnle <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/radv/radeonsi: refactor raster_config default values getters. | Dave Airlie | 2018-04-24 | 3 | -83/+99
  This just makes this common code between the two drivers.
  Reviewed-by: Nicolai Hähnle <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: use common gs_table_depth code. | Dave Airlie | 2018-04-24 | 1 | -30/+2
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/info: move gs table depth to common code. | Dave Airlie | 2018-04-24 | 2 | -0/+34
  Reviewed-by: Nicolai Hähnle <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/gfx9: don't use gs_table_depth on gfx9. | Dave Airlie | 2018-04-24 | 2 | -5/+6
  Missed this in the initial radeonsi port: we shouldn't use this value on gfx9, and on gfx8 it should only be used when we have a geometry shader.
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* ac: fix the number of coordinates for ac_image_get_lod and arrays | Samuel Pitoiset | 2018-04-23 | 1 | -0/+14
  This fixes crashes for the following CTS: dEQP-VK.glsl.texture_functions.query.texturequerylod.*
  Cubemaps are the same as 2D arrays.
  Fixes: 625dcbbc456 ("amd/common: pass address components individually to ac_build_image_intrinsic")
  Cc: 18.1 <[email protected]>
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
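One way to picture the coordinate-count rule for LOD queries is the sketch below. It is a guess at the shape of the logic, not the code in ac_build_image_opcode: the dimension names, the `SAMPLE_COORDS` table and the function are hypothetical. The only facts taken from the log are that array types needed a corrected count and that cubemaps are treated like 2D arrays. The GFX9 case for plain 1D textures from the earlier commit is not modelled.

```python
# Coordinates needed to sample each image dimension (layer or face included).
SAMPLE_COORDS = {"1d": 1, "2d": 2, "3d": 3, "cube": 3, "1darray": 2, "2darray": 3}


def get_lod_coord_count(dim):
    """Coordinates passed for a LOD query: the array layer or cube face is dropped."""
    if dim == "1darray":
        return 1
    if dim in ("2darray", "cube"):
        return 2
    return SAMPLE_COORDS[dim]
```

Under these assumptions a LOD query on a cubemap passes the same two coordinates as one on a 2D array.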
* ac: teach get_ac_sampler_dim() about subpass attachments | Samuel Pitoiset | 2018-04-23 | 1 | -17/+7
  Suggested by Nicolai.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* ac/nir: add missing round_slice for 1D arrays | Samuel Pitoiset | 2018-04-23 | 1 | -0/+7
  This fixes a bunch of CTS failures with 1D arrays: dEQP-VK.glsl.texture_functions.texture*.sampler1darray_*
  Fixes: 625dcbbc456 ("amd/common: pass address components individually to ac_build_image_intrinsic")
  Cc: 18.1 <[email protected]>
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radv: advertise 8 bits of subpixel precision for viewports | Józef Kucia | 2018-04-23 | 1 | -1/+1
  This is what radeonsi does.
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: mark const structs as extern in header file to avoid lto damage | Dave Airlie | 2018-04-23 | 1 | -3/+3
  The copr repo from che was using LTO and he reported that radv broke recently with it. When testing with LTO builds here I noticed that we weren't seeing any instance extensions reported.
  It appears LTO was treating the const without extern as an empty struct. This is possibly a gcc bug, but we can work around it just by marking these with extern.
  Acked-by: Samuel Pitoiset <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* ac/nir: fix image dimension for subpass attachments | Samuel Pitoiset | 2018-04-20 | 1 | -3/+15
  For subpass attachments we need one more coordinate with the layer, so make them array types.
  This fixes a bunch of CTS failures with RADV.
  Fixes: 24fb3e6aa1 ("ac/nir: use ac_build_image_opcode for image intrinsics")
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Mark GTT memory as device local for APUs. | Bas Nieuwenhuizen | 2018-04-20 | 1 | -3/+5
  Otherwise a lot of games complain about not having enough memory, and it is sort of local so this seems reasonable to me.
  CC: 18.0 <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv/winsys: allow to submit up to 4 IBs for chips without chaining | Samuel Pitoiset | 2018-04-20 | 1 | -50/+168
  The SI family doesn't support chaining, which means the maximum size in dwords per CS is limited. When that limit was reached we failed to submit the CS and the application crashed.
  This patch allows submitting up to 4 IBs, which is currently the limit, but recent amdgpu supports more than that.
  Please note that we can reach the limit of 4 IBs per submit, but currently we can't improve that. The only solution is to upgrade libdrm. That will be improved later, but for now this should fix crashes on SI or when using RADV_DEBUG=noibs.
  Fixes: 36cb5508e89 ("radv/winsys: Fail early on overgrown cs.")
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105775
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
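Splitting an oversized command stream when chaining is unavailable might look like the sketch below. It is a simplified model under stated assumptions: the function name and the `max_ib_dwords` parameter are hypothetical, and batching leftover IBs into further submissions is this sketch's choice rather than something the commit message states. Only the limit of 4 IBs per submit comes from the log.

```python
MAX_IBS_PER_SUBMIT = 4  # limit quoted in the commit message


def split_into_submits(dwords, max_ib_dwords):
    """Cut a command stream into IBs, then batch the IBs per submission."""
    ibs = [
        dwords[i:i + max_ib_dwords]
        for i in range(0, len(dwords), max_ib_dwords)
    ]
    return [
        ibs[i:i + MAX_IBS_PER_SUBMIT]
        for i in range(0, len(ibs), MAX_IBS_PER_SUBMIT)
    ]
```

A stream of 10 dwords with a hypothetical 2-dword IB limit yields five IBs, grouped as one submission of four and one of one.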
* ac/nir: handle nir_intrinsic_load_first_vertex like base_vertex | Samuel Pitoiset | 2018-04-20 | 1 | -2/+2
  This fixes a ton of CTS crashes.
  Fixes: c366f422f0 ("nir: Offset vertex_id by first_vertex instead of base_vertex")
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/winsys: allow local BOs on APUs | Samuel Pitoiset | 2018-04-20 | 1 | -1/+2
  Ported from RadeonSI.
  Local BOs ignore BO priorities, and we don't need those on APUs.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: use a global BO list only for VK_EXT_descriptor_indexing | Samuel Pitoiset | 2018-04-20 | 3 | -9/+34
  Maintaining two different paths is annoying, but this gets rid of the performance regression introduced by the global BO list.
  We might find a better solution in the future, but for now just keep two paths.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* Revert "radv: Don't store buffer references in the descriptor set." | Samuel Pitoiset | 2018-04-20 | 5 | -13/+82
  In order to reduce a performance regression introduced by 4b13fe55a4 ("radv: Keep a global BO list for VkMemory."), we are going to maintain two different paths.
  One when VK_EXT_descriptor_indexing is enabled by the application because we need to have a global BO list, and one (the old one) when it's not enabled.
  With Talos on Polaris, the global BO list reduces performance by 10% which is too much for me.
  This reverts commit ab6cadd3ecc7fbdd9079808b407674e0b19c52f0.
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/nir: use ac_build_image_opcode for image intrinsics | Nicolai Hähnle | 2018-04-20 | 3 | -140/+78
  So that we'll use the dimension-aware intrinsics in the future.
  Acked-by: Marek Olšák <[email protected]>
* radeonsi: generate image load/store/atomic ops using ac_build_image_opcode | Nicolai Hähnle | 2018-04-20 | 2 | -32/+110
  In preparation of dimension-aware LLVM image intrinsics.
  Acked-by: Marek Olšák <[email protected]>
* amd/common: pass address components individually to ac_build_image_intrinsic | Nicolai Hähnle | 2018-04-20 | 3 | -264/+216
  This is in preparation for the new image intrinsics.
  Acked-by: Marek Olšák <[email protected]>
* amd/common: pass new enum ac_image_dim to ac_build_image_opcode | Nicolai Hähnle | 2018-04-20 | 3 | -11/+66
  This is in preparation for the new, dimension-aware LLVM image intrinsics.
  Acked-by: Marek Olšák <[email protected]>
* ac/nir: fix atomic compare-and-swap | Nicolai Hähnle | 2018-04-20 | 1 | -0/+1
  The LLVM instruction returns { i32, i1 }, where the i1 indicates success. We're only interested in the first part, which is the loaded value.
  Fixes dEQP-GLES31.functional.compute.shared_var.atomic.compswap.*
  Reviewed-by: Timothy Arceri <[email protected]>
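The shape of that fix can be illustrated with a small model in which the low-level operation returns a (value, success) pair and the shader-level result keeps only the value. Both function names are hypothetical; this is not the ac/nir code, and it ignores atomicity entirely.

```python
def cmpxchg(memory, index, expected, new):
    """Model of a compare-and-swap that returns (loaded_value, success)."""
    loaded = memory[index]
    success = loaded == expected
    if success:
        memory[index] = new
    return loaded, success


def atomic_comp_swap(memory, index, expected, new):
    # Shader-level result: only the loaded value (element 0 of the pair).
    loaded, _success = cmpxchg(memory, index, expected, new)
    return loaded
```

Returning the whole pair where a single integer is expected is the kind of type mismatch that extracting the first element avoids.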
* radv: Add Vega M support. | Bas Nieuwenhuizen | 2018-04-19 | 4 | -2/+11
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Add bound checking workaround for dynamic buffers. | Bas Nieuwenhuizen | 2018-04-19 | 3 | -1/+5
  I have seen a few applications and games do the dynamic buffer bounds incorrectly. This makes it easier to work around, e.g. for debugging.
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: enable DCC for MSAA 2x textures on VI under an option | Samuel Pitoiset | 2018-04-19 | 4 | -1/+13
  This can be enabled with RADV_PERFTEST=dccmsaa.
  DCC for MSAA textures is actually not as easy to implement. It looks like there are some corner cases. I will improve support incrementally.
  Vega support, as well as Polaris improvements, will be added later.
  No CTS changes on Polaris using RADV_DEBUG=zerovram and RADV_PERFTEST=dccmsaa.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: decompress DCC for multisampled source images before resolving | Samuel Pitoiset | 2018-04-19 | 4 | -4/+18
  Multisampled source images (i.e. color attachments) can now be DCC compressed, so the driver needs to perform a DCC decompression pass before resolving.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add a workaround for fast clears with DCC and MSAA textures | Samuel Pitoiset | 2018-04-19 | 1 | -0/+9
  This should be fixed at some point in order to improve performance.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: allocate CMASK for DCC fast clear with MSAA | Samuel Pitoiset | 2018-04-19 | 1 | -0/+7
  CMASK is required because it should be cleared to 0xCCCCCCCC for MSAA textures.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: implement fast color clear for DCC with MSAA | Samuel Pitoiset | 2018-04-19 | 1 | -1/+16
  When DCC is enabled with MSAA textures, CMASK should be cleared to 0xCCCCCCCC.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: make sure to sync after resolving using the compute path | Samuel Pitoiset | 2018-04-19 | 1 | -0/+3
  This fixes some random CTS failures: dEQP-VK.renderpass.multisample.*.
  Performing a fast-clear eliminate is still useless, but it seems that we need to sync.
  Found while running CTS with RADV_DEBUG=zerovram.
  Fixes: 56a171a499c ("radv: don't fast-clear eliminate after resolving a subpass with compute")
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: dump the SHA1 of SPIRV in the hang report | Samuel Pitoiset | 2018-04-19 | 1 | -1/+8
  Might be useful for debugging purposes, especially when we want to replace a shader on the fly.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
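Computing such an identifier outside the driver, for example to match a shader file on disk against the digest printed in a hang report, could be done as below. The helper name is hypothetical, and the sketch assumes the driver hashes the raw SPIR-V binary, which the commit message does not spell out.

```python
import hashlib


def spirv_sha1(spirv_bytes):
    """Hex SHA-1 digest of a SPIR-V binary, usable as a shader identifier."""
    return hashlib.sha1(spirv_bytes).hexdigest()
```

Reading a file in binary mode and passing its contents to `spirv_sha1` gives a 40-character hex string to compare with the report.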