aboutsummaryrefslogtreecommitdiffstats
path: root/src/amd
Commit message (Collapse)AuthorAgeFilesLines
* radv: Clamp gfx9 image view extents to the allocated image extents.Bas Nieuwenhuizen2018-11-271-4/+2
| | | | | | | | | | | Mirrors AMDVLK. Looks like if we go over the alignment of height we actually start to change the addressing. Seems like the extra miplevels actually work with this. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108245 Fixes: f6cc15dccd5 "radv/gfx9: fix block compression texture views. (v2)" Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Fix opaque metadata descriptor last layer.Bas Nieuwenhuizen2018-11-261-1/+1
| | | | | | | | | We used the layer count which results in an off by one error. Not sure this really affects anything. Fixes: f4e499ec791 "radv: add initial non-conformant radv vulkan driver" Reviewed-by: Dave Airlie <[email protected]>
* radv: ignore subpass self-dependencies for CreateRenderPass() tooSamuel Pitoiset2018-11-231-0/+10
| | | | | | | We really need to refactor this... Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: remove useless sync before CmdClear{Color,DepthStencil}Image()Samuel Pitoiset2018-11-231-6/+2
| | | | | | | | | | We don't need to flush anything before these two commands as well. This is because they have to be externally synchronized, so the app should have called CmdPipelineBarrier() prior to that and the driver should have flushed the caches. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: remove useless sync after CmdClear{Color,DepthStencil}Image()Samuel Pitoiset2018-11-221-4/+0
| | | | | | | | | | | | | | | | | | 'post_flush' is only set to NULL for the normal clear path (ie. only vkCmdClearColorImage() and vkCmdClearDepthStencilImage() are affected commands). Because these two operations have to be externally synchronized with VK_PIPELINE_STAGE_TRANSFER_BIT and VK_ACCESS_TRANSFER_WRITE_BIT, it's useless to set those flags internallY. VK_PIPELINE_STAGE_TRANSFER_BIT will wait for compute to be idle, while VK_ACCESS_TRANSFER_WRITE_BIT will invalidate both L1 vector caches and L2. RADV_CMD_FLAG_WRITEBACK_GLOBAL_L2 will be superseded by RADV_CMD_FLAG_INV_GLOBAL_L2. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: only sync CP DMA for transfer operations or bottom pipeSamuel Pitoiset2018-11-211-2/+6
| | | | | | | | | | | | | | CP DMA can only be busy when the driver copies buffers. The only affected Vulkan commands are vkCmdCopyBuffer() and vkCmdUpdateBuffer() (because we fallback to a copy depending on a threshold). Clear operations are currently not concerned because the driver always syncs after the last DMA operation. Per the spec, these two operations have to be externally synchronized with VK_PIPELINE_STAGE_TRANSFER_BIT. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: ignore subpass self-dependenciesSamuel Pitoiset2018-11-211-0/+10
| | | | | | | | Unnecessary as they allow the app to call vkCmdPipelineBarrier() inside the render pass. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: handle cast derefsDave Airlie2018-11-211-0/+3
| | | | | | Just give back the same value for now. Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: handle loading from shared pointersDave Airlie2018-11-211-9/+18
| | | | | | | | | | We won't have a var to load from, so don't try to the processing required if we don't need it. This avoids crashes in: dEQP-VK.spirv_assembly.instruction.compute.variable_pointers.compute.workgroup_two_buffers Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: avoid casting pointers on bcsel and storesDave Airlie2018-11-213-3/+14
| | | | | | | | For variable pointers we really don't want to case the pointers to int without a good reason, just add a wrapper for bcsel loading and result storing. Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/nir: fix intrinsic name string size in visit_image_atomic()Samuel Pitoiset2018-11-201-1/+1
| | | | | | | | Fixes an assertion in SoTTR. Fixes: dd0172e865 ("radv: Use structured intrinsics instead of indexing workaround for GFX9.") Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Use structured intrinsics instead of indexing workaround for GFX9.Bas Nieuwenhuizen2018-11-193-8/+75
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These force the index to be used in the instruction so we don't need the workaround. Totals: SGPRS: 1321642 -> 1321802 (0.01 %) VGPRS: 943664 -> 943788 (0.01 %) Spilled SGPRs: 28468 -> 28480 (0.04 %) Spilled VGPRs: 88 -> 89 (1.14 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 80 -> 80 (0.00 %) dwords per thread Code Size: 52415292 -> 52338932 (-0.15 %) bytes LDS: 400 -> 400 (0.00 %) blocks Max Waves: 233903 -> 233803 (-0.04 %) Wait states: 0 -> 0 (0.00 %) Totals from affected shaders: SGPRS: 238344 -> 238504 (0.07 %) VGPRS: 232732 -> 232856 (0.05 %) Spilled SGPRs: 13125 -> 13137 (0.09 %) Spilled VGPRs: 88 -> 89 (1.14 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 80 -> 80 (0.00 %) dwords per thread Code Size: 15752712 -> 15676352 (-0.48 %) bytes LDS: 139 -> 139 (0.00 %) blocks Max Waves: 31680 -> 31580 (-0.32 %) Wait states: 0 -> 0 (0.00 %) Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: implement fast HTILE clears for depth or stencil only on GFX9Samuel Pitoiset2018-11-192-5/+269
| | | | | | | | | | | | | | | This allows to fast clear the depth part (or the stencil part) of a depth+stencil surface when HTILE is enabled. I didn't test on GFX8, so it's disabled currently. This gives a very nice boost, for example when clearing the depth aspect of a 4096x4096 D32_SFLOAT_S8_UINT image (18x faster). BEFORE: 235 us AFTER: 13 us Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: rewrite the condition that checks allowed depth/stencil valuesSamuel Pitoiset2018-11-191-8/+4
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: check allowed fast HTILE clears a bit earlierSamuel Pitoiset2018-11-191-0/+5
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add radv_is_fast_clear_{depth,stencil}_allowed() helpersSamuel Pitoiset2018-11-191-2/+16
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add radv_get_htile_fast_clear_value() helperSamuel Pitoiset2018-11-191-3/+18
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: remove unnecessary goto in the fast clear pathsSamuel Pitoiset2018-11-191-28/+24
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/winsys: remove the max IBs per submit limit for the sysmem pathSamuel Pitoiset2018-11-191-17/+29
| | | | | | | | | This path will be eventually improved later but as it's only used on SI (or with RADV_DEBUG=noibs), I'm not sure if that matters much. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/winsys: remove the max IBs per submit limit for the fallback pathSamuel Pitoiset2018-11-191-48/+55
| | | | | | | | The chained submission is the fastest path and it should now be used more often than before. This removes some EOP events. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: always clear the FCE predicate after DCC/FMASK/CMASK decompressionsSamuel Pitoiset2018-11-191-5/+8
| | | | | | | | | DCC and FMASK also imply a fast-clear eliminate, so it should be safe to reset the predicate unconditionally. We still only skip FMASK or CMASK decompressions for now. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: tidy up radv_set_dcc_need_cmask_elim_pred()Samuel Pitoiset2018-11-195-15/+14
| | | | | | | This is just a small cleanup. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: enable primitive binning by defaultSamuel Pitoiset2018-11-162-7/+3
| | | | | | | | | | | | After doing a bunch of benchmarks, primitive binning helps some games like The Talos Principle (+5%) or Serious Sam 2017 (+3%). For other titles, either it doesn't change anything or it hurts very few (less than 1%). This only affects GFX9. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add a debug option for disabling primitive binningSamuel Pitoiset2018-11-162-0/+2
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* Revert "radv: disable VK_SUBGROUP_FEATURE_VOTE_BIT"Connor Abbott2018-11-161-4/+2
| | | | | | | | | This reverts commit 647c2b90e96a9ab8571baf958a7c67c1e816911a. There was one recently-introduced bug in ac for dvec3 loads, but the other test failures were actually bugs in the tests. See https://github.com/KhronosGroup/VK-GL-CTS/commit/9429e621c48848d224e35f30a1ae45a4a079922c Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* nir: replace nir_load_system_value calls with appropiate builder functionsKarol Herbst2018-11-146-24/+24
| | | | | | | | | this helps reduce the overall code changes when a bit_size parameter is added to nir_load_system_value Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Signed-off-by: Karol Herbst <[email protected]>
* radv: make use of nir_move_out_const_to_consumer()Timothy Arceri2018-11-141-0/+4
| | | | | | | | | | | | | | | | | | vkpipeline-db results: Totals from affected shaders: SGPRS: 28400 -> 28576 (0.62 %) VGPRS: 27916 -> 27692 (-0.80 %) Spilled SGPRs: 140 -> 138 (-1.43 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 1534456 -> 1520560 (-0.91 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 3541 -> 3582 (1.16 %) Wait states: 0 -> 0 (0.00 %) Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: set optimal OVERWRITE_COMBINER_WATERMARK on GFX9Samuel Pitoiset2018-11-132-3/+21
| | | | | | | Ported from RadeonSI. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: set PA.SC_CONSERVATIVE_RASTERIZATION.NULL_SQUAD_AA_MASK_ENABLESamuel Pitoiset2018-11-131-1/+1
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: binding streamout buffers doesn't change context regsSamuel Pitoiset2018-11-131-2/+7
| | | | | | Cc: 18.3 <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: make use of num_good_cu_per_sh in si_emit_graphics() tooSamuel Pitoiset2018-11-121-2/+1
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: clean up setting partial_es_wave for distributed tess on VISamuel Pitoiset2018-11-121-7/+4
| | | | | | | | Only needed when the pipeline actually uses tessellation. I don't think that changes anything, except improving readability. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: cleanup and document a Hawaii bug with offchip buffersSamuel Pitoiset2018-11-121-9/+8
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/surface: remove the overallocation workaround for Vega12Marek Olšák2018-11-091-4/+0
| | | | | | not needed anymore (probably since the tile_swizzle fix) Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: include LLVM IR in the VK_AMD_shader_info "disassembly"Nicolai Hähnle2018-11-091-0/+1
| | | | | | | Helpful for debugging compiler backend problems: this allows us to easily retrieve the LLVM IR from RenderDoc. Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: fix GPU hangs when loading depth/stencil clear values on SI/CIKSamuel Pitoiset2018-11-081-5/+19
| | | | | | | | HTILE is supported on these chips, not sure how I missed that. This restores using PFP_SYNC_ME when LOAD_CONTEXT_REG is not used. Fixes: f425d9ee74 ("radv: use LOAD_CONTEXT_REG when loading fast clear values") Signed-off-by: Samuel Pitoiset <[email protected]>
* radv: use LOAD_CONTEXT_REG when loading fast clear valuesSamuel Pitoiset2018-11-082-19/+27
| | | | | | | | | This avoids syncing the Micro Engine. This is only supported for VI+ currently. There is probably a way for using LOAD_CONTEXT_REG on previous chips but that could be done later. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radv: only expose VK_SUBGROUP_FEATURE_ARITHMETIC_BIT for VI+Samuel Pitoiset2018-11-081-1/+1
| | | | | | | | | | | Inclusive and exclusives scan are missing because older chips don't have llvm.amdgcn.update.dpp. This fixes crashes with dEQP-VK.subgroups.arithmetic.*. CC: [email protected] Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radv: disable conditional rendering for vkCmdCopyQueryPoolResults()Samuel Pitoiset2018-11-071-0/+10
| | | | | | | | | VK_EXT_conditional_rendering says that copy commands should not be affected by conditional rendering. Cc: 18.2 18.3 <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radv: allocate enough space in CS when copying query results with computeSamuel Pitoiset2018-11-071-0/+4
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* ac/nir_to_llvm: fix b2f for f64Timothy Arceri2018-11-071-3/+12
| | | | | | Fixes: d7e0d47b9de3 ("nir: Add a bunch of b2[if] optimizations") Reviewed-by: Dave Airlie <[email protected]>
* radv: more use of radv_cp_wait_mem()Samuel Pitoiset2018-11-051-22/+9
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radv: replace si_emit_wait_fence() with radv_cp_wait_mem()Samuel Pitoiset2018-11-054-10/+14
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radv: add missing TFB queries support to CmdCopyQueryPoolsResults()Samuel Pitoiset2018-11-052-0/+278
| | | | | | | Cc: 18.3 <[email protected]> Fixes: b4eb029062a ("radv: implement VK_EXT_transform_feedback") Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radv: remove useless sync after copying query results with computeSamuel Pitoiset2018-11-051-4/+0
| | | | | | | | | | | | | | | The spec says: "vkCmdCopyQueryPoolResults is considered to be a transfer operation, and its writes to buffer memory must be synchronized using VK_PIPELINE_STAGE_TRANSFER_BIT and VK_ACCESS_TRANSFER_WRITE_BIT before using the results." VK_PIPELINE_STAGE_TRANSFER_BIT will wait for compute to be idle, while VK_ACCESS_TRANSFER_WRITE_BIT will invalidate both L1 vector caches and L2. So, it's useless to set those flags internally. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* android: radv: add libmesa_git_sha1 static dependencyMauro Rossi2018-11-031-1/+2
| | | | | | | | | | | | | | libmesa_git_sha1 whole static dependency is added to get git_sha1.h header and avoid following building error: external/mesa/src/amd/vulkan/radv_device.c:46:10: fatal error: 'git_sha1.h' file not found ^ 1 error generated. Fixes: 9d40ec2cf6 ("radv: Add support for VK_KHR_driver_properties.") Signed-off-by: Mauro Rossi <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
* amd: Make vgpr-spilling depend on llvm versionJan Vesely2018-11-021-1/+2
| | | | | | | | | The option was removed in LLVM r345763 Signed-off-by: Jan Vesely <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* radv: fix begin/end transform feedback with 0 counter buffers.Dave Airlie2018-11-021-12/+16
| | | | | | | | | | If the user gives 0 counterBuffers then the driver should still enable transform feedback on all targets. This changes the driver to always enable xfb, and use counter buffers where one is defined for the target in question. Fixes: b4eb029062 (radv: implement VK_EXT_transform_feedback) Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: apply xfb buffer offset at buffer binding time not later. (v2)Dave Airlie2018-11-021-2/+4
| | | | | | | | | | | In order to handle pause/resume properly, the offset should be added to the buffer binding not to the begin/end paths. v2: don't add offset to size Fixes ext_transform_feedback-alignment* under zink Fixes: b4eb029062 (radv: implement VK_EXT_transform_feedback) Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: set PA_SU_PRIM_FILTER_CNTL optimallySamuel Pitoiset2018-11-011-0/+9
| | | | | | | | Ported from RadeonSI. It's always TRUE for CIK+ because RADV doesn't support 16 samples. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Dave Airlie <[email protected]>