path: root/src/amd
Commit log. Each entry lists the subject, then author, date, files changed and lines (-removed/+added), then the message body.
* ac/surface/gfx9: let addrlib choose the preferred swizzle kind
  Nicolai Hähnle, 2018-11-29, 1 file, -18/+4
  Our choices here are simply redundant as long as sin.flags is set correctly.
  (v2: remove unused function parameter)
  Reviewed-by: Marek Olšák <[email protected]>
* radv: remove dependency on addrlib gfx9_enum.h
  Nicolai Hähnle, 2018-11-29, 3 files, -9/+9
  v2: use SI_CONTEXT_REG_OFFSET
  Reviewed-by: Dave Airlie <[email protected]>
* radv: drop a few useless state changes when doing color/depth decompressions
  Samuel Pitoiset, 2018-11-29, 2 files, -61/+41
  The viewport and scissor don't need to be updated for array textures.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: remove unused pending_clears param in the transition path
  Samuel Pitoiset, 2018-11-29, 1 file, -11/+6
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: optimize CmdClear{Color,DepthStencil}Image() for layered textures
  Samuel Pitoiset, 2018-11-29, 1 file, -4/+86
  If all layers are bound, we can perform a fast color or depth clear instead of iterating over all layers. This avoids needlessly trashing the framebuffer if we end up doing a fast clear when calling radv_clear_image_layer(), and clearing all layers in one shot is obviously faster.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
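The "all layers are bound" check can be sketched as below. This is a minimal illustration under stated assumptions, not radv's code: the structs image_info and subresource_range and both helpers are hypothetical stand-ins, and LAYER_COUNT_REMAINING only mirrors the value of VK_REMAINING_ARRAY_LAYERS (~0U).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the value of VK_REMAINING_ARRAY_LAYERS (~0U). */
#define LAYER_COUNT_REMAINING (~0U)

/* Hypothetical stand-ins for the image and subresource range. */
struct image_info {
    uint32_t array_size; /* layers allocated for the image */
};

struct subresource_range {
    uint32_t base_array_layer;
    uint32_t layer_count; /* may be LAYER_COUNT_REMAINING */
};

/* Resolve "remaining layers" into a concrete count. */
static uint32_t resolved_layer_count(const struct image_info *img,
                                     const struct subresource_range *r)
{
    assert(r->base_array_layer < img->array_size);
    return r->layer_count == LAYER_COUNT_REMAINING
               ? img->array_size - r->base_array_layer
               : r->layer_count;
}

/* True when the range covers every layer, so a single fast clear can
 * replace one clear per layer. */
static bool clear_covers_all_layers(const struct image_info *img,
                                    const struct subresource_range *r)
{
    return r->base_array_layer == 0 &&
           resolved_layer_count(img, r) == img->array_size;
}
```

A range starting at layer 1, or covering only part of the array, fails the check and would keep the per-layer path.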
* radv: refactor the fast clear path for better re-use
  Samuel Pitoiset, 2018-11-29, 1 file, -38/+40
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: simplify a check in emit_fast_color_clear()
  Samuel Pitoiset, 2018-11-29, 1 file, -3/+1
  Currently only true if RADV_PERFTEST=dccmsaa is set.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add radv_can_fast_clear_{color,depth}() helpers
  Samuel Pitoiset, 2018-11-29, 1 file, -44/+89
  For further optimisations.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add radv_image_view_can_fast_clear() helper
  Samuel Pitoiset, 2018-11-29, 1 file, -20/+27
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add radv_image_can_fast_clear() helper
  Samuel Pitoiset, 2018-11-29, 1 file, -29/+39
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: remove useless check in emit_fast_color_clear()
  Samuel Pitoiset, 2018-11-29, 1 file, -3/+0
  The driver doesn't support DCC/CMASK for mipmapped textures.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Align large buffers to the fragment size.
  Bas Nieuwenhuizen, 2018-11-27, 1 file, -1/+5
  Improves performance in Talos by about 15% (with significant improvements in RotR and possibly others, though those were not benchmarked with the final patch) on kernel 4.19 and earlier. On 4.20+, a similar effect comes from 433ca054949a "drm/amdgpu: try allocating VRAM as power of two".
  v2: Do not impact the alignment of the physical memory.
  Reviewed-by: Dave Airlie <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
  CC: <[email protected]>
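A rough sketch of the idea follows, assuming that "fragment size" means a page-table fragment size reported by the kernel and that, per v2, only the virtual-address alignment changes. The helpers align_up and choose_va_alignment and the sizes used in the usage line are hypothetical, not radv's code.

```c
#include <assert.h>
#include <stdint.h>

/* Round value up to a multiple of align (align must be a power of two). */
static uint64_t align_up(uint64_t value, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (value + align - 1) & ~(align - 1);
}

/* Hypothetical: pick the virtual-address alignment for a buffer. Buffers at
 * least one fragment large get their VA aligned to the fragment size so the
 * mapping can use large fragments; physical alignment is left untouched. */
static uint64_t choose_va_alignment(uint64_t size, uint64_t requested_align,
                                    uint64_t fragment_size)
{
    if (size >= fragment_size && requested_align < fragment_size)
        return fragment_size;
    return requested_align;
}
```

For example, with an assumed 2 MiB fragment, a 4 MiB buffer requesting 4 KiB alignment would be placed at align_up(va, 2 MiB), while small buffers keep their requested alignment.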
* radv: Clamp gfx9 image view extents to the allocated image extents.
  Bas Nieuwenhuizen, 2018-11-27, 1 file, -4/+2
  Mirrors AMDVLK. It looks like if we go over the alignment of the height, we actually start to change the addressing. The extra miplevels seem to actually work with this.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108245
  Fixes: f6cc15dccd5 "radv/gfx9: fix block compression texture views. (v2)"
  Reviewed-by: Dave Airlie <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
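The clamp itself amounts to a per-dimension minimum. This minimal sketch assumes a simple extent struct; the struct and helper names are hypothetical and do not reflect radv's descriptor code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 3D extent. */
struct extent3d {
    uint32_t width, height, depth;
};

static uint32_t min_u32(uint32_t a, uint32_t b)
{
    return a < b ? a : b;
}

/* Clamp the extent a view would program to what was allocated. */
static struct extent3d clamp_view_extent(struct extent3d view,
                                         struct extent3d allocated)
{
    assert(allocated.width && allocated.height && allocated.depth);
    struct extent3d out = {
        min_u32(view.width, allocated.width),
        min_u32(view.height, allocated.height),
        min_u32(view.depth, allocated.depth),
    };
    return out;
}
```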
* radv: Fix opaque metadata descriptor last layer.
  Bas Nieuwenhuizen, 2018-11-26, 1 file, -1/+1
  We used the layer count, which results in an off-by-one error. Not sure this really affects anything.
  Fixes: f4e499ec791 "radv: add initial non-conformant radv vulkan driver"
  Reviewed-by: Dave Airlie <[email protected]>
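The off-by-one is easiest to see with an inclusive "last layer" field. This sketch assumes the descriptor stores the index of the last layer (inclusive); the helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Inclusive index of the last layer: base + count - 1.
 * Writing base + count (or the raw count) points one layer past the end. */
static uint32_t last_layer(uint32_t base_array_layer, uint32_t layer_count)
{
    assert(layer_count > 0);
    return base_array_layer + layer_count - 1;
}
```

For a 6-layer image starting at layer 0, the last layer is 5, not 6.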
* radv: ignore subpass self-dependencies for CreateRenderPass() too
  Samuel Pitoiset, 2018-11-23, 1 file, -0/+10
  We really need to refactor this...
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: remove useless sync before CmdClear{Color,DepthStencil}Image()
  Samuel Pitoiset, 2018-11-23, 1 file, -6/+2
  We don't need to flush anything before these two commands either. Because they have to be externally synchronized, the app should have called CmdPipelineBarrier() beforehand, and the driver should have flushed the caches.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: remove useless sync after CmdClear{Color,DepthStencil}Image()
  Samuel Pitoiset, 2018-11-22, 1 file, -4/+0
  'post_flush' is only set to NULL for the normal clear path (i.e. only vkCmdClearColorImage() and vkCmdClearDepthStencilImage() are affected). Because these two operations have to be externally synchronized with VK_PIPELINE_STAGE_TRANSFER_BIT and VK_ACCESS_TRANSFER_WRITE_BIT, it's useless to set those flags internally. VK_PIPELINE_STAGE_TRANSFER_BIT will wait for compute to be idle, while VK_ACCESS_TRANSFER_WRITE_BIT will invalidate both the L1 vector caches and L2. RADV_CMD_FLAG_WRITEBACK_GLOBAL_L2 will be superseded by RADV_CMD_FLAG_INV_GLOBAL_L2.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: only sync CP DMA for transfer operations or bottom pipe
  Samuel Pitoiset, 2018-11-21, 1 file, -2/+6
  CP DMA can only be busy when the driver copies buffers. The only affected Vulkan commands are vkCmdCopyBuffer() and vkCmdUpdateBuffer() (because we fall back to a copy depending on a threshold). Clear operations are currently not concerned because the driver always syncs after the last DMA operation. Per the spec, these two operations have to be externally synchronized with VK_PIPELINE_STAGE_TRANSFER_BIT.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
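The stage check can be sketched as a mask test on the barrier's source stages. The two constants reproduce the values of VK_PIPELINE_STAGE_TRANSFER_BIT and VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT from vulkan_core.h; the helper and its cp_dma_busy flag are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same values as the corresponding VkPipelineStageFlagBits. */
#define STAGE_TRANSFER_BIT       0x00001000u
#define STAGE_BOTTOM_OF_PIPE_BIT 0x00002000u

/* Hypothetical: wait for CP DMA only if a copy may still be in flight and
 * the barrier's source stages include transfer or bottom-of-pipe. */
static bool barrier_needs_cp_dma_sync(uint32_t src_stage_mask, bool cp_dma_busy)
{
    return cp_dma_busy &&
           (src_stage_mask & (STAGE_TRANSFER_BIT | STAGE_BOTTOM_OF_PIPE_BIT)) != 0;
}
```

A barrier from the fragment-shader stage (0x80) alone would then skip the CP DMA wait.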
* radv: ignore subpass self-dependencies
  Samuel Pitoiset, 2018-11-21, 1 file, -0/+10
  They are unnecessary, as they allow the app to call vkCmdPipelineBarrier() inside the render pass.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
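Filtering out self-dependencies is a comparison of the source and destination subpass indices. This sketch uses a hypothetical two-field stand-in for VkSubpassDependency and a hypothetical counting helper; note that VK_SUBPASS_EXTERNAL (~0U) on one side is not a self-dependency.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant VkSubpassDependency fields. */
struct subpass_dep {
    uint32_t src_subpass;
    uint32_t dst_subpass;
};

/* Count the dependencies that still need handling when building the render
 * pass; a self-dependency (src == dst) is skipped. */
static size_t count_non_self_deps(const struct subpass_dep *deps, size_t n)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (deps[i].src_subpass == deps[i].dst_subpass)
            continue; /* self-dependency: ignore */
        kept++;
    }
    return kept;
}
```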
* ac: handle cast derefs
  Dave Airlie, 2018-11-21, 1 file, -0/+3
  Just give back the same value for now.
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: handle loading from shared pointers
  Dave Airlie, 2018-11-21, 1 file, -9/+18
  We won't have a var to load from, so don't try to do the processing required if we don't need it. This avoids crashes in:
  dEQP-VK.spirv_assembly.instruction.compute.variable_pointers.compute.workgroup_two_buffers
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: avoid casting pointers on bcsel and stores
  Dave Airlie, 2018-11-21, 3 files, -3/+14
  For variable pointers we really don't want to cast the pointers to int without a good reason, so just add a wrapper for bcsel loading and result storing.
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/nir: fix intrinsic name string size in visit_image_atomic()
  Samuel Pitoiset, 2018-11-20, 1 file, -1/+1
  Fixes an assertion in SoTTR.
  Fixes: dd0172e865 ("radv: Use structured intrinsics instead of indexing workaround for GFX9.")
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
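This class of bug, a fixed-size buffer too small for a generated name, can be illustrated with snprintf, whose return value is the length the full string would need. The helper and the name fragments are illustrative assumptions, not the strings visit_image_atomic() actually builds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical: format "<base>.<op>.<suffix>" into buf and report whether
 * it fit; a return value >= size means the name was truncated. */
static bool format_intrinsic_name(char *buf, size_t size, const char *base,
                                  const char *op, const char *suffix)
{
    int n = snprintf(buf, size, "%s.%s.%s", base, op, suffix);
    return n >= 0 && (size_t)n < size;
}
```

With illustrative fragments totalling 40 characters, a 16-byte buffer truncates while a 64-byte buffer holds the whole name; a caller that asserts on success would trip on the smaller buffer.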
* radv: Use structured intrinsics instead of indexing workaround for GFX9.
  Bas Nieuwenhuizen, 2018-11-19, 3 files, -8/+75
  These force the index to be used in the instruction so we don't need the workaround.
  Totals:
    SGPRS: 1321642 -> 1321802 (0.01 %)
    VGPRS: 943664 -> 943788 (0.01 %)
    Spilled SGPRs: 28468 -> 28480 (0.04 %)
    Spilled VGPRs: 88 -> 89 (1.14 %)
    Private memory VGPRs: 0 -> 0 (0.00 %)
    Scratch size: 80 -> 80 (0.00 %) dwords per thread
    Code Size: 52415292 -> 52338932 (-0.15 %) bytes
    LDS: 400 -> 400 (0.00 %) blocks
    Max Waves: 233903 -> 233803 (-0.04 %)
    Wait states: 0 -> 0 (0.00 %)
  Totals from affected shaders:
    SGPRS: 238344 -> 238504 (0.07 %)
    VGPRS: 232732 -> 232856 (0.05 %)
    Spilled SGPRs: 13125 -> 13137 (0.09 %)
    Spilled VGPRs: 88 -> 89 (1.14 %)
    Private memory VGPRs: 0 -> 0 (0.00 %)
    Scratch size: 80 -> 80 (0.00 %) dwords per thread
    Code Size: 15752712 -> 15676352 (-0.48 %) bytes
    LDS: 139 -> 139 (0.00 %) blocks
    Max Waves: 31680 -> 31580 (-0.32 %)
    Wait states: 0 -> 0 (0.00 %)
  Reviewed-by: Samuel Pitoiset <[email protected]>
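The percentages in these totals are relative changes, (after - before) / before * 100, shown with two decimals. A small helper (name hypothetical) reproduces them, e.g. Spilled VGPRs 88 -> 89 gives about 1.136, printed as 1.14 %, and Code Size 52415292 -> 52338932 gives about -0.146, printed as -0.15 %.

```c
#include <assert.h>

/* Relative change in percent between two totals. */
static double percent_change(double before, double after)
{
    assert(before != 0.0);
    return (after - before) / before * 100.0;
}
```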
* radv: implement fast HTILE clears for depth or stencil only on GFX9
  Samuel Pitoiset, 2018-11-19, 2 files, -5/+269
  This allows fast clearing of the depth part (or the stencil part) of a depth+stencil surface when HTILE is enabled. I didn't test on GFX8, so it's currently disabled there. This gives a very nice boost, for example when clearing the depth aspect of a 4096x4096 D32_SFLOAT_S8_UINT image (18x faster):
  BEFORE: 235 us
  AFTER: 13 us
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
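One plausible way to clear a single aspect is a masked read-modify-write of each HTILE word, (old & ~mask) | (value & mask), so the other aspect's bits survive. The sketch below runs on the CPU purely to show the bit arithmetic; the helper is hypothetical, and the masks in the usage are placeholders, not the real GFX9 HTILE layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rewrite only the bits selected by mask in every HTILE word. */
static void clear_htile_masked(uint32_t *htile, size_t count,
                               uint32_t value, uint32_t mask)
{
    for (size_t i = 0; i < count; i++)
        htile[i] = (htile[i] & ~mask) | (value & mask);
}
```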
* radv: rewrite the condition that checks allowed depth/stencil values
  Samuel Pitoiset, 2018-11-19, 1 file, -8/+4
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: check allowed fast HTILE clears a bit earlier
  Samuel Pitoiset, 2018-11-19, 1 file, -0/+5
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add radv_is_fast_clear_{depth,stencil}_allowed() helpers
  Samuel Pitoiset, 2018-11-19, 1 file, -2/+16
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
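Helpers of this kind are value predicates on the clear color. The sketch below assumes that only depth 0.0 or 1.0 and stencil 0 can be expressed by a fast HTILE clear; that condition, the struct (a stand-in for VkClearDepthStencilValue) and the helper names are assumptions for illustration, not the radv implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for VkClearDepthStencilValue. */
struct clear_ds_value {
    float depth;
    uint32_t stencil;
};

/* Assumed: depth fast clears are limited to 0.0 and 1.0. */
static bool depth_fast_clear_allowed(struct clear_ds_value v)
{
    return v.depth == 0.0f || v.depth == 1.0f;
}

/* Assumed: stencil fast clears are limited to 0. */
static bool stencil_fast_clear_allowed(struct clear_ds_value v)
{
    return v.stencil == 0;
}
```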
* radv: add radv_get_htile_fast_clear_value() helper
  Samuel Pitoiset, 2018-11-19, 1 file, -3/+18
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: remove unnecessary goto in the fast clear paths
  Samuel Pitoiset, 2018-11-19, 1 file, -28/+24
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/winsys: remove the max IBs per submit limit for the sysmem path
  Samuel Pitoiset, 2018-11-19, 1 file, -17/+29
  This path will eventually be improved, but as it's only used on SI (or with RADV_DEBUG=noibs), I'm not sure that matters much.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/winsys: remove the max IBs per submit limit for the fallback path
  Samuel Pitoiset, 2018-11-19, 1 file, -48/+55
  The chained submission is the fastest path and it should now be used more often than before. This removes some EOP events.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: always clear the FCE predicate after DCC/FMASK/CMASK decompressions
  Samuel Pitoiset, 2018-11-19, 1 file, -5/+8
  DCC and FMASK also imply a fast-clear eliminate, so it should be safe to reset the predicate unconditionally. We still only skip FMASK or CMASK decompressions for now.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: tidy up radv_set_dcc_need_cmask_elim_pred()
  Samuel Pitoiset, 2018-11-19, 5 files, -15/+14
  This is just a small cleanup.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: enable primitive binning by default
  Samuel Pitoiset, 2018-11-16, 2 files, -7/+3
  After a bunch of benchmarks, primitive binning helps some games like The Talos Principle (+5%) or Serious Sam 2017 (+3%). For other titles, it either changes nothing or hurts very slightly (less than 1%). This only affects GFX9.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add a debug option for disabling primitive binning
  Samuel Pitoiset, 2018-11-16, 2 files, -0/+2
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* Revert "radv: disable VK_SUBGROUP_FEATURE_VOTE_BIT"
  Connor Abbott, 2018-11-16, 1 file, -4/+2
  This reverts commit 647c2b90e96a9ab8571baf958a7c67c1e816911a.
  There was one recently-introduced bug in ac for dvec3 loads, but the other test failures were actually bugs in the tests. See https://github.com/KhronosGroup/VK-GL-CTS/commit/9429e621c48848d224e35f30a1ae45a4a079922c
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* nir: replace nir_load_system_value calls with appropriate builder functions
  Karol Herbst, 2018-11-14, 6 files, -24/+24
  This helps reduce the overall code changes when a bit_size parameter is added to nir_load_system_value.
  Reviewed-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Signed-off-by: Karol Herbst <[email protected]>
* radv: make use of nir_move_out_const_to_consumer()
  Timothy Arceri, 2018-11-14, 1 file, -0/+4
  vkpipeline-db results:
  Totals from affected shaders:
    SGPRS: 28400 -> 28576 (0.62 %)
    VGPRS: 27916 -> 27692 (-0.80 %)
    Spilled SGPRs: 140 -> 138 (-1.43 %)
    Spilled VGPRs: 0 -> 0 (0.00 %)
    Private memory VGPRs: 0 -> 0 (0.00 %)
    Scratch size: 0 -> 0 (0.00 %) dwords per thread
    Code Size: 1534456 -> 1520560 (-0.91 %) bytes
    LDS: 0 -> 0 (0.00 %) blocks
    Max Waves: 3541 -> 3582 (1.16 %)
    Wait states: 0 -> 0 (0.00 %)
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: set optimal OVERWRITE_COMBINER_WATERMARK on GFX9
  Samuel Pitoiset, 2018-11-13, 2 files, -3/+21
  Ported from RadeonSI.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: set PA.SC_CONSERVATIVE_RASTERIZATION.NULL_SQUAD_AA_MASK_ENABLE
  Samuel Pitoiset, 2018-11-13, 1 file, -1/+1
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: binding streamout buffers doesn't change context regs
  Samuel Pitoiset, 2018-11-13, 1 file, -2/+7
  Cc: 18.3 <[email protected]>
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: make use of num_good_cu_per_sh in si_emit_graphics() too
  Samuel Pitoiset, 2018-11-12, 1 file, -2/+1
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: clean up setting partial_es_wave for distributed tess on VI
  Samuel Pitoiset, 2018-11-12, 1 file, -7/+4
  Only needed when the pipeline actually uses tessellation. I don't think that changes anything, except improving readability.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: cleanup and document a Hawaii bug with offchip buffers
  Samuel Pitoiset, 2018-11-12, 1 file, -9/+8
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/surface: remove the overallocation workaround for Vega12
  Marek Olšák, 2018-11-09, 1 file, -4/+0
  Not needed anymore (probably since the tile_swizzle fix).
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: include LLVM IR in the VK_AMD_shader_info "disassembly"
  Nicolai Hähnle, 2018-11-09, 1 file, -0/+1
  Helpful for debugging compiler backend problems: this allows us to easily retrieve the LLVM IR from RenderDoc.
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: fix GPU hangs when loading depth/stencil clear values on SI/CIK
  Samuel Pitoiset, 2018-11-08, 1 file, -5/+19
  HTILE is supported on these chips; not sure how I missed that. This restores using PFP_SYNC_ME when LOAD_CONTEXT_REG is not used.
  Fixes: f425d9ee74 ("radv: use LOAD_CONTEXT_REG when loading fast clear values")
  Signed-off-by: Samuel Pitoiset <[email protected]>
* radv: use LOAD_CONTEXT_REG when loading fast clear values
  Samuel Pitoiset, 2018-11-08, 2 files, -19/+27
  This avoids syncing the Micro Engine. This is currently only supported for VI+. There is probably a way to use LOAD_CONTEXT_REG on previous chips, but that could be done later.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Dave Airlie <[email protected]>
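The resulting split by chip generation can be sketched as a simple selector: VI and newer load the clear values with LOAD_CONTEXT_REG, while SI/CIK keep an older path followed by PFP_SYNC_ME. The enums and the function are hypothetical; only the ordering of chip classes matters.

```c
#include <assert.h>

/* Hypothetical chip ordering, oldest first. */
enum chip_class { CHIP_SI, CHIP_CIK, CHIP_VI, CHIP_GFX9 };

enum clear_value_load_path {
    LOAD_VIA_LOAD_CONTEXT_REG,          /* VI+: no Micro Engine sync */
    LOAD_VIA_FALLBACK_WITH_PFP_SYNC_ME, /* SI/CIK: older path + PFP_SYNC_ME */
};

static enum clear_value_load_path pick_clear_value_load_path(enum chip_class chip)
{
    return chip >= CHIP_VI ? LOAD_VIA_LOAD_CONTEXT_REG
                           : LOAD_VIA_FALLBACK_WITH_PFP_SYNC_ME;
}
```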
* radv: only expose VK_SUBGROUP_FEATURE_ARITHMETIC_BIT for VI+
  Samuel Pitoiset, 2018-11-08, 1 file, -1/+1
  Inclusive and exclusive scans are missing because older chips don't have llvm.amdgcn.update.dpp. This fixes crashes with dEQP-VK.subgroups.arithmetic.*.
  CC: [email protected]
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Dave Airlie <[email protected]>
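Gating a feature bit on the chip generation can be sketched as follows. The constants reproduce the VkSubgroupFeatureFlagBits values from vulkan_core.h; the chip enum, the helper, and the base set of operations are illustrative assumptions rather than radv's actual list.

```c
#include <assert.h>
#include <stdint.h>

/* Same values as the corresponding VkSubgroupFeatureFlagBits. */
#define SUBGROUP_FEATURE_BASIC_BIT      0x00000001u
#define SUBGROUP_FEATURE_VOTE_BIT       0x00000002u
#define SUBGROUP_FEATURE_ARITHMETIC_BIT 0x00000004u
#define SUBGROUP_FEATURE_BALLOT_BIT     0x00000008u

/* Hypothetical chip ordering, oldest first. */
enum chip_class { CHIP_SI, CHIP_CIK, CHIP_VI, CHIP_GFX9 };

/* Add ARITHMETIC only where DPP-based scans are available (VI+). */
static uint32_t subgroup_supported_ops(enum chip_class chip)
{
    uint32_t ops = SUBGROUP_FEATURE_BASIC_BIT | SUBGROUP_FEATURE_VOTE_BIT |
                   SUBGROUP_FEATURE_BALLOT_BIT;
    if (chip >= CHIP_VI)
        ops |= SUBGROUP_FEATURE_ARITHMETIC_BIT;
    return ops;
}
```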