| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
| |
Our choices here are simply redundant as long as sin.flags is set
correctly.
(v2:
- remove unused function parameter)
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
v2:
- use SI_CONTEXT_REG_OFFSET
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
Viewport/scissor don't need to be updated for array textures.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
If all layers are bound we can perform a fast color or depth clear
instead of iterating over all layers. This has the advantage
to avoid trashing the framebuffer for nothing if you we end up by
doing a fast clear when calling radv_clear_image_layer(), and
clearing all layers in one shot is obviously faster.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
| |
Currently only true if RADV_PERFTEST=dccmsaa is set.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
| |
For further optimisations.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
| |
The driver doesn't support DCC/CMASK for mipmapped textures.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Improves performance in Talos by about 15% (and significant improvements
in RotR and possibly other but did not bench with final patch) on
kernel 4.19 and earlier.
On 4.20+ a similar effect comes from
433ca054949a "drm/amdgpu: try allocating VRAM as power of two"
v2: Do not impact the alignment of the physical memory.
Reviewed-by: Dave Airlie <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
CC: <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Mirrors AMDVLK. Looks like if we go over the alignment of height
we actually start to change the addressing. Seems like the extra
miplevels actually work with this.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108245
Fixes: f6cc15dccd5 "radv/gfx9: fix block compression texture views. (v2)"
Reviewed-by: Dave Airlie <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We used the layer count which results in an off by one error.
Not sure this really affects anything.
Fixes: f4e499ec791 "radv: add initial non-conformant radv vulkan driver"
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
We really need to refactor this...
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
We don't need to flush anything before these two commands as well.
This is because they have to be externally synchronized, so the
app should have called CmdPipelineBarrier() prior to that and the
driver should have flushed the caches.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
'post_flush' is only set to NULL for the normal clear path
(ie. only vkCmdClearColorImage() and vkCmdClearDepthStencilImage()
are affected commands).
Because these two operations have to be externally synchronized
with VK_PIPELINE_STAGE_TRANSFER_BIT and VK_ACCESS_TRANSFER_WRITE_BIT,
it's useless to set those flags internallY.
VK_PIPELINE_STAGE_TRANSFER_BIT will wait for compute to be idle,
while VK_ACCESS_TRANSFER_WRITE_BIT will invalidate both L1 vector
caches and L2. RADV_CMD_FLAG_WRITEBACK_GLOBAL_L2 will be superseded
by RADV_CMD_FLAG_INV_GLOBAL_L2.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
CP DMA can only be busy when the driver copies buffers. The
only affected Vulkan commands are vkCmdCopyBuffer() and
vkCmdUpdateBuffer() (because we fallback to a copy depending on
a threshold). Clear operations are currently not concerned
because the driver always syncs after the last DMA operation.
Per the spec, these two operations have to be externally
synchronized with VK_PIPELINE_STAGE_TRANSFER_BIT.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
| |
Unnecessary as they allow the app to call vkCmdPipelineBarrier()
inside the render pass.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
| |
Just give back the same value for now.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
We won't have a var to load from, so don't try to the processing
required if we don't need it.
This avoids crashes in:
dEQP-VK.spirv_assembly.instruction.compute.variable_pointers.compute.workgroup_two_buffers
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
| |
For variable pointers we really don't want to case the pointers to int
without a good reason, just add a wrapper for bcsel loading and result
storing.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
| |
Fixes an assertion in SoTTR.
Fixes: dd0172e865 ("radv: Use structured intrinsics instead of indexing workaround for GFX9.")
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These force the index to be used in the instruction so we don't need the
workaround.
Totals:
SGPRS: 1321642 -> 1321802 (0.01 %)
VGPRS: 943664 -> 943788 (0.01 %)
Spilled SGPRs: 28468 -> 28480 (0.04 %)
Spilled VGPRs: 88 -> 89 (1.14 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 80 -> 80 (0.00 %) dwords per thread
Code Size: 52415292 -> 52338932 (-0.15 %) bytes
LDS: 400 -> 400 (0.00 %) blocks
Max Waves: 233903 -> 233803 (-0.04 %)
Wait states: 0 -> 0 (0.00 %)
Totals from affected shaders:
SGPRS: 238344 -> 238504 (0.07 %)
VGPRS: 232732 -> 232856 (0.05 %)
Spilled SGPRs: 13125 -> 13137 (0.09 %)
Spilled VGPRs: 88 -> 89 (1.14 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 80 -> 80 (0.00 %) dwords per thread
Code Size: 15752712 -> 15676352 (-0.48 %) bytes
LDS: 139 -> 139 (0.00 %) blocks
Max Waves: 31680 -> 31580 (-0.32 %)
Wait states: 0 -> 0 (0.00 %)
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This allows to fast clear the depth part (or the stencil part)
of a depth+stencil surface when HTILE is enabled. I didn't test
on GFX8, so it's disabled currently.
This gives a very nice boost, for example when clearing the depth
aspect of a 4096x4096 D32_SFLOAT_S8_UINT image (18x faster).
BEFORE: 235 us
AFTER: 13 us
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This path will be eventually improved later but as it's only
used on SI (or with RADV_DEBUG=noibs), I'm not sure if that
matters much.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
| |
The chained submission is the fastest path and it should now
be used more often than before. This removes some EOP events.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
DCC and FMASK also imply a fast-clear eliminate, so it should be
safe to reset the predicate unconditionally. We still only skip
FMASK or CMASK decompressions for now.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
| |
This is just a small cleanup.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
After doing a bunch of benchmarks, primitive binning helps
some games like The Talos Principle (+5%) or Serious Sam 2017
(+3%). For other titles, either it doesn't change anything or
it hurts very few (less than 1%).
This only affects GFX9.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This reverts commit 647c2b90e96a9ab8571baf958a7c67c1e816911a. There was
one recently-introduced bug in ac for dvec3 loads, but the other test
failures were actually bugs in the tests. See
https://github.com/KhronosGroup/VK-GL-CTS/commit/9429e621c48848d224e35f30a1ae45a4a079922c
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
this helps reduce the overall code changes when a bit_size parameter is
added to nir_load_system_value
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Signed-off-by: Karol Herbst <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
vkpipeline-db results:
Totals from affected shaders:
SGPRS: 28400 -> 28576 (0.62 %)
VGPRS: 27916 -> 27692 (-0.80 %)
Spilled SGPRs: 140 -> 138 (-1.43 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 1534456 -> 1520560 (-0.91 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 3541 -> 3582 (1.16 %)
Wait states: 0 -> 0 (0.00 %)
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
| |
Ported from RadeonSI.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
| |
Cc: 18.3 <[email protected]>
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
| |
Only needed when the pipeline actually uses tessellation. I don't
think that changes anything, except improving readability.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
| |
not needed anymore (probably since the tile_swizzle fix)
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
| |
Helpful for debugging compiler backend problems: this allows us to
easily retrieve the LLVM IR from RenderDoc.
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
| |
HTILE is supported on these chips, not sure how I missed that.
This restores using PFP_SYNC_ME when LOAD_CONTEXT_REG is not used.
Fixes: f425d9ee74 ("radv: use LOAD_CONTEXT_REG when loading fast clear values")
Signed-off-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This avoids syncing the Micro Engine. This is only supported
for VI+ currently. There is probably a way for using
LOAD_CONTEXT_REG on previous chips but that could be done later.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Inclusive and exclusives scan are missing because older chips
don't have llvm.amdgcn.update.dpp.
This fixes crashes with dEQP-VK.subgroups.arithmetic.*.
CC: [email protected]
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|