| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
| |
This looks like a regression from df301237940 ("radv: use
ac_compute_surface"). Before that, the opt4Space addrlib flag was set
to true unless the image has FMASK (ac_compute_surface will similarly
only set that flag for images without FMASK).
This saves multiple gigabytes of VRAM on one of our games, and brings
its VRAM utilisation on RADV in line with AMDGPU-PRO and NVIDIA.
Signed-off-by: Alex Smith <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
| |
Coverity warned about dead code below, as meta_va was being shadowed.
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Translate the NIR variables directly to LLVM instead of lowering to a
TGSI-style giant array of vec4's and then back to a variable. This
should fix indirect dereferences, make shared variables more tightly
packed, and make LLVM's alias analysis more precise. This should fix an
upcoming Feral title, which has a compute shader that was failing to
compile because the extra padding made us run out of LDS space.
v2: Combine the previous two patches into one, only use this for shared
variables for now until LLVM becomes smarter.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen>
Reviewed-by: Nicolai Hähnle <[email protected]>
Tested-by: Alex Smith <[email protected]>
|
|
|
|
|
|
|
|
| |
During bring-up, this is often 0. Prevent automatic disablement of
ARB_timer_query and demotion of the OpenGL version to 3.2 by setting
a non-zero frequency. Print an error message instead.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
This should lead to better MSAA performance on GFX9.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Jason updated the Khronos spec to explicitly state that Wayland surfaces
must support VK_PRESENT_MODE_MAILBOX_KHR.
ANV did so since day one (back in 2015)
Cc: [email protected]
Cc: Bas Nieuwenhuizen <[email protected]>
Cc: Dave Airlie <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When using DCC some clear values don't require a cmask eliminate
step. This patch adds support for black and black with alpha 1,
there are other values, but I don't have access to a comprehensive list.
This works by setting the cmask eliminate predicate when doing the
fast clear, and later when doing the cmask elimination making sure
the draws are predicated.
This increases the fps on Sascha Willems deferred.
Tonga: 580fps->670fps on a Tonga PRO card.
Polaris 730->850fps
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We can only fast clear 128-bit images if the r/g/b channels
are the same, and we are using DCC.
For DCC we'll bail out on translate if this isn't true,
and we catch cmask clears explicitly.
v2: remove 64-bit block (Bas), add uint32 as well.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
| |
This fixes the misspelling of ALIGNMENTS in addrlib.
Reviewed-by: Eduardo Lima Mitev <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch uses addrlib to workout the tile swizzles according
to the surface index. It seems to produce the same values as
amdgpu-pro for the deferred test.
v2: don't apply swizzle to CMASK. the eg docs don't mention
it, and we clearly don't align cmask for that.
v3: disable surf index for dedicated images, as these will
most likely be shared, and I don't think the metadata has
space for this info in it yet.
v4: update for shareable images, rename combined_swizzle
to tile_swizzle
This gets the deferred demo from 730->950fps on my rx480.
(dcc cmask elim predication patches get it further)
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some of the Sascha Willems demos pick a D32/S8 format for the depth
buffer, then do a LOAD_OP_CLEAR/LOAD_OP_DONT_CARE on it, which means
we don't get to merge the undefined->depth and clear htile transitions.
This add the stencil aspect to the pending clears if there is a depth
clear pending and the stencil aspect is don't care.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
To not confuse apps in thinking it might be faster.
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Andres Rodriguez <[email protected]>
|
|
|
|
|
|
|
|
| |
NV isn't valid for external images anymore.
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Fixes: 6ddc64b93ea "radv: Add support for VK_KHR_dedicated_allocation."
Reviewed-by: Andres Rodriguez <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This effectively reverts commit 43a171878bb4b5aedb36a. Technically,
VK_KHR_get_memory_requirements2 and VK_KHR_dedicated_allocation are
required for the KHR version but this at least restores the removed
functionality. This patch builds but has received zero testing.
Acked-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Acked-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Fished the SparseImage call out of the headers as the spec missed
the definition.
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Acked-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
These have been formally deprecated by Khronos never to be shipped
again. The KHR versions should be implemented/used instead.
Acked-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a cube image has VK_IMAGE_USAGE_STORAGE_BIT set, the type in an image
view's descriptor was set to a 2D array (and a few other fields adjusted
accordingly). This is correct when the image view is actually bound as a
storage image, but not when bound as a sampled image. In that case the
type should be set as a cube.
Fix by generating 2 sets of descriptors at view creation time for both
storage and non-storage usage, and then choose between them based on
descriptor type when writing descriptor sets.
v2: Generate storage descriptors for images with TRANSFER_DST, since
those may be used as storage images internally.
Signed-off-by: Alex Smith <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This free was left in after dynamic descriptors were changed to not be
allocated separately from the descriptor set, and can cause a crash.
Fixes: 39644fa40a3 ("radv: Don't allocate dynamic descriptors separately")
Signed-off-by: Alex Smith <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Since radv uses compute rings and we can't know when we are setting
up the shaders what ring they are to be used on, we should just use
the default xnack setting. This may be suboptimal in some places,
but if we hit a problem, we likely should try and address this
between llvm and mesa.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
| |
Rather than using 64k, use what addrlib returns as the base
alignment for vulkan allocations.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
| |
Figured out the clear value when we have a combined depth stencil
surface.
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The NIR parameters are ordered "compare, data", matching GLSL, but both
the image and buffer LLVM intrinsics take them the other way around.
This is already handled correctly for SSBO atomics.
Signed-off-by: Alex Smith <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Fixes: f4e499ec791 "radv: add initial non-conformant radv vulkan driver"
|
|
|
|
|
|
|
|
|
|
|
| |
For depth/stencil formats the surface layer allocates the
stencil separately, so we don't need to include it in the
bpe.
This reduces the side of d32s8 allocates to something closer to pro.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
| |
RADV_PERFTEST=sisched
to enable it.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
Use family, but only set xnack+ for gfx9.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
Just more moving code around before adding things to it.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
| |
This just modifies the API to make it easier to add other flags
to target machine creation.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
| |
This doesn't get used yet, it just adds support to various PKT3
emissions to enable it later.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
| |
According to Nicolai the SX can already start work when all
the position exports are done, so do those first.
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
| |
We have some cases where changing between depth and stencil only aspect
was causing hangs.
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Acked-by: Dave Airlie <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Acked-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
| |
I'm not 100% sure this is all wired up but it looks like it is.
v2: actually enable extension.
Signed-off-by: Dave Airlie <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
NIR always makes the shift amount 32 bits, but LLVM asserts if the two
sources aren't the same type. Zero-extend the shift amount to make LLVM
happy.
Signed-off-by: Connor Abbott <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We implement the split opcodes, and tell NIR to lower the original ones.
The lowering to LLVM is a little more complicated, but NIR can optimize
the split ones a little better, and some NIR lowering passes that we
might want to use (particularly for doubles) emit the split ones.
This should fix pack/unpackDouble2x32, which seems like a bug since when
we enabled the Float64 capability. It will also fix pack/unpackInt2x32
when we enable the Int64 capability.
Fixes: 798ae37c ("radv: Enable Float64 support.")
Signed-off-by: Connor Abbott <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
We apparently still used v16i8 ....
As radeonsi doesn't use it with LLVM version checks I don't think
we need them either.
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
llvm doesn't need this workaround anymore.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
| |
The buffer intrinsics should be used instead of the image ones.
Signed-off-by: Alex Smith <[email protected]>
Cc: <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
When using cmpswap on an image, it was being trunctated to
lvm.amdgcn.image.atomic.cmpswa, with the coords type missing entirely.
v2: Add stable CC
CC: <[email protected]>
Reviewed-by: Grazvydas Ignotas <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
Booleans in NIR are ~0 for true, b2i returns 0/1.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
| |
LLVM has required an i1 here for a long time. llvm.ctlz.* was fixed in
commit edd23e06067 ("ac/llvm: fix various findMSB bugs").
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|