| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are tests in CTS for alpha to coverage without a color attachment
that are failing. This happens because we remove the shader color
outputs when we don't have a valid color attachment for them, but when
alpha to coverage is enabled we still want to preserve the the output
at location 0 since we need the alpha component. In that case we will
also need to create a null render target for RT 0.
v2:
- We already create a null rt when we don't have any, so reuse that
for this case (Jason)
- Simplify the code a bit (Iago)
v3:
- Take alpha to coverage from the key and don't tie this to depth-only
rendering only, we want the same behavior if we have multiple render
targets but the one at location 0 is not used. (Jason).
- Rewrite commit message (Iago)
v4:
- Make sure we take into account the array length of the shader outputs,
which we were no handling correctly either and make sure we also
create null render targets for any invalid array entries too.
v5:
- Simplify removal of unused outputs by using rt_used[] so we don't have
to special case alpha to coverage there too.
Fixes the following CTS tests:
dEQP-VK.pipeline.multisample.alpha_to_coverage_no_color_attachment.*
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Signed-off-by: Iago Toral Quiroga <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
Both drivers are feature-complete and should be running more-or-less at
perf at this point. Drop the warning.
Acked-by: Kenneth Graunke <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Found while running Talos Principle.
As far as I can tell running a draw call with a pipeline having push
constants without the application having called vkCmdPushConstants
gives undefined push constant values.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
| |
It is an input but it comes in as part of the shader payload and doesn't
count towards the limits.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This enables the remaining capabilities in SPV_EXT_descriptor_indexing.
Fixes: 6e230d7607f "anv: Implement VK_EXT_descriptor_indexing"
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
| |
One special case, `src/util/xmlpool/.gitignore` is not entirely deleted,
as `xmlpool.pot` still gets generated (eg. by `ninja xmlpool-pot`).
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Dylan Baker <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In commit 0d46e404 ("anv: limit URB reconfigurations when using
blorp") we tried to limit the number of URB reconfiguration by
checking if the last allocation is large enough to fit the blorp
dispatch.
We used the last bound pipeline to compare the allocation. The problem
with this is that the pipeline is bound but its commands might not
have been emitted into the command buffer yet.
Let's just revert commit 0d46e404677264bfb12ada15290e39c10a5eb455
since it didn't seem to yield any performance improvement.
Signed-off-by: Lionel Landwerlin <[email protected]>
Fixes: 0d46e404 ("anv: limit URB reconfigurations when using blorp")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110535
Acked-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
VK_ANDROID_external_memory_android_hardware_buffer requires this
extension. It is safe to enable it since currently aux usage is
disabled for ahw buffers.
Fixes following dEQP extension dependency test on Android:
dEQP-VK.api.info.device#extensions
Cc: <[email protected]>
Signed-off-by: Tapani Pälli <[email protected]>
Acked-by: Jason Ekstrand <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In 105002bd2d617, we fixed a memory leak bug where we weren't properly
destroying descriptor when destroying/resetting a descriptor pool.
However, the only real leak that happened was that we we take a
reference to the descriptor set layout in the descriptor set and we
weren't dropping our reference. Everything else in the descriptor set
is tied to the pool itself and doesn't need to be freed on a per-set
basis. This commit changes the destroy/reset functions to only bother
walking the list of sets to unref the layouts and otherwise we just
assume that the whole-pool destroy/reset takes care of the rest.
Now that we're doing more non-trivial things with descriptor sets such
as allocating things with util_vma_heap, per-set destruction is starting
to show up on perf traces. This takes reset back to where it's supposed
to be as a cheap whole-pool operation.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In c520f4dec9c, we chose to align the sizes of descriptor set buffers to
32 bytes. We have to align the descriptor set buffer to 32B so that
it's valid for using with push constants. We align the size as well so
we don't leave lots of holes with util_vma_heap_alloc. Unfortunately,
we were only aligning it for alloc and not for free so we were still
creating piles of holes when we delete descriptor sets. This causes
terrible perf for the allocator once we've deleted piles of descriptor
sets.
This commit reworks the code so that we align the descriptor set buffer
size to 32B for both alloc and free. The result is that it takes the
new crucible vkResetDescriptorPool from 104.567719 to 2.898354 seconds.
Fixes: c520f4dec9c "anv: Add a concept of a descriptor buffer"
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110497
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Instead of aligning and then taking inline uniforms into account, we
need to take inline uniforms into account and then align to a page.
Otherwise, we may not be aligned to a page and allocation may fail.
Fixes: 43f40dc7cb2 "anv: Implement VK_EXT_inline_uniform_block"
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
| |
Fixes: 7bb34ecff98 "anv: release memory allocated by bo_heap when..."
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
| |
Fixes: 105002bd2d "anv: destroy descriptor sets when pool gets..."
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
anv_descriptor_pool_free_set is called on the clean-up path of
anv_descriptor_set_create and the set may not have been added to the
pool's list of sets yet. While we're here, we move adding it to that
list into set_create for symmetry.
Fixes: 105002bd2d "anv: destroy descriptor sets when pool gets..."
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The dri options are optional. When the dri options are not provided
the WSI will not use adaptive sync.
FWIW I think for xf86-video-amdgpu this still requires an X11 config
option, so only people who opt in can get possible regressions from this.
So then the remaining question is: why do this in the WSI?
It has been suggested in another MR that the application sets this.
However, I disagree with that as I don't think we'll ever get a
reasonable set of applications setting it.
The next questions is whether this can be a layer. It definitely
can be as implemented now. However, I think this generally fits
well with the function of the WSI. Furthemore, for e.g. the DISPLAY
WSI this is much harder to do in a layer.
Of course, most of the WSI could almost be a layer, but I think
this still fits best in the WSI.
Acked-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Doesn't fix anything but it's not the right function prototype.
Signed-off-by: Lionel Landwerlin <[email protected]>
Fixes: 673f33c77dd765 ("anv: Implement CmdBegin/EndQueryIndexed")
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Sagar Ghuge <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, we were storing the per-binding create info pointer in the
immutable_samplers field temporarily so that we can switch the order in
which we walk the loop. However, now that we have multiple arrays of
structs to walk, it makes more sense to store an index of some sort.
Because we want to leave immutable_samplers as NULL for undefined
bindings, we store index + 1 and then subtract one later.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
|
| |
I missed this on the first go round. The bindingCount field of
VkDescriptorSetLayoutBindingFlagsCreateInfoEXT is allowed to be zero
which means the flags array is ignored.
Fixes: d6c9bd6e01b4d "anv: Put binding flags in descriptor set layouts"
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
| |
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Now that everything is in place to do bindless for all resource types
except input attachments and UBOs, VK_EXT_descriptor_indexing is
"trivial".
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This commit changes anv to put bindless handles and sampler pointers
into the descriptor buffer and use those instead of bindful when we run
out of binding table space. This "spilling" of descriptors allows to to
advertise an almost unbounded number of images and samplers.
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Instead of setting it manually, call the helper. When setting
descriptor sets becomes more complicated than just setting some struct
values, this will keep immutable sampler handling correct.
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit adds a new way for ANV to do SSBO bindings by just passing a
GPU address in through the descriptor buffer and using the A64 messages
to access the GPU address directly. This means that our variable
pointers are now "real" pointers instead of a vec2(BTI, offset) pair.
This carries a few of advantages:
1. It lets us support a virtually unbounded number of SSBO bindings.
2. It lets us implement VK_KHR_shader_atomic_int64 which we couldn't
implement before because those atomic messages are only available
in the bindless A64 form.
3. It's way better than messing around with bindless handles for SSBOs
which is the only other option for VK_EXT_descriptor_indexing.
4. It's more future looking, maybe? At the least, this is what NVIDIA
does (they don't have binding based SSBOs at all). This doesn't a
priori mean it's better, it just means it's probably not terrible.
The big disadvantage, of course, is that we have to start doing our own
bounds checking for robustBufferAccess again have to push in dynamic
offsets.
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
In order to avoid the potential overhead of A64 operations on all SSBO
ops, we look for those SSBO ops where we can get to the descriptor set
from the SSBO access operation and lower those to a binding-table
approach. When robustBufferAccess is enabled, this lets the hardware do
the bounds checking for us. It also avoids some potentially expensive
64-bit integer calculations.
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
| |
This is more descriptive and a bit nicer than checking for gen >= 8 &&
use_softpin everywhere.
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
If the number of surfaces or samplers exceeds what we can put in a
table, we will want to spill out to bindless. There is no bindless
support yet but this gets us the basic framework that will be used by
later commits.
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This commit just sorts the bindings by how often they're used vs the
array size of the binding. This will let us make more nuanced decisions
about what goes in the binding table vs. what to make bindless.
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This also fixes a bug where we mis-calculate maximum binding table sizes
and may return true in vkGetDescriptorSetLayoutSupport even for sets too
large to fit in a binding table.
Fixes: ddc40691221 "anv: Implement VK_KHR_maintenance3"
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is really where they belong; not push constants. The one downside
here is that we can't push them anymore for compute shaders. However,
that's a general problem and we should figure out how to push descriptor
sets for compute shaders. This lets us bump MAX_IMAGES to 64 on BDW and
earlier platforms because we no longer have to worry about push constant
overhead limits.
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We spend a lot of time in the driver adding things to hash sets to track
residency. The reality is that a properly built Vulkan app uses large
memory objects and sub-allocates from them. In a typical frame, most of
if not all of those allocations are going to be resident for the entire
frame so we're really not saving ourselves much by tracking fine-grained
residency. Just throwing everything in the validation list does make it
a little bit more expensive inside the kernel to walk the list and
ensure that all our VA is in order. However, without relocations, the
overhead of that is pretty small.
If we ever do run into a memory pressure situation where the fine-
grained residency could even potentially help, we would likely be
swapping one page out to make room for another within the draw call and
performance is totally lost at that point. We're better off swapping
out other apps and just letting ours run a whole frame.
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
If the last graphics pipeline bound to the command buffer has enough
space in its VS URB entries for Blorp then avoid reconfiguring the URB
partitions.
v2: s/0/MESA_SHADER_VERTEX/ (Caio)
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Fixes: 843775bab78a6b ("anv: Rework fences")
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
| |
Ever since 48ed2a7bb009618ed, we've had one at the top of the function.
Reviewed-by: Caio Marcelo de Oliveira Filho [email protected]
|
|
|
|
| |
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This 3d performance workaround was initially put in the kernel but the
media driver requires different settings so the register has been
whitelisted in i915 [1] and userspace drivers are left initializing it as
they wish.
[1] : https://patchwork.freedesktop.org/series/59494/
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
| |
v2 (Jason):
- Merge shaderFloat16 and shaderInt8 enablement into a single patch.
- Merge extension enable.
Reviewed-by: Jason Ekstrand <[email protected]> (v1)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
v2:
- Merge Float16 and Int8 capabilities into a single patch (Jason)
- Merged patch that enabled SPIR-V front-end checks for these caps
(except for Int8, which was already merged)
v3:
- Keep capabilities sorted (Jason)
v4:
- SpvCapabilityFloat16 support already added in master (Juan)
Reviewed-by: Jason Ekstrand <[email protected]> (v1)
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes a race condition where anv_gen_files are executed before
genxml files, which causes a build failure
v2: add dependency on idep_genxml (Lionel)
Fixes: d1992255bb29054fa51763376d125183a9f602f
("meson: Add build Intel "anv" vulkan driver")
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Both Vulkan and OpenGL might be using glsl_types simultaneously or we
can also have multiple concurrent Vulkan instances using glsl_types.
Patch adds a one time init to track number of users and will release
types only when last user calls _glsl_type_singleton_decref().
This change fixes glsl_type memory leaks we have with anv driver.
v2: reuse hash_mutex, cleanup, apply fix also to radv driver and
rename helper functions (Jason)
v3: move init, destroy to happen on GL context init and destroy
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
These were updated in version 1.1.106 of vulkan.h to make more sense
with the extension names. We may as well keep with the times.
Acked-by: Dave Airlie <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Pipeline statistics queries should not count BLORP's rectangles.
(23) How do operations like Clear, TexSubImage, etc. affect the
results of the newly introduced queries?
DISCUSSION: Implementations might require "helper" rendering
commands be issued to implement certain operations like Clear,
TexSubImage, etc.
RESOLVED: They don't. Only application submitted rendering
commands should have an effect on the results of the queries.
Piglit's arb_pipeline_statistics_query-vert_adj exposes this bug when
the driver is hacked to always perform glBufferData via a GPU staging
copy (for debugging purposes).
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
v2: remove & operator in a couple of memsets
add some memsets
v3: fixup lima
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]> (v2)
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In 628c9ca9089789 I forgot to apply the same -4Gb of the high address
of the high heap VMA. This was previously computed in the
HIGH_HEAP_MAX_ADDRESS.
Many thanks to James for pointing this out.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reported-by: Xiong, James <[email protected]>
Fixes: 628c9ca9089789 ("anv: store heap address bounds when initializing physical device")
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
We were always programming it with the Broadwell convention which is too
large by a factor of two on Haswell and just plain wrong on IVB and BYT.
Reviewed-by: Lionel Landwerlin <[email protected]>
Cc: [email protected]
|