| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
| |
It's a bit shorter and easier to work with. Also, we're about to add a
helper called clflush which does the clflush but without any memory
fencing.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
| |
Found by inspection. However, I expect it fixes real bugs when using
blorp from Vulkan on little-core platforms.
Reviewed-by: Lionel Landwerlin <[email protected]>
Cc: "13.0 17.0" <[email protected]>
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
| |
This fixes a some rendering corruption in The Talos Principle
Reviewed-by: Lionel Landwerlin <[email protected]>
Cc: "13.0 17.0" <[email protected]>
|
|
|
|
|
|
|
| |
Found by inspection
Reviewed-by: Lionel Landwerlin <[email protected]>
Cc: "13.0 17.0" <[email protected]>
|
|
|
|
|
|
|
|
| |
This is similar to clflush_range except that it puts the mfence on the
other side to ensure caches are flushed prior to reading.
Reviewed-by: Lionel Landwerlin <[email protected]>
Cc: "13.0 17.0" <[email protected]>
|
|
|
|
|
|
| |
Fixes: 4c9dec80ede ("anv: Get rid of the ANV_CALL macro")
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Eduardo Lima Mitev <[email protected]>
|
|
|
|
|
|
|
|
| |
Analogous to previous commit - never used in any C++ code.
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds support to radv_GetPhysicalDeviceXlibPresentationSupportKHR
and radv_GetPhysicalDeviceXcbPresentationSupportKHR to check if the
local device file descriptor is compatible with the descriptor
retrieved from the X server via DRI3.
This will stop radv binding to an X server until we have prime
support in place. Hopefully apps use this API before trying
to render things.
v2: drop unneeded function, don't leak memory. (jekstrand)
v3: also check in surface_get_support callback.
Reviewed-by: Jason Ekstrand <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
| |
The code for this landed a few days ago.
|
|
|
|
|
|
|
|
| |
The --build-id=... ld flag has been present since binutils-2.18,
released 28 Aug 2007.
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
vkQueuePresentKHR() takes VkPresentInfoKHR pointer and includes a
pResults fields which must holds the results of all the images
requested to be presented. Currently we're not filling this field.
Also as a side effect we probably want to go through all the images
rather than stopping on the first error.
This commit also makes the QueuePresentKHR() implementation return the
first error encountered.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Cc: "17.0" <[email protected]>
|
|
|
|
| |
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Unfortunately, this doesn't substantially improve the performance of any
known apps. With Dota 2 on my Sky Lake gt4, it seems help by somewhere
between 0% and 1% but there's enough noise that it's hard to get a clear
picture.
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
|
|
| |
It's a bit hard to measure because it almost gets lost in the noise,
but this seemed to help Dota 2 by a percent or two on my Broadwell
GT3e desktop.
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This helps Dota 2 on Broadwell by 8-9%. I also hacked up the driver and
used the Sascha "shadowmapping" demo to get some results. Setting
uses_kill to true dropped the framerate on the demo by 25-30%. Enabling
the PMA fix brought it back up to around 90% of the original framerate.
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Vulkan doesn't have a stencilWriteEnable bit like it does for depth.
Instead, you have a stencil mask. Since the stencil mask is handled as
dynamic state, we have to handle it later during command buffer
construction. This, combined with a later commit, seems to help Dota2
on my Broadwell GT3e desktop by a couple percent because it allows the
hardware to move the depth and stencil writes to early in more cases.
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This changes the way anv_entrypoints_gen.py works from generating a
table containing every single entrypoint in the XML to just the ones
that we actually need. There's no reason for us to burn entrypoint
table space on a bunch of NV extensions we never plan to implement.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Even though we supported both coherent and non-coherent memory types, we
effectively forced apps to use the coherent types by accident. Found by
inspection, only compile tested.
Signed-off-by: Connor Abbott <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Cc: "17.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This allows shaders to write to storage images declared with unknown
format if they are decorated with NonReadable ("writeonly" in GLSL).
Previously an image view would always use a lowered format for its
surface state, however when a shader declares a write-only image, we
should use the real format. Since we don't know at view creation time
whether it will be used with only write-only images in shaders, create
two surface states using both the original format and the lowered
format. When emitting the binding table, choose between the states
based on whether the image is declared write-only in the shader.
Tested on both Sascha Willems' computeshader sample (with the original
shaders and ones modified to declare images writeonly and omit their
format qualifiers) and on our own shaders for which we need support
for this.
Signed-off-by: Alex Smith <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This makes our driver robust to changes in spirv_to_nir which would set
this flag on the variable. Right now, our driver relies on spirv_to_nir
*not* setting var->data.image.write_only for correctness. Any patch
which implements the shaderStorageImageWriteWithoutFormat will need to
effectively revert this commit.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
| |
This fixes a "statement with no effect" compiler warning
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
BLORP is now smart enough to handle any swizzle (even those that contain
ZERO or ONE) in a reasonable manner. Just let BLORP handle it. This
fixes the following Vulkan CTS tests on Haswell:
- dEQP-VK.api.image_clearing.clear_color_image.1d_b4g4r4a4_unorm_pack16
- dEQP-VK.api.image_clearing.clear_color_image.2d_b4g4r4a4_unorm_pack16
- dEQP-VK.api.image_clearing.clear_color_image.3d_b4g4r4a4_unorm_pack16
Reviewed-by: Juan A. Suarez Romero <[email protected]>
Cc: "17.0" <[email protected]>
|
|
|
|
|
|
|
|
| |
Set 3DSTATE_WM/ThreadDispatchEnable bit on/off based on the same
conditions as used in the GL version.
Signed-off-by: Juan A. Suarez Romero <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Nanley Chery <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Compressing a render target and decompressing it in the same
single-subpass render pass may waste bandwidth. While this may be
beneficial in some circumstances, it does not help in all. Reclaims
about 1.95% FPS for Dota 2 on some configurations.
v2 (Jason Ekstrand):
- Provide a more thorough comment
- Enable CCS_D for input attachments
v3 (Jason Ekstrand):
- Provide performance numbers
Signed-off-by: Nanley Chery <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The data port can't handle CCS at all so replace the finishme with
better comments.
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The term "lossless compression" could potentially mean multisample
color compression, single-sample color compression or HiZ because they
are all lossless. The term CCS_E, however, has a very precise meaning;
in ISL and is only used to refer to single-sample color compression.
It's also much shorter which is nice.
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 968ffd6c868af7226e8f889573eef709888151cb stored the last subpass
index of all the attachments but that of the depth-stencil attachment.
This could cause depth buffers used in multiple subpasses not to be in
the requested final layout. Fix this error.
Cc: "17.0" <[email protected]>
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
Signed-off-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Enables 10 tests from:
dEQP-VK.draw.shader_draw_parameters.*
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
v2: use define for buffer ID (Jason)
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Up to now on Gen8+ we only allocated a vertex element for
gl_InstanceIndex or gl_VertexIndex when a vertex shader uses
gl_BaseInstanceARB or gl_BaseVertexARB. This is because we would
configure the VF_SGVS packet to make the VF unit write the
gl_InstanceIndex & gl_VertexIndex values right behind the values
computed from the vertex buffers.
In the next commit we will also write the gl_DrawIDARB value. Our
backend expects to pull the gl_DrawIDARB value from the element
following the element containing gl_InstanceIndex, gl_VertexIndex,
gl_BaseInstanceARB and gl_BaseVertexARB (see
vec4_vs_visitor::setup_attributes). Therefore we need to allocate an
element for the SGVS elements as long as at least one of the SGVS
element is read by the shader. Otherwise our shader will use a
gl_DrawIDARB value pulled from the URB one element too far (most
likely garbage).
v2: Fix my english (Lionel)
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
v2: use define for buffer ID (Jason)
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
For RGB formats in Vulkan, we use the corresponding RGBA format with a
swizzle of RGB1. While this swizzle is exactly what we want for
texturing, it's not allowed for rendering according to the docs. While
we haven't been getting hangs or anything, we should probably obey the
docs. This commit just sanitizes all render swizzles so that the alpha
channel maps to ALPHA.
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
fixes to issues spotted by Emil Velikov:
- set ANV_TIMESTAMP corretly
- fix typo with VULKAN_GEM_FILES
v2: update to use Makefile.sources under vulkan
instead of having own
v3: update to changes to generate from vk.xml
(commit c7fc310)
v4: remove 'hw' relative path
cleanups, remove unnecessary cruft
review from Emil Velikov:
- move to vulkan folder
- remove timestamp gen, no longer necessary
- more cleanups
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It is not clear from the docs exactly how pipelined STATE_BASE_ADDRESS
actually is. We know from experimentation that we need to flush the
render cache prior to emitting STATE_BASE_ADDRESS and invalidate the
texture cache afterwards. The only thing the PRM says is that, on gen8+
we're supposed to invalidate the state cache after STATE_BASE_ADDRESS
but experimentation has indicated that doing so does nothing whatsoever.
Since we don't really know, let's do just a bit more flushing in the
hopes that this won't be a problem again. In particular:
1) Do a CS stall before we emit STATE_BASE_ADDRESS since we don't
really know whether or not it's pipelined.
2) Do a data cache flush in case what runs before STATE_BASE_ADDRESS
is a compute shader.
3) Invalidate the state and constant caches after STATE_BASE_ADDRESS
because the state may be getting cached there (we don't really know).
Reported-by: Mark Janes <[email protected]>
Tested-by: Mark Janes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Cc: "13.0 17.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
We had no good reason for *not* doing this on gen7 before but we didn't
know it was needed. Recently, when trying update to Vulkan CTS version
1.0.2 in our CI system, Mark discovered GPU hangs on Haswell that appear
to be STATE_BASE_ADDRESS related. This commit fixes them.
Reported-by: Mark Janes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Cc: "13.0 17.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 2852efcda40274acf3272611c6a3b7731523a72d moved the location of
the depth input attachment surface state from the render pass to the
image view, but failed to update the surface state location used when
emitting the binding table. Fix this by loading the surface state from
the correct location.
Fixes:
dEQP-VK.renderpass.formats.d16_unorm.input.*
dEQP-VK.renderpass.formats.d24_unorm_s8_uint.input.*
dEQP-VK.renderpass.formats.d32_sfloat.input.*
dEQP-VK.renderpass.formats.x8_d24_unorm_pack32.input.*
dEQP-VK.renderpass.attachment_allocation.input_output.93
dEQP-VK.renderpass.attachment_allocation.input_output.92
dEQP-VK.renderpass.attachment_allocation.input_output.82
dEQP-VK.renderpass.attachment_allocation.input_output.46
Cc: "17.0" <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Signed-off-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
| |
I'm pretty sure we've kept up with the bug fixes.
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
| |
Fixes:
dEQP-VK.api.descriptor_pool.out_of_pool_memory
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The size of the pool is slightly smaller than the size of the
structure containing the whole pool. We need to take that into account
on when setting up the internals.
Fixes a crash due to out of bound memory access in:
dEQP-VK.api.descriptor_pool.out_of_pool_memory
v2: Drop debug traces (Lionel)
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Cc: "17.0 13.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
According to GL_KHR_vulkan_glsl, the signature of subpassLoad() is:
gvec4 subpassLoad(gsubpassInput subpass);
gvec4 subpassLoad(gsubpassInputMS subpass, int sample);
So the multisampled case always receives an explicit sample index that we
should use. The current implementation was ignoring this parameter
and using gl_SampleID value instead.
Fixes:
dEQP-VK.pipeline.multisample_shader_builtin.sample_id.*
Reviewed-by: Jason Ekstrand <[email protected]>
Cc: "17.0" <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Jason Ekstranad <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Add a helper function, anv_get_queue_family_properties(), which fills the
struct. This patch reduces churn in the following patch that implements
vkGetPhysicalDeviceQueueFamilyProperties2KHR.
Reviewed-by: Jason Ekstranad <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|