| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There's no reason why we need generate trampoline functions for instance
functions or carry N copies of the instance dispatch table around for
every hardware generation. Splitting the tables and being more
conservative shaves about 34K off .text and about 4K off .data when
built with clang.
Before splitting dispatch tables:
text data bss dec hex filename
3224305 286216 8960 3519481 35b3f9 _install/lib64/libvulkan_intel.so
After splitting dispatch tables:
text data bss dec hex filename
3190325 282232 8960 3481517 351fad _install/lib64/libvulkan_intel.so
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is basically a port of commit,
3ade766684933ac84e41634429fb693f85353c11
("i965: Disable 3DSTATE_WM_HZ_OP fields.")
The BDW+ docs describe how to use the 3DSTATE_WM_HZ_OP instruction in
the section titled, "Optimized Depth Buffer Clear and/or Stencil Buffer
Clear." It mentions that the packet overrides GPU state for the clear
operation and needs to be reset to 0s to clear the overrides. Depending
on the kernel, we may not get a context with the GPU state for this
packet zeroed. Do it ourselves just in case.
Prevents a number of GPU hangs when running crucible on ICL. I tried to
get the exact number of hangs that occurs without this patch, but was
unsuccessful. The test machine became unresponsive before completing the
full run.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
Ref: 263b584d5e4 "i965/skl: Emit extra zeros in STATE_BASE_ADDRESS on Skylake."
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Not going to matter, but be consistent.
Found by coverity
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Fixes: caf41c78c (anv/allocator: Support softpin in the BO cache)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, we just went ahead and emitted MI_BATCH_BUFFER_START as
normal. If we are near enough to the end, this can cause us to start a
new BO just for the MI_BATCH_BUFFER_START which messes up chaining. We
always reserve enough space at the end for an MI_BATCH_BUFFER_START so
we can just increment cmd_buffer->batch.end prior to emitting the
command.
Fixes: a0b133286a3 "anv/batch_chain: Simplify secondary batch return..."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107926
Tested-by: Alex Smith <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On Broadwell and above, we have to use different MOCS settings to allow
the kernel to take over and disable caching when needed for external
buffers. On Broadwell, this is especially important because the kernel
can't disable eLLC so we have to do it in userspace. We very badly
don't want to do that on everything so we need separate MOCS for
external and internal BOs.
In order to do this, we add an anv-specific BO flag for "external" and
use that to distinguish between buffers which may be shared with other
processes and/or display and those which are entirely internal. That,
together with an anv_mocs_for_bo helper lets us choose the right MOCS
settings for each BO use.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99507
Cc: [email protected]
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
| |
This is the minimum value according to the spec.
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Let's just be explicit that VK_NV_shading_rate_image is not supported.
Suggested-by: Jason Ekstrand <[email protected]>
Fixes: 6ee17091708a41c4aa81a "vulkan: Update the XML and headers to 1.1.86"
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I was about to make the claim to someone that every field in isl_surf
is either an enum or has explicit units. Then I looked at isl_surf and
discovered this claim was wrong. We should fix that. This commit does
a few refactors:
* Add _B suffixes to some struct fields
* Add _B to some variables and parameters
* Rename row_pitch_tiles -> row_pitch_tl
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
|
| |
h/w specification requires this bit to be always set.
Suggested-by: Kenneth Graunke <[email protected]>
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
This was added as part of 1.1 but it's very hard to track exactly what
extension added it. In any case, we should implement it.
Cc: [email protected]
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
The only thing that matters is the size since we never specify any
offsets in terms of blocks.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
| |
Instead of passing around BOs and offsets, use addresses which are anv's
GPU equivalent of pointers.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
| |
Each query slot is a uint64_t and we were only zeroing half of it.
Fixes: 7ec6e4e68980 "anv/query: implement multiview interactions"
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
| |
Instead of computing an index at the end which we hope maps to the
number of things written, just count the number of things as we go.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
| |
The offsets now come from the anv_address, these references were not
updated and using the old variable.
Fixes: e1ab8345574 "anv/memcpy: Use addresses instead of bo+offset"
Tested-by: Clayton Craft <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
| |
This shouldn't matter as we'll never write OOB anyway but we may as well
get it right. It's supposed to be in dwords - 1.
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[63/93] Compiling C object 'src/intel/vulkan/...intel@vulkan@@anv_common@sta/anv_device.c.o'.
../src/intel/vulkan/anv_device.c:685:30: warning: passing 'const char *' to parameter of type 'void *' discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
vk_free(&instance->alloc, instance->app_info.app_name);
^~~~~~~~~~~~~~~~~~~~~~~~~~~
../src/vulkan/util/vk_alloc.h:62:51: note: passing argument to parameter 'data' here
vk_free(const VkAllocationCallbacks *alloc, void *data)
^
../src/intel/vulkan/anv_device.c:686:30: warning: passing 'const char *' to parameter of type 'void *' discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
vk_free(&instance->alloc, instance->app_info.engine_name);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../src/vulkan/util/vk_alloc.h:62:51: note: passing argument to parameter 'data' here
vk_free(const VkAllocationCallbacks *alloc, void *data)
^
[65/93] Compiling C object 'src/intel/vulkan/...ommon@sta/anv_nir_apply_pipeline_layout.c.o'.
../src/intel/vulkan/anv_nir_apply_pipeline_layout.c:519:13: warning: unused variable 'image_uniform' [-Wunused-variable]
unsigned image_uniform;
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
| |
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The Vulkan 1.1.81 spec says:
"It is legal for offset.x + extent.width or offset.y + extent.height
to exceed the dimensions of the framebuffer - the scissor test still
applies as defined above. Rasterization does not produce fragments
outside of the framebuffer, so such fragments never have the scissor
test performed on them."
Elsewhere, the Vulkan 1.1.81 spec says:
"The application must ensure (using scissor if necessary) that all
rendering is contained within the render area, otherwise the pixels
outside of the render area become undefined and shader side effects
may occur for fragments outside the render area. The render area
must be contained within the framebuffer dimensions."
Unfortunately, there's some room for interpretation here as to what the
consequences are of having the render area set to exactly the
framebuffer dimensions and having a scissor that is larger than the
framebuffer. Given that GL and other APIs provide automatic clipping to
the framebuffer, it makes sense that applications would assume that
Vulkan does this as well. It costs us very little to play it safe and
just clamp client-provided scissors to the framebuffer dimensions.
Fortunately, the user is required to provide us with at least one
scissor so we don't need to handle the case where they don't.
Fixes: fb2a5ceb3264 "anv: Emit DRAWING_RECTANGLE once at driver..."
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
I have no idea if I'm correct about what's going wrong or if this is the
correct fix. However, in my multiple weeks of banging my head on this
hang, a VUE reference counting bug seems to match all the symptoms and
it definitely fixes the hang.
Cc: [email protected]
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107280
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
Known to fix nothing whatsoever but it's in the docs.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
Some of the bits of VERTEX_BUFFER_STATE such as access type, instance
data step rate, and pitch come from the pipeline.
Cc: [email protected]
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
and _mesa_bitcount_64 with util_bitcount_64. This fixes a build problem
in nir for platforms that don't have popcount or popcountll, such as
32bit msvc.
v2: - Fix additional uses of _mesa_bitcount added after this was
originally written
Acked-by: Eric Engestrom <[email protected]> (v1)
Acked-by: Eric Anholt <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Seems in case of 32-bit library, usage of msse2 makes
some stack corruption or incorrect instructions.
Usage with mstackrealign fixes that case.
v2: Fixed meson.
v3: Definition of c_sse2_args moved on the top (L.Landwerlin).
Added mstackrealign for Android's mks where msee4.1 is used.
v4: Added for Vulkan also.
v5: Commit message correction.
CC: <[email protected]>
Fixes: 6b05c080f202 (i965: Compile with -msse3)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107779
Signed-off-by: Sergii Romantsov <[email protected]>
Reviewed-by: Dylan Baker <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The brw_vs_prog_data::double_inputs_read field comes directly from
shader_info::double_inputs which may contain inputs which are not
actually read. Instead of using it directly, AND it with inputs_read
which is only things which are read. Otherwise, we may end up
subtracting too many elements when computing elem_count.
Cc: [email protected]
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103241
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
| |
This accidentally didn't make it into 62378c5e9e5
|
|
|
|
|
|
|
|
|
|
| |
We make the flush after a HiZ clear unconditional and add a flush/stall
before the clear as well.
Cc: [email protected]
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107760
Reviewed-by: Chad Versace <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
| |
Now that the drivers are lowering to surface indices themselves, we no
longer need to push the surface index into the shader.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, the back-end compiler turn image access into magic uniform
reads and there was a complex contract between back-end compiler and
driver about setting up and filling out those params. As of this
commit, both drivers now lower image_deref_load_param_intel intrinsics
to load_uniform intrinsics controlled by the driver and lower the other
image_deref_* intrinsics to image_* intrinsics which take an actual
binding table index. There are still "magic" uniforms but they are now
added and controlled entirely by the driver and that contract no longer
spans components.
This also has the side-effect of making most image use compile-time
binding table indices. Previously, all image access pulled the binding
table index from a uniform. Part of the reason for this was that the
magic uniforms made it difficult to decouple binding table indices from
the uniforms and, since they are indexed completely differently
(especially in Vulkan), it was hard to pull them apart. Now that the
driver is handling both, it's trivial to decouple the two and provide
actual binding table indices.
Shader-db results on Kaby Lake:
total instructions in shared programs: 15166872 -> 15164293 (-0.02%)
instructions in affected programs: 115834 -> 113255 (-2.23%)
helped: 191
HURT: 0
total cycles in shared programs: 571311495 -> 571196465 (-0.02%)
cycles in affected programs: 4757115 -> 4642085 (-2.42%)
helped: 73
HURT: 67
total spills in shared programs: 10951 -> 10926 (-0.23%)
spills in affected programs: 742 -> 717 (-3.37%)
helped: 7
HURT: 0
total fills in shared programs: 22226 -> 22201 (-0.11%)
fills in affected programs: 1146 -> 1121 (-2.18%)
helped: 7
HURT: 0
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This commit expands the current memory access enum to contain the extra
two bits provided for images. We choose to follow the SPIR-V convention
of NonReadable and NonWriteable because readonly implies that you *can*
read so readonly + writeonly doesn't make as much sense as NonReadable +
NonWriteable.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit moves our storage image format conversion codegen into NIR
instead of doing it in the back-end. This has the advantage of letting
us run it through NIR's optimizer which is pretty effective at shrinking
things down. In the common case of rgba8, the number of instructions
emitted after NIR is done with it is half of what it was with the
lowering happening in the back-end. On the downside, the back-end's
lowering is able to directly use predicates and the NIR lowering has to
use IFs.
Shader-db results on Kaby Lake:
total instructions in shared programs: 15166910 -> 15166872 (<.01%)
instructions in affected programs: 5895 -> 5857 (-0.64%)
helped: 15
HURT: 0
Clearly, we don't have that much image_load_store happening in the
shaders in shader-db....
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
Dead code will get rid of them eventually but it's better if they're
just gone so we guarantee they won't trip up later passes.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Fixes: 8c048af5890d4 "anv: Copy the appliation info into the instance"
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Newer blit tests are enabling depth&stencils blits. We currently don't
support it but can do by iterating over the aspects masks (copy some
logic from the CopyImage function).
Signed-off-by: Lionel Landwerlin <[email protected]>
Fixes: 9f44745eca0e41 ("anv: Use blorp to implement VkBlitImage")
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Cc: "18.2" <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
| |
Cc: "18.2" <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
anv_GetPhysicalDeviceProperties2()
VkPhysicalDeviceProtectedMemoryProperties structure is new on Vulkan 1.1.
Fixes Vulkan CTS CL#2849.
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
| |
This fixes a GPU hang in DOOM 2016 running under wine.
Cc: [email protected]
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104809
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
The batch decoder looks for a field with a particular name to decide
whether an MI_BB_START leads into a second batch buffer level. Because
the names are different between Gen7.5/8 and the newer generation we
fail that test and keep on reading (invalid) instructions.
Signed-off-by: Lionel Landwerlin <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107544
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This extension can be supported on SKL+. With this patch,
all corresponding tests (6K+) in CTS can pass. No test fails.
I verified CTS with the command below:
deqp-vk --deqp-case=dEQP-VK.pipeline.sampler.view_type.*reduce*
v2: 1) support all depth formats, not depth-only formats, 2) fix
a wrong indention (Jason).
v3: fix a few nits (Lionel).
v4: fix failures in CI: disable sampler reduction when sampler
reduction mode is not specified via this extension (Lionel).
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
It fixes simulator warnings in vulkancts tests complaining about missing
support for headerless sampler messages for pre-emptable contexts.
Bit 5 in SAMPLER MODE register is newly introduced for ICLLP.
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Gen 11 workarounds table #2056 WABTPPrefetchDisable suggests to
disable prefetching of binding tables for ICLLP A0 and B0
steppings. We have a similar patch for i965 driver in Mesa
commit a5889d70.
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|