| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
| |
vk_error() is a macro that calls __vk_errorf() with instance == NULL.
Then, __vk_errorf() passes a pointer to instance->debug_report_callbacks
to vk_debug_error(), which segfaults as this pointer is invalid but not
NULL.
Fixes: e5b1bd6ab8 "vulkan: move anv VK_EXT_debug_report implementation to common code."
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The kernel is moving to a $class$instance naming scheme in preparation
for accommodating more rings in the future in a consistent manner. It is
already using the naming scheme internally, and now we are looking at
updating some soft-ABI such as the error state to use the new naming
scheme. This of course means we need to teach aubinator_error_decode how
to map both sets of ring names onto its register maps.
Signed-off-by: Chris Wilson <[email protected]>
Cc: Michel Thierry <[email protected]>
Cc: Michal Wajdeczko <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Cc: Lionel Landwerlin <[email protected]>
Cc: Kenneth Graunke <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Michel Thierry <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
From the Vulkan spec with KHX extensions:
"If queries are used while executing a render pass instance that has
multiview enabled, the query uses N consecutive query indices
in the query pool (starting at query) where N is the number of bits
set in the view mask in the subpass the query is used in.
How the numerical results of the query are distributed among the
queries is implementation-dependent. For example, some implementations
may write each view's results to a distinct query, while other
implementations may write the total result to the first query and write
zero to the other queries. However, the sum of the results in all the
queries must accurately reflect the total result of the query summed
over all views. Applications can sum the results from all the queries to
compute the total result."
In our case we only really emit a single query (in the first query index)
that stores the aggregated result for all views, but we still need to manage
availability for all the other query indices involved, even if we don't
actually use them.
This is relevant when clients call vkGetQueryPoolResults and pass all N
queries to retrieve the results. In that scenario, without this patch,
we will never see queries other than the first being available since we
never emit them.
v2: we need the same treatment for timestamp queries.
v3 (Jason):
- Better an if instead of an early return.
- We can't write to this memory in the CPU, we should use
MI_STORE_DATA_IMM and emit_query_availability (Jason).
v4 (Jason):
- No need to take the value to write as parameter, just hard code it to 0.
Fixes test failures in some work-in-progress CTS multiview+query tests.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously the dataflow propagation algorithm would calculate the ACP
live-in and -out sets in a two-pass fixed-point algorithm. The first
pass would update the live-out sets of all basic blocks of the program
based on their live-in sets, while the second pass would update the
live-in sets based on the live-out sets. This is incredibly
inefficient in the typical case where the CFG of the program is
approximately acyclic, because it can take up to 2*n passes for an ACP
entry introduced at the top of the program to reach the bottom (where
n is the number of basic blocks in the program), until which point the
algorithm won't be able to reach a fixed point.
The same effect can be achieved in a single pass by computing the
live-in and -out sets in lock-step, because that makes sure that
processing of any basic block will pick up the updated live-out sets
of the lexically preceding blocks. This gives the dataflow
propagation algorithm effectively O(n) run-time instead of O(n^2) in
the acyclic case.
The time spent in dataflow propagation is reduced by 30x in the
GLES31.functional.ssbo.layout.random.all_shared_buffer.5 dEQP
test-case on my CHV system (the improvement is likely to be of the
same order of magnitude on other platforms). This more than reverses
an apparent run-time regression in this test-case from my previous
copy-propagation undefined-value handling patch, which was ultimately
caused by the additional work introduced in that commit to account for
undefined values being multiplied by a huge quadratic factor.
According to Chad this test was failing on CHV due to a 30s time-out
imposed by the Android CTS (this was the case regardless of my
undefined-value handling patch, even though my patch substantially
exacerbated the issue). On my CHV system this patch reduces the
overall run-time of the test by approximately 12x, getting us to
around 13s, well below the time-out.
v2: Initialize live-out set to the universal set to avoid rather
pessimistic dataflow estimation in shaders with cycles (Addresses
performance regression reported by Eero in GpuTest Piano).
Performance numbers given above still apply. No shader-db changes
with respect to master.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104271
Reported-by: Chad Versace <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
For also using it in radv. I moved the remaining stubs back to
anv_device.c as they were just trivial.
This does not move the vk_errorf/anv_perf_warn or the object
type macros, as those depend on anv types and logging.
Reviewed-by: Tapani Pälli <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
From Vulkan spec:
"descriptorCount is the number of descriptors contained in the binding,
accessed in a shader as an array. If descriptorCount is zero this
binding entry is reserved and the resource must not be accessed from
any stage via this binding within any pipeline using the set layout."
Fixes:
dEQP-VK.binding_model.descriptor_update.empty_descriptor.uniform_buffer
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This creates two new internal dependencies, idep_nir_headers and
idep_nir. The former encapsulates the generation of nir_opcodes.h and
nir_builder_opcodes.h and adding src/compiler/nir as an include path.
This ensures that any target that needs nir headers will have the
includes and that the generated headers will be generated before the
target is build. The second, idep_nir, includes the first and
additionally links to libnir.
This is intended to make it easier to avoid race conditions in the build
when using nir, since the number of consumers for libnir and it's
headers are quite high.
Acked-by: Eric Engestrom <[email protected]>
Signed-off-by: Dylan Baker <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For things like:
loop
x = func()
list += x
end
just do:
loop
list += func()
end
Acked-by: Eric Engestrom <[email protected]>
Signed-off-by: Dylan Baker <[email protected]>
|
|
|
|
|
|
|
| |
Don't use intermediate variables, use consistent whitespace.
Acked-by: Eric Engestrom <[email protected]>
Signed-off-by: Dylan Baker <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently the meosn build has a mix of two styles:
arg : [foo, ...
bar],
and
arg : [
foo, ...,
bar,
]
For consistency let's pick one. I've picked the later style, which I
think is more readable, and is more common in the mesa code base.
v2: - fix commit message
Acked-by: Eric Engestrom <[email protected]>
Signed-off-by: Dylan Baker <[email protected]>
|
|
|
|
|
|
|
|
| |
We already had to switch all of the W types to UW to prevent issues
with vector immediates on gen10. We may as well use unsigned types
everywhere.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Gen 10 has a strange hardware bug involving V immediates with W types.
It appears that a mov(8) g2<1>W 0x76543210V will actually result in g2
getting the value {3, 2, 1, 0, 3, 2, 1, 0}. In particular, the bottom
four nibbles are repeated instead of the top four being taken. (A mov
of 0x00003210V yields the same result.) This bug does not appear in any
hardware documentation as far as we can tell and the simulator does not
implement the bug either.
Commit 6132992cdb858268af0e985727d80e4140be389c was mostly a no-op
except that it changed the type of the subgroup invocation from UW to W
and caused us to tickle this bug with basically every compute shader
that uses any sort of invocation ID (which is most of them). This is
also potentially an issue for geometry shader input pulls and SampleID
setup. The easy solution is just to change the few places where we use
a vector integer immediate with a W type to use a UW type.
Reviewed-by: Matt Turner <[email protected]>
Cc: [email protected]
Fixes: 6132992cdb858268af0e985727d80e4140be389c
|
|
|
|
|
|
| |
This reverts commit 2d0457203871c843ebfc90fb895b65a9b14cd9bb.
Acked-by: Scott D Phillips <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Some cases weren't handled, such as stride 4 which is needed for 64-bit
operations. Presumably fixes the assertion failure mentioned in commit
2d0457203871 (Revert "i965/fs: Use align1 mode on ternary instructions
on Gen10+") but who can really say since the commit neglected to list
any of them!
Reviewed-by: Scott D Phillips <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
After executing a secondary command buffer, we need to update certain
state on the primary command buffer to reflect changes by the secondary.
Otherwise subsequent commands may not have the correct state set.
This fixes various issues (rendering errors, GPU hangs) seen after
executing secondary command buffers in some cases.
v2 (Jason Ekstrand):
- Reset to invalid values instead of pulling from the secondary
- Change the comment to be more descriptive
Signed-off-by: Alex Smith <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
anv_extensions usage from anv_icd was bringing the unwanted dependency
of mako templates for the latter. We don't want that since it will
force the dependency even for distributable tarballs which was not
needed until now.
Jason suggested this approach.
v2: Patch simplification (Jason).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104551
Fixes: 0ab04ba979b ("anv: Use python to generate ICD json files")
Cc: Jason Ekstrand <[email protected]>
Cc: Emil Velikov <[email protected]>
Signed-off-by: Andres Gomez <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
"The maxDescriptorSet* limit is n times the corresponding
maxPerStageDescriptor* limit, where n is the number of shader stages
supported by the VkPhysicalDevice. If all shader stages are supported,
n = 6 (vertex, tessellation control, tessellation evaluation,
geometry, fragment, compute)."
Fixes:
dEQP-VK.api.info.device.properties
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
v2: do not try to handle it as a system value directly for the SPIR-V
path. In GL we rather handle it as a uniform like we do for the
GLSL path (Jason).
v3:
- Remove the uniform variable, it is alwats -1 now (Jason)
- Also do the lowering for the TessEval stage (Jason)
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
| |
This will make aubinator_error_decode decode them properly.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Apparently, Geminilake requires you to whack a chicken bit to select
either compute or tessellation mode for barriers. The recommendation
is to switch between them at PIPELINE_SELECT time.
We may not need to do this all the time, but I don't know that it hurts
either. PIPELINE_SELECT is already a pretty giant stall.
This appears to fix hangs in tessellation control shaders with barriers
on Geminilake. Note that this requires a corresponding kernel change,
drm/i915: Whitelist SLICE_COMMON_ECO_CHICKEN1 on Geminilake.
in order for the register write to actually happen. Without an updated
kernel, this register write will be noop'd and the fix will not work.
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Memtrace aubs are similar to classic aubs, with the major
difference being how command submission is serialized (as register
writes instead of a high-level submit message). Some internal
tools generate or consume only memtrace aubs.
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
| |
A later patch will use the aubinator_init() function from the
memtrace aub header handler.
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
| |
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was never enabled in secondary buffers because hiz_enabled was
never set to true for those.
If the app provides a framebuffer in the inheritance info when beginning
a secondary buffer, we can determine if HiZ is enabled and therefore
allow the PMA optimization to be enabled within the command buffer.
This improves performance by ~13% on an internal benchmark on Skylake.
v2: Use anv_cmd_buffer_get_depth_stencil_view().
Signed-off-by: Alex Smith <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If we have a color attachment, but its writes are masked, this would
have still returned true. This is inconsistent with how HasWriteableRT
in 3DSTATE_PS_BLEND is set, which does take the mask into account.
This could lead to PixelShaderHasUAV not being set in 3DSTATE_PS_EXTRA
if the fragment shader does use UAVs, meaning the fragment shader may
not be invoked because HasWriteableRT is false. Specifically, this was
seen to occur when the shader also enables early fragment tests: the
fragment shader was not invoked despite passing depth/stencil.
Fix by taking the color write mask into account in this function. This
is consistent with how things are done on i965.
Signed-off-by: Alex Smith <[email protected]>
Cc: [email protected]
Reviewed-by: Iago Toral Quiroga <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Fixes hangs seen due to the lock not being released here.
Signed-off-by: Alex Smith <[email protected]>
Cc: [email protected]
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Older OpenGL defines two equations for converting from signed-normalized
to floating point data. These are:
f = (2c + 1)/(2^b - 1) (equation 2.2)
f = max{c/2^(b-1) - 1), -1.0} (equation 2.3)
Both OpenGL 4.2+ and OpenGL ES 3.0+ mandate that equation 2.3 is to be
used in all scenarios, and remove equation 2.2. DirectX uses equation
2.3 as well. Intel hardware only supports equation 2.3, so Gen7.5+
systems that use the vertex fetcher hardware to do the conversions
always get formula 2.3.
This can make a big difference for 10-10-10-2 formats - the 2-bit value
can represent 0 with equation 2.3, and cannot with equation 2.2.
Ivybridge and older were using equation 2.2 for OpenGL, and 2.3 for ES.
Now that Ivybridge supports OpenGL 4.2, this is wrong - we need to use
the new rules, at least in core profile. That would leave Gen4-6 doing
something different than all other hardware, which seems...lame.
With context version promotion, applications that requested a pre-4.2
context may get promoted to 4.2, and thus get the new rules. Zero cases
have been reported of this being a problem. However, we've received a
report that following the old rules breaks expectations. SuperTuxKart
apparently renders the cars red when following equation 2.2, and works
correctly when following equation 2.3:
https://github.com/supertuxkart/stk-code/issues/2885#issuecomment-353858405
So, this patch deletes the legacy equation 2.2 support entirely, making
all hardware and APIs consistently use the new equation 2.3 rules.
If we ever find an application that truly requires the old formula, then
we'd likely want that application to work on modern hardware, too. We'd
likely restore this support as a driconf option. Until then, drop it.
This commit will regress Piglit's draw-vertices-2101010 test on
pre-Haswell without the corresponding Piglit patch to accept either
formula (commit 35daaa1695ea01eb85bc02f9be9b6ebd1a7113a1):
draw-vertices-2101010: Accept either SNORM conversion formula.
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
|
|
|
|
|
|
| |
These are the same, we don't need a separate opcode enum per backend.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
Previously, we were flagging the instruction state buffer for capture
but not surface state or dynamic state. We want those captured too.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
| |
Some older versions of the Vulkan driver didn't properly tag dynamic
state as needing to be captured. Also, this prevents crashes when
looking at dumps on older kernels.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We were walking the sections, printing the batches, and then freeing
them in one pass. If the batch happens to reference any earlier
sections (which it almost certainly will since it's at the end), we will
access freed memory.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 9cd60fce9c22737000a8f8dc711141f8a523fe75.
Above commit caused 2000+ piglit tests to assert fail. Disabling
the align1 mode on gen10 for now to avoid failures.
Cc: Matt Turner <[email protected]>
Cc: Rafael Antognolli <[email protected]>
Signed-off-by: Anuj Phogat <[email protected]>
Tested-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This should shut up some Valgrind errors during pre-regalloc
scheduling. The errors were harmless since they could only have led
to the estimation of the bank conflict penalty of an instruction
pre-regalloc, which is inaccurate at that point of the program
compilation, but no less accurate than the intended "return 0"
fall-back path. The scheduling pass is normally re-run after regalloc
with a well-defined grf_used value and accurate bank conflict
information.
Fixes: acf98ff933d "intel/fs: Teach instruction scheduler about GRF bank conflict cycles."
Reported-by: Eero Tamminen <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
obtain vector storage.
The weight_vector_type constructor was inadvertently assuming C++17
semantics of the new operator applied on a type with alignment
requirement greater than the largest fundamental alignment.
Unfortunately on earlier C++ dialects the implementation was allowed
to raise an allocation failure when the alignment requirement of the
allocated type was unsupported, in an implementation-defined fashion.
It's expected that a C++ implementation recent enough to implement
P0035R4 would have honored allocation requests for such over-aligned
types even if the C++17 dialect wasn't active, which is likely the
reason why this problem wasn't caught by our CI system.
A more elegant fix would involve wrapping the __SSE2__ block in a
'__cpp_aligned_new >= 201606' preprocessor conditional and continue
taking advantage of the language feature, but that would yield lower
compile-time performance on old compilers not implementing it
(e.g. GCC versions older than 7.0).
Fixes: af2c320190f3c731 "intel/fs: Implement GRF bank conflict mitigation pass."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104226
Reported-by: Józef Kucia <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
| |
Vulkan spec doesn't specify that VK_REMAINING_ARRAY_LAYERS is allowed
in the passed VkClearRect struct.
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
We still have gpu hangs on Cannonlake when using push constants, so
disable them for now until we have a proper fix for these hangs.
v2: Add warning message when creating context too.
Signed-off-by: Rafael Antognolli <[email protected]>
Cc: Ben Widawsky <[email protected]>
Cc: Kenneth Graunke <[email protected]>
Reviewed-by: Ben Widawsky <[email protected]>
|
|
|
|
| |
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
According to the RENDER_SURFACE_STATE internal documentation, the
R32G32B32_FLOAT restriction is marked "IVB" only. We choose to apply
it to Ivybridge and Baytrail, but not Haswell.
Apparently fixes KHR-GL46.texture_size_promotion.functional on Haswell.
Changes these tests from crashing to skipping on Haswell:
- KHR-GL46.direct_state_access.textures_storage_multisample_2d_rgb32f
- KHR-GL46.direct_state_access.textures_storage_multisample_3d_rgb32f
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
| |
Unfortunately, in aubinator and aubinator_error_decode we don't always
know how many of a given state we have, so we must guess. One day,
we'll come up with a way to annotate the batch to solve this problem.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
| |
The shared framework can now do everything that aubinator_error_decode
ever did and more. It's time to make the switch.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
|