The two new unit tests verify that propagating a saturate between
instructions of different exec sizes does not happen.
Reviewed-by: Rafael Antognolli <[email protected]>
|
Will allow us to test that propagation between instructions of different
exec sizes does not happen (in the next commit).
The stray-looking change in intervening_dest_write is to adjust the size
of the texture result to keep the test functioning identically when the
instructions' exec sizes are doubled. Without the change, the texture
does not overwrite the destination fully as the unit test intends.
Reviewed-by: Rafael Antognolli <[email protected]>
|
We now have a lowering pass that will do this at the fs_visitor level,
so we can remove this code from gen11+.
v2: Reduce size of the "i" array from 4 to 2 (Matt).
Reviewed-by: Matt Turner <[email protected]>
|
On gen11, instead of using a PLN instruction, we convert
FS_OPCODE_LINTERP to 2 or 4 multiply adds. That is done in the
fs_generator code.
This patch adds a lowering pass that does the same thing at the
fs_visitor level. It also drops the use of NF types, since we don't need
the extra precision and avoiding them lets us skip the accumulator. With
all that, optimizations can still be run on the generated code, and we
should get better scheduling.
v2: Update comment about saturation and conditional mod (Matt)
Reviewed-by: Matt Turner <[email protected]>
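
As an illustration of the arithmetic involved, here is a minimal scalar C
sketch (purely illustrative, not the pass itself): each interpolated
channel is a plane evaluation attr = dA/dx * x + dA/dy * y + A0, done as
two fused multiply-adds.

    #include <math.h>

    /* One channel of linear interpolation as two multiply-adds instead of
     * a single PLN-style plane instruction. */
    static float
    lower_linterp_channel(float x, float y, float dadx, float dady, float a0)
    {
       float tmp = fmaf(dady, y, a0);   /* MAD #1: dA/dy * y + A0 */
       return fmaf(dadx, x, tmp);       /* MAD #2: dA/dx * x + tmp */
    }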
|
Move the scalar-region conversion from the IR to the generator, so it
doesn't affect the Gen11 path. We need the non-scalar regioning
for a later lowering pass that we are adding.
v2: Better commit message (Matt)
Reviewed-by: Matt Turner <[email protected]>
|
Otherwise it could propagate the saturation from a SIMD16 instruction
into a SIMD8 instruction. In that case, only part of the destination
register (which is the source of the move with saturation) would have
been updated.
Reviewed-by: Matt Turner <[email protected]>
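
A minimal C sketch of the guard this adds, using hypothetical simplified
types rather than the backend IR's real ones:

    #include <stdbool.h>

    struct sketch_inst {
       unsigned exec_size;   /* channels written, e.g. 8 or 16 */
       bool saturate;
       bool can_saturate;    /* opcode accepts a saturate modifier */
    };

    static bool
    try_propagate_saturate(struct sketch_inst *producer,
                           const struct sketch_inst *sat_mov)
    {
       /* Folding a SIMD16 MOV's saturate into a SIMD8 producer (or vice
        * versa) would saturate only part of the destination, so refuse. */
       if (producer->exec_size != sat_mov->exec_size || !producer->can_saturate)
          return false;

       producer->saturate = true;
       return true;
    }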
|
I left code indented one level too far in the previous commit to make
the diff easier to review. Drop that extra level now.
Fixes: 6981069fc80 i965: Ignore uniform storage for samplers or images, use binding info
Reviewed-by: Jason Ekstrand <[email protected]>
|
gl_nir_lower_samplers_as_deref creates new top level sampler and image
uniforms which have been split from structure uniforms. i965 assumed
that it could walk through gl_uniform_storage slots by starting at
var->data.location and walking forward based on a simple slot count.
This assumed that structure types were walked in a particular order.
With samplers and images split out of structures, it becomes impossible
to assign meaningful locations. Consider:
   struct S {
      sampler2D a;
      sampler2D b;
   } s[2];
The gl_uniform_storage locations for these follow this map:
   0 => s[0].a, 1 => s[0].b, 2 => s[1].a, 3 => s[1].b.
But the new split variables look like:
   sampler2D lowered_a[2];
   sampler2D lowered_b[2];
and there is no way to know that there's effectively a stride to get to
the location for successive elements of a[] or b[]. So, working with
location becomes effectively impossible.
Ultimately, the point of looking at uniform storage was to pull out the
bindings from the opaque index fields. gl_nir_lower_samplers_as_deref
can obtain this information while doing the splitting, however, and sets
up var->data.binding to have the desired values.
We move gl_nir_lower_samplers before brw_nir_lower_image_load_store so
gl_nir_lower_samplers_as_deref has the opportunity to set proper image
bindings. Then, we make the uniform handling code skip sampler(-array)
variables, and handle image param setup based on var->data.binding.
Fixes Piglit tests/spec/glsl-1.10/execution/samplers/uniform-struct,
this time without regressing dEQP-GLES2.functional.uniform_api.random.3.
Fixes: f003859f97c nir: Make gl_nir_lower_samplers use gl_nir_lower_samplers_as_deref
Reviewed-by: Jason Ekstrand <[email protected]>
|
This reverts commit 9e0c744f07a21fc7bb018a77cf83b057436d0d1b, which
regressed dEQP-GLES2.functional.uniform_api.random.3. It turns out
that the newly produced location is meaningless and impossible to
consume by drivers that want to look at gl_uniform_storage, so it's
probably better to leave it unset (0) than to set it to a number that
looks usable.
Leave a tombstone^Wcomment to discourage the next person from making
the obvious-looking fix.
See the next commit for a longer description of the problem.
This breaks tests/spec/glsl-1.10/execution/samplers/uniform-struct
on i965, which the reverted commit had originally fixed. The next
commit will fix it again.
Reviewed-by: Jason Ekstrand <[email protected]>
|
This is a workaround for a thread deadlock whose cause I have not been
able to determine.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108879
Fixes: 9b331e462e5021d994859756d46cd2519d9c9c6e
Acked-by: Samuel Pitoiset <[email protected]>
|
The Epic Games Launcher can be started in OpenGL mode with the "-opengl"
option. It creates a 4.4 OpenGL core context but still uses deprecated
functionality, e.g. the default vertex buffer object.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110462
Signed-off-by: Danylo Piliaiev <[email protected]>
Signed-off-by: Marek Olšák <[email protected]>
|
Marek recently extended pipe->set_shader_buffers() to take an extra
writable_bitmask parameter, indicating which SSBOs are writable (some
may be bound read-only). We can use this to decide whether to set
EXEC_OBJECT_WRITE when pinning. Avoiding the write flag can save us
some cross-batch flushing if the SSBO is used for reading in both the
render and compute engines.
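
A rough C sketch of the decision, with a hypothetical pin_bo() helper and
BO type standing in for the driver's actual pinning path:

    #include <stdbool.h>
    #include <stdint.h>

    struct sketch_bo;

    /* Hypothetical helper: pins a BO for the current batch and, when asked,
     * marks it as written (EXEC_OBJECT_WRITE in the execbuf). */
    void pin_bo(struct sketch_bo *bo, bool writable);

    static void
    pin_shader_buffers(struct sketch_bo **bos, unsigned count,
                       uint32_t writable_bitmask)
    {
       for (unsigned i = 0; i < count; i++) {
          /* Read-only SSBO bindings skip the write flag, which avoids
           * cross-batch (render vs. compute) flushing. */
          bool writable = (writable_bitmask & (1u << i)) != 0;
          pin_bo(bos[i], writable);
       }
    }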
|
Clear vertex_array_dirty after the state is emitted.
Signed-off-by: Chia-I Wu <[email protected]>
Reviewed-by: Gurchetan Singh <[email protected]>
|
The LLVM project made some questionable decisions about defaults for
armv7 (e.g. it enables NEON, which is not present on NVIDIA and Marvell
platforms).
On top of that, getHostCPUFeatures() doesn't disable missing machine
attributes. Finally, -neon alone is not sufficient to disable emission
of NEON instructions.
Signed-off-by: Lubomir Rintel <[email protected]>
Cc: <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
getHostCPUFeatures() is also available on ARM, and has been for even
longer than on x86. Use it -- it potentially enables instructions that
may speed things up.
Signed-off-by: Lubomir Rintel <[email protected]>
Cc: <[email protected]>
Closes: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/518
Reviewed-by: Matt Turner <[email protected]>
|
This fixes rendering in Unigine Valley 1.0 and Heaven 4.0.
|
Based on Nicolai's 0f8c5de8690e7c87aa2e24383065efaca7e6fe78.
Reviewed-by: Dylan Baker <[email protected]>
|
We have a macro for this now; no reason to hand-roll it for derefs.
While we're here, move the NIR_DEFINE_CAST for derefs down to where all
the other ones are.
Reviewed-by: Eric Anholt <[email protected]>
|
Only computeDerivativeGroupLinear is supported for now.
All crucible tests pass.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
Commit ad98fbc2174 ("intel/fs: Refactor code generation for nir_op_fsign
to its own function") criss-crossed with c2b8fb9a810 ("anv/device:
expose VK_KHR_shader_float16_int8 in gen8+"), and I was not paying
enough attention when I rebased. This adds back the float16 changes and
enables the optimization.
v2: Incorporate more changes from 19cd2f5debd and a8d8b1a1391 that I
missed in the previous version.
Fixes: ad98fbc2174 ("intel/fs: Refactor code generation for nir_op_fsign to its own function")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110474
Reviewed-by: Matt Turner <[email protected]> [v1]
|
Currently only the meson build is supported for the lima driver.
Add Android build support for lima.
Signed-off-by: Icenowy Zheng <[email protected]>
Acked-by: Qiang Yu <[email protected]>
|
For HW cursors, "cursor.pos" doesn't hold the current position of the
pointer, just the position of the last call to SetCursorPosition().
Skip the check against stale values and bump the d3dadapter9 drm version
to expose this change of behaviour.
Signed-off-by: Andre Heider <[email protected]>
Reviewed-by: Axel Davy <[email protected]>
|
Previously, we were storing the per-binding create info pointer in the
immutable_samplers field temporarily so that we can switch the order in
which we walk the loop. However, now that we have multiple arrays of
structs to walk, it makes more sense to store an index of some sort.
Because we want to leave immutable_samplers as NULL for undefined
bindings, we store index + 1 and then subtract one later.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
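
A small C sketch of the index-plus-one convention described above; the
struct and helper names are illustrative:

    #include <stdbool.h>
    #include <stdint.h>

    struct sketch_binding {
       /* 0 means "binding not defined"; otherwise create-info index + 1. */
       uint32_t create_info_index_plus_one;
    };

    static void
    record_create_info_index(struct sketch_binding *b, uint32_t index)
    {
       b->create_info_index_plus_one = index + 1;
    }

    static bool
    get_create_info_index(const struct sketch_binding *b, uint32_t *index)
    {
       if (b->create_info_index_plus_one == 0)
          return false;               /* undefined binding */
       *index = b->create_info_index_plus_one - 1;
       return true;
    }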
|
I missed this on the first go round. The bindingCount field of
VkDescriptorSetLayoutBindingFlagsCreateInfoEXT is allowed to be zero
which means the flags array is ignored.
Fixes: d6c9bd6e01b4d "anv: Put binding flags in descriptor set layouts"
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
Fixes failures in shaders.operator.common_functions.sign.*
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
Mali attribute buffers have to be 64-byte aligned. However, Gallium
enforces no such requirement; for unaligned buffers, we were previously
forced to create a shadow copy (slow!). To avoid this, we instead take
the offsetted buffer address, mask off its lower bits, and add those
masked-off bits to src_offset. Proof of correctness included, possibly
for the opportunity to say "QED" unironically.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
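
A short C sketch of the masking trick, assuming the 64-byte alignment
stated above (the helper name is illustrative):

    #include <stdint.h>

    #define ATTR_ALIGNMENT 64u

    static uint64_t
    split_unaligned_attr_address(uint64_t addr, uint32_t *src_offset)
    {
       uint32_t low_bits = (uint32_t)(addr & (ATTR_ALIGNMENT - 1));
       uint64_t aligned  = addr & ~(uint64_t)(ATTR_ALIGNMENT - 1);

       /* aligned + (*src_offset + low_bits) == addr + *src_offset, so the
        * GPU still fetches exactly the same bytes -- no shadow copy. */
       *src_offset += low_bits;
       return aligned;   /* program this as the attribute buffer address */
    }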
|
This (fairly large) patch continues work surrounding the panfrost_job
abstraction to improve job lifetime management. In particular, we add
infrastructure to track which BOs are used by a particular job
(currently limited to the vertex buffer BOs), to reference count these
BOs, and to automatically manage the BOs' memory based on the reference
count. This set of changes serves as a code cleanup, as future-proofing
for allowing BOs to be flushed, and immediately as a bugfix to work
around the missing reference counting for vertex buffer BOs.
Meanwhile, there are a few cleanups to vertex buffer handling code
itself, so in the short-term, this allows us to remove the costly VBO
staging workaround, since this patch addresses the underlying causes.
v2: Use pipe_reference for BO reference counting, rather than managing
it ourselves. Don't duplicate hash-table key removal. Fix vertex buffer
counting.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
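
A rough sketch of the pipe_reference pattern adopted here; the BO struct
and destroy path are illustrative, and the snippet assumes Gallium's
pipe/p_state.h and util/u_inlines.h headers:

    #include <stdlib.h>
    #include "pipe/p_state.h"
    #include "util/u_inlines.h"

    struct sketch_bo {
       struct pipe_reference reference;
       /* ... GPU memory, size, job usage, etc. ... */
    };

    static void
    sketch_bo_reference(struct sketch_bo **dst, struct sketch_bo *src)
    {
       struct sketch_bo *old = *dst;

       /* pipe_reference() bumps src, drops old, and returns true once the
        * old object's count reaches zero and it must be destroyed. */
       if (pipe_reference(old ? &old->reference : NULL,
                          src ? &src->reference : NULL))
          free(old);   /* stand-in for the real BO release path */

       *dst = src;
    }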
|
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
Now that everything is in place to do bindless for all resource types
except input attachments and UBOs, VK_EXT_descriptor_indexing is
"trivial".
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
This commit changes anv to put bindless handles and sampler pointers
into the descriptor buffer and use those instead of bindful ones when we
run out of binding table space. This "spilling" of descriptors allows us
to advertise an almost unbounded number of images and samplers.
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
Instead of setting it manually, call the helper. When setting
descriptor sets becomes more complicated than just setting some struct
values, this will keep immutable sampler handling correct.
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
We add two new texture sources for bindless surface and sampler handles.
Bindless surface handles are expected to be pre-shifted so that the
20-bit surface state table index is in the top 20 bits of the 32-bit
handle. This lets us avoid any extra shifts in the shader. Bindless
sampler handles are 32-byte aligned byte offsets from general state base
address. We use 32-byte aligned instead of 16-byte aligned to avoid
having to use more indirect messages than needed. It means we can't
tightly pack samplers but that's probably not a big deal.
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
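
A minimal C sketch of the handle layouts described above (helper names
are illustrative):

    #include <assert.h>
    #include <stdint.h>

    static uint32_t
    pack_bindless_surface_handle(uint32_t surface_state_index)
    {
       assert(surface_state_index < (1u << 20));
       /* Pre-shift so the 20-bit index already sits in the top 20 bits of
        * the 32-bit handle and the shader needs no extra shift. */
       return surface_state_index << 12;
    }

    static uint32_t
    pack_bindless_sampler_handle(uint32_t offset_from_general_state_base)
    {
       /* A 32-byte-aligned byte offset from general state base address. */
       assert((offset_from_general_state_base & 31) == 0);
       return offset_from_general_state_base;
    }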
|
When we have a bindless sampler, we need an instruction header. Even in
SIMD8, this pushes the instruction over the sampler message size maximum
of 11 registers. Instead, we have to lower TXD to TXL.
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
This commit adds a new way for ANV to do SSBO bindings by just passing a
GPU address in through the descriptor buffer and using the A64 messages
to access the GPU address directly. This means that our variable
pointers are now "real" pointers instead of a vec2(BTI, offset) pair.
This carries a few advantages:
1. It lets us support a virtually unbounded number of SSBO bindings.
2. It lets us implement VK_KHR_shader_atomic_int64 which we couldn't
implement before because those atomic messages are only available
in the bindless A64 form.
3. It's way better than messing around with bindless handles for SSBOs
which is the only other option for VK_EXT_descriptor_indexing.
4. It's more future-looking, maybe? At the least, this is what NVIDIA
does (they don't have binding-based SSBOs at all). This doesn't a
priori mean it's better, it just means it's probably not terrible.
The big disadvantage, of course, is that we have to start doing our own
bounds checking for robustBufferAccess again and have to push in dynamic
offsets.
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
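
A rough C sketch of what the descriptor contents and the manual bounds
check imply; the field names and layout here are assumptions for
illustration, not anv's actual descriptor format:

    #include <stdbool.h>
    #include <stdint.h>

    struct sketch_ssbo_descriptor {
       uint64_t address;   /* raw 64-bit GPU address of the buffer */
       uint32_t size;      /* bytes available, for robustBufferAccess */
       uint32_t pad;
    };

    /* The compiler would predicate (or null out) the A64 load/store based
     * on a check like this instead of relying on binding-table bounds
     * checking done by the hardware. */
    static bool
    ssbo_access_in_bounds(const struct sketch_ssbo_descriptor *desc,
                          uint32_t offset, uint32_t access_size)
    {
       return offset <= desc->size && access_size <= desc->size - offset;
    }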
|
In order to avoid the potential overhead of A64 operations on all SSBO
ops, we look for those SSBO ops where we can get to the descriptor set
from the SSBO access operation and lower those to a binding-table
approach. When robustBufferAccess is enabled, this lets the hardware do
the bounds checking for us. It also avoids some potentially expensive
64-bit integer calculations.
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
This is more descriptive and a bit nicer than checking for gen >= 8 &&
use_softpin everywhere.
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
We're about to start doing 64-bit pointer calculations in ANV. They
will get applied after brw_preprocess_nir which is where we currently do
64-bit integer arithmetic lowering. Because we're adding 64-bit integer
arithmetic after the initial lowering has happened, we need to lower
again.
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
If the number of surfaces or samplers exceeds what we can put in a
table, we will want to spill out to bindless. There is no bindless
support yet but this gets us the basic framework that will be used by
later commits.
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
This commit just sorts the bindings by how often they're used vs the
array size of the binding. This will let us make more nuanced decisions
about what goes in the binding table vs. what to make bindless.
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
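
A small C sketch of one way to order by that use-count-to-array-size
ratio without division; the struct and field names are illustrative:

    #include <stdint.h>
    #include <stdlib.h>

    struct sketch_binding_info {
       uint32_t use_count;    /* how often the shaders touch the binding */
       uint32_t array_size;   /* binding-table slots it would consume */
    };

    static int
    compare_bindings(const void *va, const void *vb)
    {
       const struct sketch_binding_info *a = va, *b = vb;
       /* a->use_count / a->array_size > b->use_count / b->array_size
        *   <=>  a->use_count * b->array_size > b->use_count * a->array_size */
       uint64_t lhs = (uint64_t)a->use_count * b->array_size;
       uint64_t rhs = (uint64_t)b->use_count * a->array_size;
       if (lhs != rhs)
          return lhs > rhs ? -1 : 1;   /* higher ratio sorts first */
       return 0;
    }

    /* qsort(bindings, num_bindings, sizeof(*bindings), compare_bindings); */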
|
This also fixes a bug where we miscalculate the maximum binding table
size and may report support in vkGetDescriptorSetLayoutSupport even for
sets too
large to fit in a binding table.
Fixes: ddc40691221 "anv: Implement VK_KHR_maintenance3"
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
This is really where they belong; not push constants. The one downside
here is that we can't push them anymore for compute shaders. However,
that's a general problem and we should figure out how to push descriptor
sets for compute shaders. This lets us bump MAX_IMAGES to 64 on BDW and
earlier platforms because we no longer have to worry about push constant
overhead limits.
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
We spend a lot of time in the driver adding things to hash sets to track
residency. The reality is that a properly built Vulkan app uses large
memory objects and sub-allocates from them. In a typical frame, most if
not all of those allocations are going to be resident for the entire
frame so we're really not saving ourselves much by tracking fine-grained
residency. Just throwing everything in the validation list does make it
a little bit more expensive inside the kernel to walk the list and
ensure that all our VA is in order. However, without relocations, the
overhead of that is pretty small.
If we ever do run into a memory pressure situation where the fine-
grained residency could even potentially help, we would likely be
swapping one page out to make room for another within the draw call and
performance is totally lost at that point. We're better off swapping
out other apps and just letting ours run a whole frame.
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>