| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
Set REG_A2XX_RB_COPY_DEST_OFFSET in the tile init as it won't get touched
by the draw batch. Then gmem2mem is the same for all tiles.
Similar to what is done in a6xx, but only for gmem2mem.
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Batch reordering on a2xx is now tested and functional.
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Allows removing the load_deref/store_deref code in the compiler.
tgsi_to_nir now uses screen instead of options so we can simplify that too.
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The a2xx driver is currently broken when PIPE_CAP_PACKED_UNIFORMS is
enabled, so disable it for now.
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
tgsi_to_nir now requires a screen pointer and is used by fd2_prog_init.
fd2_prog_init is used before fd_context_init so set the pointer manually.
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Fixes the static assertion error.
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
| |
so that bound compute shader resources won't be added when they are not
needed and same for graphics.
Acked-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Tested-by: Dieter Nützel <[email protected]>
Acked-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Tested-by: Dieter Nützel <[email protected]>
Acked-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
| |
The assertion considers max_dw from the current IB in the chain, but
big_ib_buffer is a buffer for the next IB, which can be smaller.
Tested-by: Dieter Nützel <[email protected]>
Acked-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Tested-by: Dieter Nützel <[email protected]>
Acked-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Tested-by: Dieter Nützel <[email protected]>
Acked-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Tested-by: Dieter Nützel <[email protected]>
Acked-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Tested-by: Dieter Nützel <[email protected]>
Acked-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Tested-by: Dieter Nützel <[email protected]>
Acked-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Applications frequently call glBufferSubData() to consecutive regions
of a VBO to append new vertex data. If no data exists there yet, we
can promote these to unsynchronized writes, even if the buffer is busy,
since the GPU can't be doing anything useful with undefined content.
This can avoid a bunch of unnecessary blitting on the GPU.
u_threaded_context would do this for us, and in fact prohibits us from
doing so (see TC_TRANSFER_MAP_NO_INFER_UNSYNCHRONIZED). But we haven't
hooked that up yet, and it may be useful to disable u_threaded_context
when debugging...at which point we'd still want this optimization. At
the very least, it would let us measure the benefit of threading
independently from this optimization. And it's not a lot of code.
Removes most stall avoidance blits in "Total War: WARHAMMER."
On my Skylake GT4e at 1920x1080, this appears to improve performance
in games by the following (but I did not do many runs for proper
statistics gathering):
----------------------------------------------
| DiRT Rally | +2% (avg) | + 2% (max) |
| Bioshock Infinite | +3% (avg) | + 9% (max) |
| Shadow of Mordor | +7% (avg) | +20% (max) |
----------------------------------------------
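
The promotion rule described above can be sketched as follows. This is only an illustration under stated assumptions, not the iris code: `struct valid_range`, `can_write_unsynchronized()` and `mark_written()` are hypothetical names, and the sketch assumes the driver tracks a single half-open range of bytes that already hold defined data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tracker: bytes in [start, end) hold defined data.
 * start == end means nothing has been written yet. */
struct valid_range {
   uint32_t start;
   uint32_t end;
};

/* True if a write to [offset, offset + size) touches no defined data,
 * so it could be done unsynchronized even while the buffer is busy. */
static bool
can_write_unsynchronized(const struct valid_range *valid,
                         uint32_t offset, uint32_t size)
{
   uint32_t write_end = offset + size;
   bool empty = valid->start == valid->end;

   return empty || write_end <= valid->start || offset >= valid->end;
}

/* Grow the valid range to cover the bytes just written. */
static void
mark_written(struct valid_range *valid, uint32_t offset, uint32_t size)
{
   if (valid->start == valid->end) {
      valid->start = offset;
      valid->end = offset + size;
      return;
   }
   if (offset < valid->start)
      valid->start = offset;
   if (offset + size > valid->end)
      valid->end = offset + size;
}
```

With this shape, an application appending vertex data to consecutive regions never overlaps the valid range, so every append qualifies; a write into already-defined bytes does not and would still need a stall or a blit.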
|
|
|
|
| |
This checks both "is it busy?" and "do we have work queued up for it?"
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This implements PIPE_CAP_INVALIDATE_BUFFER and invalidate_resource(),
as well as the PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE flag. When either
of these happens, we swap out the backing storage of the buffer for a
new idle BO, allowing us to write to it immediately without stalling
or queueing a blit.
On my Skylake GT4e at 1920x1080, this improves performance in games:
-----------------------------------------------
| DiRT Rally | +25% (avg) | +17% (max) |
| Bioshock Infinite | +22% (avg) | +11% (max) |
| Shadow of Mordor | +27% (avg) | +83% (max) |
-----------------------------------------------
|
|
|
|
| |
I want to use this in iris_resource.c.
|
|
|
|
|
|
| |
This is probably not the best place for it, but I don't feel like moving
the one out of the TGSI translator today, and we already have the other
direction here, so...*shrug*
|
|
|
|
| |
This will be useful when rebinding images.
|
|
|
|
|
|
| |
This unifies a bunch of the UBO and SSBO code to use common structures.
Beyond iris_state_ref, pipe_shader_buffer also gives us a buffer size,
which can be useful when filling out the surface state.
|
|
|
|
|
| |
This helps avoid having to iterate over [0, PIPE_MAX_CONSTANT_BUFFERS)
looking to see if any resources are bound.
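
A minimal sketch of the idea, assuming a 32-bit mask in which bit i is set while slot i is bound. The function name and the fixed limit of 32 slots are assumptions made for illustration, not the driver's actual fields.

```c
#include <assert.h>
#include <stdint.h>

/* Collect the indices of bound slots from a bitmask.
 * Bit i of bound_mask is set while slot i has a resource bound.
 * The loop ends as soon as no higher bits remain, so an empty or
 * sparse mask does not cost a walk over every possible slot. */
static unsigned
collect_bound_slots(uint32_t bound_mask, unsigned out[32])
{
   unsigned count = 0;

   for (unsigned slot = 0; bound_mask != 0; slot++, bound_mask >>= 1) {
      if (bound_mask & 1u)
         out[count++] = slot;
   }
   return count;
}
```

Checking "is anything bound?" then reduces to comparing the mask against zero.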
|
|
|
|
|
|
|
|
|
|
|
|
| |
I have various conditions in place to try and avoid unnecessary
PIPE_CONTROL flushes, especially to batches which may have never
used the buffer being mapped. But if we do a CPU map to a bound
constant buffer, we still need to mark push constants dirty, even
if there's nothing happening in batches that would warrant a flush.
Fixes obvious misrendering in the "XCOM 2: War of the Chosen" menus
(lots of rainbow colored triangles). Fixes lots of blinking elements
in "Shadow of Mordor". Fixes missing crowd rendering in "DiRT Rally".
|
|
|
|
|
|
|
|
|
|
| |
This is a workaround for a thread deadlock whose cause I have not
been able to identify.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108879
Fixes: 9b331e462e5021d994859756d46cd2519d9c9c6e
Acked-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Marek recently extended pipe->set_shader_buffers() to take an extra
writable_bitmask parameter, indicating which SSBOs are writable (some
may be bound read-only). We can use this to decide whether to set
EXEC_OBJECT_WRITE when pinning. Avoiding the write flag can save us
some cross-batch flushing if the SSBO is used for reading in both the
render and compute engines.
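
As a rough sketch of how such a bitmask might drive the pinning decision: the helper below is hypothetical, and `PIN_FLAG_WRITE` is a local stand-in for the kernel's EXEC_OBJECT_WRITE flag rather than the real definition.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SSBOS 32
/* Local stand-in for EXEC_OBJECT_WRITE; the value is illustrative only. */
#define PIN_FLAG_WRITE (1u << 0)

/* Return the pin flags for the SSBO bound at `slot`.
 * Only slots whose bit is set in writable_bitmask request write access;
 * read-only bindings get no write flag. */
static uint32_t
ssbo_pin_flags(uint32_t writable_bitmask, unsigned slot)
{
   assert(slot < MAX_SSBOS);
   return ((writable_bitmask >> slot) & 1u) ? PIN_FLAG_WRITE : 0u;
}
```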
|
|
|
|
|
|
|
| |
Clear vertex_array_dirty after the state is emitted.
Signed-off-by: Chia-I Wu <[email protected]>
Reviewed-by: Gurchetan Singh <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The LLVM project made some questionable decisions about defaults for
armv7 (e.g. they enable NEON that is not there on NVIDIA and Marvell
platforms).
On top of that, getHostCPUFeatures() doesn't disable missing machine
attributes. Finally, -neon alone is not sufficient to disable emission
of NEON instructions.
Signed-off-by: Lubomir Rintel <[email protected]>
Cc: <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
getHostCPUFeatures() is also available on ARM, and has been for even
longer than on x86. Use it -- it potentially enables instructions that
may speed things up.
Signed-off-by: Lubomir Rintel <[email protected]>
Cc: <[email protected]>
Closes: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/518
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
| |
This fixes rendering in Unigine Valley 1.0 and Heaven 4.0.
|
|
|
|
|
|
| |
Based on Nicolai's 0f8c5de8690e7c87aa2e24383065efaca7e6fe78.
Reviewed-by: Dylan Baker <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Currently only meson build support exists for the lima driver.
Add Android build support for lima.
Signed-off-by: Icenowy Zheng <[email protected]>
Acked-by: Qiang Yu <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
For HW cursors, "cursor.pos" doesn't hold the current position of the
pointer, just the position of the last call to SetCursorPosition().
Skip the check against stale values and bump the d3dadapter9 drm version
to expose this change of behaviour.
Signed-off-by: Andre Heider <[email protected]>
Reviewed-by: Axel Davy <[email protected]>
|
|
|
|
|
|
| |
Fixes failures in shaders.operator.common_functions.sign.*
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Mali attribute buffers have to be 64-byte aligned. However, Gallium
enforces no such requirement; for unaligned buffers, we were previously
forced to create a shadow copy (slow!). To prevent this, we instead use
the offset buffer's address with the lower bits masked off, and then
add those masked off bits to the src_offset. Proof of correctness
included, possibly for the opportunity to say "QED" unironically.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
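
The masking trick described above can be sketched as follows. This is a minimal illustration, not the Panfrost code: the struct and function names are hypothetical, and only the 64-byte alignment and the idea of folding the masked-off bits into src_offset come from the commit message. Overflow of the 32-bit offset is ignored here.

```c
#include <assert.h>
#include <stdint.h>

#define ATTR_ALIGN 64u /* alignment stated in the commit message */

/* Hypothetical result type: aligned base address plus adjusted offset. */
struct aligned_attr {
   uint64_t base;       /* address with the low bits masked off */
   uint32_t src_offset; /* original offset plus the masked-off bits */
};

static struct aligned_attr
align_attr(uint64_t addr, uint32_t src_offset)
{
   struct aligned_attr out;
   uint64_t low = addr & (ATTR_ALIGN - 1);

   out.base = addr & ~(uint64_t)(ATTR_ALIGN - 1);
   out.src_offset = src_offset + (uint32_t)low;

   /* Correctness: base + new offset names the same byte as before. */
   assert(out.base + out.src_offset == addr + src_offset);
   return out;
}
```

Because the low bits removed from the address are added back to the offset, the sum is unchanged, so no shadow copy is needed to satisfy the alignment requirement.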
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This (fairly large) patch continues work surrounding the panfrost_job
abstraction to improve job lifetime management. In particular, we add
infrastructure to track which BOs are used by a particular job
(currently limited to the vertex buffer BOs), to reference count these
BOs, and to automatically manage the BOs' memory based on the reference
count. This set of changes serves as a code cleanup, as a way of future
proofing for allowing flushing BOs, and immediately as a bugfix to
work around the missing reference counting for vertex buffer BOs.
Meanwhile, there are a few cleanups to vertex buffer handling code
itself, so in the short-term, this allows us to remove the costly VBO
staging workaround, since this patch addresses the underlying causes.
v2: Use pipe_reference for BO reference counting, rather than managing
it ourselves. Don't duplicate hash-table key removal. Fix vertex buffer
counting.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
| |
Remove unused functions and mark unhandled default case with
unreachable.
Signed-off-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
| |
It's there to hold the static asserts, so don't warn about it being
unused.
Signed-off-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
| |
The mali utgard pp doesn't support a sign instruction.
Use the nir lowering function for fsign to implement fsign in ppir.
Signed-off-by: Erico Nunes <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Add a few missing ppir_op_ceil enum handling entries to implement
nir_op_fceil in lima ppir.
Signed-off-by: Erico Nunes <[email protected]>
Reviewed-by: Vasily Khoruzhick <[email protected]>
Reviewed-by: Qiang Yu <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We empty the cache sets when flushing the batch, at which point we need
to add any framebuffer related BOs even though the bindings haven't
changed. So, we now do the cache set tracking unconditionally.
For now, we continue skipping resolve work based on the same conditions
in the predraw functions - the thinking is if we didn't trigger
resolves, there's nothing to update here. Time will tell if this works.
Partly reverts commit 365886ebe1a54f893b688b457553eead6aa572ea, and
fixes Unigine Valley rendering on Gen9+. Drops drawoverhead scores
by about 10-12%.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110353
|
|
|
|
|
|
|
|
| |
This more accurately reflects what the drm winsys does.
Signed-off-by: Gurchetan Singh <[email protected]>
Reviewed-By: Gert Wollny <[email protected]>
Reviewed-By: Piotr Rak <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Otherwise, there are artifacts when running Unigine Valley with
protocol version 2.
We can get away with not waiting for most buffers, but let's
be conservative.
Signed-off-by: Gurchetan Singh <[email protected]>
Reviewed-By: Gert Wollny <[email protected]>
Reviewed-By: Piotr Rak <[email protected]>
|
|
|
|
|
|
|
|
| |
We need to copy the shared memory region to the display target.
Signed-off-by: Gurchetan Singh <[email protected]>
Reviewed-By: Gert Wollny <[email protected]>
Reviewed-By: Piotr Rak <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The only tricky part is that with protocol 0 we can have either
a display target or a resource backing store. With protocol
2 we can have both. Make the map/unmap functions only deal
with the resource backing store.
v2: Handle MSAA texture case.
v3: spelling
v4: Fix dangling else (@prak)
v5: mmap --> os_mmap (@prak) + added comments (@gerddie)
Signed-off-by: Gurchetan Singh <[email protected]>
Reviewed-By: Gert Wollny <[email protected]>
Reviewed-By: Piotr Rak <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Gurchetan Singh <[email protected]>
Reviewed-By: Gert Wollny <[email protected]>
Reviewed-By: Piotr Rak <[email protected]>
|