| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
| |
These implementations of whole-utile load/stores would be the same for
v3d, though the layouts of blocks of utiles have changed.
|
|
|
|
|
|
|
| |
This lets us store the non-PBO glTexImage data directly into the tiled
image without making an extra untiled memcpy for the gallium transfer.
Improves 1024x1024 TexImage perf by ~19%, mostly from not thrashing around
in the kernel mapping and unmapping the transfer's temporary area.
|
| |
|
|
|
|
|
|
|
| |
They're raster order anyway, so we'd hit an assertion failure as well as
waste bandwidth.
Fixes: 6ad9e8690d14 ("v3d: Add support for texturing from linear.")
|
|
|
|
|
|
|
|
|
|
| |
We're waiting for the jobs-completed count to increment (with wrapping),
not to reach its starting state. This mostly ended up working out because
the next v3d_hw_tick() for a submit CL would end up doing the TFU
operation first, but it did fail when a blit was used for glReadPixels()
at the end of a test.
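
A minimal sketch of the corrected wait condition, assuming a small wrapping counter. The field width `JOB_COUNT_MASK` and the helper name `job_completed` are hypothetical and are not taken from the simulator code; the point is only that completion means "one past the value sampled before submit, modulo the counter width", not "equal to the starting value".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical width of the jobs-completed field; the real register
 * layout is not given in the commit message. */
#define JOB_COUNT_MASK 0xffu

/* Returns true once the counter has advanced by one from the value
 * sampled before the job was submitted, with wrapping at the mask. */
static bool
job_completed(uint32_t count_before, uint32_t count_now)
{
    assert(count_before <= JOB_COUNT_MASK);

    uint32_t expected = (count_before + 1) & JOB_COUNT_MASK;
    return (count_now & JOB_COUNT_MASK) == expected;
}
```

A caller would sample the counter before submitting and keep ticking the hardware model until `job_completed()` returns true.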
Fixes: ee0549ff9ab3 ("v3d: Add the V3D TFU submit interface to the simulator.")
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the UAPI, the first BO is the destination, and the one the kernel
should do an exclusive reservation on. Currently we only do exclusive
reservations, anyway. However, in the simulator path I was only copying
back the "destination" BO (actually src in this case), and this caused
regressions once I fixed the simulator to actually complete TFU before
returning (since otherwise, the TFU op would happen at the start of the
next CL submit and the draw would get the right contents).
Fixes: 976ea90bdca2 ("v3d: Add support for using the TFU to do some blits.")
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When copy propagation handles a store/copy, it iterates the current
copy entries to remove aliases, but keeps the "equal" entry (if it
exists) to be updated.
The removal step may swap the entries around (to ensure there are no
holes), invalidating pointers obtained earlier in the iteration. The bug
was saving such a pointer to use later. Change the code to first perform
the removals and then find the right remaining entry.
This was causing updates to be lost since they were being made to an
entry that was not part of the current copies.
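
The hazard can be illustrated with a small stand-alone sketch. The types and helpers here (`struct entry`, `remove_at`, `remove_aliases_then_find`) are hypothetical and much simpler than the NIR pass; aliasing is modelled as a key falling in a numeric range. Removal fills the hole with the last element, so any pointer taken during the loop may end up pointing at a different entry or past the live part of the array. The lookup is therefore done only after all removals.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a copy entry. */
struct entry { int key; int value; };

struct entry_list { struct entry items[8]; size_t count; };

/* Remove items[i] by moving the last entry into its slot (no holes). */
static void
remove_at(struct entry_list *l, size_t i)
{
    assert(i < l->count);
    l->items[i] = l->items[--l->count];
}

static struct entry *
find(struct entry_list *l, int key)
{
    for (size_t i = 0; i < l->count; i++)
        if (l->items[i].key == key)
            return &l->items[i];
    return NULL;
}

/* Drop every entry whose key lies in [alias_lo, alias_hi] except
 * keep_key, and only then look up keep_key in the compacted list. */
static struct entry *
remove_aliases_then_find(struct entry_list *l, int keep_key,
                         int alias_lo, int alias_hi)
{
    size_t i = 0;
    while (i < l->count) {
        int k = l->items[i].key;
        if (k != keep_key && k >= alias_lo && k <= alias_hi)
            remove_at(l, i);   /* slot i now holds another entry; re-check */
        else
            i++;
    }
    return find(l, keep_key);
}
```

Saving `&l->items[3]` before the loop and writing through it afterwards would update a slot that is no longer part of the list, which matches the lost updates described above.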
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108624
Fixes: b3c61469255 "nir: Copy propagation between blocks"
Cc: [email protected]
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Fixes build failure if the LLVM headers aren't in a standard include
directory.
Fixes: ec22dd34c88f "radeonsi: move SI_FORCE_FAMILY functionality to winsys"
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When updating a copy entry source value from a "non-SSA" (the data
comes from a copy instruction) to a "SSA" (the data or parts of it come
from SSA values), it was possible to hold invalid data in ssa[0]
depending on the writemask. Because of the union, ssa[0] could contain a
pointer to a nir_deref_instr left over from previous non-SSA usage.
Change the code to clean up the array before use to avoid leaving
invalid data around.
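
A rough sketch of the pattern, using a hypothetical `struct value` rather than the real NIR data structures: when a union member is reused under a different interpretation, the array is zeroed first so that components outside the writemask do not keep the old pointer.

```c
#include <assert.h>
#include <string.h>

#define NUM_COMPONENTS 4

/* Hypothetical value: either one deref pointer or per-component SSA
 * sources. Both views share the same storage. */
struct value {
    int is_ssa;
    union {
        const void *ssa[NUM_COMPONENTS];
        const void *deref;               /* overlaps ssa[0] */
    } u;
};

/* Set the components selected by writemask to def. When switching from
 * the deref view to the SSA view, clear the whole array first so that
 * unwritten components do not hold the stale deref pointer. */
static void
value_set_ssa_components(struct value *v, const void *def,
                         unsigned writemask)
{
    assert(writemask < (1u << NUM_COMPONENTS));

    if (!v->is_ssa) {
        memset(v->u.ssa, 0, sizeof(v->u.ssa));
        v->is_ssa = 1;
    }

    for (unsigned i = 0; i < NUM_COMPONENTS; i++)
        if (writemask & (1u << i))
            v->u.ssa[i] = def;
}
```

Without the `memset`, a writemask that skips component 0 would leave `ssa[0]` aliasing the old `deref` pointer.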
Fixes: 62332d139c8 "nir: Add a local variable-based copy propagation pass"
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Dylan Baker <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
| |
We can remove some duplicated code.
Reviewed-by: Elie Tournier <[email protected]>
|
|
|
|
|
|
| |
A resource is just a buffer with some metadata.
Reviewed-by: Elie Tournier <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, we ignored the glUnmap(..) operation and
flushed before we flushed the cbuf. Now, let's just flush
the data when we unmap.
Neither method is optimal. For example:
glMapBufferRange(.., 0, 100, GL_MAP_FLUSH_EXPLICIT_BIT)
glFlushMappedBufferRange(.., 25, 30)
glFlushMappedBufferRange(.., 65, 70)
We'll end up flushing 25 --> 70. Maybe we can fix this later.
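
The over-flush comes from tracking a single bounding range. A minimal sketch, assuming a hypothetical `struct flush_range` and reading the two numbers in the example as start and end, as the "25 --> 70" above does (the real glFlushMappedBufferRange takes an offset and a length):

```c
#include <assert.h>

/* Hypothetical accumulated flush range, [start, end).
 * start == end means nothing has been recorded yet. */
struct flush_range { unsigned start; unsigned end; };

/* Grow the tracked range so that it also covers [start, end). Only one
 * range is kept, so any gap between two flushes is swallowed. */
static void
flush_range_add(struct flush_range *r, unsigned start, unsigned end)
{
    assert(start <= end);

    if (r->start == r->end) {      /* first flush: take it as is */
        r->start = start;
        r->end = end;
        return;
    }
    if (start < r->start)
        r->start = start;
    if (end > r->end)
        r->end = end;
}
```

Adding (25, 30) and then (65, 70) leaves a single range of 25 to 70, including the untouched bytes in between. Avoiding that would need a list of disjoint ranges instead of one bounding range.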
v2: Add fixme comment in the code (Elie)
Reviewed-by: Elie Tournier <[email protected]>
|
|
|
|
|
|
| |
We can reuse the helpers we created.
Reviewed-by: Elie Tournier <[email protected]>
|
|
|
|
|
|
|
| |
util_format_get_blocksize returns 1 for R8 formats (all
PIPE_BUFFERs are R8).
Reviewed-by: Elie Tournier <[email protected]>
|
|
|
|
|
|
|
|
| |
We could allocate and destroy transfers in one place.
v2: Keep l_stride around.
Reviewed-by: Elie Tournier <[email protected]>
|
|
|
|
| |
Reviewed-by: Elie Tournier <[email protected]>
|
|
|
|
|
|
| |
Will be reused.
Reviewed-by: Elie Tournier <[email protected]>
|
|
|
|
|
|
| |
Will be reused.
Reviewed-by: Elie Tournier <[email protected]>
|
|
|
|
|
|
| |
Will be reused.
Reviewed-by: Elie Tournier <[email protected]>
|
|
|
|
|
|
|
|
|
| |
With commit 89b479, we moved to tracking buffer cleanliness
when binding.
TEST=dEQP-GLES31.functional.image_load_store.buffer.load_store.r32ui
Reviewed-by: Elie Tournier <[email protected]>
|
|
|
|
|
|
| |
It's used for all types of resources.
Reviewed-by: Elie Tournier <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
Remove a level of indirection to make the code more explicit -- should
make it easier to follow what's going on.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This is a move towards using composition instead of inheritance for
different query types.
This change weakens out-of-memory error reporting somewhat, though this
should be acceptable since we didn't consistently report such errors in
the first place.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
Other callers of si_set_constant_buffer don't need it.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
Reduce the number of places that encode buffer descriptors.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
This is rather important for merged VS/TCS as LSHS shaders...
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
There is never a read-after-write hazard because the command doesn't read.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
Prepare for some later refactoring.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
This helps some debugging cases by initializing addrlib with
slightly more appropriate settings.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
Allow for a unified but efficient treatment of adding a bitmask over a
wave or an entire threadgroup.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
Order-aware scan/reduce can trade off LDS traffic for external atomics
memory traffic in producer/consumer compute shaders.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
This happened to bite me while doing some experiments.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Gert Wollny <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The following race condition could occur in the no-timeout case:
    API thread               Gallium thread            Watchdog
    ----------               --------------            --------
    dd_before_draw
    u_threaded_context draw
    dd_after_draw
      add to dctx->records
      signal watchdog
                                                       dump & destroy record
                             execute draw
                             dd_after_draw_async
                               use-after-free!
Alternatively, the same scenario would assert in a debug build when
destroying the record because record->driver_finished has not signaled.
Fix this and simplify the logic at the same time by
- handing the record pointers off to the watchdog thread *before* each
draw call and
- waiting on the driver_finished fence in the watchdog thread
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|