Tie in the checks for whether the host supports tweaks and whether
this particular tweak is enabled.

v2: Add a comment about the emulated formats not being used directly
    in the guest (Gurchetan)
Signed-off-by: Gert Wollny <[email protected]>
Reviewed-by: Gurchetan Singh <[email protected]>
GL_MAP_INVALIDATE_BUFFER_BIT cannot be treated as
GL_MAP_INVALIDATE_RANGE_BIT naively.  When we run into

  ptr = glMapBufferRange(buf, 0, size,
                         GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT);
  memcpy(ptr, data1, size);
  glUnmapBuffer(buf);

  ptr = glMapBufferRange(buf, size, size,
                         GL_MAP_WRITE_BIT | GL_MAP_UNSYNCHRONIZED_BIT);
  memcpy(ptr, data2, size);
  glUnmapBuffer(buf);

we never want data1 to be copy-transferred, because that would mean
that data2 might overwrite valid data.

Signed-off-by: Chia-I Wu <[email protected]>
Reviewed-by: Alexandros Frantzis <[email protected]>
Fixes: a22c5df0794 ("virgl: Use buffer copy transfers to avoid waiting when mapping")
Reviewed-by: Emil Velikov <[email protected]>
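
As a sketch of the resulting rule, using the gallium flag names the
driver sees once the GL access bits are translated (the helper name is
invented for illustration):

  /* A whole-buffer invalidation lets a later unsynchronized map touch
   * any byte of the buffer, so data still pending in a copy transfer
   * could be overwritten; only a plain range invalidation is safe. */
  static bool virgl_can_use_copy_transfer(unsigned usage)
  {
     return (usage & PIPE_TRANSFER_DISCARD_RANGE) &&
            !(usage & PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE);
  }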
When the resource to be mapped is busy and the backing storage can be
discarded, reallocate the backing storage to avoid waiting.

In this new path, we allocate a new buffer, emit a state change,
write, and add the transfer to the queue.  In the
PIPE_TRANSFER_DISCARD_RANGE path, we suballocate a staging buffer,
write, and emit a copy_transfer (which may allocate, memcpy, and blit
internally).  The win might not always be clear, but another
advantage is that the new path clears res->valid_buffer_range and
does not clear res->clean_mask.  That makes it much more preferable
in scenarios such as

  access = enough_space ? GL_MAP_UNSYNCHRONIZED_BIT :
                          GL_MAP_INVALIDATE_BUFFER_BIT;
  glMapBufferRange(..., GL_MAP_WRITE_BIT | access);
  memcpy(...);  // append new data
  glUnmapBuffer(...);
Signed-off-by: Chia-I Wu <[email protected]>
Reviewed-by: Alexandros Frantzis <[email protected]>
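
A minimal sketch of the new path; the busy-check hook and the
reallocation helper are invented names, while util_range_set_empty()
is the stock helper from u_range.h:

  if ((usage & PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE) &&
      vws->resource_is_busy(vws, res->hw_res)) {     /* assumed hook */
     virgl_reallocate_hw_res(vws, res);              /* assumed helper */
     /* nothing in the fresh storage holds valid data yet... */
     util_range_set_empty(&res->valid_buffer_range);
     /* ...while res->clean_mask is deliberately left untouched */
  }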
When PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE is properly supported,
virgl_transfer might refer to a different virgl_hw_res than
virgl_resource does.  We need to save the virgl_hw_res in the
transfer and use the saved one.
Signed-off-by: Chia-I Wu <[email protected]>
Reviewed-by: Alexandros Frantzis <[email protected]>
It works similarly to pipe_resource_reference, but is for
virgl_hw_res.  It can also replace resource_unref.
Signed-off-by: Chia-I Wu <[email protected]>
Reviewed-by: Alexandros Frantzis <[email protected]>
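
A sketch of such a helper in the style of pipe_resource_reference,
assuming virgl_hw_res carries an atomic reference count (the field
and the unref hook names are assumptions):

  #include "util/u_atomic.h"

  static inline void
  virgl_resource_reference(struct virgl_winsys *vws,
                           struct virgl_hw_res **dst,
                           struct virgl_hw_res *src)
  {
     struct virgl_hw_res *old = *dst;

     if (src)
        p_atomic_inc(&src->reference_count);     /* assumed field */
     if (old && p_atomic_dec_zero(&old->reference_count))
        vws->resource_unref(vws, old);           /* assumed hook */
     *dst = src;
  }

Passing NULL as src then gives the resource_unref behavior mentioned
above.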
We should avoid having potentially dangling pointers to
pipe_resources in general.
Signed-off-by: Chia-I Wu <[email protected]>
Reviewed-by: Alexandros Frantzis <[email protected]>
A pipe_transfer is a context object. It is fine for the
constructor/destructor to have access to the context.
Signed-off-by: Chia-I Wu <[email protected]>
Reviewed-by: Alexandros Frantzis <[email protected]>
Since we don't normally flush before performing copy transfers, it's
possible in some scenarios to use too much memory for staging
resources and start failing.  This can happen either because we
exhaust the total available memory (including system memory
virtio-gpu swaps out to), or, more commonly, because the total size
of resources in a command buffer doesn't fit in virtio-gpu video
memory.

To reduce the chances of this happening, force a flush before a copy
transfer if the total size of queued staging resources exceeds a
certain limit.  Since any queued staging resources will eventually be
released after a flush, this ensures both that each command buffer
doesn't require too much video memory, and that we don't end up
consuming too much memory for staging resources in total.

Fixes kernel errors reported when running texture_upload tests in
glbench.
Signed-off-by: Alexandros Frantzis <[email protected]>
Reviewed-by: Chia-I Wu <[email protected]>
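
A sketch of the check; the counter field and the limit value are
assumptions for illustration:

  #define VIRGL_QUEUED_STAGING_RES_SIZE_LIMIT (128 * 1024 * 1024)

  /* before emitting a copy transfer: */
  if (vctx->queued_staging_res_size + staging_size >
      VIRGL_QUEUED_STAGING_RES_SIZE_LIMIT)
     vctx->base.flush(&vctx->base, NULL, 0);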
Now that we have copy transfers in place, we can remove the incorrect
resource wait condition. Copy transfers and other optimizations minimize
the performance impact of this removal, while providing the correct
behavior.
Signed-off-by: Alexandros Frantzis <[email protected]>
Reviewed-by: Chia-I Wu <[email protected]>
Extend copy transfers to also be used for busy textures.
Performance results:

  Unigine Valley, qemu: 22.7 FPS before, 23.1 FPS after
Signed-off-by: Alexandros Frantzis <[email protected]>
Reviewed-by: Chia-I Wu <[email protected]>
We typically need to wait for a buffer to become ready before mapping,
so that we don't write new contents while the host is still using the
old contents. However, if we are allowed to discard the contents of the
mapped buffer range, then we can avoid waiting by using a staging buffer
range which we guarantee to never be busy, copying from the staging
buffer range to the target buffer in the host.
This commit implements this optimization by utilizing a dedicated
u_upload_mgr for the staging buffer.
Performance results:

  Twilight Struggle (Steam/Proton), qemu: 7 FPS before, 25 FPS after
  glmark2 ubo, qemu: 38 FPS before, 331 FPS after
Signed-off-by: Alexandros Frantzis <[email protected]>
Suggested-by: Gurchetan Singh <[email protected]>
Reviewed-by: Chia-I Wu <[email protected]>
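
A minimal sketch of the staging path, assuming vctx->transfer_uploader
is the dedicated u_upload_mgr (u_upload_alloc() is the stock gallium
helper from util/u_upload_mgr.h):

  unsigned offset;
  struct pipe_resource *staging = NULL;
  void *map = NULL;

  /* suballocate a range that is guaranteed to never be busy */
  u_upload_alloc(vctx->transfer_uploader, 0, box->width, 256,
                 &offset, &staging, &map);
  if (!map)
     return NULL;   /* fall back to the waiting path */

  /* the caller memcpy()s into map; unmap emits a host-side copy from
   * (staging, offset) into the busy destination buffer */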
Support transfers that use a different resource as the source of data to
transfer. This will be used in upcoming commits to send data to host
buffers through a transfer upload buffer, in order to avoid waiting
when the buffer resource is busy.
Note that we don't support queueing copy transfers in the transfer
queue.  Copy transfers should be emitted directly into the command
queue, which allows us to avoid flushes before them and leads to
better performance.
Signed-off-by: Alexandros Frantzis <[email protected]>
Reviewed-by: Chia-I Wu <[email protected]>
Support a new virgl bind type for staging buffers which don't require
dedicated host-side storage. These will be used to implement copy
transfers.
Signed-off-by: Alexandros Frantzis <[email protected]>
Reviewed-by: Chia-I Wu <[email protected]>
If we are not allowed to block, and we know that we will have to wait,
either because the resource is busy, or because it will become busy due
to a readback, return early to avoid performing an incomplete
transfer_get. Such an incomplete transfer_get may finish at any time,
during which another unsynchronized map could write to the resource
contents, leaving the contents in an undefined state.
Signed-off-by: Alexandros Frantzis <[email protected]>
Suggested-by: Chia-I Wu <[email protected]>
Reviewed-by: Chia-I Wu <[email protected]>
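
Sketched, with readback and resource_busy standing in for the two
conditions computed earlier in the function:

  /* bail out rather than start a transfer_get we cannot wait for */
  if ((xfer->base.usage & PIPE_TRANSFER_DONTBLOCK) &&
      (readback || resource_busy))
     return VIRGL_TRANSFER_MAP_ERROR;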
When readback is true, and there are pending writes in the transfer
queue, we should flush to avoid reading back outdated data. This
fixes piglit arb_copy_buffer/dlist and a subtest of
arb_copy_buffer/data-sync.
Signed-off-by: Chia-I Wu <[email protected]>
Reviewed-by: Alexandros Frantzis <[email protected]>
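
A sketch of the added condition inside the prepare logic (variable
naming simplified):

  bool flush = virgl_res_needs_flush(vctx, xfer);
  if (readback)
     flush |= virgl_transfer_queue_is_queued(&vctx->queue, xfer);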
Imagine this at the beginning of a cmdbuf:

  resource_copy_region(ctx, dst, ..., src, ...);
  transfer_map(ctx, src, 0, PIPE_TRANSFER_WRITE, ...);

We need to flush in transfer_map so that the transfer is not
reordered before the resource copy.  The check for
"vctx->num_draws == 0 && vctx->num_compute == 0" is not enough, so
remove the optimization entirely.

Because of the more precise resource tracking in the previous
commit, I hope the performance impact is minimized.  We will have to
go with perfect resource tracking, or attempt a more limited
optimization, if there are specific cases we really need to optimize
for.
Signed-off-by: Chia-I Wu <[email protected]>
Reviewed-by: Gurchetan Singh <[email protected]>
virgl_transfer_queue_is_queued was used to avoid flushing.  That
fails when the resource is being accessed by previous cmdbufs but
not the current one.

The new approach of tracking the valid buffer range does not apply
to textures, however.  That is hopefully fine, because the goal is
to avoid waiting in this scenario:

  glBufferSubData(..., offset, size, data1);
  glDrawArrays(...);
  // append new vertex data
  glBufferSubData(..., offset+size, size, data2);
  glDrawArrays(...);

If glTex(Sub)Image* turns out to be an issue, we will need to track
valid level/layer ranges as well.
v2: update virgl_buffer_transfer_extend as well
v3: do not remove virgl_transfer_queue_is_queued
Signed-off-by: Chia-I Wu <[email protected]>
Reviewed-by: Alexandros Frantzis <[email protected]> (v1)
Reviewed-by: Gurchetan Singh <[email protected]> (v2)
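
A sketch of the tracking with u_range (the two-argument
util_range_add as it existed at the time; valid_buffer_range is the
field on virgl_resource):

  /* on every buffer write: record which bytes now hold valid data */
  util_range_add(&res->valid_buffer_range, box->x, box->x + box->width);

  /* on map: writes that touch no valid bytes cannot race with the
   * host, so they may proceed unsynchronized */
  if (!util_ranges_intersect(&res->valid_buffer_range,
                             box->x, box->x + box->width))
     usage |= PIPE_TRANSFER_UNSYNCHRONIZED;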
Handle PIPE_TRANSFER_DONTBLOCK and PIPE_TRANSFER_MAP_DIRECTLY.

Make virgl_resource_transfer_prepare return an enum instead of a
bool for extensibility (e.g., to instruct the callers to map
differently).
Signed-off-by: Chia-I Wu <[email protected]>
Reviewed-by: Alexandros Frantzis <[email protected]>
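
The return type might look like this (the enumerant names here are
illustrative rather than the exact ones that landed):

  enum virgl_transfer_map_type {
     VIRGL_TRANSFER_MAP_ERROR = -1,  /* e.g. DONTBLOCK would block */
     VIRGL_TRANSFER_MAP_HW_RES,      /* map the backing storage */
     /* room to grow: e.g. map a staging range instead */
  };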
virgl_resource_transfer_prepare should be called before mapping to
prepare the resource.  It performs the flush, readback, and wait as
needed.  virgl_res_needs_flush and virgl_res_needs_readback become
internal helpers of the new function.

There should be no externally visible change.
Signed-off-by: Chia-I Wu <[email protected]>
Reviewed-by: Alexandros Frantzis <[email protected]>
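
In outline, with helper signatures simplified and the encoder call
name assumed:

  static void
  virgl_resource_transfer_prepare(struct virgl_context *vctx,
                                  struct virgl_transfer *xfer)
  {
     struct virgl_winsys *vws = virgl_screen(vctx->base.screen)->vws;
     struct virgl_resource *res = virgl_resource(xfer->base.resource);

     if (virgl_res_needs_flush(vctx, xfer))
        vctx->base.flush(&vctx->base, NULL, 0);

     if (virgl_res_needs_readback(vctx, res, xfer->base.usage,
                                  xfer->base.level))
        virgl_encode_transfer_get(vctx, xfer);     /* assumed name */

     if (!(xfer->base.usage & PIPE_TRANSFER_UNSYNCHRONIZED))
        vws->resource_wait(vws, res->hw_res);
  }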
Signed-off-by: Chia-I Wu <[email protected]>
Reviewed-by: Alexandros Frantzis <[email protected]>
Add comments and follow the coding style of virgl_res_needs_flush.
Signed-off-by: Chia-I Wu <[email protected]>
Reviewed-by: Alexandros Frantzis <[email protected]>
Add comments and some minor cleanups.
v2: document the function
Signed-off-by: Chia-I Wu <[email protected]>
Reviewed-by: Alexandros Frantzis <[email protected]> (v1)
Reviewed-by: Gurchetan Singh <[email protected]>
Both applications and the driver itself (see
virgl_buffer_transfer_flush_region) might flush regions that are
unmodified.  We have to read back for those flushes.
Signed-off-by: Chia-I Wu <[email protected]>
Reviewed-by: Alexandros Frantzis <[email protected]>
Reviewed-by: Gurchetan Singh <[email protected]>
Inline writes skip transfer map/unmap at the cost of an extra copy
of the data during execbuffer.  That is generally a win for small
transfers, but the heuristic of choosing inline writes based on
buffer sizes rather than transfer sizes makes little sense.  More
importantly, inline writes miss optimizations that are done for
buffer transfers.

Let's just use transfers.
Signed-off-by: Chia-I Wu <[email protected]>
Reviewed-By: Gert Wollny <[email protected]>
Small buffer subdata calls, which essentially amount to a memcpy,
were getting bogged down by all the overhead of creating new
transfers.
Signed-off-by: David Riley <[email protected]>
Reviewed-by: Gurchetan Singh <[email protected]>
When I added indirect support I forgot this; however, to use it now
we need to check for a new enough capability on the host side.
Reviewed-By: Gert Wollny <[email protected]>
This should save some space.
Suggested-by: Erik Faye-Lund <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
We've noticed the Team Fortress 2 engine seems to make many small
glBufferSubData(...) calls.  Let's pick our heuristic based on the
resource base width, not the size of a particular upload.  This will
cause transfers to be batched together in the transfer queue.

Relevant glbench microbenchmark:

  Before: buffer_upload_dynamic_element_array_131072 =  131.17 mbytes_sec
  After:  buffer_upload_dynamic_element_array_131072 = 6828.24 mbytes_sec
Reviewed-by: Gert Wollny <[email protected]>
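
The gist of the change, sketched (the limit name is an assumption):

  /* before: the path was chosen from the size of one upload */
  bool small = size <= VIRGL_INLINE_WRITE_LIMIT;

  /* after: choose from the resource's total size, so repeated
   * subdatas to the same buffer all take the same path and their
   * transfers can batch up in the transfer queue */
  small = resource->width0 <= VIRGL_INLINE_WRITE_LIMIT;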
This improves the Unigine Valley benchmark by 3 to 10 fps (depending
on the scene).  It also improves the Team Fortress 2 benchmark from
6 fps to 13 fps (host: 20 fps).
Reviewed-by: Gert Wollny <[email protected]>
This is motivated by the following scenario:

  glBufferSubData(GL_ARRAY_BUFFER, ...)
  glFlush(...)
  glBufferSubData(GL_ARRAY_BUFFER, ...)
  glBufferSubData(GL_ARRAY_BUFFER, ...)
  glBufferSubData(GL_ARRAY_BUFFER, ...)

This increases @davidriley's Team Fortress 2 apitrace from 1 fps to
6 fps and helps with the Chromium glbench microbenchmarks:

  Before: texture_update_rgba_texsubimage2d_2048 = 554.96 mtexel_sec
          buffer_upload_dynamic_array_12         =   0.02 mbytes_sec
          buffer_upload_dynamic_array_576        =   1.07 mbytes_sec

  After:  texture_update_rgba_texsubimage2d_2048 = 612.29 mtexel_sec
          buffer_upload_dynamic_array_12         =   2.22 mbytes_sec
          buffer_upload_dynamic_array_576        = 164.89 mbytes_sec
Reviewed-by: Gert Wollny <[email protected]>
Reviewed-by: Gert Wollny <[email protected]>
This will allow us to destroy transfers without having a pointer to
the context.
Reviewed-by: Gert Wollny <[email protected]>
This allows a minor optimization for texture upload.
Reviewed-by: Gert Wollny <[email protected]>
Reviewed-by: Gert Wollny <[email protected]>
There are levels to cleanliness.
Reviewed-by: Gert Wollny <[email protected]>
We can remove some duplicated code.
Reviewed-by: Elie Tournier <[email protected]>
util_format_get_blocksize returns 1 for R8 formats (all
PIPE_BUFFERs are R8).
Reviewed-by: Elie Tournier <[email protected]>
We could allocate and destroy transfers in one place.
v2: Keep l_stride around.
Reviewed-by: Elie Tournier <[email protected]>
Reviewed-by: Elie Tournier <[email protected]>
Will be reused.
Reviewed-by: Elie Tournier <[email protected]>
Will be reused.
Reviewed-by: Elie Tournier <[email protected]>
We flush every time the command buffer (16 kB) is full, which is
quite costly.

This improves
dEQP-GLES3.performance.buffer.data_upload.function_call.buffer_data.new_buffer.usage_stream_draw
from 111.16 MB/s to 1930.36 MB/s.

In addition, I made the benchmark produce buffers from 0 up to
VIRGL_MAX_CMDBUF_DWORDS * 4 bytes, and tried
((VIRGL_MAX_CMDBUF_DWORDS * 4) / 2), ((VIRGL_MAX_CMDBUF_DWORDS * 4) / 4),
etc. as thresholds.  I didn't notice any clear differences, so let's
just go with the most obvious heuristic.
Tested-By: Gert Wollny <[email protected]>
Reviewed-by: Erik Faye-Lund <[email protected]>
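
The heuristic, sketched from the description above (the exact
comparison is an assumption):

  /* inline uploads are only a win while the data fits in a single
   * command buffer; anything bigger goes through a transfer rather
   * than forcing a flush every time the 16 kB cmdbuf fills up */
  bool use_inline_write = size <= VIRGL_MAX_CMDBUF_DWORDS * 4;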
All drivers were already doing it except virgl.
Reviewed-by: Nicolai Hähnle <[email protected]>
to reduce the call indirections with u_resource_vtbl.

The worst call tree you could get was:

  - u_transfer_inline_write_vtbl
    - u_default_transfer_inline_write
      - u_transfer_map_vtbl
        - driver_transfer_map
      - u_transfer_unmap_vtbl
        - driver_transfer_unmap

That's 6 indirect calls.  Some drivers only had 5.  The goal is to
have 1 indirect call for drivers that care.  The resource type can
be determined statically at most call sites.

The new interface is:

  pipe_context::buffer_subdata(ctx, resource, usage, offset, size, data)
  pipe_context::texture_subdata(ctx, resource, level, usage, box, data,
                                stride, layer_stride)

v2: fix whitespace, correct ilo's behavior
Reviewed-by: Nicolai Hähnle <[email protected]>
Acked-by: Roland Scheidegger <[email protected]>
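
For reference, the corresponding pipe_context hooks (paraphrased from
gallium's p_context.h of that era):

  void (*buffer_subdata)(struct pipe_context *ctx,
                         struct pipe_resource *resource,
                         unsigned usage,   /* PIPE_TRANSFER_x bits */
                         unsigned offset,
                         unsigned size,
                         const void *data);

  void (*texture_subdata)(struct pipe_context *ctx,
                          struct pipe_resource *resource,
                          unsigned level,
                          unsigned usage,  /* PIPE_TRANSFER_x bits */
                          const struct pipe_box *box,
                          const void *data,
                          unsigned stride,
                          unsigned layer_stride);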
This will allow drivers to make better decisions about texture sharing
for DRI2, DRI3, Wayland, and OpenCL.
v2: add read/write flags, take advantage of __DRI_IMAGE_USE_BACKBUFFER
Reviewed-by: Axel Davy <[email protected]>
Include what you want, rather than relying on a header foo.h N
levels down the include chain to provide something that you need.
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
virgl is the 3D acceleration backend for the virtio-gpu shipping
with qemu.

The 3D acceleration is designed around gallium and TGSI as the
virtualisation layer.  The backend renderer currently translates the
virgl interface into OpenGL.

This is the initial import of the driver to mesa.  The kernel driver
portions are lined up for drm-next.

Currently this driver supports up to GL 3.3 and some misc extensions
if the host driver exposes them.  It is planned to iterate the virgl
API to new GL levels as mesa host drivers gain features.

v2: fix resource tracking across flushes to avoid ->bind hack in
    mapping.
    consolidate mapping and waiting code for transfers.
    use u_range for dirty tracking.
    handle larger shaders in protocol.
    include virtgpu_drm.h in mesa for now.
    add translation layer for gallium tgsi to virgl tgsi.

Signed-off-by: Dave Airlie <[email protected]>