We've noticed the Team Fortress 2 engine seems to do many small
calls to glBufferSubData(..). Let's pick our heuristic based on the
resource base width, not the size of a particular upload.
This will cause transfers to be batched together in the transfer
queue.
Relevant glbench microbenchmark:
Before: buffer_upload_dynamic_element_array_131072 = 131.17 mbytes_sec
After: buffer_upload_dynamic_element_array_131072 = 6828.24 mbytes_sec
Reviewed-by: Gert Wollny <[email protected]>
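
A minimal sketch of the heuristic change, assuming a single byte threshold (`QUEUE_THRESHOLD_BYTES`, a made-up value) and two upload paths. `width0` mirrors the gallium `pipe_resource` field name; every other name is hypothetical. The point is only that the decision ignores the size of the individual upload:

```c
#include <stdint.h>

/* Hypothetical cut-off; the driver's real threshold is not given here. */
#define QUEUE_THRESHOLD_BYTES 4096u

enum upload_path { UPLOAD_INLINE_WRITE, UPLOAD_TRANSFER_QUEUE };

/* Choose how a glBufferSubData()-style upload reaches the host.
 * The decision depends only on the buffer's base width (width0), so many
 * small uploads into one large buffer all take the queued path, where
 * they can be batched together instead of each becoming an inline write. */
static enum upload_path choose_upload_path(uint32_t width0, uint32_t upload_size)
{
   (void)upload_size; /* intentionally not part of the heuristic */
   return width0 >= QUEUE_THRESHOLD_BYTES ? UPLOAD_TRANSFER_QUEUE
                                          : UPLOAD_INLINE_WRITE;
}
```

With this rule, a 12-byte update into a 128 KiB buffer is queued, while the same update into a small buffer still goes inline.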
This improves Unigine Valley benchmark by 3 to 10 fps (depending
on the scene).
It also improves the Team Fortress 2 benchmark from 6 fps to 13
fps (host: 20 fps).
Reviewed-by: Gert Wollny <[email protected]>
Transfers will be placed in the transfer queue at unmap time instead
of incurring a VM exit. There's an attempt to deduplicate intersecting
1D transfers, which are surprisingly common.
This can also help with mipmapped texture upload and smaller
textures, where the majority of the time is spent in the guest
kernel / QEMU -- not virglrenderer. This is shown by the GLbench
texture upload benchmark:
Before:
texture_upload_rgba_teximage2d_32 = 64.23 mtexel_sec
After:
texture_upload_rgba_teximage2d_32 = 367.44 mtexel_sec
v2: Split up list iteration functions (@gerddie)
v3: Support for optimizing glBufferSubData
Reviewed-by: Gert Wollny <[email protected]>
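
The deduplication of intersecting 1D transfers can be sketched as follows. This is a simplified model, not the driver's code: the struct layout, the fixed-size array, and `MAX_QUEUED` are assumptions. Merging is safe in this model because both transfers copy from the same guest backing store, so sending the union range once is equivalent to sending both:

```c
#include <stddef.h>
#include <stdint.h>

/* A queued 1D transfer: bytes [offset, offset + size) of one resource.
 * Field names are illustrative, not the driver's. */
struct xfer_1d {
   uint32_t res_handle;
   uint32_t offset;
   uint32_t size;
};

#define MAX_QUEUED 64 /* hypothetical queue capacity */

struct xfer_queue {
   struct xfer_1d items[MAX_QUEUED];
   size_t count;
};

/* Do two half-open byte ranges share at least one byte? */
static int ranges_intersect(const struct xfer_1d *a, const struct xfer_1d *b)
{
   return a->offset < b->offset + b->size && b->offset < a->offset + a->size;
}

/* Queue a transfer, growing an existing entry for the same resource to the
 * union of both ranges when they intersect. Returns 0 on success, -1 if the
 * queue is full (a real driver would flush at that point). */
static int queue_transfer(struct xfer_queue *q, struct xfer_1d t)
{
   for (size_t i = 0; i < q->count; i++) {
      struct xfer_1d *e = &q->items[i];
      if (e->res_handle == t.res_handle && ranges_intersect(e, &t)) {
         uint32_t start = e->offset < t.offset ? e->offset : t.offset;
         uint32_t end_e = e->offset + e->size;
         uint32_t end_t = t.offset + t.size;
         uint32_t end = end_e > end_t ? end_e : end_t;
         e->offset = start;
         e->size = end - start;
         return 0;
      }
   }
   if (q->count == MAX_QUEUED)
      return -1;
   q->items[q->count++] = t;
   return 0;
}
```

For brevity, the sketch does not re-check whether a grown entry now intersects a second queued entry.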
Let's encode the new protocol with new helper functions.
Reviewed-by: Gert Wollny <[email protected]>
The idea is to have two command buffers:
1) One for transfers
2) One for commands, which can include transfers
At flush time, (2) will be filled. Otherwise, (1) will be
used to submit transfers if there are enough of them.
v2: Pass size directly to cmd_buf_create (@gerddie)
Reviewed-by: Gert Wollny <[email protected]>
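
A counting model of the two-buffer scheme may make the control flow easier to follow. It is a sketch under stated assumptions: the threshold `TRANSFER_SUBMIT_THRESHOLD` and all struct and function names are hypothetical, and real command buffers hold encoded dwords rather than counters:

```c
#include <stdint.h>

/* Hypothetical: how many pending transfers justify a separate submission. */
#define TRANSFER_SUBMIT_THRESHOLD 32u

struct cmd_buf {
   uint32_t entries;     /* encoded transfers/commands currently held */
   uint32_t submissions; /* how many times this buffer was submitted */
};

struct ctx {
   struct cmd_buf tbuf; /* (1) transfers only */
   struct cmd_buf cbuf; /* (2) commands, which may include transfers */
   uint32_t pending_transfers;
};

static void submit(struct cmd_buf *buf)
{
   buf->submissions++;
   buf->entries = 0;
}

/* Called when a transfer is queued, e.g. at unmap time. */
static void queue_transfer(struct ctx *c)
{
   c->pending_transfers++;
   if (c->pending_transfers >= TRANSFER_SUBMIT_THRESHOLD) {
      /* Enough transfers: encode them into (1) and submit it on its own. */
      c->tbuf.entries += c->pending_transfers;
      c->pending_transfers = 0;
      submit(&c->tbuf);
   }
}

/* At flush time, leftover transfers are encoded into (2) and go out
 * together with the regular commands. */
static void flush(struct ctx *c)
{
   c->cbuf.entries += c->pending_transfers;
   c->pending_transfers = 0;
   submit(&c->cbuf);
}
```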
This is motivated by the following scenario:
glBufferSubData(GL_ARRAY_BUFFER, ...)
glFlush(..)
glBufferSubData(GL_ARRAY_BUFFER, ...)
glBufferSubData(GL_ARRAY_BUFFER, ...)
glBufferSubData(GL_ARRAY_BUFFER, ...)
This increases @davidriley's Team Fortress 2 apitrace from
1 fps to 6 fps and helps with the Chromium glbench
microbenchmarks:
Before:
texture_update_rgba_texsubimage2d_2048 = 554.96 mtexel_sec
buffer_upload_dynamic_array_12 = 0.02 mbytes_sec
buffer_upload_dynamic_array_576 = 1.07 mbytes_sec
After:
texture_update_rgba_texsubimage2d_2048 = 612.29 mtexel_sec
buffer_upload_dynamic_array_12 = 2.22 mbytes_sec
buffer_upload_dynamic_array_576 = 164.89 mbytes_sec
Reviewed-by: Gert Wollny <[email protected]>
Reviewed-by: Gert Wollny <[email protected]>
It's good to keep track of these things.
Reviewed-by: Gert Wollny <[email protected]>
Much of our logic is based around the idea that the upper 16 bits
of a command dword can encode the length of the command.
Now that the command buffer can hold 2^16 - 1 dwords or more, we
should check for this.
v2: alignment, and only check VIRGL_ENCODE_MAX_DWORDS
Reviewed-by: Gert Wollny <[email protected]>
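
The length limit can be illustrated with a small header encoder. This sketch assumes a layout of command id in bits 0-7, object type in bits 8-15, and payload length in dwords in bits 16-31; the macro and function names are hypothetical stand-ins, not the protocol header's definitions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HDR_LEN_SHIFT 16
#define HDR_MAX_LEN_DWORDS 0xFFFFu /* largest value the 16-bit field holds */

/* Does a payload of `len` dwords fit in the header's length field? */
static bool fits_in_header(uint32_t len)
{
   return len <= HDR_MAX_LEN_DWORDS;
}

/* Pack a command header. Once the command buffer can exceed 2^16 - 1
 * dwords, an oversized length would silently lose its upper bits in the
 * shift below, so it is checked first. */
static uint32_t encode_header(uint8_t cmd, uint8_t obj, uint32_t len)
{
   assert(fits_in_header(len));
   return (uint32_t)cmd | ((uint32_t)obj << 8) | (len << HDR_LEN_SHIFT);
}

/* Recover the payload length from a header dword. */
static uint32_t header_len(uint32_t hdr)
{
   return hdr >> HDR_LEN_SHIFT;
}
```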
Let's define a helper function and use it.
This commit also allows resources to be emitted into different command
buffers.
Like the ioctls, send 0 for layer_stride and stride. If we actually
send the real values, there are various assumptions in virglrenderer
for non-1D buffers that may need to be modified.
Reviewed-by: Gert Wollny <[email protected]>
Mostly similar to VIRGL_CCMD_RESOURCE_INLINE_WRITE. However, this
uses the resource's already attached iovecs rather than the command
buffer to transfer the data.
v2: Used (1 << 16) not (1 << 15) [@gerddie]
Reviewed-by: Gert Wollny <[email protected]>
This will allow us to destroy transfers without having a pointer
to the context.
Reviewed-by: Gert Wollny <[email protected]>
This should save some memory when allocating and freeing transfers.
Reviewed-by: Gert Wollny <[email protected]>
Since we're just uploading to guest memory, let's just align to dword
size.
Fixes: e0f932 ("u_upload_mgr: pass alignment to u_upload_data manually")
Reviewed-by: Gert Wollny <[email protected]>
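
Dword alignment is the usual power-of-two round-up. The helper below is a generic sketch (the names `align_up` and `UPLOAD_ALIGNMENT` are illustrative); the resulting value is the kind of alignment argument the message says is now passed to u_upload_data:

```c
#include <stdint.h>

/* Round `offset` up to the next multiple of `alignment`, which must be a
 * power of two. */
static uint32_t align_up(uint32_t offset, uint32_t alignment)
{
   return (offset + alignment - 1) & ~(alignment - 1);
}

/* Uploads only land in guest memory, so dword (4-byte) alignment is
 * enough; a larger alignment only leaves more padding in the upload buffer. */
#define UPLOAD_ALIGNMENT ((uint32_t)sizeof(uint32_t))
```

For example, an upload ending at byte 13 places the next one at 16 with dword alignment, but at 256 if a 256-byte alignment were requested.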
This allows a minor optimization for texture upload.
Reviewed-by: Gert Wollny <[email protected]>
The guest memory is still clean until host GL touches it,
which we should track elsewhere.
Reviewed-by: Gert Wollny <[email protected]>
Reviewed-by: Gert Wollny <[email protected]>
There are levels to cleanliness.
Reviewed-by: Gert Wollny <[email protected]>
Some NVIDIA hardware can accept 128 fragment shader input components,
but only has up to 124 varying-interpolated input components. We add a
new cap to express this cleanly. For most drivers, this will have the
same value as PIPE_SHADER_CAP_MAX_INPUTS for the fragment shader.
Fixes KHR-GL45.limits.max_fragment_input_components
Signed-off-by: Karol Herbst <[email protected]>
[imirkin: rebased, improved docs/commit message]
Signed-off-by: Ilia Mirkin <[email protected]>
Acked-by: Rob Clark <[email protected]>
Acked-by: Eric Anholt <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Cc: 19.0 <[email protected]>
v1.1: fix size define.
Reviewed-by: Gurchetan Singh <[email protected]>
GL underneath always has GL_TIME_ELAPSED so always enable these.
Reviewed-by: Gurchetan Singh <[email protected]>
v2: - Use the renamed CAPS
- add assertions to make sure that mesa doesn't try to switch
destination surface formats when it is not supported. (Ilia Mirkin)
Signed-off-by: Gert Wollny <[email protected]>
Reviewed-by: Gurchetan Singh <[email protected]>
This stores the raster state and calls the correct primconvert interface
using the currently bound raster state.
Reviewed-By: Gert Wollny <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
Fixes: 174f53 ("virgl: consolidate transfer code")
Reviewed-by: Erik Faye-Lund <[email protected]>
Otherwise, the gl-1.0-long-dlist Piglit test crashes.
Fixes: db7757 ("virgl: modify how we handle GL_MAP_FLUSH_EXPLICIT_BIT")
Reported by airlied@
v2: Exit on any invalid range (Erik)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109190
Reviewed-by: Dave Airlie <[email protected]>
Reviewed-by: Erik Faye-Lund <[email protected]>
Tested-by: Jakob Bornecrantz <[email protected]>
We can remove some duplicated code.
Reviewed-by: Elie Tournier <[email protected]>
A resource is just a buffer with some metadata.
Reviewed-by: Elie Tournier <[email protected]>
Previously, we ignored the glUnmapBuffer(..) operation and
flushed the data right before flushing the cbuf. Now, let's just
flush the data when we unmap.
Neither method is optimal. For example:
glMapBufferRange(.., 0, 100, GL_MAP_FLUSH_EXPLICIT_BIT)
glFlushMappedBufferRange(.., 25, 30)
glFlushMappedBufferRange(.., 65, 70)
We'll end up flushing 25 --> 70. Maybe we can fix this later.
v2: Add fixme comment in the code (Elie)
Reviewed-by: Elie Tournier <[email protected]>
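
Why the example ends up flushing 25 --> 70: if only a single [start, end) range is tracked per mapping, each explicit flush widens it, and the gap in between is flushed too. A minimal sketch, treating the two calls' arguments as start/end pairs the way the message does; the struct and function names are hypothetical:

```c
#include <stdint.h>

/* One accumulated range [start, end); start == end means "nothing yet".
 * Cheap to track, but it also covers any gap between flushed pieces. */
struct flush_range {
   uint32_t start;
   uint32_t end;
};

static void flush_range_add(struct flush_range *r, uint32_t start, uint32_t end)
{
   if (r->start == r->end) { /* empty: adopt the new range */
      r->start = start;
      r->end = end;
      return;
   }
   if (start < r->start)
      r->start = start;
   if (end > r->end)
      r->end = end;
}
```

After adding 25..30 and 65..70, the tracked range is 25..70: 45 bytes are flushed although only 10 were requested. Keeping a list of ranges would avoid this at the cost of more bookkeeping.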
We can reuse the helpers we created.
Reviewed-by: Elie Tournier <[email protected]>
util_format_get_blocksize returns 1 for R8 formats (all
PIPE_BUFFERs are R8).
Reviewed-by: Elie Tournier <[email protected]>
This way we can allocate and destroy transfers in one place.
v2: Keep l_stride around.
Reviewed-by: Elie Tournier <[email protected]>
Reviewed-by: Elie Tournier <[email protected]>
Will be reused.
Reviewed-by: Elie Tournier <[email protected]>
Will be reused.
Reviewed-by: Elie Tournier <[email protected]>
Will be reused.
Reviewed-by: Elie Tournier <[email protected]>
With commit 89b479, we moved to tracking buffer cleanliness
when binding.
TEST=dEQP-GLES31.functional.image_load_store.buffer.load_store.r32ui
Reviewed-by: Elie Tournier <[email protected]>
It's used for all types of resources.
Reviewed-by: Elie Tournier <[email protected]>
Virglrenderer does the wrong thing when given an instance divisor;
it tries to use the element-index rather than the binding-index as
the argument to glVertexBindingDivisor(). This worked fine as long
as there was a 1:1 relationship between elements and bindings,
which was the case until 19a91841c34 "st/mesa: Use Array._DrawVAO in
st_atom_array.c.".
So let's detect instance divisors, and restore a 1:1 relationship in
that case. This will make old versions of virglrenderer behave
correctly. For newer versions, we can consider making a better
interface, where the instance divisor isn't specified per element,
but rather per binding. But let's save that for another day.
Signed-off-by: Erik Faye-Lund <[email protected]>
Fixes: 19a91841c34 "st/mesa: Use Array._DrawVAO in st_atom_array.c."
Reviewed-by: Mathias Fröhlich <[email protected]>
Tested-By: Gert Wollny <[email protected]>
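
Restoring the 1:1 element-to-binding relationship could look roughly like this. It is a hypothetical sketch, not the patch itself: the `velem` struct and `make_bindings_one_to_one()` are illustrative names. When any element carries an instance divisor, element i is pointed at binding slot i, and `binding_map` records which original vertex buffer must then be bound at each slot:

```c
#include <stdbool.h>
#include <stdint.h>

struct velem {
   uint32_t src_offset;
   uint32_t vertex_buffer_index; /* binding this element reads from */
   uint32_t instance_divisor;    /* 0 = per-vertex data */
};

/* If any element uses an instance divisor, rewrite the layout so that
 * element i reads from binding i. binding_map[i] receives the original
 * vertex buffer for slot i. Returns true if the layout was rewritten. */
static bool make_bindings_one_to_one(struct velem *elems, uint32_t count,
                                     uint32_t *binding_map)
{
   bool has_divisor = false;
   for (uint32_t i = 0; i < count; i++)
      if (elems[i].instance_divisor != 0)
         has_divisor = true;
   if (!has_divisor)
      return false; /* shared bindings are harmless without divisors */

   for (uint32_t i = 0; i < count; i++) {
      binding_map[i] = elems[i].vertex_buffer_index;
      elems[i].vertex_buffer_index = i;
   }
   return true;
}
```

The trade-off is that one vertex buffer may be bound at several slots, but an older host that treats the element index as the binding index then sets the divisor on the right binding.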
This just has one member for now, the handle. But this is about to
change.
Signed-off-by: Erik Faye-Lund <[email protected]>
Reviewed-by: Mathias Fröhlich <[email protected]>
Tested-By: Gert Wollny <[email protected]>
Signed-off-by: Erik Faye-Lund <[email protected]>
Reviewed-by: Mathias Fröhlich <[email protected]>
Tested-By: Gert Wollny <[email protected]>
Signed-off-by: Erik Faye-Lund <[email protected]>
Reviewed-by: Mathias Fröhlich <[email protected]>
Tested-By: Gert Wollny <[email protected]>
When I made sure that half-float texture-filtering was required for ES3,
I didn't realize that virgl doesn't report support for this correctly.
This regressed the GLES version available on top of several drivers,
including i965, from 3.2 to 2.0.
This is going to need protocol changes to fix properly, so let's just
restore the previous behavior by enabling floating-point filtering
unconditionally for now.
Signed-off-by: Erik Faye-Lund <[email protected]>
Fixes: fcf9fcee3c8 "mesa/main: do not require float-texture filtering for es3"
Reviewed-by: Gurchetan Singh <[email protected]>
Fixes: 8d4bb6e5c (virgl: Add command and flags to initiate debugging on the host (v2))
We can mark the buffer unclean if it's ever bound as a TBO,
SSBO, ABO, or image.
This improves
dEQP-GLES3.performance.buffer.data_upload.function_call.map_buffer_range.new_specified_buffer.flag_write_full.stream_draw
from 9.58 MB/s to 451.17 MB/s.
v2: Track buffer cleanliness as a function of bindings (Ilia).
v3: virgl_modify_clean --> virgl_dirty_res (Erik)
Tested-By: Gert Wollny <[email protected]>
Reviewed-by: Erik Faye-Lund <[email protected]>
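
Tracking cleanliness as a function of bindings can be modeled with a bind mask. The flags below are hypothetical stand-ins for the gallium bind points (TBO, SSBO/ABO, image), and `map_needs_readback()` shows, in this model only, what a clean buffer lets the driver skip:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical binding flags. */
enum bind_point {
   BIND_VERTEX_BUFFER  = 1u << 0,
   BIND_INDEX_BUFFER   = 1u << 1,
   BIND_TEXTURE_BUFFER = 1u << 2, /* TBO */
   BIND_SHADER_BUFFER  = 1u << 3, /* SSBO or atomic buffer (ABO) */
   BIND_SHADER_IMAGE   = 1u << 4,
};

#define UNCLEAN_BINDS \
   ((uint32_t)(BIND_TEXTURE_BUFFER | BIND_SHADER_BUFFER | BIND_SHADER_IMAGE))

struct buffer {
   bool clean; /* guest copy is known to match the host copy */
};

/* Called whenever the buffer is bound somewhere. */
static void buffer_bound(struct buffer *buf, uint32_t bind)
{
   if (bind & UNCLEAN_BINDS)
      buf->clean = false;
}

/* A clean buffer can be mapped without reading data back from the host. */
static bool map_needs_readback(const struct buffer *buf)
{
   return !buf->clean;
}
```

A buffer that is only ever bound as vertex or index data stays clean, which is what lets the map_buffer_range upload path skip the expensive round trip.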
We flush every time the command buffer (16 kB) is full, which is
quite costly.
This improves
dEQP-GLES3.performance.buffer.data_upload.function_call.buffer_data.new_buffer.usage_stream_draw
from 111.16 MB/s to 1930.36 MB/s.
In addition, I made the benchmark produce buffers from 0 --> VIRGL_MAX_CMDBUF_DWORDS * 4,
and tried ((VIRGL_MAX_CMDBUF_DWORDS * 4) / 2), ((VIRGL_MAX_CMDBUF_DWORDS * 4) / 4), etc.
I didn't notice any clear differences, so let's just go with the most obvious
heuristic.
Tested-By: Gert Wollny <[email protected]>
Reviewed-by: Erik Faye-Lund <[email protected]>
Tested running WebGL aquarium on an Nvidia host (10,000 fishes).
This moves us from 7 fps to 9 fps. After quadrupling, performance
gains diminish.
v2: Remove change ID (Erik)
Tested-By: Gert Wollny <[email protected]>
Reviewed-by: Erik Faye-Lund <[email protected]>
Ported from virglrenderer.
Signed-off-by: Dave Airlie <[email protected]>
vtest doesn't implement the corresponding API and would segfault:
Program received signal SIGSEGV, Segmentation fault.
#0 0x0000000000000000 in ?? ()
#1 in virgl_fence_server_sync at
src/gallium/drivers/virgl/virgl_context.c:1049
#2 in st_server_wait_sync at
src/mesa/state_tracker/st_cb_syncobj.c:155
So just don't make the call when the function pointers are not set.
Fixes dEQP:
dEQP-GLES3.functional.fence_sync.wait_sync_smalldraw
dEQP-GLES3.functional.fence_sync.wait_sync_largedraw
Fixes: d1a1c21e7621b5177febf191fcd3d3b8ef69dc96 ("virgl: native fence fd support")
Signed-off-by: Gert Wollny <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
Reviewed-by: Robert Foss <[email protected]>
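
The fix amounts to guarding an optional hook. A minimal sketch with a hypothetical winsys vtable (the struct layout, `counting_sync()`, and the `sync_calls` counter are illustrative only). Frame #0 at address 0x0 in the backtrace is the signature of calling through a NULL function pointer:

```c
#include <stddef.h>

struct fence; /* opaque in this sketch */

/* Hypothetical winsys vtable: some backends, like vtest here, leave
 * optional hooks unset. */
struct winsys {
   void (*fence_server_sync)(struct winsys *ws, struct fence *f);
};

static void context_fence_server_sync(struct winsys *ws, struct fence *f)
{
   /* Skip the call instead of jumping to address 0 when the backend
    * does not provide the hook. */
   if (ws->fence_server_sync)
      ws->fence_server_sync(ws, f);
}

/* Demonstration hook that only counts invocations. */
static int sync_calls;
static void counting_sync(struct winsys *ws, struct fence *f)
{
   (void)ws;
   (void)f;
   sync_calls++;
}
```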
Verify the pipe_fd_type to be of PIPE_FD_TYPE_NATIVE_SYNC.
Fixes: d1a1c21e7621b5177feb "virgl: native fence fd support"
Suggested-by: Eric Engestrom <[email protected]>
Signed-off-by: Robert Foss <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
Remove a dead variable, an int->bool conversion and some
whitespace changes.
Signed-off-by: Robert Foss <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>