| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We've noticed the Team Fortress 2 engine seems to do many small
calls to glSubData(..). Let's pick our heuristic based on the
resource base width, not the size of a particular upload.
This will cause transfers to be batched together in the transfer
queue.
Revelant glbench microbenchmark --
Before: buffer_upload_dynamic_element_array_131072 = 131.17 mbytes_sec
After: buffer_upload_dynamic_element_array_131072 = 6828.24 mbytes_sec
Reviewed-by: Gert Wollny <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This improves Unigine Valley benchmark by 3 to 10 fps (depending
on the scene).
It also improves the Team Fortress 2 benchmark from 6 fps to 13
fps (host: 20 fps).
Reviewed-by: Gert Wollny <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Transfers will be placed here at unmap time instead of incurring
a VM exit. There's an attempt to deduplicate intersecting 1D transfers,
which are surprisingly common.
This can also help with mipmapped texture upload and smaller
textures, where the majority of the time is spent in the guest
kernel / QEMU -- not virglrenderer. This is shown by the GLbench
texture upload benchmark:
Before:
texture_upload_rgba_teximage2d_32 = 64.23 mtexel_sec
After:
texture_upload_rgba_teximage2d_32 = 367.44 mtexel_sec
v2: Split up list iteration functions (@gerddie)
v3: Support for optimizing glBufferSubData
Reviewed-by: Gert Wollny <[email protected]>
|
|
|
|
|
|
| |
Let's encode the new protocol with new helper functions.
Reviewed-by: Gert Wollny <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The idea is to have two command buffers:
1) One for transfers
2) One for commands, which can include transfers
At flush time, (2) will be filled. Otherwise, (1) will be
used to submit transfers if there are enough of them.
v2: Pass size directly to cmd_buf_create (@gerddie)
Reviewed-by: Gert Wollny <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is motivated by the following scenario:
glSubBufferData(GL_ARRAY_BUFFER, ...)
glFlush(..)
glSubBufferData(GL_ARRAY_BUFFER, ...)
glSubBufferData(GL_ARRAY_BUFFER, ...)
glSubBufferData(GL_ARRAY_BUFFER, ...)
This increases @davidriley's Team Fortress 2 apitrace from
1 fps to 6 fps and helps with the Chromium glbench
microbenchmarks:
Before: texture_update_rgba_texsubimage2d_2048 = 554.96 mtexel_sec
buffer_upload_dynamic_array_12 = 0.02 mbytes_sec
buffer_upload_dynamic_array_576 = 1.07 mbytes_sec
After: texture_update_rgba_texsubimage2d_2048 = 612.29 mtexel_sec
buffer_upload_dynamic_array_12 = 2.22 mbytes_sec
buffer_upload_dynamic_array_576 = 164.89 mbytes_sec
Reviewed-by: Gert Wollny <[email protected]>
|
|
|
|
| |
Reviewed-by: Gert Wollny <[email protected]>
|
|
|
|
|
|
| |
It's good to keep track of these things.
Reviewed-by: Gert Wollny <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Much of our logic is based around the idea the upper 16 bits
of a command dword can encode the length of the command.
Now that the command buffer >= 2^16 - 1, we should check for
this.
v2: alignment, and only check VIRGL_ENCODE_MAX_DWORDS
Reviewed-by: Gert Wollny <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Let's define a helper function and use it.
This commit also allows resources to be emitted into different command
buffers.
Like the ioctls, send 0 for layer_stride and stride. If we actually
send the real values, there are various assumptions in virglrenderer
for non-1D buffers that may need to be modified.
Reviewed-by: Gert Wollny <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Mostly similar to VIRGL_CCMD_RESOURCE_INLINE_WRITE. However, this
uses the resource's already attached iovecs rather than the command
buffer to transfer the data.
v2: Used (1 << 16) not (1 << 15) [@gerddie]
Reviewed-by: Gert Wollny <[email protected]>
|
|
|
|
|
|
|
| |
This will allow us to destroy transfers w/o having a pointer
to the context.
Reviewed-by: Gert Wollny <[email protected]>
|
|
|
|
|
|
| |
This should save some memory when allocating and freeing transfers.
Reviewed-by: Gert Wollny <[email protected]>
|
|
|
|
|
|
|
|
| |
Since we're just uploading to guest memory, let's just align to dword
size.
Fixes: e0f932 ("u_upload_mgr: pass alignment to u_upload_data manually")
Reviewed-by: Gert Wollny <[email protected]>
|
|
|
|
|
|
| |
This allows a minor optimization for texture upload.
Reviewed-by: Gert Wollny <[email protected]>
|
|
|
|
|
|
|
| |
The guest memory is still clean until host GL touches it,
which we should track elsewhere.
Reviewed-by: Gert Wollny <[email protected]>
|
|
|
|
| |
Reviewed-by: Gert Wollny <[email protected]>
|
|
|
|
|
|
| |
There are levels to cleanliness.
Reviewed-by: Gert Wollny <[email protected]>
|
|
|
|
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
| |
Fixes regressions with EGL clients
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
| |
Setting this is required for desktop-style occlusion queries.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
We have cases where we would not like to expose these.
v2: call the option allow_rgb565_configs for consistency
with existing allow_rgb10_configs (Eric, Jason)
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are a few differenes between Mali T860 (Panfrost's primary
reference target) and the older Midgard generations (T600/T700):
- Miscellaneous different magic numbers. It's not clear what these
numbers mean on either the old or new configurations yet.
- Errata fixes. T800 is the final Midgard generation and presumably the
least buggy. Older Midgard has some extra hardware errata we have to
workaround.
- SFBD vs MFBD split. Essentially, older Midgard use a Single
FrameBuffer Descriptor (SFBD), which corresponds to single
render-target rendering. Newer Midgard (T760+) use a Multiple
FrameBuffer Descriptor (MFBD), allowing multiple RTs. On ES 2.0, these
descriptors serve the same function, but we implement both, depending on
the version of the hardware.
- CPU bitness. 32-bit systems generally use 32-bit GPU descriptors, and
vice versa for 64-bit. Our target T760 systems are 32-bit whereas our
target T860 systems are 64-bit. More work is needed in this area.
This patch fixes support in these areas for supporting older Midgard
hardware. It is tested on Mali T760 and Mali T860.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The liveness analysis pass is fairly expensive because it has to build
large bit-sets and run a fix-point algorithm on them. Instead of
requiring liveness for detecting if values escape a CF node, just take
advantage of the structured nature of NIR and use block indices instead.
This only requires the block index metadata which is the fastest we have
metadata to generate.
No shader-db changes on Kaby Lake
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
| |
We want to handle live SSA values differently and it's going to involve
walking the instructions. We can make it a single instruction walk if
we combine it with cf_node_has_side_effects.
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes a bug in runscape where we were optimizing x >> 16 to an
extract and then negating and converting to float. The NIR to fs pass
was dropping the negate on the floor breaking a geometry shader and
causing it to render nothing.
Fixes: 1f862e923cb "i965/fs: Optimize float conversions of byte/word..."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109601
Tested-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Unfortunately swr was missed in the original commit. The number of
varyings should generally match up to what's reported as the shader
caps for fragment inputs.
Fixes: 6010d7b8e8be (gallium: add PIPE_CAP_MAX_VARYINGS)
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Alok Hota <[email protected]>
Cc: 19.0 <[email protected]>
|
|
|
|
| |
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
| |
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
| |
Just a little higher up in the function we assert that the aspect masks
are actually equal so there's no reason for the weaker check. Also, the
temporary variables were causing compiler warnings in release builds.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[28/716] Compiling C object 'src/compiler/nir/068b2c8@@nir@sta/nir_gather_xfb_info.c.o'.
../src/compiler/nir/nir_gather_xfb_info.c: In function ‘nir_gather_xfb_info’:
../src/compiler/nir/nir_gather_xfb_info.c:171:13: warning: variable ‘max_offset’ set but not used [-Wunused-but-set-variable]
unsigned max_offset[NIR_MAX_XFB_BUFFERS] = {0};
^~~~~~~~~~
[36/716] Compiling C object 'src/compiler/nir/068b2c8@@nir@sta/nir_instr_set.c.o'.
../src/compiler/nir/nir_instr_set.c:502:1: warning: ‘instr_each_src_and_dest_is_ssa’ defined but not used [-Wunused-function]
instr_each_src_and_dest_is_ssa(nir_instr *instr)
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
spirv_to_nir can generate input/output variables which are illegal
for the current shader stage, which would cause nir_validate_shader
to balk. After my recent commit to start decorating arrays as compact,
dEQP-VK.spirv_assembly.instruction.graphics.module.same_module started
hitting validation errors due to outputs in a TCS (not intended for the
TCS at all) not being per-vertex arrays.
Thanks to Jason Ekstrand for suggesting this approach.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109573
Fixes: ef99f4c8d17 compiler: Mark clip/cull distance arrays as compact before lowering.
Reviewed-by: Juan A. Suarez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
My patch to switch from struct-based MOCS to numeric MOCS accidentally
divided all MOCS entries by 2 in the Vulkan driver.
MOCS on Gen9+ is just an array index into a table. But in the hardware
packets, the index starts at bit 1. So we need to shift it.
Fixes: 0b44644ca68 (genxml: Consistently use a numeric "MOCS" field)
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Fixes: c6465fec0c5 ("spirv: add SpvCapabilityInt64Atomics")
CID: 1442555
|
|
|
|
|
|
|
|
| |
assert()-based tests make no sense without asserts, so make sure asserts
are compiled in, even if the rest of the code has asserts turned off.
Signed-off-by: Eric Engestrom <[email protected]>
Acked-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
| |
assert()-based tests make no sense without asserts, so make sure asserts
are compiled in, even if the rest of the code has asserts turned off.
Signed-off-by: Eric Engestrom <[email protected]>
Acked-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
There was an issue recently caused by the system header being included
by mistake, so let's just get rid of this include path and always
explicitly #include "drm-uapi/FOO.h"
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
| |
These headers are used by a lot more than just the intel drivers nowadays.
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
| |
We should check that num_channels is 4, otherwise that breaks
the world. Sorry for the short breakage.
Fixes: 4b3549c0846 ("radv: reduce the number of loaded channels for vertex input fetches")
Signed-off-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It's unnecessary to load more channels than the vertex attribute
format. The remaining channels are filled with 0 for y and z,
and 1 for w.
29077 shaders in 15096 tests
Totals:
SGPRS: 1321605 -> 1318869 (-0.21 %)
VGPRS: 935236 -> 932252 (-0.32 %)
Spilled SGPRs: 24860 -> 24776 (-0.34 %)
Code Size: 49832348 -> 49819464 (-0.03 %) bytes
Max Waves: 242101 -> 242611 (0.21 %)
Totals from affected shaders:
SGPRS: 93675 -> 90939 (-2.92 %)
VGPRS: 58016 -> 55032 (-5.14 %)
Spilled SGPRs: 172 -> 88 (-48.84 %)
Code Size: 2862740 -> 2849856 (-0.45 %) bytes
Max Waves: 15474 -> 15984 (3.30 %)
This mostly helps Croteam games (Talos/Sam2017).
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
| |
The formats will be used for reducing the number of loaded channels.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
| |
And make ac_build_expand() a static function.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
| |
I think this will save an instruction and hopefully not increase any other
costs (possibly the immediate -1 and 1?), but I haven't actually tested.
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Drops one instruction from fs-sign-int.shader_test. No change in
shader-db due to it having 0 instances of sign(genIType). This may hurt
isign64 if algebraic runs before int64 lowering, but I wasn't sure how to
mark the algebraic opt as "every bit size but 64".
v2: Update commit message about shader-db.
Reviewed-by: Ian Romanick <[email protected]> (v1)
|