Since we lack value range tracking, this patch tries to 'manually' propagate
the division by 4 used to calculate the SSBO element offset into a possible
preceding shift operation (left or right), after checking that it is safe to do so.
This should help in cases such as accessing a field in an array of
structs, where the offset is likely defined as a base plus a multiplication
by a struct or array element size.
See the dEQP test 'dEQP-GLES31.functional.ssbo.atomic.xor.highp_uint'
for an example of a shader that benefits from this.
Reviewed-by: Rob Clark <[email protected]>
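The folding can be sketched in C. This is a hypothetical helper, not the code from the patch: a shift is encoded as a signed amount (positive for a left shift, negative for a logical right shift), the division by 4 is an extra shift of -2, and the safety checks chosen here (no change of direction, result still a valid 32-bit shift) are assumptions. Folding into a left shift additionally assumes that the original left shift does not overflow 32 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Apply a signed shift: s > 0 shifts left, s < 0 shifts right (logical). */
static uint32_t
apply_shift(uint32_t x, int32_t s)
{
   assert(s > -32 && s < 32);
   return s >= 0 ? x << s : x >> -s;
}

/* Try to fold an extra shift into an existing one, so that
 * apply_shift(apply_shift(x, cur), extra) == apply_shift(x, *merged).
 * Returns false when folding is not known to be safe. */
static bool
merge_shift(int32_t cur, int32_t extra, int32_t *merged)
{
   int32_t next = cur + extra;

   /* A change of direction is unsafe: (x << 2) >> 4 drops the top bits
    * of x, while x >> 2 keeps them. */
   if ((int64_t)cur * next < 0)
      return false;

   /* Keep the merged amount a valid 32-bit shift. */
   if (next <= -32 || next >= 32)
      return false;

   /* Assumption: for left shifts, x << cur does not overflow 32 bits;
    * otherwise the bits shifted out could not be recovered. */
   *merged = next;
   return true;
}
```

For example, an index shifted left by 4 (a 16-byte stride) and then divided by 4 folds into a single shift left by 2, so no separate SHR is needed.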
These intrinsics already have the offset in dwords computed in the last
source, so this change basically uses it instead of emitting the
ir3_SHR that divides the byte offset by 4.
The improvement in shader stats is significant: up to ~15% fewer
instructions in some cases. Tested only on a5xx.
shader-db is unfortunately not very useful here because shaders that use
SSBO require GLSL versions that are not supported by freedreno yet.
For example, most Khronos CTS tests under 'dEQP-GLES31.functional.ssbo.*'
are helped.
A random case:
dEQP-GLES31.functional.ssbo.layout.2_level_array.packed.row_major_mat3x2
with current master:
; CL prog 14/1: 1252 instructions, 0 half, 48 full
; 8 const, 8 constlen
; 61 (ss), 43 (sy)
with the SSBO dword-offset moved to NIR:
; CL prog 14/1: 1053 instructions, 0 half, 45 full
; 7 const, 7 constlen
; 34 (ss), 73 (sy)
The SHR previously emitted for every single SSBO instruction disappears
in most cases, and in many cases the dword offset also ends up embedded in
the STGB instruction as an immediate.
A few of those tests that currently fail in register allocation also start
to pass as a result of the reduced register pressure. At least these,
probably more:
dEQP-GLES31.functional.ssbo.layout.random.unsized_arrays.24
dEQP-GLES31.functional.ssbo.layout.random.arrays_of_arrays.6
dEQP-GLES31.functional.ssbo.layout.random.arrays_of_arrays.17
dEQP-GLES31.functional.ssbo.layout.random.nested_structs_arrays.14
dEQP-GLES31.functional.ssbo.layout.random.nested_structs_arrays_instance_arrays.5
dEQP-GLES31.functional.ssbo.layout.random.nested_structs_arrays_instance_arrays.7
No regressions observed with relevant CTS and piglit tests.
Reviewed-by: Rob Clark <[email protected]>
This NIR->NIR pass implements offset computations that are currently
done in the IR3 backend compiler, to give NIR a better chance of
optimizing them.
For now, it supports lowering the dword-offset computation for SSBO
instructions. It takes an SSBO intrinsic and replaces it with the
new ir3-specific version that adds an extra source. That source
holds the SSA value resulting from a division by 4 (an SHR op) of the
original byte offset, which NIR already provides in one of the
intrinsic sources.
Note that on a6xx the original byte-offset is not needed, so we could
potentially replace that source instead of adding a new one. But to
keep things simple and consistent we always add the new source and
a6xx will just ignore the original one.
Reviewed-by: Rob Clark <[email protected]>
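A data-structure-level sketch of what the pass does to a single intrinsic. Everything here is hypothetical: `struct sketch_intrinsic` stands in for a NIR intrinsic, sources are plain integers rather than SSA values, and the shift is evaluated immediately, whereas the pass described above inserts an SHR instruction whose result becomes the new source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_SRCS 4

/* Hypothetical stand-in for an SSBO intrinsic such as a load or store. */
struct sketch_intrinsic {
   uint32_t src[SKETCH_MAX_SRCS];
   unsigned num_srcs;
   unsigned byte_offset_src; /* index of the source holding the byte offset */
   bool is_ir3_variant;      /* true once replaced by the ir3-specific version */
};

/* Append the dword offset (byte offset >> 2) as a new last source. The
 * original byte-offset source stays in place; a6xx would just ignore it. */
static void
sketch_lower_ssbo_offset(struct sketch_intrinsic *intr)
{
   assert(!intr->is_ir3_variant);
   assert(intr->num_srcs < SKETCH_MAX_SRCS);

   uint32_t byte_offset = intr->src[intr->byte_offset_src];
   intr->src[intr->num_srcs++] = byte_offset >> 2;
   intr->is_ir3_variant = true;
}
```

Always appending, rather than replacing the byte offset on a6xx, keeps one source layout for every generation, which matches the trade-off stated in the message.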
That is, drop KHR from all tokens that were promoted to Vulkan 1.1.
The consistency makes ctags more useful (it now jumps directly to the
real definitions in vulkan_core.h instead of the typedefs); and it makes
the code slightly less verbose.
This adds support for tu_CmdBindPipeline, tu_CmdBindVertexBuffers,
etc.
It will hold draw commands.
Compile all shaders and upload the binaries to a BO.
Save SPIR-V in tu_shader_module. Translation to NIR happens in
tu_shader_create, and compilation to binary code happens in
tu_shader_compile. Both will be called during pipeline creation.
Still dummy, but at least it is created from tu_pipeline_builder.
Let tu_cs_begin_sub_stream imply tu_cs_reserve_space, and
tu_cs_end_sub_stream imply tu_cs_sanity_check. Callers are no
longer required to call them (but can still do so if they choose).
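A simplified model of the new contract, assuming a fixed-size backing buffer: beginning a sub-stream reserves the requested space up front and hands the caller a stream bounded to that reservation, and ending it runs the sanity checks. The `sketch_*` names are hypothetical; tu_cs can grow, which this sketch does not model, so here a reservation simply fails when the buffer is full.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified command stream: a window into a buffer. */
struct sketch_cs {
   uint32_t *start; /* first dword of this (sub-)stream */
   uint32_t *cur;   /* next dword to write */
   uint32_t *end;   /* one past the last usable dword */
};

/* Begin implies reserve: fail up front if `size` dwords do not fit, and give
 * the caller a sub-stream that ends exactly at the reservation. */
static bool
sketch_begin_sub_stream(struct sketch_cs *cs, uint32_t size,
                        struct sketch_cs *sub)
{
   if ((uint32_t)(cs->end - cs->cur) < size)
      return false;
   sub->start = sub->cur = cs->cur;
   sub->end = cs->cur + size;
   return true;
}

/* End implies sanity check: the sub-stream must lie within its reservation. */
static void
sketch_end_sub_stream(struct sketch_cs *cs, const struct sketch_cs *sub)
{
   assert(sub->start == cs->cur);
   assert(sub->start <= sub->cur && sub->cur <= sub->end);
   cs->cur = sub->cur;
}
```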
Update cs->start in tu_cs_end_sub_stream. Otherwise, the entry
would include commands from all prior sub-streams.
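A minimal sketch of the fix, using hypothetical types: each entry records the range between `start` and `cur`, and `start` is then advanced so the next entry covers only the commands written after it.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_ENTRIES 8

/* Hypothetical entry: a range of dwords within the backing buffer. */
struct sketch_entry {
   uint32_t offset; /* in dwords from the buffer base */
   uint32_t size;   /* in dwords */
};

struct sketch_cs {
   uint32_t *base;  /* start of the backing buffer */
   uint32_t *start; /* start of the commands not yet covered by an entry */
   uint32_t *cur;   /* next dword to write */
   struct sketch_entry entries[SKETCH_MAX_ENTRIES];
   uint32_t entry_count;
};

static void
sketch_end_sub_stream(struct sketch_cs *cs)
{
   assert(cs->entry_count < SKETCH_MAX_ENTRIES);
   struct sketch_entry *entry = &cs->entries[cs->entry_count++];
   entry->offset = (uint32_t)(cs->start - cs->base);
   entry->size = (uint32_t)(cs->cur - cs->start);

   /* Advance start so the next entry begins here. Without this, every
    * later entry would also cover all earlier sub-streams. */
   cs->start = cs->cur;
}
```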
Array version of tu_cs_emit. Useful for updating multiple
consecutive array-like registers, or loading a shader binary with
SS6_DIRECT.
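A sketch of the array variant next to the single-dword emit, using a hypothetical two-pointer stream whose bound is the end of the reserved space. The array version does one bounds check and one copy for all `count` dwords.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified cs: only the write pointer and reservation. */
struct sketch_cs {
   uint32_t *cur;
   uint32_t *reserved_end;
};

static void
sketch_emit(struct sketch_cs *cs, uint32_t value)
{
   assert(cs->cur < cs->reserved_end);
   *cs->cur++ = value;
}

/* Array version: e.g. consecutive register values or a shader binary. */
static void
sketch_emit_array(struct sketch_cs *cs, const uint32_t *values, uint32_t count)
{
   assert(count <= (uint32_t)(cs->reserved_end - cs->cur));
   memcpy(cs->cur, values, count * sizeof(uint32_t));
   cs->cur += count;
}
```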
We will start a draw IB at the beginning of a subpass and consume it
at the end of the subpass. With tu_cs_discard_entries, we can reuse
the same tu_cs for all subpasses.
Asserting (cur < end) in tu_cs_emit catches far fewer programming
errors than asserting (cur < reserved_end). We should never
write more commands than we have reserved.
Assert IB is non-empty and sane in tu_cs_emit_ib.
We neither support nor expect BOs that big in tu_cs.
Includes IBs in kernel cmdbuf dumps.
Signed-off-by: Eric Engestrom <[email protected]>
Passes dEQP-VK.api.copy_and_blit.core.buffer_to_image.*
Passes dEQP-VK.api.copy_and_blit.core.image_to_buffer.*
Passes dEQP-VK.api.copy_and_blit.core.buffer_to_buffer.*
Make tu6_get_native_format available to tu_cmd_buffer and start
using it.
This should be quite complete feature-wise. External fences are
still missing. We probably also want to add a simpler path to
tu_WaitForFences for when fenceCount == 1.
Add tu_pack_clear_value to correctly pack VkClearValue according to
VkFormat. It ignores the component order defined by VkFormat, and
always packs to WZYX order.
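A sketch for one case, four 8-bit UNORM channels. Assumptions: the WZYX name lists components from most to least significant, so X (the first component, e.g. red) lands in the low byte; conversion rounds to nearest; the `sketch_*` names are hypothetical and do not reproduce tu_pack_clear_value. As the message says, the VkFormat component order is ignored, so a B8G8R8A8 format would be packed exactly the same way.

```c
#include <assert.h>
#include <stdint.h>

/* Convert one float channel to 8-bit UNORM with round-to-nearest. */
static uint32_t
sketch_float_to_unorm8(float f)
{
   if (!(f > 0.0f)) /* also maps NaN to 0 */
      return 0;
   if (f >= 1.0f)
      return 255;
   return (uint32_t)(f * 255.0f + 0.5f);
}

/* Pack (x, y, z, w) in WZYX order: w in the most significant byte,
 * x in the least significant byte. */
static uint32_t
sketch_pack_unorm8_wzyx(const float c[4])
{
   return sketch_float_to_unorm8(c[0]) |
          sketch_float_to_unorm8(c[1]) << 8 |
          sketch_float_to_unorm8(c[2]) << 16 |
          sketch_float_to_unorm8(c[3]) << 24;
}
```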
AFAICT, it is supported. We don't need to handle any of the new
structs because our BOs can always be exported.
AFAICT, it is supported.
Add tu_bo_init_dmabuf, tu_bo_export_dmabuf, tu_gem_import_dmabuf,
and tu_gem_export_dmabuf.
If the handle type is unsupported, then the spec requires us to return
VK_ERROR_FORMAT_NOT_SUPPORTED.
Reviewed-by: Chia-I Wu <[email protected]>
Closes: https://gitlab.freedesktop.org/bnieuwenhuizen/mesa/merge_requests/17
A format table is an array of tu_native_format. Table lookup is
done through array indexing.
This commit defines a single format table for core VkFormat. It is
derived from the table in the gallium driver. Errors might have been
introduced during the conversion.
When an extension that defines new VkFormats is supported, we need to
add a new table for the extension.
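A sketch of a format table looked up by array indexing. The enum below redeclares three real VkFormat values so the sketch is standalone; the `sketch_*` names and the `hw_format`/`swap` values are hypothetical placeholders, not the driver's actual hardware encodings. Unlisted formats are zero-initialized and treated as unsupported.

```c
#include <assert.h>
#include <stddef.h>

/* Same values as the corresponding VkFormat enumerants. */
enum sketch_vk_format {
   SKETCH_VK_FORMAT_UNDEFINED = 0,
   SKETCH_VK_FORMAT_R8G8B8A8_UNORM = 37,
   SKETCH_VK_FORMAT_B8G8R8A8_UNORM = 44,
};

struct sketch_native_format {
   int hw_format; /* hypothetical hardware format; 0 means unsupported */
   int swap;      /* hypothetical component swap */
};

/* Designated initializers keep the table indexed directly by VkFormat. */
static const struct sketch_native_format sketch_format_table[] = {
   [SKETCH_VK_FORMAT_R8G8B8A8_UNORM] = { .hw_format = 1, .swap = 0 },
   [SKETCH_VK_FORMAT_B8G8R8A8_UNORM] = { .hw_format = 1, .swap = 1 },
};

static const struct sketch_native_format *
sketch_get_native_format(int format)
{
   size_t count = sizeof(sketch_format_table) / sizeof(sketch_format_table[0]);
   if (format < 0 || (size_t)format >= count)
      return NULL; /* e.g. a format from an extension without a table */

   const struct sketch_native_format *f = &sketch_format_table[format];
   return f->hw_format ? f : NULL;
}
```

Extension formats have large, sparse enum values, which is one reason a separate table per extension is simpler than one huge array.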
- create tile_load_ib and tile_store_ib at the beginning of each
subpass
- execute the IBs at the end of each subpass
- no DONT_CARE support
- no subpass dependency analysis and subpass merging
- no zs support
- no true VkImageView support
- assume VK_FORMAT_B8G8R8A8_UNORM
- no tiling
- no MSAA
This also removes cur_cs from tu_cmd_buffer.
When in TU_CS_MODE_SUB_STREAM, tu_cs_begin_sub_stream (or
tu_cs_end_sub_stream) should be called instead of tu_cs_begin (or
tu_cs_end). It gives the caller a TU_CS_MODE_EXTERNAL cs to emit
commands to.
Add tu_cs_mode and TU_CS_MODE_EXTERNAL. When in
TU_CS_MODE_EXTERNAL, tu_cs wraps an external buffer and cannot
grow.
This also moves tu_cs* up in tu_private.h, such that other structs
can embed tu_cs_entry.
tu_cs_emit_ib emits a CP_INDIRECT_BUFFER for a BO. tu_cs_emit_call
emits a CP_INDIRECT_BUFFER for each entry of a target cs.
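A sketch of how the two helpers relate: emit_call is a loop of emit_ib over the target's entries. Hypothetical pieces: the `sketch_*` types, and the three-dword payload (address low, address high, size in dwords) that this sketch records for each CP_INDIRECT_BUFFER, without its packet header.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_CS_DWORDS 64

/* Hypothetical entry of a target cs: a GPU address and a size in dwords. */
struct sketch_entry {
   uint64_t iova;
   uint32_t size;
};

struct sketch_target {
   const struct sketch_entry *entries;
   uint32_t entry_count;
};

struct sketch_cs {
   uint32_t dwords[SKETCH_CS_DWORDS];
   uint32_t len;
};

/* Stand-in for emitting one CP_INDIRECT_BUFFER packet (header omitted). */
static void
sketch_emit_ib(struct sketch_cs *cs, const struct sketch_entry *entry)
{
   assert(entry->size > 0); /* an empty IB indicates a bug */
   assert(cs->len + 3 <= SKETCH_CS_DWORDS);
   cs->dwords[cs->len++] = (uint32_t)entry->iova;
   cs->dwords[cs->len++] = (uint32_t)(entry->iova >> 32);
   cs->dwords[cs->len++] = entry->size;
}

/* One indirect buffer per entry of the target cs. */
static void
sketch_emit_call(struct sketch_cs *cs, const struct sketch_target *target)
{
   for (uint32_t i = 0; i < target->entry_count; i++)
      sketch_emit_ib(cs, &target->entries[i]);
}
```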
It replaces tu_cs_reserve_space_assert and can be called at any
time to sanity check tu_cs.
Error checking tu_cs_begin/tu_cs_end is too tedious for the callers.
Move tu_cs_add_bo and tu_cs_reserve_entry to tu_cs_reserve_space
such that tu_cs_begin/tu_cs_end never fails.