| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
| |
There will unfortunately be circumstances where we cannot re-use a
virtual memory address until it's no longer active on the GPU. To
facilitate this, we instead move BOs to a "dead" list, and defer
closing them and returning their VMA until they are idle. We
periodically sweep these away in cleanup_bo_cache, which triggers
every time a new object's refcount hits zero.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Tested-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the future, some images will need to be aligned to a larger value
than 4096. Most buffers, however, don't have any such requirement,
so for now we only add the parameter to iris_bo_alloc_tiled() and
leave the others with the simpler interface.
v2: Fix missing alignment in vma_alloc, caught by Caio!
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Tested-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Generally, we achieve this by skipping the flush on calls to
v3d_flush_jobs_writing_resource() when we detect that the resource is written
in the current job from a transform feedback write.
The exception to this is the case where the caller is about to map the
resource, in which case we need to flush immediately since we can only emit
'Wait for transform feedback' commands on rendering jobs. We add a parameter
to the function so the caller can identify that scenario.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The hardware can flush transform feedback writes before reads in the same
job by inserting this command.
This patch detects when the rendering state for the current draw call reads
resources that had been previously written by transform feedback in the
same job and inserts the 'Wait for transform feedback' command before
emitting the new draw.
v2 (Eric):
- this was intended to look at job->tf_write_prscs for TF jobs.
- clear job->tf_write_prscs after we emit the TF flush.
- can skip flushes for fragment shader reads from TF.
v3 (Eric):
- all resources in job->tf_write_prscs are resources written by TF so
we don't need to check if they are bound to PIPE_BIND_STREAM_OUTPUT.
- documented optimization opportunity for geometry stages.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The hardware provides a feature to sync reads from previous transform feedback
writes in the same job so if we use this mechanism we no longer have to flush
the job.
In order to identify this scenario we need a mechanism to identify resources
that are written by transform feedback.
v2: use _mesa_pointer_set_create (Eric)
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
the dri image format here should match the fourcc format
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
this makes the entire struct available for use here
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
the formats handled in the switch statement will always return an
unknown mesa format, so process them directly and leave the default
case for other/unknown formats
no functional changes
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If our resource_copy_region size is a small number of DWords, then
instead of firing up BLORP, we can simply use MI_COPY_MEM_MEM (after
a CS stall). We also try and select the optimal batch.
Improves performance in Shadow of Mordor on Low settings at 1920x1080
on Skylake GT4e by 0.689096% +/- 0.473968% (n=4). It tries to copy
4 bytes of data to a buffer which was most recently used as a writable
compute shader SSBO. Previously we were switching from compute to the
render pipeline, then firing up all of blorp_buffer_copy...for 4 bytes.
I arbitrarily decided to support 4/8/12/16 bytes. Jason thinks this
is about the right threshold where it's cheaper to use MI_COPY_MEM_MEM.
|
|
|
|
|
|
| |
Fixes: b5e04e9217b "radv: Support allocating variable size descriptor sets."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111019
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Sagar Ghuge<[email protected]>
|
|
|
|
|
| |
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Eric Engestrom <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
We can greatly simplify our builds by just hardcoding GLvector4f and
GLmatrix's layouts.
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Most of these haven't been used since the conversion from checked-in
matypes to generation. By cutting down the generated contents, this
should clarify why the file is generated: we need
architecture-specific offsets to the V4F fields in the asm that uses
it.
v2: Keep matrix offsets to prevent x86 build breakage..
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
| |
These two are already pulled from `idep_vulkan_util_headers`.
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Dylan Baker <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Sagar Ghuge <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Currently only 7 formats are supported, but we don't want the 16 limit
(it's an `unsigned`) to hit us by surprise :]
Let's use bitset.h's BITSET magic to allow us to have any number of
formats, with a static assert to make sure we don't forget to update it.
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Sagar Ghuge <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Sagar Ghuge <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Sagar Ghuge <[email protected]>
Suggested-by: Matt Turner <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
| |
v2: Reorder patch (Matt Turner)
Signed-off-by: Sagar Ghuge <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
v2: 1) Add more optimization rules for ROL/ROR (Matt Turner)
2) Add lowering rules for ROL/ROR (Matt Turner)
Signed-off-by: Sagar Ghuge <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Sagar Ghuge <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
v2: 1) Drop changes for vec4 backend as on Gen11+ we don't support
align16 mode (Matt Turner)
Signed-off-by: Sagar Ghuge <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We implement GLES3.0 instanced rendering with full support for instanced
arrays (via instance divisors). To do so, we use the new invocation
helpers to invoke a triplet of (1, vertex_count, instance_count), rather
than simply (1, vertex_count, 1). We rewrite the attribute handling code
into a new pan_instancing.c file which handles both the simple LINEAR
case for non-instanced as well as each of the new instancing cases:
MODULO (for per-vertex attributes), POT and NPOT divisors.
As a side effect, we rework how vertex buffers are handled, duplicating
them to be 1:1 with vertex descriptors to simplify instancing code paths
dramatically. This might be a performance regression, but this remains
to be seen; if so, we can always deduplicate later with some added logic
in pan_instancing.c
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
| |
Rather than open-coding workgroups_shift_* type fields, we include a
general routine for packing the vertex/tiler/compute descriptor based on
the provided dispatch parameters.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
| |
Should not affect lima.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Rather than using a magic lookup table with no explanations, let's add
liberal comments to the code to explain what this tiling scheme is and
how to encode/decode it efficiently.
It's not so mysterious after all -- just reordering bits with some XORs
thrown in.
v2: Correct copyright identifier. Fix spelling error. Switch space_4 to
a LUT. Fix comment typo. Use LUT instead of space_x tricks. Fallback on
generic rather than split up unaligned writes.
v3: Correct stride order (fixes crash loading). Correct coordinate
system mishap.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Vasily Khoruzhick <[email protected]>
Tested-by: Andreas Baierl <[email protected]>
|
|
|
|
|
|
| |
Corrects the a3xx texconst state for TILE_MODE.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Just a cleanup, it shouldn't change anything.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
| |
Just move around the switch case. GFX9+ is handled below.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
| |
We only need to 2.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
| |
Found while working on DCC for arrays.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
| |
Fixes link failure since clang r364424 "[clang/DIVar] Emit the flag for
params that have unmodified value", clangCodeGen depends on
clangASTMatchers now.
Reviewed-by: Dylan Baker <[email protected]>
|
|
|
|
|
|
|
|
| |
When using ARB_gl_spirv, the block names are optional and the uniform
blocks are referred using Bindings instead. Teach
gl_nir_lower_buffers to handle those.
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Among other things, it supports arrays of arrays of UBO/SSBO (default
codepath doesn't).
Acked-by: Timothy Arceri <[email protected]>
v2: nir_address_format_vk_index_offset got renamed to
nir_address_format_32bit_index_offset (after rebase against master)
v3: the ptr_type fields in spirv_to_nir_options got changed to be of
type nir_address_format.
v4: remove phys_ssbo_addr_format and push_const_addr_format as they are
not used by glspirv
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
| |
When using a SPIR-V shader. Note that needs to be done before linking
uniforms, so when creating the uniform storage entries, block_index
could be filled properly (among other things).
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
| |
The function had a mix of true/GL_TRUE and false/GL_FALSE
returns. Using GL_TRUE/GL_FALSE as the function returns a GLboolean.
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Until now, we were using the uniform explicit location to check if the
current nir variable was already processed while adding entries on the
uniform storage. But for UBOs/SSBOs, entries are added too but we lack
a explicit location.
For those we need to rely on the UBO/SSBO binding and the unifor
storage block_index. In that case several uniforms would need to be
updated at once.
v2: (from Timothy review)
* Improve wording and fix typos of some long comments.
* Rename update_uniform_storage for mark_stage_as_active
v3: (from cmarcelo review)
* Fixed some comment typos
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Specifically, offset, stride (coming from arrays or matrices) and
row_major.
On GLSL, most of that info is computed using the layout qualifier, but
on ARB_gl_spirv they are explicit, and for Mesa, included on the
glsl_type.
From ARB_gl_spirv spec:
"Mapping of layouts
std140/std430 -> explicit *Offset*, *ArrayStride*, and
*MatrixStride* Decoration on struct members""
"7.6.2.spv SPIR-V Uniform Offsets and Strides
The SPIR-V decorations *GLSLShared* or *GLSLPacked* must not be
used. A variable in the *Uniform* Storage Class decorated as a
*Block* must be explicitly laid out using the *Offset*,
*ArrayStride*, and *MatrixStride* decorations"
For offset we needed to include the parent and index_in_parent while
processing the type, as the offset is maintained on glsl_struct_field
of the parent type, not on the type itself.
v2: Fix the default values for MATRIX_STRIDE, ARRAY_STRIDE and
ROW_MAJOR when the variable is not backed by a buffer object
(Antia Puentes).
v3: Update after Jason series "SPIR-V: Use NIR deref instructions for
UBO/SSBO access" that included just one explicit stride, instead
of a previous patch we wrote that had matrix_stride and
array_stride (Alejandro)
Signed-off-by: Antia Puentes <[email protected]>
Signed-off-by: Alejandro Piñeiro <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For this interfaces, the inner members are added only once as uniforms
or resources, in opposite to other cases, like a uniform array of
structs.
For those guessing why a issue (16) from ARB_program_interface_query
was used, instead of a quote of the core spec: The core spec is not
really clear about how members of arrays of blocks should be
enumerated.
On GLSL this was also problematic, specially when we were trying to
pass the 4.5 CTS tests. See commit "glsl: Fix program interface
queries relating to interface blocks"
(4c4d9e4f032d5753034361ee70aa88d16d3a04b4), as a reference. That one
also needed to rely on issue (16) to justify the change, pointing that
the core spec needs to be clarified.
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
| |
Reviewed-by: Timothy Arceri <[email protected]>
|