| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
| |
Caught by the sanity checking in nir_intrinsic_copy_const_indices()
(which is introduced by the next patch).
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
These intrinsics are supposed to map to the underlying hardware
instructions, which don't have wrmask. We use them when we lower
store_output in the geometry pipeline and since store_output gets
lowered to temps, we always see full wrmasks there.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Intrinsic to get the SIMD width, which not always the same as subgroup
size. Starting with a small scope (Intel), but we can rename it later
to generalize if this turns out useful for other drivers.
Change brw_nir_lower_cs_intrinsics() to use this intrinsic instead of
a width will be passed as argument. The pass also used to optimized
load_subgroup_id for the case that the workgroup fitted into a single
thread (it will be constant zero). This optimization moved together
with lowering of the SIMD.
This is a preparation for letting the drivers call it before the
brw_compile_cs() step.
No shader-db changes in BDW, SKL, ICL and TGL.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4794>
|
|
|
|
|
|
| |
Signed-off-by: Gert Wollny <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4610>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In f1883cc73d4 we tried to pass through alignments from load_constant
intrinsics when rewriting them to load_ubo in iris. However, those
intrinsics don't have ALIGN_MUL or ALIGN_OFFSET indices. It's easy
enough to add them. We just call the size/align function on the vector
type at the end of our deref chain and use the alignment returned from
there. It's possible we could do better by walking the whole deref
chain but this should be good enough.
Fixes: f1883cc73d4 "iris: Set alignments on cbuf0 and constant reads"
Closes: #2739
Reviewed-by: Eric Anholt <[email protected]>
Tested-by: Ian Romanick <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4468>
|
|
|
|
|
|
|
|
|
|
|
|
| |
I had missed that LDC actually uses vec4 units for its offset. This
means that we have to create a new instruction, and lower it in
ir3_nir_lower_io_offsets, similar to the existing SSBO instructions.
Unfortunately we can't assume that loads are always vec4-aligned, so we
have to use the alignment information that NIR gives us. Unfortunately,
it's currently woefully inadequate, and will have to be fixed to give us
good codegen in the future.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4568>
|
|
|
|
| |
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4358>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To facilitate lowering SSBOs to globals, we need a load_ssbo_address
intrinsic. This intrinsic takes an SSBO index and loads the address in
global memory of the SSBO (likely implemented via a uniform in the
driver). In the future, we'll support bounds checking, but at the moment
this is not supported (this pass should only be used for trusted
contexts at the moment, i.e. contexts without robustness extensions).
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Karol Herbst <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2753>
|
|
|
|
|
|
|
|
|
|
| |
r600 reads vec4 from the UBO, but the offsets in nir are evaluated to the component.
If the offsets are not literal then all non-vec4 reads must resolve the component
after reading a vec4 component (TODO: figure out whether there is a consistent way
to deduct the component that is actually read).
Signed-off-by: Gert Wollny <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3225>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Midgard can't write depth and stencil separately. It has to happen in
a single store operation containing both. Let's add a panfrost specific
intrinsic and turn all depth/stencil stores into a packed depth+stencil
one.
Note that this intrinsic is not yet handled in emit_intrinsic(), but
we'll address that later.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3697>
|
|
|
|
|
|
|
|
|
| |
This introduces a new NIR intrinsic for loading inputs at a specific
vertex index.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3578>
|
|
|
|
|
|
|
|
|
|
|
| |
From the SPV_AMD_shader_explicit_vertex_parameter extension:
"Returns the value of the input <interpolant> without any
interpolation, i.e. the raw output value of previous shader
stage."
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3578>
|
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3578>
|
|
|
|
|
|
|
|
| |
This is a more explicit name now that we don't want it to be doing any
memory barrier stuff for us.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
|
|
|
|
|
|
|
|
|
|
|
| |
Right now, it's implemented as a no-op for everyone. For most drivers,
it's a switch case in the NIR -> whatever which just breaks. For ir3,
they already have code to delete tessellation barriers so we just add a
case to also delete memory_barrier_tcs_patch.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
|
|
|
|
|
|
|
|
|
|
|
|
| |
SPV_AMD_shader_image_load_store_lod allows to use a lod parameter
with OpImageRead, OpImageWrite and OpImageSparseRead.
According to the specification, this parameter should be a 32-bit
integer. It is initialized to 0 when no lod parameter is found
during SPIR-V->NIR translation.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When geometry shaders write a value to gl_Layer that doesn't correspond to
an existing layer in the target framebuffer the rendering behavior is
undefined according to the spec, however, there are CTS tests that trigger
this scenario on purpose, probably to ensure that nothing terrible happens.
For V3D, this situation is problematic because the binner uses the layer
index to select the offset to write into the tile state data, and we only
allocate tile state for MAX2(num_layers, 1), so we want to make sure we
don't produce values that would lead to out of bounds writes. The simulator
has an assert to catch this, although we haven't observed issues in actual
hardware it is probably best to play safe.
Reviewed-by: Alejandro Piñeiro <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This loads in the <min_lod, max_lod, lod_bias> settings for a given
sampler, which is necessary for lowering clamps/biases on certain
Midgard chips.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Tomeu Vizoso <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This is a single opcode, at least on newer Midgard chips. It's easier to
have this represented in NIR rather than trying to optimize out the
conversions, so let's add the intrinsic.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Tomeu Vizoso <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We add two new IR3 specific nir intrinsics that map to the new condend
and endpatch instructions.
Signed-off-by: Kristian H. Kristensen <[email protected]>
Acked-by: Eric Anholt <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
These provide the iovas for system memory buffers used for
tessellation as well as a new HW specific system value.
Signed-off-by: Kristian H. Kristensen <[email protected]>
Acked-by: Eric Anholt <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
These intrinsics take a ivec2 for the 64 bit base address and a
integer offset.
Signed-off-by: Kristian H. Kristensen <[email protected]>
Acked-by: Eric Anholt <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a NIR instrinsic that represent a memory barrier in SPIR-V /
Vulkan Memory Model, with extra attributes that describe the barrier:
- Ordering: whether is an Acquire or Release;
- "Cache control": availability ("ensure this gets written in the memory")
and visibility ("ensure my cache is up to date when I'm reading");
- Variable modes: which memory types this barrier applies to;
- Scope: how far this barrier applies.
Note that unlike in SPIR-V, the "Storage Semantics" and the "Memory
Semantics" are split into two different attributes so we can use
variable modes for the former.
NIR passes that took barriers in consideration were also changed
- nir_opt_copy_prop_vars: clean up the values for the mode of an
ACQUIRE barrier. Copy propagation effect is to "pull up a load" (by
not performing it), which is what ACQUIRE restricts.
- nir_opt_dead_write_vars and nir_opt_combine_writes: clean up the
pending writes for the modes of an RELEASE barrier. Dead writes
effect is to "push down a store", which is what RELEASE restricts.
- nir_opt_access: treat the ACQUIRE and RELEASE as a full barrier for
the modes. This is conservative, but since this is a GL-specific
pass, doesn't make a difference for now.
v2: Fix the scoped barrier handling in copy propagation. (Jason)
Add scoped barrier handling to nir_opt_access and
nir_opt_combine_writes. (Rhys)
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
| |
This reverts commit e8095f2af0736b5937674ca319f29cc9dabb17d4.
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Jose Maria Casanova <[email protected]>
|
|
|
|
|
|
|
|
| |
This introduces two new lowering passes. One to lower VS to explicit
outputs using STLW and one to lower GS to load input using LDLW and
implement the GS specific functionality.
Signed-off-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
| |
These intrinsics will let us do all the offset calculations in nir,
which is nicer to work with and lets nir_opt_algebraic eat it all up.
Signed-off-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This better matches all the other atomic intrinsics such as those for
SSBOs and shared variables where the sign is part of the intrinsic
opcode. Both generators (GLSL and SPIR-V) know the sign from the type
of the image variable or handle. In SPIR-V, signed min/max are separate
opcodes from unsigned.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
| |
TCS system values for internal passthru TCS, needed by radeonsi NIR support
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
| |
for internal radeonsi shaders
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
This will effectively enable the optimization in anv.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
gl_PointCoord handling needs some special bits set in lima/ppir code
generation. Treating gl_PointCoord as a system value makes it easier
to distinguish from a regular varying.
Signed-off-by: Andreas Baierl <[email protected]>
Reviewed-by: Qiang Yu <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
For per-sample color writes we need the output intrinsic to pack the
sample index, which is not provided with regular store_output intrinsics
unless we figured out a way to encode it into the base or the offset.
v2:
- Drop the writemask (Eric)
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This is intended to be used, for example, with OpenGL logic operations. It
takes a render target as source and a sample index in the base index for
MSAA color reads.
v2: drop the CAN_ELIMINATE and CAN_REORDER flags (Eric).
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This gives more flexibility than the normal store_deref/store_output
versions (particularly, it allows us to abuse the type system in awful
ways, which is necessary for efficient format conversion in blend
shaders.)
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Acked-by: Karol Herbst <[email protected]>
|
|
|
|
|
|
|
|
|
| |
From SPV_EXT_demote_to_helper_invocation. Demote will be implemented
as a variant of discard, so mark uses_discard if it is used.
v2: Add CAN_ELIMINATE flag to the new intrinsic. (Jason)
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
This is nice to have with radeonsi, where color varyings are handled
specially to avoid recompiles.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
In the next commit, we'll properly handle access qualifiers on struct
members by propagating them to load/store instructions, but these
instructions had no way to specify the qualifier.
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
| |
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
| |
This type information will be used by gather_ssa_types to get usable results
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This represents a float vec4 constant color, as passed to glBlendColor.
While the existing 4 shader sysvals are retained to minimize code churn,
a single vectorized intrinsic is required for efficient blending on
vector architectures. (This may also apply to archictectures like
Bifrost where ALU is scalar but load/store is vector; it largely depends
on how blending is implemented per-driver.)
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Calculates i,j at specified offset within a pixel. A new load_size_ir3
intrinsic is used in conjunction with fddx/fddy to translate the offset
into primitive space and adjust the i,j from load_barycentric_pixel
accordingly.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
This lowers load_barycentric_at_sample to load_sample_pos_from_id plus
load_barycentric_at_offset.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This commit adds new nir_load/store_scratch opcodes which read and write
a virtual scratch space. It's up to the back-end to figure out what to
do with it and where to put the actual scratch data.
v2: Drop const_index comments (by anholt)
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
| |
I was thinking about a refactor, and needed to read this first.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
| |
Please don't make me read a const_index[] expression ever again.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The constant_index slots are named right there in the intrinsic
definition, and the comment is just a chance to get out of sync. Noticed
while reviewing the lower_to_scratch changes that copy-and-pasted wrong
comments, and load_ubo and load_per_vertex_output had incorrect comments
currently.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Otherwise nir_lower_non_uniform_access crashes when it tries
to get the access of a load_ubo.
Fixes: 8ed583fe523 "spirv: Handle the NonUniformEXT decoration"
Fixes: e50ab2c0f23 "nir: Add access flags to deref and SSBO atomics"
Reviewed-by: Samuel Pitoiset <[email protected]>
|