aboutsummaryrefslogtreecommitdiffstats
path: root/src/compiler/nir/nir_intrinsics.py
Commit message (Collapse)AuthorAgeFilesLines
* nir: fix indices for ir3 ssbo_atomic intrinsicsRob Clark2020-05-131-10/+10
| | | | | | | | | Caught by the sanity checking in nir_intrinsic_copy_const_indices() (which is introduced by the next patch). Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* freedreno/ir3: Drop wrmask for ir3 local and global store intrinsicsKristian H. Kristensen2020-05-131-2/+2
| | | | | | | These intrinsics are supposed to map to the underlying hardware instructions, which don't have wrmask. We use them when we lower store_output in the geometry pipeline and since store_output gets lowered to temps, we always see full wrmasks there.
* intel/fs: Add and use a new load_simd_width_intel intrinsicCaio Marcelo de Oliveira Filho2020-05-011-0/+3
| | | | | | | | | | | | | | | | | | | | | Intrinsic to get the SIMD width, which not always the same as subgroup size. Starting with a small scope (Intel), but we can rename it later to generalize if this turns out useful for other drivers. Change brw_nir_lower_cs_intrinsics() to use this intrinsic instead of a width will be passed as argument. The pass also used to optimized load_subgroup_id for the case that the workgroup fitted into a single thread (it will be constant zero). This optimization moved together with lowering of the SIMD. This is a preparation for letting the drivers call it before the brw_compile_cs() step. No shader-db changes in BDW, SKL, ICL and TGL. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4794>
* nir: Add r600 specific intrinsics for tesselation shader IOGert Wollny2020-04-231-0/+12
| | | | | | Signed-off-by: Gert Wollny <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4610>
* nir: Add an alignment to nir_intrinsic_load_constantJason Ekstrand2020-04-161-1/+2
| | | | | | | | | | | | | | | | | In f1883cc73d4 we tried to pass through alignments from load_constant intrinsics when rewriting them to load_ubo in iris. However, those intrinsics don't have ALIGN_MUL or ALIGN_OFFSET indices. It's easy enough to add them. We just call the size/align function on the vector type at the end of our deref chain and use the alignment returned from there. It's possible we could do better by walking the whole deref chain but this should be good enough. Fixes: f1883cc73d4 "iris: Set alignments on cbuf0 and constant reads" Closes: #2739 Reviewed-by: Eric Anholt <[email protected]> Tested-by: Ian Romanick <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4468>
* ir3: Fix LDC offset unitsConnor Abbott2020-04-151-0/+6
| | | | | | | | | | | | I had missed that LDC actually uses vec4 units for its offset. This means that we have to create a new instruction, and lower it in ir3_nir_lower_io_offsets, similar to the existing SSBO instructions. Unfortunately we can't assume that loads are always vec4-aligned, so we have to use the alignment information that NIR gives us. Unfortunately, it's currently woefully inadequate, and will have to be fixed to give us good codegen in the future. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4568>
* ir3: Plumb through bindless supportConnor Abbott2020-04-091-0/+6
| | | | Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4358>
* nir: Add SSBO->global lowering passAlyssa Rosenzweig2020-02-211-0/+2
| | | | | | | | | | | | | To facilitate lowering SSBOs to globals, we need a load_ssbo_address intrinsic. This intrinsic takes an SSBO index and loads the address in global memory of the SSBO (likely implemented via a uniform in the driver). In the future, we'll support bounds checking, but at the moment this is not supported (this pass should only be used for trusted contexts at the moment, i.e. contexts without robustness extensions). Signed-off-by: Alyssa Rosenzweig <[email protected]> Reviewed-by: Karol Herbst <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2753>
* r600/sfn: Add lowering UBO access to r600 specific codesGert Wollny2020-02-101-0/+8
| | | | | | | | | | r600 reads vec4 from the UBO, but the offsets in nir are evaluated to the component. If the offsets are not literal then all non-vec4 reads must resolve the component after reading a vec4 component (TODO: figure out whether there is a consistent way to deduct the component that is actually read). Signed-off-by: Gert Wollny <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3225>
* pan/midgard: Turn Z/S stores into zs_output_pan intrinsicsBoris Brezillon2020-02-051-0/+1
| | | | | | | | | | | | | | Midgard can't write depth and stencil separately. It has to happen in a single store operation containing both. Let's add a panfrost specific intrinsic and turn all depth/stencil stores into a packed depth+stencil one. Note that this intrinsic is not yet handled in emit_intrinsic(), but we'll address that later. Signed-off-by: Boris Brezillon <[email protected]> Reviewed-by: Alyssa Rosenzweig <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3697>
* nir: lower interp_deref_at_vertex to load_input_vertexSamuel Pitoiset2020-01-291-0/+2
| | | | | | | | | This introduces a new NIR intrinsic for loading inputs at a specific vertex index. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3578>
* nir: add nir_intrinsic_interp_deref_at_vertexSamuel Pitoiset2020-01-291-3/+5
| | | | | | | | | | | From the SPV_AMD_shader_explicit_vertex_parameter extension: "Returns the value of the input <interpolant> without any interpolation, i.e. the raw output value of previous shader stage." Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3578>
* nir: add nir_intrinsic_load_barycentric_modelSamuel Pitoiset2020-01-291-10/+12
| | | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3578>
* nir: Rename nir_intrinsic_barrier to control_barrierJason Ekstrand2020-01-131-1/+6
| | | | | | | | This is a more explicit name now that we don't want it to be doing any memory barrier stuff for us. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
* nir: Add a new memory_barrier_tcs_patch intrinsicJason Ekstrand2020-01-131-0/+3
| | | | | | | | | | | Right now, it's implemented as a no-op for everyone. For most drivers, it's a switch case in the NIR -> whatever which just breaks. For ir3, they already have code to delete tessellation barriers so we just add a case to also delete memory_barrier_tcs_patch. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
* spirv,nir: add new lod parameter to image_{load,store} intrinsicsSamuel Pitoiset2020-01-091-2/+2
| | | | | | | | | | | | SPV_AMD_shader_image_load_store_lod allows to use a lod parameter with OpImageRead, OpImageWrite and OpImageSparseRead. According to the specification, this parameter should be a 32-bit integer. It is initialized to 0 when no lod parameter is found during SPIR-V->NIR translation. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* v3d: handle writes to gl_Layer from geometry shadersIago Toral Quiroga2019-12-161-0/+4
| | | | | | | | | | | | | | | | When geometry shaders write a value to gl_Layer that doesn't correspond to an existing layer in the target framebuffer the rendering behavior is undefined according to the spec, however, there are CTS tests that trigger this scenario on purpose, probably to ensure that nothing terrible happens. For V3D, this situation is problematic because the binner uses the layer index to select the offset to write into the tile state data, and we only allocate tile state for MAX2(num_layers, 1), so we want to make sure we don't produce values that would lead to out of bounds writes. The simulator has an assert to catch this, although we haven't observed issues in actual hardware it is probably best to play safe. Reviewed-by: Alejandro Piñeiro <[email protected]>
* nir: Add load_sampler_lod_paramaters_pan intrinsicAlyssa Rosenzweig2019-11-221-0/+4
| | | | | | | | | This loads in the <min_lod, max_lod, lod_bias> settings for a given sampler, which is necessary for lowering clamps/biases on certain Midgard chips. Signed-off-by: Alyssa Rosenzweig <[email protected]> Reviewed-by: Tomeu Vizoso <[email protected]>
* nir: Add load_output_u8_as_fp16_pan intrinsicAlyssa Rosenzweig2019-11-111-0/+6
| | | | | | | | | This is a single opcode, at least on newer Midgard chips. It's easier to have this represented in NIR rather than trying to optimize out the conversions, so let's add the intrinsic. Signed-off-by: Alyssa Rosenzweig <[email protected]> Reviewed-by: Tomeu Vizoso <[email protected]>
* freedreno/ir3: Implement TCS synchronization intrinsicsKristian H. Kristensen2019-11-071-0/+8
| | | | | | | | | We add two new IR3 specific nir intrinsics that map to the new condend and endpatch instructions. Signed-off-by: Kristian H. Kristensen <[email protected]> Acked-by: Eric Anholt <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* freedreno/ir3: Add ir3 intrinsics for tessellationKristian H. Kristensen2019-11-071-0/+6
| | | | | | | | | These provide the iovas for system memory buffers used for tessellation as well as a new HW specific system value. Signed-off-by: Kristian H. Kristensen <[email protected]> Acked-by: Eric Anholt <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* freedreno/ir3: Add load and store intrinsics for global ioKristian H. Kristensen2019-11-071-0/+11
| | | | | | | | | These intrinsics take a ivec2 for the 64 bit base address and a integer offset. Signed-off-by: Kristian H. Kristensen <[email protected]> Acked-by: Eric Anholt <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* nir: Add scoped_memory_barrier intrinsicCaio Marcelo de Oliveira Filho2019-10-241-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a NIR instrinsic that represent a memory barrier in SPIR-V / Vulkan Memory Model, with extra attributes that describe the barrier: - Ordering: whether is an Acquire or Release; - "Cache control": availability ("ensure this gets written in the memory") and visibility ("ensure my cache is up to date when I'm reading"); - Variable modes: which memory types this barrier applies to; - Scope: how far this barrier applies. Note that unlike in SPIR-V, the "Storage Semantics" and the "Memory Semantics" are split into two different attributes so we can use variable modes for the former. NIR passes that took barriers in consideration were also changed - nir_opt_copy_prop_vars: clean up the values for the mode of an ACQUIRE barrier. Copy propagation effect is to "pull up a load" (by not performing it), which is what ACQUIRE restricts. - nir_opt_dead_write_vars and nir_opt_combine_writes: clean up the pending writes for the modes of an RELEASE barrier. Dead writes effect is to "push down a store", which is what RELEASE restricts. - nir_opt_access: treat the ACQUIRE and RELEASE as a full barrier for the modes. This is conservative, but since this is a GL-specific pass, doesn't make a difference for now. v2: Fix the scoped barrier handling in copy propagation. (Jason) Add scoped barrier handling to nir_opt_access and nir_opt_combine_writes. (Rhys) Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* Revert "nir: drop unused alpha_ref_float"Erik Faye-Lund2019-10-231-0/+1
| | | | | | | This reverts commit e8095f2af0736b5937674ca319f29cc9dabb17d4. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Jose Maria Casanova <[email protected]>
* freedreno/ir3: Implement lowering passes for VS and GSKristian H. Kristensen2019-10-171-0/+8
| | | | | | | | This introduces two new lowering passes. One to lower VS to explicit outputs using STLW and one to lower GS to load input using LDLW and implement the GS specific functionality. Signed-off-by: Kristian H. Kristensen <[email protected]>
* freedreno/ir3: Add intrinsics that map to LDLW/STLWKristian H. Kristensen2019-10-171-0/+8
| | | | | | | These intrinsics will let us do all the offset calculations in nir, which is nicer to work with and lets nir_opt_algebraic eat it all up. Signed-off-by: Kristian H. Kristensen <[email protected]>
* nir: drop unused alpha_ref_floatErik Faye-Lund2019-10-171-1/+0
| | | | Reviewed-by: Marek Olšák <[email protected]>
* nir: Add explicit signs to image min/max intrinsicsJason Ekstrand2019-08-211-2/+4
| | | | | | | | | | | This better matches all the other atomic intrinsics such as those for SSBOs and shared variables where the sign is part of the intrinsic opcode. Both generators (GLSL and SPIR-V) know the sign from the type of the image variable or handle. In SPIR-V, signed min/max are separate opcodes from unsigned. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* compiler: add SYSTEM_VALUE_TESS_LEVEL_OUTER/INNER_DEFAULTMarek Olšák2019-08-121-0/+2
| | | | | | TCS system values for internal passthru TCS, needed by radeonsi NIR support Reviewed-by: Connor Abbott <[email protected]>
* compiler: add SYSTEM_VALUE_USER_DATA_AMDMarek Olšák2019-08-121-0/+3
| | | | for internal radeonsi shaders
* nir: add atomic_inc_wrap/atomic_dec_wrap image intrinsicsPierre-Eric Pelloux-Prayer2019-08-061-0/+2
| | | | Reviewed-by: Marek Olšák <[email protected]>
* nir,intel: lower if (cond) demote() to new intrinsic demote_if(cond)Daniel Schürmann2019-07-241-1/+2
| | | | | | | This will effectively enable the optimization in anv. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Add gl_PointCoord system valueAndreas Baierl2019-07-181-0/+1
| | | | | | | | | | gl_PointCoord handling needs some special bits set in lima/ppir code generation. Treating gl_PointCoord as a system value makes it easier to distinguish from a regular varying. Signed-off-by: Andreas Baierl <[email protected]> Reviewed-by: Qiang Yu <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* nir: add a V3D-specific intrinsic for per-sample color writesIago Toral Quiroga2019-07-181-0/+9
| | | | | | | | | | | For per-sample color writes we need the output intrinsic to pack the sample index, which is not provided with regular store_output intrinsics unless we figured out a way to encode it into the base or the offset. v2: - Drop the writemask (Eric) Reviewed-by: Eric Anholt <[email protected]>
* nir: add a new v3d-specific intrinsic for tile buffer color readsIago Toral Quiroga2019-07-121-0/+9
| | | | | | | | | | This is intended to be used, for example, with OpenGL logic operations. It takes a render target as source and a sample index in the base index for MSAA color reads. v2: drop the CAN_ELIMINATE and CAN_REORDER flags (Eric). Reviewed-by: Eric Anholt <[email protected]>
* nir: Add Panfrost-specific blending intrinsicAlyssa Rosenzweig2019-07-091-0/+16
| | | | | | | | | | This gives more flexibility than the normal store_deref/store_output versions (particularly, it allows us to abuse the type system in awful ways, which is necessary for efficient format conversion in blend shaders.) Signed-off-by: Alyssa Rosenzweig <[email protected]> Acked-by: Karol Herbst <[email protected]>
* nir: Add demote and is_helper_invocation intrinsicsCaio Marcelo de Oliveira Filho2019-07-081-0/+10
| | | | | | | | | From SPV_EXT_demote_to_helper_invocation. Demote will be implemented as a variant of discard, so mark uses_discard if it is used. v2: Add CAN_ELIMINATE flag to the new intrinsic. (Jason) Reviewed-by: Jason Ekstrand <[email protected]>
* compiler: Add color system valueConnor Abbott2019-07-081-0/+6
| | | | | | | | This is nice to have with radeonsi, where color varyings are handled specially to avoid recompiles. Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* nir: add pass to lower load_interpolated_inputRob Clark2019-07-021-0/+13
| | | | | | Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* nir: Allow qualifiers on copy_deref and image instructionsConnor Abbott2019-06-191-2/+5
| | | | | | | | In the next commit, we'll properly handle access qualifiers on struct members by propagating them to load/store instructions, but these instructions had no way to specify the qualifier. Reviewed-by: Timothy Arceri <[email protected]>
* nir: add intrinsics for AMD_shader_ballotDaniel Schürmann2019-06-131-0/+10
| | | | Reviewed-by: Connor Abbott <[email protected]>
* nir: add type information to load uniform/input and store output intrinsicsJonathan Marek2019-05-311-3/+5
| | | | | | | This type information will be used by gather_ssa_types to get usable results Signed-off-by: Jonathan Marek <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Add blend_const_color_rgba sysvalAlyssa Rosenzweig2019-05-101-1/+4
| | | | | | | | | | | | | This represents a float vec4 constant color, as passed to glBlendColor. While the existing 4 shader sysvals are retained to minimize code churn, a single vectorized intrinsic is required for efficient blending on vector architectures. (This may also apply to archictectures like Bifrost where ALU is scalar but load/store is vector; it largely depends on how blending is implemented per-driver.) Signed-off-by: Alyssa Rosenzweig <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* freedreno/ir3: lower load_barycentric_at_offsetRob Clark2019-04-251-0/+3
| | | | | | | | | Calculates i,j at specified offset within a pixel. A new load_size_ir3 intrinsic is used in conjunction with fddx/fddy to translate the offset into primitive space and adjust the i,j from load_barycentric_pixel accordingly. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: lower load_barycentric_at_sampleRob Clark2019-04-251-0/+7
| | | | | | | This lowers load_barycentric_at_sample to load_sample_pos_from_id plus load_barycentric_at_offset. Signed-off-by: Rob Clark <[email protected]>
* nir: Add a pass for selectively lowering variables to scratch spaceJason Ekstrand2019-04-121-1/+4
| | | | | | | | | | This commit adds new nir_load/store_scratch opcodes which read and write a virtual scratch space. It's up to the back-end to figure out what to do with it and where to put the actual scratch data. v2: Drop const_index comments (by anholt) Reviewed-by: Eric Anholt <[email protected]>
* nir: Add a comment about how intrinsic definitions work.Eric Anholt2019-04-121-0/+11
| | | | | | I was thinking about a refactor, and needed to read this first. Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Drop remaining references to const_index in favor of the call to use.Eric Anholt2019-04-121-5/+5
| | | | | | Please don't make me read a const_index[] expression ever again. Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Drop comments about the constant_index slots for load/stores.Eric Anholt2019-04-121-21/+15
| | | | | | | | | | The constant_index slots are named right there in the intrinsic definition, and the comment is just a chance to get out of sync. Noticed while reviewing the lower_to_scratch changes that copy-and-pasted wrong comments, and load_ubo and load_per_vertex_output had incorrect comments currently. Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Add access qualifiers on load_ubo intrinsic.Bas Nieuwenhuizen2019-04-101-1/+1
| | | | | | | | | Otherwise nir_lower_non_uniform_access crashes when it tries to get the access of a load_ubo. Fixes: 8ed583fe523 "spirv: Handle the NonUniformEXT decoration" Fixes: e50ab2c0f23 "nir: Add access flags to deref and SSBO atomics" Reviewed-by: Samuel Pitoiset <[email protected]>