path: root/src/compiler/nir
Commit log (each entry: commit message, author, date, files changed, lines removed/added)
* nir/lower_io_to_temporaries: Handle interpolation intrinsics (Connor Abbott, 2019-07-08, 1 file, -0/+166)
  These weren't properly supported. This does pretty much the same thing that the radv code did.
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>

* nir: Avoid coalescing vars created by lower_io_to_temporaries (Connor Abbott, 2019-07-08, 3 files, -0/+20)
  Right now nir_copy_prop_vars is effectively undoing nir_lower_io_to_temporaries for inputs by
  propagating the original variable through the copy created in lower_io_to_temporaries. A
  theoretical variable coalescing pass would have the same issue with output variables, although
  that doesn't exist yet. To fix this, add a new bit to nir_variable, and disable copy propagation
  when it's set. This doesn't seem to affect any drivers now, probably since no one uses
  lower_io_to_temporaries for inputs as well as copy_prop_vars, but it will fix radv once we flip
  on lower_io_to_temporaries for fs inputs.
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>

* nir: Return correct size in nir_assign_io_var_locations() (Connor Abbott, 2019-07-08, 1 file, -2/+4)
  It was double-counting cases where multiple variables were assigned to the same slot, and not
  handling the case where the last variable is a compact variable.
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>

* nir: Handle compact variables when assigning i/o locations (Connor Abbott, 2019-07-08, 1 file, -2/+22)
  These are used in Vulkan for clip/cull distances, instead of the GLSL lowering when the
  clip/cull arrays are shared.
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>

* nir: Move st_nir_assign_var_locations() to common code (Connor Abbott, 2019-07-08, 2 files, -0/+114)
  It isn't really doing anything Gallium-specific, and it's needed for handling component packing,
  overlapping, etc.
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* radv: Make FragCoord a sysval (Connor Abbott, 2019-07-08, 2 files, -5/+9)
  load_fragcoord is already handled in common code for radeonsi, so we don't need to do anything
  to handle it. However, there were some passes creating NIR with the varying, so we switch them
  over to the sysval. In the case of nir_lower_input_attachments, which is used by both radv and
  anv, we add handling for both until intel switches to using a sysval.
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>

* anv,nir: Move lower_input_attachments pass from ANV to NIR. (Daniel Schürmann, 2019-07-08, 3 files, -0/+153)
  Reviewed-by: Connor Abbott <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>

* nir: add pass to lower load_interpolated_input (Rob Clark, 2019-07-02, 5 files, -0/+192)
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* nir: Add optimization to use ROR/ROL instructions (Sagar Ghuge, 2019-07-01, 2 files, -0/+15)
  v2:
  1) Add more optimization rules for ROL/ROR (Matt Turner)
  2) Add lowering rules for ROL/ROR (Matt Turner)
  Signed-off-by: Sagar Ghuge <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
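  For context, a minimal sketch of what rotate-recognition rules of this kind look like in
  nir_opt_algebraic.py's tuple syntax; the patterns and bit-size annotations below are
  illustrative assumptions, not the exact rules added by this commit:

      # Illustrative nir_opt_algebraic.py-style search/replace tuples for
      # recognizing 32-bit rotates; not the exact in-tree rules.
      a, b = 'a', 'b'

      rotate_rules = [
          # (a << b) | (a >> (32 - b))  ->  urol(a, b)
          (('ior', ('ishl', 'a@32', b), ('ushr', 'a@32', ('isub', 32, b))), ('urol', a, b)),
          # (a >> b) | (a << (32 - b))  ->  uror(a, b)
          (('ior', ('ushr', 'a@32', b), ('ishl', 'a@32', ('isub', 32, b))), ('uror', a, b)),
      ]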
* nir: Add urol and uror opcodes (Sagar Ghuge, 2019-07-01, 1 file, -0/+11)
  Signed-off-by: Sagar Ghuge <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
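  For reference, the usual semantics of unsigned rotate-left/rotate-right at 32 bits, with the
  rotate count masked to the bit size (a sketch of the expected behavior, not the in-tree constant
  expressions; the function names are made up for illustration):

      def urol32(a, b):
          """Rotate the 32-bit value a left by b bits (count taken mod 32)."""
          b &= 31
          return ((a << b) | (a >> ((32 - b) & 31))) & 0xffffffff

      def uror32(a, b):
          """Rotate the 32-bit value a right by b bits (count taken mod 32)."""
          b &= 31
          return ((a >> b) | (a << ((32 - b) & 31))) & 0xffffffff

      assert urol32(0x80000001, 1) == 0x00000003
      assert uror32(0x80000001, 1) == 0xC0000000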
* nir: add is_in_ubo/ssbo/block helpers (Alejandro Piñeiro, 2019-06-30, 1 file, -0/+20)
  Equivalent to the already existing ir_variable is_in_buffer_block and is_in_shader_storage_block,
  adding the uniform buffer object one. I'm using the short forms (ssbo, ubo) to avoid having
  method names too long.
  Reviewed-by: Timothy Arceri <[email protected]>

* nir/search: Increase maximum commutative expressions from 4 to 8 (Ian Romanick, 2019-06-28, 2 files, -2/+4)
  No shader-db change on any Intel platform. No shader-db run-time difference on a certain
  36-core / 72-thread system at 95% confidence (n=20).
  Reviewed-by: Connor Abbott <[email protected]>
* nir/algebraic: Don't mark expression with duplicate sources as commutative (Ian Romanick, 2019-06-28, 1 file, -1/+56)
  There is no reason to mark the fmul in the expression
      ('fmul', ('fadd', a, b), ('fadd', a, b))
  as commutative. If a source of an instruction doesn't match one of the ('fadd', a, b) patterns,
  it won't match the other either.

  This change is enough to make this pattern work:
      ('~fadd@32', ('fmul', ('fadd', 1.0, ('fneg', a)), ('fadd', 1.0, ('fneg', a))),
                   ('fmul', ('flrp', a, 1.0, a), b))
  This pattern has 5 commutative expressions (versus a limit of 4), but the first fmul does not
  need to be commutative.

  No shader-db change on any Intel platform. No shader-db run-time difference on a certain
  36-core / 72-thread system at 95% confidence (n=20).

  There are more subpatterns that could be marked as non-commutative, but detecting these is more
  challenging. For example, this fadd:
      ('fadd', ('fmul', a, b), ('fmul', a, c))
  The first fadd in:
      ('fmul', ('fadd', a, b), ('fadd', a, b))
  And this fadd:
      ('flt', ('fadd', a, b), 0.0)
  This last case may be easier to detect. If all sources are variables and they are the only
  instances of those variables, then the pattern can be marked as non-commutative. It's probably
  not worth the effort now, but if we end up with some patterns that bump up on the limit again,
  it may be worth revisiting.

  v2: Update the comment about the explicit "len(self.sources)" check to be more clear about why
  it is necessary. Requested by Connor. Many Python style / idiom fixes suggested by Dylan. Add
  missing (!!!) opcode check in the Expression::__eq__ method. This bug is the reason the expected
  number of commutative expressions in the bitfield_reverse pattern changed from 61 to 45 in the
  first version of this patch.

  v3: Use all() in the Expression::__eq__ method. Suggested by Connor. Revert away from using
  __eq__ overloads. The "equality" implementation of Constant and Variable needed for
  commutativity pruning is weaker than the one needed for propagating and validating bit sizes.
  Using actual equality caused the pruning to fail for my ('fmul', ('fadd', 1, a), ('fadd', 1, a))
  case. I changed the name to "equivalent" rather than the previous "same_as" to further
  differentiate it from __eq__.

  Reviewed-by: Connor Abbott <[email protected]>
  Reviewed-by: Dylan Baker <[email protected]>
* nir/search: Log Boolean constants instead of asserting (Ian Romanick, 2019-06-28, 1 file, -0/+3)
  Reviewed-by: Connor Abbott <[email protected]>

* nir/algebraic: Fail build when too many commutative expressions are used (Ian Romanick, 2019-06-28, 3 files, -1/+51)
  Search patterns that are expected to have too many (e.g., the giant bitfield_reverse pattern)
  can be added to a white list. This would have saved me a few hours debugging. :(

  v2: Implement the expected-failure annotation as a property of the search-replace pattern
  instead of as a property of the whole list of patterns. Suggested by Connor.

  Reviewed-by: Connor Abbott <[email protected]>
  Reviewed-by: Dylan Baker <[email protected]>

* nir/algebraic: Fix whitespace error (Ian Romanick, 2019-06-28, 1 file, -1/+0)
  Trivial.
  Reviewed-by: Connor Abbott <[email protected]>
  Reviewed-by: Dylan Baker <[email protected]>
* nir: Fix lowering of bitfield_insert to shifts. (Eric Anholt, 2019-06-28, 1 file, -3/+5)
  The bfi/bfm behavior change replaced the bfi/bfm usage in lower_bitfield_insert_to_shifts with
  actual shifts like the name says, but it failed to handle the offset=0, bits==32 case in the new
  lowering.

  v2: Use 31 < bits instead of bits == 32, to get the 31 < (iand bits, 31) -> false optimization.

  Fixes regressions in dEQP-GLES31.*bitfield_insert* on freedreno.
  Fixes: 165b7f3a4487 ("nir: define behavior of nir_op_bfm and nir_op_u/ibfe according to SM5 spec.")
  Reviewed-by: Daniel Schürmann <[email protected]>
* nir/algebraic: Add helpers and a rule involving wrapping (Caio Marcelo de Oliveira Filho, 2019-06-26, 2 files, -0/+15)
  The helpers are needed so we can use the syntax `instr(cond)` in the algebraic rules. Add simple
  rule for dropping a pair of mul-div of the same value when wrapping is guaranteed to not happen.
  Reviewed-by: Jason Ekstrand <[email protected]>
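  A minimal sketch of how such a rule might read with the `instr(cond)` syntax; the condition
  names (no_signed_wrap / no_unsigned_wrap) and the exact pattern are assumptions for
  illustration, not necessarily what this commit adds:

      # Illustrative rule: (a * b) / b -> a, valid only when the multiply is
      # known not to wrap. Condition names are hypothetical.
      a, b = 'a', 'b'

      no_wrap_rules = [
          (('idiv', ('imul(no_signed_wrap)', a, b), b), a),
          (('udiv', ('imul(no_unsigned_wrap)', a, b), b), a),
      ]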
* nir: Add no-wrapping bits to nir_alu_instr (Caio Marcelo de Oliveira Filho, 2019-06-26, 5 files, -8/+36)
  They indicate the operation does not cause overflow or underflow. This is motivated by the
  SPIR-V decorations NoSignedWrap and NoUnsignedWrap. Change the storage of `exact` to be a single
  bit, so they pack together.

  v2: Handle no_wrap in nir_instr_set. (Karol)
  v3: Use two separate flags, since the NIR SSA values and certain instructions are typeless, so
      just no_wrap would be insufficient to know which one was referred to. (Connor)
  v4: Don't use nir_instr_set to propagate the flags; unlike `exact`, consider the instructions
      different if the flags have different values. Fix hashing/comparing. (Jason)

  Reviewed-by: Karol Herbst <[email protected]> [v1]
  Reviewed-by: Jason Ekstrand <[email protected]>
* nir: remove fnot/fxor/fand/for opcodes (Jonathan Marek, 2019-06-26, 4 files, -19/+2)
  There doesn't seem to be any reason to keep these opcodes around:
  * fnot/fxor are not used at all.
  * fand/for are only used in lower_alu_to_scalar, but easily replaced.
  Signed-off-by: Jonathan Marek <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>

* nir: opt_vectorize: combine different constant sources (Jonathan Marek, 2019-06-26, 1 file, -2/+25)
  We can vectorize instructions with different constant sources by creating a new load_const and
  using that.
  Signed-off-by: Jonathan Marek <[email protected]>
  Reviewed-by: Alyssa Rosenzweig <[email protected]>

* nir: add tess support to nir_lower_clamp_color_outputs() (Timothy Arceri, 2019-06-26, 1 file, -0/+1)
  This will be used to add compat profile support for higher GL versions.
  Reviewed-by: Kenneth Graunke <[email protected]>
* nir: introduce lowering of bitfield_insert to bfm and a new opcode bitfield_select. (Daniel Schürmann, 2019-06-24, 3 files, -0/+11)
  bitfield_select is defined as:
      bitfield_select(mask, base, insert) = (mask & base) | (~mask & insert)
  matching the behavior of AMD's BFI instruction.
  Reviewed-by: Connor Abbott <[email protected]>
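  A reference-semantics sketch of how bitfield_insert can be expressed with bfm and
  bitfield_select, following the definition above (plain Python for illustration, not the in-tree
  lowering; it ignores the bits == 32 edge case):

      def bfm(bits, offset):
          # A mask of `bits` ones starting at bit `offset` (0 <= bits, offset <= 31).
          return (((1 << bits) - 1) << offset) & 0xffffffff

      def bitfield_select(mask, base, insert):
          return ((mask & base) | (~mask & insert)) & 0xffffffff

      def bitfield_insert(base, insert, offset, bits):
          # bitfield_insert == bitfield_select(bfm(bits, offset), insert << offset, base)
          return bitfield_select(bfm(bits, offset), (insert << offset) & 0xffffffff, base)

      assert bitfield_insert(0xffffffff, 0x0, 8, 8) == 0xffff00ff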
* nir/algebraic: Use unsigned comparison when lowering bitfield insert/extract (Daniel Schürmann, 2019-06-24, 1 file, -2/+2)
  This lets us use the optimization pattern (('ult', 31, ('iand', b, 31)), False) to remove the
  bcsel instruction for code originating in D3D shaders.
  Reviewed-by: Connor Abbott <[email protected]>

* nir/algebraic: Remove unnecessary iand of [iu]bfe and bfm sources (Daniel Schürmann, 2019-06-24, 1 file, -0/+8)
  The [iu]bfe and bfm instructions are defined to only use the five least significant bits. This
  optimizes a common pattern from D3D -> SPIR-V translation.
  Reviewed-by: Connor Abbott <[email protected]>
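  A sketch of the kind of rule this implies: because bfm and [iu]bfe already mask their
  bits/offset operands to five bits, an explicit `& 31` on those operands is redundant. The
  operand order and exact patterns below are illustrative assumptions, not the in-tree rules:

      # Illustrative nir_opt_algebraic.py-style tuples; not the exact in-tree rules.
      a, b, c = 'a', 'b', 'c'

      redundant_mask_rules = [
          (('ubfe', a, ('iand', b, 31), c), ('ubfe', a, b, c)),
          (('ibfe', a, ('iand', b, 31), c), ('ibfe', a, b, c)),
          (('bfm', ('iand', a, 31), b), ('bfm', a, b)),
      ]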
* nir: define behavior of nir_op_bfm and nir_op_u/ibfe according to SM5 spec. (Daniel Schürmann, 2019-06-24, 3 files, -31/+18)
  That is: the five least significant bits provide the values of 'bits' and 'offset', which is the
  case for all hardware currently supported by NIR and using the bfm/bfe instructions. This patch
  also changes the lowering of bitfield_insert/extract using shifts to not use bfm and removes the
  flag 'lower_bfm'.
  Tested-by: Eric Anholt <[email protected]>
  Reviewed-by: Connor Abbott <[email protected]>

* nir/algebraic: add optimization pattern for ('ult', a, ('and', b, a)) and friends. (Daniel Schürmann, 2019-06-24, 1 file, -0/+4)
  These optimizations are based on the fact that 'and(a,b) <= umin(a,b)'. For AMD, this series
  moves the optimization from LLVM to NIR, so currently no vkpipeline-db changes here.
  Reviewed-by: Ian Romanick <[email protected]>
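  Since (a & b) <= umin(a, b) <= a holds for unsigned values, comparisons of this shape fold to
  constants. A sketch of the pattern family (the exact set of "friends" in the commit may differ):

      # Illustrative folds justified by (a & b) <= umin(a, b) <= a (unsigned).
      a, b = 'a', 'b'

      umin_bound_rules = [
          (('ult', a, ('iand', b, a)), False),   # a < (b & a) is never true
          (('ult', a, ('umin', b, a)), False),   # a < umin(b, a) is never true
          (('uge', a, ('iand', b, a)), True),    # a >= (b & a) is always true
      ]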
* nir/lower_tex: Add an assert() in nir_lower_txs_lod() (Boris Brezillon, 2019-06-20, 1 file, -0/+1)
  We don't expect the output of a TXS instruction to be wider than a vec3. Add an assert() to make
  sure this never happens.
  Suggested-by: Jason Ekstrand <[email protected]>
  Signed-off-by: Boris Brezillon <[email protected]>
  Reviewed-by: Alyssa Rosenzweig <[email protected]>

* nir: Make nir_constant a vector rather than a matrix (Jason Ekstrand, 2019-06-19, 3 files, -41/+40)
  Most places in NIR, we treat matrices like arrays. The one annoying exception to this has been
  nir_constant, where a matrix is a first-class thing. This commit changes that so a matrix
  nir_constant is the same as an array nir_constant. This makes matrix nir_constants a tiny bit
  more expensive but shrinks all others by 96B.
  Reviewed-by: Karol Herbst <[email protected]>
* nir: Use reorderable access flag (Connor Abbott, 2019-06-19, 1 file, -4/+12)
  No changes with radeonsi shader-db.
  Reviewed-by: Timothy Arceri <[email protected]>

* nir: Add a helper to determine if an intrinsic can be reordered (Connor Abbott, 2019-06-19, 3 files, -11/+13)
  This is simple now, but we're going to be adding a few more conditions to this later.
  Reviewed-by: Timothy Arceri <[email protected]>

* nir: Add reorderable memory access enum (Connor Abbott, 2019-06-19, 1 file, -1/+2)
  Reviewed-by: Timothy Arceri <[email protected]>
* nir/copy_prop_vars: Ignore volatile accesses (Connor Abbott, 2019-06-19, 1 file, -0/+13)
  The spec explicitly says that volatile writes can't be removed and volatile reads do not
  guarantee that the same value will still be around after the read, as if there were a barrier
  after each read/write. Just ignore them.
  Reviewed-by: Timothy Arceri <[email protected]>

* glsl/nir: Propagate access qualifiers (Connor Abbott, 2019-06-19, 1 file, -1/+0)
  We were completely ignoring these before, except for putting them on variables. While we're
  here, don't set access qualifiers when converting to bindless, since glsl_to_nir will already
  have set a more accurate qualifier that includes any qualifiers on struct members that are
  dereferenced.
  Reviewed-by: Timothy Arceri <[email protected]>

* nir: Allow qualifiers on copy_deref and image instructions (Connor Abbott, 2019-06-19, 6 files, -12/+48)
  In the next commit, we'll properly handle access qualifiers on struct members by propagating
  them to load/store instructions, but these instructions had no way to specify the qualifier.
  Reviewed-by: Timothy Arceri <[email protected]>
* nir: add a vectorization pass (Connor Abbott, 2019-06-18, 3 files, -0/+456)
  This effectively does the opposite of nir_lower_alu_to_scalar, trying to combine per-component
  ALU operations with the same sources but different swizzles into one larger ALU operation. It
  uses a similar model as CSE, where we do a depth-first approach and keep around a hash set of
  instructions to be combined, but there are a few major differences:

  1. For now, we only support entirely per-component ALU operations.
  2. Since it's not always guaranteed that we'll be able to combine equivalent instructions, we
     keep a stack of equivalent instructions around, trying to combine new instructions with
     instructions on the stack.

  The pass isn't comprehensive by far; it can't handle operations where some of the sources are
  per-component and others aren't, and it can't handle phi nodes. But it should handle the more
  common cases, and it should be reasonably efficient.

  [Alyssa: Rebase on latest master, updating with respect to typeless moves]
  Acked-by: Alyssa Rosenzweig <[email protected]>
  Acked-by: Jason Ekstrand <[email protected]>
* nir/lower_tex: Add a way to lower TXS(non-0-LOD) instructions (Boris Brezillon, 2019-06-18, 2 files, -0/+52)
  The V3D driver has an open-coded solution for this, and we need the same thing for Panfrost, so
  let's add a generic way to lower TXS(LOD) into max(TXS(0) >> LOD, 1).

  Changes in v2:
  * Use == 0 instead of !
  * Rework the minification logic as suggested by Jason
  * Assign cursor pos at the beginning of the function
  * Patch the LOD just after retrieving the old value

  Signed-off-by: Boris Brezillon <[email protected]>
  Reviewed-by: Alyssa Rosenzweig <[email protected]>
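  A reference-semantics sketch of the lowering described above: each component of the level-N
  size is derived from the level-0 size as max(size >> lod, 1) (plain Python for illustration;
  the in-tree pass builds this with NIR instructions):

      def txs_at_lod(level0_size, lod):
          # Minify every component, clamping to 1 so small mips never report 0.
          return [max(component >> lod, 1) for component in level0_size]

      assert txs_at_lod([256, 100, 1], 3) == [32, 12, 1]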
* nir/lower_tex: Update ->sampler_dim value before calling get_texture_size() (Boris Brezillon, 2019-06-18, 1 file, -2/+5)
  get_texture_size() will create a txs instruction with ->sampler_dim set to the original
  tex->sampler_dim. The condition to call lower_rect() only checks the value of ->sampler_dim and
  whether lower_rect is requested or not. This leads to an infinite loop when calling
  nir_lower_tex() with the same options until it returns false. In order to avoid that, let's move
  the tex->sampler_dim patching before get_texture_size() is called. This way the txs instruction
  will have ->sampler_dim set to GLSL_SAMPLER_DIM_2D and nir_lower_tex() won't try to lower it on
  the subsequent passes.

  Changes in v2:
  * Add Jason R-b
  * Add a comment explaining why we patch ->sampler_dim at the beginning of the lower_rect() func

  Signed-off-by: Boris Brezillon <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Alyssa Rosenzweig <[email protected]>

* nir/lower_tex: Actually report when projector lowering happened (Boris Brezillon, 2019-06-18, 1 file, -4/+4)
  The code considers that projector lowering was done even if it's not really the case. Change the
  project_src() prototype to return a bool encoding whether projector lowering happened or not,
  and update the progress var accordingly in nir_lower_tex_block().

  Changes in v2:
  * Add Jason R-b
  * Drop the part suggesting that nir_lower_rect() could be called in a do-while(progress) loop.

  Signed-off-by: Boris Brezillon <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Alyssa Rosenzweig <[email protected]>
* nir: detect more dynamically uniform expressions (Iago Toral Quiroga, 2019-06-14, 1 file, -0/+13)
  Shader-db results for v3d:

  total instructions in shared programs: 9132728 -> 9119238 (-0.15%)
  instructions in affected programs: 596886 -> 583396 (-2.26%)
  helped: 1118
  HURT: 224

  total threads in shared programs: 234298 -> 234308 (<.01%)
  threads in affected programs: 10 -> 20 (100.00%)
  helped: 5
  HURT: 0

  total uniforms in shared programs: 3022949 -> 3022622 (-0.01%)
  uniforms in affected programs: 29163 -> 28836 (-1.12%)
  helped: 108
  HURT: 37

  total max-temps in shared programs: 1328030 -> 1327762 (-0.02%)
  max-temps in affected programs: 10097 -> 9829 (-2.65%)
  helped: 263
  HURT: 15

  total spills in shared programs: 3793 -> 3777 (-0.42%)
  spills in affected programs: 432 -> 416 (-3.70%)
  helped: 16
  HURT: 0

  total fills in shared programs: 4380 -> 4266 (-2.60%)
  fills in affected programs: 828 -> 714 (-13.77%)
  helped: 16
  HURT: 0

  Reviewed-by: Eric Anholt <[email protected]>
* nir: Don't manually index intrinsic index enum (Connor Abbott, 2019-06-13, 1 file, -20/+20)
  This fixes a rebase fail in ea51275e07b, and prevents it from happening again. There's no reason
  to do this manually.
  Reviewed-by: Jason Ekstrand <[email protected]>

* nir: add intrinsics for AMD_shader_ballot (Daniel Schürmann, 2019-06-13, 3 files, -0/+31)
  Reviewed-by: Connor Abbott <[email protected]>

* nir/opt_algebraic: Fix rules for imadsh_mix16 (Eduardo Lima Mitev, 2019-06-10, 1 file, -2/+2)
  The rules added in patch 3addd7c are inverted. It should be:
      (al * bh) << 16 + c
  instead of:
      (ah * bl) << 16 + c
  Fixes a number of regressions under dEQP-GLES31.functional.draw_indirect.compute_interop.large.*
  on Freedreno.
  Reviewed-by: Rob Clark <[email protected]>
* nir: fix s/&&/||/ typo (Eric Engestrom, 2019-06-07, 1 file, -1/+1)
  Fixes: cd73b6174b093b75f581 "nir/lower_to_source_mods: Stop turning add, sat, and neg into mov"
  Signed-off-by: Eric Engestrom <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>

* nir_algebraic: Add basic optimizations for umul_low and imadsh_mix16 (Eduardo Lima Mitev, 2019-06-07, 2 files, -0/+55)
  For umul_low (al * bl), zero is returned if the low 16-bits word of either source is zero. For
  imadsh_mix16 (ah * bl << 16 + c), c is returned if either 'ah' or 'bl' is zero.

  A couple of nir_search_helpers are added:
  * is_upper_half_zero() returns true if the highest word of all components of an integer NIR alu
    src are zero.
  * is_lower_half_zero() returns true if the lowest word of all components of an integer nir alu
    src are zero.

  Reviewed-by: Eric Anholt <[email protected]>
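  A sketch of the kind of rule these helpers enable, using umul_low as described above (the
  operand marker and exact pattern are assumptions for illustration, not the in-tree rules):

      # umul_low multiplies the low 16-bit halves of its sources (al * bl), so
      # the result is zero whenever the low half of either source is a constant
      # zero. Hypothetical nir_opt_algebraic.py-style tuple.
      a = 'a'

      umul_low_rules = [
          (('umul_low', a, '#b(is_lower_half_zero)'), 0),
      ]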
* nir/opcodes: Add new 'umul_low' and 'imadsh_mix16' opcodes (Eduardo Lima Mitev, 2019-06-07, 1 file, -1/+14)
  'umul_low' is the low 32-bits of unsigned integer multiply. It maps directly to ir3's MULL_U.
  'imadsh_mix16' is multiply add with shift and mix, an ir3 specific instruction that maps
  directly to ir3's IMADSH_M16. Both are necessary for the lowering of integer multiplication on
  Freedreno, which will be introduced later in this series.
  Reviewed-by: Eric Anholt <[email protected]>

* nir/propagate_invariant: Don't add NULL vars to the hash table (Jason Ekstrand, 2019-06-06, 1 file, -1/+10)
  Fixes: 8410cf66d "nir/propagate_invariant: Skip unknown vars"
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>

* nir: Combine lower_fmod16/32 back into a single lower_fmod. (Kenneth Graunke, 2019-06-05, 2 files, -5/+4)
  We originally had a single lower_fmod option. In commit 2ab2d2e5, Sam split 32 and 64-bit
  lowering into separate flags, with the rationale that some drivers might want different options
  there. This left 16-bit unhandled, so Iago added a lower_fmod16 option in commit ca31df6f. Now
  that lower_fmod64 is gone (in favor of nir_lower_doubles and nir_lower_dmod), we re-combine
  lower_fmod16 and lower_fmod32 into a single lower_fmod flag again. I'm not aware of any hardware
  which needs lowering for one bitsize and not the other.
  Reviewed-by: Marek Olšák <[email protected]>
* nir: Drop lower_fmod64 option. (Kenneth Graunke, 2019-06-05, 2 files, -2/+0)
  nir_lower_doubles offers a wide variety of fp64 lowering, including lowering fmod@64. The
  version there also better handles imprecisions due to lowered frcp@64. Let's consolidate on one
  version.
  Reviewed-by: Marek Olšák <[email protected]>

* nir: Don't replace the nir_shader when NIR_TEST_SERIALIZE=1 (Jason Ekstrand, 2019-06-05, 2 files, -10/+16)
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108957
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Rob Clark <[email protected]>