summaryrefslogtreecommitdiffstats
path: root/src/compiler/nir/nir.h
Commit message (Collapse)AuthorAgeFilesLines
* nir: Allow opt_peephole_sel to be more aggressive in flattening IFs.Eric Anholt2016-09-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | VC4 was running into a major performance regression from enabling control flow in the glmark2 conditionals test, because of short if statements containing an ffract. This pass seems like it was was trying to ensure that we only flattened IFs that should be entirely a win by guaranteeing that there would be fewer bcsels than there were MOVs otherwise. However, if the number of ALU ops is small, we can avoid the overhead of branching (which itself costs cycles) and still get a win, even if it means moving real instructions out of the THEN/ELSE blocks. For now, just turn on aggressive flattening on vc4. i965 will need some tuning to avoid regressions. It does looks like this may be useful to replace freedreno code. Improves glmark2 -b conditionals:fragment-steps=5:vertex-steps=0 from 47 fps to 95 fps on vc4. vc4 shader-db: total instructions in shared programs: 101282 -> 99543 (-1.72%) instructions in affected programs: 17365 -> 15626 (-10.01%) total uniforms in shared programs: 31295 -> 31172 (-0.39%) uniforms in affected programs: 3580 -> 3457 (-3.44%) total estimated cycles in shared programs: 225182 -> 223746 (-0.64%) estimated cycles in affected programs: 26085 -> 24649 (-5.51%) v2: Update shader-db output. Reviewed-by: Ian Romanick <[email protected]> (v1)
* glsl: add subpass image type (v2)Dave Airlie2016-09-161-0/+1
| | | | | | | | | | | | | | | | | | SPIR-V/Vulkan have a special image type for input attachments called the subpass type. It has different characteristics than other images types. The main one being it can only be an input image to fragment shaders and loads from it are relative to the frag coord. This adds support for it to the GLSL types. Unfortunately we've run out of space in the sampler dim in types, so we need to use another bit. v2: Fixup subpass input name (Jason) Reviewed-by: Jason Ekstrand <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* nir: Add a flag to lower_io to force "sample" interpolationJason Ekstrand2016-09-151-1/+9
| | | | | | Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Anuj Phogat <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* nir: Report progress from nir_lower_phis_to_scalar.Kenneth Graunke2016-09-141-1/+1
| | | | | Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* nir: Report progress from nir_lower_alu_to_scalar.Kenneth Graunke2016-09-141-1/+1
| | | | | Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* nir: move tex_instr_remove_srcRob Clark2016-09-141-0/+2
| | | | | | | I want to re-use this in a different pass, so move to nir.h Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir/gcm: Add global value numbering supportJason Ekstrand2016-09-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Unlike the current CSE pass, global value numbering is capable of detecting common values even if one does not dominate the other. For instance, in you have if (...) { ssa_1 = ssa_0 + 7; /* use ssa_1 */ } else { ssa_2 = ssa_0 + 7; /* use ssa_2 */ } Global value numbering doesn't care about dominance relationships so it figures out that ssa_1 and ssa_2 are the same and converts this to if (...) { ssa_1 = ssa_0 + 7; /* use ssa_1 */ } else { /* use ssa_1 */ } Obviously, we just broke SSA form which is bad. Global code motion, however, will repair this for us by turning this into ssa_1 = ssa_0 + 7; if (...) { /* use ssa_1 */ } else { /* use ssa_1 */ } This intended to eventually mostly replace CSE. However, conventional CSE may still be useful because it's less of a scorched-earth approach and doesn't require GCM. This makes it a bit more appropriate for use as a clean-up in a late optimization run. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* nir: remove some fields from nir_shader_compiler_optionsConnor Abbott2016-09-031-3/+0
| | | | I accidentally added these with 0dc4cab. Oops!
* nir: add nir_after_phis() cursor helperConnor Abbott2016-09-031-5/+14
| | | | | | | And re-implement nir_after_cf_node_and_phis() using it. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Change nir_shader_get_entrypoint to return an impl.Kenneth Graunke2016-08-251-3/+5
| | | | | | | | | | | | | | | | | Jason suggested adding an assert(function->impl) here. All callers of this function actually want ->impl, so I decided just to change the API. We also change the nir_lower_io_to_temporaries API here. All but one caller passed nir_shader_get_entrypoint(), and with the previous commit, it now uses a nir_function_impl internally. Folding this change in avoids the need to change it and change it back. v2: Fix one call I missed in ir3_compiler (caught by Eric). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Connor Abbott <[email protected]>
* nir: Pass through fb_fetch_output and OutputsRead from GLSL IR.Francisco Jerez2016-08-251-0/+9
| | | | | | | | | The NIR representation of framebuffer fetch is the same as the GLSL IR's until interface variables are lowered away, at which point it will be translated to load output intrinsics. The GLSL-to-NIR pass just needs to copy the bits over to the NIR program. Reviewed-by: Kenneth Graunke <[email protected]>
* nir: Add an IO scalarizing pass using the intrinsic's first_component.Eric Anholt2016-08-191-0/+1
| | | | | | | | | | vc4 wants to have per-scalar IO load/stores so that dead code elimination can happen on a more granular basis, which it has been doing in the backend using a multiplication by 4 of the intrinsic's driver_location. We can represent it properly in the NIR using the first_component field, though. Reviewed-by: Kenneth Graunke <[email protected]>
* nir: Move the undef of nir_intrinsics.h macros to the .h.Eric Anholt2016-08-191-3/+0
| | | | | | | I wanted to include this from nir_builder as well, so it also needed the undefs. Reviewed-by: Kenneth Graunke <[email protected]>
* nir: Make nir_alu_srcs_equal non-static.Kenneth Graunke2016-08-041-0/+3
| | | | | Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]>
* glsl: Separate overlapping sentinel nodes in exec_list.Matt Turner2016-07-261-2/+2
| | | | | | | | | | | I do appreciate the cleverness, but unfortunately it prevents a lot more cleverness in the form of additional compiler optimizations brought on by -fstrict-aliasing. No difference in OglBatch7 (n=20). Co-authored-by: Davin McCall <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* nir/lower_tex: Add support for lowering coordinate offsetsJason Ekstrand2016-07-221-0/+10
| | | | | | | | | | On i965, we can't support coordinate offsets for texelFetch or rectangle textures. Previously, we were doing this with a GLSL pass but we need to do it in NIR if we want those workarounds for SPIR-V. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Cc: "12.0" <[email protected]>
* nir: Add a helper for determining the type of a texture sourceJason Ekstrand2016-07-221-0/+44
| | | | | | Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Cc: "12.0" <[email protected]>
* nir: Add a nir_deref_foreach_leaf helperJason Ekstrand2016-07-201-0/+4
| | | | | | Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Cc: "12.0" <[email protected]>
* nir: Add nir_load_interpolated_input lowering code.Kenneth Graunke2016-07-201-0/+8
| | | | | | | | | | | | | | Now nir_lower_io can optionally produce load_interpolated_input and load_barycentric_* intrinsics for fragment shader inputs. flat inputs continue using regular load_input. v2: Use a nir_shader_compiler_options flag rather than ad-hoc boolean passing (in response to review feedback from Chris Forbes). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Add new intrinsics for fragment shader input interpolation.Kenneth Graunke2016-07-201-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Backends can normally handle shader inputs solely by looking at load_input intrinsics, and ignore the nir_variables in nir->inputs. One exception is fragment shader inputs. load_input doesn't capture the necessary interpolation information - flat, smooth, noperspective mode, and centroid, sample, or pixel for the location. This means that backends have to interpolate based on the nir_variables, then associate those with the load_input intrinsics (say, by storing a map of which variables are at which locations). With GL_ARB_enhanced_layouts, we're going to have multiple varyings packed into a single vec4 location. The intrinsics make this easy: simply load N components from location <loc, component>. However, working with variables and correlating the two is very awkward; we'd much rather have intrinsics capture all the necessary information. Fragment shader input interpolation typically works by producing a set of barycentric coordinates, then using those to do a linear interpolation between the values at the triangle's corners. We represent this by introducing five new load_barycentric_* intrinsics: - load_barycentric_pixel (ordinary variable) - load_barycentric_centroid (centroid qualified variable) - load_barycentric_sample (sample qualified variable) - load_barycentric_at_sample (ARB_gpu_shader5's interpolateAtSample()) - load_barycentric_at_offset (ARB_gpu_shader5's interpolateAtOffset()) Each of these take the interpolation mode (smooth or noperspective only) as a const_index, and produce a vec2. The last two also take a sample or offset source. We then introduce a new load_interpolated_input intrinsic, which is like a normal load_input intrinsic, but with an additional barycentric coordinate source. The intention is that flat inputs will still use regular load_input intrinsics. This makes them distinguishable from normal inputs that need fancy interpolation, while also providing all the necessary data. This nicely unifies regular inputs and interpolateAt functions. Qualifiers and variables become irrelevant; there are just load_barycentric intrinsics that determine the interpolation. v2: Document the interp_mode const_index value, define a new BARYCENTRIC() helper rather than using SYSTEM_VALUE() for some of them (requested by Jason Ekstrand). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* compiler: Rename INTERP_QUALIFIER_* to INTERP_MODE_*.Kenneth Graunke2016-07-171-1/+1
| | | | | | | | | | | | | | | | | Likewise, rename the enum type to glsl_interp_mode. Beyond the GLSL front-end, talking about "interpolation modes" seems more natural than "interpolation qualifiers" - in the IR, we're removed from how exactly the source language specifies how to interpolate an input. Also, SPIR-V calls these "decorations" rather than "qualifiers". Generated by: $ find . -regextype egrep -regex '.*\.(c|cpp|h)' -type f -exec sed -i \ -e 's/INTERP_QUALIFIER_/INTERP_MODE_/g' \ -e 's/glsl_interp_qualifier/glsl_interp_mode/g' {} \; Signed-off-by: Kenneth Graunke <[email protected]> Acked-by: Dave Airlie <[email protected]>
* nir: use the same driver location for packed varyingsTimothy Arceri2016-07-071-2/+2
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* nir: add new intrinsic field for storing component offsetTimothy Arceri2016-07-071-0/+6
| | | | | | This offset is used for packing. Reviewed-by: Kenneth Graunke <[email protected]>
* Remove wrongly repeated words in commentsGiuseppe Bilotta2016-06-231-1/+1
| | | | | | | | | | | | | | | | | Clean up misrepetitions ('if if', 'the the' etc) found throughout the comments. This has been done manually, after grepping case-insensitively for duplicate if, is, the, then, do, for, an, plus a few other typos corrected in fly-by v2: * proper commit message and non-joke title; * replace two 'as is' followed by 'is' to 'as-is'. v3: * 'a integer' => 'an integer' and similar (originally spotted by Jason Ekstrand, I fixed a few other similar ones while at it) Signed-off-by: Giuseppe Bilotta <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* nir: Add a pass for propagating invariant decorationsJason Ekstrand2016-06-201-0/+2
| | | | | | | | | | This pass is similar to propagate_invariance in the GLSL compiler. The real "output" of this pass is that any algebraic operations which are eventually consumed by an invariant variable get marked as "exact". Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Cc: "12.0" <[email protected]>
* nir/info: Get rid of uses_interp_var_at_offsetJason Ekstrand2016-06-031-3/+0
| | | | | | | | | We were using this briefly in the i965 driver to trigger recompiles but we haven't been using it since we switched to the NIR y-transform lowering pass. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* nir/algebraic: support for power-of-two optimizationsRob Clark2016-06-031-0/+3
| | | | | | | | | | | | Some optimizations, like converting integer multiply/divide into left/ right shifts, have additional constraints on the search expression. Like requiring that a variable is a constant power of two. Support these cases by allowing a fxn name to be appended to the search var expression (ie. "a#32(is_power_of_two)"). Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Make lowering gl_LocalInvocationIndex optionalJordan Justen2016-06-011-0/+2
| | | | | | Cc: "12.0" <[email protected]> Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Make nir_const_value a unionJason Ekstrand2016-05-261-9/+7
| | | | | | | | There's no good reason for it to be a struct of an anonymous union. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96221 Tested-by: Vinson Lee <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* nir: Add a lowering pass for YUV texturesKristian Høgsberg Kristensen2016-05-241-0/+7
| | | | | | | | This lowers sampling from YUV textures to 1) one or more texture instructions to sample each plane and 2) color space conversion to RGB. Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Add new 'plane' texture source typeKristian Høgsberg Kristensen2016-05-241-0/+1
| | | | | | | | This will be used to select the plane to sample from for planar textures. Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Add a simple nir_lower_wpos_center() pass for Vulkan drivers.Kenneth Graunke2016-05-201-0/+1
| | | | | | | | | | | | | | | | | | | | nir_lower_wpos_ytransform() is great for OpenGL, which allows applications to choose whether their coordinate system's origin is upper left/lower left, and whether the pixel center should be on integer/half-integer boundaries. Vulkan, however, has much simpler requirements: the pixel center is always half-integer, and the origin is always upper left. No coordinate transform is needed - we just need to add <0.5, 0.5>. This means that we can avoid using (and setting up) a uniform. I thought about adding more options to nir_lower_wpos_ytransform(), but making a new pass that never even touched uniforms seemed simpler. v2: Use normal iterator rather than _safe variant (noticed by Matt). Signed-off-by: Kenneth Graunke <[email protected]> Acked-by: Rob Clark <[email protected]>
* nir/print: add support for print annotationsRob Clark2016-05-171-0/+1
| | | | | | | | | | | | | | | Caller can pass a hashtable mapping NIR object (currently instr or var, but I guess others could be added as needed) to annotation msg to print inline with the shader dump. As the annotation msg is printed, it is removed from the hashtable to give the caller a way to know about any unassociated msgs. This is used in the next patch, for nir_validate to try to associate error msgs to nir_print dump. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Eduardo Lima Mitev <[email protected]> Reviewed-by: Connor Abbott <[email protected]>
* nir: add double input bitmapJuan A. Suarez Romero2016-05-171-0/+2
| | | | | | This bitmap tracks which input attributes are double-precision. Reviewed-by: Kenneth Graunke <[email protected]>
* nir: Mark nir_start_block()/nir_impl_last_block() with returns_nonnull.Matt Turner2016-05-161-4/+5
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* nir: Add a nir->info.uses_interp_var_at_offset flag.Kenneth Graunke2016-05-151-0/+3
| | | | | | | | | I've added this to nir_gather_info(), but also to glsl_to_nir() as a temporary measure, since the i965 GL driver today doesn't use nir_gather_info() yet. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir: return progress from lower_idivRob Clark2016-05-151-1/+1
| | | | | | | | | | With algebraic-opt support for lowering div to shift, the driver would like to be able to run this pass *after* the main opt-loop, and then conditionally re-run the opt-loop if this pass actually lowered some- thing. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* nir: Add texture opcodes and source types for multisample compressionJason Ekstrand2016-05-141-0/+6
| | | | | | | | | | | Intel hardware does a form of multisample compression that involves an auxilary surface called the MCS. When an MCS is in use, you have to first sample from the MCS with a special opcode and then pass the result of that operation into the next sample instrucion. Normally, we just do this ourselves in the back-end, but we want to expose that functionality to NIR so that we can use MCS values directly in NIR-based blorp. Reviewed-by: Kenneth Graunke <[email protected]>
* nir: Add an info bit for uses_sample_qualifierJason Ekstrand2016-05-141-0/+5
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* nir/algebraic: Separate ffma lowering from fusingJason Ekstrand2016-05-111-0/+1
| | | | | | | | The i965 driver has its own pass for fusing mul+add combinations that's much smarter than what nir_opt_algebraic can do so we don't want to get the nir_opt_algebraic one just because we didn't set lower_ffma. Reviewed-by: Kenneth Graunke <[email protected]>
* nir/lower-io: add support for lowering inputsRob Clark2016-05-111-1/+2
| | | | | Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir: rename lower_outputs_to_temporaries -> lower_io_to_temporariesRob Clark2016-05-111-2/+2
| | | | | | Since it will gain support to lower inputs, give it a more generic name. Signed-off-by: Rob Clark <[email protected]>
* nir: lower-io-types passRob Clark2016-05-111-0/+1
| | | | | | | | | | | | | | | | | A pass to lower complex (struct/array/mat) inputs/outputs to primitive types. This allows, for example, linking that removes unused components of a larger type which is not indirectly accessed. In the near term, it is needed for gallium (mesa/st) support for NIR, since only used components of a type are assigned VBO slots, and we otherwise have no way to represent that to the driver backend. But it should be useful for doing shader linking in NIR. v2: use glsl_count_attribute_slots() rather than passing a type_size fxn pointer Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir: passthrough-edgeflags supportRob Clark2016-05-111-0/+2
| | | | | | | | Handled by tgsi_emulate for glsl->tgsi case. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Connor Abbott <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* nir: add lowering pass for glBitmapRob Clark2016-05-111-0/+7
| | | | | | Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Connor Abbott <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* nir: add lowering pass for glDrawPixelsRob Clark2016-05-111-0/+13
| | | | | | Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Connor Abbott <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* nir: add lowering pass for y-transformRob Clark2016-05-111-0/+11
| | | | | | Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Connor Abbott <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* nir: remove now-unused nir_foreach_block*_call()Connor Abbott2016-05-051-38/+0
| | | | | | Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Connor Abbott <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* nir: Separate 32 and 64-bit fmod loweringSamuel Iglesias Gonsálvez2016-05-041-1/+2
| | | | | | | | Split 32-bit and 64-bit fmod lowering as the drivers might need to lower them separately inside NIR depending on the HW support. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* nir/lower_double_ops: lower mod()Samuel Iglesias Gonsálvez2016-05-041-1/+2
| | | | | | | | | | | There are rounding errors with the division in i965 that affect the mod(x,y) result when x = N * y. Instead of returning '0' it was returning 'y'. This lowering pass fixes those cases. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Jordan Justen <[email protected]>