mesa.git - Unnamed repository; edit this file 'description' to name the repository.

	Commit message (Collapse)	Author	Age	Files	Lines
*	nir: Add nir_lower_clip_disable.c to SCons build.	Vinson Lee	2020-07-04	1	-0/+1
\| \| \| \| \| \| \| \|	Fixes: fb2fe802f638 ("nir: add lowering pass for clip plane enabling") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3217 Signed-off-by: Vinson Lee <[email protected]> Reviewed-by: Timothy Arceri <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5741>
*	nir: add pass to lower disjoint wrmask's	Rob Clark	2020-05-13	1	-0/+1
\| \| \| \| \| \|	Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	nir: Move nir_lower_mediump_outputs from ir3	Alyssa Rosenzweig	2020-04-27	1	-0/+1
\| \| \| \| \| \| \| \|	(Original code from ir3) Reviewed-by: Kristian H. Kristensen <[email protected]> Acked-by: Rob Clark <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4716>
*	glsl: Hard-code noise to zero in builtin_functions.cpp	Jason Ekstrand	2020-04-21	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Version 4.4 of the GLSL spec changed the definition of noise() to always return zero and earlier versions of the spec allowed zero as a valid implementation. All drivers, as far as I can tell, unconditionally call lower_noise() today which turns ir_unop_noise into zero. We've got a 10-year-old comment in there saying "In the future, ir_unop_noise may be replaced by a call to a function that implements noise." Well, it's the future now and we've not yet gotten around to that. In the mean time, the GLSL spec has made doing so illegal. To make things worse, we then pretend to handle the opcode in glsl_to_nir, ir_to_mesa, and st_glsl_to_tgsi even though it should never get there given the lowering. The lowering in st_glsl_to_tgsi defines noise() to be 0.5 which is an illegal implementation of the noise functions according to pre-4.4 specs. We also have opcodes for this in NIR which are never used because, again, we always call lower_noise(). Let's just kill the whole opcode and make builtin_builder.cpp build a bunch of functions that just return zero. Reviewed-by: Alyssa Rosenzweig <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4624>
*	nir: add common convert_ycbcr for vulkan csc	Jonathan Marek	2020-04-20	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \|	Copied from anv, replaced state with passing model/range directly. Signed-off-by: Jonathan Marek <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Acked-by: Jason Ekstrand <[email protected]> Reviewed-by: D Scott Phillips <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4528>
*	glsl: Inline builtins in a separate pass	Neil Roberts	2020-03-24	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, the ir_call functions for builtin functions were replaced with the inline implementation immediately after being added to the instruction list. This patch replaces that with a separate pass that lowers them after the conversion from AST to IR is complete. This will be useful to be able to insert some handling for the precision lowering pass before the inlining. This needs to happen because the precision of the operations in the inlined implementation depends on the highest precision of all of the arguments to the call. Reviewed-by: Alyssa Rosenzweig <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3885>
*	nir: add a bool bitsize lowering pass	Iago Toral Quiroga	2020-03-24	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	The pass lowers 1-bit booleans produced by NIR to the native bitsize of the operations that produce them. v2: change on lower_load_const_instr after upstream changes. Added TODO2 to explain it, as it was not properly tested yet (see already existing TODO) (Neil) Reviewed-by: Alyssa Rosenzweig <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3885>
*	glsl: Add an IR lowering pass to convert mediump operations to 16-bit	Neil Roberts	2020-03-24	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This works by finding the first rvalue that it can lower using an ir_rvalue_visitor. In that case it adds a conversion to float16 after each rvalue and a conversion back to float before storing the assignment. Also it uses a set to keep track of rvalues that have been lowred already. The handle_rvalue method of the rvalue visitor doesn’t provide any way to stop iteration. If we handle a value in find_precision_visitor we want to be able to stop it from descending into the lowered rvalue again. Additionally this pass disallows converting nodes containing non-float. The can_lower_rvalue function explicitly excludes any branches that have non-float types except bools. This avoids the need to have special handling for functions that convert to int or double. Co-authored-by: Hyunjun Ko <[email protected]> v2. Adds lowering for texture samples v3. Instead of checking whether each node can be lowered while walking the tree, a separate tree walk is now done to check all of the nodes in a single pass. The lowerable nodes are added to a set which is checked during find_precision_visitor instead of calling can_lower_rvalue. v4. Move the special case for temporaries to find_lowerable_rvalues. This needs to be handled while checking for lowerable rvalues so that any later dereferences of the variable will see the right precision. v5. Add an override to visit ir_call instructions and apply the same technique to override the precision of the temporary variable in the same way as done for builtin temporaries and ir_assignment calls. v6. Changes the pass so that it doesn’t need to lower an entire subtree in order do perform a lowering. Instead, certain instructions can be marked as being indepedent of their child instructions. For example, this is the case with array dereferences. The precision of the array index doesn’t have any bearing on whether things using the result of the array deref can be lowered. Now, only toplevel lowerable nodes are added to the lowerable_rvalues instead instead of additionally adding all of the subnodes. It now also only needs one hash table instead of two. v7. Don’t try to lower sampler types. Instead, the sample instruction is now treated as an independent point where the result of the sample can be used in a lowered section. The precision of the sampler type determines the precision of the sample instruction. This also means the coordinates to the sampler can be lowered. v8. Use f2fmp instead of f2f16. v9. Disable lowering derivatives calcualtions, which might not work properly on some hw backends. Reviewed-by: Kristian H. Kristensen <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3885>
*	nir: Add pass to combine adjacent scoped memory barriers	Caio Marcelo de Oliveira Filho	2020-03-12	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	SPIR-V generates very granular barriers, however HW and backends might not necessarily take advantage of those. This pass provides a general mechanism to combine such barriers. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3224>
*	nir: add pass to lower discard() to demote()	Daniel Schürmann	2020-03-09	1	-0/+1
\| \| \| \| \| \| \| \| \|	This pass is intended to work around game bugs, only! It also lowers nir_intrinsic_load_helper_invocation to nir_intrinsic_is_helper_invocation for consistency. Reviewed-by: Marek Olšák <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4047>
*	glsl/linker: handle array/struct members for DisableXfbPacking	Louis-Francis Ratté-Boulianne	2020-03-03	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When varying packing is disabled for transform feedback and a xfb declaration points to an array element or structure member, the element/member should be aligned to the start of a slot as well. If that's not the case, a new varying is created and the element/member value is copied. There might a way to further optimize the number of slots allocated or the number of copies necessary if the performance cost is problematic. For example, in cases where simply padding the top level variable might correctly align all the captured values. Signed-off-by: Louis-Francis Ratté-Boulianne <[email protected]> Reviewed-by: Alyssa Rosenzweig <[email protected]> Acked-by: Daniel Stone <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2433>
*	nir: Rename gl_nir_lower_bindless_images.c in preparation for extending it.	Eric Anholt	2020-02-24	1	-1/+1
\| \| \| \| \| \| \| \| \|	The bulk of it can be reused to implement iris's internal non-bindless image lowering, which I would like to reuse in freedreno, v3d, and nir-to-tgsi. Reviewed-by: Kenneth Graunke <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3728>
*	nir: Add SSBO->global lowering pass	Alyssa Rosenzweig	2020-02-21	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	To facilitate lowering SSBOs to globals, we need a load_ssbo_address intrinsic. This intrinsic takes an SSBO index and loads the address in global memory of the SSBO (likely implemented via a uniform in the driver). In the future, we'll support bounds checking, but at the moment this is not supported (this pass should only be used for trusted contexts at the moment, i.e. contexts without robustness extensions). Signed-off-by: Alyssa Rosenzweig <[email protected]> Reviewed-by: Karol Herbst <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2753>
*	Rename nir_lower_constant_initializers to nir_lower_variable_initalizers	Arcady Goldmints-Orlov	2020-02-12	1	-1/+1
\| \| \| \| \| \| \| \| \|	This is naming is more clear as nir_variables can be initializes not just with a nir_constant but with a pointer to another nir_variable. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3047> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3047>
*	nir/zink: move clip_halfz-lowering to common code	Erik Faye-Lund	2020-01-03	1	-0/+1
\| \| \| \| \| \| \| \|	Etnaviv also does the same thing, so let's try to avoid repetition here, and use the same for it code as well. Reviewed-by: Jonathan Marek <[email protected]> Tested-by: Paul Cercueil <[email protected]>
*	android: nir: add a load/store vectorization pass	Mauro Rossi	2019-12-27	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \|	Fixes the following aco building error: external/mesa/src/amd/compiler/aco_instruction_selection_setup.cpp:846: error: undefined reference to 'nir_opt_load_store_vectorize' Fixes: ce9205c ("nir: add a load/store vectorization pass") Signed-off-by: Mauro Rossi <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
*	nir: Add a scheduler pass to reduce maximum register pressure.	Eric Anholt	2019-11-25	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is similar to a scheduler I've written for vc4 and i965, but this time written at the NIR level so that hopefully it's reusable. A notable new feature it has is Goodman/Hsu's heuristic of "once we've started processing the uses of a value, prioritize processing the rest of their uses", which should help avoid the heuristic otherwise making such systematically bad choices around getting texture results consumed. Results for v3d: total instructions in shared programs: 6497588 -> 6518242 (0.32%) total threads in shared programs: 154000 -> 152828 (-0.76%) total uniforms in shared programs: 2119629 -> 2068681 (-2.40%) total spills in shared programs: 4984 -> 472 (-90.53%) total fills in shared programs: 6418 -> 1546 (-75.91%) Acked-by: Alyssa Rosenzweig <[email protected]> (v1) Reviewed-by: Alejandro Piñeiro <[email protected]> (v2) v2: Use the DAG datastructure, fold in the scheduling-for-parallelism patch, include SSA defs in live values so we can switch to bottom-up if we want. v3: Squash in improvements from Alejandro Piñeiro for getting V3D to successfully register allocate on GLES3.1 dEQP. Make sure that discards don't move after store_output. Comment spelling fix.
*	nir: strip as we serialize to remove the nir_shader_clone call	Marek Olšák	2019-11-21	1	-1/+0
\| \| \| \| \| \|	Serializing stripped NIR is faster now. Reviewed-by: Connor Abbott <[email protected]>
*	compiler: move build definition of pp_standalone_scaffolding.c	Timothy Arceri	2019-11-21	1	-2/+1
\| \| \| \| \| \| \| \| \|	This should fix android build issues while still allowing scons to build the standalone compiler. Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2129 Reviewed-by: Mark Janes <[email protected]>
*	glsl: add preprocessor #include support	Timothy Arceri	2019-11-20	1	-1/+2
\| \| \| \|	Reviewed-by: Witold Baryluk <[email protected]>
*	nir: Build nir_lower_point_size.c in libmesa_nir	Robert Foss	2019-10-31	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	nir_lower_point_size.c was not build into the libmesa_nir library for non-meson builds. However it was included in the meson build. This patch fixes that. Signed-off-by: Robert Foss <[email protected]> Reviewed-by: Alyssa Rosenzweig <[email protected]>
*	nir: add nir_lower_amul pass	Rob Clark	2019-10-18	1	-0/+1
\| \| \| \| \| \| \|	Lower amul to either imul or imul24, depending on whether 24b is enough bits to calculate an offset within the thing being dereferenced. Signed-off-by: Rob Clark <[email protected]>
*	nir: add lowering-pass for point-size mov	Erik Faye-Lund	2019-10-17	1	-0/+1
\| \| \| \|	Reviewed-by: Marek Olšák <[email protected]>
*	nir: add a pass to lower flat shading.	Dave Airlie	2019-10-17	1	-0/+1
\| \| \| \| \| \| \|	This takes any color or backcolor that has unspecified shading and converts it to flat shading. Reviewed-by: Marek Olšák <[email protected]>
*	nir: move gl_nir_opt_access from glsl directory	Marek Olšák	2019-10-10	1	-1/+1
\| \| \| \| \|	Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	android: compiler/nir: build nir_divergence_analysis.c	Mauro Rossi	2019-09-28	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	Prerequisite to avoid following radv linking error happening with aco FAILED: out/target/product/x86_64/obj_x86/SHARED_LIBRARIES/vulkan.radv_intermediates/LINKED/vulkan.radv.so ... external/mesa/src/amd/compiler/aco_instruction_selection_setup.cpp:178: error: undefined reference to 'nir_divergence_analysis' clang.real: error: linker command failed with exit code 1 (use -v to see invocation) Fixes: df86c5f ("nir: add divergence analysis pass.") Signed-off-by: Mauro Rossi <[email protected]>
*	Move blob from compiler/ to util/	Jason Ekstrand	2019-09-19	1	-2/+0
\| \| \| \| \| \| \| \|	There's nothing whatsoever compiler-specific about it other than that's currently where it's used. Reviewed-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
*	nir: Carve out nir_lower_samplers from GLSL code.	Timur Kristóf	2019-09-06	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Lowering samplers is needed to produce NIR that can actually be consumed by some gallium drivers, so it doesn't make sense to to keep it only in the GLSL code. This commit introduces nir_lower_samplers to compiler/nir, while maintains the GL-specific function too. Signed-off-by: Timur Kristóf <[email protected]> Reviewed-by: Connor Abbott <[email protected]>
*	nir: merge and extend nir_opt_move_comparisons and nir_opt_move_load_ubo	Rhys Perry	2019-08-12	1	-2/+1
\| \| \| \| \| \| \| \| \| \|	v2: add to series v3: update Makefile.sources v4: don't remove a comment and break statement v4: use nir_can_move_instr Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	nir: replace nir_move_load_const() with nir_opt_sink()	Rhys Perry	2019-08-12	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is mostly the same as nir_move_load_const() but can also move undef instructions, comparisons and some intrinsics (being careful with loops). v2: actually delete nir_move_load_const.c v3: fix nir_opt_sink() usage in freedreno v3: update Makefile.sources v4: replace get_move_def with nir_can_move_instr and nir_instr_ssa_def v4: handle if uses v4: fix handling of nested loops v5: re-write adjust_block_for_loops v5: re-write setting of use_block for if uses Signed-off-by: Rhys Perry <[email protected]> Co-authored-by: Daniel Schürmann <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	nir/range-analysis: Rudimentary value range analysis pass	Ian Romanick	2019-08-05	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Most integer operations are omitted because dealing with integer overflow is hard. There are a few things that could be smarter if there was a small amount more tracking of ranges of integer types (i.e., operands are Boolean, operand values fit in 16 bits, etc.). The changes to nir_search_helpers.h are included in this patch to simplify reordering the changes to nir_opt_algebraic.py. v2: Memoize range analysis results. Without this, some shaders appear to get stuck in infinite loops. v3: Rebase on many months of Mesa changes, including 1-bit Boolean changes. v4: Rebase on "nir: Drop imov/fmov in favor of one mov instruction". v5: Use nir_alu_srcs_equal for detecting (aa). Previously just the SSA value was compared, and this incorrectly matched (a.xa.y). v6: Many code improvements including (but not limited to) better names, more comments, and better use of helper functions. All suggested by Caio. Rework the handling of several opcodes to use a table for mapping source ranges to a result range. This change fixed a bug that caused fmax(gt_zero, ge_zero) to be incorrectly recognized as ge_zero. Slightly tighten the range of fmul by recognizing that xx is gt_zero if x is gt_zero. Add similar handling for -xx. v7: Use _______ in the tables as an alias for unknown. Suggested by Caio. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
*	nir: replace lower_sincos with algebraic opt	Jonathan Marek	2019-07-24	1	-1/+0
\| \| \| \| \| \| \| \|	This version has less ops for the same precision. Signed-off-by: Jonathan Marek <[email protected]> Reviewed-by: Vasily Khoruzhick <[email protected]> Acked-by: Matt Turner <[email protected]>
*	anv,nir: Move lower_input_attachments pass from ANV to NIR.	Daniel Schürmann	2019-07-08	1	-0/+1
\| \| \| \| \|	Reviewed-by: Connor Abbott <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	nir: add pass to lower load_interpolated_input	Rob Clark	2019-07-02	1	-0/+1
\| \| \| \| \| \|	Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
*	nir/linker: add gl_nir_link_uniform_blocks.c	Alejandro Piñeiro	2019-06-30	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Adding the ability to link uniform blocks and shader storage blocks using NIR, intended for ARB_gl_spirv support. Among other things, this linking needs to take into account that everything should work without names, as they could be not present, while the GLSL IR uniform block linking was wrote with the names on its core. The other major difference compared with the GLSL IR linker is that we don't deal with layouts. There are no references to std140, std430, etc. Layouts are expressed through explicit offset, array stride and matrix stride. That simplifies how the buffer size are computed. But also means that we couldn't use the existing methods at glsl_types, so we needed to implement new methods. It is worth to note that this linking do a iteration over the glsl_types, similarly to what the linking uniforms do. A possible future improvement would be refactor both cases to try to share more code that it sharing right now. On GLSL IR there are a class visitor, specialized on each case, for that sharing. As adding a class visitor on C would more complicated, for now we are just iterating on both. Signed-off-by: Alejandro Piñeiro <[email protected]> Signed-off-by: Neil Roberts <[email protected]> Signed-off-by: Antia Puentes <[email protected]> v2: (from Timothy review) * Fix variable name convention * Stop to use _function_name convention * Don't use // for comments * "nir/linker: Keep track of the stages referencing an UBO/SSBO" squashed with this patch v3: (from Caio review) * Don't delete the linked shader on failure * Use rzalloc_array to avoid some explicit initializations Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
*	glsl/nir: Add optimization pass for access flags	Connor Abbott	2019-06-19	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Right now, this just deduces when we can arbitrarily reorder SSBO and image loads, matching the existing logic in radeonsi's TGSI->LLVM pass. This approach can't handle some things that nir_opt_copy_prop_vars can, but it can handle images, and with GCM it lets us hoist reads outside of loops. We can also pass this information to LLVM which lets it do its own optimizations on it. This is GLSL only as I haven't tested it on Vulkan yet, and it would probably need a few changes to work there. Reviewed-by: Timothy Arceri <[email protected]>
*	nir: add a vectorization pass	Connor Abbott	2019-06-18	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This effectively does the opposite of nir_lower_alus_to_scalar, trying to combine per-component ALU operations with the same sources but different swizzles into one larger ALU operation. It uses a similar model as CSE, where we do a depth-first approach and keep around a hash set of instructions to be combined, but there are a few major differences: 1. For now, we only support entirely per-component ALU operations. 2. Since it's not always guaranteed that we'll be able to combine equivalent instructions, we keep a stack of equivalent instructions around, trying to combine new instructions with instructions on the stack. The pass isn't comprehensive by far; it can't handle operations where some of the sources are per-component and others aren't, and it can't handle phi nodes. But it should handle the more common cases, and it should be reasonably efficient. [Alyssa: Rebase on latest master, updating with respect to typeless moves] Acked-by: Alyssa Rosenzweig <[email protected]> Acked-by: Jason Ekstrand <[email protected]>
*	nir: Rematerialize compare instructions	Ian Romanick	2019-05-31	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On some architectures, Boolean values used to control conditional branches or condtional selection must be propagated into a flag. This generally means that a stored Boolean value must be compared with zero. Rather than force the generation of extra compares with zero, re-emit the original comparison instruction. This can save register pressure by not needing to store the Boolean value. There are several possible ares for future improvement to this pass: 1. Be more conservative. If both sources to the comparison instruction are non-constants, it may be better for register pressure to emit the extra compare. The current shader-db results on Intel GPUs (next commit) lead me to believe that this is not currently a problem. 2. Be less conservative. Currently the pass requires that all users of the comparison match the pattern. The idea is that after the pass is complete, no instruction will use the resulting Boolean value. The only uses will be of the flag value. It may be beneficial to relax this requirement in some cases. 3. Be less conservative. Also try to rematerialize comparisons used for discard_if intrinsics. After changing the way the Intel compiler generates cod e for discard_if (see MR!935), I tried implementing this already. The changes were pretty small. Instructions were helped in 19 shaders, but, overall, cycles were hurt. A commit "nir: Rematerialize comparisons for nir_intrinsic_discard_if too" is on my fd.o cgit. 4. Copy the preceeding ALU instruction. If the comparison is a comparison with zero, and it is the only user of a particular ALU instruction (e.g., (a+b) != 0.0), it may be a further improvment to also copy the preceeding ALU instruction. On Intel GPUs, this may enable cmod propagation to make additional progress. v2: Use much simpler method to get the prev_block for an if-statement. Suggested by Tim. Reviewed-by: Matt Turner <[email protected]>
*	nir: implement lowering for fsin and fcos	Vasily Khoruzhick	2019-05-07	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	Lower sin and cos using Nick's fast sin/cos approximation from https://web.archive.org/web/20180105155939/http://forum.devmaster.net/t/fast-and-accurate-sine-cosine/9648 It's suitable for GLES2, but it throws warnings in dEQP GLES3 precision tests. Reviewed-by: Connor Abbott <[email protected]> Reviewed-by: Qiang Yu <[email protected]> Tested-by: Qiang Yu <[email protected]> Reviewed-by: Christian Gmeiner <[email protected]> Signed-off-by: Vasily Khoruzhick <[email protected]>
*	nir/flrp: Add new lowering pass for flrp instructions	Ian Romanick	2019-05-06	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \|	This pass will soon grow to include some optimizations that are difficult or impossible to implement correctly within nir_opt_algebraic. It also include the ability to generate strictly correct code which the current nir_opt_algebraic lowering lacks (though that could be changed). v2: Document the parameters to nir_lower_flrp. Rebase on top of 3766334923e ("compiler/nir: add lowering for 16-bit flrp") Reviewed-by: Matt Turner <[email protected]>
*	nir: add int_to_float lowering pass	Vasily Khoruzhick	2019-05-07	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	This new pass lowers ints and bools to floats. It allows hardware that doesn't have native integers (e.g. Mali4x0) use the same code paths as modern hardware. It uses newly introduced pass to gather SSA types and should be used as late as possible. Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Christian Gmeiner <[email protected]> Signed-off-by: Vasily Khoruzhick <[email protected]>
*	mesa: Makefile.sources: Add nir_lower_fb_read.c to Makefile.sources list	John Stultz	2019-05-06	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In commit a99c360a4630 (nir: add pass to lower fb reads), a new file was added that needs to also be added to the Makefile.sources list used by the Android and SCons build system. Cc: Rob Clark <[email protected]> Cc: Emil Velikov <[email protected]> Cc: Amit Pundir <[email protected]> Cc: Sumit Semwal <[email protected]> Cc: Alistair Strachan <[email protected]> Cc: Greg Hartman <[email protected]> Cc: Tapani Pälli <[email protected]> Cc: Jason Ekstrand <[email protected]> Fixes: a99c360a463 ("nir: add pass to lower fb reads") Reviewed-by: Tapani Pälli <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Signed-off-by: John Stultz <[email protected]>
*	nir: Add a SSA type gathering pass	Jason Ekstrand	2019-05-04	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This new pass (which isn't even compile-tested) attempts to determine the ALU type of all the SSA values in a function impl. It takes a greedy approach and assigns intness or floatness to everything it thinks can possibly contain an int or a float. Some values will be labled as both int and float and some will be labled as neither and it is up to the caller to decide what to do with this information. However, for a "nice" shader where the original source contained no bit-casts and no implicit bit-casts were introduced by optimizations, there shouldn't be any overlap in the two sets save for the odd CSEd zero constant. Reviewed-by: Vasily Khoruzhick <[email protected]>
*	nir: add rcp(w) lowering for gl_FragCoord	Andreas Baierl	2019-04-29	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	On some hardware (e.g. Mali400) the shader needs to apply some transformations for correct gl_FragCoord handling. The lowering actions look like the following in pseudocode: gl_FragCoord.xyz = gl_FragCoord_orig.xyz gl_FragCoord.w = 1.0 / gl_FragCoord_orig.w Add this lowering as a nir pass in preparation for using it in the driver. Signed-off-by: Andreas Baierl <[email protected]> Reviewed-by: Qiang Yu <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	nir: Add nir_lower_viewport_transform	Alyssa Rosenzweig	2019-04-14	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On Mali hardware (supported by Panfrost and Lima), the fixed-function transformation from world-space to screen-space coordinates is done in the vertex shader prior to writing out the gl_Position varying, rather than in dedicated hardware. This commit adds a shared NIR pass for implementing coordinate transformation and lowering gl_Position writes into screen-space gl_Position writes. v2: Run directly on derefs before io/vars are lowered to cleanup the code substantially. Thank you to Qiang for this suggestion! v3: Bikeshed continues. v4: Add to Makefile.sources (per Jason's comment). Bikeshed comment. Ian and Qiang's reviews are from v3, but no real functional changes from v4. Rob's review is from v4. Signed-off-by: Alyssa Rosenzweig <[email protected]> Suggested-by: Qiang Yu <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Qiang Yu <[email protected]> Reviewed-by: Rob Clark <[email protected]>
*	nir: Add a pass for selectively lowering variables to scratch space	Jason Ekstrand	2019-04-12	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	This commit adds new nir_load/store_scratch opcodes which read and write a virtual scratch space. It's up to the back-end to figure out what to do with it and where to put the actual scratch data. v2: Drop const_index comments (by anholt) Reviewed-by: Eric Anholt <[email protected]>
*	glsl/nir: add support for lowering bindless images_derefs	Karol Herbst	2019-04-12	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \|	v2: handle atomics as well make use of nir_rewrite_image_intrinsic v3: remove call to nir_remove_dead_derefs v4: (Timothy Arceri) dont actually call lowering yet Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> (v3) Reviewed-by: Marek Olšák <[email protected]>
*	nir: Get rid of global registers	Jason Ekstrand	2019-04-09	1	-1/+0
\| \| \| \| \| \| \| \| \|	We have a pass to lower global registers to locals and many drivers dutifully call it. However, no one ever creates a global register ever so it's all dead code. It's time we bury it. Acked-by: Karol Herbst <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	nir: Add partial redundancy elimination for compares	Ian Romanick	2019-03-28	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This pass attempts to dectect code sequences like if (x < y) { z = y - x; ... } and replace them with sequences like t = x - y; if (t < 0) { z = -t; ... } On architectures where the subtract can generate the flags used by the if-statement, this saves an instruction. It's also possible that moving an instruction out of the if-statement will allow nir_opt_peephole_select to convert the whole thing to a bcsel. Currently only floating point compares and adds are supported. Adding support for integer will be a challenge due to integer overflow. There are a couple possible solutions, but they may not apply to all architectures. v2: Fix a typo in the commit message and a couple typos in comments. Fix possible NULL pointer deref from result of push_block(). Add missing (-A + B) case. Suggested by Caio. v3: Fix is_not_const_zero to work correctly with types other than nir_type_float32. Suggested by Ken. v4: Add some comments explaining how this works. Suggested by Ken. Reviewed-by: Kenneth Graunke <[email protected]>
*	nir: Add a lowering pass for non-uniform resource access	Jason Ekstrand	2019-03-25	1	-0/+1
\| \| \| \|	Reviewed-by: Bas Nieuwenhuizen <[email protected]>