summaryrefslogtreecommitdiffstats
path: root/src/glsl/Makefile.sources
Commit message (Collapse)AuthorAgeFilesLines
* nir: add an optimization to remove useless phi nodesConnor Abbott2015-02-031-0/+1
| | | | | | | | | | | | | | | | | | | | This removes phi nodes whose sources all point to the same thing. Shader-db results: total NIR instructions in shared programs: 2045293 -> 2041209 (-0.20%) NIR instructions in affected programs: 126564 -> 122480 (-3.23%) helped: 615 HURT: 0 total FS instructions in shared programs: 4321840 -> 4320392 (-0.03%) FS instructions in affected programs: 24622 -> 23174 (-5.88%) helped: 138 HURT: 0 Reviewed-by: Jason Ekstrand <[email protected]> Tested-by: Jason Ekstrand <[email protected]> Signed-off-by: Connor Abbott <[email protected]>
* nir: Add a pass to lower vector phi nodes to scalar phi nodesJason Ekstrand2015-02-031-0/+1
| | | | | | | | | | | | | | | | | | | | | | v2 Jason Ekstrand <[email protected]>: - Add better comments - Use nir_ssa_dest_init and nir_src_for_ssa more places - Fix some void * casts v3 Jason Ekstrand <[email protected]>: - Rework the way we determine whether or not to sccalarize a phi node to make the recursion non-bogus - Treat load_const instructions as scalarizable v4 Jason Ekstrand <[email protected]>: - Allow uniform and input loads to be scalarizable v5 Jason Ekstrand <[email protected]>: - Also consider loads of inputs (varying, uniform, or ubo) to be scalarizable. We were already doing this for load_var on uniforms and inputs. Reviewed-by: Kenneth Graunke <[email protected]>
* nir: add new constant folding infrastructureJason Ekstrand2015-01-241-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | Add a required field to the Opcode class, const_expr, that contains an expression or statement that computes the result of the opcode given known constant inputs. Then take those const_expr's and expand them into a function that takes an opcode and an array of constant inputs and spits out the constant result. This means that when adding opcodes, there's one less place to update, and almost all the opcodes are self-documenting since the information on how to compute the result is right next to the definition. The helper functions in nir_constant_expressions.c were taken from ir_constant_expressions.cpp. v3 Jason Ekstrand <[email protected]> - Use mako to generate one function per opcode instead of doing piles of string splicing v4 Jason Ekstrand <[email protected]> - More comments and better indentation in the mako - Add a description of the constant expression language in nir_opcodes.py - Added nir_constant_expressions.py to EXTRA_DIST in Makefile.am Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Connor Abbott <[email protected]>
* nir: use Python to autogenerate opcode informationConnor Abbott2015-01-241-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before, we used a system where a file, nir_opcodes.h, defined some macros that were included to generate the enum values and the nir_op_infos structure. This worked pretty well, but for development the error messages were never very useful, Python tools couldn't understand the opcode list, and it was difficult to use nir_opcodes.h to do other things like autogenerate a builder API. Now, we store opcode information in nir_opcodes.py, and we have nir_opcodes_c.py to generate the old nir_opcodes.c and nir_opcodes_h.py to generate nir_opcodes.h, which contains all the enum names and gets included into nir.h like before. In addition to solving the above problems, using Python and Mako to generate everything means that it's much easier to add keep information centralized as we add new things like constant propagation that require per-opcode information. v2: - make Opcode derive from object (Dylan) - don't use assert like it's a function (Dylan) - style fixes for fnoise, use xrange (Dylan) - use iterkeys() in nir_opcodes_h.py (Dylan) - use pydoc-style comments (Jason) - don't make fmin/fmax commutative and associative yet (Jason) Signed-off-by: Connor Abbott <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> v3 Jason Ekstrand <[email protected]> - Alphabetize source file lists - Generate nir_opcodes.h in the builddir instead of the source dir - Include $(builddir)/src/glsl/nir in the i965 build - Rework nir_opcodes.h generation so it generates a complete header file instead of one that has to be embedded inside an enum declaration
* nir: Add nir_lower_alu_to_scalar.Eric Anholt2015-01-231-0/+1
| | | | | | | | | | | | | | | | This is the equivalent of brw_fs_channel_expressions.cpp, which I wanted for vc4. v2: Use the nir_src_for_ssa() helper, and another instance of nir_alu_src_copy(). v3: Drop the non-SSA support. All intended callers will have SSA-only ALU ops. v4: Use insert_before, drop stale bcsel/fcsel comment, drop now-unused unsupported() function, drop lower_context struct. v5: Completely rename the pass to nir_lower_alu_to_scalar(), add an assert about weird input_sizes[]. Reviewed-by: Jason Ekstrand <[email protected]>
* glsl: Build with subdir-objects.Matt Turner2015-01-231-170/+167
| | | | | | Apparently $(top_srcdir) is not expanded in a source list when using subdir-objects, so remove that. It's not clear to me why we were going to such lengths to prefix each source file anyway.
* nir: Add headers to distribution.Matt Turner2015-01-231-0/+2
|
* glsl: Add blob.c---a simple interface for serializing dataCarl Worth2015-01-161-0/+2
| | | | | | | | | | | | | | | This new interface allows for writing a series of objects to a chunk of memory (a "blob").. The allocated memory is maintained within the blob itself, (and re-allocated by doubling when necessary). There are also functions for reading objects from a blob as well. If code attempts to read beyond the available memory, the read functions return 0 values (or its moral equivalent) without reading past the allocated memory. Once the caller is done with the reads, it can check blob->overrun to ensure whether any invalid values were previously returned due to attempts to read too far. Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Add a worklist helper structureJason Ekstrand2015-01-151-0/+2
| | | | | | | A worklist is a common concept in optimizations. This adds a structure that we can reuse for many different types of optimizations. Reviewed-by: Connor Abbott <[email protected]>
* nir: Add a pass for lowering copy instructionsJason Ekstrand2015-01-151-0/+1
| | | | Reviewed-by: Connor Abbott <[email protected]>
* nir: Rename lower_variables to lower_vars_to_ssaJason Ekstrand2015-01-151-1/+1
| | | | | | | | The original name wasn't particularly descriptive. This one indicates that it actually gives you SSA values as opposed to the old pass which lowered variables to registers. Reviewed-by: Connor Abbott <[email protected]>
* nir: Remove the ffma peepholeJason Ekstrand2015-01-151-1/+0
| | | | | | | This is no longer needed because it's now part of the algebraic optimization pass Reviewed-by: Connor Abbott <[email protected]>
* nir: Add a basic constant folding passJason Ekstrand2015-01-151-0/+1
| | | | Reviewed-by: Connor Abbott <[email protected]>
* nir: Add an algebraic optimization passJason Ekstrand2015-01-151-1/+5
| | | | | | | | | This pass uses the previously built algebraic transformations framework and should act as an example for anyone else wanting to make an algebraic transformation pass for NIR. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Connor Abbott <[email protected]>
* nir: Add an expression matching frameworkJason Ekstrand2015-01-151-0/+2
| | | | | | | | | | | | | | | This framework provides a simple way to do simple search-and-replace operations on NIR code. The nir_search.h header provides four simple data structures for representing expressions: nir_value and four subtypes: nir_variable, nir_constant, and nir_expression. An expression tree can then be represented by nesting these data structures as needed. The nir_replace_instr function takes an instruction, an expression, and a value; if the instruction matches the expression, it is replaced with a new chain of instructions to generate the given replacement value. The framework keeps track of swizzles on sources and automatically generates the currect swizzles for the replacement value. Reviewed-by: Connor Abbott <[email protected]>
* nir: Add a lowering pass for adding source modifiers where possibleJason Ekstrand2015-01-151-0/+1
| | | | Reviewed-by: Connor Abbott <[email protected]>
* nir: Remove the old variable lowering codeJason Ekstrand2015-01-151-1/+0
| | | | Reviewed-by: Connor Abbott <[email protected]>
* nir: Add a pass to lower global variables to local variablesJason Ekstrand2015-01-151-0/+1
| | | | Reviewed-by: Connor Abbott <[email protected]>
* nir: Add a pass for lowering input/output loads/storesJason Ekstrand2015-01-151-0/+1
| | | | Reviewed-by: Connor Abbott <[email protected]>
* nir: Add a pass to lower local variables to registersJason Ekstrand2015-01-151-0/+1
| | | | Reviewed-by: Connor Abbott <[email protected]>
* nir: Add a pass to lower local variable accesses to SSA valuesJason Ekstrand2015-01-151-0/+1
| | | | | | | | | This pass analizes all of the load/store operations and, when a variable is never aliased (potentially used by an indirect operation), it is lowered directly to an SSA value. This pass translates to SSA directly and does not require any fixup by the original to-SSA pass. Reviewed-by: Connor Abbott <[email protected]>
* nir: Add a copy splitting passJason Ekstrand2015-01-151-0/+1
| | | | Reviewed-by: Connor Abbott <[email protected]>
* nir: Add a basic CSE passJason Ekstrand2015-01-151-0/+1
| | | | | | | This pass is still fairly basic. It only handles ALU operations, constant loads, and phi nodes. No texture ops or intrinsics yet. Reviewed-by: Connor Abbott <[email protected]>
* nir: Add a fused multiply-add peepholeJason Ekstrand2015-01-151-0/+1
|
* nir: Add a peephole select optimizationJason Ekstrand2015-01-151-0/+1
| | | | Reviewed-by: Connor Abbott <[email protected]>
* nir: Add an SSA-based liveness analysis pass.Jason Ekstrand2015-01-151-0/+1
| | | | Reviewed-by: Connor Abbott <[email protected]>
* nir: Add a basic metadata management systemJason Ekstrand2015-01-151-0/+1
| | | | Reviewed-by: Connor Abbott <[email protected]>
* nir: Add a lower_vec_to_movs passJason Ekstrand2015-01-151-0/+1
| | | | Reviewed-by: Connor Abbott <[email protected]>
* nir: Add a naieve from-SSA passJason Ekstrand2015-01-151-0/+1
| | | | | | | This pass is kind of stupidly implemented but it should be enough to get us up and going. We probably want something better that doesn't generate all of the redundant moves eventually. However, the i965 backend should be able to handle the movs, so I'm not too worried about it in the short term.
* nir: add an SSA-based dead code elimination passConnor Abbott2015-01-151-0/+1
| | | | | v2: Jason Ekstrand <[email protected]>: whitespace fixes
* nir: add an SSA-based copy propagation passConnor Abbott2015-01-151-0/+1
|
* nir: add a pass to convert to SSAConnor Abbott2015-01-151-0/+1
| | | | | v2: Jason Ekstrand <[email protected]>: whitespace fixes
* nir: calculate dominance informationConnor Abbott2015-01-151-0/+1
|
* nir: add an optimization to turn global registers into local registersConnor Abbott2015-01-151-0/+1
| | | | | After linking and inlining, this allows us to convert these registers into SSA values and optimise more code.
* nir: add a pass to lower atomicsConnor Abbott2015-01-151-0/+1
| | | | | v2: Jason Ekstrand <[email protected]> whitespace fixes
* nir: add a pass to lower system value readsConnor Abbott2015-01-151-0/+1
| | | | | v2: Jason Ekstrand <[email protected]>: whitespace fixes
* nir: add a pass to lower sampler instructionsConnor Abbott2015-01-151-0/+1
|
* nir: add a pass to remove unused variablesConnor Abbott2015-01-151-0/+1
| | | | | | | | After we lower variables, we want to delete them in order to free up some memory. v2: Jason Ekstrand <[email protected]>: whitespace fixes
* nir: add a pass to lower variables for scalar backendsConnor Abbott2015-01-151-0/+1
|
* nir: add a glsl-to-nir passConnor Abbott2015-01-151-1/+2
| | | | | | v2: Jason Ekstrand <[email protected]>: Make glsl_to_nir build again fix whitespace
* nir: add a validation passConnor Abbott2015-01-151-0/+1
| | | | | | | This is similar to ir_validate.cpp. v2: Jason Ekstrand <[email protected]>: whitespace fixes
* nir: add a printerConnor Abbott2015-01-151-0/+1
| | | | | | | This is similar to ir_print_visitor.cpp. v2: Jason Ekstrand <[email protected]>: whitespace fixes
* nir: add core helper functionsConnor Abbott2015-01-151-3/+7
| | | | | | | | | These include functions for adding and removing various bits of IR and helpers for iterating over all the sources and destinations of an instruction. This is similar to ir.cpp. v2: Jason Ekstrand <[email protected]>: whitespace and automake fixes
* nir: add the core datastructuresConnor Abbott2015-01-151-0/+2
| | | | | | | | | | | | | This includes all the instructions, ifs, loops, functions, etc. This is similar to the information in ir.h. v2: Jason Ekstrand <[email protected]>: Include ralloc and hash_table from the util directory whitespace fixes Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-By glenn.kennard <[email protected]>
* nir: add a simple C wrapper around glsl_types.hConnor Abbott2015-01-151-0/+3
| | | | | | | v2: Jason Ekstrand <[email protected]>: whitespace and automake fixes Reviewed-by: Eric Anholt <[email protected]>
* glsl: Add headers to distribution.Matt Turner2014-12-121-2/+29
|
* glsl: Lower constant arrays to uniform arrays.Kenneth Graunke2014-11-061-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Consider GLSL code such as: const ivec2 offsets[] = ivec2[](ivec2(-1, -1), ivec2(-1, 0), ivec2(-1, 1), ivec2(0, -1), ivec2(0, 0), ivec2(0, 1), ivec2(1, -1), ivec2(1, 0), ivec2(1, 1)); ivec2 offset = offsets[<non-constant expression>]; Both i965 and nv50 currently handle this very poorly. On i965, this becomes a pile of MOVs to load the immediate constants into registers, a pile of scratch writes to move the whole array to memory, and one scratch read to actually access the value - effectively the same as if it were a non-constant array. We'd much rather upload large blocks of constant data as uniform data, so drivers can simply upload the data via constbufs, and not have to populate it via shader instructions. This is currently non-optional because both i965 and nouveau benefit from it, and according to Marek radeonsi would benefit today as well. (According to Tom, radeonsi may want to handle this itself in the long term, but we can always add a flag when it becomes useful.) Improves performance in a terrain rendering microbenchmark by about 2x, and cuts the number of instructions in about half. Helps a lot of "Natural Selection 2" shaders, as well as one "HOARD" shader. total instructions in shared programs: 5473459 -> 5471765 (-0.03%) instructions in affected programs: 5880 -> 4186 (-28.81%) v2: Use ir_var_hidden to avoid exposing the new uniform via the GL uniform introspection API. v3: Alphabetize Makefile.sources properly. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77957 Signed-off-by: Kenneth Graunke <[email protected]>
* util: add _mesa_strtod and _mesa_strtofChia-I Wu2014-10-301-2/+1
| | | | | | | | | Both core mesa and glsl have their own wrappers for strtof_l. Merge and move them to util/. They are compiled with a C++ compiler so that we can make them thread-safe in a following commit. Signed-off-by: Chia-I Wu <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* glsl: Optimize min/max expression treesIago Toral Quiroga2014-10-071-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Original patch by Petri Latvala <[email protected]>: Add an optimization pass that drops min/max expression operands that can be proven to not contribute to the final result. The algorithm is similar to alpha-beta pruning on a minmax search, from the field of AI. This optimization pass can optimize min/max expressions where operands are min/max expressions. Such code can appear in shaders by itself, or as the result of clamp() or AMD_shader_trinary_minmax functions. This optimization pass improves the generated code for piglit's AMD_shader_trinary_minmax tests as follows: total instructions in shared programs: 75 -> 67 (-10.67%) instructions in affected programs: 60 -> 52 (-13.33%) GAINED: 0 LOST: 0 All tests (max3, min3, mid3) improved. A full shader-db run: total instructions in shared programs: 4293603 -> 4293575 (-0.00%) instructions in affected programs: 1188 -> 1160 (-2.36%) GAINED: 0 LOST: 0 Improvements happen in Guacamelee and Serious Sam 3. One shader from Dungeon Defenders is hurt by shader-db metrics (26 -> 28), because of dropping of a (constant float (0.00000)) operand, which was compiled to a saturate modifier. Version 2 by Iago Toral Quiroga <[email protected]>: Changes from review feedback: - Squashed various cosmetic changes sent by Matt Turner. - Make less_all_components return an enum rather than setting a class member. (Suggested by Mat Turner). Also, renamed it to compare_components. - Make less_all_components, smaller_constant and larger_constant static. (Suggested by Mat Turner) - Change mixmax_range to call its limits "low" and "high" instead of "range[0]" and "range[1]". (Suggested by Connor Abbot). - Use ir_builder swizzle helpers in swizzle_if_required(). (Suggested by Connor Abbot). - Make the logic more clearer by rearrenging the code and commenting. (Suggested by Connor Abbot). - Added comment to explain why we need to recurse twice. (Suggested by Connor Abbot). - If we cannot prune an expression, do not return early. Instead, attempt to prune its children. (Suggested by Connor Abbot). Other changes: - Instead of having a global "valid" visitor member, let the various functions that can determine this status return a boolean and check for its value to decide what to do in each case. This is more flexible and allows to recurse into children of parents that could not be prunned due to invalid ranges (so related to the last bullet in the review feedback). - Make sure we always check if a range is valid before working with it. Since any use of get_range, combine_range or range_intersection can invalidate a range we should check for this situation every time we use any of these functions. Version 3 by Iago Toral Quiroga <[email protected]>: Changes from review feedback: - Now we can make get_range, combine_range and range_intersection static too (suggested by Connor Abbot). - Do not return NULL when looking for the larger or greater constant into mixed vector constants. Instead, produce a new constant by doing a component-wise minmax. With this we can also remove of the validations when we call into these functions (suggested by Connor Abbot). - Add a comment explaining the meaning of the baserange argument in prune_expression (suggested by Connor Abbot). Other changes: - Eliminate minmax expressions operating on constant vectors with mixed values by resolving them. No piglit regressions observed with Version 3. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76861 Reviewed-by: Connor Abbott <[email protected]>
* glsl: Eliminate unused built-in variables after compilationIan Romanick2014-09-301-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After compilation (and before linking) we can eliminate quite a few built-in variables. Basically, any uniform or constant (e.g., gl_MaxVertexTextureImageUnits) that isn't used (with one exception) can be eliminated. System values, vertex shader inputs (with one exception), and fragment shader outputs that are not used and not re-declared in the shader text can also be removed. gl_ModelViewProjectMatrix and gl_Vertex are used by the built-in function ftransform. There are some complications with eliminating these variables (see the comment in the patch), so they are not eliminated. Valgrind massif results for a trimmed apitrace of dota2: n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B) Before (32-bit): 46 40,661,487,174 75,116,800 68,854,065 6,262,735 0 After (32-bit): 50 40,564,927,443 69,185,408 63,683,871 5,501,537 0 Before (64-bit): 64 37,200,329,700 104,872,672 96,514,546 8,358,126 0 After (64-bit): 59 36,822,048,449 96,526,888 89,113,000 7,413,888 0 A real savings of 4.9MiB on 32-bit and 7.0MiB on 64-bit. v2: Don't remove any built-in with Transpose in the name. v3: Fix comment typo noticed by Anuj. Signed-off-by: Ian Romanick <[email protected]> Suggested-by: Eric Anholt <[email protected]> Reviewed-by: Matt Turner <[email protected]> Acked-by: Anuj Phogat <[email protected]> Cc: Eric Anholt <[email protected]>