aboutsummaryrefslogtreecommitdiffstats
path: root/src/glsl/ir_optimization.h
Commit message (Collapse)AuthorAgeFilesLines
* glsl: Lower constant arrays to uniform arrays.Kenneth Graunke2014-11-061-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Consider GLSL code such as: const ivec2 offsets[] = ivec2[](ivec2(-1, -1), ivec2(-1, 0), ivec2(-1, 1), ivec2(0, -1), ivec2(0, 0), ivec2(0, 1), ivec2(1, -1), ivec2(1, 0), ivec2(1, 1)); ivec2 offset = offsets[<non-constant expression>]; Both i965 and nv50 currently handle this very poorly. On i965, this becomes a pile of MOVs to load the immediate constants into registers, a pile of scratch writes to move the whole array to memory, and one scratch read to actually access the value - effectively the same as if it were a non-constant array. We'd much rather upload large blocks of constant data as uniform data, so drivers can simply upload the data via constbufs, and not have to populate it via shader instructions. This is currently non-optional because both i965 and nouveau benefit from it, and according to Marek radeonsi would benefit today as well. (According to Tom, radeonsi may want to handle this itself in the long term, but we can always add a flag when it becomes useful.) Improves performance in a terrain rendering microbenchmark by about 2x, and cuts the number of instructions in about half. Helps a lot of "Natural Selection 2" shaders, as well as one "HOARD" shader. total instructions in shared programs: 5473459 -> 5471765 (-0.03%) instructions in affected programs: 5880 -> 4186 (-28.81%) v2: Use ir_var_hidden to avoid exposing the new uniform via the GL uniform introspection API. v3: Alphabetize Makefile.sources properly. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77957 Signed-off-by: Kenneth Graunke <[email protected]>
* glsl: Optimize min/max expression treesIago Toral Quiroga2014-10-071-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Original patch by Petri Latvala <[email protected]>: Add an optimization pass that drops min/max expression operands that can be proven to not contribute to the final result. The algorithm is similar to alpha-beta pruning on a minmax search, from the field of AI. This optimization pass can optimize min/max expressions where operands are min/max expressions. Such code can appear in shaders by itself, or as the result of clamp() or AMD_shader_trinary_minmax functions. This optimization pass improves the generated code for piglit's AMD_shader_trinary_minmax tests as follows: total instructions in shared programs: 75 -> 67 (-10.67%) instructions in affected programs: 60 -> 52 (-13.33%) GAINED: 0 LOST: 0 All tests (max3, min3, mid3) improved. A full shader-db run: total instructions in shared programs: 4293603 -> 4293575 (-0.00%) instructions in affected programs: 1188 -> 1160 (-2.36%) GAINED: 0 LOST: 0 Improvements happen in Guacamelee and Serious Sam 3. One shader from Dungeon Defenders is hurt by shader-db metrics (26 -> 28), because of dropping of a (constant float (0.00000)) operand, which was compiled to a saturate modifier. Version 2 by Iago Toral Quiroga <[email protected]>: Changes from review feedback: - Squashed various cosmetic changes sent by Matt Turner. - Make less_all_components return an enum rather than setting a class member. (Suggested by Mat Turner). Also, renamed it to compare_components. - Make less_all_components, smaller_constant and larger_constant static. (Suggested by Mat Turner) - Change mixmax_range to call its limits "low" and "high" instead of "range[0]" and "range[1]". (Suggested by Connor Abbot). - Use ir_builder swizzle helpers in swizzle_if_required(). (Suggested by Connor Abbot). - Make the logic more clearer by rearrenging the code and commenting. (Suggested by Connor Abbot). - Added comment to explain why we need to recurse twice. (Suggested by Connor Abbot). - If we cannot prune an expression, do not return early. Instead, attempt to prune its children. (Suggested by Connor Abbot). Other changes: - Instead of having a global "valid" visitor member, let the various functions that can determine this status return a boolean and check for its value to decide what to do in each case. This is more flexible and allows to recurse into children of parents that could not be prunned due to invalid ranges (so related to the last bullet in the review feedback). - Make sure we always check if a range is valid before working with it. Since any use of get_range, combine_range or range_intersection can invalidate a range we should check for this situation every time we use any of these functions. Version 3 by Iago Toral Quiroga <[email protected]>: Changes from review feedback: - Now we can make get_range, combine_range and range_intersection static too (suggested by Connor Abbot). - Do not return NULL when looking for the larger or greater constant into mixed vector constants. Instead, produce a new constant by doing a component-wise minmax. With this we can also remove of the validations when we call into these functions (suggested by Connor Abbot). - Add a comment explaining the meaning of the baserange argument in prune_expression (suggested by Connor Abbot). Other changes: - Eliminate minmax expressions operating on constant vectors with mixed values by resolving them. No piglit regressions observed with Version 3. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76861 Reviewed-by: Connor Abbott <[email protected]>
* glsl: Eliminate unused built-in variables after compilationIan Romanick2014-09-301-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After compilation (and before linking) we can eliminate quite a few built-in variables. Basically, any uniform or constant (e.g., gl_MaxVertexTextureImageUnits) that isn't used (with one exception) can be eliminated. System values, vertex shader inputs (with one exception), and fragment shader outputs that are not used and not re-declared in the shader text can also be removed. gl_ModelViewProjectMatrix and gl_Vertex are used by the built-in function ftransform. There are some complications with eliminating these variables (see the comment in the patch), so they are not eliminated. Valgrind massif results for a trimmed apitrace of dota2: n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B) Before (32-bit): 46 40,661,487,174 75,116,800 68,854,065 6,262,735 0 After (32-bit): 50 40,564,927,443 69,185,408 63,683,871 5,501,537 0 Before (64-bit): 64 37,200,329,700 104,872,672 96,514,546 8,358,126 0 After (64-bit): 59 36,822,048,449 96,526,888 89,113,000 7,413,888 0 A real savings of 4.9MiB on 32-bit and 7.0MiB on 64-bit. v2: Don't remove any built-in with Transpose in the name. v3: Fix comment typo noticed by Anuj. Signed-off-by: Ian Romanick <[email protected]> Suggested-by: Eric Anholt <[email protected]> Reviewed-by: Matt Turner <[email protected]> Acked-by: Anuj Phogat <[email protected]> Cc: Eric Anholt <[email protected]>
* glsl: Add a lowering pass for gl_VertexIDIan Romanick2014-09-101-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | Converts gl_VertexID to (gl_VertexIDMESA + gl_BaseVertex). gl_VertexIDMESA is backed by SYSTEM_VALUE_VERTEX_ID_ZERO_BASE, and gl_BaseVertex is backed by SYSTEM_VALUE_BASE_VERTEX. v2: Put the enum in struct gl_constants and propoerly resolve the scope in C++ code. Fix suggested by Marek. v3: Reabase on Matt's foreach_in_list changes (was using foreach_list). v4 (Ken): Use a systemvalue instead of a uniform because STATE_BASE_VERTEX has been removed. v5: Use a boolean to select lowering, and only allow one lowering method. Suggested by Ken. v6 (Ken): Replace strcmp against literal "gl_BaseVertex"/"gl_VertexID" with SYSTEM_VALUE enum checks, for efficiency. v7: Rebase on context constant initialization work. Signed-off-by: Ian Romanick <[email protected]> Signed-off-by: Kenneth Graunke <[email protected]>
* glsl: Add a pass to lower ir_unop_saturate to clamp(x, 0, 1)Abdiel Janulgue2014-08-311-0/+1
| | | | | | Signed-off-by: Abdiel Janulgue <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* glsl: Pass in options to do_algebraic().Matt Turner2014-06-191-1/+2
| | | | | | Will be used in the next commit. Reviewed-by: Eric Anholt <[email protected]>
* glsl: Rebalance expression trees that are reduction operations.Matt Turner2014-06-191-0/+1
| | | | | | | | | | The intention of this pass was to give us better instruction scheduling opportunities, but it unexpectedly reduced some instruction counts as well: total instructions in shared programs: 1666639 -> 1666073 (-0.03%) instructions in affected programs: 54612 -> 54046 (-1.04%) (and trades 4 SIMD16 programs in SS3)
* glsl: add lowering passes for carry/borrowIlia Mirkin2014-05-021-0/+2
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* glsl: Remove varying "base" parametersIan Romanick2014-05-021-1/+1
| | | | | | | | | | In February 2013 Paul unified the values used for shader stage outputs and shader stage inputs. See commits 8a076c5f0^..eed6baf76. Since that time, the location_base parameters are always VARYING_SLOT_VAR0. Instead of passing that around, just hard code it. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* glsl: Drop do_common_optimization's max_unroll_iterations parameter.Kenneth Graunke2014-04-111-1/+0
| | | | | | | | | | | | Now that we pass in gl_shader_compiler_options, it makes sense to just use options->MaxUnrollIterations, rather than passing a separate parameter. Half of the invocations already passed options->MaxUnrollIterations, while the other half passed in a hardcoded value of 32. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* glsl: Pass ctx->Const.NativeIntegers to do_algebraic.Kenneth Graunke2014-04-081-1/+1
| | | | | | | | | The next patch will introduce an optimization that only works when integers are not represented as floating point values. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* glsl: Pass ctx->Const.NativeIntegers to do_common_optimization().Kenneth Graunke2014-04-081-1/+2
| | | | | | | | | | | The next few patches will introduce an optimization that only works when integers are not represented as floating point values. v2: Re-word-wrap a line, as requested by Ian Romanick. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* glsl: Delete LRP_TO_ARITH lowering pass flag.Kenneth Graunke2014-02-261-3/+2
| | | | | | | | | | | | | | Tt's kind of a trap---calling do_common_optimization() after lower_instructions() may cause opt_algebraic() to reintroduce ir_triop_lrp expressions that were lowered, effectively defeating the point. Because of this, nobody uses it. v2: Delete more code (caught by Ian Romanick). Cc: "10.1" <[email protected]> Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Acked-by: Eric Anholt <[email protected]>
* glsl/i965: move lower_offset_array up to GLSL compiler level.Dave Airlie2014-02-251-0/+1
| | | | | | | | This lowering pass will be useful for gallium drivers as well, in order to support the GL TG4 oddity that is textureGatherOffsets. Reviewed-by: Chris Forbes <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* glsl: Vectorize multiple scalar assignmentsMatt Turner2014-01-211-0/+1
| | | | | | | | | | Reduces vertex shader instruction counts in DOTA2 by 6.42%, L4D2 by 4.61%, and CS:GO by 5.71%. total instructions in shared programs: 1500153 -> 1498191 (-0.13%) instructions in affected programs: 59919 -> 57957 (-3.27%) Reviewed-by: Ian Romanick <[email protected]>
* glsl/loops: Get rid of lower_bounded_loops and ir_loop::normative_bound.Paul Berry2013-12-091-1/+0
| | | | | | | | Now that loop_controls no longer creates normatively bound loops, there is no need for ir_loop::normative_bound or the lower_bounded_loops pass. Reviewed-by: Ian Romanick <[email protected]>
* glsl/loops: consolidate bounded loop handling into a lowering pass.Paul Berry2013-12-091-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, all of the back-ends (ir_to_mesa, st_glsl_to_tgsi, and the i965 fs and vec4 visitors) had nearly identical logic for handling bounded loops. This replaces the duplicate logic with an equivalent lowering pass that is used by all the back-ends. Note: on i965, there is a slight increase in instruction count. For example, a loop like this: for (int i = 0; i < 100; i++) { total += i; } would previously compile down to this (vec4) native code: mov(8) g4<1>.xD 0D mov(8) g8<1>.xD 0D loop: cmp.ge.f0(8) null g8<4;4,1>.xD 100D (+f0) break(8) add(8) g5<1>.xD g5<4;4,1>.xD g4<4;4,1>.xD add(8) g8<1>.xD g8<4;4,1>.xD 1D add(8) g4<1>.xD g4<4;4,1>.xD 1D while(8) loop After this patch, the "(+f0) break(8)" turns into: (+f0) if(8) break(8) endif(8) because the back-end isn't smart enough to recognize that "if (condition) break;" can be done using a conditional break instruction. However, it should be relatively easy for a future peephole optimization to properly optimize this. Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* glsl: Add a CSE pass.Eric Anholt2013-11-011-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | This only operates on constant/uniform values for now, because otherwise I'd have to deal with killing my available CSE entries when assignments happen, and getting even this working in the tree ir was painful enough. As is, it has the following effect in shader-db: total instructions in shared programs: 1524077 -> 1521964 (-0.14%) instructions in affected programs: 50629 -> 48516 (-4.17%) GAINED: 0 LOST: 0 And, for tropics, that accounts for most of the effect, the FPS improvement is 11.67% +/- 0.72% (n=3). v2: Use read_only field of the variable, manually check the lod_info union members, use get_num_operands(), rename cse_operands_visitor to is_cse_candidate_visitor, move all is-a-candidate logic to that function, and call it before checking for CSE on a given rvalue, more comments, use private keyword. Reviewed-by: Paul Berry <[email protected]>
* glsl: Add ldexp_to_arith lowering pass.Matt Turner2013-09-171-0/+1
| | | | Reviewed-by: Paul Berry <[email protected]>
* glsl: don't eliminate texcoords that can be set by GL_COORD_REPLACEMarek Olšák2013-08-181-1/+1
| | | | | | | | | Tested by examining generated TGSI shaders from piglit/glsl-routing. Cc: [email protected] Reviewed-by: Henri Verbeet <[email protected]> Tested-by: Henri Verbeet <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* glsl/linker: Properly pack GS input varyings.Paul Berry2013-08-011-1/+1
| | | | | | | | Since geometry shader inputs are arrays (where the array index indicates which vertex is being examined), varying packing needs to treat them differently. Reviewed-by: Ian Romanick <[email protected]>
* glsl: Remove comma at end of enumerator list.Vinson Lee2013-07-171-1/+1
| | | | | | | | | Fixes this build error on OpenBSD 5.3. In file included from ../../src/mesa/main/ff_fragment_shader.cpp:53: ./../glsl/ir_optimization.h:64: error: comma at end of enumerator list Signed-off-by: Vinson Lee <[email protected]>
* glsl/linker: eliminate unused and set-but-unused built-in varyingsMarek Olšák2013-07-021-0/+4
| | | | | | | | | | | | | This eliminates built-in varyings such as gl_Color, gl_SecondaryColor, gl_TexCoord, and gl_FogFragCoord if they are unused by the next stage or not written at all (e.g. gl_TexCoord elements). The gl_TexCoord array is broken down into separate vec4s if needed. v2: - use a switch statement in varying_info_visitor::visit(ir_variable*) - use snprintf - disable the optimization for GLES2 Reviewed-by: Ian Romanick <[email protected]>
* glsl linker: remove interface block instance namesJordan Justen2013-05-231-0/+1
| | | | | | | | Convert interface blocks with instance names into flat interface blocks without an instance name. Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* glsl: Add lowering pass for ir_triop_vector_insertIan Romanick2013-05-131-0/+1
| | | | | | | | | | | | | | | | | | This will eventually replace do_vec_index_to_cond_assign. This lowering pass is called in all the places where do_vec_index_to_cond_assign or do_vec_index_to_swizzle is called. v2: Use WRITEMASK_* instead of integer literals. Use a more concise method of generating broadcast_index. Both suggested by Eric. v3: Use a series of scalar compares instead of a single vector compare. Suggested by Eric and Ken. It still uses 'if (cond) v.x = y;' instead of conditional assignments because ir_builder doesn't do conditional assignments, and I'd rather keep the code simple. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* glsl: Add a pass to flip matrix/vector multiplies to use dot products.Kenneth Graunke2013-05-121-0/+1
| | | | | | | | | | | | | | | | | | | | | | This pass flips (matrix * vector) operations to (vector * matrixTranspose) for certain built-in matrices (currently gl_ModelViewProjectionMatrix and gl_TextureMatrix). This is equivalent, but results in dot products rather than multiplies and adds. On some hardware, this is more efficient. This pass is conditionalized on ctx->mvp_with_dp4, the flag drivers set to indicate they prefer dot products. Improves performance in Lightsmark by 1.01131% +/- 0.162069% (n = 10) on a Haswell GT2 system. Passes Piglit on Ivybridge. v2: Use struct gl_shader_compiler_options instead of plumbing through another boolean flag for this purpose. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* glsl: Pass struct shader_compiler_options into do_common_optimization.Kenneth Graunke2013-05-121-1/+2
| | | | | | | | | | | | | do_common_optimization may need to make choices about whether to emit certain kinds of instructions. gl_context::ShaderCompilerOptions contains exactly that information, so it makes sense to pass it in. Rather than passing the whole array, pass the structure for the stage that's currently being worked on. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* glsl: Add a pass to lower bitfield-insert into bfm+bfi.Matt Turner2013-05-061-0/+1
| | | | | | | | | | i965/Gen7+ and Radeon/Evergreen+ have bfm/bfi instructions to implement bitfieldInsert() from ARB_gpu_shader5. v2: Add ir_binop_bfm and ir_triop_bfi to st_glsl_to_tgsi.cpp. Remove spurious temporary assignment and dereference. Reviewed-by: Chris Forbes <[email protected]>
* glsl: Add an optimization pass to flatten simple nested if blocks.Kenneth Graunke2013-04-041-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GLBenchmark 2.7's shaders contain conditional blocks like: if (x) { if (y) { ... } } where the outer conditional's then clause contains exactly one statement (the nested if) and there are no else clauses. This can easily be optimized into: if (x && y) { ... } This saves a few instructions in GLBenchmark 2.7: total instructions in shared programs: 11833 -> 11649 (-1.55%) instructions in affected programs: 8234 -> 8050 (-2.23%) It also helps CS:GO slightly (-0.05%/-0.22%). More importantly, however, it simplifies the control flow graph, which could enable other optimizations. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* glsl: Convert mix() to use a new ir_triop_lrp opcode.Kenneth Graunke2013-02-281-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | Many GPUs have an instruction to do linear interpolation which is more efficient than simply performing the algebra necessary (two multiplies, an add, and a subtract). Pattern matching or peepholing this is more desirable, but can be tricky. By using an opcode, we can at least make shaders which use the mix() built-in get the more efficient behavior. Currently, all consumers lower ir_triop_lrp. Subsequent patches will actually generate different code. v2 [mattst88]: - Add LRP_TO_ARITH flag to ir_to_mesa.cpp. Will be removed in a subsequent patch and ir_triop_lrp translated directly. v3 [mattst88]: - Move changes from the next patch to opt_algebraic.cpp to accept 3-src operations. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Signed-off-by: Kenneth Graunke <[email protected]>
* glsl: Add support for lowering 4x8 pack/unpack operationsMatt Turner2013-01-251-0/+6
| | | | | | | Lower them to arithmetic and bit manipulation expressions. Reviewed-by: Chad Versace <[email protected]> Reviewed-by: Paul Berry <[email protected]>
* glsl: Add lowering pass for GLSL ES 3.00 pack/unpack operations (v4)Chad Versace2013-01-241-0/+20
| | | | | | | | | | | | | | | | Lower them to arithmetic and bit manipulation expressions. v2: Rewrite using ir_builder [for idr]. v3: Comment typos. [for mattst88] v4: Fix arithmetic error in comments. Factor out a shift instruction. Don't heap allocate factory.instructions. [for paul] Reviewed-by: Ian Romanick <[email protected]> (v2) Reviewed-by: Matt Tuner <[email protected]> (v3) Reviewed-by: Paul Berry <[email protected]> (v4) Signed-off-by: Chad Versace <[email protected]>
* glsl: Add a lowering pass for packing varyings.Paul Berry2012-12-141-0/+3
| | | | | | | | | | | | | | This lowering pass generates GLSL code that manually packs varyings into vec4 slots, for the benefit of back-ends that don't support packed varyings natively. No functional change--the lowering pass is not yet used. Reviewed-by: Eric Anholt <[email protected]> v2: Don't use ir_hierarchical_visitor--just loop over instructions directly. Also, make the names of the packed varyings include the names of the original varyings that were packed into them.
* glsl/lower_clip_distance: Update symbol table.Paul Berry2012-12-141-1/+1
| | | | | | | | | | | | | This patch modifies the clip distance lowering pass so that the new symbol it generates (glClipDistanceMESA) is added to the shader's symbol table. This will allow a later patch to modify the linker so that it finds transform feedback varyings using the symbol table rather than having to iterate through all the declarations in the shader. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* glsl: Add a lowering pass to turn complicated UBO references to vector loads.Eric Anholt2012-08-071-0/+1
| | | | | | | | | | | v2: Reduce the impenetrable code in emit_ubo_loads() by 23 lines by keeping the ir_variable as the variable part of the offset from handle_rvalue(), and track the constant offsets from that with a plain old integer value, avoiding a bunch of temporary variables in the array and struct handling. Also, fix file description doxygen. v3: Fix a row vs col typo, and fix spelling in a comment. Reviewed-by: Eric Anholt <[email protected]>
* glsl: Fix lower_discard_flow prototype mismatch.José Fonseca2012-05-151-1/+1
| | | | Should fix MSVC link failure.
* glsl: Implement the GLSL 1.30+ discard control flow rule in GLSL IR.Eric Anholt2012-05-141-0/+1
| | | | | | | Previously, I tried implementing this in the i965 driver, but did so in a way that violated the intent of the spec, and broke Tropics. Reviewed-by: Kenneth Graunke <[email protected]>
* glsl: Add an array splitting pass.Eric Anholt2012-04-111-0/+1
| | | | | | | | | | | | | | | | | I've had this code laying around almost done for a long time. The idea is like opt_structure_splitting, that we've got a bunch of transforms at the GLSL IR level that only understand scalars and vectors, which just skip complicated dereferences. While driver backends may manage some optimization after they split matrices up themselves, it would be better to bring all of our optimization to bear on the problem. While I wasn't expecting changes quite yet, a few programs end up winning: a gstreamer convolution shader, and the Humus dynamic branching demo: Total instructions: 269430 -> 269342 3/2148 programs affected (0.1%) 1498 -> 1410 instructions in affected programs (5.9% reduction)
* glsl: Add a lowering pass to remove reads of shader output variables.Vincent Lejeune2012-01-061-0/+1
| | | | | | | | | This is similar to Gallium's existing glsl_to_tgsi::remove_output_read lowering pass, but done entirely inside the GLSL compiler. Signed-off-by: Vincent Lejeune <[email protected]> Signed-off-by: Kenneth Graunke <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* glsl: Add uniform_locations_assigned parameter to do_dead_code opt passIan Romanick2011-10-251-2/+4
| | | | | | | | | | | | | | | | | Setting this flag prevents declarations of uniforms from being removed from the IR. Since the IR is directly used by several API functions that query uniforms in shaders, uniform declarations cannot be removed after the locations have been set. However, it should still be safe to reorder the declarations (this is not tested). Signed-off-by: Ian Romanick <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=41980 Tested-by: Brian Paul <[email protected]> Reviewed-by: Bryan Cain <[email protected]> Cc: Vinson Lee <[email protected]> Cc: José Fonseca <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Yuanhan Liu <[email protected]>
* glsl: Implement a lowering pass for gl_ClipDistance.Paul Berry2011-09-231-0/+1
| | | | | | | | | | | | | | | | | | | In i965 GEN6+ (and I suspect most other hardware), gl_ClipDistance needs to be laid out as a pair of vec4's (the first containing clip distances 0-3, and the second containing clip distances 4-7). However, it is declared in GLSL as an array of 8 floats. This lowering pass acts at the GLSL level, modifying the declaration of gl_ClipDistance so that it is an array of vec4's rather than an array of floats, and renaming it to gl_ClipDistanceMESA. In addition, it modifies all accesses to the array so that they access the appropiate component of one of the vec4's. Since some hardware may not internally represent gl_ClipDistance as a pair of vec4's, this lowering pass is optional. To enable it, set the LowerClipDistance flag in gl_shader_compiler_options to true. Reviewed-by: Kenneth Graunke <[email protected]>
* glsl: Use a separate div_to_mul_rcp lowering flag for integers.Bryan Cain2011-08-311-6/+7
| | | | | | | | | | | | | | Using multiply and reciprocal for integer division involves potentially lossy floating point conversions. This is okay for older GPUs that represent integers as floating point, but undesirable for GPUs with native integer division instructions. TGSI, for example, has UDIV/IDIV instructions for integer division, so it makes sense to handle this directly. Likewise for i965. Reviewed-by: Ian Romanick <[email protected]> Signed-off-by: Bryan Cain <[email protected]> Signed-off-by: Kenneth Graunke <[email protected]>
* glsl: Factor out code that generates block of index comparisonsIan Romanick2011-07-231-0/+4
| | | | Reviewed-by: Eric Anholt <[email protected]>
* glsl: Remove unused function prototypes.Paul Berry2011-07-081-2/+0
| | | | | No functional change. Remove prototypes for do_mod_to_fract() and do_sub_to_add_neg(), which haven't existed since November 2010.
* glsl: Add a new opt_copy_propagation variant that does it channel-wise.Eric Anholt2011-02-041-0/+1
| | | | | | | | | | This patch cleans up many of the extra copies in GLSL IR introduced by i965's scalarizing passes. It doesn't result in a statistically significant performance difference on nexuiz high settings (n=3) or my demo (n=10), due to brw_fs.cpp's register coalescing covering most of those extra moves anyway. However, it does make the debug of wine's GLSL shaders much more tractable, and reduces instruction count of glsl-fs-convolution-2 from 376 to 288.
* glsl: Support if-flattening beyond a given maximum nesting depth.Kenneth Graunke2010-12-271-1/+1
| | | | | | | | | | | This adds a new optional max_depth parameter (defaulting to 0) to lower_if_to_cond_assign, and makes the pass only flatten if-statements nested deeper than that. By default, all if-statements will be flattened, just like before. This patch also renames do_if_to_cond_assign to lower_if_to_cond_assign, to match the new naming conventions.
* glsl: Lower ir_binop_pow to a sequence of EXP2 and LOG2Ian Romanick2010-12-011-2/+3
|
* glsl: Add a lowering pass to move discards out of if-statements.Kenneth Graunke2010-12-011-0/+1
| | | | | | | This should allow lower_if_to_cond_assign to work in the presence of discards, fixing bug #31690 and likely #31983. NOTE: This is a candidate for the 7.9 branch.
* glsl: Add an optimization pass to simplify discards.Kenneth Graunke2010-12-011-0/+1
| | | | NOTE: This is a candidate for the 7.9 branch.
* glsl: Combine many instruction lowering passes into one.Kenneth Graunke2010-11-191-2/+8
| | | | | | | This should save on the overhead of tree-walking and provide a convenient place to add more instruction lowering in the future. Signed-off-by: Ian Romanick <[email protected]>