summaryrefslogtreecommitdiffstats
path: root/src/glsl/loop_unroll.cpp
Commit message (Collapse)AuthorAgeFilesLines
* glsl: Skip loop-too-large heuristic if indexing arrays of a certain sizeKenneth Graunke2014-11-061-1/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A pattern in certain shaders is: uniform vec4 colors[NUM_LIGHTS]; for (int i = 0; i < NUM_LIGHTS; i++) { ...use colors[i]... } In this case, the application author expects the shader compiler to unroll the loop. By doing so, it replaces variable indexing of the array with constant indexing, which is more efficient. This patch extends the heuristic to see if arrays accessed within the loop are indexed by an induction variable, and if the array size exactly matches the number of loop iterations. If so, the application author probably intended us to unroll it. If not, we rely on the existing loop-too-large heuristic. Improves performance in a phong shading microbenchmark by 2.88x, and a shadow mapping microbenchmark by 1.63x. Without variable indexing, we can upload the small uniform arrays as push constants instead of pull constants, avoiding shader memory access. Affects several games, but doesn't appear to impact their performance. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Acked-by: Kristian Høgsberg <[email protected]>
* glsl: Use typed foreach_in_list instead of foreach_list.Matt Turner2014-07-011-4/+2
| | | | Reviewed-by: Ian Romanick <[email protected]>
* glsl: Ignore loop-too-large heuristic if there's bad variable indexing.Kenneth Graunke2014-04-111-3/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Many shaders use a pattern such as: for (int i = 0; i < NUM_LIGHTS; i++) { ...access a uniform array, or shader input/output array... } where NUM_LIGHTS is a small constant (such as 2, 4, or 8). The expectation is that the compiler will unroll those loops, turning the array access into constant indexing, which is more efficient, and which may enable array splitting and other optimizations. In many cases, our heuristic fails - either there's another tiny nested loop inside, or the estimated number of instructions is just barely beyond the threshold. So, we fail to unroll the loop, leaving the variable indexing in place. Drivers which don't support the particular flavor of variable indexing will call lower_variable_index_to_cond_assign(), which generates piles and piles of immensely inefficient code. We'd like to avoid generating that. This patch detects unsupported forms of variable-indexing in loops, where the array index is a loop induction variable. In that case, it bypasses the loop-too-large heuristic and forces unrolling. Improves performance in various microbenchmarks: Gl32PSBump8 by 47%, Gl32ShMapVsm by 80%, and Gl32ShMapPcf by 27%. No changes in shader-db. v2: Check ir->array for being an array or matrix, rather than the ir_dereference_array itself. v3: Fix and expand statistics in commit message. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* glsl: Rename loop_unroll_count::fail to "nested_loop."Kenneth Graunke2014-04-111-4/+5
| | | | | | | | | | | The "fail" flag is set if loop_unroll_count encounters a nested loop; calling the flag "nested_loop" is a bit clearer. The original reasoning was that count is inaccurate (too small) if there are nested loops, as we don't do any sort of analysis on the inner loop. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* glsl: Pass gl_shader_compiler_optimizations to unroll_loops().Kenneth Graunke2014-04-111-7/+13
| | | | | | | Loop unrolling will need to know a few more options in the future. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* glsl: Clean up "unused parameter" warningsIan Romanick2014-03-281-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ../../src/glsl/builtin_functions.cpp:72:1: warning: unused parameter 'state' [-Wunused-parameter] ../../src/glsl/ir_clone.cpp:31:1: warning: unused parameter 'ht' [-Wunused-parameter] ../../src/glsl/ir_equals.cpp:44:1: warning: unused parameter 'ir' [-Wunused-parameter] ../../src/glsl/ir_equals.cpp:50:1: warning: unused parameter 'ignore' [-Wunused-parameter] ../../src/glsl/ir_equals.cpp:68:1: warning: unused parameter 'ignore' [-Wunused-parameter] ../../src/glsl/ir_print_visitor.cpp:149:6: warning: unused parameter 'ir' [-Wunused-parameter] ../../src/glsl/ir_print_visitor.cpp:556:1: warning: unused parameter 'ir' [-Wunused-parameter] ../../src/glsl/ir_print_visitor.cpp:562:1: warning: unused parameter 'ir' [-Wunused-parameter] ../../src/glsl/link_uniforms.cpp:213:1: warning: unused parameter 'record_type' [-Wunused-parameter] ../../src/glsl/loop_analysis.cpp:225:1: warning: unused parameter 'ir' [-Wunused-parameter] ../../src/glsl/loop_unroll.cpp:73:30: warning: unused parameter 'ir' [-Wunused-parameter] ../../src/glsl/loop_unroll.cpp:79:30: warning: unused parameter 'ir' [-Wunused-parameter] ../../src/glsl/loop_unroll.cpp:85:30: warning: unused parameter 'ir' [-Wunused-parameter] ../../src/glsl/opt_copy_propagation_elements.cpp:189:1: warning: unused parameter 'ir' [-Wunused-parameter] ../../src/glsl/opt_cse.cpp:402:1: warning: unused parameter 'ir' [-Wunused-parameter] ../../src/glsl/opt_dead_code_local.cpp:117:30: warning: unused parameter 'ir' [-Wunused-parameter] ../../src/glsl/opt_redundant_jumps.cpp:53:1: warning: unused parameter 'ir' [-Wunused-parameter] ../../src/glsl/opt_vectorize.cpp:301:1: warning: unused parameter 'ir' [-Wunused-parameter] Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* glsl/loops: Get rid of lower_bounded_loops and ir_loop::normative_bound.Paul Berry2013-12-091-3/+0
| | | | | | | | Now that loop_controls no longer creates normatively bound loops, there is no need for ir_loop::normative_bound or the lower_bounded_loops pass. Reviewed-by: Ian Romanick <[email protected]>
* glsl/loops: Stop creating normatively bound loops in loop_controls.Paul Berry2013-12-091-4/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, when loop_controls analyzed a loop and found that it had a fixed bound (known at compile time), it would remove all of the loop terminators and instead set the loop's normative_bound field to force the loop to execute the correct number of times. This made loop unrolling easy, but it had a serious disadvantage. Since most GPU's don't have a native mechanism for executing a loop a fixed number of times, in order to implement the normative bound, the back-ends would have to synthesize a new loop induction variable. As a result, many loops wound up having two induction variables instead of one. This caused extra register pressure and unnecessary instructions. This patch modifies loop_controls so that it doesn't set the loop's normative_bound anymore. Instead it leaves one of the terminators in the loop (the limiting terminator), so the back-end doesn't have to go to any extra work to ensure the loop terminates at the right time. This complicates loop unrolling slightly: when deciding whether a loop can be unrolled, we have to account for the presence of the limiting terminator. And when we do unroll the loop, we have to remove the limiting terminator first. For an example of how this results in more efficient back end code, consider the loop: for (int i = 0; i < 100; i++) { total += i; } Previous to this patch, on i965, this loop would compile down to this (vec4) native code: mov(8) g4<1>.xD 0D mov(8) g8<1>.xD 0D loop: cmp.ge.f0(8) null g8<4;4,1>.xD 100D (+f0) if(8) break(8) endif(8) add(8) g5<1>.xD g5<4;4,1>.xD g4<4;4,1>.xD add(8) g8<1>.xD g8<4;4,1>.xD 1D add(8) g4<1>.xD g4<4;4,1>.xD 1D while(8) loop (notice that both g8 and g4 are loop induction variables; one is used to terminate the loop, and the other is used to accumulate the total). After this patch, the same loop compiles to: mov(8) g4<1>.xD 0D loop: cmp.ge.f0(8) null g4<4;4,1>.xD 100D (+f0) if(8) break(8) endif(8) add(8) g5<1>.xD g5<4;4,1>.xD g4<4;4,1>.xD add(8) g4<1>.xD g4<4;4,1>.xD 1D while(8) loop Reviewed-by: Ian Romanick <[email protected]>
* glsl/loops: Get rid of loop_variable_state::max_iterations.Paul Berry2013-12-091-3/+3
| | | | | | | | This value is now redundant with loop_variable_state::limiting_terminator->iterations and ir_loop::normative_bound. Reviewed-by: Ian Romanick <[email protected]>
* glsl/loops: Simplify loop unrolling logic by breaking into functions.Paul Berry2013-12-091-108/+184
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The old logic of loop_unroll_visitor::visit_leave(ir_loop *) was: heuristics to skip unrolling in various circumstances; if (loop contains more than one jump) return; else if (loop contains one jump) { if (the jump is an unconditional "break" at the end of the loop) { remove the break and set iteration count to 1; fall through to simple loop unrolling code; } else { for (each "if" statement in the loop body) see if the jump is a "break" at the end of one of its forks; if (the "break" wasn't found) return; splice the remainder of the loop into the other fork of the "if"; remove the "break"; complex loop unrolling code; return; } } simple loop unrolling code; return; These tasks have been moved to their own functions: - splice the remainder of the loop into the other fork of the "if" - simple loop unrolling code - complex loop unrolling code And the logic has been flattened to: heuristics to skip unrolling in various circumstances; if (loop contains more than one jump) return; if (loop contains no jumps) { simple loop unroll; return; } if (the jump is an unconditional "break" at the end of the loop) { remove the break; simple loop unroll with iteration count of 1; return; } for (each "if" statement in the loop body) { if (the jump is a "break" at the end of one of its forks) { splice the remainder of the loop into the other fork of the "if"; remove the "break"; complex loop unroll; return; } } This will make it easier to modify the loop unrolling algorithm in a future patch. Reviewed-by: Ian Romanick <[email protected]>
* glsl: Hide many classes local to individual .cpp files in anon namespaces.Eric Anholt2013-09-231-0/+3
| | | | | | | | This gives the compiler the chance to inline and not export class symbols even in the absence of LTO. Saves about 60kb on disk. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* glsl: Refine the loop instruction counting.Eric Anholt2012-03-081-12/+36
| | | | | | | | | | | | Before, we were only counting top-level instructions. But if we have an assignment of a giant expression tree (such as the ones eventually generated by glsl-fs-unroll), we were counting the same as an assignment of a variable deref. glsl-fs-unroll-explosion now fails in a reasonable amount of time on i965 because the unrolling didn't go ridiculously far. Reviewed-by: Kenneth Graunke <[email protected]>
* glsl: Avoid excessive loop unrolling.Mathias Fröhlich2012-02-091-0/+15
| | | | | | | | | | | | | | | | | | | Avoid unrollong loops that are either nested loops or where the loop body times the unroll count is huge. The change is far from being perfect but it extends the loop unrolling decision heuristic by some additional safeguard. In particular this cuts down compilation of a shader precomputing atmospheric scattering integral tables containing two nesting levels in a loop from something way beyond some minutes (I never waited for it to finish) to some fractions of a second. This fixes piglit tests glsl-fs-unroll-explosion and glsl-vs-unroll-explosion on r600g. Reviewed-by: Eric Anholt <[email protected]> Signed-off-by: Mathias Fröhlich <[email protected]>
* Convert everything from the talloc API to the ralloc API.Kenneth Graunke2011-01-311-2/+2
|
* glsl: Unroll loops with conditional breaks anywhere (not just the end)7.10-branchpointLuca Barbieri2010-12-091-46/+68
| | | | | | | | | | | Currently we only unroll loops with conditional breaks at the end, which is the form that lower_jumps generates. However, if breaks are not lowered, they tend to appear at the beginning, so add support for a conditional break anywhere. Signed-off-by: Luca Barbieri <[email protected]> Signed-off-by: Kenneth Graunke <[email protected]>
* glsl: Consider the "else" branch when looking for loop breaks.Kenneth Graunke2010-12-091-1/+1
| | | | | | | Found this bug by code inspection. Based off the comments just before this code, the intent is to find whether the break exists in the "then" branch or the "else" branch. However, the code actually looked at the last instruction in the "then" branch twice.
* glsl: Clean up code by adding a new is_break() function.Kenneth Graunke2010-12-091-6/+11
|
* glsl2: fix signed/unsigned comparison warningBrian Paul2010-10-121-1/+1
|
* loop_unroll: unroll loops with (lowered) breaksLuca Barbieri2010-09-131-4/+89
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the loop ends with an if with one break or in a single break unroll it. Loops that end with a continue will have that continue removed by the redundant jump optimizer. Likewise loops that end with an if-statement with a break at the end of both branches will have the break pulled out after the if-statement. Loops of the form for (...) { do_something1(); if (cond) { do_something2(); break; } else { do_something3(); } } will be unrolled as do_something1(); if (cond) { do_something2(); } else { do_something3(); do_something1(); if (cond) { do_something2(); } else { do_something3(); /* Repeat inserting iterations here.*/ } } ir_lower_jumps can guarantee that all loops are put in this form and thus all loops are now potentially unrollable if an upper bound on the number of iterations can be found. Signed-off-by: Ian Romanick <[email protected]>
* glsl: add several EmitNo* options, and MaxUnrollIterationsLuca Barbieri2010-09-081-4/+6
| | | | | | | | | This increases the chance that GLSL programs will actually work. Note that continues and returns are not yet lowered, so linking will just fail if not supported. Signed-off-by: Ian Romanick <[email protected]>
* glsl2: Add module to perform simple loop unrollingIan Romanick2010-09-031-0/+100