path: root/src/glsl/opt_algebraic.cpp
Commit message | Author | Age | Files | Lines
* glsl: Add support doubles in optimization passes | Dave Airlie | 2015-02-19 | 1 | -4/+22
  Signed-off-by: Dave Airlie
  Reviewed-by: Matt Turner
  Reviewed-by: Ilia Mirkin
* glsl: Optimize (f2i(trunc x)) into (f2i x). | Matt Turner | 2015-02-11 | 1 | -0/+9
  total instructions in shared programs: 5950326 -> 5949286 (-0.02%)
  instructions in affected programs: 88264 -> 87224 (-1.18%)
  helped: 692
* glsl: Optimize round-half-up pattern. | Matt Turner | 2015-02-11 | 1 | -0/+33
  Hurts some Psychonauts shaders, but after the next patch (which this enables) they're fewer instructions than before this patch.
* glsl: Optimize 1/exp(x) into exp(-x). | Matt Turner | 2015-02-10 | 1 | -0/+6
  Lots of shaders divide by exp2(...) which we turn into a multiplication by the reciprocal. We can avoid the reciprocal by simply negating exp2's argument.
  total instructions in shared programs: 5947154 -> 5946695 (-0.01%)
  instructions in affected programs: 118661 -> 118202 (-0.39%)
  helped: 380
  Reviewed-by: Jason Ekstrand
  Reviewed-by: Ian Romanick
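The identity behind this rewrite can be checked numerically. The following is a minimal sketch in plain C++, not the Mesa IR code; the function names and the `nearly_equal` helper are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cmath>

// Original form: a division by exp2(y), expressed as a multiply by the reciprocal.
double mul_by_rcp_exp2(double x, double y) {
    return x * (1.0 / std::exp2(y));
}

// Rewritten form: negate exp2's argument and drop the reciprocal.
double mul_by_exp2_neg(double x, double y) {
    return x * std::exp2(-y);
}

// Relative comparison helper (hypothetical, for this sketch only).
bool nearly_equal(double a, double b, double rel_tol) {
    assert(rel_tol > 0.0);
    double scale = std::fmax(std::fabs(a), std::fabs(b));
    return std::fabs(a - b) <= rel_tol * scale;
}
```

The two forms can differ in the last bits of the result, so a relative tolerance is the appropriate comparison rather than exact equality.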
* glsl: Don't optimize min/max into saturate when EmitNoSat is set | Abdiel Janulgue | 2014-12-08 | 1 | -1/+1
  v3: Fix multi-line comment format (Ian)
  Reviewed-by: Matt Turner
  Signed-off-by: Abdiel Janulgue
* glsl: Optimize scalar all_equal/any_nequal into equal/nequal. | Matt Turner | 2014-12-05 | 1 | -0/+10
  Cuts an instruction from two shaders in Tesseract, by allowing the (x+y) cmp 0 -> x cmp -y optimization to take place.
  instructions in affected programs: 1198 -> 1194 (-0.33%)
  Reviewed-by: Eric Anholt
* glsl: Remove now useless dot optimization on basis vect | Matt Turner | 2014-11-03 | 1 | -23/+0
  The optimization in commit d056863b covers these cases, which were the first optimizations I added to the GLSL compiler.
  Reviewed-by: Ian Romanick
* glsl: Emit mul instead of dot if only one component left. | Matt Turner | 2014-11-03 | 1 | -1/+4
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85683
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85691
  Reviewed-by: Ian Romanick
* glsl: Drop constant 0.0 components from dot products. | Matt Turner | 2014-10-29 | 1 | -0/+27
  Helps a small number of vertex shaders in the games Dungeon Defenders and Shank, as well as an internal benchmark.
  instructions in affected programs: 2801 -> 2719 (-2.93%)
  Reviewed-by: Kenneth Graunke
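A rough illustration of the idea, written as standalone C++ rather than as a pass over the GLSL IR: components whose constant operand is exactly 0.0 contribute nothing to a dot product, so they can be skipped, and if only one component survives the result is a single multiply. The names `full_dot` and `pruned_dot` are hypothetical. The sketch assumes finite inputs; with NaN or infinity in the non-constant operand, `0.0 * x` is not 0.0 and the two forms differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference dot product over every component.
float full_dot(const std::vector<float>& a, const std::vector<float>& c) {
    assert(a.size() == c.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += a[i] * c[i];
    }
    return sum;
}

// Dot product that skips components where the constant operand c is 0.0.
// terms_kept reports how many components survived; a value of 1 means the
// whole expression reduces to one multiply.
float pruned_dot(const std::vector<float>& a, const std::vector<float>& c,
                 std::size_t& terms_kept) {
    assert(a.size() == c.size());
    float sum = 0.0f;
    terms_kept = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (c[i] == 0.0f) {
            continue;  // a[i] * 0.0 adds nothing for finite a[i]
        }
        sum += a[i] * c[i];
        ++terms_kept;
    }
    return sum;
}
```

In a compiler the pruning happens once at compile time, since the zero components are known constants; the runtime loop here only stands in for that decision.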
* glsl: Recognize open-coded pow(x, y). | Matt Turner | 2014-09-27 | 1 | -0/+14
  pow(x, y) is equivalent to exp(log(x) * y).
  instructions in affected programs: 578 -> 458 (-20.76%)
  Reviewed-by: Kenneth Graunke
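The equivalence stated above can be sketched in ordinary C++ for positive `x`. This is an illustration only; `open_coded_pow` and `matches_pow` are hypothetical names, and the restriction to `x > 0` is an assumption of the sketch (the logarithm is undefined otherwise).

```cpp
#include <cassert>
#include <cmath>

// Open-coded power: exp(log(x) * y). Assumes x > 0 so that log(x) is defined.
double open_coded_pow(double x, double y) {
    assert(x > 0.0);
    return std::exp(std::log(x) * y);
}

// Compares the open-coded form against std::pow with a relative tolerance.
bool matches_pow(double x, double y, double rel_tol) {
    double expected = std::pow(x, y);
    double actual = open_coded_pow(x, y);
    return std::fabs(actual - expected) <= rel_tol * std::fabs(expected);
}
```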
* glsl: Optimize clamp(x, b, 1.0), where b > 0.0 as max(saturate(x),b) | Abdiel Janulgue | 2014-08-31 | 1 | -0/+23
  v2: - Output max(saturate(x),b) instead of saturate(max(x,b))
      - Make sure we do component-wise comparison for vectors (Ian Romanick)
  v3: - Add missing condition where the outer constant value is > 0.0 and inner constant is 1.0.
      - Fix comments to show that the optimization is a commutative operation (Matt Turner)
  Reviewed-by: Ian Romanick
  Signed-off-by: Abdiel Janulgue
* glsl: Optimize clamp(x, 0.0, b), where b < 1.0 as min(saturate(x),b) | Abdiel Janulgue | 2014-08-31 | 1 | -0/+39
  v2: - Output min(saturate(x),b) instead of saturate(min(x,b)) suggested by Ilia Mirkin
      - Make sure we do component-wise comparison for vectors (Ian Romanick)
  v3: - Add missing condition where the outer constant value is zero and inner constant is < 1
      - Fix comments to reflect we are doing a commutative operation (Matt Turner)
  Reviewed-by: Ian Romanick
  Signed-off-by: Abdiel Janulgue
* glsl: Optimize clamp(x, 0, 1) as saturate(x) | Abdiel Janulgue | 2014-08-31 | 1 | -0/+36
  v2: - Check that the base type is float (Ian Romanick)
  v3: - Make sure comments reflect that we are doing a commutative operation
      - Add missing condition where the inner constant is 1.0 and outer constant is 0.0
      - Make indexing of operands easier to read (Matt Turner)
  Reviewed-by: Matt Turner
  Reviewed-by: Ian Romanick
  Signed-off-by: Abdiel Janulgue
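The three clamp rewrites above can be compared side by side in scalar C++. This is a sketch under stated assumptions, not the opt_algebraic code: `clamp_ref` models clamp as `min(max(x, lo), hi)`, and the helper names are hypothetical. The sketch additionally assumes `0.0 <= b < 1.0` for the min form and `0.0 < b <= 1.0` for the max form so that the reference clamp has a valid range.

```cpp
#include <cassert>
#include <cmath>

// saturate(x): clamp to the range [0, 1].
float saturate(float x) {
    return std::fmin(std::fmax(x, 0.0f), 1.0f);
}

// Reference clamp, modeled as min(max(x, lo), hi).
float clamp_ref(float x, float lo, float hi) {
    assert(lo <= hi);
    return std::fmin(std::fmax(x, lo), hi);
}

// clamp(x, 0.0, b) with b < 1.0, rewritten as min(saturate(x), b).
float min_saturate(float x, float b) {
    assert(b >= 0.0f && b < 1.0f);
    return std::fmin(saturate(x), b);
}

// clamp(x, b, 1.0) with b > 0.0, rewritten as max(saturate(x), b).
float max_saturate(float x, float b) {
    assert(b > 0.0f && b <= 1.0f);
    return std::fmax(saturate(x), b);
}
```

The rewrites pay off on hardware where saturate is a free modifier on the producing instruction, leaving a single min or max instead of two operations.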
* glsl: Don't convert reductions of ivec to a dot-product | Ian Romanick | 2014-06-25 | 1 | -1/+3
  Mesa has an optimization that converts expressions like "v.x + v.y + v.z + v.w" into dot(v, 1.0). And therein lies the rub: the other operand to the dot-product is always a float... even if the vector is an ivec or uvec. This results in an assertion failure in ir_builder.
  If the base type of the operand is not float, don't try the optimization. Dot-product is not valid on integer data.
  Fixes piglit vs-integer-reduction.shader_test and OpenGL ES conformance test ES2-CTS.gtf.GL2Tests.glGetUniform.glGetUniform.
  Signed-off-by: Ian Romanick
  Reviewed-by: Kenneth Graunke
  Reviewed-by: Matt Turner
  Reviewed-by: Christoph Brill
* glsl: Optimize (v.x + v.y) + (v.z + v.w) into dot(v, 1.0). | Matt Turner | 2014-06-19 | 1 | -0/+46
  Cuts five instructions out of SynMark's Gl32VSInstancing benchmark.
* glsl: Pass in options to do_algebraic(). | Matt Turner | 2014-06-19 | 1 | -3/+8
  Will be used in the next commit.
  Reviewed-by: Eric Anholt
* glsl: Pass ctx->Const.NativeIntegers to do_algebraic. | Kenneth Graunke | 2014-04-08 | 1 | -3/+5
  The next patch will introduce an optimization that only works when integers are not represented as floating point values.
  Signed-off-by: Kenneth Graunke
  Reviewed-by: Ian Romanick
  Reviewed-by: Matt Turner
* glsl: Optimize (x + y cmp 0) into (x cmp -y). | Matt Turner | 2014-04-05 | 1 | -0/+22
  Cuts a small handful of instructions in Serious Sam 3:
  instructions in affected programs: 4692 -> 4666 (-0.55%)
  Reviewed-by: Ian Romanick
* glsl: Optimize pow(x, 2) into x * x. | Matt Turner | 2014-03-18 | 1 | -0/+8
  Cuts two instructions out of SynMark's Gl32VSInstancing benchmark.
  Reviewed-by: Eric Anholt
* glsl: Fix broken LRP algebraic optimization. | Kenneth Graunke | 2014-03-02 | 1 | -1/+3
  opt_algebraic was translating lrp(x, 0, a) into add(x, -mul(x, a)). Unfortunately, this references "x" twice, which is invalid in the IR, leading to assertion failures in the validator.
  Normally, cloning IR solves this. However, "x" could actually be an arbitrary expression tree, so copying it could result in huge piles of wasted computation. This is why we avoid reusing subexpressions.
  Instead, transform it into mul(x, add(1.0, -a)), which is equivalent but doesn't need two references to "x".
  Fixes a regression since d5fa8a95621169, which isn't in any stable branches. Fixes 18 shaders in shader-db (bastion and yofrankie).
  Signed-off-by: Kenneth Graunke
  Reviewed-by: Ian Romanick
  Reviewed-by: Matt Turner
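The difference between the two rewrites can be made concrete by counting how often "x" has to be evaluated. The following is a loose C++ analogy, not Mesa's IR: `CountedExpr` is a hypothetical stand-in for an arbitrary expression tree, and `lrp` assumes the usual GLSL mix() definition, `x * (1 - a) + y * a`.

```cpp
#include <cassert>

// Hypothetical stand-in for an arbitrary, possibly expensive subexpression "x".
struct CountedExpr {
    float value;
    int evaluations;
    float eval() {
        ++evaluations;
        return value;
    }
};

// lrp(x, y, a), assumed here to follow GLSL's mix(): x * (1 - a) + y * a.
float lrp(float x, float y, float a) {
    return x * (1.0f - a) + y * a;
}

// First rewrite of lrp(x, 0, a): x - x * a. Needs the value of "x" twice.
float sub_form(CountedExpr& x, float a) {
    float first = x.eval();
    float second = x.eval();
    return first - second * a;
}

// Fixed rewrite: x * (1.0 + -a). Needs the value of "x" once.
float mul_form(CountedExpr& x, float a) {
    return x.eval() * (1.0f + (-a));
}

// Small self-check showing both forms agree with lrp(x, 0, a).
void check_lrp_rewrites() {
    CountedExpr twice = {4.0f, 0};
    CountedExpr once = {4.0f, 0};
    assert(sub_form(twice, 0.25f) == lrp(4.0f, 0.0f, 0.25f));
    assert(mul_form(once, 0.25f) == lrp(4.0f, 0.0f, 0.25f));
    assert(twice.evaluations == 2);
    assert(once.evaluations == 1);
}
```

In the IR the problem is structural rather than a matter of cost alone, since a node may not appear under two parents, but the evaluation count captures why cloning "x" is the less attractive fix.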
* glsl: Optimize lrp(x, 0, a) into x - (x * a). | Matt Turner | 2014-02-28 | 1 | -0/+2
  Helps one program in shader-db:
  instructions in affected programs: 96 -> 92 (-4.17%)
  Reviewed-by: Ian Romanick
* glsl: Optimize lrp(0, y, a) into y * a. | Matt Turner | 2014-02-28 | 1 | -0/+2
  Helps two programs in shader-db:
  instructions in affected programs: 254 -> 234 (-7.87%)
  Reviewed-by: Ian Romanick
* glsl: Optimize triop_csel with all-true or all-false. | Eric Anholt | 2014-02-07 | 1 | -0/+7
  Reviewed-by: Matt Turner
* glsl: Optimize various cases of fma (aka MAD). | Eric Anholt | 2014-02-07 | 1 | -0/+13
  Reviewed-by: Matt Turner
* glsl: Optimize lrp(x, x, coefficient) --> x. | Eric Anholt | 2014-02-07 | 1 | -0/+2
  total instructions in shared programs: 1627754 -> 1624534 (-0.20%)
  instructions in affected programs: 45748 -> 42528 (-7.04%)
  GAINED: 3
  LOST: 0
  (serious sam, humus domino demo)
  Reviewed-by: Matt Turner
* glsl: Optimize pow(x, 1) -> x. | Eric Anholt | 2014-02-07 | 1 | -0/+4
  total instructions in shared programs: 1627826 -> 1627754 (-0.00%)
  instructions in affected programs: 6640 -> 6568 (-1.08%)
  GAINED: 0
  LOST: 0
  (HoN and savage2)
  Reviewed-by: Matt Turner
* glsl: Optimize log(exp(x)) and exp(log(x)) into x. | Eric Anholt | 2014-02-07 | 1 | -0/+36
  Reviewed-by: Matt Turner
* glsl: Optimize ~~x into x. | Eric Anholt | 2014-02-07 | 1 | -0/+5
  v2: Fix pasteo of an extra abs being inserted (caught by many). Rewrite to drop the silly switch statement.
  Reviewed-by: Matt Turner (v1)
* glsl: Optimize open-coded lrp into lrp. | Jordan Justen | 2014-01-21 | 1 | -0/+52
  total instructions in shared programs: 1498191 -> 1487051 (-0.74%)
  instructions in affected programs: 669388 -> 658248 (-1.66%)
  GAINED: 1
  LOST: 0
  Reviewed-by: Matt Turner
  Signed-off-by: Jordan Justen
* glsl: Optimize pow(2, x) --> exp2(x). | Kenneth Graunke | 2014-01-07 | 1 | -0/+11
  On Haswell, POW takes 24 cycles, while EXP2 only takes 14. Plus, using POW requires putting 2.0 in a register, while EXP2 doesn't.
  I believe that EXP2 will be faster than POW on basically all GPUs, so it makes sense to optimize it.
  Looking at the savage2 subset of shader-db:
  total instructions in shared programs: 113225 -> 113179 (-0.04%)
  instructions in affected programs: 2139 -> 2093 (-2.15%)
  instances of 'math pow': 795 -> 749 (-6.14%)
  instances of 'math exp': 389 -> 435 (11.8%)
  Signed-off-by: Kenneth Graunke
  Reviewed-by: Matt Turner
* glsl: Optimize pow(1.0, X) --> 1.0. | Kenneth Graunke | 2014-01-07 | 1 | -0/+6
  Surprisingly, this helps one vertex shader in 3DMMES.
  Signed-off-by: Kenneth Graunke
  Reviewed-by: Matt Turner
* glsl: Apply the transformation "1/rsq(x) == sqrt(x)" in opt_algebraic. | Eric Anholt | 2013-11-15 | 1 | -3/+4
  The comment was stale, because the lowering in question wasn't happening in lower_instructions.cpp. Presumably if the lowering ever moves there, we can plumb the lowering mask through to opt_algebraic.
  total instructions in shared programs: 1618696 -> 1616810 (-0.12%)
  instructions in affected programs: 243018 -> 241132 (-0.78%)
  GAINED: 0
  LOST: 0
  Reviewed-by: Jordan Justen
* glsl: Apply the transformation "(a ^^ a) -> false" in opt_algebraic. | Eric Anholt | 2013-11-15 | 1 | -1/+3
  Reviewed-by: Jordan Justen
* glsl: Apply the transformation "(a && a) -> a" in opt_algebraic. | Eric Anholt | 2013-11-15 | 1 | -1/+3
  Reviewed-by: Jordan Justen
* glsl: Apply the transformation "(a || a) -> a" in opt_algebraic. | Eric Anholt | 2013-11-15 | 1 | -1/+3
  total instructions in shared programs: 1732385 -> 1732373 (-0.00%)
  instructions in affected programs: 416 -> 404 (-2.88%)
  GAINED: 0
  LOST: 0
  (That's 4 already-short fragment shaders in dota2)
  Reviewed-by: Jordan Justen
* glsl: Drop no-op shifts involving 0. | Eric Anholt | 2013-10-28 | 1 | -0/+10
  I noticed this in a shader in Unigine Heaven that was spilling. While it doesn't really reduce register pressure, it shaves a few instructions anyway (7955 -> 7882).
  v2: Fix turning "0 >> x" into "x" instead of "0" (caught by Erik Faye-Lund).
  Reviewed-by: Kenneth Graunke
  Reviewed-by: Matt Turner
* glsl: Use ir_builder more in opt_algebraic. | Eric Anholt | 2013-10-28 | 1 | -30/+10
  While ir_builder is slightly less efficient, we're only increasing the work when there's actual optimization being done, and it's way more readable code.
  Reviewed-by: Kenneth Graunke
  Reviewed-by: Matt Turner
* glsl: Move common code out of opt_algebraic's handle_expression(). | Eric Anholt | 2013-10-28 | 1 | -78/+39
  Matt and I had each screwed up these common required patterns recently, in ways that wouldn't have been noticed for a long time if not for code review. Just enforce it in the caller so that we don't rely on code review catching these bugs.
  Reviewed-by: Kenneth Graunke
  Reviewed-by: Matt Turner
* glsl: Optimize (not A) and (not B) into not (A or B). | Matt Turner | 2013-10-25 | 1 | -0/+9
  No shader-db changes, but seems like a good idea.
  Reviewed-by: Eric Anholt
* glsl: Optimize (not A) or (not B) into not (A and B). | Matt Turner | 2013-10-25 | 1 | -0/+12
  A few Serious Sam 3 shaders affected:
  instructions in affected programs: 4384 -> 4344 (-0.91%)
  Reviewed-by: Eric Anholt
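Both rewrites above are applications of De Morgan's laws, each replacing two `not` operations with one. A minimal C++ sketch that checks the identities over all boolean inputs follows; the function names are hypothetical and this is not the IR code itself.

```cpp
#include <cassert>

// Original forms, as they would appear before the rewrite.
bool or_of_nots(bool a, bool b)  { return !a || !b; }
bool and_of_nots(bool a, bool b) { return !a && !b; }

// Rewritten forms, each with a single negation.
bool not_of_and(bool a, bool b) { return !(a && b); }
bool not_of_or(bool a, bool b)  { return !(a || b); }

// Exhaustively checks both identities over the four input combinations.
void check_de_morgan() {
    const bool values[] = {false, true};
    for (bool a : values) {
        for (bool b : values) {
            assert(or_of_nots(a, b) == not_of_and(a, b));
            assert(and_of_nots(a, b) == not_of_or(a, b));
        }
    }
}
```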
* glsl: Optimize -(-expr) into expr. | Matt Turner | 2013-10-21 | 1 | -0/+10
  Reviewed-by: Paul Berry
  Reviewed-by: Eric Anholt
* glsl: Optimize abs(-expr) and abs(abs(expr)) into abs(expr). | Matt Turner | 2013-10-21 | 1 | -0/+18
  Reviewed-by: Paul Berry
  Reviewed-by: Eric Anholt
* glsl: Use saved values instead of recomputing them. | Matt Turner | 2013-10-21 | 1 | -8/+4
  Reviewed-by: Paul Berry
  Reviewed-by: Eric Anholt
* glsl: Optimize mul(a, -1) into neg(a). | Matt Turner | 2013-10-16 | 1 | -0/+23
  Two extra instructions in some heroesofnewerth shaders, but a win for everything else.
  total instructions in shared programs: 1531352 -> 1530815 (-0.04%)
  instructions in affected programs: 121898 -> 121361 (-0.44%)
  Reviewed-by: Eric Anholt
  Reviewed-by: Kenneth Graunke
* glsl: Add support for new bit built-ins in ARB_gpu_shader5. | Matt Turner | 2013-05-06 | 1 | -3/+3
  v2: Move use of ir_binop_bfm and ir_triop_bfi to a later patch.
  Reviewed-by: Chris Forbes
* glsl: Optimize ir_triop_lrp(x, y, a) with a = 0.0f or 1.0f | Matt Turner | 2013-02-28 | 1 | -0/+11
  Reviewed-by: Eric Anholt
  Reviewed-by: Kenneth Graunke
* glsl: Convert mix() to use a new ir_triop_lrp opcode. | Kenneth Graunke | 2013-02-28 | 1 | -3/+3
  Many GPUs have an instruction to do linear interpolation which is more efficient than simply performing the algebra necessary (two multiplies, an add, and a subtract).
  Pattern matching or peepholing this is more desirable, but can be tricky. By using an opcode, we can at least make shaders which use the mix() built-in get the more efficient behavior.
  Currently, all consumers lower ir_triop_lrp. Subsequent patches will actually generate different code.
  v2 [mattst88]: - Add LRP_TO_ARITH flag to ir_to_mesa.cpp. Will be removed in a subsequent patch and ir_triop_lrp translated directly.
  v3 [mattst88]: - Move changes from the next patch to opt_algebraic.cpp to accept 3-src operations.
  Reviewed-by: Matt Turner
  Reviewed-by: Eric Anholt
  Signed-off-by: Kenneth Graunke
* glsl: Transform dot product by a basis vector into a swizzle | Matt Turner | 2012-06-12 | 1 | -0/+24
  Reviewed-by: Kenneth Graunke
* glsl: Check for zero vectors in ir_binop_dot | Matt Turner | 2012-06-12 | 1 | -0/+7
  Reviewed-by: Kenneth Graunke
* glsl: Put a bunch of optimization visitors under anonymous namespaces. | Eric Anholt | 2012-06-11 | 1 | -0/+4
  Because these classes are used entirely from their own source files and not from separate DSOs, the linker gets to produce massively less code.
  This cuts about 13k of text in the libdricore case. In the non-libdricore case, the additional linkage information allows the compiler to inline some code, so libglsl.a size actually increases by about 300 bytes.
  For a dricore build, improves shader_runner runtime on glsl-fs-copy-propagation-texcoords-1 by 0.21% +/- 0.03% (n=353574, outliers removed).
  No statistically significant difference with n=322 on glslparsertest on a yofrankie shader intended to test compiler performance.
  Reviewed-by: Kenneth Graunke