author Kenneth Graunke <[email protected]> 2016-12-09 19:06:06 -0800
committer Kenneth Graunke <[email protected]> 2017-01-17 21:45:22 -0800
commit e7d4008ebfe561ee0aa3df6cdcfd39a8842ed659
tree c9c16c38459a8190f9040c7dbae23c7d931ff5c8 /src/compiler
parent 9919542f1cfff70524bc6117d19bf88e59159caa
glsl: Make copy propagation not panic when it sees an intrinsic.

A number of games have large arrays of constants, which we promote to
uniforms. This introduces copies from the uniform array to the original
temporary array. Normally, copy propagation eliminates those copies,
making everything refer to the uniform array directly.

A number of shaders in "Deus Ex: Mankind Divided" recently exposed a
limitation of copy propagation: if we had any intrinsics (e.g. image
access in a compute shader), we weren't able to get rid of these copies.

That meant that any variable indexing remained on the temporary array
rather than being moved to the uniform array. i965's scalar backend
currently doesn't support indirect addressing of temporary arrays,
which meant lowering it to if-ladders. This was horrible.

According to Marek, on radeonsi/GCN, "F1 2015" uses 64% less
spilled-temp-array memory.

On i965/Skylake:

total instructions in shared programs: 13362954 -> 13329878 (-0.25%)
instructions in affected programs: 43745 -> 10669 (-75.61%)
helped: 12
HURT: 0

total cycles in shared programs: 248081010 -> 245949178 (-0.86%)
cycles in affected programs: 4597930 -> 2466098 (-46.37%)
helped: 12
HURT: 0

total spills in shared programs: 9493 -> 9507 (0.15%)
spills in affected programs: 25 -> 39 (56.00%)
helped: 0
HURT: 1

total fills in shared programs: 12127 -> 12197 (0.58%)
fills in affected programs: 110 -> 180 (63.64%)
helped: 0
HURT: 1

Helps Deus Ex: Mankind Divided.

The one shader with hurt spills/fills is from Tomb Raider at Ultra
settings, but that same shader has a -39.55% reduction in instructions
and -14.09% reduction in cycle counts, so it seems like a win there as
well.

Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
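
To illustrate why lowering variable indexing to an if-ladder is costly, here is a minimal sketch in plain C++ rather than GLSL IR. The table `kTable` and both function names are hypothetical and are not taken from Mesa; the point is only that the ladder needs one comparison per possible index, while an indirect load needs a single indexed access.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a constant array promoted to a uniform.
constexpr std::array<float, 4> kTable = {1.0f, 2.0f, 4.0f, 8.0f};

// Indirect addressing: a single indexed load.
float load_indirect(std::size_t i) {
    return kTable[i];
}

// If-ladder form: the index is compared against every possible value,
// so the amount of code grows with the array length.
float load_if_ladder(std::size_t i) {
    float result = 0.0f;
    if (i == 0)
        result = kTable[0];
    else if (i == 1)
        result = kTable[1];
    else if (i == 2)
        result = kTable[2];
    else if (i == 3)
        result = kTable[3];
    return result;
}
```

Both functions return the same value for any in-range index; only the amount of generated code differs, which is roughly the effect visible in the instruction counts above.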
Diffstat (limited to 'src/compiler')
-rw-r--r-- src/compiler/glsl/opt_copy_propagation.cpp | 31
1 files changed, 27 insertions, 4 deletions
diff --git a/src/compiler/glsl/opt_copy_propagation.cpp b/src/compiler/glsl/opt_copy_propagation.cpp
index 247c4988ed3..2240421a2a5 100644
--- a/src/compiler/glsl/opt_copy_propagation.cpp
+++ b/src/compiler/glsl/opt_copy_propagation.cpp
@@ -186,11 +186,34 @@ ir_copy_propagation_visitor::visit_enter(ir_call *ir)
       }
    }
-   /* Since we're unlinked, we don't (necessarily) know the side effects of
-    * this call. So kill all copies.
+   /* Since this pass can run when unlinked, we don't (necessarily) know
+    * the side effects of calls. (When linked, most calls are inlined
+    * anyway, so it doesn't matter much.)
+    *
+    * One place where this does matter is IR intrinsics. They're never
+    * inlined. We also know what they do - while some have side effects
+    * (such as image writes), none edit random global variables. So we
+    * can assume they're side-effect free (other than the return value
+    * and out parameters).
     */
-   _mesa_hash_table_clear(acp, NULL);
-   this->killed_all = true;
+   if (!ir->callee->is_intrinsic()) {
+      _mesa_hash_table_clear(acp, NULL);
+      this->killed_all = true;
+   } else {
+      if (ir->return_deref)
+         kill(ir->return_deref->var);
+
+      foreach_two_lists(formal_node, &ir->callee->parameters,
+                        actual_node, &ir->actual_parameters) {
+         ir_variable *sig_param = (ir_variable *) formal_node;
+         if (sig_param->data.mode == ir_var_function_out ||
+             sig_param->data.mode == ir_var_function_inout) {
+            ir_rvalue *ir = (ir_rvalue *) actual_node;
+            ir_variable *var = ir->variable_referenced();
+            kill(var);
+         }
+      }
+   }
    return visit_continue_with_parent;
 }
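
The rule the patch introduces can be sketched outside of Mesa as a toy model. Everything below (`CopyTable`, `Call`, `ParamMode`, `visit_call`, and this `kill`) is a hypothetical simplification using standard containers, not Mesa's IR classes or hash table. It assumes a copy is invalidated when either its destination or its source variable is written.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy model only: these types are illustrative, not Mesa's API.
enum class ParamMode { In, Out, InOut };

struct Call {
    bool is_intrinsic;
    std::string return_var;            // empty when there is no return value
    std::vector<ParamMode> formals;    // mode of each formal parameter
    std::vector<std::string> actuals;  // variable passed for each formal
};

// Available copies, keyed by destination: entry {dst, src} means "dst = src".
using CopyTable = std::map<std::string, std::string>;

// Drop every copy whose destination or source is `var`.
void kill(CopyTable &acp, const std::string &var) {
    for (auto it = acp.begin(); it != acp.end();) {
        if (it->first == var || it->second == var)
            it = acp.erase(it);
        else
            ++it;
    }
}

void visit_call(CopyTable &acp, const Call &call) {
    if (!call.is_intrinsic) {
        // Unknown side effects: forget every available copy.
        acp.clear();
        return;
    }
    // Intrinsic: only the return value and out/inout arguments are written.
    if (!call.return_var.empty())
        kill(acp, call.return_var);
    for (std::size_t i = 0;
         i < call.formals.size() && i < call.actuals.size(); ++i) {
        if (call.formals[i] != ParamMode::In)
            kill(acp, call.actuals[i]);
    }
}
```

In this model, an intrinsic call that only reads a temporary array leaves the copy from the uniform array in the table, so later reads can still be rewritten to use the uniform array directly.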