summaryrefslogtreecommitdiffstats
path: root/src/compiler/nir/nir_search.h
diff options
context:
space:
mode:
authorConnor Abbott <[email protected]>2019-02-18 14:20:34 +0100
committerConnor Abbott <[email protected]>2019-05-02 16:14:06 +0200
commit7ce86e69381ebbff495f6a3f6fd716f58df9bd10 (patch)
treea5cd86734c59e4404d44a2a8222dba60fba271ed /src/compiler/nir/nir_search.h
parent08be23bfdec9fb447c58ae48bf9cc1b91ecba128 (diff)
nir/search: Add automaton-based pre-searching
nir_opt_algebraic is currently one of the most expensive NIR passes, because of the many different patterns we've added over the years. Even though patterns are already sorted by opcode, there are still way too many patterns for common opcodes like bcsel and fadd, which means that many patterns are tried but only a few actually match. One way to fix this is to add a pre-pass over the code that scans it using an automaton constructed beforehand, similar to the automatons produced by lex and yacc for parsing source code. This automaton has to walk the SSA graph and recognize possible pattern matches. It turns out that the theory to do this is quite mature already, having been developed for instruction selection as well as other non-compiler things. I followed the presentation in the dissertation cited in the code, "Tree algorithms: Two Taxonomies and a Toolkit," trying to keep the naming similar. To create the automaton, we have to perform something like the classical NFA to DFA subset construction used by lex, but it turns out that actually computing the transition table for all possible states would be way too expensive, with the dissertation reporting times of almost half an hour for an example of size similar to nir_opt_algebraic. Instead, we adopt one of the "filter" approaches explained in the dissertation, which trade much faster table generation and table size for a few more table lookups per instruction at runtime. I chose the filter which resulted the fastest table generation time, with medium table size. Right now, the table generation takes around .5 seconds, despite being implemented in pure Python, which I think is good enough. Based on the numbers in the dissertation, the other choice might make table compilation time 25x slower to get 4x smaller table size, but I don't think that's worth it. As of now, we get the following binary size before and after this patch: text data bss dec hex filename 11979455 464720 730864 13175039 c908ff before i965_dri.so text data bss dec hex filename 12037835 616244 791792 13445871 cd2aef after i965_dri.so There are a number of places where I've simplified the automaton by getting rid of details in the LHS patterns rather than complicate things to deal with them. For example, right now the automaton doesn't distinguish between constants with different values. This means that it isn't as precise as it could be, but the decrease in compile time is still worth it -- these are the compilation time numbers for a shader-db run with my (admittedly old) database on Intel skylake: Difference at 95.0% confidence -42.3485 +/- 1.375 -7.20383% +/- 0.229926% (Student's t, pooled s = 1.69843) We can always experiment with making it more precise later. Reviewed-by: Jason Ekstrand <[email protected]>
Diffstat (limited to 'src/compiler/nir/nir_search.h')
-rw-r--r--src/compiler/nir/nir_search.h3
1 files changed, 3 insertions, 0 deletions
diff --git a/src/compiler/nir/nir_search.h b/src/compiler/nir/nir_search.h
index 9dc09d2361c..526a498cd47 100644
--- a/src/compiler/nir/nir_search.h
+++ b/src/compiler/nir/nir_search.h
@@ -121,8 +121,11 @@ enum nir_search_op {
nir_search_op_b2i,
nir_search_op_i2b,
nir_search_op_f2b,
+ nir_num_search_ops,
};
+uint16_t nir_search_op_for_nir_op(nir_op op);
+
typedef struct {
nir_search_value value;