summaryrefslogtreecommitdiffstats
path: root/src/util/register_allocate.c
Commit message (Collapse)AuthorAgeFilesLines
* util/ra: Add a getter for a node classAlyssa Rosenzweig2019-07-251-0/+7
| | | | | | | | Complements the existing getters and the setter for node class. To be used in the Panfrost RA refactor. Signed-off-by: Alyssa Rosenzweig <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* util/ra: Assert nodes are in-bounds in add_node_interferenceJason Ekstrand2019-05-141-0/+1
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* util/ra: Don't destroy the graph in ra_allocate()Jason Ekstrand2019-05-141-76/+102
| | | | | | | | | | | | We want to be able to call ra_allocate() and, when it fails, mutate the graph and try again rather than re-building the graph from scratch. This commit moves all the scratch bits except the final register allocation (which is really an out value not scratch) into sub-structs named "tmp" to make it clear which things are scratch. It also adds bits to the ra_select() initialization loop to initialize things (since we can't trust rzalloc anymore) and copy q_test and forced_reg over. Reviewed-by: Eric Anholt <[email protected]>
* util/ra: Add a helper for resetting a node's interferenceJason Ekstrand2019-05-141-0/+36
| | | | Reviewed-by: Eric Anholt <[email protected]>
* util/ra: Add helpers for adding nodes to an interference graphJason Ekstrand2019-05-141-20/+70
| | | | Reviewed-by: Eric Anholt <[email protected]>
* util/ra: Improve the performance of ra_simplifyJason Ekstrand2019-05-141-30/+119
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The most expensive part of register allocation is the ra_simplify step which is a fixed-point algorithm with a worst-case complexity of O(n^2) which adds the registers to a stack which we then use later to do the actual allocation. This commit uses bit sets and changes the core loop of ra_simplify to first walk 32-node chunks and then walk each chunk. This lets us skip whole 32-node chunks in one go based on bit operations and compute the minimum q value potentially 32x as fast. Of course, the algorithm still has the same fundamental O(n^2) run-time but the constant is now much lower. In the nasty Aztec Ruins compute shader, this shaves a full four seconds off the 30s compile time for a release build of mesa. In a debug build (needed for accurate stack traces), perf says that ra_select takes 20% of runtime before this patch and only 5-6% of runtime after this patch. It also makes shader-db runs faster. Shader-db results on Kaby Lake: total instructions in shared programs: 15311100 -> 15311100 (0.00%) instructions in affected programs: 0 -> 0 helped: 0 HURT: 0 total cycles in shared programs: 355468050 -> 355468050 (0.00%) cycles in affected programs: 0 -> 0 helped: 0 HURT: 0 Total CPU time (seconds): 2602.37 -> 2524.31 (-3.00%) Reviewed-by: Eric Anholt <[email protected]>
* util/ra: Only update q_total if the reg is not assignedJason Ekstrand2019-05-141-1/+1
| | | | | | | | We only use q_total if the reg is not assigned so there's no point in updating it if the reg is not assigned. This has no known perf benefit but it will reduce churn in a future commit. Reviewed-by: Eric Anholt <[email protected]>
* util/ra: Only update best_optimistic_node if !progressJason Ekstrand2019-05-141-1/+5
| | | | | | | This shaves about half a second off the 30 second compile time of one of the compute shaders in Aztec ruins. Reviewed-by: Eric Anholt <[email protected]>
* util/ra: Make in_stack a bitset in the graphJason Ekstrand2019-05-141-18/+15
| | | | Reviewed-by: Eric Anholt <[email protected]>
* util/ra: Get rid of tabsJason Ekstrand2019-05-141-24/+24
| | | | Reviewed-by: Eric Anholt <[email protected]>
* mesa: include mtypes.h lessMarek Olšák2018-04-121-1/+0
| | | | | | | | | | - remove mtypes.h from most header files - add main/menums.h for often used definitions - remove main/core.h v2: fix radv build Reviewed-by: Brian Paul <[email protected]>
* util/ra: fix memory leakEric Engestrom2017-07-311-0/+2
| | | | | | | | | CID: 1415909 Fixes: 7a34a0e8903249c41fae "ra: Add a callback for selecting a register from what's available." Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* ra: Add a callback for selecting a register from what's available.Eric Anholt2017-07-251-14/+76
| | | | | | | | | | | | | | | | VC4 has had a tension, similar to pre-Sandybridge Intel, where we want to use low-numbered registers (more parallelism on Intel, fewer delay slots on vc4), but in order to give instruction scheduling the most freedom to avoid delays we want to round-robin between registers of the same cost. Our two heuristics so far have chosen one end or the other of that tradeoff. The callback, instead, hands the driver the set of registers that are available, and the driver gets to make its own choice. This will be used in vc4 to round-robin between registers of the same cost, and might be used in the future for improving bank selection. Reviewed-by: Nicolai Hähnle <[email protected]>
* ra: Don't put a node in its own adjacency set.Eric Anholt2017-07-251-13/+10
| | | | | | | | All the paths looping over adjacency had guards against considering themselves (the non-obvious one was ra_any_neighbors_conflict(), which has in_stack set). Reviewed-by: Nicolai Hähnle <[email protected]>
* ra: Pull the body of a loop out to a helper function.Eric Anholt2017-07-251-12/+19
| | | | | | | I was going to indent this code another level, and decided it would be easier to read as a helper. Reviewed-by: Nicolai Hähnle <[email protected]>
* util/ra: (trivial) fix c99 loop variable initializationRoland Scheidegger2015-08-191-7/+8
| | | | Fails with old msvc otherwise.
* util/ra: Make allocating conflict lists optionalJason Ekstrand2015-08-181-9/+17
| | | | | | | | | Since i965 is now using make_reg_conflicts_transitive and doesn't need q-value computations, they are disabled on i965. They are enabled everywhere else so that they get the old behavior. This reduces the time spent in eglInitialize() on BDW by around 10-15%. Reviewed-by: Eric Anholt <[email protected]>
* util/ra: Add a function for making all conflicts on a register transitiveJason Ekstrand2015-08-181-0/+23
| | | | Reviewed-by: Eric Anholt <[email protected]>
* ra: Delete the conflict lists in ra_set_finalizeJason Ekstrand2015-08-101-0/+5
| | | | | | | They are never used after the set is finalized so there's no reason to keep them around. Reviewed-by: Matt Turner <[email protected]>
* ra: Refactor ra_set_finalizeJason Ekstrand2015-08-101-26/+25
| | | | | | | All this commit does is change an early return to an if with an else clause. Reviewed-by: Matt Turner <[email protected]>
* util: Avoid double promotion.Matt Turner2015-07-291-1/+1
| | | | Reviewed-by: Iago Toral Quiroga <[email protected]>
* ra: Disable round-robin strategy for optimistically colorable nodes.Francisco Jerez2015-02-231-1/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The round-robin allocation strategy is expected to decrease the amount of false dependencies created by the register allocator and give the post-RA scheduling pass more freedom to move instructions around. On the other hand it has the disadvantage of increasing fragmentation and decreasing the number of equally-colored nearby nodes, what increases the likelihood of failure in presence of optimistically colorable nodes. This patch disables the round-robin strategy for optimistically colorable nodes. These typically arise in situations of high register pressure or for registers with large live intervals, in both cases the task of the instruction scheduler shouldn't be constrained excessively by the dense packing of those nodes, and a spill (or on Intel hardware a fall-back to SIMD8 mode) is invariably worse than a slightly less optimal scheduling. Shader-db results on the i965 driver: total instructions in shared programs: 5488539 -> 5488489 (-0.00%) instructions in affected programs: 1121 -> 1071 (-4.46%) helped: 1 HURT: 0 GAINED: 49 LOST: 5 v2: Re-enable round-robin already for the lowest one of the nodes pushed optimistically onto the sack (Connor). v3: Use UINT_MAX instead of ~0, open-code MIN2 (Jason, Connor). Reviewed-by: Connor Abbott <[email protected]>
* util: Move Mesa's bitset.h to util/.Eric Anholt2015-02-201-1/+1
| | | | Reviewed-by: Jose Fonseca <[email protected]>
* util: Silence signed-unsigned comparison warningsJan Vesely2014-12-171-6/+6
| | | | | Signed-off-by: Jan Vesely <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* ra: Don't use regs as the ralloc context.Matt Turner2014-12-011-1/+1
| | | | | | | | | | The i965 backends pass something out of 'screen', which is allocated per-process, making using this as a ralloc context not thread-safe. All callers ra_alloc_interference_graph() already ralloc_free() its return value. Reviewed-by: Jason Ekstrand <[email protected]>
* util: Use reg_belongs_to_class instead of BITSET_TESTJason Ekstrand2014-10-241-1/+1
| | | | | | | | | This shouldn't be a functional change since reg_belongs_to_class is just a wrapper around BITSET_TEST. It just makes the code a little easier to read. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* mesa: Move register_allocate.c to util.Eric Anholt2014-09-231-0/+654
The r300 gallium driver is using it outside of the Mesa tree, and I wanted to do so for vc4 as well. Rather than make the multiple-definitions problem even more complicated, just move it to more-shared code. v2: Don't forget to delete the symlink in r300 (review by Matt). Delete more r300-helper references (review by Emil) Don't prefix util/ header inclusion with "util/" (review by Emil) Reviewed-by: Matt Turner <[email protected]> (v1) Reviewed-by: Emil Velikov <[email protected]> (v1)