path: root/src/mesa/program/register_allocate.c
Commit message, Author, Age, Files, Lines
* mesa: Move register_allocate.c to util.Eric Anholt2014-09-231-654/+0
  The r300 gallium driver is using it outside of the Mesa tree, and I wanted to do so for vc4 as well. Rather than make the multiple-definitions problem even more complicated, just move it to more-shared code.

  v2: Don't forget to delete the symlink in r300 (review by Matt).
      Delete more r300-helper references (review by Emil).
      Don't prefix util/ header inclusion with "util/" (review by Emil).

  Reviewed-by: Matt Turner <[email protected]> (v1)
  Reviewed-by: Emil Velikov <[email protected]> (v1)
* ra: assert against unsigned underflow in q_totalConnor Abbott2014-09-121-0/+1
| | | | | | | | q_total should never go below 0 (which is why it's defined as unsigned), and if it does, then something is seriously wrong. Signed-off-by: Connor Abbott <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* ra: move declarations before code to fix MSVC buildBrian Paul2014-08-141-2/+2
| | | | Trivial.
* ra: optimistically color only one node at a timeConnor Abbott2014-08-131-35/+22
  Before, when we encountered a situation where we had to optimistically color a node, we would immediately give up and push all the remaining nodes on the stack in the order of their index - which is a random, and potentially not optimal, order. Instead, choose one node to optimistically color in ra_select(), and then once we've optimistically colored it, keep on going as normal in the hopes that we've opened up more avenues for the normal select phase to make progress. In cases with high register pressure, this helps make the order we push things on the stack much better, and therefore increase the chance that we can allocate successfully.

  total instructions in shared programs: 4545447 -> 4545401 (-0.00%)
  instructions in affected programs: 1353 -> 1307 (-3.40%)
  GAINED: 124
  LOST: 6

  Signed-off-by: Connor Abbott <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* ra: don't consider nodes for spilling we don't need toConnor Abbott2014-08-131-40/+11
| | | | | | | | | | | | | | | | | Previously, we would consider any optimistically colored nodes for spilling. However, spilling any optimistically colored nodes below the node that we failed to color on the stack wouldn't help us make progress, since it wouldn't help with allowing us to find a color for the node currently failing to get colored. Only consider nodes which were above the failing node on the stack for spilling, which simplifies the logic, and comment the code better so people know what's going on here. No shader-db changes with BRW_MAX_GRF reduced to 90 (or with the normal number of GRF's). Signed-off-by: Connor Abbott <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* ra: make the p, q test more efficientConnor Abbott2014-08-131-7/+26
| | | | | | | | | | | | | We can store the q total that pq_test() would've calculated in the node itself, updating it when we add a node to the stack. This way, we only have to walk the adjacency list when we push a node on the stack (i.e. when the p, q test succeeds) instead of every time we do the p, q test. No difference in shader-db run times, but I'm keeping this in because the q total that it calculates will also be used in the next few commits. Signed-off-by: Connor Abbott <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
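The caching idea in this message can be sketched as follows. This is a simplified illustration, not the Mesa code: every name prefixed `ex_` is hypothetical, there is only a single register class (so `q` and `p` are plain numbers), and the adjacency list has a fixed size.

```c
#include <assert.h>
#include <stdbool.h>

#define EX_MAX_ADJ 8

/* Hypothetical, simplified node: one register class, fixed-size adjacency. */
struct ex_node {
   unsigned adj[EX_MAX_ADJ];  /* indices of interfering nodes */
   unsigned adj_count;
   unsigned q_total;          /* cached sum of q over neighbors still in the graph */
   bool in_stack;
};

/* The p, q test becomes a single comparison against the cached total. */
static bool
ex_pq_test(const struct ex_node *nodes, unsigned n, unsigned p)
{
   return nodes[n].q_total < p;
}

/* Pushing n on the stack is the only place the adjacency list is walked:
 * each neighbor still in the graph loses n's contribution q. */
static void
ex_push(struct ex_node *nodes, unsigned n, unsigned q)
{
   unsigned i;

   nodes[n].in_stack = true;
   for (i = 0; i < nodes[n].adj_count; i++) {
      struct ex_node *n2 = &nodes[nodes[n].adj[i]];

      if (!n2->in_stack) {
         assert(n2->q_total >= q);   /* the unsigned total must not wrap */
         n2->q_total -= q;
      }
   }
}
```

With this layout the cost of walking a node's neighbors is paid once per push rather than once per test.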
* ra: cleanup the public APIConnor Abbott2014-08-131-6/+6
| | | | | | | | | | | | | | Previously, there were 3 entrypoints into parts of the actual allocator, and an API called ra_allocate_no_spills() that called all 3. Nobody would ever want to call any of the 3 entrypoints by themselves, so everybody just used ra_allocate_no_spills(). So just make them static functions, and while we're at it rename ra_allocate_no_spills() to ra_allocate() since there's no equivalent "with spills," because the backend is supposed to handle spilling. Signed-off-by: Connor Abbott <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* util: Move ralloc to a new src/util directory.Kenneth Graunke2014-08-041-1/+1
| | | | | | | | | | | | | | | | | | For a long time, we've wanted a place to put utility code which isn't directly tied to Mesa or Gallium internals. This patch creates a new src/util directory for exactly that purpose, and builds the contents as libmesautil.la. ralloc seemed like a good first candidate. These days, it's directly used by mesa/main, i965, i915, and r300g, so keeping it in src/glsl didn't make much sense. Signed-off-by: Kenneth Graunke <[email protected]> v2 (Jason Ekstrand): More realloc uses and some scons fixes Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* ra: Convert another bool array to bitsets.Eric Anholt2014-03-181-6/+7
| | | | | | | | | This one saves about 2MB peak allocation in glsl-fs-algebraic-add-add-1, with no performance difference on timing short shader-db runs (n=9/10, warmup outlier removed). Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* ra: Use a bitset for storing which registers belong to a class.Kenneth Graunke2014-03-181-5/+10
| | | | | | | | | This should use 1/8 the memory. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Christoph Brill <[email protected]>
* ra: Create a reg_belongs_to_class() helper function.Kenneth Graunke2014-03-181-2/+11
| | | | | | | | | This is a little easier to read. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Christoph Brill <[email protected]>
* ra: Use bool instead of GLboolean.Kenneth Graunke2014-03-181-25/+26
  This isn't the GL API, so there's no reason to use GLboolean. Using bool is safer: any non-zero value is treated as "true". When converting a value to a GLboolean, all but the low byte is discarded, which means that values like 256 will be incorrectly rendered as false.

  Done via the following vim commands:
  :%s/GLboolean/bool/g
  :%s/GL_TRUE/true/g
  :%s/GL_FALSE/false/g
  and one line of manual whitespace tidying.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
  Reviewed-by: Brian Paul <[email protected]>
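The truncation described in this message can be reproduced with a small sketch. `ex_glboolean` is a stand-in for GLboolean (which the GL headers define as `unsigned char`), the other `ex_` names are hypothetical, and the sketch assumes 8-bit bytes.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for GLboolean: an unsigned char, as in the GL headers. */
typedef unsigned char ex_glboolean;

/* Conversion to bool yields true for any non-zero value. */
static bool
ex_to_bool(int value)
{
   return value;
}

/* Conversion to an unsigned char keeps only the low byte. */
static ex_glboolean
ex_to_glboolean(int value)
{
   return (ex_glboolean)value;
}

/* The case from the commit message: 256 has an all-zero low byte. */
static void
ex_demo(void)
{
   assert(ex_to_bool(256) == true);
   assert(ex_to_glboolean(256) == 0);
}
```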
* mesa: move declarations before codeBrian Paul2013-06-271-2/+3
| | | | Reviewed-by: Jose Fonseca <[email protected]>
* ra: Fix register spilling.Eric Anholt2013-06-261-5/+39
  Commit 551c991606e543c3a264a762026f11348b37947e tried to avoid spilling registers that were trivially colorable. But since we do optimistic coloring, the top of the stack also contains nodes that are not trivially colorable, so we need to consider them for spilling (since they are some of our best candidates).

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=58384
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=63674

  NOTE: This is a candidate for the 9.1 branch.
* mesa: Fix test for optimistic coloring being necessary.Eric Anholt2013-05-291-1/+1
  i965 and radeon use ra_set_node_reg() to force payload registers to specific registers while still exposing those registers to the allocator. We were treating those register nodes as unsuccessfully allocated in the ra_simplify() step, leading to walking the registers again to do optimistic coloring even if there was nothing left to do.

  Acked-by: Kenneth Graunke <[email protected]>
* mesa: Add a macro to bitset for determining bitset size.Eric Anholt2013-04-121-1/+1
| | | | Reviewed-by: Matt Turner <[email protected]>
* register_allocate: Fix the type of best_benefit.Matt Turner2013-04-081-1/+1
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Ask the register allocator to round-robin through registers.Eric Anholt2013-04-041-3/+28
| | | | | | | | | | | | The way we were allocating registers before, packing into low register numbers for Ironlake, resulted in an overly-constrained dependency graph for instruction scheduling. Improves GLBenchmark 2.1 performance by 4.5% +/- 0.7% (n=26). No difference on my old GLSL demo (n=20). No difference on nexuiz (n=15). v2: Fix off-by-one bug that made the change only work for 16-wide on i965. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
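A round-robin choice of register, as opposed to always packing into the lowest free number, might look like the sketch below. It is an illustration only: `ex_pick_reg` and its parameters are hypothetical names, and the real allocator also has to respect register classes and conflicts, which are omitted here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: return the first free register at or after *start,
 * wrapping around, and advance *start past it so the next search begins
 * where this one ended. Returns -1 if every register is in use. */
static int
ex_pick_reg(const bool *in_use, unsigned count, unsigned *start)
{
   unsigned i;

   assert(in_use && start);
   for (i = 0; i < count; i++) {
      unsigned r = (*start + i) % count;

      if (!in_use[r]) {
         *start = (r + 1) % count;
         return (int)r;
      }
   }
   return -1;
}
```

Because the search start moves on after each pick, consecutive allocations land in different registers even when low-numbered ones are free, which loosens the dependencies the instruction scheduler sees.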
* mesa: Reduce memory usage for reg alloc with many graph nodes (part 2).Eric Anholt2013-03-111-4/+8
| | | | | | | | | | | | | After the previous fix that almost removes an allocation of 4*n^2 bytes, we can use a bitset to reduce another allocation from n^2 bytes to n^2/8 bytes. Between the previous commit and this one, the peak heap size for an oglconform ARB_fragment_program max instructions test on i965 goes from 4GB to 255MB. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=55825 Reviewed-by: Kenneth Graunke <[email protected]>
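The saving from n^2 bytes to n^2/8 bytes comes from storing one bit per entry instead of one byte. A minimal bitset sketch, with hypothetical `ex_`/`EX_` names rather than Mesa's own bitset macros, and plain `calloc` standing in for whatever allocator the real code uses:

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

typedef unsigned int ex_word;

#define EX_WORDBITS (sizeof(ex_word) * CHAR_BIT)
/* Number of words needed to hold the given number of bits, rounded up. */
#define EX_WORDS(bits) (((bits) + EX_WORDBITS - 1) / EX_WORDBITS)

/* Allocate a zeroed bitset; returns NULL on allocation failure. */
static ex_word *
ex_bitset_alloc(size_t bits)
{
   return calloc(EX_WORDS(bits), sizeof(ex_word));
}

static void
ex_bitset_set(ex_word *set, size_t b)
{
   assert(set);
   set[b / EX_WORDBITS] |= (ex_word)1 << (b % EX_WORDBITS);
}

static int
ex_bitset_test(const ex_word *set, size_t b)
{
   assert(set);
   return (set[b / EX_WORDBITS] >> (b % EX_WORDBITS)) & 1;
}
```

One such bitset of n bits per node replaces an n-byte boolean row, which is where the factor of 8 comes from.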
* mesa: Reduce the memory usage for reg alloc with many graph nodes (part 1)Eric Anholt2013-03-111-1/+13
| | | | | | | | We were allocating an adjacency_list entry for every possible interference that could get created, but that usually doesn't happen. We can save a lot of memory by resizing the array on demand. Reviewed-by: Kenneth Graunke <[email protected]>
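Resizing the adjacency list on demand can be sketched as below. The names are hypothetical, plain `realloc` is swapped in for Mesa's own allocator, and the growth policy (start at 4 entries, then double) is an assumption made for the example rather than something stated in the commit.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-node adjacency list that grows as entries are added. */
struct ex_adj {
   unsigned *list;
   unsigned count;  /* entries in use */
   unsigned size;   /* entries allocated */
};

/* Append n, growing the array only when it is full.
 * Returns 0 on success, -1 on allocation failure. */
static int
ex_adj_add(struct ex_adj *a, unsigned n)
{
   assert(a);
   if (a->count >= a->size) {
      unsigned new_size = a->size ? a->size * 2 : 4;
      unsigned *new_list = realloc(a->list, new_size * sizeof(*new_list));

      if (!new_list)
         return -1;
      a->list = new_list;
      a->size = new_size;
   }
   a->list[a->count++] = n;
   return 0;
}
```

A node with few interferences then holds only a handful of entries instead of one slot for every other node in the graph.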
* register_allocate: don't consider trivially colorable registers for spilling.Paul Berry2012-10-031-0/+7
| | | | | | | | | | | | | | | | | Previously, we considered all registers as candidates for spilling. This was counterproductive--for any registers that have already been removed from the interference graph, there is no benefit to spilling them, since they don't contribute to register pressure. This patch ensures that we will only try to spill registers that are still in the interference graph after register allocation has failed. This is consistent with the recommendations of the paper "Retargetable Graph-Coloring Register Allocation for Irregular Architectures", on which our register allocator is based. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* ra: Add q_values parameter to ra_set_finalize()Tom Stellard2012-09-191-1/+12
| | | | | | This allows the user to pass precomputed q values to the allocator. Reviewed-by: Kenneth Graunke <[email protected]>
* ra: Clarify usage of ra_set_node_reg()Tom Stellard2012-09-191-0/+2
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: Make the register allocator allocation take a ralloc context.Eric Anholt2012-01-181-2/+8
| | | | | | This fixes a memory leak on i965 context destruction. NOTE: This is a candidate for the 8.0 branch.
* mesa: Add a convenience interface for register allocator conflicts setup.Eric Anholt2011-08-101-0/+21
* ra: Add ra_set_node_reg()Tom Stellard2011-04-301-4/+24
| | | | | | | | This function can be used to avoid creating single register classes for input/payload registers. This makes optimistic coloring less likely to fail. Reviewed-by: Eric Anholt <[email protected]>
* mesa: Add a bunch of documentation to the register allocator.Eric Anholt2011-04-291-3/+65
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* Convert everything from the talloc API to the ralloc API.Kenneth Graunke2011-01-311-20/+17
* ra: Use the same context when realloc'ing arrays.Kenneth Graunke2011-01-211-2/+2
| | | | | | The original allocations use regs->regs as the context, so talloc will happily ignore the context given here. Change it to match to clarify that it isn't changing.
* ra: Take advantage of the adjacency list in finding a node to spill.Eric Anholt2011-01-181-6/+6
| | | | | | | | This revealed a bug in ra_get_spill_benefit where we only considered the benefit of the first adjacency we were to remove, explaining some of the ugly spilling I've seen in shaders. Because of the reduced spilling, it reduces the runtime of glsl-fs-convolution-1 36.9% +/- 0.9% (n=5).
* ra: Remove unused "name" field in regs.Eric Anholt2011-01-181-1/+0
* ra: Take advantage of the adjacency list in ra_select() too.Eric Anholt2011-01-181-5/+6
| | | | Reduces runtime of glsl-fs-convolution-1 another 13.9% +/- 0.6% (n=5).
* ra: Add an adjacency list to trade space for time in ra_simplify().Eric Anholt2011-01-181-14/+21
  This was recommended in the original paper, but I figured "make it run" before "make it fast". Now we make it fast.

  Reduces the runtime of glsl-fs-convolution-1 by 12.7% +/- 0.6% (n=5).
* ra: Trade off some space to get time efficiency in ra_set_finalize().Eric Anholt2011-01-181-6/+32
  Our use of the register allocator in i965 is somewhat unusual. Whereas most architectures would have a smaller set of registers with fewer register classes and reuse that across compilation, we have 1, 2, and 4-register classes (usually) and a variable number up to 128 registers per compile depending on how many setup parameters and push constants are present. As a result, when compiling large numbers of programs (as with glean texCombine going through ff_fragment_shader), we spent much of our CPU time in computing the q[] array. By keeping a separate list of what the conflicts are for a particular reg, we reduce glean texCombine time 17.0% +/- 2.3% (n=5).

  We don't expect this optimization to be useful for 915, which will have a constant register set, but it would be useful if we were to switch to this register allocator for Mesa IR.
* mesa: move declaration before codeBrian Paul2010-10-221-1/+2
* i965: Add support for register spilling.Eric Anholt2010-10-211-0/+63
| | | | | It can be tested with if (0) replaced with if (1) to force spilling for all virtual GRFs. Some simple tests work, but large texturing tests fail.
* ra: First cut at a graph-coloring register allocator for mesa.Eric Anholt2010-09-291-0/+361
Notably missing is choice of registers to spill.