aboutsummaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers/vc4/vc4_qpu_schedule.c
Commit message (Collapse)AuthorAgeFilesLines
* vc4: Fix a pasteo in scheduling condition flag usage.Eric Anholt2016-07-041-1/+1
| | | | | | | Noticed by code inspection. This hasn't been too big of a deal, because our cond usages all start out as adder ops, either MOVs or the FTOI for Z writes. MOVs *can* get converted to mul ops during scheduling, but apparently we hadn't hit this.
* vc4: Fix latency handling for QPU texture scheduling.Eric Anholt2015-12-181-32/+50
| | | | | | There's only high latency between a complete texture fetch setup and collecting its result, not between each step of setting up the texture fetch request.
* vc4: Keep sample mask writes from being reordered after TLB writesEric Anholt2015-12-181-1/+2
| | | | | | Fixes a regression I noticed after introducing scheduling on the QIR. Cc: "11.1" <[email protected]>
* vc4: Add debugging of the estimated time to run the shader to shader-db.Eric Anholt2015-12-111-15/+38
|
* vc4: Add support for storing sample mask.Eric Anholt2015-12-041-0/+4
| | | | | From the API perspective, writing 1 bits can't turn on pixels that were off, so we AND it with the sample mask from the payload.
* vc4: Convert from simple_list.h to list.hEric Anholt2015-05-291-38/+26
| | | | list.h is a nicer and more familiar set of list functions/macros.
* vc4: Add support for turning constant uniforms into small immediates.Eric Anholt2014-12-171-2/+6
| | | | | | | | | | | | | | | | | | | | | | Small immediates have the downside of taking over the raddr B field, so you might have less chance to pack instructions together thanks to raddr B conflicts. However, it also reduces some register pressure since it lets you load 2 "uniform" values in one instruction (avoiding a previous load of the constant value to a register), and increases some pairing for the same reason. total uniforms in shared programs: 16231 -> 13374 (-17.60%) uniforms in affected programs: 10280 -> 7423 (-27.79%) total instructions in shared programs: 40795 -> 41168 (0.91%) instructions in affected programs: 25551 -> 25924 (1.46%) In a previous version of this patch I had a reduction in instruction count by forcing the other args alongside a SMALL_IMM to be in the A file or accumulators, but that increases register pressure and had a bug in handling FRAG_Z. In this patch is I just use raddr conflict resolution, which is more expensive. I think I'd rather tweak allocation to have some way to slightly prefer good choices for files in general, rather than risk failing to register allocate by forcing things into register classes.
* vc4: Do QPU scheduling across uniform loads.Eric Anholt2014-12-091-28/+60
| | | | | | | | This means another pass of reordering the uniform data store, but it lets us pair up a lot more instructions. total instructions in shared programs: 44639 -> 43176 (-3.28%) instructions in affected programs: 36938 -> 35475 (-3.96%)
* vc4: Populate the delay field better, and schedule high delay first.Eric Anholt2014-12-091-1/+49
| | | | | | | This is a standard scheduling heuristic, and clearly helps. total instructions in shared programs: 46418 -> 44467 (-4.20%) instructions in affected programs: 42531 -> 40580 (-4.59%)
* vc4: Skip raddr dependencies for 32-bit immediate loads.Eric Anholt2014-12-091-2/+5
| | | | These don't have raddr fields.
* vc4: Mark VPM read setup as impacting VPM reads, not writes.Eric Anholt2014-12-091-1/+7
| | | | | Fixes assertion failures if we adjust scheduling priorities to emphasize VPM reads more.
* vc4: Add separate write-after-read dependency tracking for pairing.Eric Anholt2014-12-051-20/+58
| | | | | | | | | If an operation is the last one to read a register, the instruction containing it can also include the op that has the next write to that register. total instructions in shared programs: 57486 -> 56995 (-0.85%) instructions in affected programs: 43004 -> 42513 (-1.14%)
* vc4: Fix inverted priority of instructions for QPU scheduling.Eric Anholt2014-12-051-10/+10
| | | | | | | | | | | We were scheduling TLB operations as early as possible, and texture setup as late as possible. When I introduced prioritization, I visually inspected that an independent operation got moved above texture results collection, which tricked me into thinking it was working (but it was just because texture setup was being pushed late). total instructions in shared programs: 57651 -> 57486 (-0.29%) instructions in affected programs: 18532 -> 18367 (-0.89%)
* vc4: Pair up QPU instructions when scheduling.Eric Anholt2014-12-011-17/+62
| | | | | | | | | | | We've got two mostly-independent operations in each QPU instruction, so try to pack two operations together. This is fairly naive (doesn't track read and write separately in instructions, doesn't convert ADD-based MOVs into MUL-based movs, doesn't reorder across uniform loads), but does show a decent improvement on shader-db-2. total instructions in shared programs: 59583 -> 57651 (-3.24%) instructions in affected programs: 47361 -> 45429 (-4.08%)
* vc4: Introduce scheduling of QPU instructions.Eric Anholt2014-12-011-0/+693
This doesn't reschedule much currently, just tries to fit things into the regfile A/B write-versus-read slots (the cause of the improvements in shader-db), and hide texture fetch latency by scheduling setup early and results collection late (haven't performance tested it). This infrastructure will be important for doing instruction pairing, though. shader-db2 results: total instructions in shared programs: 61874 -> 59583 (-3.70%) instructions in affected programs: 50677 -> 48386 (-4.52%)