aboutsummaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers/vc4/vc4_qpu.h
Commit message (Collapse)AuthorAgeFilesLines
* vc4: If a QIR source has an unpack set, print it.Eric Anholt2015-10-261-0/+3
| | | | Not used yet, but will be.
* vc4: Reuse QPU dumping for packing bits in QIR.Eric Anholt2015-08-211-0/+7
|
* vc4: Use the pure/const attributes on a bunch of our QPU functions.Eric Anholt2015-07-171-15/+15
| | | | | | On a release build, this makes the rest of vc4_qpu_validate.c go away (the compiler didn't know that our qpu helper function calls had no side effects).
* vc4: Add support for turning constant uniforms into small immediates.Eric Anholt2014-12-171-0/+1
| | | | | | | | | | | | | | | | | | | | | | Small immediates have the downside of taking over the raddr B field, so you might have less chance to pack instructions together thanks to raddr B conflicts. However, it also reduces some register pressure since it lets you load 2 "uniform" values in one instruction (avoiding a previous load of the constant value to a register), and increases some pairing for the same reason. total uniforms in shared programs: 16231 -> 13374 (-17.60%) uniforms in affected programs: 10280 -> 7423 (-27.79%) total instructions in shared programs: 40795 -> 41168 (0.91%) instructions in affected programs: 25551 -> 25924 (1.46%) In a previous version of this patch I had a reduction in instruction count by forcing the other args alongside a SMALL_IMM to be in the A file or accumulators, but that increases register pressure and had a bug in handling FRAG_Z. In this patch is I just use raddr conflict resolution, which is more expensive. I think I'd rather tweak allocation to have some way to slightly prefer good choices for files in general, rather than risk failing to register allocate by forcing things into register classes.
* vc4: Refuse to merge two ops that both access shared functions.Eric Anholt2014-12-051-0/+1
| | | | | Avoids assertion failures in vc4_qpu_validate.c if we happen to find the right set of operations available.
* vc4: Pair up QPU instructions when scheduling.Eric Anholt2014-12-011-1/+1
| | | | | | | | | | | We've got two mostly-independent operations in each QPU instruction, so try to pack two operations together. This is fairly naive (doesn't track read and write separately in instructions, doesn't convert ADD-based MOVs into MUL-based movs, doesn't reorder across uniform loads), but does show a decent improvement on shader-db-2. total instructions in shared programs: 59583 -> 57651 (-3.24%) instructions in affected programs: 47361 -> 45429 (-4.08%)
* vc4: Introduce scheduling of QPU instructions.Eric Anholt2014-12-011-0/+3
| | | | | | | | | | | | This doesn't reschedule much currently, just tries to fit things into the regfile A/B write-versus-read slots (the cause of the improvements in shader-db), and hide texture fetch latency by scheduling setup early and results collection late (haven't performance tested it). This infrastructure will be important for doing instruction pairing, though. shader-db2 results: total instructions in shared programs: 61874 -> 59583 (-3.70%) instructions in affected programs: 50677 -> 48386 (-4.52%)
* vc4: Add another check for invalid TLB scoreboard handling.Eric Anholt2014-12-011-0/+3
| | | | This was caught by an assertion in the simulator.
* vc4: Merge qpu_a_NOP() and qpu_m_NOP to a single qpu_NOP() helper.Eric Anholt2014-08-241-2/+1
| | | | | | Now that qpu_inst() ignores the WADDR from the other half of the instruction, we can set both the ADD and MUL WADDRs in the NOP helper. Thanks to that, we also no longer need to qpu_inst(NOP, NOP).
* vc4: Make some helpers for setting condition codes in instructions.Eric Anholt2014-08-221-0/+2
|
* vc4: Switch to actually generating vertex and fragment shader code from TGSI.Eric Anholt2014-08-081-2/+3
| | | | | | | | | | | | | | | | | | This introduces an IR (QIR, for QPU IR) to do optimization on. It's a scalar, SSA IR in general. It looks like optimization is pretty easy this way, though I haven't figured out if it's going to be good for our weird register allocation or not (or if I want to reduce to basically QPU instructions first), and I've got some problems with it having some multi-QPU-instruction opcodes (SEQ and CMP, for example) which I probably want to break down. Of course, this commit mostly doesn't work, since many other things are still hardwired, like the VBO data. v2: Rewrite to use a bunch of helpers (qir_OPCODE) for emitting QIR instructions into temporary values, and make qir_inst4 take the 4 args separately instead of an array (all later callers wanted individual args).
* vc4: Initial skeleton driver import.Eric Anholt2014-08-081-0/+201
This mostly just takes every draw call and turns it into a sequence of commands that clear the FBO and draw a single shaded triangle to it, regardless of the actual input vertices or shaders. I copied the initial driver skeleton mostly from freedreno, and I've preserved Rob Clark's copyright for those. I also based my initial hardcoded shaders and command lists on Scott Mansell (phire)'s "hackdriver" project, though the bit patterns of the shaders emitted end up being different. v2: Rebase on gallium megadrivers changes. v3: Rebase on PIPE_SHADER_CAP_MAX_CONSTS change. v4: Rely on simpenrose actually being installed when building for simulation. v5: Add more header duplicate-include guards. v6: Apply Emil's review (protection against vc4 sim and ilo at the same time, and dropping the dricommon drm bits) and fix a copyright header (thanks, Roland)