aboutsummaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers/vc4/vc4_qpu_validate.c
Commit message (Collapse)AuthorAgeFilesLines
* vc4: Avoid false scheduling dependencies for LOAD_IMMs.Eric Anholt2016-11-301-0/+4
| | | | | | | | | | Noticed in shaders with branching, where we ended up scheduling delay slots near the start of a block for the uniforms reset setup. total instructions in shared programs: 93970 -> 93951 (-0.02%) instructions in affected programs: 3117 -> 3098 (-0.61%) 3DMMES performance +0.423087% +/- 0.133521% (n=9,10)
* vc4: Add a bit of QPU validation for threaded shaders.Eric Anholt2016-11-121-1/+102
| | | | | These are both bugs we've run into along the way writing multithreaded FS support.
* vc4: Add real validation for MUL rotation.Eric Anholt2016-08-251-10/+39
| | | | Caught problems in the upcoming DDX/DDY implementation.
* vc4: Validate QPU uniform pointer updates.Eric Anholt2016-07-121-0/+22
|
* vc4: Add QPU support for generating BRANCH instructions.Eric Anholt2016-07-121-0/+6
|
* vc4: Make vc4_qpu_validate() produce more verbose failures.Eric Anholt2016-05-061-35/+71
| | | | | | Seeing the expansion of a QPU_GET_FIELD in an assert isn't very informative, and it's hard find what's going wrong without getting a dump of the instruction that failed.
* vc4: Fix compiler warnings on release builds.Eric Anholt2015-07-141-0/+7
|
* vc4: Add support for turning constant uniforms into small immediates.Eric Anholt2014-12-171-0/+1
| | | | | | | | | | | | | | | | | | | | | | Small immediates have the downside of taking over the raddr B field, so you might have less chance to pack instructions together thanks to raddr B conflicts. However, it also reduces some register pressure since it lets you load 2 "uniform" values in one instruction (avoiding a previous load of the constant value to a register), and increases some pairing for the same reason. total uniforms in shared programs: 16231 -> 13374 (-17.60%) uniforms in affected programs: 10280 -> 7423 (-27.79%) total instructions in shared programs: 40795 -> 41168 (0.91%) instructions in affected programs: 25551 -> 25924 (1.46%) In a previous version of this patch I had a reduction in instruction count by forcing the other args alongside a SMALL_IMM to be in the A file or accumulators, but that increases register pressure and had a bug in handling FRAG_Z. In this patch is I just use raddr conflict resolution, which is more expensive. I think I'd rather tweak allocation to have some way to slightly prefer good choices for files in general, rather than risk failing to register allocate by forcing things into register classes.
* vc4: Refuse to merge two ops that both access shared functions.Eric Anholt2014-12-051-36/+1
| | | | | Avoids assertion failures in vc4_qpu_validate.c if we happen to find the right set of operations available.
* vc4: Fix assertion about SFU versus texturing.Eric Anholt2014-12-011-3/+4
| | | | | | | We're supposed to be checking that nothing else writes r4, which is done by the TMU result collection signal, not the coordinate setup. Avoids a regression when QPU instruction scheduling is introduced.
* vc4: Add another check for invalid TLB scoreboard handling.Eric Anholt2014-12-011-8/+13
| | | | This was caught by an assertion in the simulator.
* vc4: Fix totally broken assertions about inter-instruction reg conflicts.Eric Anholt2014-08-221-3/+18
| | | | | | | | | The spec citation talked about A and B, and I proceeded to pay no attention to whether the waddrs were for A or B. As a result, this pair of instructions would claim to conflict: mov ra4, ra4 ; nop nop, r0, r0 mov.ns ra4, rb4 ; nop nop, r0, r0
* vc4: Initial skeleton driver import.Eric Anholt2014-08-081-0/+275
This mostly just takes every draw call and turns it into a sequence of commands that clear the FBO and draw a single shaded triangle to it, regardless of the actual input vertices or shaders. I copied the initial driver skeleton mostly from freedreno, and I've preserved Rob Clark's copyright for those. I also based my initial hardcoded shaders and command lists on Scott Mansell (phire)'s "hackdriver" project, though the bit patterns of the shaders emitted end up being different. v2: Rebase on gallium megadrivers changes. v3: Rebase on PIPE_SHADER_CAP_MAX_CONSTS change. v4: Rely on simpenrose actually being installed when building for simulation. v5: Add more header duplicate-include guards. v6: Apply Emil's review (protection against vc4 sim and ilo at the same time, and dropping the dricommon drm bits) and fix a copyright header (thanks, Roland)