path: root/src/gallium/drivers/vc4/Makefile.sources
Commit message | Author | Age | Files | Lines
* vc4: Coalesce MOVs into VPM with the instructions generating the values. | Eric Anholt | 2014-12-18 | 1 | -0/+1
  total instructions in shared programs: 41168 -> 40976 (-0.47%)
  instructions in affected programs: 18156 -> 17964 (-1.06%)
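As a rough illustration of the coalescing idea, here is a minimal C sketch over a hypothetical, simplified IR (`struct inst`, `DST_VPM`, and `coalesce_vpm_movs()` are invented for this example and are not vc4's actual QIR). It assumes SSA temps and assumes that VPM writes must stay in their original order, so a producer is only retargeted when no other VPM write sits between it and the MOV. A real pass would also need to check that the producing instruction is allowed to write the VPM at all.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplified IR; not vc4's real QIR. */
enum { OP_MOV, OP_FADD };
enum { DST_VPM = -1, NO_SRC = -2 };

struct inst {
        int op;      /* OP_MOV is a plain copy */
        int dst;     /* SSA temp index, or DST_VPM for a VPM write */
        int src[2];  /* SSA temp indices, or NO_SRC */
        bool dead;   /* set once the instruction has been removed */
};

/* Counts reads of temp t by instructions that are still live. */
static int count_uses(const struct inst *insts, size_t n, int t)
{
        int uses = 0;
        for (size_t i = 0; i < n; i++) {
                if (insts[i].dead)
                        continue;
                for (int s = 0; s < 2; s++)
                        if (insts[i].src[s] == t)
                                uses++;
        }
        return uses;
}

/* For each "MOV vpm, t" whose t has exactly one use, retarget t's producer
 * to write the VPM directly and drop the MOV. The backwards walk gives up if
 * it meets another VPM write first, so VPM write order is preserved.
 * Returns the number of MOVs removed. */
static int coalesce_vpm_movs(struct inst *insts, size_t n)
{
        int removed = 0;
        for (size_t i = 0; i < n; i++) {
                struct inst *mov = &insts[i];
                if (mov->dead || mov->op != OP_MOV || mov->dst != DST_VPM)
                        continue;
                int t = mov->src[0];
                if (t < 0 || count_uses(insts, n, t) != 1)
                        continue;
                for (size_t j = i; j-- > 0;) {
                        if (insts[j].dead)
                                continue;
                        if (insts[j].dst == DST_VPM)
                                break;  /* moving past it would reorder VPM writes */
                        if (insts[j].dst == t) {
                                insts[j].dst = DST_VPM;
                                mov->dead = true;
                                removed++;
                                break;
                        }
                }
        }
        return removed;
}
```

For `t0 = FADD t1, t2; MOV vpm, t0`, the FADD ends up writing the VPM itself and the MOV disappears, saving one instruction per output value.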
* vc4: Add support for turning constant uniforms into small immediates. | Eric Anholt | 2014-12-17 | 1 | -0/+1
  Small immediates have the downside of taking over the raddr B field, so you might have less chance to pack instructions together thanks to raddr B conflicts. However, it also reduces some register pressure, since it lets you load 2 "uniform" values in one instruction (avoiding a previous load of the constant value to a register), and increases some pairing for the same reason.
  total uniforms in shared programs: 16231 -> 13374 (-17.60%)
  uniforms in affected programs: 10280 -> 7423 (-27.79%)
  total instructions in shared programs: 40795 -> 41168 (0.91%)
  instructions in affected programs: 25551 -> 25924 (1.46%)
  In a previous version of this patch I had a reduction in instruction count by forcing the other args alongside a SMALL_IMM to be in the A file or accumulators, but that increases register pressure and had a bug in handling FRAG_Z. In this patch I just use raddr conflict resolution, which is more expensive. I think I'd rather tweak allocation to have some way to slightly prefer good choices for files in general, rather than risk failing to register allocate by forcing things into register classes.
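The conversion decision can be sketched in C as below. The representable set is an assumption taken from the VideoCore IV reference guide as I recall it (integers -16..15 and the floats 2^-8 through 2^7); check the guide's small-immediate table before relying on it. The function names are hypothetical. The second helper reflects the constraint from the message: because the immediate lives in the raddr B field, one instruction can carry only one immediate value, so two different constants cannot both be converted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption (check the VideoCore IV reference guide's small-immediate
 * table): encodable values are the integers -16..15 and the floats
 * 2^-8 .. 2^7. */
static bool is_small_immediate(uint32_t bits)
{
        int32_t i = (int32_t)bits;
        if (i >= -16 && i <= 15)
                return true;

        /* A positive power of two has a zero mantissa; bits >> 23 is then the
         * biased exponent (a set sign bit pushes it past 255). */
        uint32_t biased_exp = bits >> 23;
        return (bits & 0x007fffffu) == 0 &&
               biased_exp >= 127 - 8 && biased_exp <= 127 + 7;
}

/* Since the small immediate takes over raddr B, an instruction carries at
 * most one immediate value. Given the uniform operands of one instruction,
 * pick the first encodable value and count how many operands could use it;
 * the others stay as they were. */
static int count_convertible(const uint32_t *uniforms, int n, uint32_t *imm)
{
        int count = 0;
        for (int i = 0; i < n; i++) {
                if (!is_small_immediate(uniforms[i]))
                        continue;
                if (count == 0)
                        *imm = uniforms[i];
                if (uniforms[i] == *imm)
                        count++;
        }
        return count;
}
```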
* vc4: Introduce scheduling of QPU instructions. | Eric Anholt | 2014-12-01 | 1 | -0/+1
  This doesn't reschedule much currently; it just tries to fit things into the regfile A/B write-versus-read slots (the cause of the improvements in shader-db), and to hide texture fetch latency by scheduling setup early and results collection late (I haven't performance-tested that). This infrastructure will be important for doing instruction pairing, though.
  shader-db2 results:
  total instructions in shared programs: 61874 -> 59583 (-3.70%)
  instructions in affected programs: 50677 -> 48386 (-4.52%)
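A minimal greedy list scheduler illustrating the "write-versus-read slot" idea is sketched below. It assumes the hazard is that a regfile A/B register written by one instruction cannot be read by the immediately following instruction (accumulators being exempt), and it models texture setup versus result collection only as a `priority` field. All names (`struct sched_inst`, `schedule()`, `NOP_SLOT`) are hypothetical; this is not the vc4 scheduler.

```c
#include <stdbool.h>

#define MAX_INSTS 64
#define NOP_SLOT (-1)

/* Hypothetical view of one QPU instruction for scheduling purposes. */
struct sched_inst {
        int dst_reg;    /* regfile A/B register written, or -1 (accumulator/none) */
        int src_reg[2]; /* regfile A/B registers read, or -1 */
        int deps[4];    /* instructions that must be scheduled first, or -1 */
        int priority;   /* higher goes earlier, e.g. texture setup > texture results */
};

static bool deps_met(const struct sched_inst *in, const bool *done)
{
        for (int d = 0; d < 4; d++)
                if (in->deps[d] >= 0 && !done[in->deps[d]])
                        return false;
        return true;
}

/* Assumed hazard: a regfile register written by one instruction cannot be
 * read by the very next one. */
static bool reads_reg(const struct sched_inst *in, int reg)
{
        return reg >= 0 && (in->src_reg[0] == reg || in->src_reg[1] == reg);
}

/* Greedy list scheduler. Among ready instructions it prefers one that does
 * not hit the hazard, then higher priority. If every ready instruction hits
 * it, a NOP is emitted first. order[] needs room for 2 * n slots; returns the
 * number of slots used, or -1 if n is too large. Assumes deps form a DAG. */
static int schedule(const struct sched_inst *insts, int n, int *order)
{
        bool done[MAX_INSTS] = { false };
        int slots = 0, last_dst = -1;

        if (n > MAX_INSTS)
                return -1;

        for (int emitted = 0; emitted < n; emitted++) {
                int best = -1;
                bool best_stalls = true;
                for (int i = 0; i < n; i++) {
                        if (done[i] || !deps_met(&insts[i], done))
                                continue;
                        bool stalls = reads_reg(&insts[i], last_dst);
                        if (best < 0 || (best_stalls && !stalls) ||
                            (stalls == best_stalls &&
                             insts[i].priority > insts[best].priority)) {
                                best = i;
                                best_stalls = stalls;
                        }
                }
                if (best_stalls)
                        order[slots++] = NOP_SLOT;
                order[slots++] = best;
                done[best] = true;
                last_dst = insts[best].dst_reg;
        }
        return slots;
}
```

With an independent instruction available, the scheduler slots it between a write and its dependent read instead of spending a NOP there.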
* vc4: Update for new kernel ABI with async execution and waits. | Eric Anholt | 2014-11-20 | 1 | -0/+1
  Our submits now return immediately and you have to manually wait for things to complete if you want to (like a normal driver).
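The submit-then-wait pattern can be modeled with sequence numbers, as in the sketch below. This is a hypothetical single-threaded model for illustration only: `struct fake_gpu`, `submit_job()`, and `wait_seqno()` are invented names and are not the vc4 kernel ioctl ABI. Submission hands back a seqno without blocking, and only callers that need the result wait for it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a seqno-based asynchronous submit/wait interface.
 * It is not the vc4 kernel ioctl ABI; the fake "hardware" retires one job
 * each time it is polled. */
struct fake_gpu {
        uint64_t emitted_seqno;  /* last seqno handed out by submit */
        uint64_t finished_seqno; /* last seqno the hardware completed */
};

/* Queues a job and returns immediately with the seqno identifying it. */
static uint64_t submit_job(struct fake_gpu *gpu)
{
        return ++gpu->emitted_seqno;
}

/* Stand-in for the hardware making progress (e.g. a completion interrupt). */
static void fake_gpu_retire_one(struct fake_gpu *gpu)
{
        if (gpu->finished_seqno < gpu->emitted_seqno)
                gpu->finished_seqno++;
}

/* Non-blocking completion check. */
static bool job_done(const struct fake_gpu *gpu, uint64_t seqno)
{
        return gpu->finished_seqno >= seqno;
}

/* Blocking wait, used only when the CPU really needs the result, such as
 * before reading back a rendered buffer. Returns false for a seqno that was
 * never submitted, instead of waiting forever. */
static bool wait_seqno(struct fake_gpu *gpu, uint64_t seqno)
{
        if (seqno > gpu->emitted_seqno)
                return false;
        while (!job_done(gpu, seqno))
                fake_gpu_retire_one(gpu);
        return true;
}
```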
* vc4: correctly include the source files | Emil Velikov | 2014-10-16 | 1 | -3/+0
  The kernel files are built into a separate static library and all the functions that require it are already wrapped in ifdef USE_VC4_SIMULATOR.
  Don't forget the header file :)
  Signed-off-by: Emil Velikov <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* vc4: Move the mirrored kernel code to a kernel/ directory. | Eric Anholt | 2014-10-09 | 1 | -2/+3
  Now this whole setup matches the kernel's file layout much more closely.
* vc4: Add the necessary stubs for occlusion queries. | Eric Anholt | 2014-09-29 | 1 | -0/+1
  We have to expose them for GL 2.0, but we just always return a value of 0. We should be advertising 0 query bits instead of 64, but gallium doesn't have plumbing for that yet. At least this stops the segfaults.
* vc4: Actually implement VC4_DEBUG=cl. | Eric Anholt | 2014-09-18 | 1 | -0/+1
* vc4: Add support for reordering the uniform stream after optimization. | Eric Anholt | 2014-09-17 | 1 | -0/+1
  This allows for introducing dead code elimination of uniforms, copy propagation of uniforms, and instruction rescheduling between instructions that both read uniforms.
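The reordering can be pictured as rebuilding the stream after the instruction list is final. The C sketch below assumes, for simplicity, that each instruction reads at most one uniform and that each read consumes the next stream entry; `struct uinst` and `rebuild_uniform_stream()` are hypothetical names. This is why dead code elimination, copy propagation, and rescheduling of uniform readers become safe: the stream is regenerated to match whatever order survives.

```c
#include <stddef.h>
#include <stdint.h>

#define NO_UNIFORM (-1)

/* Hypothetical instruction view: the uniform it reads, as an index into the
 * compile-time uniform table (or, after the rebuild, into the stream).
 * Assumes at most one uniform read per instruction. */
struct uinst {
        int uniform;
};

/* Rebuilds the uniform stream in the order the final instruction list reads
 * it, since each read consumes the next stream entry. Uniforms whose readers
 * were optimized away are never appended; a uniform read twice is appended
 * twice. Each instruction is rewritten to point at its stream slot.
 * stream must have room for n entries. Returns the new stream length. */
static size_t rebuild_uniform_stream(struct uinst *insts, size_t n,
                                     const uint32_t *table, uint32_t *stream)
{
        size_t len = 0;
        for (size_t i = 0; i < n; i++) {
                if (insts[i].uniform == NO_UNIFORM)
                        continue;
                stream[len] = table[insts[i].uniform];
                insts[i].uniform = (int)len;
                len++;
        }
        return len;
}
```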
* vc4: Move register allocation to a separate file. | Eric Anholt | 2014-09-16 | 1 | -0/+1
  I'm going to be rewriting it all, and having it mixed up with the QIR-to-QPU opcode translation was messy.
* gallium/vc4: ship all files in the tarball | Emil Velikov | 2014-09-05 | 1 | -1/+13
  - include all headers in Makefile.sources
  Cc: Eric Anholt <[email protected]>
  Signed-off-by: Emil Velikov <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Acked-by: Matt Turner <[email protected]>
* vc4: Add a CSE optimization pass. | Eric Anholt | 2014-09-04 | 1 | -0/+1
  Debugging a regression in discard support was just too full of duplicate instructions, so I decided to remove them instead of re-analyzing each of them as I dumped their outputs in simulation.
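A minimal common-subexpression-elimination pass over a scalar SSA IR might look like the C sketch below. The IR (`struct cinst`) and function names are hypothetical, not the vc4 pass; it handles straight-line code only, uses an O(n^2) search instead of hashing `(op, src0, src1)`, and does not canonicalize commutative operands. Instructions with side effects are never merged.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical scalar SSA instruction. */
struct cinst {
        int op;
        int dst;               /* SSA temp written */
        int src[2];            /* SSA temps read, or -1 */
        bool has_side_effects; /* e.g. output writes or texture fetches */
        bool dead;
};

static void replace_uses(struct cinst *insts, size_t n, int from, int to)
{
        for (size_t i = 0; i < n; i++)
                for (int s = 0; s < 2; s++)
                        if (!insts[i].dead && insts[i].src[s] == from)
                                insts[i].src[s] = to;
}

/* Straight-line CSE: an instruction computing the same op on the same sources
 * as an earlier live one is removed and its readers use the earlier result.
 * O(n^2) for clarity; a real pass would hash (op, src0, src1).
 * Returns true if anything changed. */
static bool opt_cse(struct cinst *insts, size_t n)
{
        bool progress = false;
        for (size_t i = 0; i < n; i++) {
                struct cinst *b = &insts[i];
                if (b->dead || b->has_side_effects)
                        continue;
                for (size_t j = 0; j < i; j++) {
                        const struct cinst *a = &insts[j];
                        if (a->dead || a->has_side_effects || a->op != b->op ||
                            a->src[0] != b->src[0] || a->src[1] != b->src[1])
                                continue;
                        b->dead = true;
                        replace_uses(insts, n, b->dst, a->dst);
                        progress = true;
                        break;
                }
        }
        return progress;
}
```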
* vc4: Add support for all the texture and FBO formats we can. | Eric Anholt | 2014-08-22 | 1 | -0/+1
  Now that tiling is in place, we can expose the other formats. Depth is still broken (it needs changes in the shader), but if you don't expose it, things crash all over. SNORM is dropped, but we could re-add it later with some shader fixes to handle converting between [0,1] and [-1,1].
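One possible form of such a fixup is sketched below in C rather than shader code, purely to show the arithmetic. It assumes (hypothetically) that the texture unit samples an SNORM8 texture as if it were UNORM8, returning u/255 for the raw byte u. Because the byte is two's complement, a plain 2x - 1 remap would be wrong; the sketch instead recovers the byte, reinterprets it as signed, and applies the GL SNORM rule f = max(c/127, -1). The function name is hypothetical.

```c
/* Hypothetical fixup, written in C for clarity rather than as shader code.
 * Assumes the texture unit returns a UNORM8-style value u/255 for the raw
 * byte u. GL defines the SNORM8 value of a signed byte c as max(c/127, -1). */
static float snorm8_from_unorm8_fetch(float unorm)
{
        int u = (int)(unorm * 255.0f + 0.5f); /* recover the raw byte, 0..255 */
        int c = u >= 128 ? u - 256 : u;       /* reinterpret as two's complement */
        float f = (float)c / 127.0f;
        return f < -1.0f ? -1.0f : f;         /* -128 clamps to -1.0 */
}
```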
* vc4: Add support for texture tiling. | Eric Anholt | 2014-08-22 | 1 | -0/+1
  This still treats everything as RGBA8888 for the most part, same as before. This is a prerequisite for handling other texture formats, since only RGBA8888 has a raster-layout mode.
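To make "tiling" concrete, the C sketch below computes byte offsets in a simple linear-tile layout and uses it to upload a raster image. The layout is an assumption meant to resemble VC4's LT format: 64-byte micro-tiles of 4x4 pixels at 4 bytes per pixel, micro-tiles in raster order, and pixels in raster order within each micro-tile. VC4's T format adds further levels of larger tiles that are not modeled here. `lt_pixel_offset()` and `tile_image()` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical "linear tile" layout, assumed to resemble VC4's LT format:
 * 64-byte micro-tiles of 4x4 pixels at 4 bytes per pixel (RGBA8888), stored
 * in raster order, with pixels inside each micro-tile also in raster order. */
#define UTILE_W 4
#define UTILE_H 4
#define CPP 4 /* bytes per pixel */

static uint32_t lt_pixel_offset(uint32_t x, uint32_t y, uint32_t width_px)
{
        uint32_t utiles_per_row = (width_px + UTILE_W - 1) / UTILE_W;
        uint32_t utile = (y / UTILE_H) * utiles_per_row + x / UTILE_W;
        uint32_t in_utile = (y % UTILE_H) * UTILE_W + x % UTILE_W;
        return utile * (UTILE_W * UTILE_H * CPP) + in_utile * CPP;
}

/* Copies a raster image into the tiled layout, as a texture upload would.
 * dst must be sized for width and height padded to whole micro-tiles. */
static void tile_image(uint8_t *dst, const uint8_t *src, uint32_t w,
                       uint32_t h, uint32_t src_stride)
{
        for (uint32_t y = 0; y < h; y++)
                for (uint32_t x = 0; x < w; x++)
                        memcpy(dst + lt_pixel_offset(x, y, w),
                               src + y * src_stride + x * CPP, CPP);
}
```

In an 8-pixel-wide image, pixel (4, 0) starts the second micro-tile at byte 64, while pixel (0, 1) is only 16 bytes in, because it shares the first micro-tile with (0, 0).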
* vc4: Rewrite the kernel ABI to support texture uniform relocation. | Eric Anholt | 2014-08-11 | 1 | -0/+1
  This required building a shader parser that would walk the program to find where the texturing-related uniforms are in the uniforms stream.
  Note that as of this commit, a new kernel is required for rendering on actual VC4 hardware (currently that commit is named "drm/vc4: Introduce shader validation and better command stream validation.", but is likely to be squashed as part of an eventual merge of the kernel driver).
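The core of such a parser is tracking the position in the uniform stream while walking the program. The C sketch below works on a hypothetical pre-decoded view of each instruction (`struct qpu_view` with two flags); real QPU decoding, and the exact way texture setup consumes uniforms, are not modeled. It records the stream index of each uniform that feeds a texture-unit write, which is the kind of slot a kernel would then validate and relocate.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pre-decoded view of a QPU instruction. */
struct qpu_view {
        bool reads_uniform; /* pops one entry from the uniform stream */
        bool writes_tmu;    /* writes a texture-unit setup register */
};

/* Walks the program, tracking the uniform stream position, and records the
 * stream index of every uniform that feeds a texture-unit write. Those are
 * the slots the kernel must validate and relocate. Returns the number found
 * (at most max_slots). */
static size_t find_texture_uniforms(const struct qpu_view *prog, size_t n,
                                    size_t *tex_slots, size_t max_slots)
{
        size_t next_uniform = 0, found = 0;
        for (size_t i = 0; i < n; i++) {
                if (!prog[i].reads_uniform)
                        continue;
                if (prog[i].writes_tmu && found < max_slots)
                        tex_slots[found++] = next_uniform;
                next_uniform++;
        }
        return found;
}
```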
* vc4: Switch simulator to using kernel validator | Eric Anholt | 2014-08-11 | 1 | -0/+1
  This ensures that when I'm using the simulator, I get a closer match to what behavior on real hardware will be. It lets me rapidly iterate on the kernel validation code (which otherwise has a several-minute turnaround time), and helps catch buffer overflow bugs in the userspace driver faster.
* vc4: Add copy propagation between temps. | Eric Anholt | 2014-08-08 | 1 | -0/+1
  We put in a bunch of extra MOVs for program outputs, and this can clean those up. We should do uniforms, too, though.
  v2: Fix missing flagging of progress when we actually optimize. Caught by Aaron Watry.
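A minimal C sketch of temp-to-temp copy propagation follows, over a hypothetical scalar SSA IR (`struct pinst`, `opt_copy_propagation()` are invented names). The return value matters for the issue the v2 note mentions: a fixed-point optimization loop keeps rerunning passes only while some pass reports progress, so the flag must be set whenever a source is actually rewritten.

```c
#include <stdbool.h>
#include <stddef.h>

enum { OPC_MOV, OPC_FADD };

/* Hypothetical scalar SSA instruction; sources are temp indices or -1. */
struct pinst {
        int op;
        int dst;
        int src[2];
};

/* For every "MOV t_d, t_s", rewrites later reads of t_d to read t_s. In SSA
 * each temp has one definition, so this is safe in straight-line code.
 * Chains (MOV b, a; MOV c, b) collapse in one pass because earlier MOVs are
 * rewritten before later ones are visited. The now-unused MOVs are left for
 * dead code elimination. Returns true only when a source actually changed. */
static bool opt_copy_propagation(struct pinst *insts, size_t n)
{
        bool progress = false;
        for (size_t i = 0; i < n; i++) {
                if (insts[i].op != OPC_MOV)
                        continue;
                int d = insts[i].dst, s = insts[i].src[0];
                for (size_t j = i + 1; j < n; j++)
                        for (int k = 0; k < 2; k++)
                                if (insts[j].src[k] == d) {
                                        insts[j].src[k] = s;
                                        progress = true;
                                }
        }
        return progress;
}
```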
* vc4: Add dead code elimination. | Eric Anholt | 2014-08-08 | 1 | -0/+1
  This cleans up a bunch of noise in the compiled coordinate shaders (since we don't need the varying outputs), and also from writemasked instructions with negated src operands.
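A minimal dead code elimination pass in the same spirit is sketched below in C, over a hypothetical IR (`struct dinst`, `opt_dce()` are invented names). Output writes and other side effects are kept unconditionally; everything else is removed once nothing reads its result. Under the assumption that a coordinate shader simply never emits the varying-output writes, the arithmetic feeding those varyings has no readers left and falls out this way.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical scalar SSA instruction. */
struct dinst {
        int dst;               /* temp written, or -1 */
        int src[2];            /* temps read, or -1 */
        bool has_side_effects; /* VPM/TLB writes, texture fetches, ... */
        bool dead;
};

static bool temp_is_read(const struct dinst *insts, size_t n, int t)
{
        for (size_t i = 0; i < n; i++)
                if (!insts[i].dead &&
                    (insts[i].src[0] == t || insts[i].src[1] == t))
                        return true;
        return false;
}

/* Removes side-effect-free instructions whose result nobody reads. Walking
 * backwards lets a whole chain die in one sweep; the outer loop repeats
 * until nothing changes. Returns the number of instructions removed. */
static int opt_dce(struct dinst *insts, size_t n)
{
        int removed = 0;
        bool progress;
        do {
                progress = false;
                for (size_t i = n; i-- > 0;) {
                        struct dinst *in = &insts[i];
                        if (in->dead || in->has_side_effects)
                                continue;
                        if (in->dst >= 0 && temp_is_read(insts, n, in->dst))
                                continue;
                        in->dead = true;
                        removed++;
                        progress = true;
                }
        } while (progress);
        return removed;
}
```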
* vc4: Add an initial pass of algebraic optimization. | Eric Anholt | 2014-08-08 | 1 | -0/+1
  There was a lot of extra noise in my piglit shader dumps because of silly CMPs.
* vc4: Switch to actually generating vertex and fragment shader code from TGSI. | Eric Anholt | 2014-08-08 | 1 | -0/+2
  This introduces an IR (QIR, for QPU IR) to do optimization on. It's a scalar, SSA IR in general. It looks like optimization is pretty easy this way, though I haven't figured out if it's going to be good for our weird register allocation or not (or if I want to reduce to basically QPU instructions first), and I've got some problems with it having some multi-QPU-instruction opcodes (SEQ and CMP, for example) which I probably want to break down.
  Of course, this commit mostly doesn't work, since many other things are still hardwired, like the VBO data.
  v2: Rewrite to use a bunch of helpers (qir_OPCODE) for emitting QIR instructions into temporary values, and make qir_inst4 take the 4 args separately instead of an array (all later callers wanted individual args).
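The helper style described in v2 can be sketched as follows. Only the names `qir_inst4` and the `qir_OPCODE` naming pattern come from the message; the structures, `qir_get_temp()`, `QIR_ALU2`, and `QIR_NONE` are hypothetical. Each generated helper allocates a fresh temp, emits one instruction writing it, and returns it, which keeps the IR in SSA form and lets emission code nest calls like expressions.

```c
#include <stddef.h>

/* Hypothetical, heavily trimmed QIR-like IR; only qir_inst4 and the
 * qir_OPCODE naming follow the commit message. */
enum qop { QOP_MOV, QOP_FADD, QOP_FMUL };

struct qreg {
        int index; /* SSA temp number; -1 means "no operand" */
};

struct qinst {
        enum qop op;
        struct qreg dst;
        struct qreg src[4];
};

#define MAX_QINSTS 256
#define QIR_NONE ((struct qreg){ -1 })

struct qcompile {
        struct qinst insts[MAX_QINSTS];
        int num_insts;
        int num_temps;
};

/* Every helper writes a fresh temp, which is what keeps the IR in SSA form. */
static struct qreg qir_get_temp(struct qcompile *c)
{
        struct qreg r = { c->num_temps++ };
        return r;
}

/* Takes the four sources as separate arguments rather than an array. */
static struct qinst *qir_inst4(struct qcompile *c, enum qop op, struct qreg dst,
                               struct qreg src0, struct qreg src1,
                               struct qreg src2, struct qreg src3)
{
        if (c->num_insts >= MAX_QINSTS)
                return NULL;
        struct qinst *inst = &c->insts[c->num_insts++];
        inst->op = op;
        inst->dst = dst;
        inst->src[0] = src0;
        inst->src[1] = src1;
        inst->src[2] = src2;
        inst->src[3] = src3;
        return inst;
}

/* Generates qir_FADD(), qir_FMUL(), ...: emit into a new temp, return it. */
#define QIR_ALU2(name)                                                     \
        static struct qreg qir_##name(struct qcompile *c, struct qreg a,  \
                                      struct qreg b)                      \
        {                                                                  \
                struct qreg t = qir_get_temp(c);                           \
                qir_inst4(c, QOP_##name, t, a, b, QIR_NONE, QIR_NONE);     \
                return t;                                                  \
        }

QIR_ALU2(FADD)
QIR_ALU2(FMUL)
```

With these helpers, `a * b + k` is emitted as `qir_FADD(c, qir_FMUL(c, a, b), k)`, producing two instructions and two new temps.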
* vc4: Initial skeleton driver import. | Eric Anholt | 2014-08-08 | 1 | -0/+15
  This mostly just takes every draw call and turns it into a sequence of commands that clear the FBO and draw a single shaded triangle to it, regardless of the actual input vertices or shaders. I copied the initial driver skeleton mostly from freedreno, and I've preserved Rob Clark's copyright for those. I also based my initial hardcoded shaders and command lists on Scott Mansell (phire)'s "hackdriver" project, though the bit patterns of the shaders emitted end up being different.
  v2: Rebase on gallium megadrivers changes.
  v3: Rebase on PIPE_SHADER_CAP_MAX_CONSTS change.
  v4: Rely on simpenrose actually being installed when building for simulation.
  v5: Add more header duplicate-include guards.
  v6: Apply Emil's review (protection against vc4 sim and ilo at the same time, and dropping the dricommon drm bits) and fix a copyright header (thanks, Roland)