path: root/src/gallium/drivers/vc4/Makefile.sources
Commit message | Author | Age | Files | Lines
* vc4: rename file to group vpm optimizations together | Varad Gautam | 2016-03-15 | 1 | -1/+1
  This file will contain optimization passes for both vpm reads and writes.

  Signed-off-by: Varad Gautam <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* vc4: Do instruction scheduling on the QIR to hide texture fetch latency. | Eric Anholt | 2015-12-18 | 1 | -0/+1
  This is a rewrite of vc4_opt_qpu_schedule.c to operate on QIR. Texture fetch can probably take as much as the rest of the cycles of the program, so it's important to hide our other cycles during it (which is hard to do after register allocation). Also, we can queue up multiple texture requests before collecting the resulting samples, so that we keep the texture unit busy more of the time.

  High-settings openarena performance +2.35849% +/- 0.221154% (n=7). Also about 2-3% on the multiarb demo.

  8 piglit tests (ext_framebuffer_multisample accuracy depthstencil) go from failing in rendering to failing in register allocation, but hopefully I can fix that up with some better register pressure handling here.

  total instructions in shared programs: 87723 -> 88448 (0.83%)
  instructions in affected programs: 78411 -> 79136 (0.92%)
  total estimated cycles in shared programs: 276583 -> 246306 (-10.95%)
  estimated cycles in affected programs: 265691 -> 235414 (-11.40%)
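The percentages in the shader-db lines above are plain relative changes between the "before" and "after" totals. A minimal sketch of that arithmetic, assuming nothing about the actual shader-db tooling (the function name `percent_change` is hypothetical):

```python
def percent_change(before: int, after: int) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Totals quoted in the commit message above.
cycles = percent_change(276583, 246306)       # total estimated cycles
instructions = percent_change(87723, 88448)   # total instructions

print(f"cycles: {cycles:.2f}%, instructions: {instructions:.2f}%")
```

A negative value means the count dropped, so the commit trades a small rise in instruction count for a larger drop in estimated cycles.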
* vc4: Add support for texel fetches from MSAA resources. | Eric Anholt | 2015-12-08 | 1 | -0/+1
  This is the core of ARB_texture_multisample. Most of the piglit tests for GL_ARB_texture_multisample require GL 3.0, but exposing support for this lets us use the gallium blitter for multisample resolves. We can sometimes multisample resolve using just the RCL, but that requires that the blit is 1:1, unflipped, and aligned to tile boundaries.
* vc4: Move all of our fixed function fragment color handling to NIR. | Eric Anholt | 2015-08-14 | 1 | -0/+1
  This massively reduces our dependency on VC4-specific optimization passes.

  shader-db:
  total uniforms in shared programs: 32077 -> 32067 (-0.03%)
  uniforms in affected programs: 149 -> 139 (-6.71%)
  total instructions in shared programs: 98208 -> 98182 (-0.03%)
  instructions in affected programs: 2154 -> 2128 (-1.21%)
* vc4: Start adding a NIR-based output lowering pass. | Eric Anholt | 2015-07-30 | 1 | -0/+1
  For now, this just splits up store_output intrinsics to be scalars, and drops unused outputs in the coordinate shader. My goal is to be able to drop a bunch of my VC4-specific optimizations by letting NIR handle them.
* vc4: Move uniforms handling to a separate file. | Eric Anholt | 2015-07-14 | 1 | -0/+1
  The rest of vc4_program.c is about compiling, while this is about uniform emit at draw time.
* vc4: Move RCL generation into the kernel. | Eric Anholt | 2015-06-17 | 1 | -0/+1
  There weren't that many variations of RCL generation, and this lets us skip all the in-kernel validation for what we generated.
* vc4: Move vc4_packet.h to the kernel/ directory, since it's also shared. | Eric Anholt | 2015-06-16 | 1 | -1/+1
  I want to notice discrepancies when I diff -u between Mesa and the kernel.
* vc4: Drop subdirectory in vc4 build. | Eric Anholt | 2015-06-09 | 1 | -0/+4
  Just because we put the source in a subdir, doesn't mean we need helper libraries in the build. This will also simplify the Android build setup.
* vc4: Move the blit code to a separate file. | Eric Anholt | 2015-04-13 | 1 | -0/+1
  There will be other blit code showing up, and it seems like the place you'd look.
* vc4: Separate out a bit of code for submitting jobs to the kernel. | Eric Anholt | 2015-04-13 | 1 | -0/+1
  I want to be able to have multiple jobs being set up at the same time (for example, a render job to do a little fixup blit in the course of doing a render to the main FBO).
* vc4: Add a constant folding pass. | Eric Anholt | 2015-03-30 | 1 | -0/+1
  This cleans up some pointless operations generated by the in-driver mul24 lowering (commonly generated by making a vec4 index for a matrix in a uniform array). I could fill in other operations, but pretty much anything else ought to be getting handled at the NIR level, I think.

  total uniforms in shared programs: 13423 -> 13421 (-0.01%)
  uniforms in affected programs: 346 -> 344 (-0.58%)
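The idea of constant folding can be sketched on a toy instruction list. This is only an illustration, not the driver's pass: the tuple layout `(dest, op, args...)`, the opcode names, and the function `fold_constants` are all hypothetical, and the sketch also substitutes known constants into later operands so that chains of foldable instructions collapse.

```python
import operator

# Hypothetical opcode table; the real QIR opcode set is not shown in this log.
FOLDABLE = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def fold_constants(insts):
    """Replace instructions whose operands are all constants with a 'const'."""
    known = {}  # temp name -> constant value computed so far
    out = []
    for dest, op, *args in insts:
        # Substitute temps that are already known to hold a constant.
        args = [known.get(a, a) if isinstance(a, str) else a for a in args]
        if op in FOLDABLE and all(isinstance(a, int) for a in args):
            known[dest] = FOLDABLE[op](*args)
            out.append((dest, "const", known[dest]))
        else:
            if op == "const":
                known[dest] = args[0]
            out.append((dest, op, *args))
    return out


program = [
    ("t0", "const", 4),
    ("t1", "const", 3),
    ("t2", "mul", "t0", "t1"),   # both operands constant: folds to 12
    ("t3", "add", "t2", "u0"),   # "u0" is unknown at compile time: kept
]
folded = fold_constants(program)
```

Here `t2` becomes a constant and `t3` keeps its opcode but reads the folded value directly.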
* vc4: Enforce one-uniform-per-instruction after optimization. | Eric Anholt | 2015-02-19 | 1 | -0/+1
  This lets us more intelligently decide which uniform values should be put into temporaries, by choosing the most reused values to push to temps first.

  total uniforms in shared programs: 13457 -> 13433 (-0.18%)
  uniforms in affected programs: 1524 -> 1500 (-1.57%)
  total instructions in shared programs: 40198 -> 40019 (-0.45%)
  instructions in affected programs: 6027 -> 5848 (-2.97%)

  I noticed this opportunity because with the NIR work, some programs were happening to make different uniform copy propagation choices that significantly increased instruction counts.
* vc4: Coalesce MOVs into VPM with the instructions generating the values. | Eric Anholt | 2014-12-18 | 1 | -0/+1
  total instructions in shared programs: 41168 -> 40976 (-0.47%)
  instructions in affected programs: 18156 -> 17964 (-1.06%)
* vc4: Add support for turning constant uniforms into small immediates. | Eric Anholt | 2014-12-17 | 1 | -0/+1
  Small immediates have the downside of taking over the raddr B field, so you might have less chance to pack instructions together thanks to raddr B conflicts. However, it also reduces some register pressure since it lets you load 2 "uniform" values in one instruction (avoiding a previous load of the constant value to a register), and increases some pairing for the same reason.

  total uniforms in shared programs: 16231 -> 13374 (-17.60%)
  uniforms in affected programs: 10280 -> 7423 (-27.79%)
  total instructions in shared programs: 40795 -> 41168 (0.91%)
  instructions in affected programs: 25551 -> 25924 (1.46%)

  In a previous version of this patch I had a reduction in instruction count by forcing the other args alongside a SMALL_IMM to be in the A file or accumulators, but that increases register pressure and had a bug in handling FRAG_Z. In this patch I just use raddr conflict resolution, which is more expensive. I think I'd rather tweak allocation to have some way to slightly prefer good choices for files in general, rather than risk failing to register allocate by forcing things into register classes.
* vc4: Introduce scheduling of QPU instructions. | Eric Anholt | 2014-12-01 | 1 | -0/+1
  This doesn't reschedule much currently, just tries to fit things into the regfile A/B write-versus-read slots (the cause of the improvements in shader-db), and hide texture fetch latency by scheduling setup early and results collection late (haven't performance tested it). This infrastructure will be important for doing instruction pairing, though.

  shader-db2 results:
  total instructions in shared programs: 61874 -> 59583 (-3.70%)
  instructions in affected programs: 50677 -> 48386 (-4.52%)
* vc4: Update for new kernel ABI with async execution and waits. | Eric Anholt | 2014-11-20 | 1 | -0/+1
  Our submits now return immediately and you have to manually wait for things to complete if you want to (like a normal driver).
* vc4: correctly include the source files | Emil Velikov | 2014-10-16 | 1 | -3/+0
  The kernel files are built into a separate static library and all the functions that require it are already wrapped in ifdef USE_VC4_SIMULATOR.

  Don't forget the header file :)

  Signed-off-by: Emil Velikov <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* vc4: Move the mirrored kernel code to a kernel/ directory. | Eric Anholt | 2014-10-09 | 1 | -2/+3
  Now this whole setup matches the kernel's file layout much more closely.
* vc4: Add the necessary stubs for occlusion queries. | Eric Anholt | 2014-09-29 | 1 | -0/+1
  We have to expose them for GL 2.0, but we just always return a value of 0. We should be advertising 0 query bits instead of 64, but gallium doesn't have plumbing for that yet. At least this stops the segfaults.
* vc4: Actually implement VC4_DEBUG=cl. | Eric Anholt | 2014-09-18 | 1 | -0/+1
* vc4: Add support for reordering the uniform stream after optimization. | Eric Anholt | 2014-09-17 | 1 | -0/+1
  This allows for introducing dead code elimination of uniforms, copy propagation of uniforms, and instruction rescheduling between instructions that both read uniforms.
* vc4: Move register allocation to a separate file. | Eric Anholt | 2014-09-16 | 1 | -0/+1
  I'm going to be rewriting it all, and having it mixed up with the QIR-to-QPU opcode translation was messy.
* gallium/vc4: ship all files in the tarball | Emil Velikov | 2014-09-05 | 1 | -1/+13
  - include all headers in Makefile.sources

  Cc: Eric Anholt <[email protected]>
  Signed-off-by: Emil Velikov <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Acked-by: Matt Turner <[email protected]>
* vc4: Add a CSE optimization pass. | Eric Anholt | 2014-09-04 | 1 | -0/+1
  Debugging a regression in discard support was just too full of duplicate instructions, so I decided to remove them instead of re-analyzing each of them as I dumped their outputs in simulation.
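Common subexpression elimination (CSE) removes exactly this kind of duplicate. A minimal sketch on a toy instruction list, assuming each temp is written once (SSA-like) and that no instruction has side effects; the tuple layout, opcode names, and function name are hypothetical and do not reflect the driver's code:

```python
def eliminate_common_subexpressions(insts):
    """Drop instructions that recompute a value an earlier one already holds."""
    seen = {}      # (op, operands) -> temp that first computed the value
    replaced = {}  # dropped duplicate temp -> surviving temp
    out = []
    for dest, op, *args in insts:
        # Rewrite operands that referred to an eliminated duplicate.
        args = tuple(replaced.get(a, a) for a in args)
        key = (op, args)
        if key in seen:
            replaced[dest] = seen[key]   # duplicate: reuse the earlier temp
        else:
            seen[key] = dest
            out.append((dest, op, *args))
    return out


program = [
    ("t0", "fadd", "a", "b"),
    ("t1", "fadd", "a", "b"),    # same op and operands as t0
    ("t2", "fmul", "t0", "t1"),
]
deduplicated = eliminate_common_subexpressions(program)
```

After the pass, `t1` is gone and `t2` reads `t0` for both operands.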
* vc4: Add support for all the texture and FBO formats we can. | Eric Anholt | 2014-08-22 | 1 | -0/+1
  Now that tiling is in place, we can expose the other formats. Depth is still broken (need to make changes in the shader), but if you don't expose it things crash all over. SNORM is dropped, but we could re-add it later with some shader fixes to handle converting between [0,1] and [-1,1].
* vc4: Add support for texture tiling. | Eric Anholt | 2014-08-22 | 1 | -0/+1
  This still treats everything as RGBA8888 for the most part, same as before. This is a prerequisite for handling other texture formats, since only RGBA8888 has a raster-layout mode.
* vc4: Rewrite the kernel ABI to support texture uniform relocation. | Eric Anholt | 2014-08-11 | 1 | -0/+1
  This required building a shader parser that would walk the program to find where the texturing-related uniforms are in the uniforms stream.

  Note that as of this commit, a new kernel is required for rendering on actual VC4 hardware (currently that commit is named "drm/vc4: Introduce shader validation and better command stream validation.", but is likely to be squashed as part of an eventual merge of the kernel driver).
* vc4: Switch simulator to using kernel validator | Eric Anholt | 2014-08-11 | 1 | -0/+1
  This ensures that when I'm using the simulator, I get a closer match to what behavior on real hardware will be. It lets me rapidly iterate on the kernel validation code (which otherwise has a several-minute turnaround time), and helps catch buffer overflow bugs in the userspace driver faster.
* vc4: Add copy propagation between temps. | Eric Anholt | 2014-08-08 | 1 | -0/+1
  We put in a bunch of extra MOVs for program outputs, and this can clean those up. We should do uniforms, too, though.

  v2: Fix missing flagging of progress when we actually optimize. Caught by Aaron Watry.
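Copy propagation between temps, including the "progress" flag that the v2 note refers to, can be sketched as follows. This is an illustration under stated assumptions, not the driver's pass: temps are assumed to be written once, and the tuple layout, the `mov` opcode name, and the function name are hypothetical.

```python
def propagate_copies(insts):
    """Make users of a MOV result read the MOV's source directly.

    Returns the rewritten list and a flag saying whether anything changed,
    so a caller can keep re-running its passes until nothing makes progress.
    """
    copy_of = {}   # temp written by a mov -> the value it copies
    out = []
    progress = False
    for dest, op, *args in insts:
        new_args = tuple(copy_of.get(a, a) for a in args)
        if new_args != tuple(args):
            progress = True              # an operand was actually rewritten
        if op == "mov":
            copy_of[dest] = new_args[0]
        out.append((dest, op, *new_args))
    return out, progress


program = [
    ("t0", "fadd", "a", "b"),
    ("t1", "mov", "t0"),         # plain copy of t0
    ("t2", "fmul", "t1", "c"),   # can read t0 instead of t1
]
propagated, changed = propagate_copies(program)
```

The MOV itself stays in the list; once nothing reads `t1`, a dead code elimination pass can remove it.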
* vc4: Add dead code elimination. | Eric Anholt | 2014-08-08 | 1 | -0/+1
  This cleans up a bunch of noise in the compiled coordinate shaders (since we don't need the varying outputs), and also from writemasked instructions with negated src operands.
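Dead code elimination can be sketched as a single backward walk that keeps only instructions whose results are still needed. As before this is a toy model: it assumes each temp is written once and that no instruction has side effects, and the tuple layout and names are hypothetical.

```python
def eliminate_dead_code(insts, live_outputs):
    """Keep only instructions that contribute to `live_outputs`."""
    live = set(live_outputs)
    kept = []
    # Walk backwards so every use is seen before its definition.
    for dest, op, *args in reversed(insts):
        if dest in live:
            kept.append((dest, op, *args))
            live.update(a for a in args if isinstance(a, str))
    kept.reverse()
    return kept


program = [
    ("t0", "fadd", "a", "b"),
    ("t1", "mov", "t0"),         # result never read: dead
    ("t2", "fmul", "t0", "c"),
]
trimmed = eliminate_dead_code(program, live_outputs={"t2"})
```

Passing a smaller `live_outputs` set models the coordinate shader case in the message above, where outputs that are not needed make everything feeding only them removable.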
* vc4: Add an initial pass of algebraic optimization. | Eric Anholt | 2014-08-08 | 1 | -0/+1
  There was a lot of extra noise in my piglit shader dumps because of silly CMPs.
* vc4: Switch to actually generating vertex and fragment shader code from TGSI. | Eric Anholt | 2014-08-08 | 1 | -0/+2
  This introduces an IR (QIR, for QPU IR) to do optimization on. It's a scalar, SSA IR in general. It looks like optimization is pretty easy this way, though I haven't figured out if it's going to be good for our weird register allocation or not (or if I want to reduce to basically QPU instructions first), and I've got some problems with it having some multi-QPU-instruction opcodes (SEQ and CMP, for example) which I probably want to break down.

  Of course, this commit mostly doesn't work, since many other things are still hardwired, like the VBO data.

  v2: Rewrite to use a bunch of helpers (qir_OPCODE) for emitting QIR instructions into temporary values, and make qir_inst4 take the 4 args separately instead of an array (all later callers wanted individual args).
* vc4: Initial skeleton driver import. | Eric Anholt | 2014-08-08 | 1 | -0/+15
  This mostly just takes every draw call and turns it into a sequence of commands that clear the FBO and draw a single shaded triangle to it, regardless of the actual input vertices or shaders. I copied the initial driver skeleton mostly from freedreno, and I've preserved Rob Clark's copyright for those. I also based my initial hardcoded shaders and command lists on Scott Mansell (phire)'s "hackdriver" project, though the bit patterns of the shaders emitted end up being different.

  v2: Rebase on gallium megadrivers changes.
  v3: Rebase on PIPE_SHADER_CAP_MAX_CONSTS change.
  v4: Rely on simpenrose actually being installed when building for simulation.
  v5: Add more header duplicate-include guards.
  v6: Apply Emil's review (protection against vc4 sim and ilo at the same time, and dropping the dricommon drm bits) and fix a copyright header (thanks, Roland)