path: root/src/gallium/drivers/vc4/vc4_qpu_defines.h
Commit message | Author | Age | Files | Lines
* vc4: Add support for the 2-bit LOAD_IMM variants. | Eric Anholt | 2016-08-25 | 1 | -0/+6
  Extracted and fixed up from a patch by jonasarrow on github. This ended up not getting used for ddx/ddy, but seems like it might still be useful.
* vc4: Add real validation for MUL rotation. | Eric Anholt | 2016-08-25 | 1 | -0/+4
  Caught problems in the upcoming DDX/DDY implementation.
* vc4: Add a bitmap of branch targets in kernel validation. | Eric Anholt | 2016-07-12 | 1 | -0/+3
  This isn't used yet, it's just a first step toward loop validation. During the main parsing of instructions, we need to know when we hit a new basic block so that we can reset validated state.
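A minimal sketch of the idea in the entry above, assuming one bit per instruction index: a pre-pass marks every branch destination, and the main pass tests the bit to see whether a new basic block starts at the current instruction. The struct and function names here are hypothetical and are not taken from the kernel validator.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical container: one bit per instruction in the shader. */
struct branch_targets {
        unsigned long *bits;
        size_t num_insts;
};

/* Allocate a zeroed bitmap large enough for num_insts instructions. */
static bool branch_targets_init(struct branch_targets *bt, size_t num_insts)
{
        size_t words = (num_insts + BITS_PER_WORD - 1) / BITS_PER_WORD;

        bt->bits = calloc(words ? words : 1, sizeof(unsigned long));
        bt->num_insts = num_insts;
        return bt->bits != NULL;
}

/* Record that some branch jumps to instruction index ip. */
static void branch_targets_mark(struct branch_targets *bt, size_t ip)
{
        assert(ip < bt->num_insts);
        bt->bits[ip / BITS_PER_WORD] |= 1UL << (ip % BITS_PER_WORD);
}

/* True if instruction ip starts a new basic block (is a branch target). */
static bool branch_targets_test(const struct branch_targets *bt, size_t ip)
{
        assert(ip < bt->num_insts);
        return (bt->bits[ip / BITS_PER_WORD] >> (ip % BITS_PER_WORD)) & 1UL;
}

static void branch_targets_free(struct branch_targets *bt)
{
        free(bt->bits);
        bt->bits = NULL;
}
```

In a validator built this way, the main loop would call `branch_targets_test()` at each instruction and reset its per-block state whenever it returns true.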
* vc4: Add QPU support for generating BRANCH instructions. | Eric Anholt | 2016-07-12 | 1 | -0/+30
* vc4: Fix names of the 16-bit unpacks | Eric Anholt | 2015-10-24 | 1 | -2/+2
  They're only f16-to-f32 on a float operation, otherwise they're i16-to-i32.
* vc4: Add support for turning constant uniforms into small immediates. | Eric Anholt | 2014-12-17 | 1 | -2/+5
  Small immediates have the downside of taking over the raddr B field, so you might have less chance to pack instructions together thanks to raddr B conflicts. However, it also reduces some register pressure since it lets you load 2 "uniform" values in one instruction (avoiding a previous load of the constant value to a register), and increases some pairing for the same reason.

  total uniforms in shared programs: 16231 -> 13374 (-17.60%)
  uniforms in affected programs: 10280 -> 7423 (-27.79%)
  total instructions in shared programs: 40795 -> 41168 (0.91%)
  instructions in affected programs: 25551 -> 25924 (1.46%)

  In a previous version of this patch I had a reduction in instruction count by forcing the other args alongside a SMALL_IMM to be in the A file or accumulators, but that increases register pressure and had a bug in handling FRAG_Z. In this patch I just use raddr conflict resolution, which is more expensive. I think I'd rather tweak allocation to have some way to slightly prefer good choices for files in general, rather than risk failing to register allocate by forcing things into register classes.
* vc4: Add a helper for changing a field in an instruction. | Eric Anholt | 2014-12-16 | 1 | -0/+3
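For readers unfamiliar with this kind of helper, here is a rough sketch of what changing one field of a 64-bit instruction word usually involves: clear the field's bits with its mask, then OR in the new value shifted into place. The field position, width, and all names below are made up for illustration and are not copied from the header.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 6-bit field at bits 12..17 of the instruction word. */
#define EXAMPLE_FIELD_SHIFT 12
#define EXAMPLE_FIELD_MASK  (UINT64_C(0x3f) << EXAMPLE_FIELD_SHIFT)

/*
 * Return inst with the field described by (mask, shift) replaced by value.
 * All bits outside the mask are preserved.
 */
static inline uint64_t update_field(uint64_t inst, uint64_t mask,
                                    unsigned shift, uint64_t value)
{
        /* The new value must fit entirely inside the field. */
        assert(((value << shift) & ~mask) == 0);
        return (inst & ~mask) | ((value << shift) & mask);
}

/* Read the field back out, shifted down to bit 0. */
static inline uint64_t get_field(uint64_t inst, uint64_t mask, unsigned shift)
{
        return (inst & mask) >> shift;
}
```

A call such as `update_field(inst, EXAMPLE_FIELD_MASK, EXAMPLE_FIELD_SHIFT, 0x15)` rewrites only those six bits, which is what lets a later pass retarget one operand without rebuilding the whole instruction.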
* vc4: Rename the 16-bit unpack #define. | Eric Anholt | 2014-12-15 | 1 | -2/+2
  It's only an f16 conversion if you're doing a float operation, otherwise it's 16 bit signed to 32-bit signed.
* vc4: Add support for the FACE semantic. | Eric Anholt | 2014-10-01 | 1 | -1/+1
  Fixes glsl-fs-frontfacing.
* vc4: Add disasm for A-file unpack operations. | Eric Anholt | 2014-09-23 | 1 | -9/+9
  The A-file unpack is just like R4 unpack, except that if you don't do a floating-point operation it won't do float conversion (so int16 gets scaled up to int32).
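The two behaviours described for the 16-bit unpack can be illustrated in software. The sketch below assumes the unpack selects one 16-bit half of a 32-bit register and then either sign-extends it (integer operation) or treats it as an IEEE half-precision float (float operation). The function names are hypothetical, and whether the hardware matches IEEE behaviour for subnormals and NaNs is not something the log states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Pick the high or low 16 bits of a 32-bit register value. */
static uint16_t select_half(uint32_t word, bool high_half)
{
        return high_half ? (uint16_t)(word >> 16) : (uint16_t)(word & 0xffff);
}

/* Integer path: signed 16-bit to signed 32-bit (sign extension). */
static int32_t unpack_i16(uint32_t word, bool high_half)
{
        uint16_t half = select_half(word, high_half);

        return half >= 0x8000 ? (int32_t)half - 0x10000 : (int32_t)half;
}

/* Float path: IEEE binary16 to binary32 (software reference only). */
static float unpack_f16(uint32_t word, bool high_half)
{
        uint16_t h = select_half(word, high_half);
        uint32_t sign = (uint32_t)(h & 0x8000) << 16;
        uint32_t exp = (h >> 10) & 0x1f;
        uint32_t man = h & 0x3ff;
        uint32_t bits;
        float f;

        if (exp == 0x1f) {
                /* Infinity or NaN: keep the mantissa payload. */
                bits = sign | 0x7f800000u | (man << 13);
        } else if (exp != 0) {
                /* Normal number: rebias exponent from 15 to 127. */
                bits = sign | ((exp + 112) << 23) | (man << 13);
        } else if (man == 0) {
                /* Signed zero. */
                bits = sign;
        } else {
                /* Subnormal half: normalize into a normal float. */
                exp = 113;
                while (!(man & 0x400)) {
                        man <<= 1;
                        exp--;
                }
                bits = sign | (exp << 23) | ((man & 0x3ff) << 13);
        }

        memcpy(&f, &bits, sizeof(f));
        return f;
}
```

The same 16 input bits therefore produce different results depending on the operation: `0x3c00` reads as the integer 15360 on the integer path and as 1.0 on the float path.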
* vc4: Reuse the util header instead of defining our own ARRAY_SIZE. | Eric Anholt | 2014-09-15 | 1 | -2/+1
  Fixes redefinition warnings if you end up including this header before util stuff.
* vc4: Add support for depth clears and tests within a tile. | Eric Anholt | 2014-08-11 | 1 | -0/+1
  This doesn't load/store the Z contents across submits yet. It also disables early Z, since it's going to require tracking of Z functions across multiple state updates to track the early Z direction and whether it can be used.

  v2: Move the key setup to before the search for the key.
* vc4: Add support for texturing (under simulation) | Eric Anholt | 2014-08-11 | 1 | -0/+14
  Only rgba8888 works, and only a single texture unit, and it's only under simulation because I haven't built the kernel interface yet.

  v2: Rebase on helpers.
  v3: Fold in the don't-break-the-arm-build fix.
* vc4: Add support for SNE/SEQ/SGE/SLT. | Eric Anholt | 2014-08-08 | 1 | -0/+2
* vc4: Switch to actually generating vertex and fragment shader code from TGSI. | Eric Anholt | 2014-08-08 | 1 | -2/+2
  This introduces an IR (QIR, for QPU IR) to do optimization on. It's a scalar, SSA IR in general. It looks like optimization is pretty easy this way, though I haven't figured out if it's going to be good for our weird register allocation or not (or if I want to reduce to basically QPU instructions first), and I've got some problems with it having some multi-QPU-instruction opcodes (SEQ and CMP, for example) which I probably want to break down.

  Of course, this commit mostly doesn't work, since many other things are still hardwired, like the VBO data.

  v2: Rewrite to use a bunch of helpers (qir_OPCODE) for emitting QIR instructions into temporary values, and make qir_inst4 take the 4 args separately instead of an array (all later callers wanted individual args).
* vc4: Initial skeleton driver import. | Eric Anholt | 2014-08-08 | 1 | -0/+255
  This mostly just takes every draw call and turns it into a sequence of commands that clear the FBO and draw a single shaded triangle to it, regardless of the actual input vertices or shaders. I copied the initial driver skeleton mostly from freedreno, and I've preserved Rob Clark's copyright for those. I also based my initial hardcoded shaders and command lists on Scott Mansell (phire)'s "hackdriver" project, though the bit patterns of the shaders emitted end up being different.

  v2: Rebase on gallium megadrivers changes.
  v3: Rebase on PIPE_SHADER_CAP_MAX_CONSTS change.
  v4: Rely on simpenrose actually being installed when building for simulation.
  v5: Add more header duplicate-include guards.
  v6: Apply Emil's review (protection against vc4 sim and ilo at the same time, and dropping the dricommon drm bits) and fix a copyright header (thanks, Roland)