| Commit message (Collapse) | Author | Age | Files | Lines |
|
Extracted and fixed up from a patch by jonasarrow on github. This ended
up not getting used for ddx/ddy, but seems like it might still be useful.
|
Caught problems in the upcoming DDX/DDY implementation.
|
This isn't used yet; it's just a first step toward loop validation.
During the main parsing of instructions, we need to know when we hit a new
basic block so that we can reset validated state.
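A minimal sketch of the idea, not the actual validator code: an earlier pass records which instruction indices are branch targets, and the main parse drops any per-block validated state when it reaches one. The struct, its fields, MAX_INSTS, and both function names are hypothetical and only illustrate the reset.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_INSTS 64 /* hypothetical upper bound for this sketch */

struct validation_state {
    /* Set by an earlier scan over the branch instructions. */
    bool branch_target[MAX_INSTS];
    /* Hypothetical piece of per-block state: a register whose contents
     * have been validated, or -1 if nothing is validated. */
    int validated_reg;
};

/* Record that instruction `ip` starts a new basic block. */
static void mark_branch_target(struct validation_state *s, uint32_t ip)
{
    if (ip < MAX_INSTS)
        s->branch_target[ip] = true;
}

/* Called at the top of the main parse for each instruction. Returns true
 * if `ip` begins a new basic block, in which case validated state that
 * was carried from the previous block is thrown away. */
static bool enter_instruction(struct validation_state *s, uint32_t ip)
{
    if (ip < MAX_INSTS && s->branch_target[ip]) {
        s->validated_reg = -1;
        return true;
    }
    return false;
}
```

The reset is needed because control can arrive at a branch target from somewhere other than the preceding instruction, so nothing validated along the fall-through path can be assumed to hold there.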
|
They're only f16-to-f32 on a float operation, otherwise they're
i16-to-i32.
|
Small immediates have the downside of taking over the raddr B field, so
you might have less chance to pack instructions together thanks to raddr B
conflicts. However, it also reduces some register pressure since it lets
you load 2 "uniform" values in one instruction (avoiding a previous load
of the constant value to a register), and increases some pairing for the
same reason.

total uniforms in shared programs: 16231 -> 13374 (-17.60%)
uniforms in affected programs: 10280 -> 7423 (-27.79%)
total instructions in shared programs: 40795 -> 41168 (0.91%)
instructions in affected programs: 25551 -> 25924 (1.46%)

In a previous version of this patch I had a reduction in instruction count
by forcing the other args alongside a SMALL_IMM to be in the A file or
accumulators, but that increases register pressure and had a bug in
handling FRAG_Z. In this patch I just use raddr conflict resolution,
which is more expensive. I think I'd rather tweak allocation to have some
way to slightly prefer good choices for files in general, rather than risk
failing to register allocate by forcing things into register classes.
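To show which constants can become small immediates instead of uniform loads, here is a hedged sketch of the encoding test. It assumes the small-immediate table as I understand it from the VideoCore IV architecture guide: integers -16..15, and the power-of-two floats 1.0..128.0 and 1/256..1/2. The function name and the NOT_SMALL_IMM sentinel are hypothetical, not the driver's own identifiers.

```c
#include <stdint.h>

#define NOT_SMALL_IMM (~0u) /* hypothetical "cannot encode" sentinel */

/* Map a 32-bit constant to its small-immediate code, or NOT_SMALL_IMM if
 * it would still have to be loaded as a uniform. */
static uint32_t encode_small_imm(uint32_t bits)
{
    /* Integers 0..15 encode as themselves. */
    if (bits <= 15u)
        return bits;

    /* Integers -16..-1 (two's complement) encode as 16..31. */
    if (bits >= 0xfffffff0u)
        return bits - 0xfffffff0u + 16u;

    /* Positive floats with a zero mantissa are exact powers of two. */
    if ((bits & 0x807fffffu) == 0) {
        int exp = (int)(bits >> 23) - 127;

        if (exp >= 0 && exp <= 7)       /* 1.0 .. 128.0 -> 32..39 */
            return 32u + (uint32_t)exp;
        if (exp >= -8 && exp <= -1)     /* 1/256 .. 1/2 -> 40..47 */
            return 40u + (uint32_t)(exp + 8);
    }

    return NOT_SMALL_IMM;
}
```

A constant such as 1.0 (0x3f800000) can then ride along in the instruction itself, at the cost of occupying the raddr B field, while something like 3.0 still needs a uniform.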
|
It's only an f16 conversion if you're doing a float operation, otherwise
it's 16 bit signed to 32-bit signed.
|
Fixes glsl-fs-frontfacing.
|
The A-file unpack is just like R4 unpack, except that if you don't do a
floating-point operation it won't do float conversion (so int16 gets
scaled up to int32).
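The behavior described above can be modeled in software roughly as follows. This is a sketch under assumptions, not the simulator's or the driver's code: unpack_16a and half_bits_to_float_bits are hypothetical names, the "a" half is assumed to be the low 16 bits, and is_float_op stands in for "the instruction consuming the value is a floating-point operation".

```c
#include <stdbool.h>
#include <stdint.h>

/* Convert an IEEE 754 half-precision bit pattern to the bit pattern of
 * the equal single-precision float, using integer operations only. */
static uint32_t half_bits_to_float_bits(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    int exp = (h >> 10) & 0x1f;
    uint32_t mant = h & 0x3ffu;

    if (exp == 31)                      /* infinity or NaN */
        return sign | 0x7f800000u | (mant << 13);

    if (exp == 0) {
        if (mant == 0)
            return sign;                /* signed zero */
        /* Subnormal half: shift until the implicit bit appears. */
        exp = 1;
        while (!(mant & 0x400u)) {
            mant <<= 1;
            exp--;
        }
        mant &= 0x3ffu;
    }

    /* Rebias the exponent from 15 (half) to 127 (single). */
    return sign | ((uint32_t)(exp + 112) << 23) | (mant << 13);
}

/* Model of a 16-bit unpack of the low half of `raw`. */
static uint32_t unpack_16a(uint32_t raw, bool is_float_op)
{
    uint16_t low = (uint16_t)(raw & 0xffffu);

    if (is_float_op)
        return half_bits_to_float_bits(low);    /* f16 -> f32 */

    /* Otherwise sign-extend int16 -> int32. */
    int32_t extended = (int32_t)(low ^ 0x8000u) - 0x8000;
    return (uint32_t)extended;
}
```

The same 16 input bits therefore give two different results: 0xc000 unpacks to the float -2.0 under a float operation, and to the integer -16384 otherwise.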
|
Fixes redefinition warnings if you end up including this header before
util stuff.
|
This doesn't load/store the Z contents across submits yet. It also
disables early Z, since it's going to require tracking of Z functions
across multiple state updates to track the early Z direction and whether
it can be used.
v2: Move the key setup to before the search for the key.
|
Only rgba8888 works, and only a single texture unit, and it's only under
simulation because I haven't built the kernel interface yet.
v2: Rebase on helpers.
v3: Fold in the don't-break-the-arm-build fix.
|
This introduces an IR (QIR, for QPU IR) to do optimization on. It's a
scalar, SSA IR in general. It looks like optimization is pretty easy this
way, though I haven't figured out if it's going to be good for our weird
register allocation or not (or if I want to reduce to basically QPU
instructions first), and I've got some problems with it having some
multi-QPU-instruction opcodes (SEQ and CMP, for example) which I probably
want to break down.
Of course, this commit mostly doesn't work, since many other things are
still hardwired, like the VBO data.
v2: Rewrite to use a bunch of helpers (qir_OPCODE) for emitting QIR
instructions into temporary values, and make qir_inst4 take the 4 args
separately instead of an array (all later callers wanted individual
args).
|
This mostly just takes every draw call and turns it into a sequence of
commands that clear the FBO and draw a single shaded triangle to it,
regardless of the actual input vertices or shaders. I copied the initial
driver skeleton mostly from freedreno, and I've preserved Rob Clark's
copyright for those. I also based my initial hardcoded shaders and
command lists on Scott Mansell (phire)'s "hackdriver" project, though the
bit patterns of the shaders emitted end up being different.
v2: Rebase on gallium megadrivers changes.
v3: Rebase on PIPE_SHADER_CAP_MAX_CONSTS change.
v4: Rely on simpenrose actually being installed when building for
simulation.
v5: Add more header duplicate-include guards.
v6: Apply Emil's review (protection against vc4 sim and ilo at the same
time, and dropping the dricommon drm bits) and fix a copyright header
(thanks, Roland)
|