| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
| |
total instructions in shared programs: 41168 -> 40976 (-0.47%)
instructions in affected programs: 18156 -> 17964 (-1.06%)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Small immediates have the downside of taking over the raddr B field, so
you might have less chance to pack instructions together thanks to raddr B
conflicts. However, it also reduces some register pressure since it lets
you load 2 "uniform" values in one instruction (avoiding a previous load
of the constant value to a register), and increases some pairing for the
same reason.
total uniforms in shared programs: 16231 -> 13374 (-17.60%)
uniforms in affected programs: 10280 -> 7423 (-27.79%)
total instructions in shared programs: 40795 -> 41168 (0.91%)
instructions in affected programs: 25551 -> 25924 (1.46%)
In a previous version of this patch I had a reduction in instruction count
by forcing the other args alongside a SMALL_IMM to be in the A file or
accumulators, but that increases register pressure and had a bug in
handling FRAG_Z. In this patch is I just use raddr conflict resolution,
which is more expensive. I think I'd rather tweak allocation to have some
way to slightly prefer good choices for files in general, rather than risk
failing to register allocate by forcing things into register classes.
|
|
|
|
|
|
|
|
|
|
|
|
| |
This doesn't reschedule much currently, just tries to fit things into the
regfile A/B write-versus-read slots (the cause of the improvements in
shader-db), and hide texture fetch latency by scheduling setup early and
results collection late (haven't performance tested it). This
infrastructure will be important for doing instruction pairing, though.
shader-db2 results:
total instructions in shared programs: 61874 -> 59583 (-3.70%)
instructions in affected programs: 50677 -> 48386 (-4.52%)
|
|
|
|
|
| |
Our submits now return immediately and you have to manually wait for
things to complete if you want to (like a normal driver).
|
|
|
|
|
|
|
|
|
| |
The kernel files are built into a separate static library and
all the functions that require it are already wrapped in ifdef
USE_VC4_SIMULATOR. Don't forget the header file :)
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Now this whole setup matches the kernel's file layout much more closely.
|
|
|
|
|
|
| |
We have to expose them for GL 2.0, but we just always return a value of 0.
We should be advertising 0 query bits instead of 64, but gallium doesn't
have plumbing for that yet. At least this stops the segfaults.
|
| |
|
|
|
|
|
|
| |
This allows for introducing dead code eliminating of uniforms, copy
propagation of uniforms, and instruction rescheduling between instructions
that both read uniforms.
|
|
|
|
|
| |
I'm going to be rewriting it all, and having it mixed up with the
QIR-to-QPU opcode translation was messy.
|
|
|
|
|
|
|
|
|
| |
- include all headers in Makefile.sources
Cc: Eric Anholt <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Acked-by: Matt Turner <[email protected]>
|
|
|
|
|
|
| |
Debugging a regression in discard support was just too full of duplicate
instructions, so I decided to remove them instead of re-analyzing each of
them as I dumped their outputs in simulation.
|
|
|
|
|
|
|
| |
Now that tiling is in place, we can expose the other formats. Depth is
still broken (need to make changes in the shader), but if you don't expose
it things crash all over. SNORM is dropped, but we could re-add it later
with some shader fixes to handle converting between [0,1] and [-1,1].
|
|
|
|
|
|
| |
This still treats everything as RGBA8888 for the most part, same as
before. This is a prerequisite for handling other texture formats, since
only RGBA8888 has a raster-layout mode.
|
|
|
|
|
|
|
|
|
|
| |
This required building a shader parser that would walk the program to find
where the texturing-related uniforms are in the uniforms stream.
Note that as of this commit, a new kernel is required for rendering on
actual VC4 hardware (currently that commit is named "drm/vc4: Introduce
shader validation and better command stream validation.", but is likely to
be squashed as part of an eventual merge of the kernel driver).
|
|
|
|
|
|
|
|
| |
This ensures that when I'm using the simulator, I get a closer match to
what behavior on real hardware will be. It lets me rapidly iterate on the
kernel validation code (which otherwise has a several-minute turnaround
time), and helps catch buffer overflow bugs in the userspace driver
faster.
|
|
|
|
|
|
|
|
| |
We put in a bunch of extra MOVs for program outputs, and this can clean
those up. We should do uniforms, too, though.
v2: Fix missing flagging of progress when we actually optimize. Caught by
Aaron Watry.
|
|
|
|
|
|
| |
This cleans up a bunch of noise in the compiled coordinate shaders (since
we don't need the varying outputs), and also from writemasked instructions
with negated src operands.
|
|
|
|
|
| |
There was a lot of extra noise in my piglit shader dumps because of silly
CMPs.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This introduces an IR (QIR, for QPU IR) to do optimization on. It's a
scalar, SSA IR in general. It looks like optimization is pretty easy this
way, though I haven't figured out if it's going to be good for our weird
register allocation or not (or if I want to reduce to basically QPU
instructions first), and I've got some problems with it having some
multi-QPU-instruction opcodes (SEQ and CMP, for example) which I probably
want to break down.
Of course, this commit mostly doesn't work, since many other things are
still hardwired, like the VBO data.
v2: Rewrite to use a bunch of helpers (qir_OPCODE) for emitting QIR
instructions into temporary values, and make qir_inst4 take the 4 args
separately instead of an array (all later callers wanted individual
args).
|
|
This mostly just takes every draw call and turns it into a sequence of
commands that clear the FBO and draw a single shaded triangle to it,
regardless of the actual input vertices or shaders. I copied the initial
driver skeleton mostly from freedreno, and I've preserved Rob Clark's
copyright for those. I also based my initial hardcoded shaders and
command lists on Scott Mansell (phire)'s "hackdriver" project, though the
bit patterns of the shaders emitted end up being different.
v2: Rebase on gallium megadrivers changes.
v3: Rebase on PIPE_SHADER_CAP_MAX_CONSTS change.
v4: Rely on simpenrose actually being installed when building for
simulation.
v5: Add more header duplicate-include guards.
v6: Apply Emil's review (protection against vc4 sim and ilo at the same
time, and dropping the dricommon drm bits) and fix a copyright header
(thanks, Roland)
|