| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
Noticed in shaders with branching, where we ended up scheduling delay
slots near the start of a block for the uniforms reset setup.
total instructions in shared programs: 93970 -> 93951 (-0.02%)
instructions in affected programs: 3117 -> 3098 (-0.61%)
3DMMES performance +0.423087% +/- 0.133521% (n=9,10)
|
|
|
|
|
| |
These are both bugs we've run into along the way writing multithreaded FS
support.
|
|
|
|
| |
Caught problems in the upcoming DDX/DDY implementation.
|
| |
|
| |
|
|
|
|
|
|
| |
Seeing the expansion of a QPU_GET_FIELD in an assert isn't very
informative, and it's hard find what's going wrong without getting a dump
of the instruction that failed.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Small immediates have the downside of taking over the raddr B field, so
you might have less chance to pack instructions together thanks to raddr B
conflicts. However, it also reduces some register pressure since it lets
you load 2 "uniform" values in one instruction (avoiding a previous load
of the constant value to a register), and increases some pairing for the
same reason.
total uniforms in shared programs: 16231 -> 13374 (-17.60%)
uniforms in affected programs: 10280 -> 7423 (-27.79%)
total instructions in shared programs: 40795 -> 41168 (0.91%)
instructions in affected programs: 25551 -> 25924 (1.46%)
In a previous version of this patch I had a reduction in instruction count
by forcing the other args alongside a SMALL_IMM to be in the A file or
accumulators, but that increases register pressure and had a bug in
handling FRAG_Z. In this patch is I just use raddr conflict resolution,
which is more expensive. I think I'd rather tweak allocation to have some
way to slightly prefer good choices for files in general, rather than risk
failing to register allocate by forcing things into register classes.
|
|
|
|
|
| |
Avoids assertion failures in vc4_qpu_validate.c if we happen to find the
right set of operations available.
|
|
|
|
|
|
|
| |
We're supposed to be checking that nothing else writes r4, which is done
by the TMU result collection signal, not the coordinate setup.
Avoids a regression when QPU instruction scheduling is introduced.
|
|
|
|
| |
This was caught by an assertion in the simulator.
|
|
|
|
|
|
|
|
|
| |
The spec citation talked about A and B, and I proceeded to pay no
attention to whether the waddrs were for A or B. As a result, this pair
of instructions would claim to conflict:
mov ra4, ra4 ; nop nop, r0, r0
mov.ns ra4, rb4 ; nop nop, r0, r0
|
|
This mostly just takes every draw call and turns it into a sequence of
commands that clear the FBO and draw a single shaded triangle to it,
regardless of the actual input vertices or shaders. I copied the initial
driver skeleton mostly from freedreno, and I've preserved Rob Clark's
copyright for those. I also based my initial hardcoded shaders and
command lists on Scott Mansell (phire)'s "hackdriver" project, though the
bit patterns of the shaders emitted end up being different.
v2: Rebase on gallium megadrivers changes.
v3: Rebase on PIPE_SHADER_CAP_MAX_CONSTS change.
v4: Rely on simpenrose actually being installed when building for
simulation.
v5: Add more header duplicate-include guards.
v6: Apply Emil's review (protection against vc4 sim and ilo at the same
time, and dropping the dricommon drm bits) and fix a copyright header
(thanks, Roland)
|