| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
This will let us do copy propagation of the VPM reads.
|
|
|
|
|
| |
We pass in a byte offset, not dword. I'm rather scared that this actually
managed to pass piglit, but it does fix gears.
|
|
|
|
|
| |
total instructions in shared programs: 40960 -> 39753 (-2.95%)
instructions in affected programs: 20871 -> 19664 (-5.78%)
|
|
|
|
|
|
|
|
|
| |
I'm using this in some WIP commits for doing blending in 8888 instead of
vec4. But it also gives us these results immediately, thanks to allowing
more uniforms/immediates in the arguments:
total instructions in shared programs: 41027 -> 40960 (-0.16%)
instructions in affected programs: 4381 -> 4314 (-1.53%)
|
|
|
|
|
| |
Our MOV's dst obviously won't be the TLB_COLOR_READ's def, because we're
ssa.
|
|
|
|
| |
Any other caller would want it, too.
|
|
|
|
|
|
| |
Fixes piglit glsl-fs-fragcoord-zw-perspective, es3conform
gl_FragCoord_z_frag, and the rest of the piglit glsl 1.10 interpolation
tests.
|
|
|
|
|
| |
They key is, oddly enough, in the key field, not in the data field (which
is the vc4_compiled_shader *). Fixes regular failures in fp-long-alu.
|
|
|
|
|
|
|
| |
Improves framerate of 5 seconds of es2gears by 1.57473% +/- 0.669409%
(n=67).
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
| |
Can't reset the CL before looking at how much we had pupt in it.
|
|
|
|
|
| |
This gives a 2.7x improvement in x11perf -rect100, since we only end up
load/storing the x11perf window, not the whole screen.
|
|
|
|
|
|
| |
This will be more important in the next commit, when there's more state to
reset to nonzero values, and I want an early exit from the submit
function.
|
|
|
|
|
| |
The callers all follow it with a flush of the context, and the flush of
the context gives us more information about how things are being flushed.
|
|
|
|
|
|
|
| |
As of 229bf4475ff0a5dbeb9bc95250f7a40a983c2e28 we started getting SIBGUS
from unaligned accesses on the hardware, for reasons I haven't figured
out. However, we should be avoiding unaligned accesses anyway, and our CL
setup certainly would have produced them.
|
|
|
|
|
| |
They should all be set to real values by the time they're read, and
ideally if you used valgrind you'd see uninitialized value uses.
|
|
|
|
| |
It doesn't matter, since it just got truncated to 16 inside, anyway.
|
|
|
|
|
|
|
|
| |
The optimizer obviously doesn't have the ability to rewrite these to skip
the size checks per call, so we have to do it manually.
Improves a norast benchmark on simulation by 0.779706% +/- 0.405838%
(n=6087).
|
|
|
|
|
| |
Improves norast performance of a microbenchmark by 11.1865% +/- 2.37673%
(n=20).
|
|
|
|
|
| |
total instructions in shared programs: 41168 -> 40976 (-0.47%)
instructions in affected programs: 18156 -> 17964 (-1.06%)
|
|
|
|
|
| |
This will let me coalesce the VPM writes into the instructions generating
the values.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Small immediates have the downside of taking over the raddr B field, so
you might have less chance to pack instructions together thanks to raddr B
conflicts. However, it also reduces some register pressure since it lets
you load 2 "uniform" values in one instruction (avoiding a previous load
of the constant value to a register), and increases some pairing for the
same reason.
total uniforms in shared programs: 16231 -> 13374 (-17.60%)
uniforms in affected programs: 10280 -> 7423 (-27.79%)
total instructions in shared programs: 40795 -> 41168 (0.91%)
instructions in affected programs: 25551 -> 25924 (1.46%)
In a previous version of this patch I had a reduction in instruction count
by forcing the other args alongside a SMALL_IMM to be in the A file or
accumulators, but that increases register pressure and had a bug in
handling FRAG_Z. In this patch is I just use raddr conflict resolution,
which is more expensive. I think I'd rather tweak allocation to have some
way to slightly prefer good choices for files in general, rather than risk
failing to register allocate by forcing things into register classes.
|
|
|
|
| |
I want this from other passes.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Since our kernel BOs require CMA allocation, and the use of them requires
new mmaps, it's pretty expensive and we should avoid it if possible.
Copying my original design for Intel, make a userspace cache that reuses
BOs that haven't been shared to other processes but frees BOs that have
sat in the cache for over a second.
Improves glxgears framerate on RPi by around 30%.
|
|
|
|
|
|
| |
This gets DRI3 working on modesetting with glamor. It's not enabled under
simulation, because it looks like handing our dumb-allocated buffers off
to the server doesn't actually work for the server's rendering.
|
| |
|
|
|
|
|
| |
total instructions in shared programs: 43053 -> 40795 (-5.24%)
instructions in affected programs: 37996 -> 35738 (-5.94%)
|
| |
|
|
|
|
| |
We're deciding about the WS bit, not PM.
|
|
|
|
| |
This is the same basic logic from the original Broadcom driver.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Plus a new PIPE_CAP_VERTEXID_NOBASE query. The idea is that drivers not
supporting vertex ids with base vertex offset applied (so, only support
d3d10-style vertex ids) will get such a d3d10-style vertex id instead -
with the caveat they'll also need to handle the basevertex system value
too (this follows what core mesa already does).
Additionally, this is also useful for other state trackers (for instance
llvmpipe / draw right now implement the d3d10 behavior on purpose, but
with different semantics it can just do both).
Doesn't do anything yet.
And fix up the docs wrt similar values.
v2: incorporate feedback from Brian and others, better names, better docs.
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
32-bit unsigned would require some adjustments to handle values >=
0x80000000.
|
| |
|
|
|
|
|
| |
It's only an f16 conversion if you're doing a float operation, otherwise
it's 16 bit signed to 32-bit signed.
|
| |
|
|
|
|
| |
There was just way too much indentation.
|
|
|
|
|
| |
We're actually allocating out of r3 now, and I missed it because I'd typed
this one as qpu_rn(3) instead of qpu_r3().
|
|
|
|
|
| |
There is an equivalent unpack function without conversion to float if you
use an integer operation instead.
|
| |
|
|
|
|
| |
I typoed this when rebasing the memory leak fixes.
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
They're copied into a vc4_bo after compiling is done.
|
|
|
|
|
| |
No performance difference on a microbenchmark with norast that should hit it
enough to have mattered, n=220.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, the hash_table API required the user to do all of the hashing
of keys as it passed them in. Since the hashing function is intrinsically
tied to the comparison function, it makes sense for the hash table to know
about it. Also, it makes for a somewhat clumsy API as the user is
constantly calling hashing functions many of which have long names. This
is especially bad when the standard call looks something like
_mesa_hash_table_insert(ht, _mesa_pointer_hash(key), key, data);
In the above case, there is no reason why the hash table shouldn't do the
hashing for you. We leave the option for you to do your own hashing if
it's more efficient, but it's no longer needed. Also, if you do do your
own hashing, the hash table will assert that your hash matches what it
expects out of the hashing function. This should make it harder to mess up
your hashing.
v2: change to call the old entrypoint "pre_hashed" rather than
"with_hash", like cworth's equivalent change upstream (change by
anholt, acked-in-general by Jason).
Signed-off-by: Jason Ekstrand <[email protected]>
Signed-off-by: Eric Anholt <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
| |
While the pipe_reference_* helpers set the pointer, a bare pipe_reference
doesn't. Fixes 5 ARB_sync tests.
|