| Commit message (Collapse) | Author | Age | Files | Lines |
|
This gives the kernel a chance to validate and lock down the data,
without having to deal with mmap zapping.
With this, GLBenchmark stops on a texture relocation, because we'd
recycled a shader BO as another shader and failed to revalidate it: we
weren't clearing the cached validation state on mmap faults.
|
The hash table considers key 0 to be the empty key.
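
A minimal sketch of why a reserved empty key matters, assuming a small open-addressing table. The names `sketch_table`, `sketch_insert`, `sketch_lookup` and `index_to_key` are hypothetical and are not taken from the driver. Because key 0 marks an unused slot, a caller whose natural keys start at 0 has to bias them (here by adding 1) before storing them.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_TABLE_SIZE 16u /* power of two, chosen only for illustration */

struct sketch_slot {
    uint32_t key;   /* 0 means "slot is empty" */
    uint32_t value;
};

struct sketch_table {
    struct sketch_slot slots[SKETCH_TABLE_SIZE];
};

/* Hypothetical helper: bias a zero-based index so it never collides
 * with the reserved empty key. */
static uint32_t index_to_key(uint32_t index)
{
    return index + 1;
}

/* Linear-probing insert. Refuses key 0, since it could never be told
 * apart from an empty slot afterwards. */
static bool sketch_insert(struct sketch_table *t, uint32_t key, uint32_t value)
{
    if (key == 0)
        return false;

    for (uint32_t i = 0; i < SKETCH_TABLE_SIZE; i++) {
        struct sketch_slot *s = &t->slots[(key + i) & (SKETCH_TABLE_SIZE - 1)];
        if (s->key == 0 || s->key == key) {
            s->key = key;
            s->value = value;
            return true;
        }
    }
    return false; /* table is full */
}

/* Linear-probing lookup. Hitting an empty slot ends the probe. */
static bool sketch_lookup(const struct sketch_table *t, uint32_t key,
                          uint32_t *value)
{
    if (key == 0)
        return false;

    for (uint32_t i = 0; i < SKETCH_TABLE_SIZE; i++) {
        const struct sketch_slot *s =
            &t->slots[(key + i) & (SKETCH_TABLE_SIZE - 1)];
        if (s->key == 0)
            return false;
        if (s->key == key) {
            *value = s->value;
            return true;
        }
    }
    return false;
}
```

Storing `index_to_key(0)` rather than `0` is what keeps the first index usable as a key in this sketch.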
|
On a release build, this makes the rest of vc4_qpu_validate.c go away
(the compiler didn't know that our qpu helper function calls had no
side effects).
|
Reviewed-by: Ilia Mirkin <[email protected]>
|
Cuts another 12% of vc4_uniforms.o, in exchange for computing it at
CSO creation time.
|
In exchange for a bit of space and computation in CSO setup, we cut
vc4_uniforms.c (draw time) code size by 4.8%.
|
The rest of vc4_program.c is about compiling, while this is about
uniform emit at draw time.
|
No code generation changes from this, but it'll be useful to have this
next time I go checking -Wdouble-promotion.
|
Cuts another 88 bytes of compiled code.
|
Drops 680 bytes of code, from avoiding a bunch of extra updates to the
next pointer in the struct.
|
I needed to rewrite this a bit for safety checking in the next commit.
Despite being a static inline of the same thing that was being done, we
lose 36 bytes of code for some reason.
|
Now that RCL generation is in the kernel, we don't have any other
callers. Oddly, the compiler generates another 8 bytes of code for
this, but the simplification is worth it.
|
Now that we don't resize the CL as we build (it's set up at the top by
vc4_start_draw()), we can store the pointers instead of offsets from
the base. Saves a bit of math in emitting relocs (about 60 bytes of
code).
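
A rough sketch of the pointer-based cursor described above, assuming a fixed-size buffer that is never reallocated while it is being filled. The `sketch_cl` type and its helpers are hypothetical and are not the driver's actual structures. With a stored pointer, each write is a plain store and increment; the offset from the base is only computed when something actually asks for it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct sketch_cl {
    uint8_t *base;  /* start of the buffer */
    uint8_t *next;  /* write cursor, kept as a pointer rather than an offset */
    size_t size;    /* fixed capacity, set once up front */
};

static void sketch_cl_init(struct sketch_cl *cl, uint8_t *storage, size_t size)
{
    cl->base = storage;
    cl->next = storage;
    cl->size = size;
}

/* Offset of the cursor from the base, derived on demand. */
static size_t sketch_cl_offset(const struct sketch_cl *cl)
{
    return (size_t)(cl->next - cl->base);
}

/* Append a 32-bit value; returns 0 if it would not fit. */
static int sketch_cl_u32(struct sketch_cl *cl, uint32_t value)
{
    if (sketch_cl_offset(cl) + sizeof(value) > cl->size)
        return 0;

    memcpy(cl->next, &value, sizeof(value));
    cl->next += sizeof(value);
    return 1;
}
```

The trade-off is that a stored pointer is only valid while the buffer stays put: if the storage were resized with `realloc()` mid-build, `next` would dangle, which is why offsets are the safer choice for a growable buffer.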
|
Some, but not all, state trackers will explicitly unref (and set to
NULL) the previous *fence before calling pipe->flush(). So the driver
should use fence_ref(), which will unref the old fence if it is not NULL.
Signed-off-by: Rob Clark <[email protected]>
Acked-by: Eric Anholt <[email protected]>
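
The reference-handling pattern can be sketched as follows. This is an illustration under stated assumptions, not the driver's code: `sketch_fence`, `sketch_fence_create`, `sketch_fence_ref`, `sketch_flush` and the `sketch_fences_freed` counter are all hypothetical names. The point is that assigning through a ref helper releases whatever the caller's pointer already held, so a caller that did not clear `*fence` beforehand does not leak the old fence.

```c
#include <stddef.h>
#include <stdlib.h>

struct sketch_fence {
    int refcount;
};

/* Counts destroyed fences so the behaviour can be observed. */
static int sketch_fences_freed;

static struct sketch_fence *sketch_fence_create(void)
{
    struct sketch_fence *f = calloc(1, sizeof(*f));
    if (f)
        f->refcount = 1;
    return f;
}

/* Point *ptr at fence: take a reference on the new fence first, then
 * drop the reference *ptr previously held (if any). */
static void sketch_fence_ref(struct sketch_fence **ptr,
                             struct sketch_fence *fence)
{
    struct sketch_fence *old = *ptr;

    if (fence)
        fence->refcount++;
    if (old && --old->refcount == 0) {
        free(old);
        sketch_fences_freed++;
    }
    *ptr = fence;
}

/* Stand-in for a flush that hands a new fence back to the caller. */
static void sketch_flush(struct sketch_fence **out_fence)
{
    struct sketch_fence *fence = sketch_fence_create();

    if (out_fence)
        sketch_fence_ref(out_fence, fence); /* unrefs a stale *out_fence */
    sketch_fence_ref(&fence, NULL);         /* drop the local reference */
}
```

Calling `sketch_flush(&f)` twice with the same `f` releases the first fence on the second call; a plain `*out_fence = fence` assignment would have leaked it.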
|
fence_finish(timeout=0) does the same thing
Reviewed-by: Brian Paul <[email protected]>
|
We need to distinguish a shader that has separate writes to each MRT
from one which is supposed to write the data from MRT 0 to all the MRTs.
In TGSI this is done with a property. NIR doesn't have that, so encode
it as a funny location and decode on the other end.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
We already don't convert constants out of SSA, and in our backend we'd
like to have only one way of saying something is still in SSA.
The one tricky part about this is that we may now leave some undef
instructions around if they aren't part of a phi-web, so we have to be
more careful about deleting them.
v2: rename and flip meaning of flag (Jason)
Reviewed-by: Jason Ekstrand <[email protected]>
|
Reviewed-by: Emil Velikov <[email protected]>
|
It returns a new value for each sample in the TLB. We've already avoided
trying to get the same index's color multiple times at the vc4_program.c
level, so we're not losing anything by doing this.
|
We've done so for all the other QIR instruction generation in this file.
|
It's fairly separate from the rest of the TLB operations at frag end time,
and we'll need to run it multiple times to support MSAA blending.
|
It's the same value for loads and stores, because they're basically the
same packet.
|
This doesn't fix the broken 1D cases of texsubimage, but it does prevent
segfaulting when dumping the QIR code generated in fbo-1d.
|
We need to make sure that when we store the aligned box, we've got
initialized contents in the border. We could potentially just load the
border area, but for now let's get text rendering working in X (and fix
the GL_TEXTURE_2D errors in piglit's texsubimage test and
gl-2.1-pbo/test_tex_image).
|
This avoids a security issue where userspace could have written the tile
state/tile alloc behind the GPU's back, and will apparently be necessary
for fixing stability bugs (tile state buffers are missing some top bits
for the tile alloc's address).
|
There weren't that many variations of RCL generation, and this lets us
skip all the in-kernel validation for what we generated.
|
I accidentally shadowed the outside declaration, so we always returned
NULL even when we'd found something in the cache.
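
The class of bug described above can be reproduced in a few lines. This is a generic illustration, not the driver's cache code; `cache_entry`, `lookup_shadowed` and `lookup_fixed` are hypothetical names. In the buggy version the inner declaration creates a second variable that hides the outer one, so the function's result is never updated.

```c
#include <stddef.h>
#include <string.h>

struct cache_entry {
    const char *name;
    int value;
};

/* Buggy: the inner 'found' shadows the outer one, so the outer
 * variable stays NULL even when a match exists. */
static struct cache_entry *
lookup_shadowed(struct cache_entry *entries, size_t count, const char *name)
{
    struct cache_entry *found = NULL;

    for (size_t i = 0; i < count; i++) {
        if (strcmp(entries[i].name, name) == 0) {
            struct cache_entry *found = &entries[i]; /* shadows outer */
            (void)found;
            break;
        }
    }
    return found; /* always NULL */
}

/* Fixed: assign to the outer variable instead of redeclaring it. */
static struct cache_entry *
lookup_fixed(struct cache_entry *entries, size_t count, const char *name)
{
    struct cache_entry *found = NULL;

    for (size_t i = 0; i < count; i++) {
        if (strcmp(entries[i].name, name) == 0) {
            found = &entries[i];
            break;
        }
    }
    return found;
}
```

GCC and Clang can flag this pattern when built with `-Wshadow`.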
|
This is useful for BO leak debugging.
|
I was thinking of the MIN opcode in terms of unsigned math, but it's
signed, so if you used a negative array index, you could read before the
UBO. Fixes segfaults under simulation in piglit array indexing tests with
mprotect-based guard pages.
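
The signed-versus-unsigned difference can be shown in plain C. The helpers below are hypothetical and only model the arithmetic; they do not show how the driver emits the clamp. A signed minimum bounds the index from above but lets negative values through, whereas an unsigned minimum treats a negative index as a very large number and clamps it to the last element. Clamping at both ends is one possible alternative.

```c
#include <stdint.h>

/* Signed MIN only: a negative index passes through unchanged. */
static int32_t clamp_signed_min_only(int32_t index, int32_t last)
{
    return index < last ? index : last;
}

/* Unsigned MIN: -1 is reinterpreted as 0xffffffff and clamps to 'last'. */
static uint32_t clamp_unsigned_min(int32_t index, uint32_t last)
{
    uint32_t u = (uint32_t)index;
    return u < last ? u : last;
}

/* Signed clamp at both ends: negative indices become 0. */
static int32_t clamp_both_ends(int32_t index, int32_t last)
{
    int32_t upper = index < last ? index : last;
    return upper > 0 ? upper : 0;
}
```

With `last == 15`, an index of `-1` stays `-1` under the signed minimum, which is the out-of-bounds read before the buffer that the message describes.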
|
I wanted to assert that src1 came from a non-unspilled register in shader
validation, and this easily gets us that. And, as a bonus:
total instructions in shared programs: 93347 -> 92723 (-0.67%)
instructions in affected programs: 60524 -> 59900 (-1.03%)
|
Our array only goes to R3, and R4 is a special case that shouldn't be
used.
|
We're always looking at the slice anyway, when we would have needed it.
|
This reduces the diff to the kernel, and will be useful when I make the
kernel allocate more BOs as part of validation.
|
I want to notice discrepancies when I diff -u between Mesa and the kernel.
|
v2: Add a comment explaining why we link libmesa_glsl. Drop warning
option from freedreno. Add vc4 to the documentation for
BOARD_GPU_DRIVERS.
Reviewed-by: Emil Velikov <[email protected]>
|