| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
| |
Signed-off-by: Karol Herbst <[email protected]>
Acked-by: Jason Ekstrand <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I had bugs in the old path where I was laying out as tiled (so we'd render
tiled) but then only allocating space in the shared object for linear
rendering. The resource_from_handle makes it so the same layout choices
are made in both the import and export scanout cases. Also, fixes a leak
of the fd that was tripping up the CTS.
Now that we're checking PIPE_BIND_SHARED to choose to use RO, the
DRM_FORMAT_MOD_LINEAR check wasn't needed any more.
Fixes visual corruption and MMU faults in X in renderonly mode.
Fixes: bd09bb1629a7 ("v3d: SHARED but not necessarily SCANOUT buffers on RO must be linear.")
|
|
|
|
|
|
| |
Part of fixing DRI3 rendering with RO on X11.
Fixes: e113b21cb779 ("v3d: Add renderonly support.")
|
|
|
|
|
| |
We don't have a way to talk to RO about modifiers it can do yet, so assume
the minimum.
|
|
|
|
|
|
| |
This is only exposed on V3D 4.1+, because we didn't have the TMU write
operations for images on 3.3 (To do GLES 3.1 there, you have to lower it
to SSBO load/stores, which is a problem to solve later).
|
|
|
|
|
| |
So far I assume that all the buffers get written. If they weren't, you'd
probably be using UBOs instead.
|
|
|
|
|
|
| |
This was an arbitrary "we support lots of stuff" value when I started the
driver. However, at 400 we expose OES_gpu_shader5, which claims support
for dynamically indexing samplers, which the driver doesn't do yet.
|
|
|
|
|
|
|
| |
Otherwise, the simulator raises the GMP interrupt and waits for it to be
handled, and v3d ends up spinning in v3d_hw_tick(). Aborting right when
violation happens gives us a chance to look at the backtrace of whatever
thread triggered the violation.
|
|
|
|
|
|
| |
Fixes
dEQP-GLES31.functional.state_query.integer.max_framebuffer_height_getboolean
when GLES3 is enabled.
|
|
|
|
| |
This will be needed for SSBOs and image_load_store.
|
|
|
|
|
| |
This is part of GLES 3.1, and with the NIR lowering we're now passing the
GLES31 testcases.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
the naming is a bit confusing no matter how you look at it. Within SPIR-V
"global" memory is memory accessible from all threads. glsl "global" memory
normally refers to shader thread private memory declared at global scope. As
we already use "shared" for memory shared across all thrads of a work group
the solution where everybody could be happy with is to rename "global" to
"private" and use "global" later for memory usually stored within system
accessible memory (be it VRAM or system RAM if keeping SVM in mind).
glsl "local" memory is memory only accessible within a function, while SPIR-V
"local" memory is memory accessible within the same workgroup.
v2: rename local to function as well
v3: rename vtn_variable_mode_local as well
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
I noticed that a VS I was debugging was missing all of its output stores
-- outputs_written was for POS, VAR0, VAR3, while the shader's variables
were POS, VAR9, and VAR12. I'm not sure what outputs_written is supposed
to be doing here, but we can just walk the declared variables and avoid
both this bug and the emission of extra stvpms for less-than-vec4
varyings.
|
|
|
|
|
|
| |
Before, I had per-stage entryoints with some helpers shared between them.
As I extended for compute shaders and shader-db, it turned out that the
other common code in the middle wanted to be shared too.
|
|
|
|
|
| |
Looking at some assembly dumps for an optimization, we were clearly
missing important parts of the shader!
|
|
|
|
|
|
|
| |
We'll still fail at draw time, but this avoids a regression in shader-db
execution once I enable TLB writes in precompiles.
Fixes: b38e4d313fc2 ("v3d: Create a state uploader for packing our shaders together.")
|
| |
|
|
|
|
|
|
|
| |
This allows the original shader-db project's run.c runner to parse things
easily, and is probably a good thing to have for GL_ARB_debug_output in
general. I formatted it more like Intel's so I can mostly reuse their
report script.
|
|
|
|
|
|
|
|
|
| |
I've been using my apitrace-based shader-db so far, but it's slow
(apitrace decompression), intrusive (apitrace windows spamming the
screen), and doesn't have much coverage. The original shader-db provides
a lot more coverage and compiles faster, at the expense of not having the
actual runtime variant key. As v3d has a lot less runtime variation than
vc4 did, this tradeoff makes more sense.
|
|
|
|
|
| |
This is the right channel to report these things, so that end-users don't
need to know each driver's custom debug options.
|
|
|
|
|
|
|
| |
This lets the driver use pipe_debug_message() for GL_ARB_debug_output.
Signed-off-by: Rhys Kidd <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
The shadow state is now in the sampler.
|
|
|
|
|
|
| |
i915 render nodes refuse the dumb ioctls, so the simulator would crash on
the original non-apitrace shader-db. Replace them with direct i915 calls
if we detect that we're on one of their gem fds.
|
|
|
|
|
|
| |
This calls the expensive uif offset function once per utile, but it still
gets us a 212.218% +/- 2.41216% (n=10) win on 1024x1024 glTexImage over
calling it on each pixel.
|
|
|
|
|
|
|
| |
This lets us store the non-PBO glTexImage data directly into the tiled
image without making an extra untiled memcpy for the gallium transfer.
Improves 1024x1024 TexImage perf by ~19%, mostly from not thrashing around
in the kernel mapping and unmapping the transfer's temporary area.
|
| |
|
|
|
|
|
|
|
| |
They're raster order anyway, so we'd assertion fail along with wasting
bandwidth.
Fixes: 6ad9e8690d14 ("v3d: Add support for texturing from linear.")
|
|
|
|
|
|
|
|
|
|
| |
We're waiting for the jobs-completed count to increment (with wrapping),
not to reach its starting state. This mostly ended up working out because
the next v3d_hw_tick() for a submit CL would end up doing the TFU
operation first, but it did fail when a blit was used for glReadPixels()
at the end of a test.
Fixes: ee0549ff9ab3 ("v3d: Add the V3D TFU submit interface to the simulator.")
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the UAPI, the first BO is the destination, and the one the kernel
should do an exclusive reservation on. Currently we only do exclusive
reservations, anyway. However, in the simulator path I was only copying
back the "destination" BO (actually src in this case), and this caused
regressions once I fixed the simulator to actually complete TFU before
returning (since otherwise, the TFU op would happen at the start of the
next CL submit and the draw would get the right contents).
Fixes: 976ea90bdca2 ("v3d: Add support for using the TFU to do some blits.")
|
|
|
|
|
|
| |
These have all been floating in my head, and while I've thought about
encoding them in issues on gitlab once they're enabled, they also make
sense to just have in the area of the code you'll need to work in.
|
|
|
|
|
| |
This will be a lot easier than my usual "38400.000000? that looks like a
viewport scale" decoding strategy.
|
|
|
|
|
|
| |
Follows 3954331aff23 ("vc4: Pull uinfo->data[i] dereference out to the top
of the loop.") which showed a large performance win for vc4, but also
cleans up the code a decent bit.
|
|
|
|
|
|
| |
In trying to enable compute shaders, I found that a bunch of deqp-gles31's
compute stuff wanted to interact with indirect dispatch. This was easy to
do on its own.
|
|
|
|
| |
This should ease my debugging next time I screw it up.
|
|
|
|
|
|
|
| |
Just like vc4, we have to support linear shared BOs for X11 on arbitrary
displays. When we're faced with a request to texture from one of those,
make a shadow image that we copy using the TFU at the start of the draw
call.
|
|
|
|
| |
This will be useful in particular for blits from raster to UIF for X11.
|
|
|
|
|
|
| |
generatemipmap is just filling out the rest of the mipmap that's already
been written (by a mapping or a draw call), so it didn't matter. As I
reuse the TFU code for linear-to-UIF conversions, it'll start mattering.
|
|
|
|
|
| |
I didn't have any raster images in the generatemipmap path, so the
pixels-vs-bytes mixup didn't matter here.
|
|
|
|
|
|
|
|
| |
Otherwise we may race to read old contents. This didn't show up in the
CTS and piglit for me, but it did once I started using the TFU to do
linear->UIF blits for X11.
Fixes: 2ebca177dc18 ("v3d: Use the TFU to do generatemipmap.")
|
| |
|
| |
|
|
|
|
| |
Fixes: 7a30517cce8f ("broadcom/vc5: Start adding support for rendering to Z32F_S8X24_UINT.")
|
|
|
|
| |
I had a bit of it for V3D 3.x, but didn't update it for 4.x.
|
| |
|
|
|
|
| |
For shader image load/store, we want most of this logic to be shared.
|
|
|
|
|
| |
Having "v3dx_pack() {" under each #if branch would confuse emacs's
indenter.
|
|
|
|
|
| |
I think this bug predated adding v3d_layer_offset(). Noticed during an
unrelated refactor.
|
|
|
|
|
| |
It's supposed to be the dispatched sample mask for this pixel, not the GL
state's sample mask.
|
|
|
|
|
| |
If someone did TF into a UBO, we might have left the TF job un-flushed at
the point of reading.
|
|
|
|
|
| |
This simplifies a bunch of our texture handling, while introducing the
slots necessary for adding new shader stages.
|