| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
We still need to emit them in V3D 3.x since there there is no mechanism to
disable them.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
| |
The V3D 4.2 HW has a limit to MSAA texture sizes of 4096. With non-MSAA,
we can go up to 7680 (actually probably 8138, but that hasn't been
validated by the HW team). Exposing 7680 in X11 will allow dual 4k displays.
|
|
|
|
|
|
|
|
| |
The _LEVELS assumes that the max is always power of two. For V3D 4.2, we
can support up to 7680 non-power-of-two MSAA textures, which will let X11
support dual 4k displays on newer hardware.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
This reverts commit ccce9409470c1053c40c822d759b9bd417062bc0, leaving a
note as to why we had to (corruption in chromium, breaking some GLES3.1
tests).
|
|
|
|
|
|
|
|
|
|
|
| |
There are two cases where v3d's sampler view's resource doesn't match the
base's: shadow textures for sampling from raster, and pointing at the
separate depth texture for z32f_s8x24. We only want to update shadow for
the first case.
Fixes
dEQP-GLES31.functional.stencil_texturing.render.depth32f_stencil8_draw
when run after the previous testcase.
|
| |
|
|
|
|
|
|
|
| |
We were emitting a dummy load for when the VS doesn't load any attributes,
but we also need to emit a dummy load for when the render VS loads
attributes but the binner VS doesn't. Fixes simulator assertion failures
and GPU hangs on KHR-GLES31.core.texture_gather.\*
|
|
|
|
|
|
| |
We are assured that the input segment size field is ignored for
!separate_segs mode, and now the simulator wants an in-range value set
regardless of whether it's functionally ignored or not.
|
|
|
|
|
|
|
|
|
|
| |
The CTS fails on
dEQP-GLES31.functional.shaders.opaque_type_indexing.atomic_counter.*vertex
when they are enabled, due to the VS being run for both bin and render. I
think this behavior is expected to be valid, but I can't find text in
atomic counters or SSBO specs saying so (the closed I found was in
shader_image_load_store). Just disable it for now, since the closed
source driver doesn't expose vertex atomic counters/SSBOs either.
|
|
|
|
|
|
|
|
|
|
| |
Acked-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
Acked-by: Marek Olšák <[email protected]>
Acked-by: Jason Ekstrand <[email protected]>
Acked-by: Bas Nieuwenhuizen <[email protected]>
Acked-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We can use the same register spilling infrastructure for our loads/stores
of indirect access of temp variables, instead of doing an if ladder.
Cuts 50% of instructions and max-temps from 2 KSP shaders in shader-db.
Also causes several other KSP shaders with large bodies and large loop
counts to not be force-unrolled.
The change was originally motivated by NOLTIS slightly modifying register
pressure in piglit temp mat4 array read/write tests, triggering register
allocation failures.
|
|
|
|
|
| |
We were missing a * 4 even if the particular hardware matched our
assumption.
|
|
|
|
|
|
|
|
| |
While waiting for the CSD UABI to get reviewed, I keep having to rebase
the CS patch. Just land the compiler side for now to keep it from
diverging.
For now this covers just GLES 3.1 compute shaders, not CL kernels.
|
| |
|
|
|
|
|
|
|
| |
This required to calculate sizes correctly when we have bindless
samplers/images.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
PIPE_CAP_PACKED_UNIFORMS conflates several things: Lowering uniforms i/o
at the st level instead of the backend, packing uniforms with no padding
at all, and lowering to UBOs.
Requiring backends to lower uniforms i/o for !PIPE_CAP_PACKED_UNIFORMS
leads to the driver needing to either link against the type size function
in mesa/st, or duplicating it in the backend. Given that all backends
want this lower-io as far as I can tell, just move it to mesa/st to
resolve the link issue and avoid the driver author needing to understand
st's uniforms layout.
Incidentally, fixes uniform layout failures in nouveau in:
dEQP-GLES2.functional.shaders.struct.uniform.sampler_nested_fragment
dEQP-GLES2.functional.shaders.struct.uniform.sampler_nested_vertex
dEQP-GLES2.functional.shaders.struct.uniform.sampler_array_fragment
dEQP-GLES2.functional.shaders.struct.uniform.sampler_array_vertex
and I think in Lima as well.
v2: fix indents
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We have a pass to lower global registers to locals and many drivers
dutifully call it. However, no one ever creates a global register ever
so it's all dead code. It's time we bury it.
Acked-by: Karol Herbst <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
We'll need to do a render-based blit for scissors, since the TFU (as seen
in this conditional) can only update a whole surface.
Fixes: 976ea90bdca2 ("v3d: Add support for using the TFU to do some blits.")
Fixes piglit fbo-scissor-blit.
|
|
|
|
|
|
|
| |
4.1 and 4.2 both have the same 16k limit, but it I'm seeing GPU hangs in
the CTS at 8k and 16k. 4k at least lets us get one 4k display working.
Cc: [email protected]
|
|
|
|
|
|
| |
I have v3d allocating enough initial allocation memory that we've been
passing tests without it, but to match kernel behavior more it would be
good to actually exercise the OOM path.
|
|
|
|
|
|
|
| |
to indicate write usage per buffer.
This is just a hint (it will be used by radeonsi).
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The idea was that we could skip uploading the constant-indexed uniform
data and just upload the uniforms that are variably-indexed. However,
since the VS bin and render shaders may have a different set of uniforms
used, this meant that we had to upload the UBO for each of them. The
first case is generally a fairly small impact (usually the uniform array
is the most space, other than a couple of FSes in shader-db), while the
second is a larger impact: 3DMMES2 was uploading 38k/frame of uniforms
instead of 18k.
Given that the optimization is of dubious value, has a big downside, and
is quite a bit of code, just drop it. No change in shader-db. No change
on 3DMMES2 (n=15).
|
|
|
|
|
|
|
|
|
|
| |
We'd end up with the constant offset in the uniform stream anyway, since
they're bigger than small immediates. Avoids the extra uniforms and adds
in the shader in favor of just adding once on the CPU.
shader-db:
total instructions in shared programs: 6496865 -> 6494851 (-0.03%)
total uniforms in shared programs: 2119511 -> 2117243 (-0.11%)
|
|
|
|
|
|
| |
I want to reuse this for encoding small constant UBO/SSBO offsets into the
uniform stream to reduce the extra uniform loads and adds for the small
constant offsets.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The glMemoryBarrier() function makes shader memory stores ordered with
respect to things specified by the given bits. Until now, st/mesa has
ignored GL_TEXTURE_UPDATE_BARRIER_BIT and GL_BUFFER_UPDATE_BARRIER_BIT,
saying that drivers should implicitly perform the needed flushing.
This seems like a pretty big assumption to make. Instead, this commit
opts to translate them to new PIPE_BARRIER bits, and adjusts existing
drivers to continue ignoring them (preserving the current behavior).
The i965 driver performs actions on these memory barriers. Shader
memory stores go through a "data cache" which is separate from the
render cache and other read caches (like the texture cache). All
memory barriers need to flush the data cache (to ensure shader memory
stores are visible), and possibly invalidate read caches (to ensure
stale data is no longer visible). The driver implicitly flushes for
most caches, but not for data cache, since ARB_shader_image_load_store
introduced MemoryBarrier() precisely to order these explicitly.
I would like to follow i965's approach in iris, flushing the data cache
on any MemoryBarrier() call, so I need st/mesa to actually call the
pipe->memory_barrier() callback.
Fixes KHR-GL45.shader_image_load_store.advanced-sync-textureUpdate
and Piglit's spec/arb_shader_image_load_store/host-mem-barrier on
the iris driver.
Roland said this looks reasonable to him.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This allows DRI3 to pick between UIF and raster according to whether we're
pageflipping or not and whether the pageflipping display can do UIF,
avoiding copies for the windowed/composited case that previously was
forced to linear.
Improves windowed glmark2 -b build:use-vbo=false performance by 30.7783%
+/- 13.1719% (n=3)
|
|
|
|
|
|
|
| |
We ask the other side to make a buffer with the right number of pages, and
then just store the UIF in it. This avoids an extra silent copy of the
buffer from linear to UIF if it gets used for texturing (X11 copy-based
swapbuffers, GL compositors).
|
|
|
|
|
| |
The samplers are already ready for this, we just needed to make sure that
layout chose UIF for level 0.
|
|
|
|
|
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
| |
This makes v3d match vc4's destroy path.
Fixes: e113b21cb779 ("v3d: Add renderonly support.")
|
|
|
|
|
|
| |
This reduces the runtime of dEQP-GLES3.functional.shaders.precision.* from
11.5s to 3.3s. This brings CTS runs down to 4 hours on one of my target
devices.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With this patch, tgsi_to_nir will output NIR that is tailored to
the given pipe, by reading its capabilities and adjusting the NIR code
to those capabilities similarly to how glsl_to_nir works.
It also adds an optimization loop that brings the output NIR in line
with what glsl_to_nir outputs. This is necessary for the same reason
why glsl_to_nir has its own optimization loop: currently not every
driver does these optimizations yet.
For uses which cannot pass a pipe_screen we also keep a variant
called tgsi_to_nir_noscreen which keeps the old behavior.
Signed-Off-By: Timur Kristóf <[email protected]>
Tested-by: Andre Heider <[email protected]>
Tested-by: Rob Clark <[email protected]>
Acked-By: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
| |
v3d may be built as part of a set of drivers in a system not requiring
NEON, but we know V3D devices will be paired with CPUs with NEON so we
should be able to use this asm.
Fixes: 0c05198d6b5b ("v3d: Always enable the NEON utile load/store code.")
|
|
|
|
|
| |
It's unused in the VS (since we need vattr_sizes[] anyway), so move it to
FS prog data.
|
|
|
|
|
|
|
|
|
|
|
| |
Apparently we need disable-EZ flagged, not just "does Z writes".
Fixes
dEQP-GLES31.functional.image_load_store.early_fragment_tests.no_early_fragment_tests_depth_fbo
on 7278, even though it passed in simulation.
Signed-off-by: Eric Anholt <[email protected]>
Fixes: 051a41d3d56e ("v3d: Add support for the early_fragment_tests flag.")
|
|
|
|
|
|
| |
Fixes intermittent fails in
dEQP-GLES31.functional.draw_indirect.compute_interop.separate.drawelements_compute_cmd_and_data_and_indices
and others (particularly when run as part of a CTS run)
|
|
|
|
|
|
|
| |
Otherwise, we might have pages accessible that shouldn't be and miss out
on errors. This is unlikely for most tests since v3d_hw_get_mem() is big
enough that it'll be a freshly zeroed mmap, but if screens are destroyed
and recreated then we'd be reusing the old v3d_hw_get_mem() contents.
|
|
|
|
|
|
|
|
|
| |
There was an issue recently caused by the system header being included
by mistake, so let's just get rid of this include path and always
explicitly #include "drm-uapi/FOO.h"
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some NVIDIA hardware can accept 128 fragment shader input components,
but only have up to 124 varying-interpolated input components. We add a
new cap to express this cleanly. For most drivers, this will have the
same value as PIPE_SHADER_CAP_MAX_INPUTS for the fragment shader.
Fixes KHR-GL45.limits.max_fragment_input_components
Signed-off-by: Karol Herbst <[email protected]>
[imirkin: rebased, improved docs/commit message]
Signed-off-by: Ilia Mirkin <[email protected]>
Acked-by: Rob Clark <[email protected]>
Acked-by: Eric Anholt <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Cc: 19.0 <[email protected]>
|
|
|
|
|
|
|
| |
If you only bound rt 1+, we'd still emit a write to the rt0 that isn't
present (noticed while debugging an
ext_framebuffer_multisample-alpha-to-coverage-no-draw-buffer-zero
regression in another change).
|
|
|
|
| |
I was just leaving the other MRT targets than DATA0 out, by accident.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Ken's rework of mesa/st builtins to NIR means that we'll have more NIR
shaders with color output types that are mismatched with the render target
types. Since this is behavior that GLSL doesn't require, add it as a
shader_info option so the driver can know that it needs to ignore the FS
output's base type in favor of the actual render target's. This prevents
needing additional variants in several mesa/st paths (clear, pbo upload,
pbo download), given that the driver already has to handle the variants
for any TGSI being passed to it (from u_blitter, for example).
Reviewed-by: Rob Clark <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Reported by Coverity: in the case of unsupported modifier request, the
code does not jump to the “fail” label to destroy the acquired resource.
CID: 1435704
Signed-off-by: Ernestas Kulik <[email protected]>
Fixes: 45bb8f295710 ("broadcom: Add V3D 3.3 gallium driver called "vc5", for BCM7268.")
|
|
|
|
|
|
|
| |
I can't imagine the new HW block being paired with a v6 CPU, so don't
bother with the CPU detection that vc4 had to do.
Improves 1024x1024 TexImage on my 7278 by 47.3229% +/- 0.679632%
|
|
|
|
|
|
|
|
|
|
|
|
| |
The sampler border color is encoded in the TMU's blending format (half
floats, 32-bit floats, or integers) and must be clamped to the format's
range unorm/snorm/int ranges by the driver. Additionally, the TMU doesn't
know about how we're abusing the swizzle to support BGRA, A, and LA, so we
have to pre-swizzle the border color for those.
We don't really want to spend half a kb on sampler states in most cases,
so skip generating the variants when the border color is unused or is
0,0,0,0.
|
|
|
|
|
| |
Samplers are small (8-24 bytes), so allocating 4k for them is a huge
waste.
|
| |
|
|
|
|
|
|
|
|
| |
When the sampler view is in sample-stencil mode, we need to return uint
stencil values. To do that, fill in the format table to return R8I, and
have the sampler view point at the separate stencil buffer.
Fixes dEQP-GLES31.functional.stencil_texturing.format.depth32f_stencil8_2d
|
|
|
|
| |
We need to pick the 8-bit unorm value out, not the depth component.
|
| |
|