| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
Fixes INSTR_INVALID_ENC in dEQP-GLES31.functional.compute.basic.empty
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3835>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Unlike other GPUs, Mali does not have dedicated shared memory for
compute workloads. Instead, we allocate shared memory (backed to RAM),
and the general memory access functions have modes to access shared
memory (essentially, think of these modes as adding this allocates base
+ workgroupid * stride in harder). So let's allocate enough memory
based on the shared_size parameter and supply it.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3835>
|
|
|
|
|
|
|
|
|
|
|
|
| |
(And bifrost_fb_extra to mali_framebuffer_extra, bifrost_render_target
to mali_render_target)
These structures are the norm on midgard t760+, drop the bifrost names,
it's silly... unrelated to the rest of the series but while I'm messing
with pandecode and cleaning up bifrost abstractions, might as well.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3835>
|
|
|
|
|
|
|
|
|
|
| |
It looks like these are the same structure, so this allows us to reuse
mali_shared_memory across architectures, and dispels with the
Bifrost-specific mystery of the scratchpads... nothing so mysterious
after all, just stack.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3835>
|
|
|
|
|
|
|
|
|
| |
This small structure is used to configure shared memory and stack for
compute shaders, and is also present at the beginning of framebuffer
descriptors. Let's factor it out.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3835>
|
|
|
|
|
|
|
| |
In theory the hardware doesn't care but it'll make for easier traces.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3835>
|
|
|
|
|
|
|
|
|
| |
It lines up anyway, and Gallium shouldn't change this. (And if it does,
we'll deal with that then since CI would start failing.)
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3824>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3824>
|
|
|
|
|
|
|
|
|
|
|
| |
This commit replaces panfrost_get_default_swizzle with an inlined
implementation where the returned values can be determined at compile
time.
According to perf, this previously used about 2% CPU for Openarena.
Reviewed-by: Alyssa Rosenzweig <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3824>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In order to make Z/S writes from fragment shaders effective, we need
to set the MALI_WRITES_{Z,S} flags when the shader has a
FRAG_RESULT_{DEPTH,STENCIL} output variable.
Now that shaders can change the S value, we can expose the
STENCIL_EXPORT cap.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3697>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We discovered 2 new shader flags used when a fragment shader updates
the depth/stencil value through a ZS writeout. If those flags are not
set, the depth/stencil value stored in the depth/stencil tilebuffer
remain unchanged.
While at it, rename unknown2 into flags_hi and rename flags into
flags_lo.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3697>
|
|
|
|
|
|
|
|
|
|
|
| |
Midgard has no dedicated samplers for Z24S8 and Z24X8 formats, and the
GPU expects the depth to be encoded in an IEEE 32-bit float. Turn all
Z24_UNORM variants into R32UI and let the shader do the conversion
using bfe+fmul instructions.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3697>
|
|
|
|
|
|
|
|
|
|
| |
Previously, we were using cubemap_stride and sometimes overwriting other
BOs such as shaders.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Signed-off-by: Tomeu Vizoso <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3692>
|
|
|
|
|
|
|
|
|
| |
Otherwise, we may be incrementing max_lod over the limit, causing a
DATA_INVALID_FAULT.
Signed-off-by: Tomeu Vizoso <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3692>
|
|
|
|
|
|
|
|
|
|
|
|
| |
When the rendering are is not covering the whole FBO, and the biggest
damage rect is empty, we can have damage.max{x,y} > damage.min{x,y},
which leads to invalid reload boxes.
Fixes: 65ae86b85422 ("panfrost: Add support for KHR_partial_update()")
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3676>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3676>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For non-debug builds, where assertions are compiled out, GCC complains
about the end of the non-void function panfrost_translate_channel_width
being reached.
Fixes: 226c1efe9a8 ("panfrost: Add more info to some assertions")
Reported-by: Piotr Oniszczuk
Suggested-by: Boris Brezillon <[email protected]>
Reviewed-by: Boris Brezillon <[email protected]>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3665>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3665>
|
|
|
|
|
|
|
|
|
|
| |
It pollutes the output of programs that use Panfrost and can confuse its
callers, such as test runners.
Signed-off-by: Tomeu Vizoso <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3625>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3625>
|
|
|
|
|
|
| |
Signed-off-by: Tomeu Vizoso <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3625>
|
|
|
|
|
|
|
|
|
| |
This fixes a crash when using Gallium HUD with QuakeSpasm when gamma
correction shaders (a QuakeSpasm feature, not part of Mesa) are used.
Reviewd-by: Alyssa Rosenzweig <[email protected]>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3549>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3549>
|
|
|
|
|
|
|
|
|
|
|
|
| |
../src/gallium/drivers/panfrost/pan_context.c: In function ‘panfrost_draw_vbo’:
../src/gallium/drivers/panfrost/pan_context.c:1551:70: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
ctx->payloads[PIPE_SHADER_FRAGMENT].prefix.indices = (u64) NULL;
^
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reported-by: Icecream95 <[email protected]>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3543>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3543>
|
|
|
|
|
|
| |
Reviewed-by: Alyssa Rosenzweig <[email protected]>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3525>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3525>
|
|
|
|
|
|
|
|
|
|
| |
It doesn't seem to affect any results and it's not at all clear if/why
the blob sometimes(?) sets it? So let's clean this up since this
solution isn't correct anyway.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3513>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3513>
|
|
|
|
|
|
|
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Fixes: d8a3501f1b2 ("panfrost: Dynamically allocate shader variants")
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3515>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3515>
|
|
|
|
|
|
| |
Acked-by: Daniel Stone <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3414>
|
|
|
|
|
|
| |
Acked-by: Daniel Stone <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3414>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There's a lot going on here (it's a ton of commits squashed together
since otherwise this would be impossible to review...)
1. We have a fast path for linear->tiled for whole (aligned) tiles, but we
have to use a slow path for unaligned accesses. We can get a pretty
major win for partial updates by using this slow path simply on the
borders of the update region, and then hit the fast path for the
tile-aligned interior. This does require some shuffling.
2. Mark the LUTs constant, which allows the compiler to inline them,
which pairs well with loop unrolling (eliminating the memory accesses
and just becoming some immediates.. which are not as immediate on
aarch64 as I'd like..)
3. Add fast path for bpp1/2/8/16. These use the same algorithm and we
have native types for them, so may as well get the fast path.
4. Drop generic path for bpp != 1/2/8/16, since these formats are
generally awful and there's no way to tile them efficienctly and
honestly there's not a good reason too either. Lima doesn't support any
of these formats; Panfrost can make the opinionated choice to make them
linear.
5. Specialize the unaligned routines. They don't have to be fully
generic, they just can't assume alignment. So now they should be nearly
as fast as the aligned versions (which get some extra tricks to be even
faster but the difference might be neglible on some workloads).
6. Specialize also for the size of the tile, to allow 4x4 tiling as well
as 16x16 tiling. This allows compressed textures to be efficiently tiled
with the same routines (so we add support for tiling ASTC/ETC textures
while we're at it)
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Vasily Khoruzhick <[email protected]>
Tested-by: Vasily Khoruzhick <[email protected]> #lima on Mali400
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3414>
|
|
|
|
|
|
|
|
|
|
|
|
| |
There's an implicit dependence on Gallium here that will add more
complexity than needed when testing/optimizing out of driver as well as
potentially Vulkanizing. We don't need a full pipe_box, just the x/y/w/h
properties directly.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Vasily Khoruzhick <[email protected]>
Tested-by: Vasily Khoruzhick <[email protected]> #lima on Mali400
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3414>
|
|
|
|
|
|
|
|
|
| |
This fixes a crash in LZDoom where over 16 shader variants are needed
for a few shaders in some maps, and should also save a few kilobytes
of RAM as most of the time only one or two variants of the 8 previously
allocated are actually needed.
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
| |
These features are stable enough that they don't need to be hidden.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3464>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3464>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, the array bo_access->readers was only cleared when there
were no unsignaled fences, which in some situations never happened.
That resulted in the array having thousands of NULL pointers, but only
a handful of active readers.
With this patch, all the unsignaled readers are moved to the front of
the array, effectively building a new array only containing the active
readers in-place. This results in the readers array usually only having
a couple of elements.
Reviewed-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3419>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3419>
|
|
|
|
|
| |
Signed-off-by: Robert Foss <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As pointed out by Boris, what we were calling PAN_LINEAR depth textures
was in fact u-interleaved tiled (!), but we never noticed since we
flipped the flag used for sampling, leading to all sorts of fun bugs
when attempting to directly acess depth textures from the CPU. Which
begs the question -- if what we called LINEAR was tiled, how do we
actually render linear depth textures? It turns out the flags for AFBC
form a mali_block_format 2-bit code just like their render-target
counterparts, so we can render to any of the above.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reported-by: Boris Brezillon <[email protected]>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3393>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3393>
|
|
|
|
|
|
|
|
|
|
| |
The per-batch headers/gpu_headers dynarrays need to be freed during the
batch cleanup to prevent leaking.
Signed-off-by: Daniel Ogorchock <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3308>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3308>
|
|
|
|
|
|
|
|
|
| |
The bo access needs to be freed prior to removing it from its hash
table. This prevents leaking them over time.
Signed-off-by: Daniel Ogorchock <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3308>
|
|
|
|
| |
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
The hardware separates face selection and array indexing, it looks like,
whereas Gallium smushes them together with some modulus fun. Let's fix
it so mipmapped 2D arrays work without regressing cubemaps.
Fixes dEQP-GLES3.functional.texture.filtering.2d_array.* among others.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Tomeu Vizoso <[email protected]>
|
|
|
|
|
|
|
|
| |
Fixes dEQP-GLES3.functional.ubo.multi_basic_types.single_buffer.* among
others
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Tomeu Vizoso <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Same format code as UINT... might be different in how it's fed into a
shader but we'll deal with that when we get there.
Fixes dEQP-GLES3.functional.vertex_arrays.single_attribute.output_types.usigned_int2_10_10_10.components4_vec2_quads1
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Tomeu Vizoso <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Fixes dEQP-GLES3.functional.state_query.integers.max_samples_getinteger64
We'll have to actually implement multisampling next, but hey.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Tomeu Vizoso <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Make it a lot more obvious what we're doing and fix more than a few
corner cases in the process.
Fixes
dEQP-GLES3.functional.buffer.map.write.render_as_index_array.pixel*, and
likely others.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Tomeu Vizoso <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We use the lowering in nir_format_convert. There are native ops for this
so this is far from optimal and not remotely efficient but as with most
blend shader things right now, it's hard enough to get it working, so
let's focus on that for now. We'll make it fast later (once we have
GLES3 stable, we can start optimizing these things).
Fixes dEQP-GLES3.functional.fragment_ops.blend.fbo_srgb.*
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Tomeu Vizoso <[email protected]>
|
|
|
|
|
|
|
| |
Fixes abort in STK's shadow implementation.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Tomeu Vizoso <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Z32F_S8 becomes Z32F in texturing, which in turn just becomes R32F.
Fixes dEQP-GLES3.functional.texture.format.sized.*.depth32f_stencil8*
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Tomeu Vizoso <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Blend shader size and location in memory is considerably constrained,
probably to facilitate optimizations (my guess is that blend shaders are
run strictly out of i-cache). We need to pack the blend shaders for each
RT of a single framebuffer together. The easiest way to do this is at
draw time which is not terribly efficient but will hold us over for now.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
| |
We don't handle this format yet, but we will soon, and the abort in
pan_pack_color is possible even without exposing the format... Handling
this gracefully might not be required by the spec but let's not crash.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
| |
It's needed by u_transfer_helper to know when the depth+stencil buffer
has been split.
Signed-off-by: Tomeu Vizoso <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
| |
As that's what Gallium expects in transfer.layer_stride.
Signed-off-by: Tomeu Vizoso <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
| |
With 3D textures we can have lots of layers, so better allocate it
dynamically at runtime.
Signed-off-by: Tomeu Vizoso <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
| |
We have native support for this somehow. Fixes the mesa demo `points`
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Since we have a separate blend shader for each render target, let's
simplify this structure and reduce the options memory footprint by 88%
or something goofy like that.
Should also enable separate blending per render target.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We need to actually work out the varying format on demand, rather than
assuming rgba32f.
Fixes dEQP-GLES3.functional.fragment_out.basic.int.*
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|