| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
| |
Fixes:
dEQP-GLES31.functional.stencil_texturing.format.depth24_stencil8_2d
dEQP-GLES31.functional.stencil_texturing.format.stencil_index8_2d
dEQP-GLES31.functional.stencil_texturing.misc.compare_mode_effect
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The stencil is actually in the .w component, but we used to use SWAP to
remap the channels. This doesn't work when tiled/ubwc.
Fixes:
dEQP-GLES31.functional.stencil_texturing.format.depth24_stencil8_2d_array
dEQP-GLES31.functional.stencil_texturing.format.depth24_stencil8_cube
dEQP-GLES31.functional.stencil_texturing.format.stencil_index8_2d_array
dEQP-GLES31.functional.stencil_texturing.format.stencil_index8_cube
dEQP-GLES31.functional.stencil_texturing.misc.base_level
dEQP-GLES31.functional.texture.border_clamp.formats.stencil_index8.nearest_size_pot
dEQP-GLES31.functional.texture.border_clamp.formats.stencil_index8.nearest_size_npot
dEQP-GLES31.functional.texture.border_clamp.formats.depth24_stencil8_sample_stencil.nearest_size_pot
dEQP-GLES31.functional.texture.border_clamp.formats.depth24_stencil8_sample_stencil.nearest_size_npot
dEQP-GLES31.functional.texture.border_clamp.sampler.uint_stencil
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
Now that we can blit depth/stencil in a way that plays nicely with UBWC,
re-enable it.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Now that it can turn these blits into rendering to RB6_Z24_UNORM_S8_UINT
it can properly handle cases where only one of depth+stencil is being
blit. And this avoids lying about he format, which completely doesn't
work when UBWC is used.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
For re-written z/s blits, we want to use the re-written `pipe_blit_info`
even if we have to fallback to 3d pipe (`u_blitter`). So handle that
fallback ourself.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The name 'separate' doesn't make a while lot of sense, as only one of
the cases is the blit actually split. But split out from previous patch
in an attempt to reduce the noise.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
| |
This will get even simpler with the next patch
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
The src/dst format is overriden from the pipe_blit_info, so this just
logic just serves to confuse the reader.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Since we could only need a subset of the layers, and otherwise we
trigger an assert in util_max_layer()
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This is more expressive and simplifies a subsequent change.
v2:
- fix one more call-site after rebase
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Tested-by: Jeffrey Hugo <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
Flip the FD_MESA_DEBUG flag to a disable rather than enable, drop the
obsolete comment (and bonus, drop unused softpin debug flag)
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is slightly annoying because it *mostly* works.. but we have some
issues to sort out about how to blit z24s8/x24s8/z24x8 with UBWC before
we can enable UBWC by default. For now it is a step forward to at least
enable it for non-z/s while we figure out how to blit z24s8+UBWC.
(The basic issue is that pretending z24s8 is an equivalently sized rgba
format for the purpose of blitting falls apart when UBWC is in the
picture.)
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
| |
No functional change, the registers have the same layout as MRT flags
pitch reg.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
An older blob claims to support UBWC w/ r32ui an r32i, but not r32f.
Results from deqp indicate that it doesn't work with r32ui and r32i.
This *could* also just mean that use as "IBO" (image) is more limited
than as texture, although blob also doesn't seem to bother to try to use
UBWC with images at all, so hard to know for sure.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We'll need this for a few edge cases, like image/sampler view that uses
a format that UBWC does not support with a resource originally created
in a format that UBWC does support.
NOTE we *could* in some cases do an in-place uncompress. But that has
a couple potential sharp edges:
1) the uncompressed buffer could have different layout, ie. a5xx
with meta and pixel data of layers/levels interleaved.
2) if it comes mid-batch, it would force flush, or somehow fixing
up cmdstream for draws already emitted. But with the resource
shadowing approach we can rely on batch re-ordering to avoid
splitting things.. older draws see the older compressed version,
newer draws see the new uncompressed version of the rsc.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
| |
When uncompressing a UBWC buffer, we don't want to discard anything.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
It doesn't come up yet, as so far we only hit this path with linear
buffers. But it will when we start re-using the shadow path for
uncompressing UBWC buffers.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
| |
To uncompress UBWC, I want to re-use the shadow path, but we'll need a
way to request that the new buffer is not compressed.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The number of elements to draw should not be affected by the offset.
A similar fix was submitted for a6xx at 79180a05.
Fixes these dEQP tests on a5xx:
dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawelements_separate_grid_500x500_drawcount_8
dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawelements_separate_grid_500x500_drawcount_2500
dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawarrays_separate_grid_500x500_drawcount_2500
dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawarrays_combined_grid_500x500_drawcount_2500
dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawelements_combined_grid_500x500_drawcount_8
dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawelements_combined_grid_500x500_drawcount_2500
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Split out a separate program config state group to run early before the
other groups.
This seems to help w/ intermittent "missed tiles" (although I had
assumed that was a mem2gmem issue), or at least I can't reproduce that
issue with this patch, but can without.
It has the benefit of HLSQ_VS_CNTL.CONSTLEN matching for VS and BS.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With the newer (v1.76) fw, we were getting hangs (compared to older
v1.66 fw). Re-work the GMEM code to structure things a bit closer to
the blob. This moves some PKT7 packets from IB2 to IB1, which I think
is what was confusing SQE and causing it to get stuck in an infinite
loop. But in general structuring things at least closer to the same way
blob does makes it easier to compare cmdstream.
Note: this is a bit on the large side for what I'd normally consider for
stable.. but right now it is looking like it is the newer fw that is
headed for linux-firmware. This should defn have some soak time on
master, but probably a good idea for this patch to end up in distro mesa
builds by the time a630_sqe.fw hits linux-firmware.
Cc: [email protected]
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This seems to be in a block of non buffered/context regs. Blob always
WFIs before write, so probably a good idea.
Annoyingly, compared to ealier gens, it is a bit harder to tell from the
register offset whether it is a buffered reg, it isn't as simple as
everything below 0x2000, it seems.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In some cases the draw for the text wasn't working. This seems to be
fixed by resyncing some of the "golded registers" from blob (initial
values were based on somewhat older blob version).
Perhaps good to have a bit of soak time on master, but would be good
to eventually land in 19.x stable branches.
Cc: [email protected]
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This now boils down to just picking between binning or vertex shader
and dummy_fs or real fs, which we can do in a couple of lines of code
instead. The constlen logic isn't doing what it thinks it's doing,
both constlens at this point
MAX2(s[VS].constlen, align(state->bs->constlen, 4));
are binning shader constlens. We'll have to revisit the constlen
logic, but this commit doesn't change how it works.
Reviewed-by: Rob Clark <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
a6xx only supports indirect shaders.
Reviewed-by: Rob Clark <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
| |
We have a similar function in fd6_program.c. Move to fd6_emit.h and
share.
Reviewed-by: Rob Clark <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
| |
There's already a bit of duplicated logic here and tessellation will
add more. Build up dword 0 in fd6_draw_vbo() and drop the a4xx in the
process.
Reviewed-by: Rob Clark <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
In preparation for refactoring fd6_draw.c a bit.
Reviewed-by: Rob Clark <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We originally had a single lower_fmod option. In commit 2ab2d2e5, Sam
split 32 and 64-bit lowering into separate flags, with the rationale
that some drivers might want different options there. This left 16-bit
unhandled, so Iago added a lower_fmod16 option in commit ca31df6f.
Now that lower_fmod64 is gone (in favor of nir_lower_doubles and
nir_lower_dmod), we re-combine lower_fmod16 and lower_fmod32 into a
single lower_fmod flag again. I'm not aware of any hardware which
need lowering for one bitsize and not the other.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
| |
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
We do support TF now, so it's no longer valid. Besides, if we want this
optimization, we should probably have mesa/st doing it right for everyone.
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Some new flag setting disallows it due to being a security risk.
Fixes: c9c1e261064 "mesa: prevent common string formatting security issues"
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
| |
v2: defining 0x100 and use this for setting the FS_OUTPUT_REG.HALF_PRECISION
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously the A5XX_SP_FS_OUTPUT_REG_HALF_PRECISION was set depending
on whether half_precision was set in the shader key. With support for
mediump precision, it is possible to have different outputs use
different precisions. That means we can’t have a global shader state
to specify it. Instead it now tries to copy the half-float-ness
from the nir_variable for the output into the ir3_shader_variant. This
is then used to decide whether to set half-precision for each output.
The a6xx version is copied from the a5xx code but it has not been
tested.
v2. [Hyunjun Ko ([email protected])] There's the half flag recently
added, which represents precision based on IR3_REG_HALF. Now use this
flag to avoid duplication.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Removes the bool_to_float logic from the int_to_float pass, so that both
can be used separately. By having separate passes we have better validation
and it makes it possible to use with the lower_ftrunc option (int lowering
generates ftrunc, but lower_ftrunc generates bools, ftrunc lowering should
probably be reworked). For now we always expect lower_bool to come after
lower_int.
Also fixes f2i32 to become ftrunc and adds u2f/f2u cases.
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Add a "lower_bitshift" option, which disables optimizations introducing
bitshifts and lowers ishl by constant to a multiply, so that we don't have
to deal with bitshifts in int_to_float lowering.
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We could have identical texture state for both VS and FS.. which would
result in VS state getting created first, and FS state mapping to the
identical cmdstream. Resulting in VS state getting emitted twice and no
FS state emitted.
Fixes:
dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.basic_array.sampler2D_both
dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.struct_in_array.sampler2D_samplerCube_both
dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.array_in_struct.sampler2D_samplerCube_both
dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.nested_structs_arrays.sampler2D_samplerCube_both
dEQP-GLES2.functional.uniform_api.value.assigned.by_value.render.nested_structs_arrays.sampler2D_samplerCube_both
dEQP-GLES31.functional.program_uniform.by_pointer.render.array_in_struct.sampler2D_samplerCube_both
dEQP-GLES31.functional.program_uniform.by_pointer.render.nested_structs_arrays.sampler2D_samplerCube_both
dEQP-GLES31.functional.program_uniform.by_value.render.nested_structs_arrays.sampler2D_samplerCube_both
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
Fixes dEQP-GLES2.functional.multisampled_render_to_texture.readpixels
Signed-off-by: Rob Clark <[email protected]>
Acked-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
spirv_to_nir() returned the nir_function corresponding to the
entrypoint, as a way to identify it. There's now a bool is_entrypoint
in nir_function and also a helper function to get the entry_point from
a nir_shader.
The return type reflects better what the function name suggests. It
also helps drivers avoid the mistake of reusing internal shader
references after running NIR_PASS on it. When using NIR_TEST_CLONE or
NIR_TEST_SERIALIZE, those would be invalidated right in the first pass
executed.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The difference between imov and fmov has been a constant source of
confusion in NIR for years. No one really knows why we have two or when
to use one vs. the other. The real reason is that they do different
things in the presence of source and destination modifiers. However,
without modifiers (which many back-ends don't have), they are identical.
Now that we've reworked nir_lower_to_source_mods to leave one abs/neg
instruction in place rather than replacing them with imov or fmov
instructions, we don't need two different instructions at all anymore.
Reviewed-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Vasily Khoruzhick <[email protected]>
Acked-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
TGSI's FBFETCH instruction currently only supports reading from a single
render target, but NIR intrinsics can support multiple render targets.
radeonsi can only support fetching from RT 0, but other drivers may be
able to support fetching from any render target.
To express this, this patch renames PIPE_CAP_TGSI_FS_FBFETCH to simply
PIPE_CAP_FBFETCH, and converts it from a boolean "is FBFETCH supported?"
to an integer number of render targets which can be fetched.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This "fixes" hangs seen w/ various android games. I think a similar
issue to with constant state, we need to avoid CP_LOAD_STATE until
previous draw completes.
It isn't entirely clear why blob doesn't need to do this, but it might
have a different way to accomplish the same thing.
Signed-off-by: Rob Clark <[email protected]>
|