| Commit message | Author | Age | Files | Lines |
This will be needed for SSBOs and image_load_store.
Because of the bug, rgb565 framebuffers used the argb1555 format.
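To illustrate why that mix-up corrupts colors, here is a minimal C sketch of how the two 16-bit layouts pack the same color differently. `pack_rgb565()` and `pack_argb1555()` are hypothetical helpers, not the driver's format tables; a buffer written in one layout and read in the other decodes its bits into the wrong channels.

```c
#include <assert.h>
#include <stdint.h>

/* RGB565: red in bits 15-11, green in bits 10-5, blue in bits 4-0. */
static uint16_t pack_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* ARGB1555: alpha in bit 15, red in bits 14-10, green in bits 9-5,
 * blue in bits 4-0. */
static uint16_t pack_argb1555(uint8_t a, uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((a >> 7) << 15) | ((r >> 3) << 10) |
                      ((g >> 3) << 5) | (b >> 3));
}
```

Pure green, for example, packs to 0x07E0 in RGB565 but to 0x03E0 (alpha clear) in ARGB1555.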
Fixes: 433ca3127a3b94bfe9a513e7c7ce594e09e1359f
Reviewed-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
Make sure that the next line starts with spaces so that bullets are
maintained throughout, add `` around a few more special tokens, and fix
SAMPLE_COUNT_TEXTURE -> SAMPLE_COUNT.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
The last round of fixing 3d layer+level layout skipped the tiled case,
since tiled texture support was not in place yet. This finishes the
job.
Signed-off-by: Rob Clark <[email protected]>
This is known when the CSO is created, so no need to patch it in later.
Also, for smaller textures where the first level is small enough to be
linear, it seems like we should set the linear tile mode.
See: dEQP-GLES3.functional.texture.format.unsized.rgb_unsigned_byte_3d_pot
Signed-off-by: Rob Clark <[email protected]>
Previously we'd use format/etc from the primary (z32) buffer for the
stencil (s8), due to confusion about rsc vs psurf. Rework this to drop the
extra arg and push down handling of the separate-stencil case (and make sure
we take the fmt from the right place).
This doesn't completely fix separate-stencil, but at least it avoids the
GPU scribbling over random other cmdstream buffers and causing a bunch
of bogus fails in dEQP.
Signed-off-by: Rob Clark <[email protected]>
Signed-off-by: Guido Günther <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
This ensures that during encoding, applications can get
the correct status of the surface before submitting
more operations on the same surface.
Reviewed-by: Leo Liu <[email protected]>
Signed-off-by: Indrajit Das <[email protected]>
This reverts commit f6a6da8131383d8eeee07cd59326a70f4b15866b.
With this commit we see massive amounts of asserts triggering
in lp_fence_wait(), assert(f->issued), for instance with libgl_xlib
state tracker and piglit. Not entirely sure if the assert could
just be removed.
We have found some pipe_surface leaks internally.
This is the same code as surface_destroy in radeonsi.
Ideally, surface_destroy would be in pipe_screen.
Cc: 18.3 <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
With Mesa 18.1, commit be973ed21f6e, si_llvm_load_input_vs()
changed the number of source 32-bit wide dword components
used for fetching vertex attributes into the vertex shader
from a constant 4 to a variable num_channels number, depending
on input data format, with some special case handling for
input data formats like 64-bit doubles.
In the case of a GL_DOUBLE input data format with one
or two components though, e.g., submitted via ...
a) glTexCoordPointer(1, GL_DOUBLE, 0, buffer);
b) glTexCoordPointer(2, GL_DOUBLE, 0, buffer);
... the input format would be SI_FIX_FETCH_RG_64_FLOAT,
but no special case handling was implemented for that
case, so in the default path the number of 32-bit
dwords would be set to the number of float input components
derived from info->input_usage_mask. This ends with corrupted
input to the vertex shader, because fetching a 64-bit double
from the vbo requires fetching two 32-bit dwords instead of 1,
and fetching a two-component double input requires 4 dword fetches
instead of 2, so in these cases the vertex shader receives
incomplete/truncated input data:
a) float v = gl_MultiTexCoord0.x; -> v.x is corrupted.
b) vec2 v = gl_MultiTexCoord0.xy; -> v.x is assigned
correctly, but v.y is corrupted.
This happens with the standard TGSI IR compiled shaders.
Under NIR with R600_DEBUG=nir, we got correct behavior
because the current radeonsi nir code always assigns
info->input_usage_mask = TGSI_WRITEMASK_XYZW and therefore
always fetches 4 dwords regardless of what the shader
actually needs.
Fix this by properly assigning 2 or 4 dword fetches for
one or two component GL_DOUBLE input.
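The dword arithmetic described above can be sketched as follows. `dwords_to_fetch()` is a hypothetical helper for illustration, not the actual si_llvm_load_input_vs() logic: each 64-bit double spans two 32-bit dwords, so one- and two-component GL_DOUBLE inputs need 2 and 4 dwords respectively.

```c
#include <assert.h>

/* Hypothetical helper: how many 32-bit dwords must be fetched for a vertex
 * attribute with `num_components` components of `component_bytes` each. */
static unsigned dwords_to_fetch(unsigned num_components, unsigned component_bytes)
{
    /* Only 32-bit (float) and 64-bit (double) components are considered. */
    assert(component_bytes == 4 || component_bytes == 8);
    /* A 64-bit double spans two 32-bit dwords. */
    return num_components * (component_bytes / 4);
}
```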
Fixes: be973ed21f6e ("radeonsi: load the right number of
components for VS inputs and TBOs")
Signed-off-by: Mario Kleiner <[email protected]>
Cc: [email protected]
Cc: Marek Olšák <[email protected]>
Signed-off-by: Marek Olšák <[email protected]>
Fixes artifacts in World of Warcraft when Multi-sample Alpha-Test is
enabled with DXVK.
It also fixes artifacts with Fallout 4's god rays with DXVK.
Various piglit interpolateAt*() tests under NIR are also fixed.
v2: formatting fix
update commit message to include Fallout 4 and the Fixes tag
Fixes: f4e499ec791 ('radv: add initial non-conformant radv vulkan driver')
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106595
Signed-off-by: Rhys Perry <[email protected]>
If there is no last fence, due to no rendering happening yet, just
create a new signaled fence and return it, to match the expectations of
the EGL sync fence API.
Fixes random "Could not create sync fence 0x3003" assertion failures from
Skia on Android, coming from the following code:
https://android.googlesource.com/platform/frameworks/base/+/master/libs/hwui/pipeline/skia/SkiaOpenGLPipeline.cpp#427
Reproducible especially with thread count >= 4.
One could make the driver always keep the reference to the last fence,
but:
- the driver seems to explicitly destroy the fence whenever a rendering
pass completes and changing that would require a significant functional
change to the code. (Specifically, in lp_scene_end_rasterization().)
- it still wouldn't solve the problem of an EGL sync fence being created
and waited on without any rendering happening at all, which is
also likely to happen with Android code pointed to in the commit.
Therefore, the simple approach of always creating a fence is taken,
similarly to other drivers, such as radeonsi.
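A minimal sketch of that approach, using a hypothetical `struct sketch_fence` rather than llvmpipe's `struct lp_fence`: when there is no last fence, the flush path returns a freshly created fence that is already signalled, so an EGL sync object always has something valid to wait on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical fence object for illustration; not llvmpipe's struct lp_fence. */
struct sketch_fence {
    bool signalled;
};

static struct sketch_fence *sketch_fence_create(bool signalled)
{
    struct sketch_fence *f = calloc(1, sizeof(*f));
    if (f)
        f->signalled = signalled;
    return f;
}

/* Return the fence for the most recent rendering if there is one. With no
 * rendering pending there is nothing to wait for, so hand back a new fence
 * that is already signalled instead of NULL. */
static struct sketch_fence *sketch_flush_fence(struct sketch_fence *last)
{
    if (last)
        return last;
    return sketch_fence_create(true);
}
```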
Tested with piglit llvmpipe suite with no regressions and following
tests fixed:
egl_khr_fence_sync
  conformance
    eglclientwaitsynckhr_flag_sync_flush
    eglclientwaitsynckhr_nonzero_timeout
    eglclientwaitsynckhr_zero_timeout
    eglcreatesynckhr_default_attributes
    eglgetsyncattribkhr_invalid_attrib
    eglgetsyncattribkhr_sync_status
v2:
- remove the useless lp_fence_reference() dance (Nicolai),
- explain why creating the dummy fence is the right approach.
Signed-off-by: Tomasz Figa <[email protected]>
This is part of GLES 3.1, and with the NIR lowering we're now passing the
GLES31 testcases.
This way they can be shared. Build tested with meson, but I'm not too sure
about the autotools side, though.
Reviewed-by: Dylan Baker <[email protected]>
Acked-by: Rob Clark <[email protected]>
The naming is a bit confusing no matter how you look at it. Within SPIR-V,
"global" memory is memory accessible from all threads. GLSL "global" memory
normally refers to shader-thread-private memory declared at global scope. As
we already use "shared" for memory shared across all threads of a work group,
the solution everybody could be happy with is to rename "global" to
"private" and later use "global" for memory usually stored in
system-accessible memory (be it VRAM or system RAM if keeping SVM in mind).
GLSL "local" memory is memory only accessible within a function, while SPIR-V
"local" memory is memory accessible within the same workgroup.
v2: rename local to function as well
v3: rename vtn_variable_mode_local as well
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
This has never functioned and probably won't ever function, due to the
way the gallium media state trackers and the tegra video decoder are
architected.
Cc: Thierry Reding <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Fixes: 1755f608f5201e0a23f00cc3ea1b01edd07eb6ef
("tegra: Initial support")
The version exported by LLVM in its CMake configuration files can
include the “svn” suffix when building a development version (for
example “8.0.0svn”). However the exported clang headers are still found
under “lib/clang/8.0.0/”, without the “svn” suffix.
Meson takes care of removing the “svn” suffix from the version when
using the dependency’s `version()` method.
This processing is already performed in “configure.ac” when using
autotools.
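For illustration only, the suffix handling described above amounts to the string operation below, shown as a C sketch with a hypothetical `strip_svn_suffix()` helper. On the Meson side, as noted, the dependency's `version()` method already performs this processing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy `version` into `out`, dropping a trailing "svn"
 * suffix so that "8.0.0svn" becomes "8.0.0" (the directory name used under
 * lib/clang/). Output is truncated to fit `out_size`. */
static void strip_svn_suffix(const char *version, char *out, size_t out_size)
{
    static const char suffix[] = "svn";
    const size_t suffix_len = sizeof(suffix) - 1;
    size_t len = strlen(version);

    assert(out_size > 0);
    if (len >= suffix_len && strcmp(version + len - suffix_len, suffix) == 0)
        len -= suffix_len;
    if (len >= out_size)
        len = out_size - 1;
    memcpy(out, version, len);
    out[len] = '\0';
}
```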
Signed-off-by: Pierre Moreau <[email protected]>
Reviewed-by: Dylan Baker <[email protected]>
This stores the raster state and calls the correct primconvert interface
using the currently bound raster state.
Reviewed-By: Gert Wollny <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
For now, it's hidden behind a cap. Hopefully, we can eventually drop
that along with all the manual offset code in spirv_to_nir.
Reviewed-by: Alejandro Piñeiro <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Tested-by: Bas Nieuwenhuizen <[email protected]>
SPIR-V allows for matrix and array types to be decorated with explicit
byte stride decorations and matrix types to be decorated row- or
column-major. This commit adds support to glsl_type to encode this
information. Because this doesn't work nicely with std430 and std140
alignments, we add asserts to ensure that we don't use any of the std430
or std140 layout functions with explicitly laid out types.
In SPIR-V, the layout information for matrices is applied to the parent
struct member instead of to the matrix type itself. However, this
gets rather clumsy when you're walking derefs trying to compute offsets
because, the moment you hit a matrix, you have to crawl back the deref
chain and find the struct. Instead, we take the same path here as we've
taken in spirv_to_nir and put the decorations on the matrix type itself.
This also subtly adds support for strided vector types. These don't
come up in SPIR-V directly but you can get one as the result of taking a
column from a row-major matrix or a row from a column-major matrix.
Reviewed-by: Alejandro Piñeiro <[email protected]>
Previously, NIR had a single nir_var_uniform mode used for atomic
counters, UBOs, samplers, images, and normal uniforms. This commit
splits this into nir_var_uniform and nir_var_ubo where nir_var_uniform
is still a bit of a catch-all but the nir_var_ubo is specific to UBOs.
While we're at it, we also rename shader_storage to ssbo to follow the
convention.
We need this so that we can distinguish between normal uniforms and UBO
access at the deref level without going all the way back to the variable
and checking whether it has an interface type.
Reviewed-by: Alejandro Piñeiro <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Some of the status variables in the compiler are only used in asserts
and thus may be unused in release builds. Annotate them accordingly
to avoid 'unused but set' warnings from the compiler.
Signed-off-by: Lucas Stach <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
Take into account the render target format when checking if the color
mask affects all channels of the RT. This allows enabling full
overwrite in a few cases where a non-alpha format is used.
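A minimal sketch of the check, with hypothetical channel bits rather than the driver's own definitions: the color mask only has to cover the channels the render target format actually stores, so an RGB write mask already counts as a full overwrite of an RGB (no alpha) target.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-channel bits. */
enum {
    SKETCH_CH_R = 1u << 0,
    SKETCH_CH_G = 1u << 1,
    SKETCH_CH_B = 1u << 2,
    SKETCH_CH_A = 1u << 3,
};

/* True when every channel present in the format is enabled in the mask.
 * Channels the format lacks (e.g. alpha in RGB565) are irrelevant. */
static bool sketch_colormask_covers(unsigned colormask, unsigned format_channels)
{
    return (colormask & format_channels) == format_channels;
}
```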
Signed-off-by: Lucas Stach <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
I noticed that a VS I was debugging was missing all of its output stores
-- outputs_written was for POS, VAR0, VAR3, while the shader's variables
were POS, VAR9, and VAR12. I'm not sure what outputs_written is supposed
to be doing here, but we can just walk the declared variables and avoid
both this bug and the emission of extra stvpms for less-than-vec4
varyings.
Fixes: 174f53 ("virgl: consolidate transfer code")
Reviewed-by: Erik Faye-Lund <[email protected]>
Otherwise, the gl-1.0-long-dlist Piglit test crashes.
Fixes: db7757 ("virgl: modify how we handle GL_MAP_FLUSH_EXPLICIT_BIT")
Reported by airlied@
v2: Exit on any invalid range (Erik)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109190
Reviewed-by: Dave Airlie <[email protected]>
Reviewed-by: Erik Faye-Lund <[email protected]>
Tested-by: Jakob Bornecrantz <[email protected]>
No functional change as the socket name is the same,
just removing the double definition of the path.
Reviewed-by: Gurchetan Singh <[email protected]>
Signed-off-by: Jakob Bornecrantz <[email protected]>
A 2d-array texture (for example) should get the # of array elements
from box->depth, rather than depth0 which is minified.
Fixes dEQP-GLES3.functional.shaders.texture_functions.texture.sampler2darray_bias_float_fragment
with tiled textures.
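A minimal sketch of the distinction, assuming a minify rule that halves a dimension per mip level and clamps at 1 (as gallium's `u_minify()` does). `layers_at_level()` is a hypothetical helper: only a 3D texture's depth shrinks with the level, while an array texture keeps every layer, so a layer count taken from a minified depth is too small.

```c
#include <assert.h>
#include <stdbool.h>

/* Halve a dimension per mip level, never going below 1. */
static unsigned minify(unsigned value, unsigned level)
{
    assert(level < 32);
    unsigned v = value >> level;
    return v ? v : 1;
}

/* Hypothetical helper: depth slices (3D) or array layers (array textures)
 * present at a given mip level. */
static unsigned layers_at_level(bool is_3d, unsigned depth_or_layers, unsigned level)
{
    return is_3d ? minify(depth_or_layers, level) : depth_or_layers;
}
```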
Reported-by: Kristian H. Kristensen <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
If we hit the memcpy() path for copy_region(), that will try to do a
transfer_map(), which goes badly for blits to/from staging triggered
by transfer_map() or transfer_unmap().
We could possibly add fd_blit2() which has allow_transfer_map param,
and call that for staging blits. But I'm not really sure if trying
the blit via copy_region() is very useful. At least for newer gens
that implement fd_context::blit(), it probably isn't.
Signed-off-by: Rob Clark <[email protected]>
Switch over to using fd_context::blit(), in the same way that a5xx does.
The previous patch wires fd_resource_copy_region() up to the blitter so
a6xx no longer needs to bypass the core layer to accelerate this.
Signed-off-by: Rob Clark <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
First step to unify the way fd5 and fd6 blitter works. Currently a6xx
bypasses the blit API in order to also accelerate resource_copy_region().
But this approach can lead to infinite recursion:
#0 fd_alloc_staging (ctx=0x5555936480, rsc=0x7fac485f90, level=0, box=0x7fbab29220) at ../src/gallium/drivers/freedreno/freedreno_resource.c:291
#1 0x0000007fbdebed04 in fd_resource_transfer_map (pctx=0x5555936480, prsc=0x7fac485f90, level=0, usage=258, box=0x7fbab29220, pptrans=0x7fbab29240) at ../src/gallium/drivers/freedreno/freedreno_resource.c:479
#2 0x0000007fbe5c5068 in u_transfer_helper_transfer_map (pctx=0x5555936480, prsc=0x7fac485f90, level=0, usage=258, box=0x7fbab29220, pptrans=0x7fbab29240) at ../src/gallium/auxiliary/util/u_transfer_helper.c:243
#3 0x0000007fbde2dcb8 in util_resource_copy_region (pipe=0x5555936480, dst=0x7fac485f90, dst_level=0, dst_x=0, dst_y=0, dst_z=0, src=0x7fac47c780, src_level=0, src_box_in=0x7fbab2945c) at ../src/gallium/auxiliary/util/u_surface.c:350
#4 0x0000007fbdf2282c in fd_resource_copy_region (pctx=0x5555936480, dst=0x7fac485f90, dst_level=0, dstx=0, dsty=0, dstz=0, src=0x7fac47c780, src_level=0, src_box=0x7fbab2945c) at ../src/gallium/drivers/freedreno/freedreno_blitter.c:173
#5 0x0000007fbdf085d4 in fd6_resource_copy_region (pctx=0x5555936480, dst=0x7fac485f90, dst_level=0, dstx=0, dsty=0, dstz=0, src=0x7fac47c780, src_level=0, src_box=0x7fbab2945c) at ../src/gallium/drivers/freedreno/a6xx/fd6_blitter.c:587
#6 0x0000007fbde2f3d0 in util_try_blit_via_copy_region (ctx=0x5555936480, blit=0x7fbab29430) at ../src/gallium/auxiliary/util/u_surface.c:864
#7 0x0000007fbdec02c4 in fd_blit (pctx=0x5555936480, blit_info=0x7fbab29588) at ../src/gallium/drivers/freedreno/freedreno_resource.c:993
#8 0x0000007fbdf08408 in fd6_blit (pctx=0x5555936480, info=0x7fbab29588) at ../src/gallium/drivers/freedreno/a6xx/fd6_blitter.c:546
#9 0x0000007fbdebdc74 in do_blit (ctx=0x5555936480, blit=0x7fbab29588, fallback=false) at ../src/gallium/drivers/freedreno/freedreno_resource.c:129
#10 0x0000007fbdebe58c in fd_blit_from_staging (ctx=0x5555936480, trans=0x7fac47b7e8) at ../src/gallium/drivers/freedreno/freedreno_resource.c:326
#11 0x0000007fbdebea38 in fd_resource_transfer_unmap (pctx=0x5555936480, ptrans=0x7fac47b7e8) at ../src/gallium/drivers/freedreno/freedreno_resource.c:416
#12 0x0000007fbe5c5c68 in u_transfer_helper_transfer_unmap (pctx=0x5555936480, ptrans=0x7fac47b7e8) at ../src/gallium/auxiliary/util/u_transfer_helper.c:516
#13 0x0000007fbde2de24 in util_resource_copy_region (pipe=0x5555936480, dst=0x7fac485f90, dst_level=0, dst_x=0, dst_y=0, dst_z=0, src=0x7fac47b8e0, src_level=0, src_box_in=0x7fbab2997c) at ../src/gallium/auxiliary/util/u_surface.c:376
#14 0x0000007fbdf2282c in fd_resource_copy_region (pctx=0x5555936480, dst=0x7fac485f90, dst_level=0, dstx=0, dsty=0, dstz=0, src=0x7fac47b8e0, src_level=0, src_box=0x7fbab2997c) at ../src/gallium/drivers/freedreno/freedreno_blitter.c:173
#15 0x0000007fbdf085d4 in fd6_resource_copy_region (pctx=0x5555936480, dst=0x7fac485f90, dst_level=0, dstx=0, dsty=0, dstz=0, src=0x7fac47b8e0, src_level=0, src_box=0x7fbab2997c) at ../src/gallium/drivers/freedreno/a6xx/fd6_blitter.c:587
...
Instead rework the API to push the fallback back to core code, so that
we can rework resource_copy_region() to have its own fallback path,
and then finally convert fd6 over to work in the same way.
This also makes ctx->blit() optional, and cleans up some unnecessary
callers.
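The general shape of pushing the fallback into core code can be sketched as follows. Everything here (`struct sketch_ctx`, `sketch_core_blit()`, the example hook) is hypothetical and only mirrors the idea described above: the optional driver hook reports whether it handled the blit, and the core layer, not the driver, takes the fallback, so the driver hook needs no internal fallback that could recurse back into the core path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical blit description: records whether hardware can handle it. */
struct sketch_blit_info {
    bool hw_supported;
};

struct sketch_ctx {
    /* Optional driver hook; returns true only if it performed the blit. */
    bool (*blit)(struct sketch_ctx *ctx, const struct sketch_blit_info *info);
    unsigned sw_fallbacks; /* how many times the core fallback ran */
};

/* Example driver hook: accelerate what the hardware supports and report
 * failure otherwise, instead of falling back internally. */
static bool example_hw_blit(struct sketch_ctx *ctx, const struct sketch_blit_info *info)
{
    (void)ctx;
    return info->hw_supported;
}

/* Stand-in for the core software fallback path. */
static void sketch_sw_blit(struct sketch_ctx *ctx, const struct sketch_blit_info *info)
{
    (void)info;
    ctx->sw_fallbacks++;
}

/* Core entry point: try the optional hook first, fall back here otherwise. */
static void sketch_core_blit(struct sketch_ctx *ctx, const struct sketch_blit_info *info)
{
    assert(ctx != NULL && info != NULL);
    if (ctx->blit && ctx->blit(ctx, info))
        return;
    sketch_sw_blit(ctx, info);
}
```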
Signed-off-by: Rob Clark <[email protected]>
For multi-pass rendering, it is common to keep the same depth buffer
from previous pass, to discard geometry that would be hidden by later
draws. In the later passes with depth-test enabled, but depth-write
disabled, there is no reason to do gmem2mem resolve.
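A minimal sketch of the decision, assuming hypothetical per-draw state and ignoring clears (a depth clear would also require a resolve). Depth values are only written when the depth test is enabled, so a batch in which no draw had both the test and writes enabled leaves the depth buffer unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-draw depth state recorded while building a batch. */
struct sketch_draw_state {
    bool depth_test;  /* depth test enabled */
    bool depth_write; /* depth writes enabled */
};

/* The depth buffer only has to be resolved (written back from GMEM to
 * system memory) if some draw could have modified it. */
static bool sketch_need_depth_resolve(const struct sketch_draw_state *draws, size_t count)
{
    assert(draws != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (draws[i].depth_test && draws[i].depth_write)
            return true;
    }
    return false;
}
```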
TODO: probably do something similar for stencil, although the stencil
buffer isn't used as commonly these days.
Signed-off-by: Rob Clark <[email protected]>
Before, I had per-stage entrypoints with some helpers shared between them.
As I extended for compute shaders and shader-db, it turned out that the
other common code in the middle wanted to be shared too.
Looking at some assembly dumps for an optimization, we were clearly
missing important parts of the shader!
We'll still fail at draw time, but this avoids a regression in shader-db
execution once I enable TLB writes in precompiles.
Fixes: b38e4d313fc2 ("v3d: Create a state uploader for packing our shaders together.")
Team Fortress 2 32-bit version runs out of the CPU address space.
Tested-by: Dieter Nützel <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
It seems to be the same, but this doesn't use integer division with
a variable divisor.
Tested-by: Dieter Nützel <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
This will help the new opt introduced in the following patches
allowing us to remove extra duplicate varyings.
Tested-by: Dieter Nützel <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
The previous code used a do-while loop and a continue after walking
a nested loop/if-statement. This means we end up evaluating the
last instruction from the nested block against the while condition
and potentially exit early if it matches the exit condition of the
outer block.
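A common way to skip over nested blocks without re-testing their last instruction against the outer block's exit condition is to track nesting depth explicitly. The C sketch below uses a hypothetical opcode enum loosely modelled on TGSI control flow; it is not the tgsi_scan code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes, loosely modelled on TGSI control flow. */
enum sketch_op { SKETCH_OTHER, SKETCH_IF, SKETCH_ENDIF, SKETCH_BGNLOOP, SKETCH_ENDLOOP };

/* Given that ops[start] opens a loop, return the index of its matching
 * SKETCH_ENDLOOP, or `count` if the stream is unbalanced. Every nested
 * IF/ENDIF and BGNLOOP/ENDLOOP pair moves the depth up and back down, so the
 * closing instruction of an inner block never ends the scan early. */
static size_t sketch_find_endloop(const enum sketch_op *ops, size_t count, size_t start)
{
    assert(start < count && ops[start] == SKETCH_BGNLOOP);
    unsigned depth = 0;
    for (size_t i = start; i < count; i++) {
        switch (ops[i]) {
        case SKETCH_IF:
        case SKETCH_BGNLOOP:
            depth++;
            break;
        case SKETCH_ENDIF:
        case SKETCH_ENDLOOP:
            if (--depth == 0)
                return i;
            break;
        default:
            break;
        }
    }
    return count;
}
```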
Fixes: 386d165d8d09 ("tgsi/scan: add a new pass that analyzes tess factor writes")
Reviewed-by: Marek Olšák <[email protected]>
This just happened not to crash/assert because all loops have at
least 1 if-statement and due to a second bug we end up matching
the same ENDIF to exit both the iteration over the if-statement
and the loop.
The second bug is fixed in the following patch.
Fixes: 386d165d8d09 ("tgsi/scan: add a new pass that analyzes tess factor writes")
Reviewed-by: Marek Olšák <[email protected]>
There's no way to tell the 3D engine about swizzling on such textures.
While rendering to NPOT ones may be possible, there's no great way to
expose that in gallium, nor would there be any practical benefit.
Fixes the non-compressed-format "copyteximage 3D" failures. Something
odd is going on with the compressed formats.
Signed-off-by: Ilia Mirkin <[email protected]>
s3tc layouts are a bit finicky - they're packed, but not swizzled.
Adjust logic to allow for that case:
- Don't set a uniform pitch for POT-sized compressed textures
- Adjust define_rect API to be less confused about block sizes
- Only mark a texture as linear if it has a uniform pitch set
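As a rough illustration of the block-size bookkeeping involved (a hypothetical helper, not the nv30 define_rect code): S3TC stores 4x4 texel blocks of 8 bytes (DXT1) or 16 bytes (DXT3/DXT5), so a packed row of blocks is sized from the width rounded up to whole blocks.

```c
#include <assert.h>

/* Hypothetical helper: bytes per row of 4x4 blocks for an S3TC level of the
 * given width. block_bytes is 8 for DXT1 and 16 for DXT3/DXT5. */
static unsigned sketch_s3tc_row_pitch(unsigned width, unsigned block_bytes)
{
    assert(block_bytes == 8 || block_bytes == 16);
    unsigned blocks_x = (width + 3) / 4; /* partial blocks round up */
    return blocks_x * block_bytes;
}
```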
This has been tested to fix xonotic (as well as the s3tc-* piglits)
on nv3x and keeps it working on nv4x.
Signed-off-by: Ilia Mirkin <[email protected]>