| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Shader key size: 107 -> 47
Divisors of 0 and 1 are encoded in the shader key. Greater instance divisors
are loaded from a constant buffer.
The shader code doing the division is huge. Is it something we need to
worry about? Does any app use instance divisors >= 2?
VS prolog disassembly:
s_load_dwordx4 s[12:15], s[0:1], 0x80 ; C00A0300 00000080
s_nop 0 ; BF800000
s_waitcnt lgkmcnt(0) ; BF8C007F
s_buffer_load_dword s14, s[12:15], 0x4 ; C0220386 00000004
s_waitcnt lgkmcnt(0) ; BF8C007F
v_cvt_f32_u32_e32 v4, s14 ; 7E080C0E
v_rcp_iflag_f32_e32 v4, v4 ; 7E084704
v_mul_f32_e32 v4, 0x4f800000, v4 ; 0A0808FF 4F800000
v_cvt_u32_f32_e32 v4, v4 ; 7E080F04
v_mul_hi_u32 v5, v4, s14 ; D2860005 00001D04
v_mul_lo_i32 v6, v4, s14 ; D2850006 00001D04
v_cmp_eq_u32_e64 s[12:13], 0, v5 ; D0CA000C 00020A80
v_sub_i32_e32 v5, vcc, 0, v6 ; 340A0C80
v_cndmask_b32_e64 v5, v6, v5, s[12:13] ; D1000005 00320B06
v_mul_hi_u32 v5, v5, v4 ; D2860005 00020905
v_add_i32_e32 v6, vcc, v5, v4 ; 320C0905
v_subrev_i32_e32 v4, vcc, v5, v4 ; 36080905
v_cndmask_b32_e64 v4, v4, v6, s[12:13] ; D1000004 00320D04
v_mul_hi_u32 v5, v4, v1 ; D2860005 00020304
v_add_i32_e32 v4, vcc, s8, v0 ; 32080008
v_mul_lo_i32 v6, v5, s14 ; D2850006 00001D05
v_add_i32_e32 v7, vcc, 1, v5 ; 320E0A81
v_cmp_ge_u32_e64 s[12:13], v1, v6 ; D0CE000C 00020D01
v_sub_i32_e32 v6, vcc, v1, v6 ; 340C0D01
v_cmp_le_u32_e32 vcc, s14, v6 ; 7D960C0E
v_cndmask_b32_e64 v8, 0, -1, s[12:13] ; D1000008 00318280
v_cndmask_b32_e64 v6, 0, -1, vcc ; D1000006 01A98280
v_and_b32_e32 v6, v8, v6 ; 260C0D08
v_cmp_eq_u32_e32 vcc, 0, v6 ; 7D940C80
v_cndmask_b32_e32 v6, v7, v5, vcc ; 000C0B07
v_add_i32_e32 v5, vcc, -1, v5 ; 320A0AC1
v_cmp_eq_u32_e32 vcc, 0, v8 ; 7D941080
v_cndmask_b32_e32 v5, v6, v5, vcc ; 000A0B06
v_add_i32_e32 v5, vcc, s9, v5 ; 320A0A09
v2: set prefer_mono for fetched instance divisors
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
| |
sizeof(struct si_shader_key):
Before reverting the 2 commits: 120 bytes
After reverting the 2 commits: 128 bytes
With #pragma pack: 107 bytes
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
This reverts commit 7b2240ac9ce3ba9bd86f4ae8aac53af8878c0b10.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
| |
ff_tcs_inputs_to_copy"
This reverts commit 6b6fed3a3c81c2b0d319ef121df20a0dc914705f.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
It's enabled through message buffer for UVD
Signed-off-by: Leo Liu <[email protected]>
Acked-by: Christian König <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Per Jose's suggestion, this patch cleans up format_cap_table to remove
the unnecessary default cap value for vgpu10 formats since those devcap values
can be retrieved from the device.
Tested with MTT conform, glretrace, piglit in HWv13 and HWv8.
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The default devcap for format SVGA3D_Z_D24S8_INT in HWv8 when its devcap is
not explicitly advertised should be set to zero to match the default value
in the device.
Tested with MTT piglit in HW version 8.
Reviewed-by: Neha Bhende <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In cases where certain bind flags cannot be enabled together,
such as CONSTANT_BUFFER cannot be combined with any other flags,
a separate host surface will be created.
For example, if a stream output buffer is reused as a constant buffer,
two host surfaces will be created, one for stream output,
and another one for constant buffer. Data will be copied from the
stream output surface to the constant buffer surface.
Fixes piglit test ext_transform_feedback-immediate-reuse-index-buffer,
ext_transform_feedback-immediate-reuse-uniform-buffer
Tested with MTT piglit, MTT glretrace, Nature, NobelClinician Viewer, Tropics.
v2: Fix bind flags compatibility check as suggested by Brian.
v3: Use the list utility to maintain the buffer surface list.
v4: Use the SAFE rev of LIST_FOR_EACH_ENTRY
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently we unconditionally enable streamout bind flag at
buffer resource creation time. This is not necessary if the buffer
is never used as a streamout buffer. With this patch, we enable
streamout bind flag as indicated by the state tracker. If the buffer
is later bound to streamout and does not already has streamout bind
flag enabled, we will recreate the buffer with
the new set of bind flags. Buffer content will be copied
from the old buffer to the new one.
Tested with MTT piglit, Nature, Tropics, Lightsmark.
v2: Fix bind flags check as suggested by Brian.
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
| |
This is to prepare for more bind_flags optimization
in subsequent patches.
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
| |
This is to prepare for other bind_flags optimization
in subsequent patches.
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously the logic would decide that the record is kept, which
translates into keep = false in the caller, which meant that these
passes did not run.
While it's right that keep = false which means that a new record does
not need to be added, we do still have to perform the usual list
maintenance. It's easiest to do this pre-merge rather than post.
The lowering that clip/cull distance passes produce triggers this bug in
TCS (since reading outputs is done differently in other stages), but it
should be possible to achieve it with the right sequence of regular
reads/writes.
Fixes: KHR-GL45.cull_distance.functional
Fixes: generated_tests/spec/arb_tessellation_shader/execution/tes-input/tes-input-gl_ClipDistance.shader_test
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the fileIndex is different, that means they are in logically
different spaces. However if there's also a relative offset, then they
could end up pointing at the same spot again.
Also add a note about potential for multiple buffers to overlap even if
they're at different file indexes. However that's potentially lowered
away by the point that this logic hits.
Not known to fix any specific application or test.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
| |
This has no effect since in practice this will only play for
memory-backed files, for which VFETCH will never happen.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The idxbuf could linger, and when a clear happened, which also uses the
3d bufctx, we could get an error trying to access it.
This fixes spurious crashes/errors in CTS tests.
Fixes: 61d8f3387d ("nv50,nvc0: clear index buffer bufctx bin unconditionally")
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
|
| |
All the BuildUtil helpers just insert the operation into the current BB.
So we have to take care that any fetchSrc() operations happen before the
operation whose setIndirect() it goes into.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
Cc: [email protected]
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently a resource flush may trigger a self resolve, even if a scanout buffer
exists, but is up to date. If a scanout buffer exists we only ever want to
flush the resource to the scanout buffer. This fixes a performance regression.
Fixes: dda956340ce9 (etnaviv: resolve tile status when flushing resource)
Cc: [email protected]
Signed-off-by: Lucas Stach <[email protected]>
Reviewed-by: Philipp Zabel <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Based on a patch from Wladimir J. van der Laan and untested due
to lack of hardware. Binary blob emits those formats if GPU supports
HALTI1 (faked with ibvivhook).
Signed-off-by: Christian Gmeiner <[email protected]>
Reviewed-by: Wladimir J. van der Laan <[email protected]>
|
|
|
|
|
|
|
| |
Passes texwrap GL_ARB_texture_rg piglit (with faked full texture rg support).
Signed-off-by: Christian Gmeiner <[email protected]>
Reviewed-by: Wladimir J. van der Laan <[email protected]>
|
|
|
|
|
|
|
| |
Passes all ext_texture_swizzle piglits.
Signed-off-by: Christian Gmeiner <[email protected]>
Reviewed-By: Wladimir J. van der Laan <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Christian Gmeiner <[email protected]>
Reviewed-by: Wladimir J. van der Laan <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fix regression of "no rendering" on simple apps like glxgears by
setting an explicit full surface clear_rect when scissor is not
enabled.
This regressed with commit 00173d91 "st/mesa: don't set 16
scissors and 16 viewports if they're unused" due to an assumption
that a default scissor rect is always set, which was the case prior
to this optimization.
Reviewed-by: Tim Rowley <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Some combinations of c++ compilers and standard libraries had problems
with the string::replace code we were using previously.
This should fix the travis-ci system.
Tested-by: Eric Engestrom <[email protected]>
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The hardware doesn't support it, so we just interpolate all array elements
and then use indirect indexing on the resulting vector.
Clearly, this is not very efficient. There is an argument to be had for
adding if/else, or perhaps even pulling the data out of LDS directly.
Both don't really seem worth the effort, considering that it seems nobody
actually uses this feature.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
…and print error in such case. Which probably is not a rare event btw
because fopen doesn't expand ~ to $HOME.
Also get rid of unused "bool ret" variable.
Signed-off-by: Constantine Kharlamov <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=100785
v2: I was too much twiddling whether to initialize nsys_inputs at the beginning of shader initialization or for allocation of system values, and by the time I decided to go with the first one, I forgot to change it back.
Signed-off-by: Constantine Kharlamov <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Constantine Kharlamov <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
This adds support for the GDS operations needed to do atomic
counters.
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
| |
We don't want vtx/tex instructions ending up in GDS sections.
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
| |
This just makes sure we can see the index gpr in the asm dumps.
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
| |
On evergreen we can route vertex fetches via the texture cache,
and this is required for some images support. So add support
to the asm builder for it.
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
This was found during writing the images code, we need to
make sure we route the correct index register.
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
Fixes: b7d9677d ("nv50/ir: constant fold OP_SPLIT")
Cc: [email protected]
Signed-off-by: Pierre Moreau <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
| |
It's not needed. The hw doesn't fetch ahead over page boundaries.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If lp_setup_bind_framebuffer() is never called, then setup fb x1/y1 was not
correctly initialized. This can happen if there's never a fb set - both
cso and llvmpipe would consider setting this with no cbufs and no zsbuf a
redundant change and therefore it would never get set.
We rely on this setup fb rect being initialized correctly for the tri intersect
tests, throwing away tris which don't intersect. Not initializing it meant
we'd then say it intersected, and we'd try to bin that despite that we have
no actual tiles to bin it to, leading to assertion failures (pretty harmless
since tile 0/0 always exists nevertheless as tiles are statically allocated,
albeit that should change at some point).
(Note probably not an issue with gl state tracker)
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
| |
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
| |
Do KILL at the end of shaders so as not to break WQM.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100070
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We use the bounding box (triangle extents) to figure out if 32bit rasterization
could potentially overflow. However, we used the bounding box which already got
rounded up to 0 for negative coords for this, which is incorrect, leading to
overflows and hence bogus rendering in some of our private use.
It might be possible to simplify this somehow (we're now using 3 different
boxes for binning) but I don't quite see how.
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This is pretty useful for debugging rasterization issues, so turn it on
based on DEBUG (the actual existence of the fields is also conditionalized
on DEBUG, lines fill it out the same too).
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
| |
This reverts commit c9040dc9e75c81024f88f3f1bab821ad2bc73db3.
People have reported it causes corruption on VI, and I see GPU hangs
on GFX9.
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
If the call fails we need to flush the command buffer and retry. In this
case, we were failing to unbind the GS which led to subsequent errors.
This fixes a bug replaying a Cinebench R15 apitrace in a Linux guest.
VMware bug 1894451
cc: [email protected]
Reviewed-by: Charmaine Lee <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When surface_invalidate is called to invalidate a newly created surface
in svga_validate_surface_view(), it is possible that the command
buffer is already full, and in this case, currently, the associated wddm
winsys function will flush the command buffer and resend the invalidate
surface command. However, this can pre-maturely flush the command buffer
if there is still pending image updates to be patched.
To fix the problem, this patch will add a return status to the
surface_invalidate interface and if it returns FALSE, the caller will
call svga_context_flush() to do the proper context flush.
Note, we don't call svga_context_flush() if surface_invalidate()
fails when flushing the screen surface cache though, because it is
already in the process of context flush, all the image updates are already
patched, calling svga_context_flush() can trigger a deadlock.
So in this case, we call the winsys context flush interface directly
to flush the command buffer.
Fixes driver errors and graphics corruption running Tropics. VMware bug 1891975.
Also tested with MTT glretrace, piglit and various OpenGL apps such as
Heaven, CinebenchR15, NobelClinicianViewer, Lightsmark, GoogleEarth.
cc: [email protected]
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Consider the following RT attachment order:
1. Attach surfaces attachments 0 & 1, and render with them
2. Detach 0 & 1
3. Re-attach 0 & 1 to different surfaces
4. Render with the new attachment
The definition of a tile being resolved is that local changes have been
flushed out to the surface, hence there is no need to reload the tile before
it's written to. For an invalid tile, the tile has to be reloaded from
the surface before rendering.
Stage (2) was marking hot tiles for attachements 0 & 1 as RESOLVED,
which means that the hot tiles can be written out to memory with no
need to read them back in (they are "clean"). They need to be marked as
resolved here, because a surface may be destroyed after a detach, and we
don't want to have un-resolved tiles that may force a readback from a
NULL (destroyed) surface. (Part of a destroy is detach all attachments first)
Stage (3), during the no att -> att transition, we need to realize that the
"new" surface tiles need to be fetched fresh from the new surface, instead
of using the resolved tiles, that belong to a stale attachment.
This is done by marking the hot tiles as invalid in stage (3), when we realize
that a new attachment is being made, so that they are re-fetched during
rendering in stage (4).
Also note that hot tiles are indexed by attachment.
- Fixes VTK dual depth-peeling tests.
- No piglit changes
Reviewed-by: Tim Rowley <[email protected]>
|
|
|
|
|
|
| |
It seems to work now.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
| |
The closed Vulkan driver doesn't do it either.
Also remove some old comments that aren't useful.
Reviewed-by: Nicolai Hähnle <[email protected]>
|