| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
| |
No functional change.
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On GFX9, the driver is able to do an optimized fast depth/stencil
clear with only one aspect (ie. clear the stencil part of a
depth/stencil image). When this happens, the driver should only
update the clear values of the given aspect.
Note that it's currently only supported on GFX9 but I have some
local patches that extend this optimized path for other gens.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1967
Cc: 19.2 <[email protected]>
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Sagar Ghuge <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
| |
On Gen12, we support mixed mode HF/F operands, and also 3 source
instruction supports immediate value support, so keep immediate as it
is, if it fits properly in 16 bit field.
Signed-off-by: Sagar Ghuge <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
On Gen >= 12, if src0 or src2 holds immediate value, we need set
src[0/2]_is_imm bits instead of register file.
Signed-off-by: Sagar Ghuge <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
On Gen >= 10, Either src0 or src2 can use 16-bit immediate value, but
not both.
Signed-off-by: Sagar Ghuge <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
It's been throwing the following error today:
"<Fault -32603: 'Internal Server Error (contact server administrator
for details): could not extend file "base/17952/18226": No space left
on device\nHINT: Check free disk space.\n'>"
Reviewed-by: Daniel Stone <[email protected]>
|
|
|
|
|
|
|
| |
Fixes: 7ecfbd4f6d4 ("nir: Add alpha_to_coverage lowering pass")
Signed-off-by: Sagar Ghuge <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
If you set LP_NUM_THREADS=0 compute shaders would hang,
just execute the workloads in sequence if we have no threads
in the pool.
Fixes: 1b24e3ba75 ("llvmpipe: add compute threadpool + mutex")
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This file is created in 2a0d45ae6cf09d60c048d7854e3d082bf15e374f but
addition to android makefiles was omitted. It breaks the build with
missing references which are defined in this file.
List the file in ir3_SOURCES to make the build succeed.
Signed-off-by: Marijn Suijten <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes some crashes with dEQP-VK.descriptor_indexing.* when
read_first_invocation has its source from a descriptor.
Most of these tests still fail because of an LLVM bug (they work
with ACO).
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Totals from affected shaders:
SGPRS: 13920 -> 13656 (-1.90 %)
VGPRS: 12972 -> 12960 (-0.09 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 1005680 -> 1000648 (-0.50 %) bytes
LDS: 91 -> 91 (0.00 %) blocks
Max Waves: 688 -> 688 (0.00 %)
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
Reviewed-by: Daniel Schürmann <[email protected]>
|
|
|
|
|
|
|
| |
v7: rename _nv50/_llvm to _fast/_precise
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Daniel Schürmann <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
v2: make variable names snake_case
v2: minor cleanups in emit_udiv()
v2: fix Panfrost build failure
v3: use an enum instead of a boolean flag in nir_lower_idiv()'s signature
v4: remove nir_op_urcp
v5: drop nv50 path
v5: rebase
v6: add back nv50 path
v6: add comment for nir_lower_idiv_path enum
v7: rename _nv50/_llvm to _fast/_precise
v8: fix etnaviv build failure
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Daniel Schürmann <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Remove emit_alpha_to_coverage workaround from backend compiler and start
using ported workaround from NIR.
v2: Copy comment from brw_fs_visitor (Caio Marcelo de Oliveira Filho)
Fixes piglit test on HSW:
- arb_sample_shading-builtin-gl-sample-mask-mrt-alpha-to-coverage-combinations
Signed-off-by: Sagar Ghuge <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Importing this pass from fs_visitor::emit_alpha_to_coverage_workaround()
in intel/compiler.
v2 (Caio Marcelo de Oliveira Filho):
- Track store output and sample mask instruction
- Nest math insturction for more readability
- Bail out early if no gl_SampleMask
v3: (Caio Marcelo de Oliveira Filho):
- Do math instructions after instruction block
- Restructure code
- Move pass under src/intel/compiler
v4: (Caio Marcelo de Oliveira Filho):
- Organize dither mask calculation
Signed-off-by: Sagar Ghuge <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
| |
in WQM
This fixes graphical corruption in SC2.
Reviewed-by: Rhys Perry <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
0.49.0 can compile most of mesa with ICC or ICL, but not SWR without
additional workarounds in our meson.build files. Bumping patch version
is easier and shouldn't be a big burden anyway, especially to cover a
niche compiler. The check originally only covered ICC, but now covers
ICL as well.
Fixes: 3740ffb59c89d8d879b1e0c1aed32c389dd82a35
("meson: add switches for SWR with MSVC")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1937
Acked-by: Eric Engestrom <[email protected]>
|
|
|
|
| |
Signed-off-by: Juan A. Suarez Romero <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Juan A. Suarez Romero <[email protected]>
(cherry picked from commit cc88eeb6ffc4e86d76dfdbfc601d519bc35b6c41)
|
|
|
|
|
| |
Signed-off-by: Juan A. Suarez Romero <[email protected]>
(cherry picked from commit 5c6d266c591208b1c27e06f61b814210fc6e095f)
|
|
|
|
|
|
|
|
|
| |
Due to a bug in GFX10 hardware, s_nop instructions must be added
if a branch is at 0x3f. We already do this, but forgot to also update
the constant addresses that come after this instruction.
Signed-off-by: Timur Kristóf <[email protected]>
Reviewed-by: Daniel Schürmann <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Timur Kristóf <[email protected]>
Reviewed-by: Daniel Schürmann <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently if you have an SMEM store followed by an SMEM load that
loads the same location as was written, it won't work because the
store isn't finished before the load is executed. This is NOT
mitigated by an s_nop instruction on GFX10.
Since we currently don't have proper alias analysis, this commit adds
a workaround which will insert an s_waitcnt lgkmcnt(0) before each
SSBO load if they follow a store. We should further refine this in
the future when we can make sure to only add the wait when we load the
same thing as has been stored.
Signed-off-by: Timur Kristóf <[email protected]>
Reviewed-by: Daniel Schürmann <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current implementation does not synchronize on BO readiness when
DISCARD_WHOLE_RES flag is set, which can lead to misbehaviours when the
resource being updated is being used by one of the pending or already
flushed batches.
Adding unconditional BO synchronization would do the trick, but we can
sometimes optimize this path by re-allocating a new BO instead of
waiting for the existing one to be ready.
Reported-by: Daniel Stone <[email protected]>
Reported-by: Heinrich Fink <[email protected]>
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Daniel Stone <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
| |
According to the OES_geometry_shader spec, section Dependencies:
"OpenGL ES 3.1 and OpenGL ES Shading Language 3.10
are required."
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We currently doesn't maintain it correctly and the buffer gets leaked if
surface is destroyed before calling swapping buffers.
From Android frameworks/native/libs/nativewindow/include/system/window.h:
The window holds a reference to the buffer between dequeueBuffer and
either queueBuffer or cancelBuffer, so clients only need their own
reference if they might use the buffer after queueing or canceling it.
v2: Remove our own reference.
Fixes: 0212db35040 ("egl/android: Cancel any outstanding ANativeBuffer in surface destructor")
Reviewed-by: Chia-I Wu <[email protected]> (v1)
Reviewed-By: Tapani Pälli <[email protected]>
Signed-off-by: Lepton Wu <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
| |
If a pipeline has both graphics and compute, descriptors are same.
While we are at it, use queue->device for simplicity.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
| |
To make sure a trace file is generated in case the driver crashes
during the hang report generation (which happens sometimes).
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
| |
This information has never been useful. All descriptors are
already dumped with colors etc, and it's more useful.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
| |
Disable 16-bit features because fp16 isn't exposed on these chips.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
| |
A bunch of blend tests fixed on T760. A single blend test regressed on
both T760/T860 but I am unable to reproduce locally so am just
documenting the regression and moving on.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We would like to eliminate not just entire dead instructions, but also
dead components, which increases scheduler flexibility (since some
vector instructions can become scalar after eliminating dead
components). This also will allow better RA in the future.
Results are meh.
total instructions in shared programs: 3453 -> 3451 (-0.06%)
instructions in affected programs: 60 -> 58 (-3.33%)
helped: 2
HURT: 0
total bundles in shared programs: 1826 -> 1824 (-0.11%)
bundles in affected programs: 33 -> 31 (-6.06%)
helped: 2
HURT: 0
total quadwords in shared programs: 3144 -> 3144 (0.00%)
quadwords in affected programs: 0 -> 0
helped: 0
HURT: 0
total registers in shared programs: 321 -> 321 (0.00%)
registers in affected programs: 45 -> 45 (0.00%)
helped: 11
HURT: 11
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 16.67% max: 50.00% x̄: 39.70% x̃: 50.00%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
95% mean confidence interval for registers value: -0.45 0.45
95% mean confidence interval for registers %-change: -1.87% 62.18%
Inconclusive result (value mean confidence interval includes 0).
total threads in shared programs: 445 -> 447 (0.45%)
threads in affected programs: 2 -> 4 (100.00%)
helped: 1
HURT: 0
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
| |
This allows for vec16 dependencies in the scheduler, not that we have
any yet (thankfully).
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
| |
The texture instruction has a mask we need to take into account.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
| |
Now that we have notion of byte masks, liveness tracking can be updated
to reflect this extra granularity without loss of correctness.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
| |
There are easy ways to iterate sources!
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Read component masks don't have a particular type associated, since the
type of the ALU operation may not match the type of the operands in
question. So let's generate byte masks instead, and update the rest of
the compiler to use byte masks when analyzing reads.
Preparation for mixed types.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are essentially two formats of masks in play beginning with this
commit: masks per-channel and masks per-byte. The former make sense
within a given fixed-size instruction; the latter are
typesize-independent. It turns out you need the latter to meaningfully
manipulate instructions containing multiple sizes (which is quite
possible with ALU operations).
Similarly, we have mir_srcsize. We calculate the size of the source by
analyzing the size of the instruction itself and stepping down if there
is a half-modifier.
Finally, we have mir_round_bytemask_down, for when we want to take a
byte mask and "round it down" to a given component size, so that we can
use it as a component mask.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
| |
..rather than open-coding.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This will allow us to encode properties about the load/store ops like we
do for ALU ops. We include now properties about whether we have a store,
and if there are special cases on the load/store op. We also tag each
instruction by its natural size... this is probably not totally right,
but it's a start.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
| |
This helper is used in a bunch of places ... might as well make that
common.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The trick is realizing even with a destination override, the masks are encoded in the same mode as the
instruction itself, rather than stepping down. The override means that
the smaller type is used, but the mask is parsed as if it were the
higher type. Overriding down is down by printed by blinding doing this. Overriding up can be thought of as printing in the upper size, but shifting the alphabet to use the upper half, i.e. shifting xyzw to become abcd.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
| |
They are symmetric to their 32-bit counterparts, just shifted.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
| |
Add some comments explaining what's going on in a more natural flow in
order to solve the actual bug.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Fixes: 2d914ebe818 ("pan/midgard: Fix memory corruption in register spilling")
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This allows a write to proceed to an uninitialized part of a buffer
even when the GPU is using the previously-initialized portions.
Such a situation can be triggered with the following API usage example:
glBufferSubData(..., offset, size, data1);
glDrawArrays(...);
// append new vertex data
glBufferSubData(..., offset+size, size, data2);
glDrawArrays(...);
Same is done for freedreno, nouveau and radeon.
Signed-off-by: Christian Gmeiner <[email protected]>
Reviewed-by: Lucas Stach <[email protected]>
|
|
|
|
|
|
|
| |
Store the changed usage in the newly created transfer object.
Signed-off-by: Christian Gmeiner <[email protected]>
Reviewed-by: Lucas Stach <[email protected]>
|
|
|
|
|
|
| |
Fixes: 1194afdfe35 ("etnaviv: rework the stream flush to always go through the context flush")
Signed-off-by: Christian Gmeiner <[email protected]>
Reviewed-by: Jonathan Marek <[email protected]>
|