| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To make PIPE_FORMATs usable from non-gallium parts of Mesa, I want to
move their helpers out of gallium. Since u_format used
util_copy_rect(), I moved that in there, too.
I've put it in a separate directory in util/ because it's a big chunk
of related code, and it's not clear to me whether we might want it as
a separate library from libmesa_util at some point.
Closes: #1905
Acked-by: Marek Olšák <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This should remove most of the excess code size that was
introduced by making all booleans per-lane.
Signed-off-by: Timur Kristóf <[email protected]>
Reviewed-by: Rhys Perry <[email protected]>
Reviewed-by: Daniel Schürmann <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, instruction selection had two kinds of booleans:
1. divergent which was per-lane and stored in s2 (VCC size)
2. uniform which was stored in s1
Additionally, uniform booleans were made per-lane when they resulted
from operations which were supported only by the VALU.
To decide which type was used, we relied on the destination size,
which was not reliable due to the per-lane uniform bools, but it
mostly works on wave64.
However, in wave32 mode (where VCC is also s1) this approach
makes it impossible keep track of which boolean is uniform and
which is divergent.
This commit makes all booleans per-lane.
The resulting excess code size will be taken care of by the optimizer.
v2 (by Daniel Schürmann):
- Better names for some functions
- Use s_andn2_b64 with exec for nir_op_inot
- Simplify code due to using s_and_b64 in bool_to_scalar_condition
v3 (by Timur Kristóf):
- Fix several subgroups regressions
Signed-off-by: Timur Kristóf <[email protected]>
Reviewed-by: Rhys Perry <[email protected]>
Reviewed-by: Daniel Schürmann <[email protected]>
|
|
|
|
|
| |
Reviewed-By: Timur Kristóf <[email protected]>
Reviewed-by: Rhys Perry <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
ACO's optimizer would try to propagate 64-bit constants, but
does so in such a way that wouldn't work due to how the 64-bit
constants are handled in the IR.
Signed-off-by: Timur Kristóf <[email protected]>
Reviewed-by: Rhys Perry <[email protected]>
Reviewed-by: Daniel Schürmann <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch tries to give instructions with the same execution
mask also the same pass_flags and enables VN for SALU instructions
using exec as Operand.
This patch also adds back VN for VOPC instructions and removes VN for phis.
v2 (by Timur Kristóf):
- Fix some regressions.
v3 (by Daniel Schürmann):
- Fix additional issues
Reviewed-By: Timur Kristóf <[email protected]>
Reviewed-by: Rhys Perry <[email protected]>
|
|
|
|
|
|
|
| |
needs
Reviewed-By: Timur Kristóf <[email protected]>
Reviewed-by: Rhys Perry <[email protected]>
|
|
|
|
|
|
|
| |
Build is broken since "Move CodeGenFileType enum to Support/CodeGen.h".
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Pierre-Eric Pelloux-Prayer <[email protected]>
|
|
|
|
|
|
|
| |
This reverts commit 66d24a9ef705b8f9f15dab8059b63781f9fb28ca.
This commit made Mesa CI red because commit depends on a Piglit test
change.
|
|
|
|
|
|
|
|
|
| |
Using a hash-table walk means that variables will get inserted in
different orders on different runs. Just walk the list of globals
instead, even if some of them can't be turned into locals.
Reviewed-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit "1c2bf82d24a glsl: disable lower_fragdata_array() for NIR drivers"
disabled the GLSL IR lowering that turned gl_FragData from an array into a
collection of scalar outputs under the assumption that this was already being
handled properly elsewhere, however there are some corner cases where NIR
would fail to do this, leaving gl_FragData[] as an array variable. This can
break backends that assume that all their outputs will be scalar and use the
variable definitions from the shader to do their output setup, such as the
case of V3D.
At least one corner case was found in some Portal shaders from shader-db, where
NIR would optimize out the full body of a fragment shader. In this scenario,
the empty shader would keep the original array definition of gl_FragData[],
causing the backend to assert.
We need to do this late enough for it to be effective, since doing it in
st_nir_preprocess does not fix the original problem.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2091
Fixes: 1c2bf82d ("glsl: disable lower_fragdata_array() for NIR drivers")
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2090
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
buffers"
This reverts commit 1d1b4578211dcc69cfab8879d0cdafaba1eec948.
This series caused unexpected flickering artifacts with Iris driver on
Chrome OS and EGL_EXT_image_flush_external spec has not been published
yet.
Acked-by: Eric Engestrom <[email protected]>
Acked-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 1d122c104a7a3d9348ab347e1e843b7e2bf3b498.
This series caused unexpected flickering artifacts with Iris driver on
Chrome OS and EGL_EXT_image_flush_external spec has not been published
yet.
Acked-by: Eric Engestrom <[email protected]>
Acked-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 34b1aa957a3f44ea9587ec43311e8434d3782cc1.
This series caused unexpected flickering artifacts with Iris driver on
Chrome OS and EGL_EXT_image_flush_external spec has not been published
yet.
Acked-by: Eric Engestrom <[email protected]>
Acked-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit c1c574fdf18f2aeb1c03f9670bf00e1dcd22d99d.
This series caused unexpected flickering artifacts with Iris driver on
Chrome OS and EGL_EXT_image_flush_external spec has not been published
yet.
Acked-by: Eric Engestrom <[email protected]>
Acked-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
total instructions in shared programs: 3562 -> 3457 (-2.95%)
instructions in affected programs: 575 -> 470 (-18.26%)
helped: 16
HURT: 0
helped stats (abs) min: 1 max: 14 x̄: 6.56 x̃: 10
helped stats (rel) min: 5.71% max: 24.56% x̄: 16.83% x̃: 18.87%
95% mean confidence interval for instructions value: -9.07 -4.06
95% mean confidence interval for instructions %-change: -19.00% -14.66%
Instructions are helped.
total bundles in shared programs: 1846 -> 1830 (-0.87%)
bundles in affected programs: 338 -> 322 (-4.73%)
helped: 16
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 2.50% max: 20.00% x̄: 8.85% x̃: 3.33%
95% mean confidence interval for bundles value: -1.00 -1.00
95% mean confidence interval for bundles %-change: -13.02% -4.67%
Bundles are helped.
total quadwords in shared programs: 3191 -> 3144 (-1.47%)
quadwords in affected programs: 606 -> 559 (-7.76%)
helped: 16
HURT: 0
helped stats (abs) min: 1 max: 14 x̄: 2.94 x̃: 3
helped stats (rel) min: 5.17% max: 22.22% x̄: 11.20% x̃: 5.62%
95% mean confidence interval for quadwords value: -4.58 -1.29
95% mean confidence interval for quadwords %-change: -15.16% -7.24%
Quadwords are helped.
total registers in shared programs: 312 -> 303 (-2.88%)
registers in affected programs: 27 -> 18 (-33.33%)
helped: 9
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 33.33% max: 33.33% x̄: 33.33% x̃: 33.33%
95% mean confidence interval for registers value: -1.00 -1.00
95% mean confidence interval for registers %-change: -33.33% -33.33%
Registers are helped.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
total instructions in shared programs: 3457 -> 3431 (-0.75%)
instructions in affected programs: 787 -> 761 (-3.30%)
helped: 14
HURT: 0
helped stats (abs) min: 1 max: 12 x̄: 1.86 x̃: 1
helped stats (rel) min: 1.01% max: 11.11% x̄: 9.22% x̃: 11.11%
95% mean confidence interval for instructions value: -3.55 -0.16
95% mean confidence interval for instructions %-change: -11.41% -7.03%
Instructions are helped.
total bundles in shared programs: 1830 -> 1826 (-0.22%)
bundles in affected programs: 279 -> 275 (-1.43%)
helped: 2
HURT: 0
total quadwords in shared programs: 3144 -> 3121 (-0.73%)
quadwords in affected programs: 645 -> 622 (-3.57%)
helped: 13
HURT: 0
helped stats (abs) min: 1 max: 11 x̄: 1.77 x̃: 1
helped stats (rel) min: 2.09% max: 16.67% x̄: 12.61% x̃: 14.29%
95% mean confidence interval for quadwords value: -3.45 -0.09
95% mean confidence interval for quadwords %-change: -15.43% -9.79%
Quadwords are helped.
total registers in shared programs: 303 -> 301 (-0.66%)
registers in affected programs: 14 -> 12 (-14.29%)
helped: 2
HURT: 0
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Not much of a difference but slightly better and slightly less
arbitrary.
total instructions in shared programs: 3560 -> 3559 (-0.03%)
instructions in affected programs: 44 -> 43 (-2.27%)
helped: 1
HURT: 0
total bundles in shared programs: 1844 -> 1843 (-0.05%)
bundles in affected programs: 23 -> 22 (-4.35%)
helped: 1
HURT: 0
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On ICL we have the src1 restriction which is applied through
fix_byte_src() and potentially changes the type of the operands from 8
to 32 bits. When this change happens, we fall into the "else if
(bit_size < 32)" case and miscompute src_type because it takes into
consideration bit_size (8) instead of the adjusted size of temp_op
(32). This results in the shader reading unused memory, giving us
mostly failures, but occasional passes due to whatever was already in
the registers we were reading.
This commit fixes a lot of dEQP subgroup i8vec2 tests on ICL, such as:
dEQP-VK.subgroups.arithmetic.compute.subgroupadd_i8vec2
This can also be verified by simply changing fix_byte_src() to apply
on all platforms.
Fixes: 5847de6e9afe ("intel/compiler: don't use byte operands for src1 on ICL")
Reviewed-by: Ivan Briano <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Signed-off-by: Paulo Zanoni <[email protected]>
|
|
|
|
|
| |
Fixes: 9e440b8d0b9 ("spirv: Sort out the mess that is sampled image")
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This was causing uninitialized value to end up propagated to the
3DSTATE_DEPTH_BOUNDS packet, leading to asserts on packet
building due to the value being greater than 1.
Fixes: 939ddccb7a5 ("anv: Add support for depth bounds testing.")
Reviewed-by: Plamena Manolova <[email protected]>
|
|
|
|
|
|
| |
It's now unused, in favour of LCRA.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
| |
Pretty routine, we do have a hack to force swizzle alignment for !32-bit
for until we implement !32-bit the right way.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is less complicated than previously thought. Note we have no way of
specifying the work register count for blend shaders; it must be
strictly less than the work register count of the corresponding fragment
shader (which is fine since we force the fragment shader to report a
count of 16 with a blend shader as a major hack until we get register
pressure down for blend shaders).
TODO: pandecode the flags.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Close input end of the pipe after data was written. Without this
fix I have seen a hang in sysfs_uevent_get(.., "OF_FULLNAME")
when key was not found.
Signed-off-by: Christian Gmeiner <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
Fixes: 9020f519 ("util/u_endian: Add error checks")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2078
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
| |
This code is kinda stand-alone, and it makes it a bit easier to find the
right source in the source-tree.
|
|
|
|
|
| |
This code is kinda stand-alone, and it makes it a bit easier to find the
right source in the source-tree
|
|
|
|
| |
This will help code-reuse a bit in the next commit.
|
|
|
|
|
| |
This code is more or less stand-alone, and this keeps the formats array
a bit more encapsulated.
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We can clear the "needs" flags once we emit a flag. And also, don't
open-code the opcode name.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For pre-fs-dispatch texture fetch, we need to assign bary_ij to r0.x,
even if it is not used in the shader (ie. only varying use is for tex
coords). But if, for example, gl_FragCoord is used, it could get
assigned on top of bary_ij, resulting in a GPU hang.
The solution to this is two-fold: (1) the inputs/outputs rework has the
benefit of making RA realize bary_ij is a vec2, even if there are no
split/collect instructions (due to no varying fetches in the shader
itself). And (2) extend the live ranges of meta:input instructions to
the first non-input, to prevent RA from assigning the same register to
multiple inputs.
Backport note: because of (1) above, a better solution for 19.3 would be
to revert f30c256ec05.
Fixes: f30c256ec05 ("freedreno/ir3: enable pre-fs texture fetch for a6xx")
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
At the ir3 level, we would assume that we could use wrmask to mask
off other components of an instruction returning a vecN when they are
not used. Which would let RA use components not written for other live
values. But this is only true for tex instructions.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Allow inputs/outputs to be vecN (ie. whatever their actual size is), and
use split to get scalar components of inputs, and collect to gather up
scalar components of outputs.
The main motivation is to simplify RA, by only having to consider split/
collect to figure out where values need to land in consecutive scalar
registers, rather than having to also deal with left/right neighbors.
Because of varying packing, and the resulting fractional location
(location_frac), to implement load_input/store_output, it is still
convenient to have a table of scalar inputs/outputs. We move this to
the compile ctx (since it is only needed for nir->ir3).
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
In almost all places, the add_sysval_input() is paired directly with a
create_input(). (The one exception is frag shader ij bary coord, and
this exception will go away in a later patch.)
So go ahead and clean this up before reworking input/output handling.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This is a driver-param (loaded from uniform), not a sysval (populated by
hw into a register). So it has no value to having a sysval slot.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Currently it is always 0x1 (scalar), but that will change in a later
patch.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
| |
We can at least get rid of the if-not-NULL check in a bunch of places.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We keep kill's alive w/ keeps these days, rather than a fake output.
This condition was left over from prior to that change.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
If I'm going to refactor a bit to use these meta instructions to also
handle input/output, then might as well cleanup the names first.
Nouveau also uses collect/split for names of these meta instructions,
and I like those names better.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This doesn't really work, we can't necessarily just change the outputs
to half-precision like this in anything but simple cases.
Keep the shader key entry around though, eventually with proper mediump
support we could use this with a nir pass to use lower precision frag
shader outputs when the render target format has <= 16b/component.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The instruction has 3 src regs, so `instr->regs[0..3]` are valid, but
`instr->regs[4]` is not.
```
Test case 'dEQP-GLES31.functional.shaders.linkage.es31.tessellation.varying.rules.output_superfluous_declaration'..
==29239== Invalid read of size 8
==29239== at 0x5BE9CDC: emit_cat6 (ir3.c:841)
==29239== by 0x5BEA1BF: ir3_assemble (ir3.c:921)
==29239== by 0x5BDF0A7: ir3_shader_assemble (ir3_shader.c:133)
==29239== by 0x5BDF193: assemble_variant (ir3_shader.c:162)
==29239== by 0x5BDF407: create_variant (ir3_shader.c:215)
==29239== by 0x5BDF4DB: shader_variant (ir3_shader.c:241)
==29239== by 0x5BDF553: ir3_shader_get_variant (ir3_shader.c:257)
==29239== by 0x5BA85F7: ir3_shader_variant (ir3_gallium.c:80)
==29239== by 0x5BA7703: ir3_cache_lookup (ir3_cache.c:96)
==29239== by 0x5B8B8B3: fd6_emit_get_prog (fd6_emit.h:119)
==29239== by 0x5B8C137: fd6_draw_vbo (fd6_draw.c:186)
==29239== by 0x5BB1FBB: fd_draw_vbo (freedreno_draw.c:290)
==29239== Address 0xb97f2d0 is 0 bytes after a block of size 240 alloc'd
==29239== at 0x4848D54: malloc (in /usr/lib/aarch64-linux-gnu/valgrind/vgpreload_memcheck-arm64-linux.so)
==29239== by 0x61BD35B: ralloc_size (ralloc.c:119)
==29239== by 0x61BD41B: rzalloc_size (ralloc.c:151)
==29239== by 0x5BE599B: ir3_alloc (ir3.c:45)
==29239== by 0x5BEA583: instr_create (ir3.c:984)
==29239== by 0x5BEA5DF: ir3_instr_create2 (ir3.c:1000)
==29239== by 0x5BEE317: ir3_STLW (ir3.h:1431)
==29239== by 0x5BF12D3: emit_intrinsic_store_shared_ir3 (ir3_compiler_nir.c:903)
==29239== by 0x5BF418B: emit_intrinsic (ir3_compiler_nir.c:1802)
==29239== by 0x5BF5D07: emit_instr (ir3_compiler_nir.c:2339)
==29239== by 0x5BF603F: emit_block (ir3_compiler_nir.c:2426)
==29239== by 0x5BF624B: emit_cf_list (ir3_compiler_nir.c:2474)
==29239==
```
Probably this only triggers in non-optimized builds?
Fixes: 1f3b52ce503 ("freedreno/a6xx: Add register offset for STG/LDG")
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Jordan Justen <[email protected]>
Acked-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
| |
v2: Remove device->default_mocs and external_mocs (Jason).
Reviewed-by: Jordan Justen <[email protected]>
Acked-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
| |
Centralize mocs settings into isl.
Reviewed-by: Jordan Justen <[email protected]>
Acked-by: Lionel Landwerlin <[email protected]>
|