| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
This moves the fix from commit 361f3d19f1f to happen in get_param
(used now instead of get_handle by st/dri). This fixes artifacts
seen with Xorg and CCS_E.
Fixes: fc12fd05f56 "iris: Implement pipe_screen::resource_get_param"
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Without this, we'll incorrectly round off huge values to the nearest
representable double instead of keeping it at the exact value as
we're supposed to.
Found by inspecting compiler-warnings.
Signed-off-by: Erik Faye-Lund <[email protected]>
Fixes: 85faf5082f ("glsl: Add 64-bit integer support for constant expressions")
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
|
| |
Utgard PP supports indirect load of uniforms and varyings, so let's
enable it.
Reviewed-by: Qiang Yu <[email protected]>
Signed-off-by: Vasily Khoruzhick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently we add dependecies in 3 cases:
1) One node consumes value produced by another node
2) Sequency dependencies
3) Write after read dependencies
2) and 3) only affect scheduler decisions since we still can use pipeline
register if we have only 1 dependency of type 1).
Add 3 dependency types and mark dependencies as we add them.
Reviewed-by: Qiang Yu <[email protected]>
Signed-off-by: Vasily Khoruzhick <[email protected]>
|
|
|
|
|
|
|
|
|
| |
It makes no sense to clone texture coords if it's not varying, moreover
we don't support cloning ALU nodes.
Fixes: 1c1890fa7077 ("lima/ppir: clone uniforms and load_coords into each successor")
Reviewed-by: Andreas Baierl <[email protected]>
Signed-off-by: Vasily Khoruzhick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We call nir_lower_load_const_to_scalar in the state trackers linker
however some later passes can reintroduce constant vectors. Here
we lower these to scalar and perform optimisations. The Intel
drivers do a similar call in their backend..
shader-db results VEGA 64:
Totals from affected shaders:
SGPRS: 152168 -> 151976 (-0.13 %)
VGPRS: 135224 -> 135112 (-0.08 %)
Spilled SGPRs: 4027 -> 4163 (3.38 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 10670028 -> 10654776 (-0.14 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 13122 -> 13135 (0.10 %)
Wait states: 0 -> 0 (0.00 %)
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Fixes at least this deqp test:
dEQP-VK.api.smoke.triangle
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
| |
ir3 segfaults if nonbinning is NULL for the bininng pass shader.
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This can happen with loops with unreachable exits which are later
optimized away.
Fixes assertion in dEQP-VK.graphicsfuzz.unreachable-loops with RADV.
Cc: [email protected]
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111236
|
|
|
|
|
|
|
|
|
| |
because vl doesn't call flush_resource and I wasn't able to find
all places where flush_resource needs to be called.
This fixes corrupted / unflushed surfaces with fullscreen videos on Raven.
Cc: 19.1 19.2 <[email protected]>
|
|
|
|
| |
Cc 19.2 <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes some piglit tests on radeonsi NIR where a varying is
initialized to a constant array in the vertex shader. Varying packing
after nir_lower_io_to_temporaries creates writemasked stores which
persist after pulling the constant initialization down into the fragment
shader.
While we're here, rewrite handle_constant_store() to do the loop over
components outside the switch, so that we don't have to duplicate the
writemask checking for every bitsize.
Fixes: 1235850522c ("nir: Add a large constants optimization pass")
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Dylan Baker <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This was a workaround for a bug in Meson that was fixed in 0.46 [1].
[1] https://github.com/mesonbuild/meson/pull/2284
Fixes: f7b6a8d12fdc446e3251 ("meson: bump required version to 0.46")
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Dylan Baker <[email protected]>
|
|
|
|
|
|
| |
Fixes: cdc6efddf918bc07d30d ("radv: implement all depth/stencil resolve modes using graphics")
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes a compilation error when building libnouveau:
In file included from ../src/gallium/drivers/nouveau/nv50/nv50_program.c:25:
../src/compiler/nir/nir.h:1115:10: fatal error: nir_intrinsics.h: No such file or directory
#include "nir_intrinsics.h"
^~~~~~~~~~~~~~~~~~
compilation terminated.
Fixes: f014ae3c7cce504afe5d ("nouveau: add support for nir")
Signed-off-by: Stephen Barber <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
Reviewed-by: Karol Herbst <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Released today and hangs on RADV. We don't have the root cause yet,
but this should unblock people playing the game.
No drirc because the radv debugflags are not usable from drirc and
I want this backported.
CC: <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
flrp was forgotten when already adding the rounding mode for other
instructions.
Fixes: ba1e25e1aa6 ("i965/fs: set rounding mode when emitting fadd, fmul and ffma instructions")
Suggested-by: Ian Romanick <[email protected]>
Signed-off-by: Andres Gomez <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
After
1711bf6cf2d ("intel/fs: Generate better code for fsign multiplied by a value"),
the conflicts resolution for setting the rounding mode after the
fused fmul and fsign optimization is non obvious.
Basically, the optimization doesn't really result in a MUL, or any
other operation which would need to have the rounding mode set. Hence,
we set it just before the actual MUL in the treatment of fmul.
Fixes: ba1e25e1aa6 ("i965/fs: set rounding mode when emitting fadd, fmul and ffma instructions")
Suggested-by: Caio Marcelo de Oliveira Filho <[email protected]>
Signed-off-by: Andres Gomez <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The script only handles commits with "Fixes: <sha1>" where <sha1> is
equal or great than 8 chars. But <sha1> can be smaller, like 7 chars.
This commit relax the restriction to handle <sha1> 4 or more chars.
Fixes: 533fead4236 ("bin/get-pick-list.sh: tweak the commit sha matching pattern")
Acked-by: Eric Engestrom <[email protected]>
Signed-off-by: Juan A. Suarez Romero <[email protected]>
|
|
|
|
|
|
| |
There are 64 physical registers so the shift must be 64 bits.
Reviewed-by: Vasily Khoruzhick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
The scheduler doesn't expect them. To do this, I had to refactor the
registration part of gpir_node_create_dest() to be separate from
creating and inserting the node, since the last two now aren't done when
handling moves. This adds more code but creates the possibility of
automatically inserting input dependencies when inserting nodes, similar
to what's done in NIR with the use-def lists (this isn't done yet).
Reviewed-by: Vasily Khoruzhick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
We guarantee that a complex1 op is always used by postlog2 directly by
rewriting the postlog2 op to be a move when there would be a move
inserted between them. But we weren't doing this in all circumstances
where there might be a move. Move the logic to place_move() so that it
always happens. Fixes a few log tests that happened to start failing due
to changes in the register allocator leading to a different scheduling
order.
Reviewed-by: Vasily Khoruzhick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit adds the framework for cross-basic-block register
allocation. Like ARM's compiler, we assume that the value registers
aren't usable across branches, which means we have to use physical
registers to store any value that crosses a basic block. There are three
parts to this:
1. When translating from NIR, we rely on the NIR out-of-ssa pass to
coalesce values into registers. We insert store_reg instructions for
values used in more than one basic block, and load_reg instructions for
values not defined in the same basic block (or defined after their use,
for loops). So by the time we've translated out of NIR we've already
split things into values (which are only used in the same basic block)
and registers (which are only used in different basic blocks than where
they're defined).
2. We allocate the registers at the same time that we allocate the
values, before the final scheduler. Unlike the values, where the
assigned color is fake, we assign the actual physical index & component
to physregs at this stage. load_reg and store_reg are treated as moves
in the allocator and when creating write-after-read dependencies.
3. Finally, in the main scheduler we have to avoid overwriting existing
live physregs when spilling. First, we have to tell the scheduler which
physical registers are live at the end of each block, to avoid
overwriting those. If a register is only live at the beginning, we can
reuse it for spilling after the last original use in the final program
happens, i.e. before any original use is scheduled, but we have to be
careful to add the proper dependencies so that the spill write is
scheduled before the original reads. To handle this we repurpose
reg_link for uses to be used by the scheduler.
A few register-related things copied over from NIR or from other
drivers can be dropped.
Reviewed-by: Vasily Khoruzhick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Because branch conditions have to be in the pass slot, there is no
unconditional branch, and realistically the pass slot has to contain a
move when branching (there's nothing it does that would be useful for
operating on booleans, so we can't use it for anything when computing
the branch condition), we put the branch instruction in the pass slot
and at codegen time turn it into a move of the branch condition. This
means that it doesn't have to be special-cased like store instructions
are in the scheduler. Because of this decision we can remove the
half-implemented BRANCH codegen slot. Finally, we (ab)use the existing
schedule_first mechanism to make sure that branches are always last in
the basic block.
Reviewed-by: Vasily Khoruzhick <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
When picking a node to be scheduled, we try to schedule its children as
well. But we shouldn't try to schedule nodes which only have a fake
dependency on the original node, since this isn't the point of
scheduling children at the same time and can break some expectations of
the rest of the code.
Reviewed-by: Vasily Khoruzhick <[email protected]>
|
|
|
|
| |
Reviewed-by: Vasily Khoruzhick <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This is the GLX counterpart to EGL_KHR_no_config_context. Contexts may
now be created without reference to an fbconfig, in which case it is
treated as compatible with any fbconfig (and thus any GLX drawable).
Khronos: https://github.com/KhronosGroup/OpenGL-Registry/pull/102
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Somewhat terrifyingly, we never sent this for direct contexts, which
means the server never knew the context/drawable bindings. To handle
this sanely, pull the request code up out of the indirect backend, and
rewrite the context switch path to call it as appropriate. This
attempts to preserve the existing behavior of not calling unbind() on
the context if its refcount would not drop to zero.
Of course, you can't just do this indiscriminately, because this is GLX
and extant X servers have bugs and everything is terrible. To wit:
- For 1.20.x prior to 1.20.6, you can bind a direct context once, but
the second time you try to modify the context's binding you will get
GLXBadContextTag. This includes unbinding the context. And "deleting"
the context will leak memory, because it will still appear to be
current.
- For 1.19 and earlier, glXMakeCurrent(dpy, None, ctx) should be legal
for GL 3.0+ contexts, but the server will throw BadMatch.
To guard against this, we only send the request for indirect contexts
unless the server is known good, and only mention one context at a time
in such a request; if switching between contexts, we first unbind the
old, and then bind the new. Note that the second VendorRelease() version
is to catch XFree86 4.x and Xorg [67].x, which almost certainly have the
above bugs. Other servers might report different version numbers here,
but we can't do direct rendering against them, so this should be safe.
Fixes glx-make-context, glx-multi-window-single-context and
glx-query-drawable-glx_fbconfig_id-window. Sufficiently old piglit will
regress on glx-make-glxdrawable-current (throwing BadMatch), which is
fixed by mesa/piglit!116.
|
|
|
|
|
| |
Only relevant for indirect contexts, so let's get that code out of the
common path.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
From the MEDIA_VFE_STATE docs:
"Starting with this configuration, the Maximum Number of Threads must
be set to (#EU * 8) for GPGPU dispatches.
Although there are only 7 threads per EU in the configuration, the
FFTID is calculated as if there are 8 threads per EU, which in turn
requires a larger amount of Scratch Space to be allocated by the
driver."
It's pretty clear that we need to increase this for scratch address
calculations, because the FFTID has a certain bit-pattern. The quote
above seems to indicate that we should increase the actual thread count
programmed in MEDIA_VFE_STATE as well, but we think the intention is to
only bump the scratch space.
Fixes GPU hangs in Bioshock Infinite and Synmark's CSDof on Icelake 8x8.
Fixes: 5ac804bd9ac ("intel: Add a preliminary device for Ice Lake")
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 729de1488f49033bc181b8123af5658228a51bf1.
It turns out that, although the register is in the logical context,
it isn't whitelisted, so we can't actually write it from userspace
batch buffers. The write just becomes a noop, which is why we saw
no performance changes.
I manually whitelisted it, and still observed no performance gains, but
it did regress KHR-GL46.texture_cube_map_array.color_depth_attachments
on the iris driver. So we might need to fix something before enabling
this. To prevent it randomly getting turned on should the kernel ever
whitelist this register, we revert the patch for now.
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
| |
'α' has never appeared in any genxml files, so there's no need to
replace it with the word "alpha".
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
'α' has never appeared in any genxml files, so there's no need to
replace it with the word "alpha".
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Use VPC_SO_OVERRIDE to control whether we do streamout in binning or
draw pass. Normally we want to do streamout in binning pass, except
when there is a single tile and binning passed is skipped.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
We could bit doing streamout from binning pass. In this case we want to
use the full VS which doesn't have (potentially streamed out) varyings
stripped out.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
| |
This fixes VAAPI.
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
| |
This fixes some dEQP tests.
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Connor Abbott <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
This caused a failure in NIR validation.
Reviewed-by: Connor Abbott <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
| |
It confuses radeonsi.
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
| |
Acked-by: Pierre-Eric Pelloux-Prayer <[email protected]>
|
|
|
|
|
|
|
|
| |
v2: cleanup
Signed-off-by: Sonny Jiang <[email protected]>
Signed-off-by: Marek Olšák <[email protected]>
Acked-by: Pierre-Eric Pelloux-Prayer <[email protected]>
|
|
|
|
|
|
| |
PCI IDs for amdgpu will be removed from Mesa.
Acked-by: Pierre-Eric Pelloux-Prayer <[email protected]>
|
|
|
|
|
| |
Cc: 19.2 <[email protected]>
Acked-by: Pierre-Eric Pelloux-Prayer <[email protected]>
|
|
|
|
|
|
| |
trivial and urgent
Cc: 19.2 <[email protected]>
|