| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
This enables some more SSE optimizations on MSVC builds.
Signed-off-by: Erik Faye-Lund <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
| |
This is not true for MSVC on ARM.
Signed-off-by: Erik Faye-Lund <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This code generates CVTSD2SI, which requires SSE2. So let's fix the
required SSE-version.
Signed-off-by: Erik Faye-Lund <[email protected]>
Fixes: 5de29ae (util: try to use SSE instructions with MSVC and 32-bit gcc)
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
This has been unused since 183db3a6455 ("glsl: move half<->float
convertion to util"), Oct 10 2015. Let's drop needlessly including it.
Signed-off-by: Erik Faye-Lund <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
| |
It's only correct when 'a' is an integral greater or equal to 0.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111493
Fixes: 5544b2cbbd2 ("nir/algebraic: Use value range analysis to eliminate useless unary ops")
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add missing "device" platform
v2: Add the missing platform (Eric)
Signed-off-by: Lionel Landwerlin <[email protected]>
Reported-by: Jean Hertel <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111529
Fixes: d6edccee8d ("egl: add EGL_platform_device support")
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Lionel found actual documentation for this at long last. Apparently
it actually is a sampler cache limitation that was mostly fixed on
Icelake. Unfortunately, it seems there are still issues with ASTC
and non-ASTC sampler views. Still, we can lessen the flush condition
from "format mismatch" to "ASTC mismatch", which eliminates most of
the flushing here.
We also update the documentation to refer to the workaround name.
|
|
|
|
|
|
|
|
|
|
|
| |
strchrnul is not available on macOS.
pipe_loader.c:141:14: error: implicit declaration of function 'strchrnul' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
next = strchrnul(library_paths, ':');
^
Signed-off-by: Vinson Lee <[email protected]>
Acked-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The majority of these only apply the start argument to the input, but a
few of them also does for the output-array. util_primconvert, the only
user of this argument expects this pass a non-zero start-argument does
not expect this to be applied to the output; if it is, it will write
outside of allocated memory, leading to VRAM corruption.
The reason this doesn't seem to have been noticed before, is that no
driver currently use util_primconvert to convert a primitive-type to
itself, which is the cases where this was broken. But for Zink, this
will no longer be true, because we need to eliminate the use of 8-bit
index-buffers.
Signed-off-by: Erik Faye-Lund <[email protected]>
Fixes: 28f3f8d413f ("gallium/auxiliary/indices: add start param")
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Travis is checking the exit code of the entire if statement.
Fixes: 64ffc289be89 ("travis: add MacOS Scons build")
Signed-off-by: Vinson Lee <[email protected]>
Acked-by: Eric Engestrom <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Commit 6f7306c029a7 ("swr/rast: Refactor memory API between rasterizer
core and swr") unintentionally removed changes for llvm-9.0.
Fixes: 6f7306c029a7 ("swr/rast: Refactor memory API between rasterizer core and swr")
Fixes: 5dd9ad157005 ("swr/rasterizer: Better implementation of scatter")
Signed-off-by: Vinson Lee <[email protected]>
Reviewed-by: Jan Zielinski <[email protected]>
|
|
|
|
|
|
|
| |
We already had a perfectly cromulent pass for this, but one landed in
common NIR code so let's switch and lighten our tree.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
| |
This optimization depended on RA running before scheduling. It therefore
no longer applies and is now unused.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a tradeoff.
Scheduling before RA means we don't do RA on what-will-become pipeline
registers. Importantly, it means the scheduler is able to reorder
instructions, as registers have not been decided yet.
Unfortunately, it also complicates register spilling, since the spills
themselves won't get bundled optimally and we can only spill twice per
ALU bundle (only one spill per bundle allowed here). It also prevents us
from eliminating dead moves introduced by register allocation, as they
are not dead before RA. The shader-db regressions are from poor spilling
choices introduced by the new bundling requirements. These could be
solved by the combination of a post-scheduler (to combine adjacent
spills into bundles) with a VLIW-aware spill cost calculation.
Nevertheless, the change is small enough that I feel it's worth it to
eat a tiny shader-db regression for the sake of flexibility.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Rather than using a pile of hacks and awkward constructs in MIR to
ensure the writeout parameter gets written into r0, let's add a
dedicated shadow register class for writeout (interfering with work
register r0) so we can express the writeout condition succintly and
directly.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
| |
There's no slot for it; you'll end up writing into the void and
clobbering stuff. Don't. do it.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
| |
When running the register allocator after scheduling, the MIR looks a
little different, so we need to extend the RA to handle a few of these
extra cases correctly.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
| |
After scheduling, we still have valid MIR, but we have additional
bundling annotations which we would like to keep debug, so print these.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
| |
Rather than a vague "br.??" line, annotate the branch with its target
type (useful for disambiguating discards) and whether it was inverted.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
| |
This is deadcode.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
| |
I'm not sure if this is strictly necessary but it makes debugging easier
and minimizes the diff with the experimental scheduler.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Scheduling occurs on a per-block basis, strongly assuming that a given
block contains at most a single branch. This does not always map to the
source NIR control flow, particularly when discard intrinsics are
involved. The solution is to allow scheduling barriers, which will
terminate a block early in code generation and open a new block.
To facilitate this, we need to move some post-block processing to a new
pass, rather than relying hackily on the current_block pointer.
This allows us to cleanup some logic analyzing branches in other parts
of the driver us well, now that the MIR is much more well-formed.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
| |
This allow multiblock blend shaders to compute constant colour offsets
correctly.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
| |
It's sometimes convenient to call this with no instruction specified. By
definition, a missing instruction cannot reference any argument, so
let's check for NULL and shortciruit to false.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
| |
The branch has the writeout specified in its source list, making this
special even if it's not explicitly part of r0.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In order to run register allocation after scheduling, it is sometimes
necessary to be able to insert instructions into an already-scheduled
program. This is suboptimal, since it forces us to do a worst-case
scheduling, but it is nevertheless required for correct handling of
spills/fills. Let's add helpers to insert instructions as standalone
bundles for use in spilling code.
These helpers are minimal -- they *only* work on load/store ops or
moves. They should not be used for anything but register spilling; any
other instructions should be added prior to the schedule.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
| |
While it doesn't matter with an unconditional move to the conditional
register (r31), when we try to elide that move we'll need to track the
swizzle explicitly, and there is no slot for that yet since ALU ops are
normally binary.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
| |
This ensures the block only has exactly one branch, which makes
scheduling happy.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
| |
Oh boy. Midgard scheduling is crazy... These are all just the
requirements, not even the algorithm yet.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
| |
This will allow us to reference the condition while scheduling.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
| |
It doesn't really matter but... meh.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
| |
..to distinguish from scalar csel.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
| |
The scheduler would like to use these.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
| |
The scheduler shouldn't need to worry about this.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
| |
This helper doesn't need to be in the giant loop.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This does not affect shaders in any way. Rather, it makes the shader-db
instruction count recorded in the compiler accurate with the in-order
scheduler, matching up with what we calculate from pandecode.
Though shaders are the same, instruction counts cannot be compared
across this commit for this reason.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
| |
Allow a direct link to the PDF itself from the authors themselves,
rather than a paywall splash page.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Acked-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The GLX extension strings are independent of any context, so abusing the
direct_support bit to control this extension's visibility is wrong.
This reverts commit 079d0717fc896bc8086b037d0ed22642274986c7.
Reported-by: Michel Dänzer <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Memory allocated through panfrost_allocate_transient() is likely to
come from the transient pool. Let's add the BO backing the allocated
memory region to the job batch so the kernel can retain this BO while
jobs are executed.
In practice that has never been a problem because the transient pool
is never shrinked, and even if it was, we still control the lifetime of
the job, so there's no reason for this BO to be freed before the GPU is
done executing the batch. But it still make sense to add the BO for
debugging purpose.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Both the BO cache and the transient pool are shared across
context's. Protect access to these with mutexes.
Signed-off-by: Rohan Garg <[email protected]>
Reviewed-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
Signed-off-by: Boris Brezillon <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Jobs _must_ only be shared across the same context, having
the last_job tracked in a screen causes use-after-free issues
and memory corruptions.
Signed-off-by: Rohan Garg <[email protected]>
Reviewed-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
Signed-off-by: Boris Brezillon <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This fix two dEQP tests for virgl:
dEQP-EGL.functional.image.create.gles2_cubemap_positive_x_rgba_texture
dEQP-EGL.functional.image.render_multiple_contexts.gles2_cubemap_positive_x_rgba8_texture
Signed-off-by: Lepton Wu <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
Tiling mode was missing from fd3_emit_gmem_restore_tex().
emit_gmem2mem_surf() used LINEAR exclusiveley.
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
* Fix 2D/2DArray/3D tiling parameters:
There is a bottom threshold for width and height.
* Renable tiling for Cubemap, after setting the right parameters.
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This way, the test jobs can start running before all build+test jobs
have finished, once the meson-main job has.
Idea suggested by Daniel Stone on IRC.
See https://docs.gitlab.com/ce/ci/directed_acyclic_graph/ and
https://docs.gitlab.com/ce/ci/yaml/README.html#needs for details.
v2:
* Improve commit log (Daniel Stone, Eric Engestrom)
Reviewed-by: Eric Engestrom <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
In order to increase the chance of it running early.
Reviewed-by: Eric Engestrom <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Equivalent of 0c1dd9dee "broadcom/vc4: Allow importing linear BOs with
arbitrary offset/stride." for v3d.
Allows YUV buffers with a single buffer and plane offsets to be
passed in.
Signed-off-by: Dave Stevenson <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|