| Commit message | Author | Age | Files | Lines |
|
| |
In a3268599f3c9, I attempted to fix nir_repair_ssa for unreachable
blocks. However, that commit missed the possibility that the use is in
a block which, itself, is unreachable. In this case, we can end up in
an infinite loop trying to replace a def with itself. Even though a
no-op replacement is a fine operation, it keeps extending the end of the
uses list as we're walking it. Instead of explicitly checking for the
group of conditions, just check if the phi builder gives us a different
def. That check is guaranteed to be reliable and, while it lacks symmetry
with the is_valid checks, is more robust than enumerating the conditions.
Fixes: a3268599 "nir/repair_ssa: Repair dominance for unreachable..."
Reviewed-by: Ian Romanick <[email protected]>
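
The failure mode described in this message can be reproduced in miniature.
The following is a minimal Python sketch, not NIR code: `Def`,
`rewrite_uses`, `skip_self`, and the list-based use list are hypothetical
stand-ins. It assumes that rewriting a use removes it from the old def's
list and appends it to the new def's list, so a self-replacement keeps
feeding the list being walked unless the "is it a different def?" guard
skips it.

```python
class Def:
    """Hypothetical stand-in for an SSA def that tracks its uses."""

    def __init__(self, name):
        self.name = name
        self.uses = []


def rewrite_uses(old_def, pick_new_def, skip_self=True, max_steps=1000):
    """Move each use of old_def to the def returned by pick_new_def(use).

    Returns the number of rewrites performed. max_steps exists only so the
    sketch terminates when the guard is disabled.
    """
    rewrites = 0
    i = 0
    while i < len(old_def.uses) and rewrites < max_steps:
        use = old_def.uses[i]
        new_def = pick_new_def(use)
        if skip_self and new_def is old_def:
            # Same def came back: leave the use alone and move on.
            i += 1
            continue
        # Rewriting removes the use here and appends it to new_def's list.
        # If new_def is old_def, the list never shrinks and the walk never ends.
        old_def.uses.pop(i)
        new_def.uses.append(use)
        rewrites += 1
    return rewrites
```

With the guard enabled, a self-replacement performs no rewrites. With
`skip_self=False`, the same call only stops because of `max_steps`.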
|
| |
Reviewed-by: Rhys Perry <[email protected]>
|
| |
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Rhys Kidd <[email protected]>
|
| |
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Rhys Kidd <[email protected]>
|
| |
pipe->clear() is not called for partial clears, which Mesa emulates by
drawing a quad.
Furthermore, drivers should not use rasterizer state information for
scissor information (which was being used to handle the partial clears).
So, remove the partial clear support since it was not supposed to be
handled by pipe->clear() anyway.
This fixes issues with clearing after switching to different sized
framebuffers.
Signed-off-by: Erico Nunes <[email protected]>
Reviewed-by: Vasily Khoruzhick <[email protected]>
Reviewed-by: Qiang Yu <[email protected]>
|
| |
Signed-off-by: Boris Brezillon <[email protected]>
|
| |
->padded_count should be large enough to cover all vertices pointed to by
the index array. Use the local vertex_count variable that contains the
updated vertex_count value for the indexed draw case.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
| |
fixes "sorry, unimplemented: non-trivial designated initializers not supported"
Fixes: deb04adf2ae ("clover: add support for passing kernels as nir to the driver")
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
|
| |
Incomplete attachments don't have an associated pipe_surface, so
this would crash.
Fixes a WebGL conformance test that uses incomplete attachments:
https://www.khronos.org/registry/webgl/sdk/tests/conformance2/renderbuffers/invalidate-framebuffer.html?webglVersion=2&quiet=0&quick=1
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111756
Reviewed-By: Tapani Pälli <[email protected]>
|
| |
Allocating BOs is expensive, so we should avoid doing that by caching
freed BOs.
The BO cache is modelled after the one in the v3d driver and works as follows:
- in lima_bo_create(): check if we have a matching BO in the cache and
return it if there is one; allocate a new BO otherwise.
- in lima_bo_unreference() (renamed from lima_bo_free()): put the BO in
the cache instead of freeing it, and remove all stale BOs from the cache.
Reviewed-by: Qiang Yu <[email protected]>
Signed-off-by: Vasily Khoruzhick <[email protected]>
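
The two steps listed in this message can be sketched as follows. This is a
minimal Python model under stated assumptions, not the lima or v3d code:
the class `BOCache`, the exact-size bucketing, the `max_age_s` staleness
rule, and the injectable `clock` are all hypothetical, and reference
counting is left out.

```python
import time


class BOCache:
    """Hypothetical cache of freed buffer objects, keyed by exact size."""

    def __init__(self, max_age_s=2.0, clock=time.monotonic):
        self.max_age_s = max_age_s
        self.clock = clock
        self.free = {}          # size -> list of (bo, time_freed)
        self.allocations = 0    # counts the expensive path

    def _allocate(self, size):
        # Stand-in for the expensive kernel allocation.
        self.allocations += 1
        return bytearray(size)

    def create(self, size):
        """Return a cached BO of this size if one exists, else allocate."""
        bucket = self.free.get(size)
        if bucket:
            bo, _ = bucket.pop()
            return bo
        return self._allocate(size)

    def unreference(self, bo):
        """Put the BO in the cache instead of freeing it, then drop stale BOs."""
        now = self.clock()
        self.free.setdefault(len(bo), []).append((bo, now))
        self._evict_stale(now)

    def _evict_stale(self, now):
        for size in list(self.free):
            fresh = [(b, t) for b, t in self.free[size]
                     if now - t <= self.max_age_s]
            if fresh:
                self.free[size] = fresh
            else:
                del self.free[size]
```

Evicting on every release keeps the cache bounded without a separate
timer, at the cost of a scan each time a BO is returned.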
|
| |
os_time_get_absolute_timeout(0) returns the current time, while the kernel
driver expects 0 as the value to poll BO status and return immediately.
Fix it by setting abs_timeout to 0 if timeout_ns is 0.
Reviewed-by: Qiang Yu <[email protected]>
Signed-off-by: Vasily Khoruzhick <[email protected]>
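
The special case can be shown with a short sketch. This is an illustrative
Python model, not the driver code: `wait_abs_timeout_ns` and the explicit
`now_ns` parameter are hypothetical, and infinite timeouts and overflow
are deliberately ignored.

```python
def wait_abs_timeout_ns(timeout_ns, now_ns):
    """Convert a relative timeout to the absolute value passed to the kernel.

    A relative timeout of 0 means "poll and return immediately", which the
    kernel is assumed to recognise only as an absolute value of 0.
    """
    if timeout_ns == 0:
        return 0
    return now_ns + timeout_ns
```

Without the early return, a zero timeout would be sent as the current
time rather than as the poll request the kernel expects.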
|
| |
Reviewed-and-Tested-by: Vasily Khoruzhick <[email protected]>
Signed-off-by: Qiang Yu <[email protected]>
|
| |
Sometimes Weston sets a full damage region. It is
more efficient to use the cached PP stream instead
of dynamically creating one.
Reviewed-and-Tested-by: Vasily Khoruzhick <[email protected]>
Signed-off-by: Qiang Yu <[email protected]>
|
| |
This extension sets a damage region for each
buffer swap, which can be used to reduce buffer
reload cost by feeding only the tile buffer
addresses of the damage region to the PP.
Reviewed-and-Tested-by: Vasily Khoruzhick <[email protected]>
Signed-off-by: Qiang Yu <[email protected]>
|
| |
The PLBU expects the coordinates of the viewport's 4 borders, however
currently we're feeding it the coordinate of the bottom-left point and
the size, which leads to misrendering when the bottom-left point is
not (0,0).
Change the macros for the viewport PLBU command, and the data fed to
it. The code to calculate the 4 borders is ported from Panfrost.
Signed-off-by: Icenowy Zheng <[email protected]>
Reviewed-by: Qiang Yu <[email protected]>
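
The difference between the two representations is simple arithmetic. The
sketch below is a minimal Python illustration that assumes the viewport is
given as a bottom-left origin plus a size; `viewport_borders` and the
order of the returned tuple are hypothetical and do not reflect the PLBU
command layout or the Panfrost code.

```python
def viewport_borders(x, y, width, height):
    """Return (left, right, bottom, top) for a viewport at (x, y)."""
    left, bottom = x, y
    right, top = x + width, y + height
    return left, right, bottom, top
```

With the origin at (0,0), `right` and `top` equal the width and height,
which is why passing the size in their place only misrendered for
viewports with a non-zero origin.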
|
| |
ACO depends on C++14, but radeonsi/radv with LLVM 8 and 9 do not. Let us
only require it for RADV, since that is the only user.
Fixes: a70a9987181 "radv/aco: Setup alternate path in RADV to support the experimental ACO compiler"
Reviewed-by: Marek Olšák <[email protected]>
|
| |
required for OpenCL
v2: adjust to changes in previous commits
v3: properly convert to NIR in nvc0_cp_state_create
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Pierre Moreau <[email protected]> (v1)
|
| |
v2: minor formatting fixes
v3: call glsl_type_singleton_init_or_ref and glsl_type_singleton_decref
v4: capitalize and punctuate comments
fix text_executable -> text_intermediate in TODO
make glsl_type_singleton wrapper static
v5: rewrite how we run the nir passes
v6: fix unhandled case switch warning in st/mesa
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]> (v4)
|
| |
v2: rework arguments to compiler::compile_program
add assert to device::ir_format
v3: remove PIPE_SHADER_IR_SPIRV
change title
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]> (v2)
Reviewed-by: Pierre Moreau <[email protected]>
|
| |
Most drivers have actually no binary format and just store the IR directly
as a single entry point blob.
v2: add a cap to switch between single or multi entry point binaries
v3: remove the entry_point field
v4: remove PIPE_CAP_MULTI_ENTRY_POINT_BINARIES
v5: remove supports_multiple_entry_points
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Pierre Moreau <[email protected]>
|
| |
v2: pass argument by value
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
Reviewed-by: Pierre Moreau <[email protected]>
|
| |
We want to use it for other formats as well, so give it a more generic name
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
Reviewed-by: Pierre Moreau <[email protected]>
|
| |
Makes it easier to consume an IR_NATIVE binary.
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
Reviewed-by: Pierre Moreau <[email protected]>
|
| |
Reviewed-by: Karol Herbst <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
|
| |
Reviewed-by: Karol Herbst <[email protected]>
Acked-by: Francisco Jerez <[email protected]>
|
| |
v2 (Karol Herbst):
silence warnings about unhandled enum values
v3 (Karol Herbst):
added back array size parsing (needed for structs passed by value)
Acked-by: Francisco Jerez <[email protected]> (v2)
|
| |
Changes since:
* v12:
- remove autotools (Karol Herbst)
- Remove the callback in format_validation_msg. (Francisco Jerez)
- Removed is_binary_spirv. (Francisco Jerez)
- Pass a string reference to is_valid_spirv instead of the
notification callback. (Francisco Jerez)
* v11: Fix compilation error introduced in v11.
* v10:
- Reuse format_validation_msg in is_valid_spirv.
- Remove LVL2STR macro in format_validation_msg.
* v9: Add `clover_cpp_std` to the overrides of the `libclspirv` target
in Meson.
* v7: Add DEFINES to libclspirv and libclover, in autotools, as they
would otherwise never know whether CLOVER_ALLOW_SPIRV has been
defined (Dave Airlie)
* v6: Update the dependency name (meson) and the libs variable
(Makefile) due to the replacement of llvm-spirv to the new
official SPIRV-LLVM-Translator.
* v5: Changed to match the updated “clover/llvm: Allow translating from
SPIR-V to LLVM IR” in the v6.
Reviewed-by: Karol Herbst <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
|
| |
Changes since:
* v12 (Karol Herbst):
- rename CLOVER_ALLOW_SPIRV to HAVE_CLOVER_SPIRV
* v11 (Karol Herbst):
- only set new defines for clover to speed up recompilation
- remove autotools
* v10:
- Add a new flag (`--enable-opencl-spirv` for autotools, and
`-Dopencl-spirv=true` for meson) for enabling SPIR-V support in
clover, and never automagically enable it without that flag. (Dylan Baker)
- When enabling the SPIR-V support, the SPIRV-Tools and
SPIRV-LLVM-Translator libraries are now required dependencies.
* v7:
- Properly align LLVMSPIRVLib comment (Dylan Baker)
- Only define CLOVER_ALLOW_SPIRV when **both** dependencies are found:
autotools was only requiring one or the other.
* v6: Replace the llvm-spirv repository by the new official
SPIRV-LLVM-Translator.
* v4: Add a comment saying where to find llvm-spirv (Karol Herbst).
* v3:
- make SPIRV-Tools and llvm-spirv optional (Francisco Jerez);
- bump requirement for llvm-spirv to version 0.2
* v2:
- Bump the required version of SPIRV-Tools to the latest release;
- Add a dependency on llvm-spirv.
Reviewed-by: Dylan Baker <[email protected]> (v10)
Reviewed-by: Karol Herbst <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
|
| |
Gen11 doesn't require us to bypass the L2 cache for BC* images anymore.
The documentation is a bit hard to follow on this point, but the Windows
driver clearly only applies this workaround on Gen9, and their commit
history indicates that this was an intentional change to drop the
workaround for Gen11+.
Reviewed-by: Jason Ekstrand <[email protected]>
|
| |
Currently there is no way to make no context current w/gallium + osmesa.
The non-gallium version of osmesa does this if the context and buffer
passed to `OSMesaMakeCurrent` are both null. This small change makes it
so that this is also the case with the gallium version.
Cc: [email protected]
Signed-off-by: Hal Gentz <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
| |
|
| |
Reviewed-by: Kevin Strasser <[email protected]>
|
| |
Reviewed-by: Kevin Strasser <[email protected]>
|
| |
Reviewed-by: Kevin Strasser <[email protected]>
|
| |
No idea how these ended up with 3-then-2-space indents.
Reviewed-by: Kevin Strasser <[email protected]>
|
| |
Reviewed-by: Paulo Zanoni <[email protected]>
|
| |
Reviewed-by: Paulo Zanoni <[email protected]>
|
| |
We can't really handle it in the little-core 64-bit case but it's not
really needed there. Where we really want this is for when we need to
do 16 -> 8-bit conversions.
Reviewed-by: Paulo Zanoni <[email protected]>
|
| |
Because byte immediates aren't a thing on GEN hardware, we return a
signed or unsigned word immediate in the byte case.
Reviewed-by: Paulo Zanoni <[email protected]>
|
| |
During generate_shuffle(), when we use byte sized registers we end up
with a destination stride of 2. We don't take the stride into
consideration when selecting the group offset for the last MOV
operation, which means we end up moving things to the wrong place,
leaving the last few channels untouched. Take the destination stride
into consideration so we don't miss the last channels.
v2: Assert this is not necessary for the IVB special case (Jason).
Reviewed-by: Jason Ekstrand <[email protected]>
Signed-off-by: Paulo Zanoni <[email protected]>
|
| |
The new order matches that of the comparison functions accepted by the C
standard library qsort() function. Being consistent with qsort will
hopefully help avoid developer confusion.
The only current user of the red-black tree is aub_mem.c which is pretty
easy to fix up.
Reviewed-by: Lionel Landwerlin <[email protected]>
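
For reference, the qsort() convention is that the comparison returns a
negative value when the first argument sorts before the second, zero when
they are equal, and a positive value otherwise. The sketch below is a
minimal Python illustration of a tree insert driven by such a comparator.
It is a plain unbalanced binary search tree, not the red-black tree from
this commit, and `Node`, `insert`, `in_order`, and `compare_int` are
hypothetical names.

```python
from functools import cmp_to_key


def compare_int(a, b):
    """qsort()-style result: negative if a < b, zero if equal, positive if a > b."""
    return (a > b) - (a < b)


class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key, cmp):
    """Insert key below root, going left when cmp(key, node.key) is negative."""
    if root is None:
        return Node(key)
    if cmp(key, root.key) < 0:
        root.left = insert(root.left, key, cmp)
    else:
        root.right = insert(root.right, key, cmp)
    return root


def in_order(root):
    """Return the keys in ascending order according to the comparator."""
    if root is None:
        return []
    return in_order(root.left) + [root.key] + in_order(root.right)
```

Because the comparator follows the same sign convention as a sort
callback, the same function can be handed to
`sorted(keys, key=cmp_to_key(compare_int))` and to the tree.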
|
| |
When I wrote the red-black tree implementation, I wrote tests for it but
they never got imported into mesa.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
| |
This effectively splits the instance dispatch table in two, with entry
points that take a physical device as their first argument getting their
own dispatch table.
As a result we now have to check both the instance and the physical device
dispatch tables, instead of just the instance dispatch table as before.
Signed-off-by: Eric Engestrom <[email protected]>
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
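
The two-table lookup described in this message can be sketched as follows.
This is a minimal Python model in which dictionaries stand in for the
generated dispatch tables; `lookup_entrypoint` and the table contents are
hypothetical, and only the Vulkan entry point names are real.

```python
def lookup_entrypoint(name, instance_table, physical_device_table):
    """Look up an entry point in the instance table, then the physical device table."""
    func = instance_table.get(name)
    if func is None:
        func = physical_device_table.get(name)
    return func


# Entry points whose first argument is a VkInstance.
instance_table = {
    "vkDestroyInstance": lambda instance: "destroy-instance",
    "vkEnumeratePhysicalDevices": lambda instance: "enumerate",
}

# Entry points whose first argument is a VkPhysicalDevice.
physical_device_table = {
    "vkGetPhysicalDeviceProperties": lambda pdev: "properties",
    "vkCreateDevice": lambda pdev: "create-device",
}
```

A lookup that consulted only `instance_table` would now miss
`vkCreateDevice`, which is why both tables have to be checked.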
|
| |
We were using the current drawable of the context to name the
appropriate screen for creating the bitmaps. But one, the current
drawable can be None, and two, it can be a GLXDrawable. Passing either
one as the second argument to XCreatePixmap will throw BadDrawable. Use
the root window of the context's screen instead.
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/issues/89
LOLed-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
| |
atof() is locale-dependent (sigh), which means 1.3 becomes 1.0 if the
locale's decimal separator isn't a full-stop. Just use the protocol
major/minor instead. This would be slightly broken if the server
generically implements 1.3+ but a particular screen is only capable of
less, but in practice no such servers exist.
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/issues/74
Reviewed-by: Eric Engestrom <[email protected]>
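
The locale problem can be modelled without touching the process locale.
The sketch below is an illustrative Python approximation: `atof_like` is a
hypothetical, simplified stand-in for C atof() that takes the decimal
separator as a parameter, and `at_least` is a hypothetical helper showing
the integer major/minor comparison.

```python
def atof_like(text, decimal_sep):
    """Rough model of atof(): digits, one optional separator, more digits.

    Parsing stops at the first character that does not fit, so a "." is
    ignored when the locale's separator is something else.
    """
    i = 0
    n = len(text)
    while i < n and text[i].isdigit():
        i += 1
    int_part = text[:i]
    frac_part = ""
    if i < n and text[i] == decimal_sep:
        j = i + 1
        while j < n and text[j].isdigit():
            j += 1
        frac_part = text[i + 1:j]
    if not int_part and not frac_part:
        return 0.0
    return float((int_part or "0") + "." + (frac_part or "0"))


def at_least(major, minor, want_major, want_minor):
    """Compare protocol versions as integers, independent of any locale."""
    return (major, minor) >= (want_major, want_minor)
```

With a comma as the separator, `atof_like("1.3", ",")` stops after the
`1`, so a `>= 1.3` check fails. The tuple comparison has no such
dependency.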
|
| |
I observed this pattern in several shaders in Hand of Fate 2 while
investigating bugzilla #111490. This also led to the related
bugzilla #111578. The shaders from HoF2 are *not* in shader-db.
Reviewed-by: Kenneth Graunke <[email protected]>
Skylake and Ice Lake had similar results. (Ice Lake shown)
total instructions in shared programs: 16222621 -> 16205419 (-0.11%)
instructions in affected programs: 798418 -> 781216 (-2.15%)
helped: 548
HURT: 0
helped stats (abs) min: 2 max: 158 x̄: 31.39 x̃: 35
helped stats (rel) min: 0.45% max: 28.64% x̄: 2.83% x̃: 2.09%
95% mean confidence interval for instructions value: -33.22 -29.56
95% mean confidence interval for instructions %-change: -3.11% -2.56%
Instructions are helped.
total cycles in shared programs: 364676209 -> 363345763 (-0.36%)
cycles in affected programs: 112810504 -> 111480058 (-1.18%)
helped: 546
HURT: 7
helped stats (abs) min: 2 max: 118913 x̄: 2439.77 x̃: 2340
helped stats (rel) min: 0.08% max: 37.56% x̄: 1.46% x̃: 1.08%
HURT stats (abs) min: 2 max: 770 x̄: 238.00 x̃: 43
HURT stats (rel) min: 0.02% max: 11.24% x̄: 3.71% x̃: 0.35%
95% mean confidence interval for cycles value: -2884.33 -1927.41
95% mean confidence interval for cycles %-change: -1.59% -1.21%
Cycles are helped.
total spills in shared programs: 8870 -> 8514 (-4.01%)
spills in affected programs: 1230 -> 874 (-28.94%)
helped: 161
HURT: 0
total fills in shared programs: 21901 -> 21348 (-2.52%)
fills in affected programs: 2120 -> 1567 (-26.08%)
helped: 155
HURT: 5
Broadwell and Haswell had similar results. (Broadwell shown)
total instructions in shared programs: 14994910 -> 14975495 (-0.13%)
instructions in affected programs: 839033 -> 819618 (-2.31%)
helped: 548
HURT: 0
helped stats (abs) min: 2 max: 299 x̄: 35.43 x̃: 49
helped stats (rel) min: 0.39% max: 19.89% x̄: 2.91% x̃: 2.22%
95% mean confidence interval for instructions value: -37.46 -33.40
95% mean confidence interval for instructions %-change: -3.12% -2.70%
Instructions are helped.
total cycles in shared programs: 386032453 -> 384450722 (-0.41%)
cycles in affected programs: 117807357 -> 116225626 (-1.34%)
helped: 547
HURT: 6
helped stats (abs) min: 2 max: 22096 x̄: 2892.01 x̃: 3926
helped stats (rel) min: 0.17% max: 10.34% x̄: 1.56% x̃: 1.31%
HURT stats (abs) min: 4 max: 60 x̄: 32.83 x̃: 29
HURT stats (rel) min: 0.38% max: 12.79% x̄: 5.86% x̃: 4.65%
95% mean confidence interval for cycles value: -3060.28 -2660.27
95% mean confidence interval for cycles %-change: -1.59% -1.37%
Cycles are helped.
total spills in shared programs: 23372 -> 21869 (-6.43%)
spills in affected programs: 11730 -> 10227 (-12.81%)
helped: 352
HURT: 0
total fills in shared programs: 34747 -> 35351 (1.74%)
fills in affected programs: 11013 -> 11617 (5.48%)
helped: 3
HURT: 347
Ivy Bridge and Sandybridge had similar results. (Ivy Bridge shown)
total instructions in shared programs: 11956420 -> 11956126 (<.01%)
instructions in affected programs: 14898 -> 14604 (-1.97%)
helped: 98
HURT: 0
helped stats (abs) min: 3 max: 3 x̄: 3.00 x̃: 3
helped stats (rel) min: 1.30% max: 3.57% x̄: 2.08% x̃: 2.00%
95% mean confidence interval for instructions value: -3.00 -3.00
95% mean confidence interval for instructions %-change: -2.18% -1.98%
Instructions are helped.
total cycles in shared programs: 178791217 -> 178790792 (<.01%)
cycles in affected programs: 149763 -> 149338 (-0.28%)
helped: 91
HURT: 7
helped stats (abs) min: 3 max: 107 x̄: 20.63 x̃: 16
helped stats (rel) min: 0.13% max: 6.91% x̄: 1.40% x̃: 1.18%
HURT stats (abs) min: 3 max: 322 x̄: 207.43 x̃: 322
HURT stats (rel) min: 0.14% max: 19.85% x̄: 12.73% x̃: 17.41%
95% mean confidence interval for cycles value: -18.94 10.27
95% mean confidence interval for cycles %-change: -1.28% 0.49%
Inconclusive result (value mean confidence interval includes 0).
|
| |
Some shaders do not use 'invariant' in vertex and (possibly) geometry
shader stages on some outputs that are intended to be invariant. For
various reasons, this optimization may not be fully applied in all
shaders used for different rendering passes of the same geometry. This
can result in Z-fighting artifacts (at best). For now, disable this
optimization in these stages.
In tessellation stages applications seem to use 'precise' when
necessary, so allow the optimization in those stages.
Reviewed-by: Kenneth Graunke <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111490
Fixes: 09705747d72 ("nir/algebraic: Reassociate fadd into fmul in DPH-like pattern")
All Gen8+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 16194726 -> 16344745 (0.93%)
instructions in affected programs: 2855172 -> 3005191 (5.25%)
helped: 6
HURT: 20279
helped stats (abs) min: 1 max: 3 x̄: 1.33 x̃: 1
helped stats (rel) min: 0.44% max: 1.00% x̄: 0.54% x̃: 0.44%
HURT stats (abs) min: 1 max: 32 x̄: 7.40 x̃: 7
HURT stats (rel) min: 0.14% max: 42.86% x̄: 8.58% x̃: 6.56%
95% mean confidence interval for instructions value: 7.34 7.45
95% mean confidence interval for instructions %-change: 8.48% 8.67%
Instructions are HURT.
total cycles in shared programs: 364471296 -> 365014683 (0.15%)
cycles in affected programs: 32421530 -> 32964917 (1.68%)
helped: 2925
HURT: 16144
helped stats (abs) min: 1 max: 403 x̄: 18.39 x̃: 5
helped stats (rel) min: <.01% max: 22.61% x̄: 1.97% x̃: 1.15%
HURT stats (abs) min: 1 max: 18471 x̄: 36.99 x̃: 15
HURT stats (rel) min: 0.02% max: 52.58% x̄: 5.60% x̃: 3.87%
95% mean confidence interval for cycles value: 21.58 35.41
95% mean confidence interval for cycles %-change: 4.36% 4.52%
Cycles are HURT.
|
| |
It was set as done by mistake.
Fixes: bc15d74529e ("docs/features: Mark some Vulkan extensions as done")
Signed-off-by: Andres Gomez <[email protected]>
Acked-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
| |
To get the extension list:
$ git grep -hE "extension name=\"VK_KHR" src/vulkan/registry/vk.xml | \
grep -v disabled | awk '{print $2}' | sed -E 's/(name=)?"//g' | sort
To find anv(il) and radv supported extensions:
$ git grep -hE "'VK_([A-Z]+)_[a-z,0-9]" src/intel/
$ git grep -hE "'VK_([A-Z]+)_[a-z,0-9]" src/amd/
v2:
- Keep VK_KHR_device_group and VK_KHR_device_group_creation as not
started (Jason).
Signed-off-by: Andres Gomez <[email protected]>
Acked-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|