| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The UBO push analysis pass incorrectly assumed that all values would fit
within a 32B chunk, and only recorded a bit for the 32B chunk containing
the starting offset.
For example, if a UBO contained the following, tightly packed:
vec4 a; // [0, 16)
float b; // [16, 20)
vec4 c; // [20, 36)
then, c would start at offset 20 / 32 = 0 and end at 36 / 32 = 1,
which means that we ought to record two 32B chunks in the bitfield.
Similarly, dvec4s would suffer from the same problem.
v2: Rewrite the accounting, my calculations were wrong.
v3: Write a comment about partial values (requested by Jason).
Reviewed-by: Rafael Antognolli <[email protected]> [v1]
Reviewed-by: Jason Ekstrand <[email protected]> [v3]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since SSBOs can be written by a different GPU thread, copy propagating a
read can cause the value to magically change. SSBO reads are also very
expensive, so doing it twice will be slower.
The same shader was helped by this patch and the previous.
Haswell, Broadwell, and Skylake had similar results. (Skylake shown)
total instructions in shared programs: 14399119 -> 14399113 (<.01%)
instructions in affected programs: 683 -> 677 (-0.88%)
helped: 1
HURT: 0
total cycles in shared programs: 532973113 -> 532971865 (<.01%)
cycles in affected programs: 524666 -> 523418 (-0.24%)
helped: 1
HURT: 0
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Cc: [email protected]
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106774
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since SSBOs can be written by other GPU threads, copy propagating a read
can cause the value to magically change. SSBO reads are also very
expensive, so doing it twice will be slower.
Haswell, Broadwell, and Skylake had similar results. (Skylake shown)
total instructions in shared programs: 14399120 -> 14399119 (<.01%)
instructions in affected programs: 684 -> 683 (-0.15%)
helped: 1
HURT: 0
total cycles in shared programs: 532978931 -> 532973113 (<.01%)
cycles in affected programs: 530484 -> 524666 (-1.10%)
helped: 1
HURT: 0
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Cc: [email protected]
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106774
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This seems to have been missed in the move from autotools
This fixes the following build issue:
../src/gallium/auxiliary/vl/vl_winsys_dri.c:34:10: fatal error: X11/Xlib-xcb.h: No such file or directory
#include <X11/Xlib-xcb.h>
^~~~~~~~~~~~~~~~
Fixes: b1b65397d0c4978e36a84c0a1c98a4bd6cb9588e
("meson: Build gallium auxiliary")
Reviewed-by: Dylan Baker <[email protected]>
|
|
|
|
|
|
| |
To silence compiler warning about unhandled switch cases.
Reviewed-by: Charmaine Lee <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
We need to init the cb_shader_format correctly with the changed
col_format, so this moves the col_format adjustment to before the
adjustment to before the cb_shader_mask gets generated.
Fixes: 06d3c650980 "radv: fix a GPU hang when MRTs are sparse"
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106903
CC: 18.1 <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On GFX8+, there is a bug that affects TC-compatible depth surfaces
when the ZRange is not reset after LateZ kills pixels.
The workaround is to always set DB_Z_INFO.ZRANGE_PRECISION to match
the last fast clear value. Because the value is set to 1 by default,
we only need to update it when clearing Z to 0.0.
We also need to set the depth clear regs and to update
ZRANGE_PRECISION when initializing a TC-compat depth image to 0.
Original patch from James Legg.
This fixes random CTS fails with
dEQP-VK.renderpass.suballocation.formats.d32_sfloat_s8_uint.input.*
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105396
CC: <[email protected]>
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the application asks for the maximum number of fragment input
components (128), use all of them plus some builtins that are
passed in the VUE, then we exceed the maximum number of used VUE
slots (32) and we break one assert that checks this limit.
Also, with separate shader objects, we add CLIP_DIST0, CLIP_DIST1
builtins in brw_compute_vue_map() because we don't know if
gl_ClipDistance is going to be read/write by an adjacent stage.
Fixes VK-GL-CTS CL#2569.
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
This fixes:
GL45-CTS.pipeline_statistics_query_tests_ARB.functional_compute_shader_invocations
Cc: 18.0 18.1 <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
| |
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
|
|
| |
This may not be needed yet, but let's set it now.
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
| |
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
|
|
| |
This is based on our docs (recently updated), not amdvlk.
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
| |
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
| |
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
| |
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
|
|
|
|
|
| |
They have a different frequency of updates and don't change when scissors
change.
I think this even fixes something in si_update_vs_viewport_state.
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
|
|
| |
This might improve performance on Vega10 and Raven.
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
| |
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
|
|
| |
same as amdvlk.
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
|
|
| |
This was missed when adding CLIPVERTEX support into GS & tess.
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
| |
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
|
|
| |
RADV might wanna use this helper too.
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
| |
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
| |
The change from MIN2 to MAX2 is intentional.
Cc: 18.1 <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
| |
virgl should now expose GL4.1 where it can.
|
|
|
|
|
|
|
|
|
|
| |
This should add all the pieces to enable tess shaders on virgl.
v2: fixup transform to handle tess and strip out precise.
set default for max patch varyings to work around issue when
tess gets enabled from v1 caps but v2 caps aren't in place. (Elie)
Reviewed-by: Elie Tournier <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
GLSL 4.60 offically added this but games and older CTS suites actually
had shaders that did this, we may as well enable it everywhere.
Adding stable because it appears apps in the wild do this.
Acked-by: Timothy Arceri <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Cc: <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This causes rendering issues in Shadow Warrior 2 with DXVK.
Cc: [email protected]
Fixes: ccc64f3133 ("radv: enable TC-compat HTILE for 16-bit depth surfaces on GFX8")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106912
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Some platforms have 64-bit __atomic_load_n but not 64-bit
__atomic_add_fetch, so test for both of them.
Bug: https://bugs.gentoo.org/655616
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
Reviewed-by: Dylan Baker <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Some platforms have 64-bit __atomic_load_n but not 64-bit
__atomic_add_fetch, so test for both of them.
Bug: https://bugs.gentoo.org/655616
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
Reviewed-by: Dylan Baker <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 54ba73ef102f (configure.ac/meson.build: Fix -latomic test) fixed
some checks for -latomic, and then commit 54bbe600ec26 (configure.ac:
rework -latomic check) further extended the fixes in configure.ac but
not in Meson. This commit extends those fixes to the Meson tests.
Fixes: 54bbe600ec26 (configure.ac: rework -latomic check)
Reviewed-by: Eric Engestrom <[email protected]>
Reviewed-by: Dylan Baker <[email protected]>
|
|
|
|
|
|
|
|
| |
v3: - Remove "won't do" todos, so only completed todo's are now removed.
Signed-off-by: Dylan Baker <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
Reviewed-by: Matt Turner <[email protected]> (v2)
|
|
|
|
|
|
|
|
|
|
|
|
| |
meson 0.43 gained support for optional modules, which clover wold like
to use. Since we require 0.44.1 now we can rely on them being available
for clover.
compile tested only.
Signed-off-by: Dylan Baker <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
v2: - Use -mpower8-vector in compiler test for altivec
- rename altivec option to power8
- reword power8 option description to be more clear, originally I
had made it a boolean, but replaced it with an auto option.
Signed-off-by: Dylan Baker <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This was blindly copied from autotools and tested by a helpful gentoo
user.
Signed-off-by: Dylan Baker <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
| |
v2: - split this from the next patch
- Only include x86-64 and not x86 when buiding x86_64
Signed-off-by: Dylan Baker <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This just makes using cc and cpp easier.
v2: - Add this patch to fix altivec
Signed-off-by: Dylan Baker <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
| |
This reverts commit b8fa847c2ed9c7c743f31e57560a09fae3992f46.
This broke about 30k Vulkan CTS tests.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The UBO push analysis pass incorrectly assumed that all values would fit
within a 32B chunk, and only recorded a bit for the 32B chunk containing
the starting offset.
For example, if a UBO contained the following, tightly packed:
vec4 a; // [0, 16)
float b; // [16, 20)
vec4 c; // [20, 36)
then, c would start at offset 20 / 32 = 0 and end at 36 / 32 = 1,
which means that we ought to record two 32B chunks in the bitfield.
Similarly, dvec4s would suffer from the same problem.
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
|
| |
brw_bufmgr.h uses time_t without include time.h, so the build fails under musl.
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes to avoid building error after change in image->planes[] structure,
{bo,bo_offset} has to be replaced by address.{bo,offset}
and update is needed also in the assert() for debug builds.
external/mesa/src/intel/vulkan/anv_android.c:188:21:
error: no member named 'bo' in 'struct anv_image::(anonymous at external/mesa/src/intel/vulkan/anv_private.h:2647:4)'
image->planes[0].bo = bo;
~~~~~~~~~~~~~~~~ ^
1 error generated.
Fixes: bf34ef16ac ("anv: Use an address for each anv_image plane")
Signed-off-by: Mauro Rossi <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Changes to avoid building error:
external/mesa/src/intel/vulkan/anv_android.c:131:72:
error: too few arguments to function call, expected 5, have 4
result = anv_bo_cache_import(device, &device->bo_cache, dma_buf, &bo);
~~~~~~~~~~~~~~~~~~~ ^
1 error generated.
(v2) Set the correct bo_flags based on support of 48bit addresses and soft-pin
Fixes: b0d50247a7 ("anv/allocator: Set the BO flags in bo_cache_alloc/import")
Fixes: e7d0378bd9 ("anv: Soft-pin client-allocated memory")
Signed-off-by: Mauro Rossi <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We were enabling undefined memory checking for genxml values based on
Valgrind being installed at build time, even for release builds. This
generates piles and piles of assembly whenever you touch genxml.
With gcc 7.3.1 and -O3 and -march=native on a Kabylake with Valgrind
installed at build time:
text data bss dec hex filename
5978385 262884 13488 6254757 5f70a5 libvulkan_intel.so
3799377 262884 13488 4075749 3e30e5 libvulkan_intel.so
That's a 36% reduction in text size.
Fixes: 047ed02723071d7eccbed3210b5be6ae73603a53 (vk/emit: Use valgrind to validate every packed field)
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Now that we're using GitLab, let's take advantage of the "landing page"
README feature with some minimal information, mostly to point people to
the right resources.
Acked-by: Dylan Baker <[email protected]>
Acked-by: Jason Ekstrand <[email protected]>
Signed-off-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
v2: intel_miptree_release() already takes care of the planes, no need
to hand-code the loop (Lionel)
Coverity ID: 1436909
Fixes: 3352f2d746d3959b22ca4 "i965: Create multiple miptrees for planar YUV images"
Reviewed-by: Lionel Landwerlin <[email protected]>
Signed-off-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
| |
At least for PIPE_BUFFER, we could get the resource used as (for
example) R32F imageBuffer. So using cpp=1 from the rsc is wrong.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
copy-pasta fail from how SSBO sizes are handled.
Signed-off-by: Rob Clark <[email protected]>
|