| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
Apparently the android part was never ported to meson.
CC: <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
For the block BLOCK_TEXEL_VIEW_COMPATIBLE case, this didn't matter
because the flags were already more-or-less what we wanted. However,
for gen7 stencil shadow images, it still had ISL_SURF_USAGE_STENCIL_BIT
so we were getting W-tiled which isn't what we want for the shadow. By
passing just ISL_SURF_USAGE_TEXTURE_BIT (and CUBE if we care), we now
get something that's actually texturable.
Fixes: f3ea0cf828 "anv: Add stencil texturing support for gen7"
|
|
|
|
|
|
|
|
|
| |
Copies to a shadow image happen during a VkCmdPipelineBarrier or at
subpass transitions. We could potentially be a bit more conservative
but these transitions shouldn't happen often and it's better to have our
bases covered.
Fixes: f3ea0cf828 "anv: Add stencil texturing support for gen7"
|
|
|
|
| |
%lu is for unsigned long, %zu is for size_t. Just cast the data.
|
|
|
|
|
|
|
| |
Using the existing VK_EXT_debug_report extension.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This should fix floating-point border color on all gen7 HW. Integer is
still thoroughly busted on gen7 because it doesn't exist on IVB and it's
crazy on HSW.
Cc: [email protected]
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Whenever stencil texturing is not required (most of the time), we can
use VK_EXT_separate_stencil_usage to only create the shadow image when
VK_IMAGE_USAGE_SAMPLED_BIT is required for stencil. Of course, this
depends on applications to use the extension but hopefully DXVK and
similar translators are doing so and that covers most of the apps.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Intel hardware didn't get support for sampling from W-tiled (required
for stencil) images until Broadwell so we can't directly sample from
stencil. Instead, if we want to support stencil texturing on gen7
hardware, we have to keep a texture-capable shadow copy around and use
BLORP to update when stencil changes. The one thing this commit does
not implement is self-dependencies with stencil input attachments.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99493
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The main motivation for this change is API ergonomics: most operations
on dynarrays are really on elements, not on bytes, so it's weird to have
grow and resize as the odd operations out.
The secondary motivation is memory safety. Users of the old byte-oriented
functions would often multiply a number of elements with the element size,
which could overflow, and checking for overflow is tedious.
With this change, we only need to implement the overflow checks once.
The checks are cheap: since eltsize is a compile-time constant and the
functions should be inlined, they only add a single comparison and an
unlikely branch.
v2:
- ensure operations are no-op when allocation fails
- in util_dynarray_clone, call resize_bytes with a compile-time constant element size
v3:
- fix iris, lima, panfrost
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
This significantly slows down the CTS runs.
Signed-off-by: Lionel Landwerlin <[email protected]>
Fixes: 32ffd90002b04b ("anv: add support for INTEL_DEBUG=bat")
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Modern DXVK requires event support [1], but looks like it only
uses vkCmdSetEvent() + vkGetEventStatus(). So we can just
borrow the relevant code from gen8, leaving CmdWaitEvents still
unimplemented.
[1] https://github.com/doitsujin/dxvk/commit/8c3900c533d83d12c970b905183d17a1d3e8df1f
v2: Also move CmdWaitEvents into genX_cmd_buffer.c (Jason)
Signed-off-by: Ville Syrjälä <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The other sources of the bcsel behave like the sources of an and or
other logical operation. However, source zero behaves differently.
It is evaluated as a Boolean, so it needs to be resolved.
No shader-db changes, but the tests mentioned in the bug get a couple
instructions added back.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110857
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
According to the Vulkan spec, inline uniform blocks are not allowed
to be updated through vkCmdPushDescriptorSetKHR().
These are the spec quotes from "13.2.1. Descriptor Set Layout"
that are relevant for this case:
"VK_DESCRIPTOR_SET_LAYOUT_CREATE_PUSH_DESCRIPTOR_BIT_KHR specifies
that descriptor sets must not be allocated using this layout, and
descriptors are instead pushed by vkCmdPushDescriptorSetKHR."
"If flags contains
VK_DESCRIPTOR_SET_LAYOUT_CREATE_PUSH_DESCRIPTOR_BIT_KHR, then all
elements of pBindings must not have a descriptorType of
VK_DESCRIPTOR_TYPE_INLINE_UNIFORM_BLOCK_EXT".
There is no explicit mention in vkCmdPushDescriptorSetKHR() to forbid
this case but it is implied in the creation of the descriptor set
layout as aforementioned.
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
We were dropping "/' around arguments grouped together.
This was triggering failures with :
$ ./framemetrics -g "Memory Writes Distribution Gen9" -o /tmp/output.csv -f ./my.trace 10 11
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This restriction was accidentally added to the BSpec/PRM as an
unrestricted restriction starting with the HSW docs and it was never
removed. However, it only ever applied to HSW and actually potentially
causes problems on BDW and above where we have mipmapped fast-clears.
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On CNL+, the clear color struct is composed of RGBA channel values and
fields which are either reserved by the HW or used to control
fast-clears. Currently anv initializes the channel values to zero and
allows the other fields to be undefined.
Satisfy the MBZ field requirements by removing an optimization that
doesn't hold true for CNL+ and pulling in the number of dwords to
initialize from ISL.
Cc: <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I recently discovered that the following code lead to valgrind errors:
struct isl_swizzle swizzle = ISL_SWIZZLE_IDENTITY;
VALGRIND_CHECK_MEM_IS_DEFINED(&swizzle, sizeof(swizzle));
which is surprising, because struct isl_swizzle is simply:
struct isl_swizzle {
enum isl_channel_select r:4;
enum isl_channel_select g:4;
enum isl_channel_select b:4;
enum isl_channel_select a:4;
};
and the above code initializes all of them with a C99 initializer.
Iván Briano reminded me that C99 initializers don't necessarily zero
padding. A quick inspection revealed that sizeof(struct isl_swizzle)
was 4 (rather than the expected 2). Ian Romanick suggested changing
it to uint16_t, since this is essentially dicing up an unsigned, and
that worked.
This patch marks enum isl_channel_select packed, changing its size
from 4 bytes to 1 byte. This then makes struct isl_swizzle 2 bytes,
with no bogus padding fields. This eliminates valgrind undefined
memory warnings.
These isl_swizzle values become part of our BLORP blit program keys,
which are then hashed. This undefined padding was being included in
the hashing, possibly leading to issues. I originally saw this error
when running KHR-GL45.texture_size_promotion.functional in iris under
valgrind.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
AHARDWAREBUFFER_FORMAT_Y8Cb8Cr8_420 is an implementation defined
flexible YUV format. Most of the times, it's NV12 or YV12.
On Intel, NV12 is preferred since it can be used by the display
engine.
This API adds a dependency between gralloc and buffer consumers,
unfortunately. Right now, the code seems to work for i915 gralloc,
but not cros_gralloc. Add a preprocessor flag to fix this.
TEST=android.graphics.cts.MediaVulkanGpuTest#testMediaImportAndRendering
Reviewed-by: Tapani Pälli <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Checking isl_fmt returned value in assert seems appropriate
instead of format variable.
Fixes: f1654fa7e31 "anv/android: support creating images from external format"
Signed-off-by: Nataraj Deshpande <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
Reviewed-by: Sagar Ghuge <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
resolve analysis
If the 2nd and 3rd source are both Boolean values, we can potentially
avoid a resolve by only resolving the result of the b32csel.
No changes on any Gen6+ Intel platform.
v2: Use ?: instead of cast from bool to unsigned. Suggested by Caio.
Iron Lake
total instructions in shared programs: 8142729 -> 8142677 (<.01%)
instructions in affected programs: 12890 -> 12838 (-0.40%)
helped: 26
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.25% max: 0.74% x̄: 0.45% x̃: 0.38%
95% mean confidence interval for instructions value: -2.00 -2.00
95% mean confidence interval for instructions %-change: -0.52% -0.39%
Instructions are helped.
total cycles in shared programs: 188549632 -> 188549394 (<.01%)
cycles in affected programs: 60754 -> 60516 (-0.39%)
helped: 25
HURT: 1
helped stats (abs) min: 2 max: 26 x̄: 9.92 x̃: 8
helped stats (rel) min: 0.07% max: 2.23% x̄: 0.59% x̃: 0.27%
HURT stats (abs) min: 10 max: 10 x̄: 10.00 x̃: 10
HURT stats (rel) min: 0.70% max: 0.70% x̄: 0.70% x̃: 0.70%
95% mean confidence interval for cycles value: -12.91 -5.40
95% mean confidence interval for cycles %-change: -0.84% -0.23%
Cycles are helped.
GM45
total instructions in shared programs: 5013119 -> 5013093 (<.01%)
instructions in affected programs: 6764 -> 6738 (-0.38%)
helped: 13
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.24% max: 0.68% x̄: 0.43% x̃: 0.36%
95% mean confidence interval for instructions value: -2.00 -2.00
95% mean confidence interval for instructions %-change: -0.52% -0.34%
Instructions are helped.
total cycles in shared programs: 128977804 -> 128977700 (<.01%)
cycles in affected programs: 37738 -> 37634 (-0.28%)
helped: 13
HURT: 0
helped stats (abs) min: 8 max: 8 x̄: 8.00 x̃: 8
helped stats (rel) min: 0.18% max: 0.46% x̄: 0.30% x̃: 0.26%
95% mean confidence interval for cycles value: -8.00 -8.00
95% mean confidence interval for cycles %-change: -0.36% -0.24%
Cycles are helped.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously we would blindly emit an sequence like:
mov(1) f0.1<1>UW g1.14<0,1,0>UW
...
cmp.l.f0(16) g7<1>F g5<8,8,1>F 0x41700000F /* 15F */
(+f0.1) cmp.z.f0.1(16) null<1>D g7<8,8,1>D 0D
The first move sets the flags based on the initial execution mask.
Later discard sequences contain a predicated compare that can only
remove more SIMD channels. Often times the only user of the result from
the first compare is the second compare. Instead, generate a sequence
like
mov(1) f0.1<1>UW g1.14<0,1,0>UW
...
cmp.l.f0(16) g7<1>F g5<8,8,1>F 0x41700000F /* 15F */
(+f0.1) cmp.ge.f0.1(8) null<1>F g5<8,8,1>F 0x41700000F /* 15F */
If the results stored in g7 and f0.0 are not used, the comparison will
be eliminated. This removes an instruction and potentially reduces
register pressure.
v2: Major re-write of the commit message (including fixing the assembly
code). Suggested by Matt.
All Gen8+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 17224434 -> 17198659 (-0.15%)
instructions in affected programs: 2908125 -> 2882350 (-0.89%)
helped: 18891
HURT: 5
helped stats (abs) min: 1 max: 12 x̄: 1.38 x̃: 1
helped stats (rel) min: 0.03% max: 25.00% x̄: 1.76% x̃: 1.02%
HURT stats (abs) min: 9 max: 105 x̄: 51.40 x̃: 35
HURT stats (rel) min: 0.43% max: 4.92% x̄: 2.34% x̃: 1.56%
95% mean confidence interval for instructions value: -1.39 -1.34
95% mean confidence interval for instructions %-change: -1.79% -1.73%
Instructions are helped.
total cycles in shared programs: 361468458 -> 361170679 (-0.08%)
cycles in affected programs: 38470116 -> 38172337 (-0.77%)
helped: 16202
HURT: 1456
helped stats (abs) min: 1 max: 4473 x̄: 26.24 x̃: 18
helped stats (rel) min: <.01% max: 28.44% x̄: 2.90% x̃: 2.18%
HURT stats (abs) min: 1 max: 5982 x̄: 87.51 x̃: 28
HURT stats (rel) min: <.01% max: 51.29% x̄: 5.48% x̃: 1.64%
95% mean confidence interval for cycles value: -18.24 -15.49
95% mean confidence interval for cycles %-change: -2.26% -2.14%
Cycles are helped.
total spills in shared programs: 12147 -> 12176 (0.24%)
spills in affected programs: 175 -> 204 (16.57%)
helped: 8
HURT: 5
total fills in shared programs: 25262 -> 25292 (0.12%)
fills in affected programs: 269 -> 299 (11.15%)
helped: 8
HURT: 5
Haswell
total instructions in shared programs: 13530316 -> 13502647 (-0.20%)
instructions in affected programs: 2507824 -> 2480155 (-1.10%)
helped: 18859
HURT: 10
helped stats (abs) min: 1 max: 12 x̄: 1.48 x̃: 1
helped stats (rel) min: 0.03% max: 27.78% x̄: 2.38% x̃: 1.41%
HURT stats (abs) min: 5 max: 39 x̄: 25.70 x̃: 31
HURT stats (rel) min: 0.22% max: 1.66% x̄: 1.09% x̃: 1.31%
95% mean confidence interval for instructions value: -1.49 -1.44
95% mean confidence interval for instructions %-change: -2.42% -2.34%
Instructions are helped.
total cycles in shared programs: 377865412 -> 377639034 (-0.06%)
cycles in affected programs: 40169572 -> 39943194 (-0.56%)
helped: 15550
HURT: 1938
helped stats (abs) min: 1 max: 2482 x̄: 25.67 x̃: 18
helped stats (rel) min: <.01% max: 37.77% x̄: 3.00% x̃: 2.25%
HURT stats (abs) min: 1 max: 4862 x̄: 89.17 x̃: 35
HURT stats (rel) min: <.01% max: 67.67% x̄: 6.16% x̃: 2.75%
95% mean confidence interval for cycles value: -14.42 -11.47
95% mean confidence interval for cycles %-change: -2.05% -1.91%
Cycles are helped.
total spills in shared programs: 26769 -> 26814 (0.17%)
spills in affected programs: 826 -> 871 (5.45%)
helped: 9
HURT: 10
total fills in shared programs: 38383 -> 38425 (0.11%)
fills in affected programs: 834 -> 876 (5.04%)
helped: 9
HURT: 10
LOST: 5
GAINED: 10
Ivy Bridge
total instructions in shared programs: 12079250 -> 12044139 (-0.29%)
instructions in affected programs: 2409680 -> 2374569 (-1.46%)
helped: 16135
HURT: 0
helped stats (abs) min: 1 max: 23 x̄: 2.18 x̃: 2
helped stats (rel) min: 0.07% max: 37.50% x̄: 2.72% x̃: 1.68%
95% mean confidence interval for instructions value: -2.21 -2.14
95% mean confidence interval for instructions %-change: -2.76% -2.67%
Instructions are helped.
total cycles in shared programs: 180116747 -> 179900405 (-0.12%)
cycles in affected programs: 25439823 -> 25223481 (-0.85%)
helped: 13817
HURT: 1499
helped stats (abs) min: 1 max: 1886 x̄: 26.40 x̃: 18
helped stats (rel) min: <.01% max: 38.84% x̄: 2.57% x̃: 1.97%
HURT stats (abs) min: 1 max: 3684 x̄: 98.99 x̃: 52
HURT stats (rel) min: <.01% max: 97.01% x̄: 6.37% x̃: 3.42%
95% mean confidence interval for cycles value: -15.68 -12.57
95% mean confidence interval for cycles %-change: -1.77% -1.63%
Cycles are helped.
LOST: 8
GAINED: 10
Sandy Bridge
total instructions in shared programs: 10878990 -> 10863659 (-0.14%)
instructions in affected programs: 1806702 -> 1791371 (-0.85%)
helped: 13023
HURT: 0
helped stats (abs) min: 1 max: 5 x̄: 1.18 x̃: 1
helped stats (rel) min: 0.07% max: 13.79% x̄: 1.65% x̃: 1.10%
95% mean confidence interval for instructions value: -1.18 -1.17
95% mean confidence interval for instructions %-change: -1.68% -1.62%
Instructions are helped.
total cycles in shared programs: 154082878 -> 153862810 (-0.14%)
cycles in affected programs: 20199374 -> 19979306 (-1.09%)
helped: 12048
HURT: 510
helped stats (abs) min: 1 max: 323 x̄: 20.57 x̃: 18
helped stats (rel) min: 0.03% max: 17.78% x̄: 2.05% x̃: 1.52%
HURT stats (abs) min: 1 max: 448 x̄: 54.39 x̃: 16
HURT stats (rel) min: 0.02% max: 37.98% x̄: 4.13% x̃: 1.17%
95% mean confidence interval for cycles value: -17.97 -17.08
95% mean confidence interval for cycles %-change: -1.84% -1.75%
Cycles are helped.
LOST: 1
GAINED: 0
Iron Lake
total instructions in shared programs: 8155075 -> 8142729 (-0.15%)
instructions in affected programs: 949495 -> 937149 (-1.30%)
helped: 5810
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 2.12 x̃: 2
helped stats (rel) min: 0.10% max: 16.67% x̄: 2.53% x̃: 1.85%
95% mean confidence interval for instructions value: -2.14 -2.11
95% mean confidence interval for instructions %-change: -2.59% -2.48%
Instructions are helped.
total cycles in shared programs: 188584610 -> 188549632 (-0.02%)
cycles in affected programs: 17274446 -> 17239468 (-0.20%)
helped: 3881
HURT: 90
helped stats (abs) min: 2 max: 168 x̄: 9.08 x̃: 6
helped stats (rel) min: <.01% max: 23.53% x̄: 0.83% x̃: 0.30%
HURT stats (abs) min: 2 max: 10 x̄: 2.80 x̃: 2
HURT stats (rel) min: <.01% max: 0.60% x̄: 0.10% x̃: 0.07%
95% mean confidence interval for cycles value: -9.35 -8.27
95% mean confidence interval for cycles %-change: -0.85% -0.77%
Cycles are helped.
GM45
total instructions in shared programs: 5019308 -> 5013119 (-0.12%)
instructions in affected programs: 489028 -> 482839 (-1.27%)
helped: 2912
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 2.13 x̃: 2
helped stats (rel) min: 0.10% max: 16.67% x̄: 2.46% x̃: 1.81%
95% mean confidence interval for instructions value: -2.14 -2.11
95% mean confidence interval for instructions %-change: -2.54% -2.39%
Instructions are helped.
total cycles in shared programs: 129002592 -> 128977804 (-0.02%)
cycles in affected programs: 12669152 -> 12644364 (-0.20%)
helped: 2759
HURT: 37
helped stats (abs) min: 2 max: 168 x̄: 9.03 x̃: 4
helped stats (rel) min: <.01% max: 21.43% x̄: 0.75% x̃: 0.31%
HURT stats (abs) min: 2 max: 10 x̄: 3.62 x̃: 4
HURT stats (rel) min: <.01% max: 0.41% x̄: 0.10% x̃: 0.04%
95% mean confidence interval for cycles value: -9.53 -8.20
95% mean confidence interval for cycles %-change: -0.79% -0.70%
Cycles are helped.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is the same as the need_dest parameter to
prepare_alu_destination_and_sources. This allows us to not change the
register that is expected to hold an result if an instruction is
re-emitted. This is particularly a problem if the re-emitted
instruction is a partial write. A later patch will use this feature.
No shader-db changes on any Intel platform.
v2: Don't do the Boolean resolve when there is no destination. If the
ALU instruction didn't write a register, there's nothing to resolve.
This replaces an earlier patch "intel/fs: Allocate dummy destination
register when need_dest is false".
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This also helps a later patch (intel/fs: Improve discard_if code
generation) on about 200 shaders.
v2: Document that other instruction sequences are also valid in
subtract_merge_with_compare_intervening_mismatch_flag_write. Suggested
by Caio.
All Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 17224438 -> 17224434 (<.01%)
instructions in affected programs: 296 -> 292 (-1.35%)
helped: 4
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.99% max: 1.92% x̄: 1.43% x̃: 1.40%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -2.04% -0.81%
Instructions are helped.
total cycles in shared programs: 361468455 -> 361468458 (<.01%)
cycles in affected programs: 2862 -> 2865 (0.10%)
helped: 2
HURT: 2
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.24% max: 0.39% x̄: 0.31% x̃: 0.31%
HURT stats (abs) min: 3 max: 4 x̄: 3.50 x̃: 3
HURT stats (rel) min: 0.32% max: 0.70% x̄: 0.51% x̃: 0.51%
95% mean confidence interval for cycles value: -4.34 5.84
95% mean confidence interval for cycles %-change: -0.70% 0.90%
Inconclusive result (value mean confidence interval includes 0).
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There were two errors. First, the pass could propagate conditional
modifiers from an instruction that writes on flag register to an
instruction that writes a different flag register. For example,
cmp.nz.f0.0(16) null:F, vgrf6:F, vgrf5:F
cmp.nz.f0.1(16) null:F, vgrf6:F, vgrf5:F
could be come
cmp.nz.f0.0(16) null:F, vgrf6:F, vgrf5:F
Second, if an instruction writes f0.1 has it's condition propagated, the
modified instruction will incorrectly write flag f0.0. For example,
linterp(16) vgrf6:F, g2:F, attr0:F
cmp.z.f0.1(16) null:F, vgrf6:F, vgrf5:F
(-f0.1) discard_jump(16) (null):UD
could become
linterp.z.f0.0(16) vgrf6:F, g2:F, attr0:F
(-f0.1) discard_jump(16) (null):UD
None of these cases will occur currently. The only time we use f0.1 is
for generating discard intrinsics. In all those cases, we generate a
squence like:
cmp.nz.f0.0(16) vgrf7:F, vgrf6:F, vgrf5:F
(+f0.1) cmp.z(16) null:D, vgrf7:D, 0d
(-f0.1) discard_jump(16) (null):UD
Due to the mixed types and incompatible conditions, this sequence would
never see any cmod propagation. The next patch will change this.
No shader-db changes on any Intel platform.
v2: Fix typo in comment in test case subtract_delete_compare_other_flag.
Noticed by Caio.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
Tests like this should have been added in 4467040cb65 ("i965/fs:
Propagate conditional modifiers from not instructions").
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch makes blorp_blit handle SINT<->UINT blit value clamping.
After reading the source's integer data (which is expanded to 32-bit),
we either IMAX with 0 (for SINT -> UINT, to clamp negative numbers) or
UMIN with (1 << 31) - 1 (for UINT -> SINT, to clamp positive numbers
outside of the representable range).
Such blits are not allowed by the OpenGL or Vulkan APIs directly:
The Vulkan 1.1 spec for vkCmdBlitImage says:
"Integer formats can only be converted to other integer formats with
the same signedness."
The GL 4.5 spec for glBlitFramebuffer says:
"An INVALID_OPERATION error is generated if format conversions are
not supported, which occurs under any of the following conditions:
[...]
* The read buffer contains unsigned integer values and any draw
buffer does not contain unsigned integer values.
* The read buffer contains signed integer values and any draw buffer
does not contain signed integer values."
However, they are useful for other operations, such as texture upload
and download, which typically are implemented via blorp_blit(). i965
has code to fall back in this case (which the next commit will delete),
and Gallium expects blit() to handle this case for texture upload.
Fixes the following tests on iris:
- GTF-GL46.gtf32.GL3Tests.packed_pixels.packed_pixels
- GTF-GL46.gtf32.GL3Tests.packed_pixels.packed_pixels_pbo
- GTF-GL46.gtf32.GL3Tests.packed_pixels.packed_pixels_pixelstore
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
| |
This let deref optimizations apply to globals before lowering them.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We originally had a single lower_fmod option. In commit 2ab2d2e5, Sam
split 32 and 64-bit lowering into separate flags, with the rationale
that some drivers might want different options there. This left 16-bit
unhandled, so Iago added a lower_fmod16 option in commit ca31df6f.
Now that lower_fmod64 is gone (in favor of nir_lower_doubles and
nir_lower_dmod), we re-combine lower_fmod16 and lower_fmod32 into a
single lower_fmod flag again. I'm not aware of any hardware which
need lowering for one bitsize and not the other.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
nir_lower_doubles offers a wide variety of fp64 lowering, including
lowering fmod@64. The version there also better handles imprecisions
due to lowered frcp@64. Let's consolidate on one version.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We're currently trying to detect dynamic loading config support by
trying to remove to test config (hard coded in the i915 driver) and
checking we get ENOENT.
This can fail if the test config was updated in Mesa but not yet in
i915.
A better way to do this is to pick an invalid ID and check for ENOENT.
Signed-off-by: Lionel Landwerlin <[email protected]>
Cc: <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
Since NIR_PASS no longer swaps out the NIR pointer when NIR_TEST_* is
enabled, we can just take a single pointer and not a pointer to pointer.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
Now that NIR_TEST_* doesn't swap the shader out from under us, it's
sufficient to just modify the shader rather than having to return in
case we're testing serialization or cloning.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
EuThreadsCount is supposed to be the number of threads per EU, not the
total number of threads in the whole device.
Signed-off-by: Lionel Landwerlin <[email protected]>
Fixes: 1fc7b951278428 ("i965: Add Gen8+ INTEL_performance_query support")
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Fixes formatting errors for 32 bit compilations, eg:
error: format ‘%lx’ expects argument of type ‘long unsigned int’,
but argument 5 has type ‘uint64_t’ {aka ‘long long unsigned int’}
[-Werror=format=]
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With 8 and 16-bit types and anything where we have to use non-trivial
strides registersto deal with restrictions, we end up with things that
look like partial writes even though we don't care about any values in
the register except those written by that instruction. This is
particularly important when dealing with loops because liveness sees
is_partial_write and the fact that an old version from a previous loop
iteration may be valid at that point and extends all purely partially
written values to the entire loop.
This commit adds a new UNDEF instruction which does nothing (the
generator doesn't emit anything) but which does a fake write to the
register. This informs liveness that we don't care about any values
before that point so it won't consider those registers to be falsely
live. We can safely emit UNDEF instructions for all SSA values that
come in from NIR and nearly all temporaries generated by various stages
of the compiler. In particular, we need to insert UNDEF instructions
when we handle region restrictions because the newly allocated registers
are almost guaranteed to be partially written.
No shader-db changes.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110432
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
| |
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This might be slightly faster since we're doing one read rather than
two before we decide to skip. The more important reason, however, is
because no_spill prevents us from re-spilling spill registers. In the
new world in which we don't re-calculate liveness every spill, we may
not have valid liveness for spill registers so we shouldn't even look
their live ranges up.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110825
Fixes: e99081e76d4 "intel/fs/ra: Spill without destroying the..."
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Tested-by: Tapani Pälli <[email protected]>
|
|
|
|
|
|
|
|
|
| |
v2: Fix assertion for src1 (Ian Romanick)
Fixes: 3b967e17 (intel/compiler: Avoid false positive assertions)
Signed-off-by: Sagar Ghuge <[email protected]>
Suggested-by: Matt Turner <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Almost all of the spill / fill benefit is in Deus Ex.
Haswell and all Gen8+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 17224438 -> 17196395 (-0.16%)
instructions in affected programs: 1518658 -> 1490615 (-1.85%)
helped: 1550
HURT: 3
helped stats (abs) min: 1 max: 170 x̄: 18.11 x̃: 2
helped stats (rel) min: 0.04% max: 8.35% x̄: 1.12% x̃: 0.45%
HURT stats (abs) min: 5 max: 10 x̄: 6.67 x̃: 5
HURT stats (rel) min: 0.32% max: 0.41% x̄: 0.35% x̃: 0.32%
95% mean confidence interval for instructions value: -19.86 -16.26
95% mean confidence interval for instructions %-change: -1.19% -1.04%
Instructions are helped.
total cycles in shared programs: 361468455 -> 361288721 (-0.05%)
cycles in affected programs: 197367688 -> 197187954 (-0.09%)
helped: 990
HURT: 683
helped stats (abs) min: 1 max: 119045 x̄: 806.00 x̃: 16
helped stats (rel) min: <.01% max: 38.56% x̄: 1.06% x̃: 0.26%
HURT stats (abs) min: 1 max: 12190 x̄: 905.14 x̃: 22
HURT stats (rel) min: <.01% max: 25.18% x̄: 1.16% x̃: 0.47%
95% mean confidence interval for cycles value: -315.45 100.58
95% mean confidence interval for cycles %-change: -0.31% <.01%
Inconclusive result (value mean confidence interval includes 0).
total spills in shared programs: 12147 -> 8948 (-26.34%)
spills in affected programs: 5433 -> 2234 (-58.88%)
helped: 343
HURT: 0
total fills in shared programs: 25262 -> 21814 (-13.65%)
fills in affected programs: 7771 -> 4323 (-44.37%)
helped: 343
HURT: 3
LOST: 0
GAINED: 17
Ivy Bridge
total instructions in shared programs: 12083517 -> 12081427 (-0.02%)
instructions in affected programs: 540744 -> 538654 (-0.39%)
helped: 786
HURT: 29
helped stats (abs) min: 1 max: 42 x̄: 2.70 x̃: 2
helped stats (rel) min: 0.06% max: 5.44% x̄: 0.55% x̃: 0.36%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 0.16% max: 0.95% x̄: 0.38% x̃: 0.31%
95% mean confidence interval for instructions value: -2.83 -2.30
95% mean confidence interval for instructions %-change: -0.57% -0.47%
Instructions are helped.
total cycles in shared programs: 180153463 -> 180124798 (-0.02%)
cycles in affected programs: 72597920 -> 72569255 (-0.04%)
helped: 572
HURT: 249
helped stats (abs) min: 1 max: 14830 x̄: 109.48 x̃: 13
helped stats (rel) min: <.01% max: 8.92% x̄: 0.71% x̃: 0.26%
HURT stats (abs) min: 1 max: 11060 x̄: 136.37 x̃: 10
HURT stats (rel) min: <.01% max: 10.85% x̄: 0.54% x̃: 0.32%
95% mean confidence interval for cycles value: -96.22 26.39
95% mean confidence interval for cycles %-change: -0.43% -0.23%
Inconclusive result (value mean confidence interval includes 0).
total spills in shared programs: 3625 -> 3623 (-0.06%)
spills in affected programs: 46 -> 44 (-4.35%)
helped: 1
HURT: 0
total fills in shared programs: 4065 -> 4061 (-0.10%)
fills in affected programs: 104 -> 100 (-3.85%)
helped: 1
HURT: 0
LOST: 0
GAINED: 8
Sandy Bridge
total instructions in shared programs: 10879656 -> 10878699 (<.01%)
instructions in affected programs: 275167 -> 274210 (-0.35%)
helped: 544
HURT: 0
helped stats (abs) min: 1 max: 20 x̄: 1.76 x̃: 1
helped stats (rel) min: 0.06% max: 3.11% x̄: 0.39% x̃: 0.25%
95% mean confidence interval for instructions value: -1.97 -1.55
95% mean confidence interval for instructions %-change: -0.43% -0.36%
Instructions are helped.
total cycles in shared programs: 154089096 -> 154081132 (<.01%)
cycles in affected programs: 4422722 -> 4414758 (-0.18%)
helped: 459
HURT: 214
helped stats (abs) min: 1 max: 258 x̄: 26.67 x̃: 8
helped stats (rel) min: <.01% max: 5.45% x̄: 0.51% x̃: 0.14%
HURT stats (abs) min: 1 max: 226 x̄: 19.99 x̃: 4
HURT stats (rel) min: <.01% max: 3.15% x̄: 0.34% x̃: 0.09%
95% mean confidence interval for cycles value: -15.51 -8.15
95% mean confidence interval for cycles %-change: -0.31% -0.17%
Cycles are helped.
total spills in shared programs: 2880 -> 2876 (-0.14%)
spills in affected programs: 636 -> 632 (-0.63%)
helped: 2
HURT: 0
total fills in shared programs: 3161 -> 3157 (-0.13%)
fills in affected programs: 1519 -> 1515 (-0.26%)
helped: 2
HURT: 0
LOST: 0
GAINED: 2
Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8157361 -> 8155067 (-0.03%)
instructions in affected programs: 382491 -> 380197 (-0.60%)
helped: 677
HURT: 0
helped stats (abs) min: 1 max: 43 x̄: 3.39 x̃: 2
helped stats (rel) min: 0.09% max: 5.19% x̄: 0.66% x̃: 0.42%
95% mean confidence interval for instructions value: -3.76 -3.01
95% mean confidence interval for instructions %-change: -0.72% -0.59%
Instructions are helped.
total cycles in shared programs: 188588292 -> 188583040 (<.01%)
cycles in affected programs: 3155064 -> 3149812 (-0.17%)
helped: 377
HURT: 13
helped stats (abs) min: 2 max: 180 x̄: 14.13 x̃: 6
helped stats (rel) min: <.01% max: 3.96% x̄: 0.39% x̃: 0.12%
HURT stats (abs) min: 2 max: 8 x̄: 5.85 x̃: 6
HURT stats (rel) min: <.01% max: 0.22% x̄: 0.06% x̃: 0.04%
95% mean confidence interval for cycles value: -15.67 -11.27
95% mean confidence interval for cycles %-change: -0.45% -0.30%
Cycles are helped.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Cannonlake hardware adds a new resolve type in 3DSTATE_PS called
FAST_CLEAR_0 which does an ambiguate. Now that the hardware can do it
directly, we should use that instead of binding the CCS as a render
target and doing it manually. This was tested with a full Vulkan CTS
run on Cannonlake.
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
| |
Fixes: 939312702e "i965: Add ARB_fragment_shader_interlock support"
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We set header_present but then pass it some random garbage. Give it g0
instead. I'm not actually sure this does anything but g0 is the usual
header data and this is what the windows driver does so it seems like a
good idea.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
spirv_to_nir() returned the nir_function corresponding to the
entrypoint, as a way to identify it. There's now a bool is_entrypoint
in nir_function and also a helper function to get the entry_point from
a nir_shader.
The return type reflects better what the function name suggests. It
also helps drivers avoid the mistake of reusing internal shader
references after running NIR_PASS on it. When using NIR_TEST_CLONE or
NIR_TEST_SERIALIZE, those would be invalidated right in the first pass
executed.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|