| Commit message | Author | Age | Files | Lines |
| |
In 232ed8980217dd65ab0925df28156f565b94b2e5 "i965/fs: Register allocator
shoudn't use grf127 for sends dest" we didn't take into account the case
of SEND instructions that are not send_from_grf. On Gen7+, although
the backend still uses MRFs internally for sends, they are ultimately
assigned to GRFs.
In the case of unspills, the backend directly assigns the destination as
the source because it is supposed to be available, so we always have a
source-destination overlap. If the register allocator assigns registers
that include grf127, we fail the validation rule that affects Gen8+:
"r127 must not be used for return address when there is a src and dest
overlap in send instruction."
So this patch activates the grf127_send_hack_node for Gen8+ and if we
have any register spilled we add interferences to the destination of
the unspill operations.
We also need to make sure that the opt_bank_conflicts() optimization,
which runs after register allocation, doesn't move things around and
cause grf127 to be used in the condition we were avoiding.
Fixes piglit test tests/spec/arb_compute_shader/linker/bug-93840.shader_test
and some shader-db crashes caused by the grf127 validation rule.
v2: make sure that opt_bank_conflicts() optimization doesn't change
the use of grf127. (Caio)
Found by Caio Marcelo de Oliveira Filho
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107193
Fixes: 232ed89802 "i965/fs: Register allocator shoudn't use grf127 for sends dest"
Cc: 18.1 <[email protected]>
Cc: Caio Marcelo de Oliveira Filho <[email protected]>
Cc: Jason Ekstrand <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
| |
In a single call to vk_errorf() in the Android code, the arguments were
swapped. The bug has existed since day one. Chrome OS used to forgive
the warning, but it is now a compilation error.
CC: <[email protected]>
Fixes: 053d4c32 "anv: Implement VK_ANDROID_native_buffer (v9)"
Reviewed-by: Tapani Pälli <[email protected]>
| |
Changes to vk.xml and anv_entrypoints_gen.py broke the Autotools build
on Android. The changes undef'd the VK_ANDROID_native_buffer entrypoints
in anv_entrypoints.h.
Fix it with CPPFLAGS += -DVK_USE_PLATFORM_ANDROID_KHR.
CC: <[email protected]>
See-Also: 63525ba7 "android: enable VK_ANDROID_native_buffer"
Reviewed-by: Tapani Pälli <[email protected]>
| |
Using -vv will increase the verbosity, by printing the ppgtt mappings as
they get written into the aub file.
Cc: Lionel Landwerlin <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
| |
There are a number of opcode_desc table entries for many of these
unused opcodes. A symbolic opcode enum will be required in a future
commit in order to keep them in the opcode description tables. The
alternative would be to remove the unused opcodes from the opcode
description tables.
Reviewed-by: Kenneth Graunke <[email protected]>
| |
This makes the message length available at the IR level, which should
save some guesswork in a future commit.
Reviewed-by: Kenneth Graunke <[email protected]>
| |
Constructing a descriptor in-place as part of the immediate of an ALU
instruction is no longer supported.
Reviewed-by: Kenneth Graunke <[email protected]>
| |
The return value is not used anymore. This allows simplifying the
code slightly, and in addition it should frustrate anybody's attempts
to continue using the obsolete piecemeal approach to construct a
message descriptor in combination with brw_send_indirect_message().
Reviewed-by: Kenneth Graunke <[email protected]>
| |
All users of brw_send_indirect_surface_message() should be providing a
full descriptor immediate up front by now, so this isn't necessary
anymore.
Reviewed-by: Kenneth Graunke <[email protected]>
| |
Reviewed-by: Kenneth Graunke <[email protected]>
| |
messages.
Reviewed-by: Kenneth Graunke <[email protected]>
| |
v2: Use SET_BITS macro instead of left shift (Ken).
Reviewed-by: Kenneth Graunke <[email protected]>
| |
brw_send_indirect_surface_message().
Instead of the current message_len, response_len and header_present
arguments.
Reviewed-by: Kenneth Graunke <[email protected]>
| |
v2: Use SET_BITS macro instead of left shift (Ken).
Reviewed-by: Kenneth Graunke <[email protected]>
| |
v2: Use SET_BITS macro instead of left shift (Ken).
Reviewed-by: Kenneth Graunke <[email protected]>
| |
v2: Use SET_BITS macro instead of left shift (Ken).
Reviewed-by: Kenneth Graunke <[email protected]>
| |
v2: Use SET_BITS macro instead of left shift (Ken).
Reviewed-by: Kenneth Graunke <[email protected]>
| |
brw_send_indirect_message().
The current approach of returning a setup instruction where additional
descriptor fields can be specified is still supported in order to keep
things working, but it will be removed later in this series.
Reviewed-by: Kenneth Graunke <[email protected]>
| |
Reviewed-by: Kenneth Graunke <[email protected]>
| |
controls.
This replaces brw_set_message_descriptor() with the composition of
brw_set_desc() and a new inline helper function that packs the common
message descriptor controls into an integer. The goal is to represent
all message descriptors as a 32-bit integer which is written at once
into the instruction, which is more flexible (SENDS anyone?), robust
(see d2eecf0b0b24d203d0f171807681dffd830d54de fixing an issue
ultimately caused by some bits of the extended message descriptor
being left undefined) and future-proof than the current approach of
specifying the individual descriptor fields directly into the
instruction.
This approach also seems more self-documenting, since it will allow
removing calls to functions with way too many arguments like
brw_set_*_message() and brw_send_indirect_message(), and instead
provide a single descriptor argument constructed from an appropriate
combination of brw_*_desc() helpers.
Note that because brw_set_message_descriptor() was (conditionally?)
overriding fields of the instruction which strictly speaking weren't
part of the message descriptor, this involves calling
brw_inst_set_sfid() and brw_inst_set_eot() in some cases in addition
to brw_set_desc().
v2: Use SET_BITS macro instead of left shift (Ken).
Reviewed-by: Kenneth Graunke <[email protected]>
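The idea of packing the common descriptor controls into one 32-bit integer and OR-ing in function-specific fields can be illustrated with a minimal sketch. All names below (`sketch_field`, `sketch_message_desc`, `sketch_surface_desc`) are hypothetical, and the bit positions are assumptions chosen for illustration; they are not taken from this commit and may not match the hardware layout on every generation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: place `value` into bits high..low of a 32-bit word,
 * dropping any bits that do not fit in the field. */
static inline uint32_t sketch_field(uint32_t value, unsigned high, unsigned low)
{
    assert(high < 32 && low <= high);
    /* Build the mask in 64 bits so a full 32-bit field cannot overflow. */
    uint32_t mask = (uint32_t)(((UINT64_C(1) << (high - low + 1)) - 1) << low);
    return (value << low) & mask;
}

/* Common controls. Assumed layout (illustration only): message length in
 * bits 28:25, response length in bits 24:20, header present in bit 19. */
static inline uint32_t sketch_message_desc(unsigned msg_length,
                                           unsigned response_length,
                                           bool header_present)
{
    return sketch_field(msg_length, 28, 25) |
           sketch_field(response_length, 24, 20) |
           sketch_field(header_present, 19, 19);
}

/* Hypothetical function-specific controls in the low bits: a binding table
 * index in bits 7:0 and a message control field in bits 13:8. */
static inline uint32_t sketch_surface_desc(unsigned binding_table_index,
                                           unsigned msg_control)
{
    return sketch_field(binding_table_index, 7, 0) |
           sketch_field(msg_control, 13, 8);
}
```

A caller would build the whole descriptor with a single expression such as `sketch_message_desc(2, 1, true) | sketch_surface_desc(5, 3)` and write it into the instruction in one step, so no descriptor bit is left undefined.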
| |
Allows specifying a bitfield based on its upper and lower bounds
instead of a symbolic field definition, much like what the current
GET_BITS macro is to GET_FIELD.
Reviewed-by: Kenneth Graunke <[email protected]>
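As a rough illustration of a bounds-based bitfield helper, here is a minimal sketch written as inline functions. The names `sketch_mask`, `sketch_set_bits` and `sketch_get_bits` are hypothetical, and this is not the actual definition of the SET_BITS or GET_BITS macros; it only shows the general technique of deriving the mask from the upper and lower bit positions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mask covering bits high..low (inclusive). */
static inline uint32_t sketch_mask(unsigned high, unsigned low)
{
    assert(high < 32 && low <= high);
    /* Build the mask in 64 bits so a 32-bit-wide field doesn't overflow. */
    return (uint32_t)(((UINT64_C(1) << (high - low + 1)) - 1) << low);
}

/* Place `value` into bits high..low, dropping anything that doesn't fit. */
static inline uint32_t sketch_set_bits(uint32_t value, unsigned high, unsigned low)
{
    return (value << low) & sketch_mask(high, low);
}

/* Extract bits high..low of `word` as an unshifted value. */
static inline uint32_t sketch_get_bits(uint32_t word, unsigned high, unsigned low)
{
    return (word & sketch_mask(high, low)) >> low;
}
```

Compared with a plain left shift, masking after the shift keeps an out-of-range value from spilling into neighbouring fields.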
| |
instruction.
Reviewed-by: Kenneth Graunke <[email protected]>
| |
descriptor.
This introduces helpers that can be used to specify or extract the
whole descriptor of a SEND message instruction at once. Because the
instruction encoding of these is rather awkward on some
generations, using the generic brw_inst.h macros doesn't seem like an
option.
Reviewed-by: Kenneth Graunke <[email protected]>
| |
Until now we have assumed that we could skip emitting these barriers
in the general case, based on empirical testing and a few assumptions
detailed in a comment in the driver code. However, recent CTS tests
have shown that we actually need them to produce correct behavior.
Reviewed-by: Jason Ekstrand <[email protected]>
| |
Reviewed-by: Keith Packard <[email protected]>
| |
Error states coming from actual Vulkan applications tend to have fairly
long command buffers and lots of chained batches. 30 total BOs isn't
nearly enough. This commit bumps it to 256, makes some things use the
actual number of sections instead of the #define, and adds asserts if we
ever go over 256 sections.
Reviewed-by: Lionel Landwerlin <[email protected]>
| |
Our attempt to restart the loop with the second level batch worked at
one point but has since been broken. It was too fragile anyway, and
we're not likely to have enough secondaries to actually overflow the
stack, so we may as well recurse in both cases.
Reviewed-by: Lionel Landwerlin <[email protected]>
| |
CACHE_MODE_SS is not listed in the gfxspecs table of user mode
non-privileged registers, so changing it from Mesa has no
effect. The kernel already sets this bit in the
CACHE_MODE_SS register, which is saved to and restored from
the HW context image.
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
| |
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
| |
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
| |
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
| |
Enables SPV_KHR_8bit_storage and VK_KHR_8bit_storage on gen 8+
using the VK_KHR_get_physical_device_properties2 functionality
to expose if the extension is supported or not.
Reviewed-by: Jason Ekstrand <[email protected]>
| |
v2: Update comment according to this patch. (Jason Ekstrand)
Reviewed-by: Jason Ekstrand <[email protected]>
| |
When the destination is a BYTE type, allow raw movs
even if the stride is not an exact multiple of the destination
type and exec type sizes. The execution type is Word and its size is 2,
so this restriction was only allowing stride==2 destinations
for 8-bit types.
Reviewed-by: Jason Ekstrand <[email protected]>
| |
Reviewed-by: Jason Ekstrand <[email protected]>
| |
Reviewed-by: Jason Ekstrand <[email protected]>
| |
For Gen8+, the Intel PRM states that "r127 must not be used for return
address when there is a src and dest overlap in send instruction."
This patch implements this restriction by creating a new grf127_send_hack_node
in the register allocator. This node has a fixed assignment to grf127.
For vgrfs that are used as the destination of send messages, we create node
interferences with the grf127_send_hack_node, so the register allocator
will never assign these vgrfs a register that involves grf127.
If dispatch_width > 8 we don't create these interferences, because
all instructions have node interferences between sources and destination.
That is enough to avoid the r127 restriction.
This fixes CTS tests that raised this issue as they were executed as SIMD8:
dEQP-VK.spirv_assembly.instruction.graphics.8bit_storage.8struct_to_32struct.storage_buffer_*int_geom
Shader-db results on Skylake:
total instructions in shared programs: 7686798 -> 7686797 (<.01%)
instructions in affected programs: 301 -> 300 (-0.33%)
helped: 1
HURT: 0
total cycles in shared programs: 337092322 -> 337091919 (<.01%)
cycles in affected programs: 22420415 -> 22420012 (<.01%)
helped: 712
HURT: 588
Shader-db results on Broadwell:
total instructions in shared programs: 7658574 -> 7658625 (<.01%)
instructions in affected programs: 19610 -> 19661 (0.26%)
helped: 3
HURT: 4
total cycles in shared programs: 340694553 -> 340676378 (<.01%)
cycles in affected programs: 24724915 -> 24706740 (-0.07%)
helped: 998
HURT: 916
total spills in shared programs: 4300 -> 4311 (0.26%)
spills in affected programs: 333 -> 344 (3.30%)
helped: 1
HURT: 3
total fills in shared programs: 5370 -> 5378 (0.15%)
fills in affected programs: 274 -> 282 (2.92%)
helped: 1
HURT: 3
v2: Avoid duplicating register classes without grf127. Let's use a node
with a fixed assignment to grf127 and create interferences with send
message vgrf destinations. (Eric Anholt)
v3: Update reference to CTS VK_KHR_8bit_storage failing tests.
(Jose Maria Casanova)
Reviewed-by: Jason Ekstrand <[email protected]>
Cc: 18.1 <[email protected]>
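The mechanism described above, a node pinned to grf127 that interferes with every send destination, can be sketched with a toy interference graph. This is not the Mesa register allocator API: `struct sketch_graph`, `sketch_add_interference` and `sketch_can_assign` are hypothetical names used only to show why the interference keeps a multi-register vgrf away from grf127.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_MAX_NODES 16
#define SKETCH_LAST_GRF 127

/* Toy interference graph; a node may be pinned to one physical register. */
struct sketch_graph {
    int node_count;
    bool interferes[SKETCH_MAX_NODES][SKETCH_MAX_NODES];
    int fixed_reg[SKETCH_MAX_NODES]; /* -1 when the node is not pinned */
};

static void sketch_graph_init(struct sketch_graph *g, int node_count)
{
    assert(node_count <= SKETCH_MAX_NODES);
    memset(g, 0, sizeof(*g));
    g->node_count = node_count;
    for (int i = 0; i < node_count; i++)
        g->fixed_reg[i] = -1;
}

static void sketch_add_interference(struct sketch_graph *g, int a, int b)
{
    g->interferes[a][b] = true;
    g->interferes[b][a] = true;
}

/* Can `node`, occupying `size` consecutive registers starting at
 * `first_reg`, be placed without touching a register pinned to a node
 * it interferes with? */
static bool sketch_can_assign(const struct sketch_graph *g, int node,
                              int first_reg, int size)
{
    if (first_reg + size - 1 > SKETCH_LAST_GRF)
        return false;
    for (int other = 0; other < g->node_count; other++) {
        int pinned = g->fixed_reg[other];
        if (!g->interferes[node][other] || pinned < 0)
            continue;
        if (pinned >= first_reg && pinned < first_reg + size)
            return false;
    }
    return true;
}
```

In this sketch, pinning one node to register 127 and adding an interference between it and a send destination node is enough to reject any placement of that destination whose range includes grf127, while nodes without the interference are unaffected.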
| |
Implement at brw_eu_validate the restriction from Intel Broadwell PRM,
vol 07, section "Instruction Set Reference", subsection "EUISA
Instructions", Send Message (page 990):
"r127 must not be used for return address when there is a src and
dest overlap in send instruction."
v2: Style fixes (Matt Turner)
Reviewed-by: Matt Turner <[email protected]>
Cc: 18.1 <[email protected]>
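A simplified version of such a check might look like the sketch below. The `struct sketch_send` type and its fields are hypothetical stand-ins for the decoded instruction, not the accessors used by brw_eu_validate, and the overlap test is one plausible reading of the quoted PRM sentence.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical decoded view of a send instruction. */
struct sketch_send {
    bool dst_is_null;
    unsigned dst_reg; /* first GRF written by the response */
    unsigned rlen;    /* response length in registers */
    unsigned src_reg; /* first GRF of the message payload */
    unsigned mlen;    /* message length in registers */
};

/* Returns true when the response range includes r127 and also overlaps
 * the payload range, which is the combination the rule forbids. */
static bool sketch_send_violates_r127_rule(const struct sketch_send *s)
{
    if (s->dst_is_null || s->rlen == 0 || s->mlen == 0)
        return false;

    unsigned dst_last = s->dst_reg + s->rlen - 1;
    unsigned src_last = s->src_reg + s->mlen - 1;

    bool dst_uses_r127 = s->dst_reg <= 127 && dst_last >= 127;
    bool overlap = s->src_reg <= dst_last && s->dst_reg <= src_last;

    return dst_uses_r127 && overlap;
}
```

For example, a send that reads its payload from r126 and writes a two-register response to r126..r127 would be flagged, while the same response with a payload in r120..r121 would pass.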
| |
We were not properly writing page tables when the virtual address
range spans multiple subtrees of the tables.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
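To see how a small virtual address range can span more than one subtree, consider the following sketch. It assumes a 4-level table with 4 KiB pages and 9 index bits per level; these parameters and the `sketch_` names are assumptions for illustration and are not taken from the aub writer code.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12     /* assumed 4 KiB pages */
#define SKETCH_BITS_PER_LEVEL 9  /* assumed 512 entries per table */

/* Address bits below which everything maps to one entry at `level`
 * (1 = leaf table of page entries, 4 = top-level table). */
static unsigned sketch_level_shift(int level)
{
    assert(level >= 1 && level <= 4);
    return SKETCH_PAGE_SHIFT + SKETCH_BITS_PER_LEVEL * (level - 1);
}

/* Index of `addr` within its table at `level`. */
static unsigned sketch_table_index(uint64_t addr, int level)
{
    return (unsigned)((addr >> sketch_level_shift(level)) &
                      ((1u << SKETCH_BITS_PER_LEVEL) - 1));
}

/* Number of distinct entries at `level` touched by [start, start + size). */
static uint64_t sketch_entries_touched(uint64_t start, uint64_t size, int level)
{
    assert(size > 0);
    unsigned shift = sketch_level_shift(level);
    return ((start + size - 1) >> shift) - (start >> shift) + 1;
}
```

Under these assumptions, a two-page mapping starting at 0x1FF000 touches two level-2 entries, so its page entries live in two different leaf tables. A writer that resolves the leaf table once for the start address and then keeps indexing into it would put the second entry in the wrong place.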
| |
The implementation of CreateRenderPass2 uses the helpers we broke out in
previous commits. The implementations of the new vkCmd functions just
call the old versions.
Reviewed-by: Lionel Landwerlin <[email protected]>
| |
This makes certain checks a bit easier and means that we don't have
the attachment information duplicated in the attachment list and in
depth_stencil_attachment.
Reviewed-by: Lionel Landwerlin <[email protected]>
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
| |
This new helper takes a VkSubpassDependency2KHR for future-proofing.
Reviewed-by: Lionel Landwerlin <[email protected]>
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
| |
This lets us compact the original instruction:
mul(8) g3<1>D g6<8,8,1>UD 0x00000006UD { align1 1Q };
So now we emit:
mul(8) g3<1>UD g6<8,8,1>UD 0x00000006UD { align1 1Q compacted };
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
| |
v2: merge both conditions to reduce the diff (Lionel)
Reviewed-by: Lionel Landwerlin <[email protected]>
| |
At the time of commit 7bc6e455e23 (i965: Add support for saturating
immediates.) we thought mixed type saturates would be impossible. We
were only thinking about type converting moves from D to F, for
example. However, type converting moves w/saturate from F to DF are
definitely possible. This change minimally relaxes the restriction to
allow cases that I have been able to trigger via piglit tests.
Fixes new piglit tests:
- arb_gpu_shader_fp64/execution/built-in-functions/fs-sign-sat-neg-abs.shader_test
- arb_gpu_shader_fp64/execution/built-in-functions/vs-sign-sat-neg-abs.shader_test
Signed-off-by: Ian Romanick <[email protected]>
Cc: [email protected]
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|