| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
| |
This helpers are used to load/store 16-bit types from/to 32-bit
components.
The functions shuffle_32bit_load_result_to_16bit_data and
shuffle_16bit_data_for_32bit_write are implemented in a similar
way than the analogous functions for handling 64-bit types.
v1: Explain need of temporary in shuffle operations. (Jason Ekstrand)
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Used to enable 16-bit reads at do_untyped_vector_read, that is used on
the following intrinsics:
* nir_intrinsic_load_shared
* nir_intrinsic_load_ssbo
v2: Removed use of stride = 2 on 16-bit sources (Jason Ekstrand)
v3: - Add bitsize to scattered read operation (Jason Ekstrand)
- Remove implementation of 16-bit UBO read from this patch.
- Avoid assertion at opt_algebraic caused by ADD of two IMM with
offset with BRW_REGISTER_TYPE_UD type found on matrix tests.
(Jose Maria Casanova)
v4: (Jason Ekstrand)
- Put if case for 16-bits at the beginning of the if ladder.
- Use type_sz(dest.type) * 8 as bit_size parameter for scattered read.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
v2: Fix alignment style (Topi Pohjolainen)
(Jason Ekstrand)
- Enable bit_size parameter to scattered messages to enable different
bitsizes byte/word/dword.
- Remove use of brw_send_indirect_scattered_message in favor of
brw_send_indirect_surface_message.
- Move scattered messages to surface messages namespace.
- Assert align1 for scattered messages and assume Gen8+.
- Inline brw_set_dp_byte_scattered_read.
v3: (Jason Ekstrand)
- Use renamed brw_byte_scattered_data_element_from_bit_size method
- Assert scattered read for Gen8+ and Haswell.
- Use conditional expresion at components_read.
- Include comment about params for scattered opcodes.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
While on Untyped Surface messages the bits of the execution mask are
ANDed with the corresponding bits of the Pixel/Sample Mask, that is
not the case for byte scattered writes. That is needed to avoid ssbo
stores writing on helper invocations. So when that can affect, we load
the sample mask, and predicate the send message.
Note: the need for this patch was tested with a custom test. Right now
the 16 bit storage CTS tests doesnt need this path in order to get a
full pass.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We need to rely on byte scattered writes as untyped writes are 32-bit
size. We could try to keep using 32-bit messages when we have two or
four 16-bit elements, but for simplicity sake, we use the same message
for any component number. We revisit this aproach in the follwing
patches.
v2: Removed use of stride = 2 on 16-bit sources (Jason Ekstrand)
v3: (Jason Ekstrand)
- Include bit_size to scattered write message and remove namespace
- specific for scattered messages.
- Move comment to proper place.
- Squashed with i965/fs: Adjust type_size/type_slots on store_ssbo.
(Jose Maria Casanova)
- Take into account that get_nir_src returns now WORD types for
16-bit sources instead of DWORD.
v4: (Jason Ekstrand)
- Rename lenght variable to num_components.
- Include assertions before emit_untyped_write.
- Remove type_slot in favor of num_slot and first_slot.
Signed-off-by: Jose Maria Casanova Crespo <[email protected]>
Signed-off-by: Alejandro Piñeiro <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
v2: (Jason Ekstrand)
- Enable bit_size parameter to scattered messages to enable different
bitsizes byte/word/dword.
- Remove use of brw_send_indirect_scattered_message in favor of
brw_send_indirect_surface_message.
- Move scattered messages to surface messages namespace.
- Assert align1 for scattered messages and assume Gen8+.
- Inline brw_set_dp_byte_scattered_write.
v3: - Remove leftover newline (Topi Pohjolainen)
- Rename brw_data_size to brw_scattered_data_element and use
defines instead of an enum (Jason Ekstrand)
- Assert scattered write for Gen8+ and Haswell (Jason Ekstrand)
Signed-off-by: Jose Maria Casanova Crespo <[email protected]>
Signed-off-by: Alejandro Piñeiro <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Although from SPIR-V point of view, rounding modes are attached to the
operation/destination, on i965 it is a status, so we don't need to
explicitly set the rounding mode if the one we want is already set.
Taking into account that the default mode is RTE, one possible
optimization would be optimize out the first RTE set for each
block. For in order to work, we would need to take into account block
interrelationships. At this point, it is not worth to complicate the
optimization for such small gain.
v2: Use a single SHADER_OPCODE_RND_MODE opcode taking an immediate
with the rounding mode (Curro)
v3: Reset optimization for every block. (Jason Ekstrand)
Signed-off-by: Jose Maria Casanova Crespo <[email protected]>
Signed-off-by: Alejandro Piñeiro <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
By default we don't set the rounding mode. We only set
round-to-near-even or round-to-zero mode if explicitly set from nir.
v2: Use a single SHADER_OPCODE_RND_MODE opcode taking an immediate
with the rounding mode (Curro)
v3: Use new helper brw_rnd_mode_from_nir_op (Jason Ekstrand)
Signed-off-by: Jose Maria Casanova Crespo <[email protected]>
Signed-off-by: Alejandro Piñeiro <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Although it is possible to emit them directly as AND/OR on brw_fs_nir,
having a specific opcode makes it easier to remove duplicate settings
later.
v2: (Curro)
- Set thread control to 'switch' when using the control register
- Use a single SHADER_OPCODE_RND_MODE opcode taking an immediate
with the rounding mode.
- Avoid magic numbers setting rounding mode field at control register.
v3: (Curro)
- Remove redundant and add missing whitespace lines.
- Match printing instruction to IR opcode "rnd_mode"
v4: (Topi Pohjolainen)
- Fix code style.
Signed-off-by: Alejandro Piñeiro <[email protected]>
Signed-off-by: Jose Maria Casanova Crespo <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Control register cr0 in i965 can be used to change the rounding modes
in 32-bit to 16-bit floating-point conversions.
From intel Skylake PRM, vol 07, section "Register and Tegister Regions",
subsection "Control Register" (page 754):
"Subregister cr0.0:ud contains normal operation control fields such as the
floating-point mode ... "
Floating-point Rounding mode is changed at bits 5:4 of cr0.0:
"Rounding Mode. This field specifies the FPU rounding mode. It is
initialized by Thread Dispatch."
00b = Round to Nearest or Even (RTNE)
01b = Round Up, toward +inf (RU)
10b = Round Down, toward -inf (RD)
11b = Round Toward Zero (RTZ)"
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Conversions to 16-bit need having aligment between the 16-bit
and 32-bit types. So the conversion operations unpack 16-bit types
to with an stride=2 and then applies a MOV with the conversion.
v2 (Jason Ekstrand):
- Avoid the general use of stride=2 for 16-bit register types.
v3 (Topi Pohjolainen)
- Code style fix
(Jason Ekstrand)
- Now nir_op_f2f16 was renamed to nir_op_f2f16_undef
because conversion to f16 with undefined rounding is explicit
Signed-off-by: Eduardo Lima <[email protected]>
Signed-off-by: Alejandro Piñeiro <[email protected]>
Signed-off-by: Jose Maria Casanova Crespo <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
Note that we don't remove the assert at i965/vec4. At this point half
float support is only for the scalar backend.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
v2: Fixed calculation of scalar size for 16-bit types. (Jason Ekstrand)
v3: Fix coding style (Topi Pohjolainen)
Signed-off-by: Jose Maria Casanova Crespo <[email protected]>
Signed-off-by: Eduardo Lima <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
These types have similar vec4 sizes as their 32-bit counterparts.
The vec4 backend doesn't support 16-bit types and probably never will,
but this method is called by the scalar backend at
fs_visitor::nir_setup_outputs(), so we still need to provide valid vec4
sizes for 16-bit types. In the future, something different should be
implemented to avoid this dependency.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
v2: Minor changes after rebase against recent master (Alejandro
Pinheiro)
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
SpvOpFConvert now manages the FPRoundingMode decorator for the
returning values enabling the nir_rounding_mode in the conversion
operation to fp16 values.
v2: Fixed breaking of specialization constants. (Jason Ekstrand)
v3: Avoid nir_rounding_mode * casting. (Jason Ekstrand)
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
v2: Added more missing implementations of 16-bit types. (Jason Ekstrand)
v3: Store values in values[0].u16[i] (Jason Ekstrand)
Include switches based on bitsize for 16-bit types
(Chema Casanova)
v4: Coding style fixes (Jason Ekstrand)
Use vtn_u64_literal and u64[0] at 64-bit SpvOpConstant (Jason Ekstrand)
Signed-off-by: Jose Maria Casanova Crespo <[email protected]>
Signed-off-by: Eduardo Lima <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
nir_type_conversion enables new operations to handle rounding modes to
convert to fp16 values. Two new opcodes are enabled nir_op_f2f16_rtne
and nir_op_f2f16_rtz.
The undefined behaviour doesn't has any effect and uses the original
nir_op_f2f16 operation.
v2: Indentation fixed (Jason Ekstrand)
v3: Use explicit case for undefined rounding and assert if
rounding mode is used for non 16-bit float conversions
(Jason Ekstrand)
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This will include the following NIR ALU opcodes:
* nir_op_i2i16
* nir_op_i2f16
* nir_op_u2u16
* nir_op_u2f16
* nir_op_f2i16
* nir_op_f2u16
* nir_op_f2f16
v2: Remove "from" 16-bit in commit subject (Topi Pohjolainen)
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
v2: Added comments describing each of the rounding modes. (Jason
Ekstrand)
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
v2: Renamed glsl_half_float_type() to glsl_float16_t_type().
(Jason Ekstrand)
Signed-off-by: Jose Maria Casanova Crespo <[email protected]>
Signed-off-by: Eduardo Lima <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This is basically to avoid "not handle in switch" warnings.
v2: Let the new types hit the assertion instead. (Marek Olšák
and Jason Ekstrand)
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Adds new INT16, UINT16 and FLOAT16 base types.
The corresponding GL types for half floats were reused from the
AMD_gpu_shader_half_float extension. The int16 and uint16 types come from
NV_gpu_shader_5 extension.
This adds the builtins and the lexer support.
To avoid a bunch of warnings due to cases not handled in switch, the
new types have been added to a few places using same behavior as
their 32-bit counterparts, except for a few trivial cases where they are
already handled properly. Subsequent patches in this set will provide
correct 16-bit implementations when needed.
v2: * Use FLOAT16 instead of HALF_FLOAT as name of the base type.
* Removed float16_t from builtin types.
* Don't copy 16-bit types as if they were 32-bit values in
copy_constant_to_storage().
* Use get_scalar_type() instead of adding a new custom switch
statement.
(Jason Ekstrand)
v3: Use GL_FLOAT16_NV instead of GL_HALF_FLOAT for consistency
(Ilia Mirkin)
v4: Add missing 16-bit base types support in glsl_to_nir (Eduardo Lima).
v5: Fix coding style (Topi Poholainen).
Signed-off-by: Jose Maria Casanova Crespo <[email protected]>
Signed-off-by: Eduardo Lima <[email protected]>
Signed-off-by: Alejandro Piñeiro <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
| |
Not to be confused with variablePointersStorageBuffer which is the
subset of VK_KHR_variable_pointers required to enable the extension.
This means we now have "full" support for variable pointers.
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The SPIR-V spec is a bit underspecified when it comes to exactly how
you're allowed to use OpPtrAccessChain and what it means in certain edge
cases. In particular, what if the base pointer of the OpPtrAccessChain
points to the base struct of an SSBO instead of an element in that SSBO.
The original variable pointers implementation in mesa assumed that you
weren't allowed to do an OpPtrAccessChain that adjusted the block index
and asserted such. However, there are some CTS tests that do this and,
if the CTS does it, someone will do it in the wild so we should probably
handle it. With this commit, we significantly reduce our assumptions
and should be able to handle more-or-less anything.
The one assumption we still make for correctness is that if we see an
OpPtrAccessChain on a pointer to a struct decorated block that the block
index should be adjusted. In theory, someone could try to put an array
stride on such a pointer and try to make the SSBO an implicit array of
the base struct and we would not give them what they want. That said,
any index other than 0 would count as an out-of-bounds access which is
invalid.
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
| |
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
| |
This is required for being able to handle OpPtrAccessChain in SPIR-V
where the base type of the incoming pointer requires us to add to the
block index instead of the byte offset.
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before, we always left workgroup variables as shared nir_variables and
let the driver call nir_lower_io. This adds an option to do the
lowering directly in spirv_to_nir. To do this, we implicitly assign the
variables a std430 layout and then treat them like a UBO or SSBO and
immediately lower all the way to an offset.
As a side-effect, the spirv_to_nir pass now handles variable pointers
for workgroup variables.
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
| |
Up until now, all pointers have been ivec2s. We're about to add support
for pointers to workgroup storage and those are going to be uints.
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
There is no good reason why we should have the same logic repeated in
get_vulkan_resource_index and vtn_ssa_offset_pointer_dereference. If
we're a bit more careful about how we do things, we can just use the one
function and get rid of the other entirely. This also makes the push
constant special case a lot more clear.
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This commit moves them both into vtn_variables.c towards the top, makes
them take a vtn_builder, and replaces a hand-rolled instance of
is_external_block with a function call.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This makes us key off of !offset instead of !block_index. It also puts
the guts inside a switch statement so that we can handle more than just
UBOs and SSBOs.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
| |
This parallels what we do for vtn_block_load except that we don't yet
support anything except SSBO loads through this path.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
| |
This is equivalent and means we don't have resource index code scattered
about.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
| |
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
| |
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
| |
The layer parameter is signed. Fixes the error message seen when
running the arb_texture_multisample-errors test which checks a
negative layer value.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
To be consistent.
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
Grrr..
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
After the mesa/st nir linking support, we start to see inputs/outputs
like:
decl_var shader_out INTERP_MODE_NONE float packed:uv (VARYING_SLOT_VAR9.x, 1, 0)
decl_var shader_out INTERP_MODE_NONE float packed:uv@0 (VARYING_SLOT_VAR9.y, 1, 0)
(ie. were location_frac != .x)
Unfortunately I overlooked the addition of the component parameter to
load_input/store_output, so when we started encountering inputs/outputs
with component other than .x, we'd end up loading/storing the wrong
input/output.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since in the NIR case, driver takes ownership of the NIR shader, we need
to clone what is passed to the driver. Normally this is done as part of
creating the shader variant (where is clone is anyways needed). But
compute shaders have no variants, so we were cloning earlier.
The problem is that after the NIR linking optimizations, we ended up
cloning *before* all the lowering passes where done.
So move this into st_get_cp_variant(), to make compute shaders work more
like other shader stages.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
This just moves some code around to make it easier to add compute.
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
| |
This just adds support for decompressing compute resources.
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
| |
Signed-off-by: Dave Airlie <[email protected]>
|