mesa.git - Unnamed repository; edit this file 'description' to name the repository.

	Commit message (Collapse)	Author	Age	Files	Lines
*	intel/cfg: Represent divergent control flow paths caused by non-uniform loop ↵	Francisco Jerez	2017-12-07	1	-6/+69
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	execution. This addresses a long-standing back-end compiler bug that could lead to cross-channel data corruption in loops executed non-uniformly. In some cases live variables extending through a loop divergence point (e.g. a non-uniform break) into a convergence point (e.g. the end of the loop) wouldn't be considered live along all physical control flow paths the SIMD thread could possibly have taken in between due to some channels remaining in the loop for additional iterations. This patch fixes the problem by extending the CFG with physical edges that don't exist in the idealized non-vectorized program, but represent valid control flow paths the SIMD EU may take due to the divergence of logical threads. This makes sense because the i965 IR is explicitly SIMD, and it's not uncommon for instructions to have an influence on neighboring channels (e.g. a force_writemask_all header setup), so the behavior of the SIMD thread as a whole needs to be considered. No changes in shader-db. Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	intel/fs: Don't let undefined values prevent copy propagation.	Francisco Jerez	2017-12-07	1	-3/+47
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This makes the dataflow propagation logic of the copy propagation pass more intelligent in cases where the destination of a copy is known to be undefined for some incoming CFG edges, building upon the definedness information provided by the last patch. Helps a few programs, and avoids a handful shader-db regressions from the next patch. shader-db results on ILK: total instructions in shared programs: 6541547 -> 6541523 (-0.00%) instructions in affected programs: 360 -> 336 (-6.67%) helped: 8 HURT: 0 LOST: 0 GAINED: 10 shader-db results on BDW: total instructions in shared programs: 8174323 -> 8173882 (-0.01%) instructions in affected programs: 7730 -> 7289 (-5.71%) helped: 5 HURT: 2 LOST: 0 GAINED: 4 shader-db results on SKL: total instructions in shared programs: 8185669 -> 8184598 (-0.01%) instructions in affected programs: 10364 -> 9293 (-10.33%) helped: 5 HURT: 2 LOST: 0 GAINED: 2 Reviewed-by: Jason Ekstrand <[email protected]>
*	intel/fs: Restrict live intervals to the subset possibly reachable from any ↵	Francisco Jerez	2017-12-07	2	-4/+42
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	definition. Currently the liveness analysis pass would extend a live interval up to the top of the program when no unconditional and complete definition of the variable is found that dominates all of its uses. This can lead to a serious performance problem in shaders containing many partial writes, like scalar arithmetic, FP64 and soon FP16 operations. The number of oversize live intervals in such workloads can cause the compilation time of the shader to explode because of the worse than quadratic behavior of the register allocator and scheduler when running out of registers, and it can also cause the running time of the shader to explode due to the amount of spilling it leads to, which is orders of magnitude slower than GRF memory. This patch fixes it by computing the intersection of our current live intervals with the subset of the program that can possibly be reached from any definition of the variable. Extending the storage allocation of the variable beyond that is pretty useless because its value is guaranteed to be undefined at a point that cannot be reached from any definition. According to Jason, this improves performance of the subgroup Vulkan CTS tests significantly (e.g. the runtime of the dvec4 broadcast test improves by nearly 50x). No significant change in the running time of shader-db (with 5% statistical significance). shader-db results on IVB: total cycles in shared programs: 61108780 -> 60932856 (-0.29%) cycles in affected programs: 16335482 -> 16159558 (-1.08%) helped: 5121 HURT: 4347 total spills in shared programs: 1309 -> 1288 (-1.60%) spills in affected programs: 249 -> 228 (-8.43%) helped: 3 HURT: 0 total fills in shared programs: 1652 -> 1597 (-3.33%) fills in affected programs: 262 -> 207 (-20.99%) helped: 4 HURT: 0 LOST: 2 GAINED: 209 shader-db results on BDW: total cycles in shared programs: 67617262 -> 67361220 (-0.38%) cycles in affected programs: 23397142 -> 23141100 (-1.09%) helped: 8045 HURT: 6488 total spills in shared programs: 1456 -> 1252 (-14.01%) spills in affected programs: 465 -> 261 (-43.87%) helped: 3 HURT: 0 total fills in shared programs: 1720 -> 1465 (-14.83%) fills in affected programs: 471 -> 216 (-54.14%) helped: 4 HURT: 0 LOST: 2 GAINED: 162 shader-db results on SKL: total cycles in shared programs: 65436248 -> 65245186 (-0.29%) cycles in affected programs: 22560936 -> 22369874 (-0.85%) helped: 8457 HURT: 6247 total spills in shared programs: 437 -> 437 (0.00%) spills in affected programs: 0 -> 0 helped: 0 HURT: 0 total fills in shared programs: 870 -> 854 (-1.84%) fills in affected programs: 16 -> 0 helped: 1 HURT: 0 LOST: 0 GAINED: 107 Reviewed-by: Jason Ekstrand <[email protected]>
*	intel/fs: Teach instruction scheduler about GRF bank conflict cycles.	Francisco Jerez	2017-12-07	3	-2/+23
\| \| \| \| \| \| \| \| \| \| \|	This should allow the post-RA scheduler to do a slightly better job at hiding latency in presence of instructions incurring bank conflicts. The main purpuse of this patch is not to improve performance though, but to get conflict cycles to show up in shader-db statistics in order to make sure that regressions in the bank conflict mitigation pass don't go unnoticed. Acked-by: Matt Turner <[email protected]>
*	intel/fs: Implement GRF bank conflict mitigation pass.	Francisco Jerez	2017-12-07	5	-0/+898
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Unnecessary GRF bank conflicts increase the issue time of ternary instructions (the overwhelmingly most common of which is MAD) by roughly 50%, leading to reduced ALU throughput. This pass attempts to minimize the number of bank conflicts by rearranging the layout of the GRF space post-register allocation. It's in general not possible to eliminate all of them without introducing extra copies, which are typically more expensive than the bank conflict itself. In a shader-db run on SKL this helps roughly 46k shaders: total conflicts in shared programs: 1008981 -> 600461 (-40.49%) conflicts in affected programs: 816222 -> 407702 (-50.05%) helped: 46234 HURT: 72 The running time of shader-db itself on SKL seems to be increased by roughly 2.52%±1.13% with n=20 due to the additional work done by the compiler back-end. On earlier generations the pass is somewhat less effective in relative terms because the hardware incurs a bank conflict anytime the last two sources of the instruction are duplicate (e.g. while trying to square a value using MAD), which is impossible to avoid without introducing copies. E.g. for a shader-db run on SNB: total conflicts in shared programs: 944636 -> 623185 (-34.03%) conflicts in affected programs: 853258 -> 531807 (-37.67%) helped: 31052 HURT: 19 And on BDW: total conflicts in shared programs: 1418393 -> 987539 (-30.38%) conflicts in affected programs: 1179787 -> 748933 (-36.52%) helped: 47592 HURT: 70 On SKL GT4e this improves performance of GpuTest Volplosion by 3.64% ±0.33% with n=16. NOTE: This patch intentionally disregards some i965 coding conventions for the sake of reviewability. This is addressed by the next squash patch which introduces an amount of (for the most part boring) boilerplate that might distract reviewers from the non-trivial algorithmic details of the pass. The following patch is squashed in: SQUASH: intel/fs/bank_conflicts: Roll back to the nineties. Acked-by: Matt Turner <[email protected]>
*	meson: add dep_thread to every lib that includes threads.h	Eric Engestrom	2017-12-07	1	-1/+1
\| \| \| \| \| \|	Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104141 Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Dylan Baker <[email protected]>
*	anv: fix a case statement in GetMemoryFdPropertiesKHR	Fredrik Höglund	2017-12-06	1	-1/+1
\| \| \| \| \| \| \| \| \|	The handle type in the case statement is supposed to be VK_EXTERNAL_- MEMORY_HANDLE_TYPE_DMA_BUF_BIT_EXT. Fixes: ab18e8e59b6 ("anv: Implement VK_EXT_external_memory_dma_buf") Signed-off-by: Fredrik Höglund <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965/fs: Use untyped_surface_read for 16-bit load_ssbo	Jose Maria Casanova Crespo	2017-12-06	1	-7/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	SSBO loads were using byte_scattered read messages as they allow reading 16-bit size components. byte_scattered messages can only operate one component at a time so we needed to emit as many messages as components. But for vec2 and vec4 of 16-bit, being multiple of 32-bit we can use the untyped_surface_read message to read pairs of 16-bit components using only one message. Once each pair is read it is unshuffled to return the proper 16-bit components. vec3 case is assimilated to vec4 but the 4th component is ignored. 16-bit scalars are read using one byte_scattered_read message. v2: Removed use of stride = 2 on sources (Jason Ekstrand) Rework optimization using unshuffle 16 reads (Chema Casanova) v3: Use W and D types insead of HF and F in shuffle to avoid rounding erros (Jason Ekstrand) Use untyped_surface_read for 16-bit vec3. (Jason Ekstrand) v4: Use subscript insead of chaging type and stride (Jason Ekstrand) Reviewed-by: Jason Ekstrand <[email protected]>
*	i965/fs: Optimize 16-bit SSBO stores by packing two into a 32-bit reg	Jose Maria Casanova Crespo	2017-12-06	1	-15/+43
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, we use byte-scattered write messages for storing 16-bit into an SSBO. This is because untyped surface messages have a fixed 32-bit size. This patch optimizes these 16-bit writes by combining 2 values (e.g, two consecutive components aligned with 32-bits) into a 32-bit register, packing the two 16-bit words. 16-bit single component values will continue to use byte-scattered write messages. The same will happens when the first consecutive component is not aligned 32-bits. This optimization reduces the number of SEND messages used for storing 16-bit values potentially by 2 or 4, which cuts down execution time significantly because byte-scattered writes are an expensive operation as they only write a component for message. v2: Removed use of stride = 2 on sources (Jason Ekstrand) Rework optimization using shuffle 16 write and enable writes of 16bit vec4 with only one message of 32-bits. (Chema Casanova) v3: - Fix coding style (Eduardo Lima) - Reorganize code to avoid duplication. (Jason Ekstrand) - Include new comments to explain the length calculations to fix alignment issues of components. (Jason Ekstrand) - Fix issues with writemask yz with 16-bit writes. (Jason Ektrand) v4: (Jason Ekstrand) - Reorganize 64-bit ssbo-writes to avoid using slots_per_component. - Comment about why suffle is needed when using byte_scattered_write. Signed-off-by: Eduardo Lima <[email protected]> Signed-off-by: Jose Maria Casanova Crespo <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	anv: Enable SPV_KHR_16bit_storage and VK_KHR_16bit_storage for SSBO/UBO	Alejandro Piñeiro	2017-12-06	3	-0/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Enables SPV_KHR_16bit_storage on gen 8+. VK_KHR_16bit_storage is enabled for SSBO/UBO using the VK_KHR_get_physical_device_properties2 functionality to expose if the extension is supported or not. v2: update due rebase against master (Alejandro) v3: (Jason Ekstrand) - Move this patch up in VK_KHR_16bit_storage series enabling only storageBuffer16BitAccess and uniformAndStorageBuffer16BitAccess. - Only expose VK_KHR_16bit_storage on Gen8+ v4: (Jason Ekstrand) - Squash enable SPV_KHR_16bit_storage into VK_KHR_16bit_storage enablement for SSBO/UBO. Signed-off-by: Jose Maria Casanova Crespo <[email protected]> Signed-off-by: Alejandro Piñeiro <[email protected]> Signed-off-by: Eduardo Lima Mitev <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965/fs: Enables 16-bit load_ubo with sampler	Jason Ekstrand	2017-12-06	1	-7/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	load_ubo is using 32-bit loads as uniforms surfaces have a 32-bit surface format defined. So when reading 16-bit components with the sampler we need to unshuffle two 16-bit components from each 32-bit component. Using the sampler avoids the use of the byte_scattered_read message that needs one message for each component and is supposed to be slower. v2: (Jason Ekstrand) - Simplify component selection and unshuffling for different bitsizes - Remove SKL optimization of reading only two 32-bit components when reading 16-bits types. Reviewed-by: Jose Maria Casanova Crespo <[email protected]>
*	i965/fs: Helpers for un/shuffle 16-bit pairs in 32-bit components	Jose Maria Casanova Crespo	2017-12-06	2	-0/+71
\| \| \| \| \| \| \| \| \| \| \| \| \|	This helpers are used to load/store 16-bit types from/to 32-bit components. The functions shuffle_32bit_load_result_to_16bit_data and shuffle_16bit_data_for_32bit_write are implemented in a similar way than the analogous functions for handling 64-bit types. v1: Explain need of temporary in shuffle operations. (Jason Ekstrand) Reviewed-by: Jason Ekstrand <[email protected]>
*	i965/fs: Use byte scattered read for 16-bit load_ssbo	Jose Maria Casanova Crespo	2017-12-06	1	-1/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Used to enable 16-bit reads at do_untyped_vector_read, that is used on the following intrinsics: * nir_intrinsic_load_shared * nir_intrinsic_load_ssbo v2: Removed use of stride = 2 on 16-bit sources (Jason Ekstrand) v3: - Add bitsize to scattered read operation (Jason Ekstrand) - Remove implementation of 16-bit UBO read from this patch. - Avoid assertion at opt_algebraic caused by ADD of two IMM with offset with BRW_REGISTER_TYPE_UD type found on matrix tests. (Jose Maria Casanova) v4: (Jason Ekstrand) - Put if case for 16-bits at the beginning of the if ladder. - Use type_sz(dest.type) * 8 as bit_size parameter for scattered read. Reviewed-by: Jason Ekstrand <[email protected]>
*	i965/fs: Add byte scattered read message and fs support	Jose Maria Casanova Crespo	2017-12-06	9	-1/+94
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	v2: Fix alignment style (Topi Pohjolainen) (Jason Ekstrand) - Enable bit_size parameter to scattered messages to enable different bitsizes byte/word/dword. - Remove use of brw_send_indirect_scattered_message in favor of brw_send_indirect_surface_message. - Move scattered messages to surface messages namespace. - Assert align1 for scattered messages and assume Gen8+. - Inline brw_set_dp_byte_scattered_read. v3: (Jason Ekstrand) - Use renamed brw_byte_scattered_data_element_from_bit_size method - Assert scattered read for Gen8+ and Haswell. - Use conditional expresion at components_read. - Include comment about params for scattered opcodes. Reviewed-by: Jason Ekstrand <[email protected]>
*	i965/fs: Predicate byte scattered writes if needed	Alejandro Piñeiro	2017-12-06	1	-1/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	While on Untyped Surface messages the bits of the execution mask are ANDed with the corresponding bits of the Pixel/Sample Mask, that is not the case for byte scattered writes. That is needed to avoid ssbo stores writing on helper invocations. So when that can affect, we load the sample mask, and predicate the send message. Note: the need for this patch was tested with a custom test. Right now the 16 bit storage CTS tests doesnt need this path in order to get a full pass. Reviewed-by: Jason Ekstrand <[email protected]>
*	i965/fs: Use byte_scattered_write on 16-bit store_ssbo	Alejandro Piñeiro	2017-12-06	1	-20/+45
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We need to rely on byte scattered writes as untyped writes are 32-bit size. We could try to keep using 32-bit messages when we have two or four 16-bit elements, but for simplicity sake, we use the same message for any component number. We revisit this aproach in the follwing patches. v2: Removed use of stride = 2 on 16-bit sources (Jason Ekstrand) v3: (Jason Ekstrand) - Include bit_size to scattered write message and remove namespace - specific for scattered messages. - Move comment to proper place. - Squashed with i965/fs: Adjust type_size/type_slots on store_ssbo. (Jose Maria Casanova) - Take into account that get_nir_src returns now WORD types for 16-bit sources instead of DWORD. v4: (Jason Ekstrand) - Rename lenght variable to num_components. - Include assertions before emit_untyped_write. - Remove type_slot in favor of num_slot and first_slot. Signed-off-by: Jose Maria Casanova Crespo <[email protected]> Signed-off-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965/fs: Add byte scattered write message and fs support	Jose Maria Casanova Crespo	2017-12-06	9	-0/+118
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	v2: (Jason Ekstrand) - Enable bit_size parameter to scattered messages to enable different bitsizes byte/word/dword. - Remove use of brw_send_indirect_scattered_message in favor of brw_send_indirect_surface_message. - Move scattered messages to surface messages namespace. - Assert align1 for scattered messages and assume Gen8+. - Inline brw_set_dp_byte_scattered_write. v3: - Remove leftover newline (Topi Pohjolainen) - Rename brw_data_size to brw_scattered_data_element and use defines instead of an enum (Jason Ekstrand) - Assert scattered write for Gen8+ and Haswell (Jason Ekstrand) Signed-off-by: Jose Maria Casanova Crespo <[email protected]> Signed-off-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965/fs: Add remove_extra_rounding_modes optimization	Alejandro Piñeiro	2017-12-06	3	-0/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Although from SPIR-V point of view, rounding modes are attached to the operation/destination, on i965 it is a status, so we don't need to explicitly set the rounding mode if the one we want is already set. Taking into account that the default mode is RTE, one possible optimization would be optimize out the first RTE set for each block. For in order to work, we would need to take into account block interrelationships. At this point, it is not worth to complicate the optimization for such small gain. v2: Use a single SHADER_OPCODE_RND_MODE opcode taking an immediate with the rounding mode (Curro) v3: Reset optimization for every block. (Jason Ekstrand) Signed-off-by: Jose Maria Casanova Crespo <[email protected]> Signed-off-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965/fs: Enable rounding mode on f2f16 ops	Alejandro Piñeiro	2017-12-06	1	-0/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	By default we don't set the rounding mode. We only set round-to-near-even or round-to-zero mode if explicitly set from nir. v2: Use a single SHADER_OPCODE_RND_MODE opcode taking an immediate with the rounding mode (Curro) v3: Use new helper brw_rnd_mode_from_nir_op (Jason Ekstrand) Signed-off-by: Jose Maria Casanova Crespo <[email protected]> Signed-off-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965/fs: Define new shader opcode to set rounding modes	Alejandro Piñeiro	2017-12-06	5	-0/+62
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Although it is possible to emit them directly as AND/OR on brw_fs_nir, having a specific opcode makes it easier to remove duplicate settings later. v2: (Curro) - Set thread control to 'switch' when using the control register - Use a single SHADER_OPCODE_RND_MODE opcode taking an immediate with the rounding mode. - Avoid magic numbers setting rounding mode field at control register. v3: (Curro) - Remove redundant and add missing whitespace lines. - Match printing instruction to IR opcode "rnd_mode" v4: (Topi Pohjolainen) - Fix code style. Signed-off-by: Alejandro Piñeiro <[email protected]> Signed-off-by: Jose Maria Casanova Crespo <[email protected]> Reviewed-by: Francisco Jerez <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965: Add support for control register	Jose Maria Casanova Crespo	2017-12-06	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Control register cr0 in i965 can be used to change the rounding modes in 32-bit to 16-bit floating-point conversions. From intel Skylake PRM, vol 07, section "Register and Tegister Regions", subsection "Control Register" (page 754): "Subregister cr0.0:ud contains normal operation control fields such as the floating-point mode ... " Floating-point Rounding mode is changed at bits 5:4 of cr0.0: "Rounding Mode. This field specifies the FPU rounding mode. It is initialized by Thread Dispatch." 00b = Round to Nearest or Even (RTNE) 01b = Round Up, toward +inf (RU) 10b = Round Down, toward -inf (RD) 11b = Round Toward Zero (RTZ)" Reviewed-by: Jason Ekstrand <[email protected]>
*	i965/fs: Handle 32-bit to 16-bit conversions	Alejandro Piñeiro	2017-12-06	1	-0/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Conversions to 16-bit need having aligment between the 16-bit and 32-bit types. So the conversion operations unpack 16-bit types to with an stride=2 and then applies a MOV with the conversion. v2 (Jason Ekstrand): - Avoid the general use of stride=2 for 16-bit register types. v3 (Topi Pohjolainen) - Code style fix (Jason Ekstrand) - Now nir_op_f2f16 was renamed to nir_op_f2f16_undef because conversion to f16 with undefined rounding is explicit Signed-off-by: Eduardo Lima <[email protected]> Signed-off-by: Alejandro Piñeiro <[email protected]> Signed-off-by: Jose Maria Casanova Crespo <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965/fs: Remove BRW_REGISTER_TYPE_HF assert at get_exec_type	Alejandro Piñeiro	2017-12-06	1	-3/+0
\| \| \| \| \| \| \|	Note that we don't remove the assert at i965/vec4. At this point half float support is only for the scalar backend. Reviewed-by: Jason Ekstrand <[email protected]>
*	i965: Support for 16-bit base types in helper functions	Jose Maria Casanova Crespo	2017-12-06	3	-0/+25
\| \| \| \| \| \| \| \| \|	v2: Fixed calculation of scalar size for 16-bit types. (Jason Ekstrand) v3: Fix coding style (Topi Pohjolainen) Signed-off-by: Jose Maria Casanova Crespo <[email protected]> Signed-off-by: Eduardo Lima <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965/vec4: Handle 16-bit types at type_size_xvec4	Alejandro Piñeiro	2017-12-06	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \|	These types have similar vec4 sizes as their 32-bit counterparts. The vec4 backend doesn't support 16-bit types and probably never will, but this method is called by the scalar backend at fs_visitor::nir_setup_outputs(), so we still need to provide valid vec4 sizes for 16-bit types. In the future, something different should be implemented to avoid this dependency. Reviewed-by: Jason Ekstrand <[email protected]>
*	anv: Add support for the variablePointers feature	Jason Ekstrand	2017-12-05	2	-4/+3
\| \| \| \| \| \| \| \|	Not to be confused with variablePointersStorageBuffer which is the subset of VK_KHR_variable_pointers required to enable the extension. This means we now have "full" support for variable pointers. Reviewed-by: Kristian H. Kristensen <[email protected]>
*	anv: Handle nir_intrinsic_vulkan_resource_reindex	Jason Ekstrand	2017-12-05	1	-1/+27
\| \| \| \|	Reviewed-by: Kristian H. Kristensen <[email protected]>
*	intel/isl: Declare private array as static const	Chad Versace	2017-12-04	1	-1/+1
\| \| \| \| \| \| \| \|	It's array isl_drm.c:modifier_info[] . Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
*	anv: query CS timestamp frequency from the kernel	Lionel Landwerlin	2017-12-04	1	-0/+13
\| \| \| \| \| \| \| \| \|	The reference value in gen_device_info isn't going to be acurate on Gen10+. We should query it from the kernel, which reads a couple of register to compute the actual value. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Rafael Antognolli <[email protected]>
*	vulkan/wsi: Initialize individual WSI interfaces in wsi_device_init	Jason Ekstrand	2017-12-04	1	-30/+6
\| \| \| \| \| \| \| \|	Now that we have anv_device_init/finish functions, there's no reason to have the individual driver do any more work than that. Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Chad Versace <[email protected]>
*	vulkan/wsi: Drop some unneeded cruft from the API	Jason Ekstrand	2017-12-04	1	-18/+1
\| \| \| \| \| \| \| \|	This drops the unneeded callbacks struct as well as the queue_get_family callback we were using before we'd pulled QueuePresent inside. Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Chad Versace <[email protected]>
*	vulkan/wsi: Add wrappers for all of the surface queries	Jason Ekstrand	2017-12-04	1	-28/+23
\| \| \| \| \| \| \|	This lets us move wsi_interface to wsi_common_private.h Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Chad Versace <[email protected]>
*	vulkan/wsi: Drop the can_handle_different_gpu parameter from get_support	Jason Ekstrand	2017-12-04	1	-1/+1
\| \| \| \| \| \| \|	Both anv and radv can handle prime now. Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Chad Versace <[email protected]>
*	vulkan/wsi: Add a helper for AcquireNextImage	Jason Ekstrand	2017-12-04	1	-8/+11
\| \| \| \| \| \| \| \| \|	Unfortunately, due to the fact that AcquireNextImage does not take a queue, the ANV trick for triggering the fence won't work in general. We leave dealing with the fence up to the caller for now. Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Chad Versace <[email protected]>
*	vulkan/wsi: move swapchain create/destroy to common code	Dave Airlie	2017-12-04	1	-30/+5
\| \| \| \| \| \| \| \| \| \|	v2 (Jason Ekstrand): - Rebase - Alter the names of the helpers to better match the vulkan entrypoints - Use the helpers in anv Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Chad Versace <[email protected]>
*	vulkan/wsi: Move get_images into common code	Jason Ekstrand	2017-12-04	1	-4/+3
\| \| \| \| \| \| \| \| \|	This moves bits out of all four corners (anv, radv, x11, wayland) and into the wsi common code. We also switch to using an outarray to ensure we get our return code right. Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Chad Versace <[email protected]>
*	anv/wsi: Enable prime support	Jason Ekstrand	2017-12-04	1	-1/+1
\| \| \| \| \| \| \| \|	Now that we're using the same common code as radv, we get prime support for free. Just enable it. Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Chad Versace <[email protected]>
*	anv/wsi: Use the common QueuePresent code	Jason Ekstrand	2017-12-04	1	-57/+6
\| \| \| \| \|	Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Chad Versace <[email protected]>
*	vulkan/wsi: Do image creation in common code	Jason Ekstrand	2017-12-04	1	-121/+1
\| \| \| \| \| \| \| \| \| \| \|	This uses the mock extension created in a previous commit to tell the driver that the image it's just been asked to create is, in fact, a window system image with whatever assumptions that implies. There was a lot of redundant code between the two drivers to do basically exactly the same thing. Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Chad Versace <[email protected]>
*	vulkan/wsi: Implement prime in a completely generic way	Jason Ekstrand	2017-12-04	1	-2/+12
\| \| \| \| \|	Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Chad Versace <[email protected]>
*	anv/image: Implement the wsi "extension"	Jason Ekstrand	2017-12-04	2	-4/+37
\| \| \| \| \|	Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Chad Versace <[email protected]>
*	anv: Require a dedicated allocation for modified images	Jason Ekstrand	2017-12-04	1	-4/+49
\| \| \| \| \| \| \| \|	This lets us set the BO tiling when we allocate the memory. This is required for GL to work properly. Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Chad Versace <[email protected]>
*	anv/image: Add a drm_format_mod field	Jason Ekstrand	2017-12-04	2	-0/+7
\| \| \| \| \| \| \|	At the moment, this is always initialized to DRM_FORMAT_MOD_INVALID. Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Chad Versace <[email protected]>
*	anv: Implement VK_EXT_external_memory_dma_buf	Jason Ekstrand	2017-12-04	3	-17/+34
\| \| \| \| \| \| \| \| \|	This is a modified version of the patch originally sent by Chad Versace. The primary difference is that this version claims that OPQAUE_FD and DMA_BUF are compatible handle types. Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Chad Versace <[email protected]>
*	vulkan/wsi: Add a wsi_device_init function	Jason Ekstrand	2017-12-04	1	-1/+10
\| \| \| \| \| \| \| \|	This gives the opportunity to collect some function pointers if we'd like which will be very useful in future. Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Chad Versace <[email protected]>
*	vulkan/wsi: Add a wsi_image structure	Daniel Stone	2017-12-04	1	-15/+10
\| \| \| \| \| \| \| \| \| \| \| \| \|	This is used to hold information about the allocated image, rather than an ever-growing function argument list. v2 (Jason Ekstrand): - Rename wsi_image_base to wsi_image Signed-off-by: Daniel Stone <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Chad Versace <[email protected]>
*	vulkan/wsi: use function ptr definitions from the spec.	Dave Airlie	2017-12-04	1	-1/+2
\| \| \| \| \| \| \|	This just seems cleaner, and we may expand this in future. Signed-off-by: Dave Airlie <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	spirv: Convert the supported_extensions struct to spirv_options	Jason Ekstrand	2017-12-02	1	-9/+11
\| \| \| \| \| \| \| \|	This is a bit more general and lets us pass additional options into the spirv_to_nir pass beyond what capabilities we support. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
*	intel/compiler: Implement WaClearTDRRegBeforeEOTForNonPS.	Rafael Antognolli	2017-12-01	2	-0/+19
\| \| \| \| \| \| \| \| \| \| \|	The bspec describes: "WA: Clear tdr register before send EOT in all non-PS shader kernels mov(8) tdr0:ud 0x0:ud {NoMask}" Signed-off-by: Rafael Antognolli <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	intel/blorp: Fix possible NULL pointer dereferencing	Vadym Shovkoplias	2017-11-30	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \|	Fix incomplete check of input params in blorp_surf_convert_to_uncompressed() which can lead to NULL pointer dereferencing. Fixes: 5ae8043fed2 ("intel/blorp: Add an entrypoint for doing bit-for-bit copies") Fixes: f395d0abc83 ("intel/blorp: Internally expose surf_convert_to_uncompressed") Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Andres Gomez <[email protected]>