mesa.git

	Commit message	Author	Age	Files	Lines
*	intel/fs: Implement GRF bank conflict mitigation pass.	Francisco Jerez	2017-12-07	5	-0/+898
	Unnecessary GRF bank conflicts increase the issue time of ternary instructions (the overwhelmingly most common of which is MAD) by roughly 50%, leading to reduced ALU throughput. This pass attempts to minimize the number of bank conflicts by rearranging the layout of the GRF space post-register allocation. It's in general not possible to eliminate all of them without introducing extra copies, which are typically more expensive than the bank conflict itself.

	In a shader-db run on SKL this helps roughly 46k shaders:
	  total conflicts in shared programs: 1008981 -> 600461 (-40.49%)
	  conflicts in affected programs: 816222 -> 407702 (-50.05%)
	  helped: 46234
	  HURT: 72

	The running time of shader-db itself on SKL seems to be increased by roughly 2.52%±1.13% with n=20 due to the additional work done by the compiler back-end.

	On earlier generations the pass is somewhat less effective in relative terms because the hardware incurs a bank conflict anytime the last two sources of the instruction are duplicate (e.g. while trying to square a value using MAD), which is impossible to avoid without introducing copies. E.g. for a shader-db run on SNB:
	  total conflicts in shared programs: 944636 -> 623185 (-34.03%)
	  conflicts in affected programs: 853258 -> 531807 (-37.67%)
	  helped: 31052
	  HURT: 19

	And on BDW:
	  total conflicts in shared programs: 1418393 -> 987539 (-30.38%)
	  conflicts in affected programs: 1179787 -> 748933 (-36.52%)
	  helped: 47592
	  HURT: 70

	On SKL GT4e this improves performance of GpuTest Volplosion by 3.64%±0.33% with n=16.

	NOTE: This patch intentionally disregards some i965 coding conventions for the sake of reviewability. This is addressed by the next squash patch which introduces an amount of (for the most part boring) boilerplate that might distract reviewers from the non-trivial algorithmic details of the pass.

	The following patch is squashed in:
	SQUASH: intel/fs/bank_conflicts: Roll back to the nineties.

	Acked-by: Matt Turner <[email protected]>
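The relative figures quoted in the shader-db summaries above can be re-derived from the before/after conflict counts. The following is only a small arithmetic sketch: the helper name `percent_change` is made up for illustration and is not part of shader-db or Mesa.

```python
def percent_change(before: int, after: int) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Conflict counts copied from the commit message above.
runs = {
    "SKL total": (1008981, 600461),
    "SKL affected": (816222, 407702),
    "SNB total": (944636, 623185),
    "SNB affected": (853258, 531807),
    "BDW total": (1418393, 987539),
    "BDW affected": (1179787, 748933),
}

for label, (before, after) in runs.items():
    print(f"{label}: {before} -> {after} ({percent_change(before, after):.2f}%)")
```

Note that the absolute reduction is the same for the "total" and "affected" rows of each platform, since only affected programs change.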
*	meson: Fix building gallium media targets with gallium-xlib glx	Dylan Baker	2017-12-07	1	-3/+3
	To demonstrate this bug, run meson with the options:
	  -Ddri-drivers= -Dglx=gallium-xlib

	Signed-off-by: Dylan Baker <[email protected]>
	Reviewed-by: Eric Engestrom <[email protected]>
*	meson: Add lmsensors to gallium libgl-xlib target.	Dylan Baker	2017-12-07	1	-1/+3
	Fixes: 5e71efef44b992b5d70b ("meson: Add lmsensors support")
	Signed-off-by: Dylan Baker <[email protected]>
	Reviewed-by: Eric Engestrom <[email protected]>
*	meson: add dep_thread to every lib that includes threads.h	Eric Engestrom	2017-12-07	9	-8/+9
	Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104141
	Signed-off-by: Eric Engestrom <[email protected]>
	Reviewed-by: Dylan Baker <[email protected]>
*	meson: fix pl111 dependency on vc4	Eric Engestrom	2017-12-07	3	-5/+9
	src/gallium/winsys/pl111/drm/libpl111winsys.a(pl111_drm_winsys.c.o): In function `pl111_drm_screen_create':
	pl111_drm_winsys.c:(.text+0x33): undefined reference to `vc4_drm_screen_create_renderonly'

	Signed-off-by: Eric Engestrom <[email protected]>
	Reviewed-by: Dylan Baker <[email protected]>
*	radv: use a faster version for nir_op_pack_half_2x16	Samuel Pitoiset	2017-12-07	1	-11/+1
	This patch is ported from RadeonSI and it has two effects.

	It fixes a rendering issue which affects F1 2017 and Dawn of War 3 (Vega only) because LLVM was ending up generating the new v_mad_mix_{hi,lo} instructions, which appear to be buggy in some way. Not sure if Mesa is generating something wrong or if the issue is in LLVM only. Anyway, that explains why the DOW3 issue can't be reproduced with GL on Vega.

	It also improves performance because v_cvt_pkrtz_f16 is faster, and because I guess the rounding mode behaviour is similar between GL and VK, we can use it.

	About performance, it improves Talos by +3/4% but I don't see any other impacts.

	No CTS regressions on Polaris.

	Signed-off-by: Samuel Pitoiset <[email protected]>
	Reviewed-by: Dave Airlie <[email protected]>
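For readers unfamiliar with the operation, nir_op_pack_half_2x16 converts two 32-bit floats to half precision and packs them into one 32-bit word, with the first component in the low 16 bits (the GLSL packHalf2x16 layout). The sketch below only illustrates that layout. It uses Python's `struct` half-float format, which rounds to nearest-even, whereas v_cvt_pkrtz_f16 rounds toward zero, so results can differ in the last bit for values that are not exactly representable. The helper names are illustrative, not Mesa functions.

```python
import struct


def float_to_half_bits(value: float) -> int:
    """Return the IEEE 754 binary16 bit pattern of `value`.

    Uses round-to-nearest-even (Python's behaviour), not the
    round-toward-zero mode of v_cvt_pkrtz_f16.
    """
    return struct.unpack("<H", struct.pack("<e", value))[0]


def pack_half_2x16(x: float, y: float) -> int:
    """Pack x into bits 0..15 and y into bits 16..31 of one 32-bit word."""
    return (float_to_half_bits(y) << 16) | float_to_half_bits(x)


# 1.0 is 0x3C00 and -2.0 is 0xC000 in half precision.
print(hex(pack_half_2x16(1.0, -2.0)))
```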
*	mesa/spirv: move and rename nir_spirv_supported_capabilities	Alejandro Piñeiro	2017-12-07	3	-15/+15
	To avoid having any Vulkan driver include the GL mtypes.h. Renamed as eventually this could be used by drivers not using nir.

	v2: remove compiler/spirv/spirv.h from mtypes (Alejandro)
	v3: added the definition at compiler/shader_info.h (Jason Ekstrand)

	Reviewed-by: Jason Ekstrand <[email protected]>
*	util/disk_cache: Remove unneeded free() on always null string	Vadym Shovkoplias	2017-12-07	1	-1/+0
	At this point dc_job->cache_item_metadata.keys always equals NULL, so the call to free() is useless.

	Fixes: b86ecea3446 ("util/disk_cache: write cache item metadata to disk")
	Signed-off-by: Vadym Shovkoplias <[email protected]>
	Reviewed-by: Eric Engestrom <[email protected]>
*	spirv: fix bug when OpSpecConstantOp calls a conversion	Samuel Iglesias Gonsálvez	2017-12-07	1	-6/+21
	In that case, nir_eval_const_opcode() will evaluate the conversion but as it was using destination's bit_size, the resulting value was just a cast of the source constant value. By passing the source's bit size, it does the conversion properly.

	Fixes: dEQP-VK.spirv_assembly.instruction..opspecconstantop.convert*

	v2:
	- Remove invalid conversion op cases.

	Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
	Reviewed-by: Ian Romanick <[email protected]>
*	spirv: allow specialization constants with bitsize different than 32 bits	Samuel Iglesias Gonsálvez	2017-12-07	1	-1/+0
	Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
	Reviewed-by: Ian Romanick <[email protected]>
*	nir/opcodes: Fix constant-folding of bitfield_insert	James Legg	2017-12-07	1	-2/+2
	Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104119
	CC: <[email protected]>
	CC: Samuel Pitoiset <[email protected]>
	Reviewed-by: Matt Turner <[email protected]>
*	radv: Add LLVM version to the device name string	Alex Smith	2017-12-07	2	-26/+37
	Allows apps to determine the LLVM version so that they can decide whether or not to enable workarounds for LLVM issues.

	Signed-off-by: Alex Smith <[email protected]>
	Cc: "17.2 17.3" <[email protected]>
	Reviewed-by: Samuel Pitoiset <[email protected]>
	Tested-by: Dieter Nützel <[email protected]>
*	mesa: remove set_entry from forward type declarations	Alejandro Piñeiro	2017-12-07	1	-1/+0
	This type was used at gl_sync_object, but it is not used anymore.

	Reviewed-by: Timothy Arceri <[email protected]>
*	meta: Fix ClearTexture with GL_DEPTH_COMPONENT.	Kenneth Graunke	2017-12-06	1	-9/+14
	We only handled unpacking for GL_DEPTH_STENCIL formats. Cemu was hitting _mesa_problem() for an unsupported format in _mesa_unpack_float_32_uint_24_8_depth_stencil_row(), because the format was depth-only, rather than depth-stencil.

	Cc: "13.0 12.0" <[email protected]>
	Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94739
	Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103966
	Reviewed-by: Tapani Pälli <[email protected]>
*	meta: Initialize depth/clear values on declaration.	Kenneth Graunke	2017-12-06	1	-5/+2
	This helps avoid compiler warnings in the next commit - everything was initialized, but it wasn't obvious to static analysis.

	Suggested-by: Tapani Pälli <[email protected]>
*	glsl: get correct member type when processing xfb ifc arrays	Timothy Arceri	2017-12-07	1	-2/+4
	This fixes a crash in: KHR-GL45.enhanced_layouts.xfb_block_stride

	Fixes: 0822517936d4 "glsl: add helper to process xfb qualifiers during linking"
	Reviewed-by: Kenneth Graunke <[email protected]>
*	r600/sb: do not convert if-blocks that contain indirect array access	Gert Wollny	2017-12-07	3	-2/+5
	If an array is accessed within an if block, then currently it is not known whether the value in the address register is involved in the evaluation of the if condition, and converting the if condition may actually result in out-of-bounds array access. Consequently, if blocks that contain indirect array access should not be converted.

	Fixes piglits on r600/BARTS:
	  spec/glsl-1.10/execution/variable-indexing/
	    vs-output-array-float-index-wr
	    vs-output-array-vec3-index-wr
	    vs-output-array-vec4-index-wr

	Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104143
	Signed-off-by: Gert Wollny <[email protected]>
	Cc: <[email protected]>
	Signed-off-by: Dave Airlie <[email protected]>
*	r600: add support for compute grid/block sizes. (v2)	Dave Airlie	2017-12-06	4	-3/+100
	We just pass these in from outside in a constant buffer. The shader side stores them once they are accessed once.

	v2: fix to not use a temp_reg.

	Signed-off-by: Dave Airlie <[email protected]>
*	r600: handle image/buffer sizes correctly.	Dave Airlie	2017-12-06	3	-4/+21
	This adds support to compute for the resq workarounds (buffer/cube sizes).

	Signed-off-by: Dave Airlie <[email protected]>
*	r600/compute: add support for emitting compute image/buffer atoms	Dave Airlie	2017-12-06	1	-1/+9
	Signed-off-by: Dave Airlie <[email protected]>
*	r600/compute: handle atomic counters in compute state.	Dave Airlie	2017-12-06	1	-0/+9
	Signed-off-by: Dave Airlie <[email protected]>
*	r600/compute: add support for TGSI compute shaders. (v1.1)	Dave Airlie	2017-12-06	2	-28/+103
	This adds paths to handle TGSI compute shaders and shader selection.

	It also avoids emitting certain things on TGSI paths: CBs, vertex buffers, config reg init (not required).

	v1.1: fix rat mask calc

	Signed-off-by: Dave Airlie <[email protected]>
*	r600/shader: add compute support to shader assembler	Dave Airlie	2017-12-06	1	-0/+14
	Signed-off-by: Dave Airlie <[email protected]>
*	r600/texture: drop lowering 1d/2d images to linear.	Dave Airlie	2017-12-06	1	-8/+0
	This appears to cause hangs with compute images. Unless we can find more specifics, just don't do this for now.

	Signed-off-by: Dave Airlie <[email protected]>
*	mesa: define nir_spirv_supported_capabilities	Alejandro Piñeiro	2017-12-06	2	-13/+15
	Until now it was part of spirv_to_nir_options. But it will be used in the implementation of ARB_gl_spirv and ARB_spirv_extensions, and added to the OpenGL context, as a way to record which SPIR-V capabilities the current OpenGL implementation supports.

	Reviewed-by: Ian Romanick <[email protected]>
*	anv: fix a case statement in GetMemoryFdPropertiesKHR	Fredrik Höglund	2017-12-06	1	-1/+1
	The handle type in the case statement is supposed to be VK_EXTERNAL_MEMORY_HANDLE_TYPE_DMA_BUF_BIT_EXT.

	Fixes: ab18e8e59b6 ("anv: Implement VK_EXT_external_memory_dma_buf")
	Signed-off-by: Fredrik Höglund <[email protected]>
	Reviewed-by: Jason Ekstrand <[email protected]>
*	radv: fix a case statement in GetMemoryFdPropertiesKHR	Fredrik Höglund	2017-12-06	1	-1/+1
	The handle type in the case statement is supposed to be VK_EXTERNAL_MEMORY_HANDLE_TYPE_DMA_BUF_BIT_EXT.

	Fixes: 546e747867c ("radv: Implement VK_EXT_external_memory_dma_buf")
	Signed-off-by: Fredrik Höglund <[email protected]>
	Reviewed-by: Jason Ekstrand <[email protected]>
	Reviewed-by: Samuel Pitoiset <[email protected]>
*	meson: fix keyword argument in declare_dependency()	Eric Engestrom	2017-12-06	2	-2/+2
	`declare_dependency()` takes `compile_args`, not `c_args`. It was correct in all the other `declare_dependency()` from that commit.

	Fixes: 0bbecc5a8548883f76a71 "meson: define driver dependencies"
	Signed-off-by: Eric Engestrom <[email protected]>
	Reviewed-by: Dylan Baker <[email protected]>
*	i965: include brw_pipe_control.h in the tarball	Emil Velikov	2017-12-06	1	-0/+1
	Fixes: bfe0f3a7027 ("i965: Move PIPE_CONTROL defines and prototypes to brw_pipe_control.h.")
	Signed-off-by: Emil Velikov <[email protected]>
*	mesa: document _mesa_extension_override_* variables	Emil Velikov	2017-12-06	1	-0/+9
	Currently there are no users of these outside of extensions.c. Provide some information on why they exist and how to use them.

	Cc: Jordan Justen <[email protected]>
	Signed-off-by: Emil Velikov <[email protected]>
	Reviewed-by: Andres Gomez <[email protected]>
*	docs: annotate MESA_program_debug as obsolete	Emil Velikov	2017-12-06	1	-1/+1
	It has been obsolete for years - state it explicitly.

	Signed-off-by: Emil Velikov <[email protected]>
	Reviewed-by: Nicolai Hähnle <[email protected]>
*	swr/scons: Fix another intermittent build failure	George Kyriazis	2017-12-06	1	-0/+1
	gen_BackendPixelRate*.cpp depends on gen_ar_eventhandler.hpp. Fix missing dependency.

	Reviewed-by: Bruce Cherniak <[email protected]>
*	radeonsi: make const and stream uploaders allocate read-only memory	Marek Olšák	2017-12-06	1	-2/+5
	and anything that clones these uploaders, like u_threaded_context.

	Tested-by: Dieter Nützel <[email protected]>
	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: use a separate allocator for fine fences	Marek Olšák	2017-12-06	3	-1/+9
	Tested-by: Dieter Nützel <[email protected]>
	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi/gfx9: make shader binaries use read-only memory	Marek Olšák	2017-12-06	5	-3/+13
	Tested-by: Dieter Nützel <[email protected]>
	Reviewed-by: Nicolai Hähnle <[email protected]>
*	winsys/amdgpu: make IBs use read-only memory	Marek Olšák	2017-12-06	1	-0/+1
	Tested-by: Dieter Nützel <[email protected]>
	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: print the buffer list for CHECK_VM	Marek Olšák	2017-12-06	1	-0/+1
	Tested-by: Dieter Nützel <[email protected]>
	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: allow DMABUF exports for local buffers	Marek Olšák	2017-12-06	1	-1/+4
	Cc: 17.3 <[email protected]>
	Tested-by: Dieter Nützel <[email protected]>
	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: always place sparse buffers in VRAM	Nicolai Hähnle	2017-12-06	2	-2/+6
	Together with "radeonsi: fix the R600_RESOURCE_FLAG_UNMAPPABLE check", this ensures that sparse buffers are placed in VRAM.

	Noticed by an assertion that started triggering with commit d4fac1e1d7 ("gallium/radeon: enable suballocations for VRAM with no CPU access").

	Fixes KHR-GL45.sparse_buffer_tests.BufferStorageTest in debug builds.

	Reviewed-by: Marek Olšák <[email protected]>
	Tested-by: Dieter Nützel <[email protected]>
*	radeonsi: fix the R600_RESOURCE_FLAG_UNMAPPABLE check	Nicolai Hähnle	2017-12-06	1	-1/+1
	The flag is on the pipe_resource, not the r600_resource.

	I don't see an obvious bug related to this, but it could potentially lead to suboptimal placement of some resources.

	Fixes: a41587433c4d ("gallium/radeon: add R600_RESOURCE_FLAG_UNMAPPABLE")
	Reviewed-by: Marek Olšák <[email protected]>
	Tested-by: Dieter Nützel <[email protected]>
*	i965/fs: Use untyped_surface_read for 16-bit load_ssbo	Jose Maria Casanova Crespo	2017-12-06	1	-7/+20
	SSBO loads were using byte_scattered read messages as they allow reading 16-bit size components. byte_scattered messages can only operate on one component at a time, so we needed to emit as many messages as components.

	But for vec2 and vec4 of 16-bit, being multiples of 32-bit, we can use the untyped_surface_read message to read pairs of 16-bit components using only one message. Once each pair is read it is unshuffled to return the proper 16-bit components. The vec3 case is handled like vec4 but the 4th component is ignored.

	16-bit scalars are read using one byte_scattered_read message.

	v2: Removed use of stride = 2 on sources (Jason Ekstrand)
	    Rework optimization using unshuffle 16 reads (Chema Casanova)
	v3: Use W and D types instead of HF and F in shuffle to avoid rounding errors (Jason Ekstrand)
	    Use untyped_surface_read for 16-bit vec3. (Jason Ekstrand)
	v4: Use subscript instead of changing type and stride (Jason Ekstrand)

	Reviewed-by: Jason Ekstrand <[email protected]>
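The unshuffle step described above can be pictured as splitting each 32-bit dword returned by the read into two 16-bit components. This is a rough sketch on plain integers, not the i965 implementation (which operates on registers). It assumes the even-numbered component sits in the low 16 bits of each dword; the function names are made up for illustration.

```python
def dwords_needed(num_components: int) -> int:
    """Number of 32-bit dwords that cover `num_components` 16-bit values."""
    return (num_components + 1) // 2


def unshuffle_16bit_pairs(dwords: list[int], num_components: int) -> list[int]:
    """Split 32-bit dwords into 16-bit components, low half first (assumed)."""
    components = []
    for dword in dwords:
        components.append(dword & 0xFFFF)          # component 2*i
        components.append((dword >> 16) & 0xFFFF)  # component 2*i + 1
    # A vec3 is read as a vec4; the unused 4th component is dropped here.
    return components[:num_components]


# A 16-bit vec3 read as two dwords.
print([hex(c) for c in unshuffle_16bit_pairs([0x22221111, 0x44443333], 3)])
```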
*	i965/fs: Optimize 16-bit SSBO stores by packing two into a 32-bit reg	Jose Maria Casanova Crespo	2017-12-06	1	-15/+43
	Currently, we use byte-scattered write messages for storing 16-bit values into an SSBO. This is because untyped surface messages have a fixed 32-bit size.

	This patch optimizes these 16-bit writes by combining 2 values (e.g. two consecutive components aligned to 32 bits) into a 32-bit register, packing the two 16-bit words.

	16-bit single component values will continue to use byte-scattered write messages. The same will happen when the first consecutive component is not aligned to 32 bits.

	This optimization reduces the number of SEND messages used for storing 16-bit values potentially by a factor of 2 or 4, which cuts down execution time significantly because byte-scattered writes are an expensive operation as they only write one component per message.

	v2: Removed use of stride = 2 on sources (Jason Ekstrand)
	    Rework optimization using shuffle 16 write and enable writes of 16bit vec4 with only one message of 32-bits. (Chema Casanova)
	v3: - Fix coding style (Eduardo Lima)
	    - Reorganize code to avoid duplication. (Jason Ekstrand)
	    - Include new comments to explain the length calculations to fix alignment issues of components. (Jason Ekstrand)
	    - Fix issues with writemask yz with 16-bit writes. (Jason Ekstrand)
	v4: (Jason Ekstrand)
	    - Reorganize 64-bit ssbo-writes to avoid using slots_per_component.
	    - Comment about why shuffle is needed when using byte_scattered_write.

	Signed-off-by: Eduardo Lima <[email protected]>
	Signed-off-by: Jose Maria Casanova Crespo <[email protected]>
	Reviewed-by: Jason Ekstrand <[email protected]>
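The packing idea can be sketched on plain integers as follows. This is only an illustration of combining consecutive 16-bit components into 32-bit words, assuming the first component of each pair goes into the low 16 bits; it ignores the alignment and writemask handling the patch deals with, and the function name is hypothetical.

```python
def shuffle_16bit_pairs(components: list[int]) -> list[int]:
    """Pack consecutive pairs of 16-bit values into 32-bit dwords.

    Assumes an even number of components; the first value of each pair
    lands in the low 16 bits (an assumption of this sketch).
    """
    if len(components) % 2 != 0:
        raise ValueError("expected an even number of 16-bit components")
    return [
        (components[i] & 0xFFFF) | ((components[i + 1] & 0xFFFF) << 16)
        for i in range(0, len(components), 2)
    ]


# A 16-bit vec4 becomes two dwords, which a single 32-bit write can carry,
# instead of four one-component byte-scattered writes.
print([hex(d) for d in shuffle_16bit_pairs([0x1111, 0x2222, 0x3333, 0x4444])])
```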
*	anv: Enable SPV_KHR_16bit_storage and VK_KHR_16bit_storage for SSBO/UBO	Alejandro Piñeiro	2017-12-06	3	-0/+15
	Enables SPV_KHR_16bit_storage on gen 8+.

	VK_KHR_16bit_storage is enabled for SSBO/UBO using the VK_KHR_get_physical_device_properties2 functionality to expose if the extension is supported or not.

	v2: update due to rebase against master (Alejandro)
	v3: (Jason Ekstrand)
	    - Move this patch up in VK_KHR_16bit_storage series enabling only storageBuffer16BitAccess and uniformAndStorageBuffer16BitAccess.
	    - Only expose VK_KHR_16bit_storage on Gen8+
	v4: (Jason Ekstrand)
	    - Squash enable SPV_KHR_16bit_storage into VK_KHR_16bit_storage enablement for SSBO/UBO.

	Signed-off-by: Jose Maria Casanova Crespo <[email protected]>
	Signed-off-by: Alejandro Piñeiro <[email protected]>
	Signed-off-by: Eduardo Lima Mitev <[email protected]>
	Reviewed-by: Jason Ekstrand <[email protected]>
*	i965/fs: Enables 16-bit load_ubo with sampler	Jason Ekstrand	2017-12-06	1	-7/+14
	load_ubo is using 32-bit loads as uniform surfaces have a 32-bit surface format defined. So when reading 16-bit components with the sampler we need to unshuffle two 16-bit components from each 32-bit component.

	Using the sampler avoids the use of the byte_scattered_read message that needs one message for each component and is supposed to be slower.

	v2: (Jason Ekstrand)
	    - Simplify component selection and unshuffling for different bitsizes
	    - Remove SKL optimization of reading only two 32-bit components when reading 16-bit types.

	Reviewed-by: Jose Maria Casanova Crespo <[email protected]>
*	i965/fs: Helpers for un/shuffle 16-bit pairs in 32-bit components	Jose Maria Casanova Crespo	2017-12-06	2	-0/+71
	These helpers are used to load/store 16-bit types from/to 32-bit components.

	The functions shuffle_32bit_load_result_to_16bit_data and shuffle_16bit_data_for_32bit_write are implemented in a similar way to the analogous functions for handling 64-bit types.

	v1: Explain need of temporary in shuffle operations. (Jason Ekstrand)

	Reviewed-by: Jason Ekstrand <[email protected]>
*	i965/fs: Use byte scattered read for 16-bit load_ssbo	Jose Maria Casanova Crespo	2017-12-06	1	-1/+13
	Used to enable 16-bit reads at do_untyped_vector_read, that is used on the following intrinsics:
	  * nir_intrinsic_load_shared
	  * nir_intrinsic_load_ssbo

	v2: Removed use of stride = 2 on 16-bit sources (Jason Ekstrand)
	v3: - Add bitsize to scattered read operation (Jason Ekstrand)
	    - Remove implementation of 16-bit UBO read from this patch.
	    - Avoid assertion at opt_algebraic caused by ADD of two IMM with offset with BRW_REGISTER_TYPE_UD type found on matrix tests. (Jose Maria Casanova)
	v4: (Jason Ekstrand)
	    - Put if case for 16-bits at the beginning of the if ladder.
	    - Use type_sz(dest.type) * 8 as bit_size parameter for scattered read.

	Reviewed-by: Jason Ekstrand <[email protected]>
*	i965/fs: Add byte scattered read message and fs support	Jose Maria Casanova Crespo	2017-12-06	9	-1/+94
	v2: Fix alignment style (Topi Pohjolainen)
	    (Jason Ekstrand)
	    - Enable bit_size parameter to scattered messages to enable different bitsizes byte/word/dword.
	    - Remove use of brw_send_indirect_scattered_message in favor of brw_send_indirect_surface_message.
	    - Move scattered messages to surface messages namespace.
	    - Assert align1 for scattered messages and assume Gen8+.
	    - Inline brw_set_dp_byte_scattered_read.
	v3: (Jason Ekstrand)
	    - Use renamed brw_byte_scattered_data_element_from_bit_size method
	    - Assert scattered read for Gen8+ and Haswell.
	    - Use conditional expression at components_read.
	    - Include comment about params for scattered opcodes.

	Reviewed-by: Jason Ekstrand <[email protected]>
*	i965/fs: Predicate byte scattered writes if needed	Alejandro Piñeiro	2017-12-06	1	-1/+14
	While on Untyped Surface messages the bits of the execution mask are ANDed with the corresponding bits of the Pixel/Sample Mask, that is not the case for byte scattered writes. That is needed to avoid SSBO stores writing on helper invocations. So when that can matter, we load the sample mask and predicate the send message.

	Note: the need for this patch was tested with a custom test. Right now the 16-bit storage CTS tests don't need this path in order to get a full pass.

	Reviewed-by: Jason Ekstrand <[email protected]>
*	i965/fs: Use byte_scattered_write on 16-bit store_ssbo	Alejandro Piñeiro	2017-12-06	1	-20/+45
	We need to rely on byte scattered writes as untyped writes are 32-bit size. We could try to keep using 32-bit messages when we have two or four 16-bit elements, but for simplicity's sake, we use the same message for any component number. We revisit this approach in the following patches.

	v2: Removed use of stride = 2 on 16-bit sources (Jason Ekstrand)
	v3: (Jason Ekstrand)
	    - Include bit_size to scattered write message and remove namespace - specific for scattered messages.
	    - Move comment to proper place.
	    - Squashed with i965/fs: Adjust type_size/type_slots on store_ssbo.
	    (Jose Maria Casanova)
	    - Take into account that get_nir_src returns now WORD types for 16-bit sources instead of DWORD.
	v4: (Jason Ekstrand)
	    - Rename length variable to num_components.
	    - Include assertions before emit_untyped_write.
	    - Remove type_slot in favor of num_slot and first_slot.

	Signed-off-by: Jose Maria Casanova Crespo <[email protected]>
	Signed-off-by: Alejandro Piñeiro <[email protected]>
	Reviewed-by: Jason Ekstrand <[email protected]>
*	i965/fs: Add byte scattered write message and fs support	Jose Maria Casanova Crespo	2017-12-06	9	-0/+118
	v2: (Jason Ekstrand)
	    - Enable bit_size parameter to scattered messages to enable different bitsizes byte/word/dword.
	    - Remove use of brw_send_indirect_scattered_message in favor of brw_send_indirect_surface_message.
	    - Move scattered messages to surface messages namespace.
	    - Assert align1 for scattered messages and assume Gen8+.
	    - Inline brw_set_dp_byte_scattered_write.
	v3: - Remove leftover newline (Topi Pohjolainen)
	    - Rename brw_data_size to brw_scattered_data_element and use defines instead of an enum (Jason Ekstrand)
	    - Assert scattered write for Gen8+ and Haswell (Jason Ekstrand)

	Signed-off-by: Jose Maria Casanova Crespo <[email protected]>
	Signed-off-by: Alejandro Piñeiro <[email protected]>
	Reviewed-by: Jason Ekstrand <[email protected]>