mesa.git - Unnamed repository; edit this file 'description' to name the repository.

	Commit message (Collapse)	Author	Age	Files	Lines
*	anv: Implement VK_NV_compute_shader_derivatives	Caio Marcelo de Oliveira Filho	2019-04-08	3	-0/+10
\| \| \| \| \|	Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	intel/fs: Use NIR_PASS_V when lowering CS intrinsics	Caio Marcelo de Oliveira Filho	2019-04-08	1	-3/+4
\| \| \| \| \| \| \| \| \|	This will make that step visible in NIR_PRINT=1. v2: Also use the macro for the cleanup passes. Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	intel/fs: Don't loop when lowering CS intrinsics	Caio Marcelo de Oliveira Filho	2019-04-08	1	-15/+10
\| \| \| \| \| \| \| \| \|	This was needed when certain intrinsics were lowered to other ones that were defined by the same pass. After 060817b2 "intel,nir: Move gl_LocalInvocationID lowering to nir_lower_system_values" we don't need the loop anymore. Reviewed-by: Jason Ekstrand <[email protected]>
*	intel/fs: Add support for CS to group invocations in quads	Caio Marcelo de Oliveira Filho	2019-04-08	3	-16/+103
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When using quads, instead of mapping the elements to the next 4 local invocation indices, we map the two next in the "current" row and two next in the "next row". A side effect is that a thread will execute the indices in a different order. We now perform the lowering of both local invocation ID and index together -- and don't rely anymore on lowering done by nir_lower_system_values. That is convenient when doing the math for quads, because we need X and Y to get the right invocation index. When the pass progresses, fold the constants and clean up to reduce the noise from the indexing math. This implements the derivative_group_quadsNV semantics from NV_compute_shader_derivatives. v2: Take subgroup_id into account, otherwise only values in the first subgroup would be used. (Jason) v3: Calculate invocation index and ID together, to avoid duplicating some math in the quads case when both index and ID are used. (Jason) v4: Don't call cleanup passes as part of the lowering, let that to the call site. (Jason) Change calculation to use less instructions. (Jason) Reviewed-by: Ian Romanick <[email protected]> (v3) Reviewed-by: Jason Ekstrand <[email protected]>
*	intel/fs: Use TEX_LOGICAL whenever implicit lod is supported	Caio Marcelo de Oliveira Filho	2019-04-08	1	-2/+6
\| \| \| \| \| \| \|	Make sure we include compute shaders that have a derivative group defined. Reviewed-by: Jason Ekstrand <[email protected]>
*	nir/radv: remove restrictions on opt_if_loop_last_continue()	Timothy Arceri	2019-04-09	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When I implemented opt_if_loop_last_continue() I had restricted this pass from moving other if-statements inside the branch opposite the continue. At the time it was causing a bunch of spilling in shader-db for i965. However Samuel Pitoiset noticed that making this pass more aggressive significantly improved the performance of Doom on RADV. Below are the statistics he gathered. 28717 shaders in 14931 tests Totals: SGPRS: 1267317 -> 1267549 (0.02 %) VGPRS: 896876 -> 895920 (-0.11 %) Spilled SGPRs: 24701 -> 26367 (6.74 %) Code Size: 48379452 -> 48507880 (0.27 %) bytes Max Waves: 241159 -> 241190 (0.01 %) Totals from affected shaders: SGPRS: 23584 -> 23816 (0.98 %) VGPRS: 25908 -> 24952 (-3.69 %) Spilled SGPRs: 503 -> 2169 (331.21 %) Code Size: 2471392 -> 2599820 (5.20 %) bytes Max Waves: 586 -> 617 (5.29 %) The codesize increases is related to Wolfenstein II it seems largely due to an increase in phis rather than the existing jumps. This gives +10% FPS with Doom on my Vega56. Rhys Perry also benchmarked Doom on his VEGA64: Before: 72.53 FPS After: 80.77 FPS v2: disable pass on non-AMD drivers Reviewed-by: Ian Romanick <[email protected]> (v1) Acked-by: Samuel Pitoiset <[email protected]>
*	intel: add dependency on genxml generated files	Lionel Landwerlin	2019-04-08	5	-4/+6
\| \| \| \| \| \| \| \| \| \|	Drivers using genxml will start compilation before generated files are created, so add a dependency to it. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Reviewed-by: Dylan Baker <[email protected]> Cc: [email protected]
*	anv: implement VK_KHR_swapchain revision 70	Lionel Landwerlin	2019-04-08	3	-3/+103
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This revision allows for images to be : - created by reusing image parameters from swapchain - bound to memory from a swapchain v2: Add color attachment flag Use same implicit WSI parameters (tiling, samples, usage) v3: Fix missing break in vk_foreach_struct_const() switch (Lionel) v4: Fix accessing image aspects before android resolve (Tapani) Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
*	intel/compiler: use defined size for vector components	Dave Airlie	2019-04-03	1	-1/+1
\| \| \| \| \| \| \|	If we increase vector sizing later it would be nice to avoid tripped over this again. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
*	intel: Add support for Comet Lake	Anuj Phogat	2019-04-01	1	-0/+1
\| \| \| \| \|	Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
*	intel/compiler: Use partial redundancy elimination for compares	Ian Romanick	2019-03-28	1	-0/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Almost all of the hurt shaders are repeated instances of the same shader in synmark's compilation speed tests. shader-db results: All Gen6+ platforms had similar results. (Skylake shown) total instructions in shared programs: 15256840 -> 15256389 (<.01%) instructions in affected programs: 54137 -> 53686 (-0.83%) helped: 288 HURT: 0 helped stats (abs) min: 1 max: 15 x̄: 1.57 x̃: 1 helped stats (rel) min: 0.06% max: 26.67% x̄: 1.99% x̃: 0.74% 95% mean confidence interval for instructions value: -1.76 -1.38 95% mean confidence interval for instructions %-change: -2.47% -1.50% Instructions are helped. total cycles in shared programs: 372286583 -> 372283851 (<.01%) cycles in affected programs: 833829 -> 831097 (-0.33%) helped: 265 HURT: 16 helped stats (abs) min: 2 max: 74 x̄: 11.81 x̃: 4 helped stats (rel) min: 0.04% max: 9.07% x̄: 0.99% x̃: 0.35% HURT stats (abs) min: 2 max: 130 x̄: 24.88 x̃: 8 HURT stats (rel) min: <.01% max: 12.31% x̄: 1.44% x̃: 0.27% 95% mean confidence interval for cycles value: -12.30 -7.15 95% mean confidence interval for cycles %-change: -1.06% -0.64% Cycles are helped. Iron Lake and GM45 had similar results. (GM45 shown) total instructions in shared programs: 5038653 -> 5038495 (<.01%) instructions in affected programs: 13939 -> 13781 (-1.13%) helped: 50 HURT: 1 helped stats (abs) min: 1 max: 15 x̄: 3.18 x̃: 4 helped stats (rel) min: 0.33% max: 13.33% x̄: 2.24% x̃: 1.09% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.83% max: 0.83% x̄: 0.83% x̃: 0.83% 95% mean confidence interval for instructions value: -3.73 -2.47 95% mean confidence interval for instructions %-change: -3.16% -1.21% Instructions are helped. total cycles in shared programs: 128118922 -> 128118228 (<.01%) cycles in affected programs: 134906 -> 134212 (-0.51%) helped: 50 HURT: 0 helped stats (abs) min: 2 max: 60 x̄: 13.88 x̃: 18 helped stats (rel) min: 0.06% max: 3.19% x̄: 0.74% x̃: 0.70% 95% mean confidence interval for cycles value: -16.54 -11.22 95% mean confidence interval for cycles %-change: -0.95% -0.53% Cycles are helped. Reviewed-by: Kenneth Graunke <[email protected]>
*	intel/genxml: Media instructions and structures for gen11	Toni Lönnberg	2019-03-28	1	-24/+3450
\| \| \| \| \| \| \| \| \| \| \|	v2: Lionel Landwerlin <[email protected]> - fix missing type - fix _FQM_/_QM_ commands - shorten some media structs using groups - factor out memory attributes - switch MI_FLUSH_DW fields to bool Reviewed-by: Lionel Landwerlin <[email protected]>
*	intel/genxml: Media instructions and structures for gen10	Toni Lönnberg	2019-03-28	1	-24/+3284
\| \| \| \| \| \| \| \| \| \| \|	v2: Lionel Landwerlin <[email protected]> - fix missing type - fix _FQM_/_QM_ commands - shorten some media structs using groups - factor out memory attributes - switch MI_FLUSH_DW fields to bool Reviewed-by: Lionel Landwerlin <[email protected]>
*	intel/genxml: Media instructions and structures for gen9	Toni Lönnberg	2019-03-28	1	-24/+3090
\| \| \| \| \| \| \| \| \| \| \|	v2: Lionel Landwerlin <[email protected]> - fix missing type - fix _FQM_/_QM_ commands - shorten some media structs using groups - factor out memory attributes - switch MI_FLUSH_DW fields to bool Reviewed-by: Lionel Landwerlin <[email protected]>
*	intel/genxml: Media instructions and structures for gen8	Toni Lönnberg	2019-03-28	1	-0/+1572
\| \| \| \| \| \| \|	v2: Lionel Landwerlin <[email protected]> - switch MI_FLUSH_DW fields to bool Reviewed-by: Lionel Landwerlin <[email protected]>
*	intel/genxml: Media instructions and structures for gen7.5	Toni Lönnberg	2019-03-28	1	-1/+1291
\| \| \| \| \| \|	v2: Fixed MI_WAIT_FOR_EVENT to be for video also Reviewed-by: Lionel Landwerlin <[email protected]>
*	intel/genxml: Media instructions and structures for gen7	Toni Lönnberg	2019-03-28	1	-1/+1347
\| \| \| \| \| \|	v2: Fixed MI_WAIT_FOR_EVENT to be for blitter and video also Reviewed-by: Lionel Landwerlin <[email protected]>
*	intel/genxml: Media instructions and structures for gen6	Toni Lönnberg	2019-03-28	1	-1/+1003
\| \| \| \| \| \|	v2: Fixed MI_WAIT_FOR_EVENT to be for blitter and video also Reviewed-by: Lionel Landwerlin <[email protected]>
*	intel/genxml: Only handle instructions meant for render engine when generating	Toni Lönnberg	2019-03-28	2	-7/+59
\| \| \| \| \| \| \| \| \| \|	headers v2: Fixed the check for engine v3: Changed engine into an argument given to the scripts Reviewed-by: Lionel Landwerlin <[email protected]>
*	intel: Add Elkhart Lake device info	Anuj Phogat	2019-03-27	1	-0/+60
\| \| \| \| \| \| \| \|	V2: Fix L3 bank count (Vivek) Fix simulator_id and num_eu_per_subslice (Lionel) Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
*	Revert "anv/radv: release memory allocated by glsl types during spirv_to_nir"	Jason Ekstrand	2019-03-27	1	-2/+0
\| \| \| \| \| \| \| \|	This reverts commit 4e1bbb000cdfe4ba01bee5a6868c54fed7285dae. It turns out that some DXVK apps due to some implementation detail of DXVK or other create and destroy instances in an interleaved way. Freeing the glsl_type memory without being a bit more careful causes use-after-free issues. Looks like we need to try again.
*	intel/fs: Make alpha test work with MRT and sample mask	Danylo Piliaiev	2019-03-25	1	-18/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix the order of src0_alpha and sample mask in fb payload. From SKL PRM Volume 7, "Data Payload Register Order for Render Target Write Messages": Type S0A oM sZ oS M2 M3 M4 SIMD8 1 1 0 0 s0A oM R SIMD16 1 1 0 0 1/0s0A 3/2s0A oM It also fixes working of alpha to coverage with sample mask on GEN6 since now they are in correct order. Signed-off-by: Danylo Piliaiev <[email protected]> Signed-off-by: Francisco Jerez <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
*	i965,iris,anv: Make alpha to coverage work with sample mask	Danylo Piliaiev	2019-03-25	5	-6/+108
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	From "Alpha Coverage" section of SKL PRM Volume 7: "If Pixel Shader outputs oMask, AlphaToCoverage is disabled in hardware, regardless of the state setting for this feature." From OpenGL spec 4.6, "15.2 Shader Execution": "The built-in integer array gl_SampleMask can be used to change the sample coverage for a fragment from within the shader." From OpenGL spec 4.6, "17.3.1 Alpha To Coverage": "If SAMPLE_ALPHA_TO_COVERAGE is enabled, a temporary coverage value is generated where each bit is determined by the alpha value at the corresponding sample location. The temporary coverage value is then ANDed with the fragment coverage value to generate a new fragment coverage value." Similar wording could be found in Vulkan spec 1.1.100 "25.6. Multisample Coverage" Thus we need to compute alpha to coverage dithering manually in shader and replace sample mask store with the bitwise-AND of sample mask and alpha to coverage dithering. The following formula is used to compute final sample mask: m = int(16.0 * clamp(src0_alpha, 0.0, 1.0)) dither_mask = 0x1111 * ((0xfea80 >> (m & ~3)) & 0xf) \| 0x0808 * (m & 2) \| 0x0100 * (m & 1) sample_mask = sample_mask & dither_mask Credits to Francisco Jerez <[email protected]> for creating it. It gives a number of ones proportional to the alpha for 2, 4, 8 or 16 least significant bits of the result. GEN6 hardware does not have issue with simultaneous usage of sample mask and alpha to coverage however due to the wrong sending order of oMask and src0_alpha it is still affected by it. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109743 Signed-off-by: Danylo Piliaiev <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
*	compiler/nir: add lowering for 16-bit flrp	Iago Toral Quiroga	2019-03-25	1	-0/+1
\| \| \| \| \| \| \| \| \|	And enable it on Intel. v2: - Squash the change to enable it on Intel (Jason) Reviewed-by: Jason Ekstrand <[email protected]>
*	compiler/nir: add lowering option for 16-bit fmod	Iago Toral Quiroga	2019-03-25	1	-0/+1
\| \| \| \| \| \| \| \| \|	And enable it on Intel. v2: - Squash the change to enable this lowering on Intel (Jason) Reviewed-by: Jason Ekstrand <[email protected]>
*	android: static link with libexpat with Android O+	Kishore Kadiyala	2019-03-25	2	-2/+21
\| \| \| \| \| \| \| \| \| \| \| \| \|	In Android O, MESA needs to statically link libexpat so that it's in same VNDK namespace. v2: apply change also to anv driver (Tapani) v3: use += in anv change (Eric Engestrom) Change-Id: I82b0be5c817c21e734dfdf5bfb6a9aa1d414ab33 Signed-off-by: Kishore Kadiyala <[email protected]> Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
*	intel/compiler: handle GLSL_TYPE_INTERFACE as GLSL_TYPE_STRUCT	Caio Marcelo de Oliveira Filho	2019-03-23	3	-3/+3
\| \| \| \|	Reviewed-by: Jason Ekstrand <[email protected]>
*	spirv,nir: lower frexp_exp/frexp_sig inside a new NIR pass	Samuel Pitoiset	2019-03-22	1	-0/+2
\| \| \| \| \| \| \| \| \| \|	This lowering isn't needed for RADV because AMDGCN has two instructions. It will be disabled for RADV in an upcoming series. While we are at it, factorize a little bit. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	anv: fix build on Nougat	Gurchetan Singh	2019-03-21	4	-6/+22
\| \| \| \| \| \|	AHardwareBuffer is only available on O and above. Reviewed-by: Tapani Pälli <[email protected]>
*	anv: move anv_GetMemoryAndroidHardwareBufferANDROID up a bit	Gurchetan Singh	2019-03-21	1	-28/+28
\| \| \| \| \| \|	No functional change, just makes the next patch a little easier. Reviewed-by: Tapani Pälli <[email protected]>
*	anv/radv: release memory allocated by glsl types during spirv_to_nir	Tapani Pälli	2019-03-21	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fixes leaks for each glsl_type generated: ==32470== 384 bytes in 3 blocks are possibly lost in loss record 18 of 18 ==32470== at 0x483880B: malloc (vg_replace_malloc.c:309) ==32470== by 0x4C43F4A: ralloc_size (ralloc.c:119) ==32470== by 0x4C44014: rzalloc_size (ralloc.c:151) ==32470== by 0x4C44258: rzalloc_array_size (ralloc.c:215) ==32470== by 0x4D38957: glsl_type::glsl_type(glsl_struct_field const, unsigned int, char const) (glsl_types.cpp:114) ==32470== by 0x4D3BEED: glsl_type::get_struct_instance(glsl_struct_field const, unsigned int, char const) (glsl_types.cpp:1146) ==32470== by 0x4D42ECC: glsl_struct_type (nir_types.cpp:501) ==32470== by 0x4CDB5A1: vtn_handle_type (spirv_to_nir.c:1269) ==32470== by 0x4CE53DD: vtn_handle_variable_or_type_instruction (spirv_to_nir.c:4018) ==32470== by 0x4CD8CFF: vtn_foreach_instruction (spirv_to_nir.c:365) ==32470== by 0x4CE5E6B: spirv_to_nir (spirv_to_nir.c:4490) ==32470== by 0x497AF10: anv_shader_compile_to_nir (anv_pipeline.c:173) v2: move release call to vkDestroyInstance v3: apply fix also to radv driver Signed-off-by: Tapani Pälli <[email protected]> Cc: [email protected] Reviewed-by: Jason Ekstrand <[email protected]>
*	anv,radv,turnip: Lower TG4 offsets with nir_lower_tex	Jason Ekstrand	2019-03-21	1	-0/+1
\| \| \| \| \| \|	v2: turn on for turnip as well (Karol Herbst) Reviewed-by: Karol Herbst <[email protected]>
*	intel/blorp: Make swizzle_color_value public.	Rafael Antognolli	2019-03-20	2	-1/+4
\| \| \| \|	Reviewed-by: Kenneth Graunke <[email protected]>
*	intel/isl: Add isl_format_has_color_component() function.	Rafael Antognolli	2019-03-20	2	-0/+25
\| \| \| \| \| \|	v2: Get luminance bits from luminance component (Ken). Reviewed-by: Kenneth Graunke <[email protected]>
*	anv: implement VK_EXT_pipeline_creation_feedback	Lionel Landwerlin	2019-03-20	4	-6/+94
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	An extension reporting cache hit in the user supplied pipeline cache as well as timing information for creating the pipelines & stages. v2: Don't consider no cache for cache hits (Jason) Rework duration accumulation (Jason) v3: Fold feedback creation writing into pipeline compile functions (Jason/Lionel) v4: Get cache hit information from anv_device_search_for_kernel() (Jason) Only set cache hit from the whole pipeline if all stages also have that bit (Lionel) v5: Always user_cache_hit in anv_device_search_for_kernel() (Jason) Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	anv: Bump maxComputeWorkgroupInvocations	Jason Ekstrand	2019-03-20	1	-1/+1
\| \| \| \| \| \| \| \|	We initially set this lower because we didn't have SIMD32 support yet but we've supported SIMD32 for quite some time now. We should bump it up to the real limit. Reviewed-by: Lionel Landwerlin <[email protected]>
*	anv/icl: Add WA_2204188704 to disable pixel shader panic dispatch	Anuj Phogat	2019-03-19	2	-0/+17
\| \| \| \| \| \|	Signed-off-by: Anuj Phogat <[email protected]> Acked-by: Jason Ekstrand <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
*	anv,radv: Implement VK_KHR_surface_capability_protected	Jason Ekstrand	2019-03-18	1	-0/+1
\| \| \| \| \|	Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
*	anv: Treat zero size XFB buffer as disabled	Danylo Piliaiev	2019-03-18	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Vulkan spec doesn't explicitly forbid zero size transform feedback buffers. Having zero size xfb caused SurfaceSize overflow and triggered assert in debug build. The only way to have zero size SO_BUFFER is to disable SO_BUFFER as stated in hardware spec. From SKL PRM, Vol 2a, "3DSTATE_SO_BUFFER": "If set, stream output to SO Buffer is enabled, if 3DSTATE_STREAMOUT::SO Function ENABLE is also enabled. If clear, the SO Buffer is considered "not bound" and effectively treated as a zero- length buffer for the purposes of SO output and overflow detection. If an enabled stream's Stream to Buffer Selects includes this buffer it is by definition an overflow condition. That stream will cause no writes to occur, and only SO_PRIM_STORAGE_NEEDED[<stream>] will increment." Fixes: 36ee2fd61c8 "anv: Implement the basic form of VK_EXT_transform_feedback" Signed-off-by: Danylo Piliaiev <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	anv: Implement VK_EXT_host_query_reset	Jason Ekstrand	2019-03-18	3	-0/+22
\| \| \| \| \| \|	Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
*	android: Build fixes for OMR1	Tapani Pälli	2019-03-18	1	-0/+20
\| \| \| \| \| \| \| \| \| \| \| \| \|	Some of the header file locations are changed between Android versions (when VNDK is used), patch makes sure we get all the required headers. v2: cleanups, put SDK version checks in all places (Tapani) Signed-off-by: Tapani Pälli <[email protected]> Signed-off-by: Chen Lin Z <[email protected]> Tested-by: Clayton Craft <[email protected]> Acked-by: Eric Engestrom <[email protected]>
*	isl: fix automake build when sse41 is not supported	Tapani Pälli	2019-03-18	1	-4/+7
\| \| \| \| \| \| \| \|	Fixes: 864cc419eb0a41882762 "intel/isl: move tiled_memcpy static libs from i965 to isl" Cc: [email protected] Reported-by: Milav Soni <[email protected]> Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
*	intel/nir: Lower array-deref-of-vector UBO and SSBO loads	Jason Ekstrand	2019-03-15	1	-0/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This fixes a serious performance issue with DXVK: https://github.com/doitsujin/dxvk/issues/937 This was caused by a recent change that to improve performance on RADV which back-fired on ANV and killed performance for some apps: https://github.com/doitsujin/dxvk/commit/e5a06d3f4a103a54cd4eb51970fedee405d1d698 Throwing in this bit of lowering lets us come along and CSE those UBO loads (or copy-prop for SSBO load) and get one load where we previously would have gotten several. VkPipeline-db results on Kaby Lake: total instructions in shared programs: 5115361 -> 5073185 (-0.82%) instructions in affected programs: 1754333 -> 1712157 (-2.40%) helped: 5331 HURT: 63 total cycles in shared programs: 2544501169 -> 2481144545 (-2.49%) cycles in affected programs: 2531058653 -> 2467702029 (-2.50%) helped: 9202 HURT: 4323 total loops in shared programs: 3340 -> 3331 (-0.27%) loops in affected programs: 9 -> 0 helped: 9 HURT: 0 total spills in shared programs: 3246 -> 3053 (-5.95%) spills in affected programs: 384 -> 191 (-50.26%) helped: 10 HURT: 5 total fills in shared programs: 4626 -> 4452 (-3.76%) fills in affected programs: 439 -> 265 (-39.64%) helped: 10 HURT: 5 All of the shaders with hurt spilling were in Rise of the Tomb Raider which also had shaders solidly helped in the spilling department. Not shown in those results (because I've not had success dumping the shaders) is Witcher 3 where this reduces spilling and improves over-all perf by around 20-25%. There were no shader-db changes. Apparently, this just isn't a pattern that happens in OpenGL. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Cc: "19.0" [email protected]
*	i965: Stop setting LowerBuferInterfaceBlocks	Jason Ekstrand	2019-03-15	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Instead, we do UBO and SSBO deref lowering in NIR after we've given it a chance to optimize SSBO access: Shader-db results on Kaby Lake: total instructions in shared programs: 15235775 -> 15235484 (<.01%) instructions in affected programs: 14992 -> 14701 (-1.94%) helped: 19 HURT: 20 total cycles in shared programs: 339220331 -> 339027307 (-0.06%) cycles in affected programs: 79831981 -> 79638957 (-0.24%) helped: 540 HURT: 602 total loops in shared programs: 4402 -> 4348 (-1.23%) loops in affected programs: 186 -> 132 (-29.03%) helped: 27 HURT: 0 total spills in shared programs: 23261 -> 23234 (-0.12%) spills in affected programs: 38 -> 11 (-71.05%) helped: 1 HURT: 0 total fills in shared programs: 31442 -> 31371 (-0.23%) fills in affected programs: 98 -> 27 (-72.45%) helped: 1 HURT: 0 LOST: 12 GAINED: 12 Most of the help and hurt in instruction counts was just churn caused by re-ordering of optimizations and the fact that the NIR deref lowering code is emitting slightly different instructions. Nothing was hurt by more than three instructions and most things weren't helped by more than four. The primary exception to this is one Car Chase shader: shaders/non-free/gfxbench4/carchase/341.shader_test CS SIMD32: 1144 -> 821 (-28.23%) There is also one compute shader in Manhattan 3.1 and a fragment shader in the UE4 Shooter Game demo that now get a loop partially unrolled. Those showed up in the results as hurt instructions but were manually removed to get the results above. The lost/gained was a dozen Car Chase shaders that went from SIMD8 to SIMD16 thanks to improved register pressure: shaders/non-free/gfxbench4/carchase/366.shader_test CS shaders/non-free/gfxbench4/carchase/368.shader_test CS shaders/non-free/gfxbench4/carchase/370.shader_test CS shaders/non-free/gfxbench4/carchase/372.shader_test CS shaders/non-free/gfxbench4/carchase/376.shader_test CS shaders/non-free/gfxbench4/carchase/378.shader_test CS shaders/non-free/gfxbench4/carchase/380.shader_test CS shaders/non-free/gfxbench4/carchase/382.shader_test CS shaders/non-free/gfxbench4/carchase/384.shader_test CS shaders/non-free/gfxbench4/carchase/388.shader_test CS shaders/non-free/gfxbench4/carchase/4.shader_test CS shaders/non-free/gfxbench4/carchase/6.shader_test CS Given how much it appeared to be improved, I ran Car Chase on my laptop. Unfortunately, I wasn't able to see any measurable improvement. It might be helped by 1-2% but it's in the noise. It does render correctly as far as I can tell so the improvement is legitimate. All of the loops that got delete were in dolphin uber shaders. I've had no opportunity to test them for correctness or performance. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
*	nir: Rename nir_address_format_vk_index_offset to not be vk	Jason Ekstrand	2019-03-15	2	-4/+4
\| \| \| \| \| \| \| \|	It's just a 32-bit index and offset. We're going to want to use it in GL as well so stop talking about Vulkan. Reviewed-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
*	anv: Only set 3DSTATE_PS::VectorMaskEnable on gen8+	Jason Ekstrand	2019-03-14	1	-1/+1
\| \| \| \| \| \| \|	We don't set it on HSW and earlier in i965 and disabling it appears to make derivatives somewhat more reliable. Acked-by: Kenneth Graunke <[email protected]>
*	i965: Disable ARB_fragment_shader_interlock for platforms prior to GEN9	Plamena Manolova	2019-03-14	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \|	ARB_fragment_shader_interlock depends on memory fences to ensure fragment ordering and this ordering guarantee is only supported from GEN9 onwards. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109980 Fixes: 939312702e35 "i965: Add ARB_fragment_shader_interlock support." Signed-off-by: Plamena Manolova <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	anv/pass: Flag the need for a RT flush for resolve attachments	Jason Ekstrand	2019-03-13	1	-1/+17
\| \| \| \| \|	Reviewed-by: Nanley Chery <[email protected]> Cc: [email protected]
*	anv: Stop using VK_TRUE/FALSE	Jason Ekstrand	2019-03-13	1	-21/+21
\| \| \| \| \| \| \| \| \| \| \| \|	We've been fairly inconsistent about this so we should really choose whether we're going to use VK_TRUE/FALSE or the C boolean values. The Vulkan #defines are set to 1 and 0 respectively so it's the same value as C gives you when you cast a boolean expression to an integer. Since there are several places where we set a VkBool32 to a C logical expression, let's just embrace C booleans and stop using the VK defines. Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
*	intel/nir: Combine store_derefs to improve code from SPIR-V	Caio Marcelo de Oliveira Filho	2019-03-13	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Due to lack of write mask in SPIR-V store, generators may produce multiple stores to the same vector but using different array derefs. Use the combining store pass to clean this up. For example, layout(binding = 3) buffer block { vec4 v; }; void main() { v.x = 11; v.y = 22; } after going to SPIR-V and NIR, ends up with in two store_derefs to v[0] and v[1] vec2 32 ssa_4 = deref_struct &ssa_3->field0 (ssbo vec4) /* &((block )ssa_2)->field0 / vec2 32 ssa_6 = deref_array &(ssa_4)[0] (ssbo float) / &((block )ssa_2)->field0[0] / intrinsic store_deref (ssa_6, ssa_7) (1, 0) /* wrmask=x / / access=0 / vec1 32 ssa_13 = load_const (0x00000001 / 0.000000 /) vec2 32 ssa_14 = deref_array &(ssa_4)[1] (ssbo float) /* &((block )ssa_2)->field0[1] / intrinsic store_deref (ssa_14, ssa_15) (1, 0) /* wrmask=x / / access=0 / producing two different sends instructions in skl. The combining pass transform the snippet above into vec2 32 ssa_4 = deref_struct &ssa_3->field0 (ssbo vec4) / &((block )ssa_2)->field0 / vec4 32 ssa_18 = vec4 ssa_7, ssa_15, ssa_16, ssa_17 intrinsic store_deref (ssa_4, ssa_18) (3, 0) /* wrmask=xy / / access=0 */ producing a single sends instruction. v2: Move this from spirv_to_nir into the general optimization pass for intel compiler. (Jason) Reviewed-by: Jason Ekstrand <[email protected]>