summaryrefslogtreecommitdiffstats
path: root/src/intel
Commit message (Collapse)AuthorAgeFilesLines
* intel: add dependency on genxml generated filesLionel Landwerlin2019-04-085-4/+6
| | | | | | | | | | Drivers using genxml will start compilation before generated files are created, so add a dependency to it. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Reviewed-by: Dylan Baker <[email protected]> Cc: [email protected]
* anv: implement VK_KHR_swapchain revision 70Lionel Landwerlin2019-04-083-3/+103
| | | | | | | | | | | | | | | | | | This revision allows for images to be : - created by reusing image parameters from swapchain - bound to memory from a swapchain v2: Add color attachment flag Use same implicit WSI parameters (tiling, samples, usage) v3: Fix missing break in vk_foreach_struct_const() switch (Lionel) v4: Fix accessing image aspects before android resolve (Tapani) Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* intel/compiler: use defined size for vector componentsDave Airlie2019-04-031-1/+1
| | | | | | | If we increase vector sizing later it would be nice to avoid tripped over this again. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel: Add support for Comet LakeAnuj Phogat2019-04-011-0/+1
| | | | | Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* intel/compiler: Use partial redundancy elimination for comparesIan Romanick2019-03-281-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Almost all of the hurt shaders are repeated instances of the same shader in synmark's compilation speed tests. shader-db results: All Gen6+ platforms had similar results. (Skylake shown) total instructions in shared programs: 15256840 -> 15256389 (<.01%) instructions in affected programs: 54137 -> 53686 (-0.83%) helped: 288 HURT: 0 helped stats (abs) min: 1 max: 15 x̄: 1.57 x̃: 1 helped stats (rel) min: 0.06% max: 26.67% x̄: 1.99% x̃: 0.74% 95% mean confidence interval for instructions value: -1.76 -1.38 95% mean confidence interval for instructions %-change: -2.47% -1.50% Instructions are helped. total cycles in shared programs: 372286583 -> 372283851 (<.01%) cycles in affected programs: 833829 -> 831097 (-0.33%) helped: 265 HURT: 16 helped stats (abs) min: 2 max: 74 x̄: 11.81 x̃: 4 helped stats (rel) min: 0.04% max: 9.07% x̄: 0.99% x̃: 0.35% HURT stats (abs) min: 2 max: 130 x̄: 24.88 x̃: 8 HURT stats (rel) min: <.01% max: 12.31% x̄: 1.44% x̃: 0.27% 95% mean confidence interval for cycles value: -12.30 -7.15 95% mean confidence interval for cycles %-change: -1.06% -0.64% Cycles are helped. Iron Lake and GM45 had similar results. (GM45 shown) total instructions in shared programs: 5038653 -> 5038495 (<.01%) instructions in affected programs: 13939 -> 13781 (-1.13%) helped: 50 HURT: 1 helped stats (abs) min: 1 max: 15 x̄: 3.18 x̃: 4 helped stats (rel) min: 0.33% max: 13.33% x̄: 2.24% x̃: 1.09% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.83% max: 0.83% x̄: 0.83% x̃: 0.83% 95% mean confidence interval for instructions value: -3.73 -2.47 95% mean confidence interval for instructions %-change: -3.16% -1.21% Instructions are helped. total cycles in shared programs: 128118922 -> 128118228 (<.01%) cycles in affected programs: 134906 -> 134212 (-0.51%) helped: 50 HURT: 0 helped stats (abs) min: 2 max: 60 x̄: 13.88 x̃: 18 helped stats (rel) min: 0.06% max: 3.19% x̄: 0.74% x̃: 0.70% 95% mean confidence interval for cycles value: -16.54 -11.22 95% mean confidence interval for cycles %-change: -0.95% -0.53% Cycles are helped. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/genxml: Media instructions and structures for gen11Toni Lönnberg2019-03-281-24/+3450
| | | | | | | | | | | v2: Lionel Landwerlin <[email protected]> - fix missing type - fix *_FQM_*/*_QM_* commands - shorten some media structs using groups - factor out memory attributes - switch MI_FLUSH_DW fields to bool Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/genxml: Media instructions and structures for gen10Toni Lönnberg2019-03-281-24/+3284
| | | | | | | | | | | v2: Lionel Landwerlin <[email protected]> - fix missing type - fix *_FQM_*/*_QM_* commands - shorten some media structs using groups - factor out memory attributes - switch MI_FLUSH_DW fields to bool Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/genxml: Media instructions and structures for gen9Toni Lönnberg2019-03-281-24/+3090
| | | | | | | | | | | v2: Lionel Landwerlin <[email protected]> - fix missing type - fix *_FQM_*/*_QM_* commands - shorten some media structs using groups - factor out memory attributes - switch MI_FLUSH_DW fields to bool Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/genxml: Media instructions and structures for gen8Toni Lönnberg2019-03-281-0/+1572
| | | | | | | v2: Lionel Landwerlin <[email protected]> - switch MI_FLUSH_DW fields to bool Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/genxml: Media instructions and structures for gen7.5Toni Lönnberg2019-03-281-1/+1291
| | | | | | v2: Fixed MI_WAIT_FOR_EVENT to be for video also Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/genxml: Media instructions and structures for gen7Toni Lönnberg2019-03-281-1/+1347
| | | | | | v2: Fixed MI_WAIT_FOR_EVENT to be for blitter and video also Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/genxml: Media instructions and structures for gen6Toni Lönnberg2019-03-281-1/+1003
| | | | | | v2: Fixed MI_WAIT_FOR_EVENT to be for blitter and video also Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/genxml: Only handle instructions meant for render engine when generatingToni Lönnberg2019-03-282-7/+59
| | | | | | | | | | headers v2: Fixed the check for engine v3: Changed engine into an argument given to the scripts Reviewed-by: Lionel Landwerlin <[email protected]>
* intel: Add Elkhart Lake device infoAnuj Phogat2019-03-271-0/+60
| | | | | | | | V2: Fix L3 bank count (Vivek) Fix simulator_id and num_eu_per_subslice (Lionel) Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* Revert "anv/radv: release memory allocated by glsl types during spirv_to_nir"Jason Ekstrand2019-03-271-2/+0
| | | | | | | | This reverts commit 4e1bbb000cdfe4ba01bee5a6868c54fed7285dae. It turns out that some DXVK apps due to some implementation detail of DXVK or other create and destroy instances in an interleaved way. Freeing the glsl_type memory without being a bit more careful causes use-after-free issues. Looks like we need to try again.
* intel/fs: Make alpha test work with MRT and sample maskDanylo Piliaiev2019-03-251-18/+17
| | | | | | | | | | | | | | | | Fix the order of src0_alpha and sample mask in fb payload. From SKL PRM Volume 7, "Data Payload Register Order for Render Target Write Messages": Type S0A oM sZ oS M2 M3 M4 SIMD8 1 1 0 0 s0A oM R SIMD16 1 1 0 0 1/0s0A 3/2s0A oM It also fixes working of alpha to coverage with sample mask on GEN6 since now they are in correct order. Signed-off-by: Danylo Piliaiev <[email protected]> Signed-off-by: Francisco Jerez <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* i965,iris,anv: Make alpha to coverage work with sample maskDanylo Piliaiev2019-03-255-6/+108
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | From "Alpha Coverage" section of SKL PRM Volume 7: "If Pixel Shader outputs oMask, AlphaToCoverage is disabled in hardware, regardless of the state setting for this feature." From OpenGL spec 4.6, "15.2 Shader Execution": "The built-in integer array gl_SampleMask can be used to change the sample coverage for a fragment from within the shader." From OpenGL spec 4.6, "17.3.1 Alpha To Coverage": "If SAMPLE_ALPHA_TO_COVERAGE is enabled, a temporary coverage value is generated where each bit is determined by the alpha value at the corresponding sample location. The temporary coverage value is then ANDed with the fragment coverage value to generate a new fragment coverage value." Similar wording could be found in Vulkan spec 1.1.100 "25.6. Multisample Coverage" Thus we need to compute alpha to coverage dithering manually in shader and replace sample mask store with the bitwise-AND of sample mask and alpha to coverage dithering. The following formula is used to compute final sample mask: m = int(16.0 * clamp(src0_alpha, 0.0, 1.0)) dither_mask = 0x1111 * ((0xfea80 >> (m & ~3)) & 0xf) | 0x0808 * (m & 2) | 0x0100 * (m & 1) sample_mask = sample_mask & dither_mask Credits to Francisco Jerez <[email protected]> for creating it. It gives a number of ones proportional to the alpha for 2, 4, 8 or 16 least significant bits of the result. GEN6 hardware does not have issue with simultaneous usage of sample mask and alpha to coverage however due to the wrong sending order of oMask and src0_alpha it is still affected by it. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109743 Signed-off-by: Danylo Piliaiev <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* compiler/nir: add lowering for 16-bit flrpIago Toral Quiroga2019-03-251-0/+1
| | | | | | | | | And enable it on Intel. v2: - Squash the change to enable it on Intel (Jason) Reviewed-by: Jason Ekstrand <[email protected]>
* compiler/nir: add lowering option for 16-bit fmodIago Toral Quiroga2019-03-251-0/+1
| | | | | | | | | And enable it on Intel. v2: - Squash the change to enable this lowering on Intel (Jason) Reviewed-by: Jason Ekstrand <[email protected]>
* android: static link with libexpat with Android O+Kishore Kadiyala2019-03-252-2/+21
| | | | | | | | | | | | | In Android O, MESA needs to statically link libexpat so that it's in same VNDK namespace. v2: apply change also to anv driver (Tapani) v3: use += in anv change (Eric Engestrom) Change-Id: I82b0be5c817c21e734dfdf5bfb6a9aa1d414ab33 Signed-off-by: Kishore Kadiyala <[email protected]> Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
* intel/compiler: handle GLSL_TYPE_INTERFACE as GLSL_TYPE_STRUCTCaio Marcelo de Oliveira Filho2019-03-233-3/+3
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* spirv,nir: lower frexp_exp/frexp_sig inside a new NIR passSamuel Pitoiset2019-03-221-0/+2
| | | | | | | | | | This lowering isn't needed for RADV because AMDGCN has two instructions. It will be disabled for RADV in an upcoming series. While we are at it, factorize a little bit. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv: fix build on NougatGurchetan Singh2019-03-214-6/+22
| | | | | | AHardwareBuffer is only available on O and above. Reviewed-by: Tapani Pälli <[email protected]>
* anv: move anv_GetMemoryAndroidHardwareBufferANDROID up a bitGurchetan Singh2019-03-211-28/+28
| | | | | | No functional change, just makes the next patch a little easier. Reviewed-by: Tapani Pälli <[email protected]>
* anv/radv: release memory allocated by glsl types during spirv_to_nirTapani Pälli2019-03-211-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | Fixes leaks for each glsl_type generated: ==32470== 384 bytes in 3 blocks are possibly lost in loss record 18 of 18 ==32470== at 0x483880B: malloc (vg_replace_malloc.c:309) ==32470== by 0x4C43F4A: ralloc_size (ralloc.c:119) ==32470== by 0x4C44014: rzalloc_size (ralloc.c:151) ==32470== by 0x4C44258: rzalloc_array_size (ralloc.c:215) ==32470== by 0x4D38957: glsl_type::glsl_type(glsl_struct_field const*, unsigned int, char const*) (glsl_types.cpp:114) ==32470== by 0x4D3BEED: glsl_type::get_struct_instance(glsl_struct_field const*, unsigned int, char const*) (glsl_types.cpp:1146) ==32470== by 0x4D42ECC: glsl_struct_type (nir_types.cpp:501) ==32470== by 0x4CDB5A1: vtn_handle_type (spirv_to_nir.c:1269) ==32470== by 0x4CE53DD: vtn_handle_variable_or_type_instruction (spirv_to_nir.c:4018) ==32470== by 0x4CD8CFF: vtn_foreach_instruction (spirv_to_nir.c:365) ==32470== by 0x4CE5E6B: spirv_to_nir (spirv_to_nir.c:4490) ==32470== by 0x497AF10: anv_shader_compile_to_nir (anv_pipeline.c:173) v2: move release call to vkDestroyInstance v3: apply fix also to radv driver Signed-off-by: Tapani Pälli <[email protected]> Cc: [email protected] Reviewed-by: Jason Ekstrand <[email protected]>
* anv,radv,turnip: Lower TG4 offsets with nir_lower_texJason Ekstrand2019-03-211-0/+1
| | | | | | v2: turn on for turnip as well (Karol Herbst) Reviewed-by: Karol Herbst <[email protected]>
* intel/blorp: Make swizzle_color_value public.Rafael Antognolli2019-03-202-1/+4
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* intel/isl: Add isl_format_has_color_component() function.Rafael Antognolli2019-03-202-0/+25
| | | | | | v2: Get luminance bits from luminance component (Ken). Reviewed-by: Kenneth Graunke <[email protected]>
* anv: implement VK_EXT_pipeline_creation_feedbackLionel Landwerlin2019-03-204-6/+94
| | | | | | | | | | | | | | | | | | An extension reporting cache hit in the user supplied pipeline cache as well as timing information for creating the pipelines & stages. v2: Don't consider no cache for cache hits (Jason) Rework duration accumulation (Jason) v3: Fold feedback creation writing into pipeline compile functions (Jason/Lionel) v4: Get cache hit information from anv_device_search_for_kernel() (Jason) Only set cache hit from the whole pipeline if all stages also have that bit (Lionel) v5: Always user_cache_hit in anv_device_search_for_kernel() (Jason) Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv: Bump maxComputeWorkgroupInvocationsJason Ekstrand2019-03-201-1/+1
| | | | | | | | We initially set this lower because we didn't have SIMD32 support yet but we've supported SIMD32 for quite some time now. We should bump it up to the real limit. Reviewed-by: Lionel Landwerlin <[email protected]>
* anv/icl: Add WA_2204188704 to disable pixel shader panic dispatchAnuj Phogat2019-03-192-0/+17
| | | | | | Signed-off-by: Anuj Phogat <[email protected]> Acked-by: Jason Ekstrand <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv,radv: Implement VK_KHR_surface_capability_protectedJason Ekstrand2019-03-181-0/+1
| | | | | Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
* anv: Treat zero size XFB buffer as disabledDanylo Piliaiev2019-03-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | Vulkan spec doesn't explicitly forbid zero size transform feedback buffers. Having zero size xfb caused SurfaceSize overflow and triggered assert in debug build. The only way to have zero size SO_BUFFER is to disable SO_BUFFER as stated in hardware spec. From SKL PRM, Vol 2a, "3DSTATE_SO_BUFFER": "If set, stream output to SO Buffer is enabled, if 3DSTATE_STREAMOUT::SO Function ENABLE is also enabled. If clear, the SO Buffer is considered "not bound" and effectively treated as a zero- length buffer for the purposes of SO output and overflow detection. If an enabled stream's Stream to Buffer Selects includes this buffer it is by definition an overflow condition. That stream will cause no writes to occur, and only SO_PRIM_STORAGE_NEEDED[<stream>] will increment." Fixes: 36ee2fd61c8 "anv: Implement the basic form of VK_EXT_transform_feedback" Signed-off-by: Danylo Piliaiev <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv: Implement VK_EXT_host_query_resetJason Ekstrand2019-03-183-0/+22
| | | | | | Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* android: Build fixes for OMR1Tapani Pälli2019-03-181-0/+20
| | | | | | | | | | | | | Some of the header file locations are changed between Android versions (when VNDK is used), patch makes sure we get all the required headers. v2: cleanups, put SDK version checks in all places (Tapani) Signed-off-by: Tapani Pälli <[email protected]> Signed-off-by: Chen Lin Z <[email protected]> Tested-by: Clayton Craft <[email protected]> Acked-by: Eric Engestrom <[email protected]>
* isl: fix automake build when sse41 is not supportedTapani Pälli2019-03-181-4/+7
| | | | | | | | Fixes: 864cc419eb0a41882762 "intel/isl: move tiled_memcpy static libs from i965 to isl" Cc: [email protected] Reported-by: Milav Soni <[email protected]> Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
* intel/nir: Lower array-deref-of-vector UBO and SSBO loadsJason Ekstrand2019-03-151-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes a serious performance issue with DXVK: https://github.com/doitsujin/dxvk/issues/937 This was caused by a recent change that to improve performance on RADV which back-fired on ANV and killed performance for some apps: https://github.com/doitsujin/dxvk/commit/e5a06d3f4a103a54cd4eb51970fedee405d1d698 Throwing in this bit of lowering lets us come along and CSE those UBO loads (or copy-prop for SSBO load) and get one load where we previously would have gotten several. VkPipeline-db results on Kaby Lake: total instructions in shared programs: 5115361 -> 5073185 (-0.82%) instructions in affected programs: 1754333 -> 1712157 (-2.40%) helped: 5331 HURT: 63 total cycles in shared programs: 2544501169 -> 2481144545 (-2.49%) cycles in affected programs: 2531058653 -> 2467702029 (-2.50%) helped: 9202 HURT: 4323 total loops in shared programs: 3340 -> 3331 (-0.27%) loops in affected programs: 9 -> 0 helped: 9 HURT: 0 total spills in shared programs: 3246 -> 3053 (-5.95%) spills in affected programs: 384 -> 191 (-50.26%) helped: 10 HURT: 5 total fills in shared programs: 4626 -> 4452 (-3.76%) fills in affected programs: 439 -> 265 (-39.64%) helped: 10 HURT: 5 All of the shaders with hurt spilling were in Rise of the Tomb Raider which also had shaders solidly helped in the spilling department. Not shown in those results (because I've not had success dumping the shaders) is Witcher 3 where this reduces spilling and improves over-all perf by around 20-25%. There were no shader-db changes. Apparently, this just isn't a pattern that happens in OpenGL. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Cc: "19.0" [email protected]
* i965: Stop setting LowerBuferInterfaceBlocksJason Ekstrand2019-03-151-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead, we do UBO and SSBO deref lowering in NIR after we've given it a chance to optimize SSBO access: Shader-db results on Kaby Lake: total instructions in shared programs: 15235775 -> 15235484 (<.01%) instructions in affected programs: 14992 -> 14701 (-1.94%) helped: 19 HURT: 20 total cycles in shared programs: 339220331 -> 339027307 (-0.06%) cycles in affected programs: 79831981 -> 79638957 (-0.24%) helped: 540 HURT: 602 total loops in shared programs: 4402 -> 4348 (-1.23%) loops in affected programs: 186 -> 132 (-29.03%) helped: 27 HURT: 0 total spills in shared programs: 23261 -> 23234 (-0.12%) spills in affected programs: 38 -> 11 (-71.05%) helped: 1 HURT: 0 total fills in shared programs: 31442 -> 31371 (-0.23%) fills in affected programs: 98 -> 27 (-72.45%) helped: 1 HURT: 0 LOST: 12 GAINED: 12 Most of the help and hurt in instruction counts was just churn caused by re-ordering of optimizations and the fact that the NIR deref lowering code is emitting slightly different instructions. Nothing was hurt by more than three instructions and most things weren't helped by more than four. The primary exception to this is one Car Chase shader: shaders/non-free/gfxbench4/carchase/341.shader_test CS SIMD32: 1144 -> 821 (-28.23%) There is also one compute shader in Manhattan 3.1 and a fragment shader in the UE4 Shooter Game demo that now get a loop partially unrolled. Those showed up in the results as hurt instructions but were manually removed to get the results above. The lost/gained was a dozen Car Chase shaders that went from SIMD8 to SIMD16 thanks to improved register pressure: shaders/non-free/gfxbench4/carchase/366.shader_test CS shaders/non-free/gfxbench4/carchase/368.shader_test CS shaders/non-free/gfxbench4/carchase/370.shader_test CS shaders/non-free/gfxbench4/carchase/372.shader_test CS shaders/non-free/gfxbench4/carchase/376.shader_test CS shaders/non-free/gfxbench4/carchase/378.shader_test CS shaders/non-free/gfxbench4/carchase/380.shader_test CS shaders/non-free/gfxbench4/carchase/382.shader_test CS shaders/non-free/gfxbench4/carchase/384.shader_test CS shaders/non-free/gfxbench4/carchase/388.shader_test CS shaders/non-free/gfxbench4/carchase/4.shader_test CS shaders/non-free/gfxbench4/carchase/6.shader_test CS Given how much it appeared to be improved, I ran Car Chase on my laptop. Unfortunately, I wasn't able to see any measurable improvement. It might be helped by 1-2% but it's in the noise. It does render correctly as far as I can tell so the improvement is legitimate. All of the loops that got delete were in dolphin uber shaders. I've had no opportunity to test them for correctness or performance. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir: Rename nir_address_format_vk_index_offset to not be vkJason Ekstrand2019-03-152-4/+4
| | | | | | | | It's just a 32-bit index and offset. We're going to want to use it in GL as well so stop talking about Vulkan. Reviewed-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* anv: Only set 3DSTATE_PS::VectorMaskEnable on gen8+Jason Ekstrand2019-03-141-1/+1
| | | | | | | We don't set it on HSW and earlier in i965 and disabling it appears to make derivatives somewhat more reliable. Acked-by: Kenneth Graunke <[email protected]>
* i965: Disable ARB_fragment_shader_interlock for platforms prior to GEN9Plamena Manolova2019-03-141-0/+1
| | | | | | | | | | | ARB_fragment_shader_interlock depends on memory fences to ensure fragment ordering and this ordering guarantee is only supported from GEN9 onwards. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109980 Fixes: 939312702e35 "i965: Add ARB_fragment_shader_interlock support." Signed-off-by: Plamena Manolova <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv/pass: Flag the need for a RT flush for resolve attachmentsJason Ekstrand2019-03-131-1/+17
| | | | | Reviewed-by: Nanley Chery <[email protected]> Cc: [email protected]
* anv: Stop using VK_TRUE/FALSEJason Ekstrand2019-03-131-21/+21
| | | | | | | | | | | | We've been fairly inconsistent about this so we should really choose whether we're going to use VK_TRUE/FALSE or the C boolean values. The Vulkan #defines are set to 1 and 0 respectively so it's the same value as C gives you when you cast a boolean expression to an integer. Since there are several places where we set a VkBool32 to a C logical expression, let's just embrace C booleans and stop using the VK defines. Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/nir: Combine store_derefs to improve code from SPIR-VCaio Marcelo de Oliveira Filho2019-03-131-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Due to lack of write mask in SPIR-V store, generators may produce multiple stores to the same vector but using different array derefs. Use the combining store pass to clean this up. For example, layout(binding = 3) buffer block { vec4 v; }; void main() { v.x = 11; v.y = 22; } after going to SPIR-V and NIR, ends up with in two store_derefs to v[0] and v[1] vec2 32 ssa_4 = deref_struct &ssa_3->field0 (ssbo vec4) /* &((block *)ssa_2)->field0 */ vec2 32 ssa_6 = deref_array &(*ssa_4)[0] (ssbo float) /* &((block *)ssa_2)->field0[0] */ intrinsic store_deref (ssa_6, ssa_7) (1, 0) /* wrmask=x */ /* access=0 */ vec1 32 ssa_13 = load_const (0x00000001 /* 0.000000 */) vec2 32 ssa_14 = deref_array &(*ssa_4)[1] (ssbo float) /* &((block *)ssa_2)->field0[1] */ intrinsic store_deref (ssa_14, ssa_15) (1, 0) /* wrmask=x */ /* access=0 */ producing two different sends instructions in skl. The combining pass transform the snippet above into vec2 32 ssa_4 = deref_struct &ssa_3->field0 (ssbo vec4) /* &((block *)ssa_2)->field0 */ vec4 32 ssa_18 = vec4 ssa_7, ssa_15, ssa_16, ssa_17 intrinsic store_deref (ssa_4, ssa_18) (3, 0) /* wrmask=xy */ /* access=0 */ producing a single sends instruction. v2: Move this from spirv_to_nir into the general optimization pass for intel compiler. (Jason) Reviewed-by: Jason Ekstrand <[email protected]>
* intel/nir: Combine store_derefs after vectorizing IOCaio Marcelo de Oliveira Filho2019-03-131-0/+1
| | | | | | | | | | | | | | | | | | Shader-db results for skl: total instructions in shared programs: 15232903 -> 15224781 (-0.05%) instructions in affected programs: 61246 -> 53124 (-13.26%) helped: 221 HURT: 0 total cycles in shared programs: 371440470 -> 371398018 (-0.01%) cycles in affected programs: 281363 -> 238911 (-15.09%) helped: 221 HURT: 0 Results for bdw are very similar. Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Add a pass to combine store_derefs to same vectorCaio Marcelo de Oliveira Filho2019-03-131-0/+1
| | | | | | | | | v2: (all from Jason) Reuse existing function for the end of the block combinations. Check the SSA values are coming from the right place in tests. Document the case when the store to array_deref is reused. Reviewed-by: Jason Ekstrand <[email protected]>
* intel/fs: Fix opt_peephole_csel to not throw away saturates.Kenneth Graunke2019-03-121-0/+1
| | | | | | | | | | | | | | | | We were not copying the saturate bit from the original instruction to the new replacement instruction. This caused major misrendering in DiRT Rally on iris, where comparisons leading to discards failed due to the missing saturate, causing lots of extra garbage pixels to be drawn in text rendering, trees, and so on. This did not show up on i965 because st/nir performs a more aggressive version of nir_opt_peephole_select, yielding more b32csel operations. Fixes: 52c7df1643e i965/fs: Merge CMP and SEL into CSEL on Gen8+ Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* anv: Ignore VkRenderPassInputAttachementAspectCreateInfoJason Ekstrand2019-03-121-0/+4
| | | | | | | | We don't care about the information but there's no sense in throwing a debug warning about it. It's harmless but annoying to users. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109984 Reviewed-by: Sagar Ghuge <[email protected]>
* anv: Fix destroying descriptor sets when pool gets resetDanylo Piliaiev2019-03-121-6/+5
| | | | | | | | | | pool->next and pool->free_list were reset before their usage in anv_descriptor_pool_free_set Fixes: 775aabdd "anv: destroy descriptor sets when pool gets reset" Signed-off-by: Danylo Piliaiev <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/nir: Vectorize all IOJason Ekstrand2019-03-121-0/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The IO scalarization pass that we run to help with linking end up turning some shader I/O such as that for tessellation and geometry shaders into many scalar URB operations rather than one vector one. To alleviate this, we now vectorize the I/O once again. This fixes a 10% performance regression in the GfxBench tessellation test that was caused by scalarizing. Shader-db results on Kaby Lake: total instructions in shared programs: 15224023 -> 15220871 (-0.02%) instructions in affected programs: 342009 -> 338857 (-0.92%) helped: 1236 HURT: 443 total spills in shared programs: 23471 -> 23465 (-0.03%) spills in affected programs: 6 -> 0 helped: 1 HURT: 0 total fills in shared programs: 31770 -> 31766 (-0.01%) fills in affected programs: 4 -> 0 helped: 1 HURT: 0 Cycles was just a lot of churn do to moves being different places. Most of the pure churn in instructions was +/- one or two instructions in fragment shaders. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107510 Fixes: 4434591bf56a "intel/nir: Call nir_lower_io_to_scalar_early" Fixes: 8d8222461f9d "intel/nir: Enable nir_opt_find_array_copies" Reviewed-by: Connor Abbott <[email protected]>