path: root/src/intel
Commit message | Author | Age | Files | Lines
* glsl/nir: add support for lowering bindless images_derefsKarol Herbst2019-04-121-1/+1
| v2: handle atomics as well
|     make use of nir_rewrite_image_intrinsic
| v3: remove call to nir_remove_dead_derefs
| v4: (Timothy Arceri) don't actually call lowering yet
|
| Signed-off-by: Karol Herbst <[email protected]>
| Reviewed-by: Jason Ekstrand <[email protected]> (v3)
| Reviewed-by: Marek Olšák <[email protected]>
* nir/i965/freedreno/vc4: add a bindless bool to type size functionsTimothy Arceri2019-04-124-25/+30
| This is required to calculate sizes correctly when we have
| bindless samplers/images.
|
| Reviewed-by: Marek Olšák <[email protected]>
* nir: move brw_nir_rewrite_image_intrinsic into common codeKarol Herbst2019-04-122-42/+1
| | | | | | Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* anv: store heap address bounds when initializing physical deviceLionel Landwerlin2019-04-112-11/+21
| We can then reuse those bounds to initialize the VMA heaps at
| logical device creation.
|
| This fixes an issue on EHL, which has only 36 bits of VMA. We were
| incorrectly using the fixed 48-bit upper bound to initialize the
| logical device heap, resulting in addresses beyond the device's
| limits.
|
| v2: Don't confuse heap size (limited by system memory) and VMA size
|     (limited by number of addressing bits the platform has)
|
| v3: Fix low heap vma_size :( (Lionel)
|
| Signed-off-by: Lionel Landwerlin <[email protected]>
| Reported-by: James Xiong <[email protected]>
| Reviewed-by: Eric Engestrom <[email protected]> (v1)
| Reviewed-by: Jason Ekstrand <[email protected]> (v2)
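The difference between the fixed 48-bit bound and a bound derived from the platform's address width can be sketched in C. This is a minimal illustration, not the anv code: `vma_limit` and `clamp_heap_high` are hypothetical helpers, and the only facts taken from the commit message are the 36-bit and 48-bit widths.

```c
#include <stdint.h>

/* Hypothetical helper (not an anv function): highest GPU virtual address
 * reachable on a platform with `address_bits` bits of VMA. */
static uint64_t vma_limit(unsigned address_bits)
{
    return address_bits >= 64 ? UINT64_MAX : (1ull << address_bits) - 1;
}

/* Hypothetical helper: clamp a heap's upper bound to what the platform can
 * actually address, instead of trusting a fixed 48-bit constant. */
static uint64_t clamp_heap_high(uint64_t heap_high, unsigned address_bits)
{
    uint64_t limit = vma_limit(address_bits);
    return heap_high < limit ? heap_high : limit;
}
```

With these helpers, `clamp_heap_high((1ull << 48) - 1, 36)` yields `(1ull << 36) - 1`, whereas using the 48-bit constant directly would hand out addresses a 36-bit device cannot reach.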
* intel/common: Support bigger right-shifts with mi_builderJason Ekstrand2019-04-112-3/+20
| | | | Because why not?
* anv/cmd_buffer: Use gen_mi_sub instead of gen_mi_add with a negativeJason Ekstrand2019-04-111-1/+1
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Move mi_memcpy and mi_memset to gen_mi_builderJason Ekstrand2019-04-116-91/+106
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Use gen_mi_builder for queriesJason Ekstrand2019-04-111-214/+58
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Use gen_mi_builder for conditional renderingJason Ekstrand2019-04-112-70/+41
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Use gen_mi_builder for indirect dispatchJason Ekstrand2019-04-111-16/+13
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Use gen_mi_builder for indirect draw parametersJason Ekstrand2019-04-111-65/+16
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Use gen_mi_builder for computing resolve predicatesJason Ekstrand2019-04-111-93/+35
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Use gen_mi_builder for CmdDrawIndirectByteCountJason Ekstrand2019-04-111-102/+22
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/common: Add unit tests for gen_mi_builderJason Ekstrand2019-04-112-0/+661
| | | | | Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/common: Add a MI command builderJason Ekstrand2019-04-111-0/+691
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/tools: Remove redundant definitions of INTEL_DEBUGMark Janes2019-04-102-4/+0
| | | | | | INTEL_DEBUG is declared extern and defined in gen_debug.c Reviewed-by: Kenneth Graunke <[email protected]>
* intel/common: move gen_debug to intel/devMark Janes2019-04-1022-22/+22
| | | | | | | | | libintel_common depends on libintel_compiler, but it contains debug functionality that is needed by libintel_compiler. Break the circular dependency by moving gen_debug files to libintel_dev. Suggested-by: Kenneth Graunke <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* anv: don't use default pipeline cache for hits for VK_EXT_pipeline_creation_feedbackLionel Landwerlin2019-04-101-1/+1
| If the user didn't provide a pipeline cache and we're using the
| default internal pipeline cache, then we shouldn't consider a cache
| hit for VK_EXT_pipeline_creation_feedback as the application did not
| provide a cache.
|
| Signed-off-by: Lionel Landwerlin <[email protected]>
| Fixes: 6601e5d6fc68cd ("anv: implement VK_EXT_pipeline_creation_feedback")
| Reviewed-by: Jason Ekstrand <[email protected]>
* genxml: sort xml files using new scriptLionel Landwerlin2019-04-0910-21155/+21105
| | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
* genxml: add a sorting scriptLionel Landwerlin2019-04-093-0/+203
| | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* anv: advertise 8 subtexel/mipmap precision bitsJuan A. Suarez Romero2019-04-091-2/+2
| So far ANV was advertising 4 bits for both subTexelPrecisionBits and
| mipmapPrecisionBits, but these values were never actually verified.
|
| The right value seems to be 8 bits in both cases.
|
| Unfortunately the Intel PRM does not clarify how many bits the
| hardware uses. For the mipmap case, there is the following reference
| in PRM Volume 6 (3D Media GPGPU), specifically in the LOD Computation
| Pseudocode:
|
| ```
| Bias: S4.8
| MinLod: U4.8
| MaxLod: U4.8
| Base: U4.1
| MIPCnt: U4
| SurfMinLod: U4.8
| ResMinLod: U4.8
| ```
|
| We have other clues, though:
|
| - On one side, dEQP-VK.texture.explicit_lod.* tests fail when using
|   4 bits, but work when using 8 bits. These tests try to mimic the
|   expected behaviour as closely as possible, and they use the
|   reported subTexelPrecisionBits and mipmapPrecisionBits to do so.
|
| - On the other side, the equivalent driver for Windows reports 8
|   bits for both elements. Not sure whether that was verified from
|   the PRM or from a different source.
|
| CC: Jason Ekstrand <[email protected]>
| CC: Lionel Landwerlin <[email protected]>
| Reviewed-by: Lionel Landwerlin <[email protected]>
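To make the U4.8 notation concrete, the sketch below quantizes a level-of-detail value to unsigned fixed point with a given number of fractional bits. It is an illustration under assumptions: `lod_to_fixed` is a hypothetical helper, the round-to-nearest choice is mine, and nothing here is taken from the PRM beyond the "4 integer bits, 8 fractional bits" reading of U4.8.

```c
#include <stdint.h>

/* Hypothetical helper: convert a LOD to unsigned fixed point with
 * `int_bits` integer bits and `frac_bits` fractional bits (U4.8 means
 * int_bits = 4, frac_bits = 8). Rounds to nearest and saturates. */
static uint32_t lod_to_fixed(float lod, unsigned int_bits, unsigned frac_bits)
{
    uint32_t max = (1u << (int_bits + frac_bits)) - 1;
    float scaled = lod * (float)(1u << frac_bits) + 0.5f;

    if (scaled < 0.0f)
        return 0;
    if (scaled >= (float)max + 1.0f)
        return max;
    return (uint32_t)scaled; /* truncation of a non-negative value */
}
```

With 8 fractional bits the smallest representable LOD step is 1/256, so `lod_to_fixed(1.0f / 256.0f, 4, 8)` is 1, while the same input with only 4 fractional bits rounds to 0. That loss of resolution is the kind of mismatch a test comparing against a reference computed from the advertised precision could detect.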
* anv: Implement VK_NV_compute_shader_derivativesCaio Marcelo de Oliveira Filho2019-04-083-0/+10
| | | | | Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* intel/fs: Use NIR_PASS_V when lowering CS intrinsicsCaio Marcelo de Oliveira Filho2019-04-081-3/+4
| | | | | | | | | This will make that step visible in NIR_PRINT=1. v2: Also use the macro for the cleanup passes. Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* intel/fs: Don't loop when lowering CS intrinsicsCaio Marcelo de Oliveira Filho2019-04-081-15/+10
| | | | | | | | | This was needed when certain intrinsics were lowered to other ones that were defined by the same pass. After 060817b2 "intel,nir: Move gl_LocalInvocationID lowering to nir_lower_system_values" we don't need the loop anymore. Reviewed-by: Jason Ekstrand <[email protected]>
* intel/fs: Add support for CS to group invocations in quadsCaio Marcelo de Oliveira Filho2019-04-083-16/+103
| When using quads, instead of mapping the elements to the next 4 local
| invocation indices, we map the next two in the "current" row and the
| next two in the "next" row. A side effect is that a thread will
| execute the indices in a different order.
|
| We now perform the lowering of both local invocation ID and index
| together, and no longer rely on the lowering done by
| nir_lower_system_values. That is convenient when doing the math for
| quads, because we need X and Y to get the right invocation index.
|
| When the pass progresses, fold the constants and clean up to reduce
| the noise from the indexing math.
|
| This implements the derivative_group_quadsNV semantics from
| NV_compute_shader_derivatives.
|
| v2: Take subgroup_id into account, otherwise only values in the first
|     subgroup would be used. (Jason)
|
| v3: Calculate invocation index and ID together, to avoid duplicating
|     some math in the quads case when both index and ID are
|     used. (Jason)
|
| v4: Don't call cleanup passes as part of the lowering, leave that to
|     the call site. (Jason)
|     Change calculation to use fewer instructions. (Jason)
|
| Reviewed-by: Ian Romanick <[email protected]> (v3)
| Reviewed-by: Jason Ekstrand <[email protected]>
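The quad mapping described above can be illustrated with a small C sketch. This is not the math used by the Mesa pass: `linear_to_xy` and `quad_to_xy` are hypothetical helpers, the sketch ignores subgroups entirely, and it assumes the local size in X is even.

```c
#include <stdint.h>

struct invocation_xy {
    uint32_t x;
    uint32_t y;
};

/* Plain row-major layout: consecutive indices walk along one row. */
static struct invocation_xy linear_to_xy(uint32_t index, uint32_t size_x)
{
    struct invocation_xy xy = { index % size_x, index / size_x };
    return xy;
}

/* Quad layout (illustrative): every group of 4 consecutive indices forms a
 * 2x2 block, two in the "current" row and two in the "next" row.
 * Assumes size_x is even. */
static struct invocation_xy quad_to_xy(uint32_t index, uint32_t size_x)
{
    uint32_t quad = index / 4;          /* which 2x2 block */
    uint32_t lane = index % 4;          /* position inside the block */
    uint32_t quads_per_row = size_x / 2;
    struct invocation_xy xy;

    xy.x = 2 * (quad % quads_per_row) + (lane & 1);
    xy.y = 2 * (quad / quads_per_row) + (lane >> 1);
    return xy;
}
```

For a local size of 4 in X, indices 0 to 3 land on (0,0), (1,0), (0,1), (1,1) under the quad layout, whereas the linear layout would put them all in row 0.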
* intel/fs: Use TEX_LOGICAL whenever implicit lod is supportedCaio Marcelo de Oliveira Filho2019-04-081-2/+6
| | | | | | | Make sure we include compute shaders that have a derivative group defined. Reviewed-by: Jason Ekstrand <[email protected]>
* nir/radv: remove restrictions on opt_if_loop_last_continue()Timothy Arceri2019-04-091-1/+1
| When I implemented opt_if_loop_last_continue() I had restricted
| this pass from moving other if-statements inside the branch opposite
| the continue. At the time it was causing a bunch of spilling in
| shader-db for i965.
|
| However Samuel Pitoiset noticed that making this pass more
| aggressive significantly improved the performance of Doom on RADV.
| Below are the statistics he gathered.
|
| 28717 shaders in 14931 tests
| Totals:
| SGPRS: 1267317 -> 1267549 (0.02 %)
| VGPRS: 896876 -> 895920 (-0.11 %)
| Spilled SGPRs: 24701 -> 26367 (6.74 %)
| Code Size: 48379452 -> 48507880 (0.27 %) bytes
| Max Waves: 241159 -> 241190 (0.01 %)
|
| Totals from affected shaders:
| SGPRS: 23584 -> 23816 (0.98 %)
| VGPRS: 25908 -> 24952 (-3.69 %)
| Spilled SGPRs: 503 -> 2169 (331.21 %)
| Code Size: 2471392 -> 2599820 (5.20 %) bytes
| Max Waves: 586 -> 617 (5.29 %)
|
| The code size increase is related to Wolfenstein II; it seems
| largely due to an increase in phis rather than the existing jumps.
|
| This gives +10% FPS with Doom on my Vega56.
|
| Rhys Perry also benchmarked Doom on his VEGA64:
|
| Before: 72.53 FPS
| After: 80.77 FPS
|
| v2: disable pass on non-AMD drivers
|
| Reviewed-by: Ian Romanick <[email protected]> (v1)
| Acked-by: Samuel Pitoiset <[email protected]>
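The percentages in the statistics above are relative changes between the "before" and "after" columns. A small C sketch of that arithmetic follows; `pct_change` is a hypothetical helper written for this note and is not part of shader-db or Mesa.

```c
/* Hypothetical helper: relative change from `before` to `after`, in percent.
 * Assumes `before` is non-zero. */
static double pct_change(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

Applied to the spilled SGPRs of the affected shaders, `pct_change(503, 2169)` comes out at roughly 331.21, matching the figure in the table. Applied to the VEGA64 numbers, `pct_change(72.53, 80.77)` is a little over 11, in line with the "+10% FPS" reported for the Vega56.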
* intel: add dependency on genxml generated filesLionel Landwerlin2019-04-085-4/+6
| | | | | | | | | | Drivers using genxml will start compilation before generated files are created, so add a dependency to it. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Reviewed-by: Dylan Baker <[email protected]> Cc: [email protected]
* anv: implement VK_KHR_swapchain revision 70Lionel Landwerlin2019-04-083-3/+103
| This revision allows for images to be:
|    - created by reusing image parameters from swapchain
|    - bound to memory from a swapchain
|
| v2: Add color attachment flag
|     Use same implicit WSI parameters (tiling, samples, usage)
|
| v3: Fix missing break in vk_foreach_struct_const() switch (Lionel)
|
| v4: Fix accessing image aspects before android resolve (Tapani)
|
| Signed-off-by: Lionel Landwerlin <[email protected]>
| Reviewed-by: Tapani Pälli <[email protected]>
* intel/compiler: use defined size for vector componentsDave Airlie2019-04-031-1/+1
| If we increase vector sizing later it would be nice to avoid
| tripping over this again.
|
| Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel: Add support for Comet LakeAnuj Phogat2019-04-011-0/+1
| | | | | Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* intel/compiler: Use partial redundancy elimination for comparesIan Romanick2019-03-281-0/+20
| Almost all of the hurt shaders are repeated instances of the same
| shader in synmark's compilation speed tests.
|
| shader-db results:
|
| All Gen6+ platforms had similar results. (Skylake shown)
| total instructions in shared programs: 15256840 -> 15256389 (<.01%)
| instructions in affected programs: 54137 -> 53686 (-0.83%)
| helped: 288
| HURT: 0
| helped stats (abs) min: 1 max: 15 x̄: 1.57 x̃: 1
| helped stats (rel) min: 0.06% max: 26.67% x̄: 1.99% x̃: 0.74%
| 95% mean confidence interval for instructions value: -1.76 -1.38
| 95% mean confidence interval for instructions %-change: -2.47% -1.50%
| Instructions are helped.
|
| total cycles in shared programs: 372286583 -> 372283851 (<.01%)
| cycles in affected programs: 833829 -> 831097 (-0.33%)
| helped: 265
| HURT: 16
| helped stats (abs) min: 2 max: 74 x̄: 11.81 x̃: 4
| helped stats (rel) min: 0.04% max: 9.07% x̄: 0.99% x̃: 0.35%
| HURT stats (abs) min: 2 max: 130 x̄: 24.88 x̃: 8
| HURT stats (rel) min: <.01% max: 12.31% x̄: 1.44% x̃: 0.27%
| 95% mean confidence interval for cycles value: -12.30 -7.15
| 95% mean confidence interval for cycles %-change: -1.06% -0.64%
| Cycles are helped.
|
| Iron Lake and GM45 had similar results. (GM45 shown)
| total instructions in shared programs: 5038653 -> 5038495 (<.01%)
| instructions in affected programs: 13939 -> 13781 (-1.13%)
| helped: 50
| HURT: 1
| helped stats (abs) min: 1 max: 15 x̄: 3.18 x̃: 4
| helped stats (rel) min: 0.33% max: 13.33% x̄: 2.24% x̃: 1.09%
| HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
| HURT stats (rel) min: 0.83% max: 0.83% x̄: 0.83% x̃: 0.83%
| 95% mean confidence interval for instructions value: -3.73 -2.47
| 95% mean confidence interval for instructions %-change: -3.16% -1.21%
| Instructions are helped.
|
| total cycles in shared programs: 128118922 -> 128118228 (<.01%)
| cycles in affected programs: 134906 -> 134212 (-0.51%)
| helped: 50
| HURT: 0
| helped stats (abs) min: 2 max: 60 x̄: 13.88 x̃: 18
| helped stats (rel) min: 0.06% max: 3.19% x̄: 0.74% x̃: 0.70%
| 95% mean confidence interval for cycles value: -16.54 -11.22
| 95% mean confidence interval for cycles %-change: -0.95% -0.53%
| Cycles are helped.
|
| Reviewed-by: Kenneth Graunke <[email protected]>
* intel/genxml: Media instructions and structures for gen11Toni Lönnberg2019-03-281-24/+3450
| v2: Lionel Landwerlin <[email protected]>
|   - fix missing type
|   - fix *_FQM_*/*_QM_* commands
|   - shorten some media structs using groups
|   - factor out memory attributes
|   - switch MI_FLUSH_DW fields to bool
|
| Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/genxml: Media instructions and structures for gen10Toni Lönnberg2019-03-281-24/+3284
| v2: Lionel Landwerlin <[email protected]>
|   - fix missing type
|   - fix *_FQM_*/*_QM_* commands
|   - shorten some media structs using groups
|   - factor out memory attributes
|   - switch MI_FLUSH_DW fields to bool
|
| Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/genxml: Media instructions and structures for gen9Toni Lönnberg2019-03-281-24/+3090
| v2: Lionel Landwerlin <[email protected]>
|   - fix missing type
|   - fix *_FQM_*/*_QM_* commands
|   - shorten some media structs using groups
|   - factor out memory attributes
|   - switch MI_FLUSH_DW fields to bool
|
| Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/genxml: Media instructions and structures for gen8Toni Lönnberg2019-03-281-0/+1572
| v2: Lionel Landwerlin <[email protected]>
|   - switch MI_FLUSH_DW fields to bool
|
| Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/genxml: Media instructions and structures for gen7.5Toni Lönnberg2019-03-281-1/+1291
| | | | | | v2: Fixed MI_WAIT_FOR_EVENT to be for video also Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/genxml: Media instructions and structures for gen7Toni Lönnberg2019-03-281-1/+1347
| | | | | | v2: Fixed MI_WAIT_FOR_EVENT to be for blitter and video also Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/genxml: Media instructions and structures for gen6Toni Lönnberg2019-03-281-1/+1003
| | | | | | v2: Fixed MI_WAIT_FOR_EVENT to be for blitter and video also Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/genxml: Only handle instructions meant for render engine when generating headersToni Lönnberg2019-03-282-7/+59
| v2: Fixed the check for engine
| v3: Changed engine into an argument given to the scripts
|
| Reviewed-by: Lionel Landwerlin <[email protected]>
* intel: Add Elkhart Lake device infoAnuj Phogat2019-03-271-0/+60
| V2: Fix L3 bank count (Vivek)
|     Fix simulator_id and num_eu_per_subslice (Lionel)
|
| Signed-off-by: Anuj Phogat <[email protected]>
| Reviewed-by: Lionel Landwerlin <[email protected]>
* Revert "anv/radv: release memory allocated by glsl types during spirv_to_nir"Jason Ekstrand2019-03-271-2/+0
| This reverts commit 4e1bbb000cdfe4ba01bee5a6868c54fed7285dae.
|
| It turns out that some DXVK apps, due to some implementation detail
| of DXVK or otherwise, create and destroy instances in an interleaved
| way. Freeing the glsl_type memory without being a bit more careful
| causes use-after-free issues. Looks like we need to try again.
* intel/fs: Make alpha test work with MRT and sample maskDanylo Piliaiev2019-03-251-18/+17
| Fix the order of src0_alpha and sample mask in fb payload.
|
| From SKL PRM Volume 7, "Data Payload Register Order for Render
| Target Write Messages":
|
|   Type    S0A  oM  sZ  oS  M2      M3      M4
|   SIMD8   1    1   0   0   s0A     oM      R
|   SIMD16  1    1   0   0   1/0s0A  3/2s0A  oM
|
| It also fixes alpha to coverage with sample mask on GEN6, since
| they are now in the correct order.
|
| Signed-off-by: Danylo Piliaiev <[email protected]>
| Signed-off-by: Francisco Jerez <[email protected]>
| Reviewed-by: Francisco Jerez <[email protected]>
* i965,iris,anv: Make alpha to coverage work with sample maskDanylo Piliaiev2019-03-255-6/+108
| From "Alpha Coverage" section of SKL PRM Volume 7:
|  "If Pixel Shader outputs oMask, AlphaToCoverage is disabled in
|   hardware, regardless of the state setting for this feature."
|
| From OpenGL spec 4.6, "15.2 Shader Execution":
|  "The built-in integer array gl_SampleMask can be used to change
|   the sample coverage for a fragment from within the shader."
|
| From OpenGL spec 4.6, "17.3.1 Alpha To Coverage":
|  "If SAMPLE_ALPHA_TO_COVERAGE is enabled, a temporary coverage value
|   is generated where each bit is determined by the alpha value at the
|   corresponding sample location. The temporary coverage value is then
|   ANDed with the fragment coverage value to generate a new fragment
|   coverage value."
|
| Similar wording can be found in Vulkan spec 1.1.100
| "25.6. Multisample Coverage".
|
| Thus we need to compute alpha to coverage dithering manually in the
| shader and replace the sample mask store with the bitwise-AND of
| sample mask and alpha to coverage dithering.
|
| The following formula is used to compute the final sample mask:
|  m = int(16.0 * clamp(src0_alpha, 0.0, 1.0))
|  dither_mask = 0x1111 * ((0xfea80 >> (m & ~3)) & 0xf) |
|     0x0808 * (m & 2) | 0x0100 * (m & 1)
|  sample_mask = sample_mask & dither_mask
| Credits to Francisco Jerez <[email protected]> for creating it.
|
| It gives a number of ones proportional to the alpha for the 2, 4, 8
| or 16 least significant bits of the result.
|
| GEN6 hardware has no issue with simultaneous use of sample mask and
| alpha to coverage; however, due to the wrong sending order of oMask
| and src0_alpha, it is still affected.
|
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109743
| Signed-off-by: Danylo Piliaiev <[email protected]>
| Reviewed-by: Francisco Jerez <[email protected]>
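The dither formula quoted in the commit message can be tried out on the CPU. The C sketch below transcribes it directly; the function names (`a2c_dither_mask`, `apply_a2c`, `popcount16`) are hypothetical and chosen for this note, and the driver itself emits the equivalent math as shader instructions rather than calling anything like this.

```c
#include <stdint.h>

/* Clamp to [0, 1], as in the formula's clamp(src0_alpha, 0.0, 1.0). */
static float clamp01(float x)
{
    return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x);
}

/* Direct transcription of the formula from the commit message. */
static uint32_t a2c_dither_mask(float src0_alpha)
{
    int m = (int)(16.0f * clamp01(src0_alpha));

    return 0x1111u * ((0xfea80u >> (m & ~3)) & 0xfu) |
           0x0808u * (uint32_t)(m & 2) |
           0x0100u * (uint32_t)(m & 1);
}

/* sample_mask = sample_mask & dither_mask */
static uint32_t apply_a2c(uint32_t sample_mask, float src0_alpha)
{
    return sample_mask & a2c_dither_mask(src0_alpha);
}

/* Count set bits in the low 16 bits, to check the coverage proportion. */
static unsigned popcount16(uint32_t v)
{
    unsigned n = 0;
    for (unsigned i = 0; i < 16; i++)
        n += (v >> i) & 1u;
    return n;
}
```

For example, an alpha of 0.5 gives m = 8 and a mask of 0xaaaa: 8 of 16 bits set, 2 of the low 4, and 1 of the low 2, which is the proportionality the commit message describes. An alpha of 0.0 yields 0 and an alpha of 1.0 yields 0xffff.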
* compiler/nir: add lowering for 16-bit flrpIago Toral Quiroga2019-03-251-0/+1
| | | | | | | | | And enable it on Intel. v2: - Squash the change to enable it on Intel (Jason) Reviewed-by: Jason Ekstrand <[email protected]>
* compiler/nir: add lowering option for 16-bit fmodIago Toral Quiroga2019-03-251-0/+1
| | | | | | | | | And enable it on Intel. v2: - Squash the change to enable this lowering on Intel (Jason) Reviewed-by: Jason Ekstrand <[email protected]>
* android: static link with libexpat with Android O+Kishore Kadiyala2019-03-252-2/+21
| In Android O, MESA needs to statically link libexpat so that it's in
| the same VNDK namespace.
|
| v2: apply change also to anv driver (Tapani)
| v3: use += in anv change (Eric Engestrom)
|
| Change-Id: I82b0be5c817c21e734dfdf5bfb6a9aa1d414ab33
| Signed-off-by: Kishore Kadiyala <[email protected]>
| Signed-off-by: Tapani Pälli <[email protected]>
| Reviewed-by: Eric Engestrom <[email protected]>
* intel/compiler: handle GLSL_TYPE_INTERFACE as GLSL_TYPE_STRUCTCaio Marcelo de Oliveira Filho2019-03-233-3/+3
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* spirv,nir: lower frexp_exp/frexp_sig inside a new NIR passSamuel Pitoiset2019-03-221-0/+2
| | | | | | | | | | This lowering isn't needed for RADV because AMDGCN has two instructions. It will be disabled for RADV in an upcoming series. While we are at it, factorize a little bit. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv: fix build on NougatGurchetan Singh2019-03-214-6/+22
| | | | | | AHardwareBuffer is only available on O and above. Reviewed-by: Tapani Pälli <[email protected]>