path: root/src/intel
Commit message | Author | Age | Files | Lines
* anv: Require vertex buffers to come from a 32-bit heapJason Ekstrand2017-06-031-0/+12
| | | | | | | Reviewed-by: Nanley Chery <[email protected]> Cc: "17.1" <[email protected]> (cherry picked from commit 39adea9330376a64a4b5e8da98f5e055ebd3331e) Signed-off-by: Juan A. Suarez Romero <[email protected]>
* anv: Advertise both 32-bit and 48-bit heaps when we have enough memoryJason Ekstrand2017-06-021-6/+36
| | | | | | | Reviewed-by: Nanley Chery <[email protected]> Cc: "17.1" <[email protected]> (cherry picked from commit 50d0eb5096bd9514821a641f25c0b3455c0f8a88) Signed-off-by: Juan A. Suarez Romero <[email protected]>
* anv: Refactor memory type setupJason Ekstrand2017-06-021-36/+40
| | | | | | | | | | This makes us walk over the heaps one at a time and add the types for LLC and !LLC to each heap. Reviewed-by: Nanley Chery <[email protected]> Cc: "17.1" <[email protected]> (cherry picked from commit 34581fdd4f149894dfa51777a2f7eb289bd08b71) Signed-off-by: Juan A. Suarez Romero <[email protected]>
* anv: Make supports_48bit_addresses a heap propertyJason Ekstrand2017-06-022-3/+14
| | | | | | | | | | | Reviewed-by: Nanley Chery <[email protected]> Cc: "17.1" <[email protected]> (cherry picked from commit b83b1af6f6936f36db42a8f8b8e0854d0f9491fd) [Juan A. Suarez: resolve trivial conflicts] Signed-off-by: Juan A. Suarez Romero <[email protected]> Conflicts: src/intel/vulkan/anv_device.c
* anv: Stop setting BO flags in bo_init_newJason Ekstrand2017-06-023-7/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | The idea behind doing this was to make it easier to set various flags. However, we have enough custom flag settings floating around the driver that this is more of a nuisance than a help. This commit has the following functional changes: 1) The workaround_bo created in anv_CreateDevice loses both flags. This shouldn't matter because it's very small and entirely internal to the driver. 2) The bo created in anv_CreateDmaBufImageINTEL loses the EXEC_OBJECT_ASYNC flag. In retrospect, it never should have gotten EXEC_OBJECT_ASYNC in the first place. Reviewed-by: Nanley Chery <[email protected]> Cc: "17.1" <[email protected]> (cherry picked from commit 00df1cd9d6234cdfc9fb2bf3615196ff83a3c956) [Juan A. Suarez: resolve trivial conflicts] Signed-off-by: Juan A. Suarez Romero <[email protected]> Conflicts: src/intel/vulkan/anv_allocator.c src/intel/vulkan/anv_device.c src/intel/vulkan/anv_queue.c
* anv: Add valid_bufer_usage to the memory type metadataJason Ekstrand2017-06-022-8/+26
| | | | | | | | | | | | | | | | Instead of returning valid types as just a number, we now walk the list and check the buffer's usage against the usage flags we store in the new anv_memory_type structure. Currently, valid_buffer_usage == ~0. Reviewed-by: Nanley Chery <[email protected]> Cc: "17.1" <[email protected]> (cherry picked from commit f7736ccf53eaeb66c4270afe0916e2cb29ab8667) [Juan A. Suarez: resolve trivial conflicts] Signed-off-by: Juan A. Suarez Romero <[email protected]> Conflicts: src/intel/vulkan/anv_device.c src/intel/vulkan/anv_private.h
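The walk described in this commit can be sketched roughly as follows. This is a minimal illustration rather than the driver's code: the struct and function names (`memory_type_sketch`, `valid_types_for_usage`) are hypothetical, and only the `valid_buffer_usage` field name comes from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-type metadata; only the
 * valid_buffer_usage field name is taken from the commit message. */
struct memory_type_sketch {
   uint32_t valid_buffer_usage;
};

/* Returns a bitmask in which bit i is set when memory type i accepts
 * every usage bit requested for the buffer. */
static uint32_t
valid_types_for_usage(const struct memory_type_sketch *types,
                      uint32_t type_count, uint32_t usage)
{
   assert(type_count <= 32);   /* one result bit per memory type */

   uint32_t mask = 0;
   for (uint32_t i = 0; i < type_count; i++) {
      if ((types[i].valid_buffer_usage & usage) == usage)
         mask |= UINT32_C(1) << i;
   }
   return mask;
}
```

With every type's `valid_buffer_usage` set to `~0`, as the commit says is currently the case, the returned mask has one bit set per type regardless of the buffer's usage.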
* anv: Determine the type of mapping based on type metadataJason Ekstrand2017-06-022-7/+7
| | | | | | | | | | | | | | | | Before, we were just comparing the type index to 0. Now we actually look the type up in the table and check its properties to determine what kind of mapping we want to do. Reviewed-by: Nanley Chery <[email protected]> Cc: "17.1" <[email protected]> (cherry picked from commit 92325a7efc769c32e03031323e21700dc55171e4) [Juan A. Suarez: resolve trivial conflicts] Signed-off-by: Juan A. Suarez Romero <[email protected]> Conflicts: src/intel/vulkan/anv_device.c src/intel/vulkan/anv_private.h
* anv: Set EXEC_OBJECT_ASYNC when availableJason Ekstrand2017-06-028-4/+26
| | | | | | | | | | | | | | | | | | | Reviewed-by: Chad Versace <[email protected]> (cherry picked from commit 35e626bd0e59e7ce9fd97ccef66b2468c09206a4) Signed-off-by: Juan A. Suarez Romero <[email protected]> Squashed with: anv/tests: Create a dummy instance as well as device This fixes crashes caused by 35e626bd0e59e7ce9fd97ccef66b2468c09206a4 which made us start referencing the instance in the allocators. With this commit, the tests now happily pass again. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100877 Tested-by: Vinson Lee <[email protected]> (cherry picked from commit 6ef1bd4fa57b36efc7919773fd26c36fd43d2ea9) Signed-off-by: Juan A. Suarez Romero <[email protected]>
* anv: automake: list shared libraries after the static onesEmil Velikov2017-06-011-16/+15
| | | | | | | | | | | | The compiler can discard the shared ones from the link chain, since there is no user (the static libraries) before it on the command line. Cc: [email protected] Reported-by: Laurent Carlier <[email protected]> Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Eduardo Lima Mitev <[email protected]> (cherry picked from commit 3e8790bff096a1a56bd1a3046c556a7f93b68ca8) Signed-off-by: Juan A. Suarez Romero <[email protected]>
* anv: Set image memory types based on the type countJason Ekstrand2017-06-011-2/+4
| | | | | | | Reviewed-by: Nanley Chery <[email protected]> Cc: "17.1" <[email protected]> (cherry picked from commit 10fad58b31ee2354330152ca4072327d228fc2e7) Signed-off-by: Juan A. Suarez Romero <[email protected]>
* anv: Set up memory types and heaps during physical device initJason Ekstrand2017-06-012-44/+81
| | | | | | | | | | | | Reviewed-by: Nanley Chery <[email protected]> Cc: "17.1" <[email protected]> (cherry picked from commit c1f4343807d1040bd7b5440aa2f5fccf5f12842d) [Juan A. Suarez: resolve trivial conflicts] Signed-off-by: Juan A. Suarez Romero <[email protected]> Conflicts: src/intel/vulkan/anv_device.c src/intel/vulkan/anv_private.h
* anv: Predicate 48bit support on gen >= 8Jason Ekstrand2017-05-311-1/+6
| | | | | | | | | | | This doesn't matter right now since it only affects whether or not we set the kernel bit but, if we ever do anything else based on it, we'll want it to be correct per-gen. Reviewed-by: Nanley Chery <[email protected]> Cc: "17.1" <[email protected]> (cherry picked from commit eceaf7e2340fca0079300692733206b2af555bd9) Signed-off-by: Juan A. Suarez Romero <[email protected]>
* anv/image: Get rid of the memset(aux, 0, sizeof(aux)) hackJason Ekstrand2017-05-311-28/+0
| | | | | | | | | | | | | | | Up until now, we've been memsetting the auxiliary surface to 0 at BindImageMemory time to ensure that it is properly initialized. However, this isn't correct because apps are allowed to freely alias memory between different images and buffers so long as they properly track whether or not a particular image is valid and, if it isn't, transition from UNINITIALIZED to something else before using it. We now implement those transitions so we can drop the hack. Reviewed-by: Nanley Chery <[email protected]> Cc: "17.1" <[email protected]> (cherry picked from commit 4eecd534f0544b62ae831a97708ade007541bd32) Signed-off-by: Juan A. Suarez Romero <[email protected]>
* anv: Handle transitioning depth from UNDEFINED to other layoutsJason Ekstrand2017-05-312-19/+19
| | | | | | | Reviewed-by: Nanley Chery <[email protected]> Cc: "17.1" <[email protected]> (cherry picked from commit cc45c4bb8072b6593812f9b68a7b3d2d00bfb9f0) Signed-off-by: Juan A. Suarez Romero <[email protected]>
* anv: Handle color layout transitions from the UNINITIALIZED layoutJason Ekstrand2017-05-313-2/+108
| | | | | | | | | | This causes dEQP-VK.api.copy_and_blit.resolve_image.partial.* to start failing due to test bugs. See CL 1031 for a test fix. Reviewed-by: Nanley Chery <[email protected]> Cc: "17.1" <[email protected]> (cherry picked from commit 75edecf5020a9b833ff7e2929f64ceb11c9df679) Signed-off-by: Juan A. Suarez Romero <[email protected]>
* configure: check once for DRI3 dependenciesEmil Velikov2017-05-311-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently we are having the XCB_DRI3 dependencies duplicated, partially. Just do a once-off check and add all of the respective CFLAGS/LIBS where needed. As a nice side effect this helps us solve a couple of FIXMEs. DRI3 is not a thing w/o X11 so disable it in such cases. Cc: [email protected] Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> (cherry picked from commit acf3d2afab0571b74c0c0d1aee0f631b33fdc7da) Signed-off-by: Juan A. Suarez Romero <[email protected]> squashed with: configure.ac: add xcb-fixes to the XCB DRI3 list The XCB module is used by the VL targets. Thus omitting it can lead to link-time errors due to unresolved symbols. Other DRI3 users such as the Vulkan WSI and the dri3 loader helper do not use an update region in their xcb_present_pixmap() call. We will look into that at a later stage. Fixes: acf3d2afab0 ("configure: check once for DRI3 dependencies") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101110 Signed-off-by: Emil Velikov <[email protected]> (cherry picked from commit 9a90d6a9d4ee1632aa357a2ac9be150e058e2c10) Signed-off-by: Juan A. Suarez Romero <[email protected]> squashed with: configure.ac: s/xcb-fixes/xcb-xfixes/ Former is not a thing, even if I have a hacked xcb-fixes.pc on my system. Thanks for spotting it Mark! Fixes: 9a90d6a9d4e ("configure.ac: add xcb-fixes to the XCB DRI3 list") Signed-off-by: Emil Velikov <[email protected]> (cherry picked from commit 48cd1919ff1584c211ec7958864cac2e1cb347cf) Signed-off-by: Juan A. Suarez Romero <[email protected]>
* anv/formats: Update the three-channel BC1 mappingsNanley Chery2017-05-191-2/+2
The procedure for decompressing an opaque BC1 Vulkan format is dependent on the comparison of two colors stored in the first 32 bits of the compressed block. Here's the specified OpenGL (and Vulkan) behavior for reference:

    The RGB color for a texel at location (x,y) in the block is given by:

      RGB0,              if color0 > color1 and code(x,y) == 0
      RGB1,              if color0 > color1 and code(x,y) == 1
      (2*RGB0+RGB1)/3,   if color0 > color1 and code(x,y) == 2
      (RGB0+2*RGB1)/3,   if color0 > color1 and code(x,y) == 3

      RGB0,              if color0 <= color1 and code(x,y) == 0
      RGB1,              if color0 <= color1 and code(x,y) == 1
      (RGB0+RGB1)/2,     if color0 <= color1 and code(x,y) == 2
      BLACK,             if color0 <= color1 and code(x,y) == 3

The sampling operation performed on an opaque DXT1 Intel format essentially hard-codes the comparison result of the two colors as color0 > color1. This means that the behavior is incompatible with OpenGL and Vulkan. This is stated in the SKL PRM, Vol 5: Memory Views:

    Opaque Textures (DXT1_RGB)
    Texture format DXT1_RGB is identical to DXT1, with the exception that the One-bit Alpha encoding is removed. Color 0 and Color 1 are not compared, and the resulting texel color is derived strictly from the Opaque Color Encoding. The alpha channel defaults to 1.0.

    Programming Note
    Context: Opaque Textures (DXT1_RGB)
    The behavior of this format is not compliant with the OGL spec.

The opaque and non-opaque BC1 Vulkan formats are specified to be decoded in exactly the same way except the BLACK value must have a transparent alpha channel in the latter. Use the four-channel BC1 Intel formats with the alpha set to 1 to provide the behavior required by the spec.

v2 (Kenneth Graunke):
- Provide a more detailed commit message.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100925 Cc: <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Signed-off-by: Nanley Chery <[email protected]> (cherry picked from commit 56458cb168bf79ae51ba1efc3acec15874cc34a9)
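The decode table quoted in this commit message can be sketched in C as below. This is an illustrative sketch under stated assumptions, not driver or hardware code: `rgb8`, `blend_channel` and `bc1_opaque_texel` are hypothetical names, the endpoint colors are assumed to be already expanded to 8 bits per channel, and integer division truncates where the table implies exact arithmetic.

```c
#include <assert.h>
#include <stdint.h>

struct rgb8 {
   uint8_t r, g, b;
};

/* Weighted average of one channel: (wa*a + wb*b) / (wa + wb), truncating. */
static uint8_t
blend_channel(uint8_t a, uint8_t b, unsigned wa, unsigned wb)
{
   return (uint8_t)((wa * a + wb * b) / (wa + wb));
}

/* color0/color1 are the raw 16-bit endpoint values compared as integers;
 * rgb0/rgb1 are the same endpoints expanded to 8-bit channels. */
static struct rgb8
bc1_opaque_texel(uint16_t color0, uint16_t color1,
                 struct rgb8 rgb0, struct rgb8 rgb1, unsigned code)
{
   assert(code < 4);

   if (code == 0)
      return rgb0;
   if (code == 1)
      return rgb1;

   struct rgb8 out = { 0, 0, 0 };   /* BLACK unless overwritten below */
   unsigned w0, w1;

   if (color0 > color1) {
      /* Four-color mode: codes 2 and 3 are 2:1 and 1:2 blends. */
      w0 = (code == 2) ? 2 : 1;
      w1 = 3 - w0;
   } else if (code == 2) {
      /* Three-color mode: code 2 is the midpoint. */
      w0 = 1;
      w1 = 1;
   } else {
      /* Three-color mode, code 3: BLACK. */
      return out;
   }

   out.r = blend_channel(rgb0.r, rgb1.r, w0, w1);
   out.g = blend_channel(rgb0.g, rgb1.g, w0, w1);
   out.b = blend_channel(rgb0.b, rgb1.b, w0, w1);
   return out;
}
```

A decoder that always takes the `color0 > color1` branch, as the commit says the opaque DXT1 Intel format effectively does, would never produce the midpoint or BLACK results, which is the incompatibility being worked around.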
* Android: correct libz dependencyChih-Wei Huang2017-05-181-1/+1
| | | | | | | | | | | | | | | | | | | | Commit 6facb0c0 ("android: fix libz dynamic library dependencies") unconditionally adds libz as a dependency to all shared libraries. That is unnecessary. Commit 85a9b1b5 introduced libz as a dependency to libmesa_util. So only the shared libraries that use libmesa_util need libz. Fix Android Lollipop build by adding the include path of zlib to libmesa_util explicitly instead of getting the path implicitly from zlib since it doesn't export the include path in Lollipop. Fixes: 6facb0c0 "android: fix libz dynamic library dependencies" Signed-off-by: Chih-Wei Huang <[email protected]> Reviewed-by: Tapani Pälli <[email protected]> Reviewed-by: Rob Herring <[email protected]> (cherry picked from commit bfc0c23843008fd510afa263ebe371bef3346445)
* anv: don't leak DRM devicesGrazvydas Ignotas2017-05-181-0/+1
| | | | | | | | | | After successful drmGetDevices2() call, drmFreeDevices() needs to be called. Fixes: b1fb6e8d "anv: do not open random render node(s)" Signed-off-by: Grazvydas Ignotas <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> # radv version (cherry picked from commit 0ef302638f2883789a3b39c2b6cfd20814efa0bb)
* anv: fix possible stack corruptionGrazvydas Ignotas2017-05-181-1/+1
| | | | | | | | | | | drmGetDevices2 takes count and not size. Probably hasn't caused problems yet in practice and was missed as setups with more than 8 DRM devices are not very common. Fixes: b1fb6e8d "anv: do not open random render node(s)" Signed-off-by: Grazvydas Ignotas <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]> (cherry picked from commit e0aee8b667955675e2e6c647a88048b64bc2796e)
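The count-versus-size mistake this commit fixes can be illustrated with a small sketch. It does not call libdrm: `fill_device_slots` is a hypothetical stand-in for an API that, like drmGetDevices2 as described above, expects the number of array elements rather than the array's size in bytes, and `SKETCH_ARRAY_LEN` and `enumerate_sketch` are likewise invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements in a true array (not a pointer). */
#define SKETCH_ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-in for an enumeration call that takes an element
 * count. Writes at most max_slots entries and returns how many it wrote. */
static int
fill_device_slots(void *slots[], int max_slots, int available)
{
   assert(max_slots >= 0 && available >= 0);

   int n = available < max_slots ? available : max_slots;
   for (int i = 0; i < n; i++)
      slots[i] = NULL;
   return n;
}

static int
enumerate_sketch(int available)
{
   void *devices[8];

   /* Passing sizeof(devices) here would claim 8 * sizeof(void *) slots,
    * letting the callee write past the end of the array once more than
    * 8 devices are available. */
   return fill_device_slots(devices, (int)SKETCH_ARRAY_LEN(devices),
                            available);
}
```

This also shows why the bug stayed hidden: with 8 or fewer devices available, the callee never writes beyond the array even when handed the larger byte count.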
* i965/vec4: load dvec3/4 uniforms first in the push constant bufferSamuel Iglesias Gonsálvez2017-05-181-27/+80
Reorder the uniforms so that the dvec4-aligned variables are loaded first in the push constant buffer, followed by the vec4-aligned ones. It takes into account that the relocated uniforms should be aligned to their channel size.

This fixes a bug where a dvec3/4 might be loaded partly in one GRF and partly in the next GRF, so the region parameters used to read it could break the HW rules.

v2:
- Fix broken logic.
- Add a comment to explain what should be needed to optimise the usage of the push constant buffer slots, as this patch does not pack the uniforms.

v3:
- Implemented the push constant buffer usage optimization.

Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Cc: "17.1" <[email protected]>
Acked-by: Francisco Jerez <[email protected]>
(cherry picked from commit e69e5c7006da80af62c9ef08dec215b3b4b30946)
* i965/vec4: fix swizzle and writemask when loading an uniform with constant ↵Samuel Iglesias Gonsálvez2017-05-181-4/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | offset It was setting XYWZ swizzle and writemask to all uniforms, no matter if they were a vector or scalar, so this can lead to problems when loading them to the push constant buffer. Moreover, 'shift' calculation was designed to calculate the offset in DWORDS, but it doesn't take into account DFs, so the calculated swizzle for the later ones was wrong. The indirect case is not changed because MOV INDIRECT will write to all components. Added an assert to verify that these uniforms are aligned. v2: - Fix 'shift' calculation (Curro) - Set both swizzle and writemask. - Add assert(shift == 0) for the indirect case. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Cc: "17.1" <[email protected]> Reviewed-by: Francisco Jerez <[email protected]> (cherry picked from commit 8aa6ada8384a961b37dfefec7f9e40e5a4e27ce7)
* i965/vec4/gs: restore the uniform values which was overwritten by failed ↵Samuel Iglesias Gonsálvez2017-05-181-0/+26
| | | | | | | | | | | | | | | | | | vec4_gs_visitor execution We are going to add a packing feature to reduce the usage of the push constant buffer. One of the consequences is that 'nr_params' would be modified by vec4_visitor's run call, so we need to restore it if one of them failed before executing the fallback ones. Same thing happens to the uniforms values that would be reordered afterwards. Fixes GL45-CTS.arrays_of_arrays_gl.InteractionFunctionCalls2 when the dvec4 alignment and packing patch is applied. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Cc: "17.1" <[email protected]> Acked-by: Francisco Jerez <[email protected]> (cherry picked from commit 354f7f2cb9c7206e12646c79d8ff5becbaffa61b)
* intel/isl/gen7: Use stencil vertical alignment of 8 instead of 4Pohjolainen, Topi2017-05-181-23/+5
The reasoning Chad gave in the comment for choosing a valign of 4 is entirely bunk. The fact that you have to multiply pitch by 2 is completely unrelated to the halign/valign parameters used for texture layout. (Not completely unrelated. W-tiling is just Y-tiling with a bit of extra swizzling which turns 8x8 W-tiled chunks into 16x4 y-tiled chunks so it makes everything easier if miplevels are always aligned to 8x8.)

The fact that RENDER_SURFACE_STATE::SurfaceVerticalAlignment doesn't have a VALIGN_8 option doesn't matter since this is gen7 and you can't do stencil texturing anyway.

v2 (Jason Ekstrand):
- Delete most of Chad's comment and add a more descriptive commit message.

Signed-off-by: Topi Pohjolainen <[email protected]>
Cc: "17.0 17.1" <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
(cherry picked from commit 236f17a9f73935db6cddafd91e53a5fae34aae6e)
* anv: anv_gem_mmap() returns MAP_FAILED as mapping errorSamuel Iglesias Gonsálvez2017-05-082-6/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Take it into account when checking if the mapping failed. v2: - Remove map == NULL and its related comment (Emil) Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Fixes: 6f3e3c715a7 ("vk/allocator: Add a BO pool") Fixes: 9919a2d34de ("anv/image: Memset hiz surfaces to 0 when binding memory") Cc: "17.0 17.1" <[email protected]> (cherry picked from commit b546c9d318731b988aa3d8c4e4735cdbb596cfbf) Squashed with: anv: vkBindImageMemory() should return VK_ERROR_OUT_OF_{HOST,DEVICE}_MEMORY on failure According to the spec we get VK_ERROR_OUT_OF_HOST_MEMORY or VK_ERROR_OUT_OF_DEVICE_MEMORY on vkBindImageMemory failure. Fixes returned value changed by b546c9d. Fixes: b546c9d ("anv: anv_gem_mmap() returns MAP_FAILED as mapping error") Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Cc: "17.0 17.1" <[email protected]> Reviewed-by: Emil Velikov <[email protected]> (cherry picked from commit 939b015736d5091faeabde4f5a373e6a1612c5ed) Squashed with: anv: fix anv_gem_mmap comment to not mention NULL The function cannot return NULL, update the comment accordingly. Fixes: b546c9d ("anv: anv_gem_mmap() returns MAP_FAILED as mapping error") Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]> (cherry picked from commit 9d2aa6e5067752efbc0acbd728bc0bde49aefb61)
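The point of these squashed commits, that a failed mapping is signalled by MAP_FAILED rather than NULL, can be sketched with plain POSIX `mmap()`. The helper names (`mapping_failed`, `map_invalid_fd`) are hypothetical and this is not the driver's `anv_gem_mmap()`; the second helper only exists to provoke a failure by mapping an invalid file descriptor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

/* mmap() reports failure with MAP_FAILED, i.e. (void *) -1, so a check
 * against NULL does not detect a failed mapping. */
static bool
mapping_failed(const void *map)
{
   return map == MAP_FAILED;
}

/* Requests a file-backed private mapping of file descriptor -1, which is
 * not a valid descriptor, so the call is expected to fail. */
static void *
map_invalid_fd(size_t size)
{
   assert(size > 0);
   return mmap(NULL, size, PROT_READ, MAP_PRIVATE, -1, 0);
}
```

A caller that tested the result of `map_invalid_fd()` with `== NULL` would treat the failed mapping as valid and later dereference `(void *) -1`.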
* i965/vec4: don't modify regioning parameters to the sources of DF align1 ↵Samuel Iglesias Gonsálvez2017-05-051-8/+1
| | | | | | | | | | | | | | instructions The regioning parameters are now properly set by convert_to_hw_regs() and we don't need to fix them in the generator. That latter fix previously done in the generator was strictly speaking wrong for any non-identity regions. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Cc: "17.1" <[email protected]> Reviewed-by: Francisco Jerez <[email protected]> (cherry picked from commit f57e234fdd52331d0aa6656a36efdebea9d11e9d)
* i965/vec4: fix register width for DF VGRF and UNIFORMSamuel Iglesias Gonsálvez2017-05-051-5/+7
On gen7, the swizzles used in DF align16 instructions work for an element size of 32 bits, so we can address only 2 consecutive DFs. As we assumed that in the rest of the code and prepared the instructions for this (scalarize_df()), we need to set it to two again.

However, for DF align1 instructions, a width of 2 is wrong as we are not reading the data we want. For example, a uniform would have a region of <0, 2, 1> so it would repeat the first 2 DFs, when we wanted to access the first 4.

This patch sets the default one to 4 and then modifies the width of align16 instruction's DF sources when we translate the logical swizzle to the physical one.

v2:
- Remove conditional (Curro).

Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Cc: "17.1" <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
(cherry picked from commit aaeb1c99beed39d85c300ebdb8a7bf056ee6717c)
* i965/vec4: fix vertical stride to avoid breaking region parameter ruleSamuel Iglesias Gonsálvez2017-05-051-18/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | From IVB PRM, vol4, part3, "General Restrictions on Regioning Parameters": "If ExecSize = Width and HorzStride ≠ 0, VertStride must be set to Width * HorzStride." In next patch, we are going to modify the region parameter for uniforms and vgrf. For uniforms that are the source of DF align1 instructions, they will have <0, 4, 1> regioning and the execsize for those instructions will be 4, so they will break the regioning rule. This will be the same for VGRF sources where we use the vstride == 0 exploit. As we know we are not going to cross the GRF boundary with that execsize and parameters (not even with the exploit), we just fix the vstride here. v2: - Move is_align1_df() (Curro) - Refactor exec_size == width calculation (Curro) Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Cc: "17.1" <[email protected]> Reviewed-by: Francisco Jerez <[email protected]> (cherry picked from commit 7f728bce811fc283e672e3a07b008bb7b52de35e)
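The regioning restriction quoted from the IVB PRM in this commit reduces to a one-line rule, sketched below. `fixed_vstride` is a hypothetical helper, and its parameters are plain element counts and strides rather than the hardware's encoded field values, which is a simplifying assumption.

```c
#include <assert.h>

/* Applies the quoted rule: if ExecSize = Width and HorzStride != 0,
 * VertStride must be Width * HorzStride. Otherwise the stride is left
 * as it was. */
static unsigned
fixed_vstride(unsigned exec_size, unsigned width,
              unsigned hstride, unsigned vstride)
{
   assert(width > 0);

   if (exec_size == width && hstride != 0)
      return width * hstride;
   return vstride;
}
```

For the uniform case mentioned in the message, a <0, 4, 1> region read by an instruction with an execution size of 4 would get a vertical stride of 4.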
* anv/cmd_buffer: Use the device allocator for QueueSubmitJason Ekstrand2017-04-301-3/+3
| | | | | | | | | The command is really operating on a Queue not a command buffer and the nearest object to that with an allocator is VkDevice. Reviewed-by: Chad Versace <[email protected]> Cc: "17.0 17.1" <[email protected]> (cherry picked from commit bd3a9813b92bd2e116b58f0932bc7f1f722a9f63)
* anv: Don't place scratch buffers above the 32-bit boundaryJason Ekstrand2017-04-301-0/+19
| | | | | | | | | | | | | This fixes rendering corruptions in DOOM. Hopefully, it will also make Jenkins a bit more stable as we've been seeing some random failures and GPU hangs ever since turning on 48bit. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100620 Fixes: 651ec926fc1 "anv: Add support for 48-bit addresses" Tested-by: Grazvydas Ignotas <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Cc: "17.1" <[email protected]> (cherry picked from commit c43b4bc85eddba8bc31665cfee5928bed8343516)
* intel/fs: Take into account amount of data read in spilling cost heuristic.Francisco Jerez2017-04-301-1/+1
| | | | | | | | | | | | | | | | | | | | Until now the spilling cost calculation was neglecting the amount of data read from the register during the spilling cost calculation. This caused it to make suboptimal decisions in some cases leading to higher memory bandwidth usage than necessary. Improves Unigine Heaven performance by ~4% on BDW, reversing an unintended FPS regression from my previous commit 147e71242ce539ff28e282f009c332818c35f5ac with n=12 and statistical significance 5%. In addition SynMark2 OglCSDof performance is improved by an additional ~5% on SKL, and a Kerbal Space Program apitrace around the Moho planet I can provide on request improves by ~20%. Cc: <[email protected]> Reviewed-by: Plamena Manolova <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> (cherry picked from commit 58324389be7bc7c5e10093b9cc0a8efa9b4c93a9)
* intel/fs: Use regs_written() in spilling cost heuristic for improved accuracy.Francisco Jerez2017-04-301-2/+1
| | | | | | | | | | | This is what we use later on to compute the number of registers that will actually get spilled to memory, so it's more likely to match reality than the current open-coded approximation. Cc: <[email protected]> Reviewed-by: Plamena Manolova <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> (cherry picked from commit ecc19e12dca95d2571d3761dea6dec24b061013c)
* i965/vec4: Avoid reswizzling MACH instructions in opt_register_coalesce().Kenneth Graunke2017-04-301-0/+7
opt_register_coalesce() was optimizing sequences such as:

    mul(8) acc0:D, attr18.xyyy:D, attr19.xyyy:D
    mach(8) vgrf5.xy:D, attr18.xyyy:D, attr19.xyyy:D
    mov(8) m4.zw:F, vgrf5.xxxy:F

into:

    mul(8) acc0:D, attr18.xyyy:D, attr19.xyyy:D
    mach(8) m4.zw:D, attr18.xxxy:D, attr19.xxxy:D

This doesn't work - if we're going to reswizzle MACH, we'd need to reswizzle the MUL as well. Here, the MUL fills the accumulator's .zw components with attr18.yy * attr19.yy. But the MACH instruction expects .z to contain attr18.x * attr19.x. Bogus results ensue.

No change in shader-db on Haswell. Prevents regressions in Timothy's patches to use enhanced layouts for varying packing (which rearrange code just enough to trigger this pre-existing bug, but were fine themselves).

Acked-by: Timothy Arceri <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
(cherry picked from commit 2faf227ec2e22c7a37e0a54783a3f0a0062ac852)

Squashed with commit:

i965/vec4: Use reads_accumulator_implicitly(), not MACH checks.

Curro pointed out that I should not just check for MACH, but use the reads_accumulator_implicitly() helper, which would also prevent the same bug with MAC and SADA2 (if we ever decide to use them).

Cc: [email protected]
Reviewed-by: Francisco Jerez <[email protected]>
(cherry picked from commit 6b10c37b9c3a73add73f444fe1aee73c9ec82c94)
* anv/cmd_buffer: Disable CCS on BDW input attachmentsNanley Chery2017-04-242-30/+13
| | | | | | | | | | | | | | | | | | The description under RENDER_SURFACE_STATE::RedClearColor says, For Sampling Engine Multisampled Surfaces and Render Targets: Specifies the clear value for the red channel. For Other Surfaces: This field is ignored. This means that the sampler on BDW doesn't support CCS. Cc: Samuel Iglesias Gonsálvez <[email protected]> Cc: Jordan Justen <[email protected]> Cc: <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Signed-off-by: Nanley Chery <[email protected]> (cherry picked from commit d9d793696bf54e970491302605a1efd0aa182d1b)
* anv: blorp: flush memory after copyLionel Landwerlin2017-04-241-2/+2
| | | | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Cc: "13.0 17.0" <[email protected]> (cherry picked from commit d71efbe5f2a0ff934b8e9eeb96cd680a83bc0259)
* intel/decoder: Fix is_header_field starting condition.Kenneth Graunke2017-04-161-1/+1
| | | | | | | | | | | Starting positions >= 32 are not part of the header, rather than >. Caught by Coverity, which found that "bits <<= field->start" may shift by 32, which has undefined behavior. CID: 1404968 Reviewed-by: Lionel Landwerlin <[email protected]>
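The boundary condition fixed here can be sketched as follows. This is a hypothetical simplification, not the decoder's actual `is_header_field`: the function name and signature are invented, and clamping the field's end to bit 31 and building the mask through a 64-bit intermediate are assumptions made so that no shift in the sketch can reach 32.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true when the bit range [start, end] overlaps opcode_mask,
 * which covers only the first dword (bits 0..31). */
static bool
field_overlaps_header_mask(uint32_t opcode_mask, unsigned start, unsigned end)
{
   assert(start <= end);

   /* A field starting at bit 32 or later is not part of the header.
    * Using >= rather than > also keeps the shift by start below
    * within [0, 31]; shifting a 32-bit value by 32 is undefined. */
   if (start >= 32)
      return false;

   unsigned last = end < 31 ? end : 31;   /* assumption: clamp to dword 0 */
   unsigned width = last - start + 1;     /* 1..32 */

   /* 64-bit intermediate so that a 32-bit-wide field does not itself
    * require a shift by 32 on a 32-bit type. */
   uint32_t bits = (uint32_t)((UINT64_C(1) << width) - 1);
   bits <<= start;

   return (opcode_mask & bits) != 0;
}
```

With the old `>` comparison, a field starting exactly at bit 32 would have fallen through to the shift, which is the case Coverity flagged.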
* anv: Add the pci_id into the shader cache UUIDJason Ekstrand2017-04-141-5/+15
| | | | | | | | | | This prevents a user from using a cache created on one hardware generation on a different one. Of course, with Intel hardware, this requires moving their drive from one machine to another but it's still possible and we should prevent it. Reviewed-by: Chad Versace <[email protected]> Cc: [email protected]
* i965: Use correct VertStride on align16 instructions.Matt Turner2017-04-141-10/+34
In commit c35fa7a, we changed the "width" of DF source registers to 2, which is conceptually fine. Unfortunately a VertStride of 2 is not allowed by align16 instructions on IVB/BYT, and the regular VertStride of 4 works fine in any case.

See generated_tests/spec/arb_gpu_shader_fp64/execution/built-in-functions/vs-round-double.shader_test for example:

    cmp.ge.f0(8) g18<1>DF g1<0>.xyxyDF -g8<2>DF { align16 1Q };
    ERROR: In Align16 mode, only VertStride of 0 or 4 is allowed
    cmp.ge.f0(8) g19<1>DF g1<0>.xyxyDF -g9<2>DF { align16 2N };
    ERROR: In Align16 mode, only VertStride of 0 or 4 is allowed

v2:
- Add spec quote (Curro).
- Change the condition to only BRW_VERTICAL_STRIDE_2 (Curro)

Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
* i965/vec4/dce: improve track of partial flag register writesSamuel Iglesias Gonsálvez2017-04-141-1/+1
| | | | | | | | | | | | | | | This is required for correctness in presence of multiple 4-wide flag writes (e.g. 4-wide instructions with a conditional mod set) which update a different portion of the same 8-bit flag subregister. Right now we keep track of flag dataflow with 8-bit granularity and consider flag writes to have killed any previous definition of the same subregister even if the write was less than 8 channels wide, which can cause live flag register updates to be dead code-eliminated incorrectly. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* i965/vec4: don't do horizontal stride on some register file typesSamuel Iglesias Gonsálvez2017-04-141-2/+5
| | | | | | | | | | | | | horiz_offset() shouldn't be doing anything for scalar registers, because all channels of any SIMD instructions will end up reading or writing the same component of the register, so shifting the register offset would be wrong. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> [ Francisco Jerez: Re-implement in terms of is_uniform() for simplicity. Pass argument by const reference. Clarify commit message. ] Reviewed-by: Francisco Jerez <[email protected]>
* i965/vec4: Fix exec size for MOVs {SET,PICK}_{HIGH,LOW}_32BIT.Matt Turner2017-04-141-4/+12
Otherwise for a pack_double_2x32_split opcode, we emit:

    vec1 64 ssa_135 = pack_double_2x32_split ssa_133, ssa_134
    mov(8) g5<1>UD g5<4>.xUD { align16 1Q compacted };
    mov(8) g7<2>UD g5<4,4,1>UD { align1 1Q };
    ERROR: When the destination spans two registers, the source must span two registers (exceptions for scalar source and packed-word to packed-dword expansion)
    mov(8) g8<2>UD g5.4<4,4,1>UD { align1 2N };
    ERROR: The offset from the two source registers must be the same
    mov(8) g5<1>UD g6<4>.xUD { align16 1Q compacted };
    mov(8) g7.1<2>UD g5<4,4,1>UD { align1 1Q };
    ERROR: When the destination spans two registers, the source must span two registers (exceptions for scalar source and packed-word to packed-dword expansion)
    mov(8) g8.1<2>UD g5.4<4,4,1>UD { align1 2N };
    ERROR: The offset from the two source registers must be the same

The intention was to emit mov(4)s for the instructions that have ERROR annotations.

See tests/spec/arb_gpu_shader_fp64/execution/vs-isinf-dvec.shader_test for example.

v2 (Samuel):
- Instead of setting the exec size to a fixed value, don't double it (Curro).
- Add PICK_{HIGH,LOW}_32BIT to the condition.

Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
[ Francisco Jerez: Trivial rebase changes. ]
Reviewed-by: Francisco Jerez <[email protected]>
* i965/vec4: use vec4_builder to emit instructions in setup_imm_df()Samuel Iglesias Gonsálvez2017-04-142-50/+50
| | | | | | | | Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> [ Francisco Jerez: Drop useless vec4_visitor dependencies. Demote to static stand-alone function. Don't write unused components in the result. Use vec4_builder interface for register allocation. ] Reviewed-by: Francisco Jerez <[email protected]>
* i965/vec4: consider subregister offset in live variablesJuan A. Suarez Romero2017-04-141-2/+2
| | | | | | | | | | | | | | | | | | Take into account offset values less than a full register (32 bytes) when getting the var from register. This is required when dealing with an operation that writes half of the register (like one d2x in IVB/BYT, which uses exec_size == 4). v2: - Take in account this offset < 32 in liveness analysis too (Curro) v3: - Change formula in var_from_reg() (Curro) - Remove useless changes (Curro) Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* i965/vec4: fix assert to detect SIMD lowered DF instructions in IVB Francisco Jerez 2017-04-14 1 -5/+1
| | | | | | | | | | | On IVB, DF instructions have their SIMD width lowered to 4, but the exec_size will later be doubled. Fix the assert to avoid crashing in this case. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> [ Francisco Jerez: Simplify assert. Except for the 'inst->group % 4 == 0' part the assertion was redundant with the previous assertion. ] Reviewed-by: Francisco Jerez <[email protected]>
* i965/vec4: split VEC4_OPCODE_FROM_DOUBLE into one opcode per destination's type Samuel Iglesias Gonsálvez 2017-04-14 7 -27/+60
| | | | | | | | | | This way we can set the destination type to double for all these new opcodes, avoiding the optimizer confusion that was happening before. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> [ Francisco Jerez: Drop no_spill workaround originally needed due to the bogus destination type of VEC4_OPCODE_FROM_DOUBLE. ] Reviewed-by: Francisco Jerez <[email protected]>
* i965/vec4: split d2x conversion and data gathering from one opcode to two explicit ones Samuel Iglesias Gonsálvez 2017-04-14 2 -8/+1
| | | | | | | | | | | | | | | | When converting from a 64-bit type to a smaller data type, the destination should be aligned to 64 bits. Because of that, we need to gather the data after the actual conversion. Until now, these two operations were done by VEC4_OPCODE_FROM_DOUBLE, but now we split them explicitly into two different instructions: VEC4_OPCODE_FROM_DOUBLE just does the conversion and VEC4_OPCODE_PICK_LOW_32BIT gathers the data. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* i965/vec4: fix VEC4_OPCODE_FROM_DOUBLE for IVB/BYT Juan A. Suarez Romero 2017-04-14 1 -7/+19
| | | | | | | | | | | | In the generator we must generate slightly different code for Ivybridge/Baytrail, because of the way the stride works in this hardware. v2: - Use stride and don't need to fix dst (Curro) Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* i965/vec4: keep original type when dealing with null registers Juan A. Suarez Romero 2017-04-14 1 -0/+2
| | | | | | | | | | | | | | | | Keep the original type when dealing with null registers, in particular because we do not want to introduce an implicit conversion between types that could affect the conditional flags. This matters especially when the original type is DF and we are working on Ivybridge/Baytrail. v2 (Curro) - Fix typo. - Use retype() instead of applying the type directly. - Remove unneeded retype. Reviewed-by: Francisco Jerez <[email protected]>
* i965/vec4: split DF instructions and later double its execsize in IVB/BYT Samuel Iglesias Gonsálvez 2017-04-14 3 -1/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need to split DF instructions in two on IVB/BYT, as the hardware needs an execsize of 8 to process 4 DF values (one GRF in total). v2: - Rename helper and make it static inline function (Matt). - Fix indentation and add braces (Matt). v3: - Don't edit IR instruction when doubling exec_size (Curro) - Add comment into the code (Curro). - Manage ARF registers like the others (Curro) v4: - Add get_exec_type() function and use it to calculate the execution size. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> [ Francisco Jerez: Fix bogus 'type != BAD_FILE' check. Take destination type as execution type where there is no valid source. Assert-fail if the deduced execution type is byte. Clarify comment in get_lowered_simd_width(). Move SIMD width workaround outside of 'if (...inst->size_written > REG_SIZE)' conditional block, since the problem should be independent of whether the amount of data written by the instruction is greater or lower than a GRF. Drop redundant is_ivb_df definition. Drop bogus inst->exec_size < 8 check. Simplify channel group assertion. ] Reviewed-by: Francisco Jerez <[email protected]>
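The arithmetic behind the execsize doubling described in the commit above can be sketched as follows. This is an illustrative sketch only, not the code from the Mesa tree: the helper names `df_values_per_grf()` and `hw_exec_size()`, their parameters, and the `DF_SIZE` constant are assumptions made for this example. Only the 32-byte register size and the "execsize 8 for 4 DF values" relationship come from the log itself.

```c
#include <stdbool.h>

/* Hypothetical constants for illustration; the 32-byte GRF size is the
 * figure quoted in the log ("a full register (32 bytes)"). */
enum { REG_SIZE = 32, DF_SIZE = 8 };

/* Number of 64-bit (DF) values that fit in one GRF: 32 / 8 = 4. */
static unsigned
df_values_per_grf(void)
{
   return REG_SIZE / DF_SIZE;
}

/* Hypothetical helper: the execution size to encode in the hardware
 * instruction, given the execution size of the IR instruction.
 * Assumption taken from the log: on IVB/BYT a DF instruction needs twice
 * the exec size (execsize 8 to process 4 DF values), and the IR
 * instruction itself is left unchanged. */
static unsigned
hw_exec_size(unsigned ir_exec_size, unsigned exec_type_size,
             bool is_ivb_or_byt)
{
   if (is_ivb_or_byt && exec_type_size == DF_SIZE)
      return ir_exec_size * 2;
   return ir_exec_size;
}
```

Under these assumptions, a DF instruction that was split down to an IR exec size of 4 ends up encoded with an exec size of 8 on IVB/BYT, while non-DF instructions and other hardware generations keep their exec size.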
* i965/fs: lower all non-force_writemask_all DF instructions to SIMD4 on IVB/BYT Samuel Iglesias Gonsálvez 2017-04-14 1 -0/+9
| | | | | | | | | The hardware applies the same channel enable signals to both halves of the compressed instruction, which is wrong under non-uniform control flow. Fix this by splitting those instructions to SIMD4. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>