mesa.git - Unnamed repository; edit this file 'description' to name the repository.

	Commit message (Collapse)	Author	Age	Files	Lines
*	intel/blorp: Disable sampler state prefetching on Gen11	Kenneth Graunke	2019-06-25	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Sampler state prefetching is broken on Gen11, and WA_160668216 says to disable it. Apparently sampler state prefetching also has basically zero impact on performance, so we don't need to worry there. i965, anv, and iris already handle this correctly, but we missed BLORP. Ideally the kernel should globally disable this by writing SARCHKMD, at which point we wouldn't have to worry about it. But let's be defensive and handle it ourselves too. v2: separate out from BTP workaround in case we change that eventually Reviewed-by: Anuj Phogat <[email protected]> [v1]
*	anv/descriptor_set: Only write texture swizzles if we have an image view	Jason Ekstrand	2019-06-25	1	-1/+1
\| \| \| \| \| \| \| \| \|	When immutable samplers are set we call write_image_view with a NULL image view. This causes issues on IVB where we have to fake texture swizzling. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110999 Fixes: d2aa65eb18 "anv: Emulate texture swizzle in the shader when..."
*	intel/compiler: silence a warning of using different enum type	Tapani Pälli	2019-06-25	1	-1/+1
\| \| \| \| \|	Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
*	anv: Add HAL_PIXEL_FORMAT_IMPLEMENTATION_DEFINED in vk_format	Nataraj Deshpande	2019-06-24	3	-3/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When HAL_PIXEL_FORMAT_IMPLEMENTATION_DEFINED is used, then the platform gralloc module will select a format based on the usage flags provided by the camera device and the other endpoint of the stream. The patch fixes crash in vulkan when the test is run with camera stream set to HAL_PIXEL_FORMAT_IMPLEMENTATION_DEFINED. Test: android.graphics.cts.CameraVulkanGpuTest#testCameraImportAndRendering on chromebook with camera HAL3. v2: use AHARDWAREBUFFER_FORMAT_IMPLEMENTATION_DEFINED and take AHARDWAREBUFFER_USAGE_CAMERA_MASK in to account (Gurchetan) Fixes: f1654fa7e31 "anv/android: support creating images from external format" Signed-off-by: Nataraj Deshpande <[email protected]> Signed-off-by: Gurchetan Singh <[email protected]> Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Tapani Pälli <[email protected]> Reviewed-by: Gurchetan Singh <[email protected]> Acked-by: Lionel Landwerlin <[email protected]> Acked-by: Jason Ekstrand <[email protected]>
*	anv: Implement "pop-free" clipping	Jason Ekstrand	2019-06-21	2	-4/+86
\| \| \| \| \| \| \| \| \| \|	This is the preferred clipping mode since it doesn't mean your points disappear the moment part of the point crosses over the edge of the viewport and that lines have weird endpoints at viewport edges. We've just never bothered to hook it up until now. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
*	anv: Enable the guardband clip test	Jason Ekstrand	2019-06-21	2	-3/+21
\| \| \| \| \| \| \| \| \| \|	In workloads where there is a lot of geometry drawn that crosses over the edge of the viewport, this should substantially improve clipper performance. Not really sure why it's taken 3 years to turn it on but we never got around to it. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
*	i965,iris: Move guardband calculations to a common location	Jason Ekstrand	2019-06-21	3	-0/+119
\| \| \| \| \|	Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
*	iris: Implement INTEL_DEBUG=pc for pipe control logging.	Kenneth Graunke	2019-06-20	2	-0/+2
\| \| \| \| \| \| \| \|	This prints a log of every PIPE_CONTROL flush we emit, noting which bits were set, and also the reason for the flush. That way we can see which are caused by hardware workarounds, render-to-texture, buffer updates, and so on. It should make it easier to determine whether we're doing too many flushes and why.
*	anv: only resort to sync fds internally with no syncobj support	Lionel Landwerlin	2019-06-20	2	-8/+45
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We can rely on only one kind of synchronization object (drm-syncobj) when it is available. This reduces the number of file descriptors we use in our implementation. This will be required later for timeline semaphores implementation, at this point we won't ever want to use anything else but syncobjs. v2: Only use has_syncobj for semaphores (Jason) v3: Only has_syncobj in assert on semaphores in QueueSubmit (Jason) Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	isl: tag unreachable path as such	Eric Engestrom	2019-06-20	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	GCC should be able to figure out that all the possible enum values are exhausted in the switch() and all the branches return from the function, but apparently it doesn't, so let's tell the compiler explicitly. This gets rid of the following warnings in GCC 9: [1/24] Compiling C object 'src/intel/isl/60d23f8@@isl@sta/isl.c.o'. ../src/intel/isl/isl.c: In function ‘isl_surf_init_s’: ../src/intel/isl/isl.c:1569:10: warning: ‘array_pitch_el_rows’ may be used uninitialized in this function [-Wmaybe-uninitialized] 1569 \| surf = (struct isl_surf) { \| ~~~~~~^~~~~~~~~~~~~~~~~~~~~ 1570 \| .dim = info->dim, \| ~~~~~~~~~~~~~~~~~ 1571 \| .dim_layout = dim_layout, \| ~~~~~~~~~~~~~~~~~~~~~~~~~ 1572 \| .msaa_layout = msaa_layout, \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1573 \| .tiling = tiling, \| ~~~~~~~~~~~~~~~~~ 1574 \| .format = info->format, \| ~~~~~~~~~~~~~~~~~~~~~~~ 1575 \| \| 1576 \| .levels = info->levels, \| ~~~~~~~~~~~~~~~~~~~~~~~ 1577 \| .samples = info->samples, \| ~~~~~~~~~~~~~~~~~~~~~~~~~ 1578 \| \| 1579 \| .image_alignment_el = image_align_el, \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1580 \| .logical_level0_px = logical_level0_px, \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1581 \| .phys_level0_sa = phys_level0_sa, \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1582 \| \| 1583 \| .size_B = size_B, \| ~~~~~~~~~~~~~~~~~ 1584 \| .alignment_B = base_alignment_B, \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1585 \| .row_pitch_B = row_pitch_B, \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1586 \| .array_pitch_el_rows = array_pitch_el_rows, \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1587 \| .array_pitch_span = array_pitch_span, \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1588 \| \| 1589 \| .usage = info->usage, \| ~~~~~~~~~~~~~~~~~~~~~ 1590 \| }; \| ~ ../src/intel/isl/isl.c:1488:24: warning: ‘((void )&phys_total_el+4)’ may be used uninitialized in this function [-Wmaybe-uninitialized] 1488 \| struct isl_extent2d phys_total_el; \| ^~~~~~~~~~~~~ ../src/intel/isl/isl.c:1335:38: warning: ‘phys_total_el’ may be used uninitialized in this function [-Wmaybe-uninitialized] 1335 \| isl_align_div(phys_total_el->w tile_el_scale, \| ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~ ../src/intel/isl/isl.c:1488:24: note: ‘phys_total_el’ was declared here 1488 \| struct isl_extent2d phys_total_el; \| ^~~~~~~~~~~~~ Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	anv: Fix vulkan build in meson.	Bas Nieuwenhuizen	2019-06-19	1	-1/+7
\| \| \| \| \| \| \|	Apparently the android part was never ported to meson. CC: <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
*	anv/image: Set different usage flags for shadow surfaces	Jason Ekstrand	2019-06-19	1	-1/+6
\| \| \| \| \| \| \| \| \| \| \|	For the block BLOCK_TEXEL_VIEW_COMPATIBLE case, this didn't matter because the flags were already more-or-less what we wanted. However, for gen7 stencil shadow images, it still had ISL_SURF_USAGE_STENCIL_BIT so we were getting W-tiled which isn't what we want for the shadow. By passing just ISL_SURF_USAGE_TEXTURE_BIT (and CUBE if we care), we now get something that's actually texturable. Fixes: f3ea0cf828 "anv: Add stencil texturing support for gen7"
*	anv: Flush caches in anv_image_copy_to_shadow	Jason Ekstrand	2019-06-19	1	-0/+13
\| \| \| \| \| \| \| \| \|	Copies to a shadow image happen during a VkCmdPipelineBarrier or at subpass transitions. We could potentially be a bit more conservative but these transitions shouldn't happen often and it's better to have our bases covered. Fixes: f3ea0cf828 "anv: Add stencil texturing support for gen7"
*	anv: Fix wrong printf formatter	Kenneth Graunke	2019-06-19	1	-1/+1
\| \| \| \|	%lu is for unsigned long, %zu is for size_t. Just cast the data.
*	anv: write spirv-nir logs back to the application	Lionel Landwerlin	2019-06-19	1	-0/+35
\| \| \| \| \| \| \|	Using the existing VK_EXT_debug_report extension. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	anv: Make border colors the right size and alignment on HSW	Jason Ekstrand	2019-06-18	2	-12/+48
\| \| \| \|	Reviewed-by: Kenneth Graunke <[email protected]>
*	anv: Set STATE_BASE_ADDRESS upper bounds on gen7	Jason Ekstrand	2019-06-17	1	-0/+17
\| \| \| \| \| \| \| \| \|	This should fix floating-point border color on all gen7 HW. Integer is still thoroughly busted on gen7 because it doesn't exist on IVB and it's crazy on HSW. Cc: [email protected] Reviewed-by: Kenneth Graunke <[email protected]>
*	anv:Use VK_EXT_separate_stencil_usage to avoid stencil shadows on gen7	Jason Ekstrand	2019-06-17	4	-2/+16
\| \| \| \| \| \| \| \| \| \|	Whenever stencil texturing is not required (most of the time), we can use VK_EXT_separate_stencil_usage to only create the shadow image when VK_IMAGE_USAGE_SAMPLED_BIT is required for stencil. Of course, this depends on applications to use the extension but hopefully DXVK and similar translators are doing so and that covers most of the apps. Reviewed-by: Lionel Landwerlin <[email protected]>
*	anv: Add stencil texturing support for gen7	Jason Ekstrand	2019-06-17	3	-7/+96
\| \| \| \| \| \| \| \| \| \| \| \|	Intel hardware didn't get support for sampling from W-tiled (required for stencil) images until Broadwell so we can't directly sample from stencil. Instead, if we want to support stencil texturing on gen7 hardware, we have to keep a texture-capable shadow copy around and use BLORP to update when stencil changes. The one thing this commit does not implement is self-dependencies with stencil input attachments. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99493 Reviewed-by: Lionel Landwerlin <[email protected]>
*	anv/blorp: Update shadow images when clearing or uploading	Jason Ekstrand	2019-06-17	1	-11/+104
\| \| \| \|	Reviewed-by: Lionel Landwerlin <[email protected]>
*	anv/cmd_buffer: Add a stencil transition helper	Jason Ekstrand	2019-06-17	1	-35/+75
\| \| \| \|	Reviewed-by: Lionel Landwerlin <[email protected]>
*	anv/blorp: Take an aspect in anv_image_copy_to_shadow	Jason Ekstrand	2019-06-17	3	-3/+4
\| \| \| \|	Reviewed-by: Lionel Landwerlin <[email protected]>
*	anv/formats: Re-arrange the way se set some flag bits	Jason Ekstrand	2019-06-17	1	-6/+5
\| \| \| \|	Reviewed-by: Lionel Landwerlin <[email protected]>
*	u_dynarray: turn util_dynarray_{grow, resize} into element-oriented macros	Nicolai Hähnle	2019-06-12	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The main motivation for this change is API ergonomics: most operations on dynarrays are really on elements, not on bytes, so it's weird to have grow and resize as the odd operations out. The secondary motivation is memory safety. Users of the old byte-oriented functions would often multiply a number of elements with the element size, which could overflow, and checking for overflow is tedious. With this change, we only need to implement the overflow checks once. The checks are cheap: since eltsize is a compile-time constant and the functions should be inlined, they only add a single comparison and an unlikely branch. v2: - ensure operations are no-op when allocation fails - in util_dynarray_clone, call resize_bytes with a compile-time constant element size v3: - fix iris, lima, panfrost Reviewed-by: Marek Olšák <[email protected]>
*	anv: do not parse genxml data without INTEL_DEBUG=bat	Lionel Landwerlin	2019-06-12	1	-10/+13
\| \| \| \| \| \| \| \|	This significantly slows down the CTS runs. Signed-off-by: Lionel Landwerlin <[email protected]> Fixes: 32ffd90002b04b ("anv: add support for INTEL_DEBUG=bat") Reviewed-by: Jordan Justen <[email protected]>
*	intel/dump: fix segfault when the app hasn't accessed the device	Lionel Landwerlin	2019-06-12	1	-3/+5
\| \| \| \| \|	Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	anv/cmd_buffer: Reuse gen8 Cmd{Set, Reset}Event on gen7	Ville Syrjälä	2019-06-11	3	-140/+107
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Modern DXVK requires event support [1], but looks like it only uses vkCmdSetEvent() + vkGetEventStatus(). So we can just borrow the relevant code from gen8, leaving CmdWaitEvents still unimplemented. [1] https://github.com/doitsujin/dxvk/commit/8c3900c533d83d12c970b905183d17a1d3e8df1f v2: Also move CmdWaitEvents into genX_cmd_buffer.c (Jason) Signed-off-by: Ville Syrjälä <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	intel/fs: Mark source 0 of bcsel as needing Boolean resolve	Ian Romanick	2019-06-11	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \|	The other sources of the bcsel behave like the sources of an and or other logical operation. However, source zero behaves differently. It is evaluated as a Boolean, so it needs to be resolved. No shader-db changes, but the tests mentioned in the bug get a couple instructions added back. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110857 Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
*	anv: ignore inline uniform blocks in anv_CmdPushDescriptorSetKHR()	Samuel Iglesias Gonsálvez	2019-06-11	1	-13/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	According to the Vulkan spec, inline uniform blocks are not allowed to be updated through vkCmdPushDescriptorSetKHR(). These are the spec quotes from "13.2.1. Descriptor Set Layout" that are relevant for this case: "VK_DESCRIPTOR_SET_LAYOUT_CREATE_PUSH_DESCRIPTOR_BIT_KHR specifies that descriptor sets must not be allocated using this layout, and descriptors are instead pushed by vkCmdPushDescriptorSetKHR." "If flags contains VK_DESCRIPTOR_SET_LAYOUT_CREATE_PUSH_DESCRIPTOR_BIT_KHR, then all elements of pBindings must not have a descriptorType of VK_DESCRIPTOR_TYPE_INLINE_UNIFORM_BLOCK_EXT". There is no explicit mention in vkCmdPushDescriptorSetKHR() to forbid this case but it is implied in the creation of the descriptor set layout as aforementioned. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	intel/gpu_dump: fix argument passing	Lionel Landwerlin	2019-06-09	2	-3/+3
\| \| \| \| \| \| \| \| \| \| \|	We were dropping "/' around arguments grouped together. This was triggering failures with : $ ./framemetrics -g "Memory Writes Distribution Gen9" -o /tmp/output.csv -f ./my.trace 10 11 Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
*	intel/blorp: Only double the fast-clear rect alignment on HSW	Jason Ekstrand	2019-06-07	1	-10/+15
\| \| \| \| \| \| \| \| \|	This restriction was accidentally added to the BSpec/PRM as an unrestricted restriction starting with the HSW docs and it was never removed. However, it only ever applied to HSW and actually potentially causes problems on BDW and above where we have mipmapped fast-clears. Reviewed-by: Nanley Chery <[email protected]>
*	anv/cmd_buffer: Initalize the clear color struct for CNL+	Nanley Chery	2019-06-07	1	-13/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	On CNL+, the clear color struct is composed of RGBA channel values and fields which are either reserved by the HW or used to control fast-clears. Currently anv initializes the channel values to zero and allows the other fields to be undefined. Satisfy the MBZ field requirements by removing an optimization that doesn't hold true for CNL+ and pulling in the number of dwords to initialize from ISL. Cc: <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
*	isl: Mark enum isl_channel_select packed so it becomes 1 byte.	Kenneth Graunke	2019-06-07	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I recently discovered that the following code lead to valgrind errors: struct isl_swizzle swizzle = ISL_SWIZZLE_IDENTITY; VALGRIND_CHECK_MEM_IS_DEFINED(&swizzle, sizeof(swizzle)); which is surprising, because struct isl_swizzle is simply: struct isl_swizzle { enum isl_channel_select r:4; enum isl_channel_select g:4; enum isl_channel_select b:4; enum isl_channel_select a:4; }; and the above code initializes all of them with a C99 initializer. Iván Briano reminded me that C99 initializers don't necessarily zero padding. A quick inspection revealed that sizeof(struct isl_swizzle) was 4 (rather than the expected 2). Ian Romanick suggested changing it to uint16_t, since this is essentially dicing up an unsigned, and that worked. This patch marks enum isl_channel_select packed, changing its size from 4 bytes to 1 byte. This then makes struct isl_swizzle 2 bytes, with no bogus padding fields. This eliminates valgrind undefined memory warnings. These isl_swizzle values become part of our BLORP blit program keys, which are then hashed. This undefined padding was being included in the hashing, possibly leading to issues. I originally saw this error when running KHR-GL45.texture_size_promotion.functional in iris under valgrind. Reviewed-by: Jason Ekstrand <[email protected]>
*	anv: allow NV12 <--> AHARDWAREBUFFER_FORMAT_Y8Cb8Cr8_420 inter-op	Gurchetan Singh	2019-06-06	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	AHARDWAREBUFFER_FORMAT_Y8Cb8Cr8_420 is an implementation defined flexible YUV format. Most of the times, it's NV12 or YV12. On Intel, NV12 is preferred since it can be used by the display engine. This API adds a dependency between gralloc and buffer consumers, unfortunately. Right now, the code seems to work for i915 gralloc, but not cros_gralloc. Add a preprocessor flag to fix this. TEST=android.graphics.cts.MediaVulkanGpuTest#testMediaImportAndRendering Reviewed-by: Tapani Pälli <[email protected]>
*	anv: Fix check for isl_fmt in assert	Nataraj Deshpande	2019-06-06	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	Checking isl_fmt returned value in assert seems appropriate instead of format variable. Fixes: f1654fa7e31 "anv/android: support creating images from external format" Signed-off-by: Nataraj Deshpande <[email protected]> Reviewed-by: Tapani Pälli <[email protected]> Reviewed-by: Sagar Ghuge <[email protected]>
*	intel/compiler: Treat b32csel as potentially producing a Boolean result for ↵	Ian Romanick	2019-06-05	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	resolve analysis If the 2nd and 3rd source are both Boolean values, we can potentially avoid a resolve by only resolving the result of the b32csel. No changes on any Gen6+ Intel platform. v2: Use ?: instead of cast from bool to unsigned. Suggested by Caio. Iron Lake total instructions in shared programs: 8142729 -> 8142677 (<.01%) instructions in affected programs: 12890 -> 12838 (-0.40%) helped: 26 HURT: 0 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.25% max: 0.74% x̄: 0.45% x̃: 0.38% 95% mean confidence interval for instructions value: -2.00 -2.00 95% mean confidence interval for instructions %-change: -0.52% -0.39% Instructions are helped. total cycles in shared programs: 188549632 -> 188549394 (<.01%) cycles in affected programs: 60754 -> 60516 (-0.39%) helped: 25 HURT: 1 helped stats (abs) min: 2 max: 26 x̄: 9.92 x̃: 8 helped stats (rel) min: 0.07% max: 2.23% x̄: 0.59% x̃: 0.27% HURT stats (abs) min: 10 max: 10 x̄: 10.00 x̃: 10 HURT stats (rel) min: 0.70% max: 0.70% x̄: 0.70% x̃: 0.70% 95% mean confidence interval for cycles value: -12.91 -5.40 95% mean confidence interval for cycles %-change: -0.84% -0.23% Cycles are helped. GM45 total instructions in shared programs: 5013119 -> 5013093 (<.01%) instructions in affected programs: 6764 -> 6738 (-0.38%) helped: 13 HURT: 0 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.24% max: 0.68% x̄: 0.43% x̃: 0.36% 95% mean confidence interval for instructions value: -2.00 -2.00 95% mean confidence interval for instructions %-change: -0.52% -0.34% Instructions are helped. total cycles in shared programs: 128977804 -> 128977700 (<.01%) cycles in affected programs: 37738 -> 37634 (-0.28%) helped: 13 HURT: 0 helped stats (abs) min: 8 max: 8 x̄: 8.00 x̃: 8 helped stats (rel) min: 0.18% max: 0.46% x̄: 0.30% x̃: 0.26% 95% mean confidence interval for cycles value: -8.00 -8.00 95% mean confidence interval for cycles %-change: -0.36% -0.24% Cycles are helped. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Reviewed-by: Matt Turner <[email protected]>
*	intel/fs: Improve discard_if code generation	Ian Romanick	2019-06-05	1	-3/+43
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously we would blindly emit an sequence like: mov(1) f0.1<1>UW g1.14<0,1,0>UW ... cmp.l.f0(16) g7<1>F g5<8,8,1>F 0x41700000F /* 15F / (+f0.1) cmp.z.f0.1(16) null<1>D g7<8,8,1>D 0D The first move sets the flags based on the initial execution mask. Later discard sequences contain a predicated compare that can only remove more SIMD channels. Often times the only user of the result from the first compare is the second compare. Instead, generate a sequence like mov(1) f0.1<1>UW g1.14<0,1,0>UW ... cmp.l.f0(16) g7<1>F g5<8,8,1>F 0x41700000F / 15F / (+f0.1) cmp.ge.f0.1(8) null<1>F g5<8,8,1>F 0x41700000F / 15F */ If the results stored in g7 and f0.0 are not used, the comparison will be eliminated. This removes an instruction and potentially reduces register pressure. v2: Major re-write of the commit message (including fixing the assembly code). Suggested by Matt. All Gen8+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 17224434 -> 17198659 (-0.15%) instructions in affected programs: 2908125 -> 2882350 (-0.89%) helped: 18891 HURT: 5 helped stats (abs) min: 1 max: 12 x̄: 1.38 x̃: 1 helped stats (rel) min: 0.03% max: 25.00% x̄: 1.76% x̃: 1.02% HURT stats (abs) min: 9 max: 105 x̄: 51.40 x̃: 35 HURT stats (rel) min: 0.43% max: 4.92% x̄: 2.34% x̃: 1.56% 95% mean confidence interval for instructions value: -1.39 -1.34 95% mean confidence interval for instructions %-change: -1.79% -1.73% Instructions are helped. total cycles in shared programs: 361468458 -> 361170679 (-0.08%) cycles in affected programs: 38470116 -> 38172337 (-0.77%) helped: 16202 HURT: 1456 helped stats (abs) min: 1 max: 4473 x̄: 26.24 x̃: 18 helped stats (rel) min: <.01% max: 28.44% x̄: 2.90% x̃: 2.18% HURT stats (abs) min: 1 max: 5982 x̄: 87.51 x̃: 28 HURT stats (rel) min: <.01% max: 51.29% x̄: 5.48% x̃: 1.64% 95% mean confidence interval for cycles value: -18.24 -15.49 95% mean confidence interval for cycles %-change: -2.26% -2.14% Cycles are helped. total spills in shared programs: 12147 -> 12176 (0.24%) spills in affected programs: 175 -> 204 (16.57%) helped: 8 HURT: 5 total fills in shared programs: 25262 -> 25292 (0.12%) fills in affected programs: 269 -> 299 (11.15%) helped: 8 HURT: 5 Haswell total instructions in shared programs: 13530316 -> 13502647 (-0.20%) instructions in affected programs: 2507824 -> 2480155 (-1.10%) helped: 18859 HURT: 10 helped stats (abs) min: 1 max: 12 x̄: 1.48 x̃: 1 helped stats (rel) min: 0.03% max: 27.78% x̄: 2.38% x̃: 1.41% HURT stats (abs) min: 5 max: 39 x̄: 25.70 x̃: 31 HURT stats (rel) min: 0.22% max: 1.66% x̄: 1.09% x̃: 1.31% 95% mean confidence interval for instructions value: -1.49 -1.44 95% mean confidence interval for instructions %-change: -2.42% -2.34% Instructions are helped. total cycles in shared programs: 377865412 -> 377639034 (-0.06%) cycles in affected programs: 40169572 -> 39943194 (-0.56%) helped: 15550 HURT: 1938 helped stats (abs) min: 1 max: 2482 x̄: 25.67 x̃: 18 helped stats (rel) min: <.01% max: 37.77% x̄: 3.00% x̃: 2.25% HURT stats (abs) min: 1 max: 4862 x̄: 89.17 x̃: 35 HURT stats (rel) min: <.01% max: 67.67% x̄: 6.16% x̃: 2.75% 95% mean confidence interval for cycles value: -14.42 -11.47 95% mean confidence interval for cycles %-change: -2.05% -1.91% Cycles are helped. total spills in shared programs: 26769 -> 26814 (0.17%) spills in affected programs: 826 -> 871 (5.45%) helped: 9 HURT: 10 total fills in shared programs: 38383 -> 38425 (0.11%) fills in affected programs: 834 -> 876 (5.04%) helped: 9 HURT: 10 LOST: 5 GAINED: 10 Ivy Bridge total instructions in shared programs: 12079250 -> 12044139 (-0.29%) instructions in affected programs: 2409680 -> 2374569 (-1.46%) helped: 16135 HURT: 0 helped stats (abs) min: 1 max: 23 x̄: 2.18 x̃: 2 helped stats (rel) min: 0.07% max: 37.50% x̄: 2.72% x̃: 1.68% 95% mean confidence interval for instructions value: -2.21 -2.14 95% mean confidence interval for instructions %-change: -2.76% -2.67% Instructions are helped. total cycles in shared programs: 180116747 -> 179900405 (-0.12%) cycles in affected programs: 25439823 -> 25223481 (-0.85%) helped: 13817 HURT: 1499 helped stats (abs) min: 1 max: 1886 x̄: 26.40 x̃: 18 helped stats (rel) min: <.01% max: 38.84% x̄: 2.57% x̃: 1.97% HURT stats (abs) min: 1 max: 3684 x̄: 98.99 x̃: 52 HURT stats (rel) min: <.01% max: 97.01% x̄: 6.37% x̃: 3.42% 95% mean confidence interval for cycles value: -15.68 -12.57 95% mean confidence interval for cycles %-change: -1.77% -1.63% Cycles are helped. LOST: 8 GAINED: 10 Sandy Bridge total instructions in shared programs: 10878990 -> 10863659 (-0.14%) instructions in affected programs: 1806702 -> 1791371 (-0.85%) helped: 13023 HURT: 0 helped stats (abs) min: 1 max: 5 x̄: 1.18 x̃: 1 helped stats (rel) min: 0.07% max: 13.79% x̄: 1.65% x̃: 1.10% 95% mean confidence interval for instructions value: -1.18 -1.17 95% mean confidence interval for instructions %-change: -1.68% -1.62% Instructions are helped. total cycles in shared programs: 154082878 -> 153862810 (-0.14%) cycles in affected programs: 20199374 -> 19979306 (-1.09%) helped: 12048 HURT: 510 helped stats (abs) min: 1 max: 323 x̄: 20.57 x̃: 18 helped stats (rel) min: 0.03% max: 17.78% x̄: 2.05% x̃: 1.52% HURT stats (abs) min: 1 max: 448 x̄: 54.39 x̃: 16 HURT stats (rel) min: 0.02% max: 37.98% x̄: 4.13% x̃: 1.17% 95% mean confidence interval for cycles value: -17.97 -17.08 95% mean confidence interval for cycles %-change: -1.84% -1.75% Cycles are helped. LOST: 1 GAINED: 0 Iron Lake total instructions in shared programs: 8155075 -> 8142729 (-0.15%) instructions in affected programs: 949495 -> 937149 (-1.30%) helped: 5810 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 2.12 x̃: 2 helped stats (rel) min: 0.10% max: 16.67% x̄: 2.53% x̃: 1.85% 95% mean confidence interval for instructions value: -2.14 -2.11 95% mean confidence interval for instructions %-change: -2.59% -2.48% Instructions are helped. total cycles in shared programs: 188584610 -> 188549632 (-0.02%) cycles in affected programs: 17274446 -> 17239468 (-0.20%) helped: 3881 HURT: 90 helped stats (abs) min: 2 max: 168 x̄: 9.08 x̃: 6 helped stats (rel) min: <.01% max: 23.53% x̄: 0.83% x̃: 0.30% HURT stats (abs) min: 2 max: 10 x̄: 2.80 x̃: 2 HURT stats (rel) min: <.01% max: 0.60% x̄: 0.10% x̃: 0.07% 95% mean confidence interval for cycles value: -9.35 -8.27 95% mean confidence interval for cycles %-change: -0.85% -0.77% Cycles are helped. GM45 total instructions in shared programs: 5019308 -> 5013119 (-0.12%) instructions in affected programs: 489028 -> 482839 (-1.27%) helped: 2912 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 2.13 x̃: 2 helped stats (rel) min: 0.10% max: 16.67% x̄: 2.46% x̃: 1.81% 95% mean confidence interval for instructions value: -2.14 -2.11 95% mean confidence interval for instructions %-change: -2.54% -2.39% Instructions are helped. total cycles in shared programs: 129002592 -> 128977804 (-0.02%) cycles in affected programs: 12669152 -> 12644364 (-0.20%) helped: 2759 HURT: 37 helped stats (abs) min: 2 max: 168 x̄: 9.03 x̃: 4 helped stats (rel) min: <.01% max: 21.43% x̄: 0.75% x̃: 0.31% HURT stats (abs) min: 2 max: 10 x̄: 3.62 x̃: 4 HURT stats (rel) min: <.01% max: 0.41% x̄: 0.10% x̃: 0.04% 95% mean confidence interval for cycles value: -9.53 -8.20 95% mean confidence interval for cycles %-change: -0.79% -0.70% Cycles are helped. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Reviewed-by: Matt Turner <[email protected]>
*	intel/fs: Add need_dest parameter to fs_visitor::nir_emit_alu	Ian Romanick	2019-06-05	2	-4/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is the same as the need_dest parameter to prepare_alu_destination_and_sources. This allows us to not change the register that is expected to hold an result if an instruction is re-emitted. This is particularly a problem if the re-emitted instruction is a partial write. A later patch will use this feature. No shader-db changes on any Intel platform. v2: Don't do the Boolean resolve when there is no destination. If the ALU instruction didn't write a register, there's nothing to resolve. This replaces an earlier patch "intel/fs: Allocate dummy destination register when need_dest is false". Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Reviewed-by: Matt Turner <[email protected]>
*	intel/fs: Allow cmod propagation across reads and writes of different flags	Ian Romanick	2019-06-05	2	-6/+272
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This also helps a later patch (intel/fs: Improve discard_if code generation) on about 200 shaders. v2: Document that other instruction sequences are also valid in subtract_merge_with_compare_intervening_mismatch_flag_write. Suggested by Caio. All Intel platforms had similar results. (Ice Lake shown) total instructions in shared programs: 17224438 -> 17224434 (<.01%) instructions in affected programs: 296 -> 292 (-1.35%) helped: 4 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.99% max: 1.92% x̄: 1.43% x̃: 1.40% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -2.04% -0.81% Instructions are helped. total cycles in shared programs: 361468455 -> 361468458 (<.01%) cycles in affected programs: 2862 -> 2865 (0.10%) helped: 2 HURT: 2 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.24% max: 0.39% x̄: 0.31% x̃: 0.31% HURT stats (abs) min: 3 max: 4 x̄: 3.50 x̃: 3 HURT stats (rel) min: 0.32% max: 0.70% x̄: 0.51% x̃: 0.51% 95% mean confidence interval for cycles value: -4.34 5.84 95% mean confidence interval for cycles %-change: -0.70% 0.90% Inconclusive result (value mean confidence interval includes 0). Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Reviewed-by: Matt Turner <[email protected]>
*	intel/fs: Fix flag_subreg handling in cmod propagation	Ian Romanick	2019-06-05	2	-0/+196
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There were two errors. First, the pass could propagate conditional modifiers from an instruction that writes on flag register to an instruction that writes a different flag register. For example, cmp.nz.f0.0(16) null:F, vgrf6:F, vgrf5:F cmp.nz.f0.1(16) null:F, vgrf6:F, vgrf5:F could be come cmp.nz.f0.0(16) null:F, vgrf6:F, vgrf5:F Second, if an instruction writes f0.1 has it's condition propagated, the modified instruction will incorrectly write flag f0.0. For example, linterp(16) vgrf6:F, g2:F, attr0:F cmp.z.f0.1(16) null:F, vgrf6:F, vgrf5:F (-f0.1) discard_jump(16) (null):UD could become linterp.z.f0.0(16) vgrf6:F, g2:F, attr0:F (-f0.1) discard_jump(16) (null):UD None of these cases will occur currently. The only time we use f0.1 is for generating discard intrinsics. In all those cases, we generate a squence like: cmp.nz.f0.0(16) vgrf7:F, vgrf6:F, vgrf5:F (+f0.1) cmp.z(16) null:D, vgrf7:D, 0d (-f0.1) discard_jump(16) (null):UD Due to the mixed types and incompatible conditions, this sequence would never see any cmod propagation. The next patch will change this. No shader-db changes on any Intel platform. v2: Fix typo in comment in test case subtract_delete_compare_other_flag. Noticed by Caio. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Reviewed-by: Matt Turner <[email protected]>
*	intel/fs: Add missing tests for cmod_propagate_not	Ian Romanick	2019-06-05	1	-0/+278
\| \| \| \| \| \| \| \|	Tests like this should have been added in 4467040cb65 ("i965/fs: Propagate conditional modifiers from not instructions"). Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Reviewed-by: Matt Turner <[email protected]>
*	intel/blorp: Handle SINT/UINT clamping on blits.	Kenneth Graunke	2019-06-05	2	-0/+38
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch makes blorp_blit handle SINT<->UINT blit value clamping. After reading the source's integer data (which is expanded to 32-bit), we either IMAX with 0 (for SINT -> UINT, to clamp negative numbers) or UMIN with (1 << 31) - 1 (for UINT -> SINT, to clamp positive numbers outside of the representable range). Such blits are not allowed by the OpenGL or Vulkan APIs directly: The Vulkan 1.1 spec for vkCmdBlitImage says: "Integer formats can only be converted to other integer formats with the same signedness." The GL 4.5 spec for glBlitFramebuffer says: "An INVALID_OPERATION error is generated if format conversions are not supported, which occurs under any of the following conditions: [...] * The read buffer contains unsigned integer values and any draw buffer does not contain unsigned integer values. * The read buffer contains signed integer values and any draw buffer does not contain signed integer values." However, they are useful for other operations, such as texture upload and download, which typically are implemented via blorp_blit(). i965 has code to fall back in this case (which the next commit will delete), and Gallium expects blit() to handle this case for texture upload. Fixes the following tests on iris: - GTF-GL46.gtf32.GL3Tests.packed_pixels.packed_pixels - GTF-GL46.gtf32.GL3Tests.packed_pixels.packed_pixels_pbo - GTF-GL46.gtf32.GL3Tests.packed_pixels.packed_pixels_pixelstore Reviewed-by: Jason Ekstrand <[email protected]>
*	anv/pipeline: Move lowering of nir_var_mem_global later	Caio Marcelo de Oliveira Filho	2019-06-05	1	-3/+3
\| \| \| \| \| \|	This let deref optimizations apply to globals before lowering them. Reviewed-by: Jason Ekstrand <[email protected]>
*	nir: Combine lower_fmod16/32 back into a single lower_fmod.	Kenneth Graunke	2019-06-05	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	We originally had a single lower_fmod option. In commit 2ab2d2e5, Sam split 32 and 64-bit lowering into separate flags, with the rationale that some drivers might want different options there. This left 16-bit unhandled, so Iago added a lower_fmod16 option in commit ca31df6f. Now that lower_fmod64 is gone (in favor of nir_lower_doubles and nir_lower_dmod), we re-combine lower_fmod16 and lower_fmod32 into a single lower_fmod flag again. I'm not aware of any hardware which need lowering for one bitsize and not the other. Reviewed-by: Marek Olšák <[email protected]>
*	nir: Drop lower_fmod64 option.	Kenneth Graunke	2019-06-05	1	-1/+0
\| \| \| \| \| \| \| \|	nir_lower_doubles offers a wide variety of fp64 lowering, including lowering fmod@64. The version there also better handles imprecisions due to lowered frcp@64. Let's consolidate on one version. Reviewed-by: Marek Olšák <[email protected]>
*	intel/perf: improve dynamic loading config detection	Lionel Landwerlin	2019-06-05	1	-15/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We're currently trying to detect dynamic loading config support by trying to remove to test config (hard coded in the i915 driver) and checking we get ENOENT. This can fail if the test config was updated in Mesa but not yet in i915. A better way to do this is to pick an invalid ID and check for ENOENT. Signed-off-by: Lionel Landwerlin <[email protected]> Cc: <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	intel/nir: Take nir_shader*s in brw_nir_link_shaders	Jason Ekstrand	2019-06-05	3	-39/+37
\| \| \| \| \| \| \|	Since NIR_PASS no longer swaps out the NIR pointer when NIR_TEST_* is enabled, we can just take a single pointer and not a pointer to pointer. Reviewed-by: Kenneth Graunke <[email protected]>
*	intel/nir: Stop returning the shader from helpers	Jason Ekstrand	2019-06-05	9	-51/+45
\| \| \| \| \| \| \| \|	Now that NIR_TEST_* doesn't swap the shader out from under us, it's sufficient to just modify the shader rather than having to return in case we're testing serialization or cloning. Reviewed-by: Kenneth Graunke <[email protected]>
*	intel/perf: fix EuThreadsCount value in performance equations	Lionel Landwerlin	2019-06-05	1	-2/+1
\| \| \| \| \| \| \| \| \|	EuThreadsCount is supposed to be the number of threads per EU, not the total number of threads in the whole device. Signed-off-by: Lionel Landwerlin <[email protected]> Fixes: 1fc7b951278428 ("i965: Add Gen8+ INTEL_performance_query support") Reviewed-by: Kenneth Graunke <[email protected]>
*	intel/tools: use C99 print conversion specifier for 32 bit builds	Mark Janes	2019-06-05	3	-4/+4
\| \| \| \| \| \| \| \| \| \|	Fixes formatting errors for 32 bit compilations, eg: error: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 5 has type ‘uint64_t’ {aka ‘long long unsigned int’} [-Werror=format=] Reviewed-by: Lionel Landwerlin <[email protected]>