aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* anv: support VkExternalFormatANDROID in vkCreateSamplerYcbcrConversionTapani Pälli2018-12-191-4/+26
| | | | | | | | | | | | If external format is used, we store the external format identifier in conversion to be used later when creating VkImageView. v2: rebase to b43f955037c changes v3: added assert, ignore components when creating external format conversion (Lionel) Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv/android: support creating images from external formatTapani Pälli2018-12-196-3/+172
| | | | | | | | | | | | | | | | | | Since we don't know the exact format at creation time, some initialization is done only when bound with memory in vkBindImageMemory. v2: demand dedicated allocation in vkGetImageMemoryRequirements2 if image has external format v3: refactor prepare_ahw_image, support vkBindImageMemory2, calculate stride correctly for rgb(x) surfaces, rename as 'resolve_ahw_image' v4: rebase to b43f955037c changes v5: add some assertions to verify input correctness (Lionel) Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv/android: add ahardwarebuffer external memory propertiesTapani Pälli2018-12-191-0/+41
| | | | | | | | | | | | | v2: have separate memory properties for android, set usage flags for buffers correctly v3: code cleanup (Jason) + limit maxArrayLayers to 1 for AHardwareBuffer based images v4: rebase to b43f955037c changes Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv/android: support import/export of AHardwareBuffer objectsTapani Pälli2018-12-195-2/+201
| | | | | | | | | | | | | v2: add support for non-image buffers (AHARDWAREBUFFER_FORMAT_BLOB) v3: properly handle usage bits when creating from image v4: refactor, code cleanup (Jason) v5: rebase to b43f955037c changes, initialize bo flags as ANV_BO_EXTERNAL (Lionel) v6: add assert that anv_bo_cache_import succeeds, add comment about multi-bo support to clarify current implementation (Lionel) Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: refactor, remove else block in AllocateMemoryTapani Pälli2018-12-191-30/+34
| | | | | | | | This makes it cleaner to introduce more cases where we import memory from different types of external memory buffers. Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: add anv_ahw_usage_from_vk_usage helper functionTapani Pälli2018-12-193-0/+40
| | | | | | | v2: rebase to b43f955037c changes Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv/android: add GetAndroidHardwareBufferPropertiesANDROIDTapani Pälli2018-12-191-0/+119
| | | | | | | | | | | | | | | | Use the anv_format address in formats table as implementation-defined external format identifier for now. When adding YUV format support this might need to change. v2: code cleanup (Jason) v3: set anv_format address as identifier v4: setup suggestedYcbcrModel and suggested[X|Y]ChromaOffset as expected for HAL_PIXEL_FORMAT_NV12_Y_TILED_INTEL v5: set linear tiling for GPU_DATA_BUFFER usage, add comment about multi-bo support to clarify current implementation (Lionel) Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: add from/to helpers with android and vulkan formatsTapani Pälli2018-12-191-0/+50
| | | | | | | | | v2: handle R8G8B8X8 as R8G8B8_UNORM (Jason) v3: add HAL_PIXEL_FORMAT_NV12_Y_TILED_INTEL, we make it define for now to avoid direct dependency to minigbm headers Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: make anv_get_image_format_features publicTapani Pälli2018-12-192-11/+16
| | | | | | | This will be utilized later by GetAndroidHardwareBufferPropertiesANDROID. Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: refactor make_surface to use data from anv_imageTapani Pälli2018-12-192-37/+46
| | | | | Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: add create_flags as part of anv_imageTapani Pälli2018-12-192-0/+2
| | | | | | | | This will make it possible for next patch to rip anv_image_create_info out from make_surface function. Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* nir/algebraic: Don't put quotes around floating point literalsIan Romanick2018-12-182-5/+13
| | | | | | | | | | | | | | | | | | The quotation marks around 1.0 cause it to be treated as a string instead of a floating point value. The generator then treats it as an arbitrary variable replacement, so any iand involving a ('ineg', ('b2i', a)) matches. v2: Remove misleading comment about sized literals (suggested by Timothy). Add assertion that the name of a varible is entierly alphabetic (suggested by Jason). Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Tested-by: Timothy Arceri <[email protected]> [v1] Reviewed-by: Timothy Arceri <[email protected]> [v1] Fixes: 6bcd2af0863 ("nir/algebraic: Add some optimizations for D3D-style Booleans") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109075
* meson: Fix libsensors detection.Vinson Lee2018-12-181-1/+1
| | | | | | Fixes: 5e71efef44b9 ("meson: Add lmsensors support") Signed-off-by: Vinson Lee <[email protected]> Reviewed-by: Dylan Baker <[email protected]>
* meson: Fix typo.Vinson Lee2018-12-181-1/+1
| | | | | | Fixes: 6b4c7047d571 ("meson: build gallium nine state_tracker") Signed-off-by: Vinson Lee <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* nir: Add a new lowering option to lower 3D surfaces from txd to txl.Sagar Ghuge2018-12-182-1/+8
| | | | | | | | | Tested on gen9. v2: Rename lower_txd_3d_surafaces flag to lower_txd_3d (Jason Ekstrand) Signed-off-by: Sagar Ghuge <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* meson: add etnaviv to the tools optionChristian Gmeiner2018-12-183-3/+4
| | | | | Signed-off-by: Christian Gmeiner <[email protected]> Reviewed-by: Dylan Baker <[email protected]>
* specs: Bump GLX_MESA_query_renderer to version 9Adam Jackson2018-12-181-3/+9
| | | | | | | | Note that we have an official GL extension number, pick the appropriate section of the GLX spec to modify, and add changelog. Reviewed-by: Emil Velikov <[email protected]> Signed-off-by: Adam Jackson <[email protected]>
* specs: Remove GLX_RENDERER_ID_MESA from GLX_MESA_query_rendererAdam Jackson2018-12-181-19/+0
| | | | | | | | | | | | | | | | | | | | | | This has not even had an attempt at implementation. If you asked for renderer 0 - which, the spec implies, should always work - then dri2_convert_glx_attribs would fail, we'd silently fall back to creating an indirect context, and xserver would also not recognize the attribute and would throw BadValue at you. The API would be difficult to use in any case, since there's no way to enumerate how many renderers the screen has. I'd be tempted to add that by defining: glXQueryRendererIntegerMESA(dpy, screen, /* renderer = */ -1, 0, &value); to return the number of renderers, but a new entrypoint might be cleaner. Still, better to not specify it at all than to lie about it. Reviewed-by: Emil Velikov <[email protected]> Signed-off-by: Adam Jackson <[email protected]>
* specs: Remove GLES profile interaction text from GLX_MESA_query_rendererAdam Jackson2018-12-181-12/+0
| | | | | | | | | | | | | | | | In one place we say, if GLES isn't supported then the profile version will be 0.0. Then later we say, if the GLES profile extension isn't supported then GLX_RENDERER_OPENGL_ES_PROFILE_VERSION_MESA is not mentioned in the spec. A strict reading of the latter would mean that GLX_RENDERER_OPENGL_ES_PROFILE_VERSION_MESA is not a recognized token, and the query should instead return False. The implementation does not check for the GLES profile extensions, and the additional complexity doesn't seem worth it. Removing the interaction text makes the spec match the implementation. Reviewed-by: Emil Velikov <[email protected]> Signed-off-by: Adam Jackson <[email protected]>
* freedreno/ir3: Make imageStore use num components from image formatEduardo Lima Mitev2018-12-181-2/+70
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | emit_intrinsic_store_image() is always using 4 components when collecting registers for the value. When image has less than 4 components (e.g, r32f, rg32i, etc) this results in extra mov instructions. This patch uses the actual number of components from the image format. For example, in a shader like: layout (r32f, binding=0) writeonly uniform imageBuffer u_image; ... void main(void) { ... imageStore (u_image, some_offset, vec4(1.0)); ... } instruction count is reduced in at least 3 instructions (note image format is r32f, 1 component only). This obviously reduces register pressure as well. v2: - Added support for image formats from NV_image_format extension (Ilia Mirkin). - Return 4 components by default instead of asserting. (Rob Clark). v3: Added more missing formats (Ilia Mirkin). v4: Added a debug message for unknown image formats (Rob Clark). Reviewed-by: Ilia Mirkin <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* nir/dead_write_vars: Get modes directly from derefsJason Ekstrand2018-12-181-2/+1
| | | | | | | | | Instead of going all the way back to the variable, just look at the deref. The modes are guaranteed to be the same by nir_validate whenever the variable can be found. This fixes clear_unused_for_modes for derefs that don't have an accessible variable. Reviewed-by: Timothy Arceri <[email protected]>
* nir/copy_prop_vars: Get modes directly from derefsJason Ekstrand2018-12-181-6/+2
| | | | | | | | | Instead of going all the way back to the variable, just look at the deref. The modes are guaranteed to be the same by nir_validate whenever the variable can be found. This fixes apply_barrier_for_modes for derefs that don't have an accessible variable. Reviewed-by: Timothy Arceri <[email protected]>
* nir/lower_wpos_center: Look at derefs for modesJason Ekstrand2018-12-181-2/+4
| | | | | | | | | This is instead of looking all the way back to the variable which may not exist for all derefs. This makes this code properly ignore casts with modes other than the mode[s] we care about (where casts aren't allowed). Reviewed-by: Timothy Arceri <[email protected]>
* nir/lower_io_to_scalar: Look at derefs for modesJason Ekstrand2018-12-181-3/+6
| | | | | | | | | This is instead of looking all the way back to the variable which may not exist for all derefs. This makes this code properly ignore casts with modes other than the mode[s] we care about (where casts aren't allowed). Reviewed-by: Timothy Arceri <[email protected]>
* nir/lower_io_arrays_to_elements: Look at derefs for modesJason Ekstrand2018-12-181-5/+8
| | | | | | | | | This is instead of looking all the way back to the variable which may not exist for all derefs. This makes this code properly ignore casts with modes other than the mode[s] we care about (where casts aren't allowed). Reviewed-by: Timothy Arceri <[email protected]>
* nir/linking_helpers: Look at derefs for modesJason Ekstrand2018-12-181-12/+11
| | | | | | | | | This is instead of looking all the way back to the variable which may not exist for all derefs. This makes this code properly ignore casts with modes other than the mode[s] we care about (where casts aren't allowed). Reviewed-by: Timothy Arceri <[email protected]>
* nir/propagate_invariant: Skip unknown varsJason Ekstrand2018-12-181-1/+1
| | | | | | | | If we can't find the variable from the deref, just assume it isn't invariant and continue on. This can happen if, for instance, we're writing to a deref that points into an SSBO. Reviewed-by: Timothy Arceri <[email protected]>
* Revert "nir/lower_indirect: Bail early if modes == 0"Ian Romanick2018-12-181-3/+0
| | | | | | | | | | | | | | | | | "There's no point in walking the program if we're never going to actually lower anything." Except we might lower compacted local arrays. In that case, modes will be 0, but there is still lowering to be done. This reverts commit 7f75cf2a9408b9af562e033ef6c1d1fd15141421. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109081 Suggested-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]> Tested-by: Clayton Craft <[email protected]> Cc: Kenneth Graunke <[email protected]>
* st/dri: replace format conversion functions with single mapping tableLucas Stach2018-12-182-352/+149
| | | | | | | | | | | | | | | | Each time I have to touch the buffer import/export functions in the dri state tracker I get lost in the maze of functions converting between DRI_IMAGE_FOURCC, DRI_IMAGE_FORMAT, DRI_IMAGE_COMPONENTS and pipe format. Rip it out and replace by a single table, which defines the correspondence between the different representations. Also this now stores all the known representations in the __DRIimageRec, to avoid the loss of information we currently have when importing a buffer with a fourcc, which doesn't have a corresponding dri format. Signed-off-by: Lucas Stach <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* st/dri: allow both render and sampler compatible dma-buf formatsLucas Stach2018-12-181-12/+18
| | | | | | | | | | | | | | Currently all the EGL APIs are missing a way to specify how an imported dma-buf is intended to be used. Demanding the format to be both usable for sampling and rendering artificially restricts the list of formats a driver is able to import. Looking at how the Intel driver implements those DRI2 image APIs it doesn't distinguish between render or sampler compatible formats. So this patch aligns behavior between Intel and Gallium based drivers. Signed-off-by: Lucas Stach <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* etnaviv: use surface format directlyLucas Stach2018-12-182-9/+4
| | | | | | | | There is no need to do the detour over the resource behind the surface to get the format. Use the surface format directly. Signed-off-by: Lucas Stach <[email protected]> Reviewed-by: Philipp Zabel <[email protected]>
* meson: Add toggle for glx-directDylan Baker2018-12-182-3/+7
| | | | | | | | GNU Hurd needs to turn off glx-direct, rather than special case it, we'll just add a toggle. CC: 18.3 <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
* meson: Add support for gnu hurdDylan Baker2018-12-181-4/+2
| | | | | CC: 18.3 <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
* meson: remove duplicate definitionDylan Baker2018-12-181-2/+0
| | | | Reviewed-by: Eric Engestrom <[email protected]>
* meson: Fix ppc64 little endian detectionDylan Baker2018-12-181-2/+7
| | | | | | | | | | | Old versions of meson returned ppc64le as the cpu_family for little endian power8 cpus, versions >=0.48 don't do this, so the check wouldn't work in that case. This generalizes the check to work for both old and new versions of meson. Fixes: 34bbb24ce7702658cdc4e9d34a650e169716c39e ("meson: Add support for ppc assembly/optimizations") Reviewed-by: Eric Engestrom <[email protected]>
* anv: Bump the patch version to 96Jason Ekstrand2018-12-181-1/+1
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* i965: Don't override subslice count to 4 on Gen11.Kenneth Graunke2018-12-171-1/+1
| | | | | | | | Gen9-10 have fewer than 4 subslices per slice, so they need this to be rounded up. Gen11 isn't documented as needing this hack, and it can also have more than 4 subslices, so the hack actually can break things. Reviewed-by: Anuj Phogat <[email protected]>
* intel/compiler: More peephole_select for pre-Gen6Ian Romanick2018-12-171-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | No shader-db changes on any Gen6+ platform. All of the shaders with cycles hurt by more than ~2% are from Master of Orion. All of the shaders have instructions helped. It looks like the pass enables some control flow to be converted to bcsels, then the scheduler does dumb things. These are new shaders (just added before doing this shader-db run), so there's probably some low-hanging fruit. Iron Lake total instructions in shared programs: 8214327 -> 8213684 (<.01%) instructions in affected programs: 84469 -> 83826 (-0.76%) helped: 114 HURT: 26 helped stats (abs) min: 2 max: 18 x̄: 7.75 x̃: 9 helped stats (rel) min: 0.17% max: 13.73% x̄: 2.52% x̃: 1.05% HURT stats (abs) min: 2 max: 20 x̄: 9.23 x̃: 8 HURT stats (rel) min: 0.70% max: 2.48% x̄: 1.66% x̃: 1.61% 95% mean confidence interval for instructions value: -5.87 -3.32 95% mean confidence interval for instructions %-change: -2.32% -1.17% Instructions are helped. total cycles in shared programs: 187736850 -> 187749314 (<.01%) cycles in affected programs: 506750 -> 519214 (2.46%) helped: 104 HURT: 36 helped stats (abs) min: 2 max: 72 x̄: 21.96 x̃: 16 helped stats (rel) min: 0.02% max: 6.16% x̄: 0.97% x̃: 0.63% HURT stats (abs) min: 4 max: 1402 x̄: 409.67 x̃: 40 HURT stats (rel) min: 0.33% max: 23.12% x̄: 5.79% x̃: 1.39% 95% mean confidence interval for cycles value: 28.32 149.74 95% mean confidence interval for cycles %-change: -0.07% 1.61% Inconclusive result (%-change mean confidence interval includes 0). GM45 total instructions in shared programs: 5044014 -> 5043652 (<.01%) instructions in affected programs: 46751 -> 46389 (-0.77%) helped: 63 HURT: 13 helped stats (abs) min: 2 max: 29 x̄: 7.65 x̃: 9 helped stats (rel) min: 0.17% max: 13.73% x̄: 2.93% x̃: 1.04% HURT stats (abs) min: 2 max: 20 x̄: 9.23 x̃: 8 HURT stats (rel) min: 0.66% max: 2.35% x̄: 1.58% x̃: 1.52% 95% mean confidence interval for instructions value: -6.54 -2.99 95% mean confidence interval for instructions %-change: -3.04% -1.28% Instructions are helped. total cycles in shared programs: 128143042 -> 128150188 (<.01%) cycles in affected programs: 324564 -> 331710 (2.20%) helped: 57 HURT: 19 helped stats (abs) min: 6 max: 74 x̄: 30.70 x̃: 32 helped stats (rel) min: 0.08% max: 4.74% x̄: 1.22% x̃: 0.81% HURT stats (abs) min: 10 max: 1400 x̄: 468.21 x̃: 60 HURT stats (rel) min: 0.56% max: 19.94% x̄: 5.80% x̃: 1.70% 95% mean confidence interval for cycles value: 6.90 181.15 95% mean confidence interval for cycles %-change: -0.52% 1.59% Inconclusive result (%-change mean confidence interval includes 0). Signed-off-by: Ian Romanick <[email protected]> Acked-by: Lionel Landwerlin <[email protected]>
* nir/opt_peephole_select: Don't peephole_select expensive math instructionsIan Romanick2018-12-179-17/+40
| | | | | | | | | | | | | | | | On some GPUs, especially older Intel GPUs, some math instructions are very expensive. On those architectures, don't reduce flow control to a csel if one of the branches contains one of these expensive math instructions. This prevents a bunch of cycle count regressions on pre-Gen6 platforms with a later patch (intel/compiler: More peephole select for pre-Gen6). v2: Remove stray #if block. Noticed by Thomas. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Thomas Helland <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/compiler: More peephole selectIan Romanick2018-12-171-1/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Shader-db results: The one shader hurt for instructions is a compute shader that had both spills and fills hurt. v2: Fix typo in comment noticed by Caio. v3: Fix inverted condition in brw_nir.c. Noticed by Lionel. Skylake, Broadwell, and Haswell had similar results. (Skylake shown) total instructions in shared programs: 15072761 -> 15047884 (-0.17%) instructions in affected programs: 895539 -> 870662 (-2.78%) helped: 3623 HURT: 1 helped stats (abs) min: 1 max: 181 x̄: 6.89 x̃: 4 helped stats (rel) min: 0.10% max: 25.00% x̄: 3.93% x̃: 3.20% HURT stats (abs) min: 92 max: 92 x̄: 92.00 x̃: 92 HURT stats (rel) min: 1.92% max: 1.92% x̄: 1.92% x̃: 1.92% 95% mean confidence interval for instructions value: -7.10 -6.63 95% mean confidence interval for instructions %-change: -4.03% -3.82% Instructions are helped. total cycles in shared programs: 369738930 -> 369535732 (-0.05%) cycles in affected programs: 68027851 -> 67824653 (-0.30%) helped: 2609 HURT: 1035 helped stats (abs) min: 1 max: 4508 x̄: 181.44 x̃: 77 helped stats (rel) min: <.01% max: 71.31% x̄: 9.14% x̃: 5.47% HURT stats (abs) min: 1 max: 33336 x̄: 261.04 x̃: 20 HURT stats (rel) min: <.01% max: 47.61% x̄: 2.93% x̃: 1.47% 95% mean confidence interval for cycles value: -96.43 -15.09 95% mean confidence interval for cycles %-change: -6.07% -5.36% Cycles are helped. total spills in shared programs: 10158 -> 10159 (<.01%) spills in affected programs: 166 -> 167 (0.60%) helped: 1 HURT: 1 total fills in shared programs: 22105 -> 22116 (0.05%) fills in affected programs: 837 -> 848 (1.31%) helped: 4 HURT: 1 Ivy Bridge total instructions in shared programs: 12021190 -> 11990256 (-0.26%) instructions in affected programs: 910561 -> 879627 (-3.40%) helped: 3344 HURT: 18 helped stats (abs) min: 1 max: 99 x̄: 9.29 x̃: 6 helped stats (rel) min: 0.11% max: 31.18% x̄: 5.19% x̃: 3.31% HURT stats (abs) min: 2 max: 20 x̄: 7.89 x̃: 6 HURT stats (rel) min: 0.70% max: 2.59% x̄: 1.63% x̃: 1.70% 95% mean confidence interval for instructions value: -9.49 -8.91 95% mean confidence interval for instructions %-change: -5.32% -4.98% Instructions are helped. total cycles in shared programs: 179077826 -> 178570196 (-0.28%) cycles in affected programs: 63205667 -> 62698037 (-0.80%) helped: 2767 HURT: 620 helped stats (abs) min: 1 max: 7531 x̄: 217.58 x̃: 88 helped stats (rel) min: <.01% max: 75.86% x̄: 9.59% x̃: 6.09% HURT stats (abs) min: 1 max: 31255 x̄: 152.27 x̃: 11 HURT stats (rel) min: <.01% max: 36.36% x̄: 2.77% x̃: 0.58% 95% mean confidence interval for cycles value: -173.94 -125.81 95% mean confidence interval for cycles %-change: -7.68% -6.97% Cycles are helped. Sandy Bridge total instructions in shared programs: 10852569 -> 10843758 (-0.08%) instructions in affected programs: 235803 -> 226992 (-3.74%) helped: 800 HURT: 0 helped stats (abs) min: 1 max: 88 x̄: 11.01 x̃: 8 helped stats (rel) min: 0.11% max: 23.08% x̄: 4.69% x̃: 3.36% 95% mean confidence interval for instructions value: -11.93 -10.10 95% mean confidence interval for instructions %-change: -4.99% -4.39% Instructions are helped. total cycles in shared programs: 154732047 -> 154608941 (-0.08%) cycles in affected programs: 4063110 -> 3940004 (-3.03%) helped: 606 HURT: 253 helped stats (abs) min: 1 max: 2524 x̄: 227.93 x̃: 62 helped stats (rel) min: 0.02% max: 39.24% x̄: 4.36% x̃: 1.81% HURT stats (abs) min: 1 max: 1966 x̄: 59.36 x̃: 11 HURT stats (rel) min: 0.02% max: 67.10% x̄: 3.22% x̃: 0.67% 95% mean confidence interval for cycles value: -170.49 -116.13 95% mean confidence interval for cycles %-change: -2.61% -1.65% Cycles are helped. No change on Iron Lake or GM45. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* nir/opt_peephole_select: Don't try to remove flow control around indirect loadsIan Romanick2018-12-179-18/+47
| | | | | | | | | | | | | | | | | | | That flow control may be trying to avoid invalid loads. On at least some platforms, those loads can also be expensive. No shader-db changes on any Intel platform (even with the later patch "intel/compiler: More peephole select"). v2: Add a 'indirect_load_ok' flag to nir_opt_peephole_select. Suggested by Rob. See also the big comment in src/intel/compiler/brw_nir.c. v3: Use nir_deref_instr_has_indirect instead of deref_has_indirect (from nir_lower_io_arrays_to_elements.c). v4: Fix inverted condition in brw_nir.c. Noticed by Lionel. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* i965/vec4: Propagate conditional modifiers from more compares to other comparesIan Romanick2018-12-171-3/+100
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If there is a CMP.NZ that compares a single component (via a .zzzz swizzle, for example) with 0, it can propagate its conditional modifier back to a previous CMP that writes only that component. The specific case that I saw was: cmp.l.f0(8) g42<1>.xF g61<4>.xF (abs)g18<4>.zF ... cmp.nz.f0(8) null<1>D g42<4>.xD 0D In this case we can just delete the second CMP. No changes on Broadwell or Skylake because they do not use the vec4 backend. Also no changes on GM45 or Iron Lake. Sandy Bridge, Ivy Bridge, and Haswell had similar results. (Sandy Bridge shown) total instructions in shared programs: 10856676 -> 10852569 (-0.04%) instructions in affected programs: 228322 -> 224215 (-1.80%) helped: 1331 HURT: 0 helped stats (abs) min: 1 max: 7 x̄: 3.09 x̃: 4 helped stats (rel) min: 0.11% max: 6.67% x̄: 1.88% x̃: 1.83% 95% mean confidence interval for instructions value: -3.19 -2.99 95% mean confidence interval for instructions %-change: -1.93% -1.83% Instructions are helped. total cycles in shared programs: 154788865 -> 154732047 (-0.04%) cycles in affected programs: 2485892 -> 2429074 (-2.29%) helped: 1097 HURT: 59 helped stats (abs) min: 2 max: 168 x̄: 51.96 x̃: 64 helped stats (rel) min: 0.12% max: 12.70% x̄: 3.44% x̃: 2.22% HURT stats (abs) min: 2 max: 16 x̄: 3.02 x̃: 2 HURT stats (rel) min: 0.18% max: 0.83% x̄: 0.64% x̃: 0.71% 95% mean confidence interval for cycles value: -51.04 -47.26 95% mean confidence interval for cycles %-change: -3.40% -3.07% Cycles are helped. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* i965/fs: Eliminate unary op on operand of compare-with-zeroIan Romanick2018-12-172-17/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The (-abs(x) >= 0) => (x == 0) optimization is removed from the vec4 and scalar parts. In the VS part, adding the new pattern was not helpful. The pattern that is removed is really old, and it has been handled by NIR for ages. All Gen7+ platforms had similar results. (Broadwell shown) total instructions in shared programs: 14715715 -> 14715709 (<.01%) instructions in affected programs: 474 -> 468 (-1.27%) helped: 6 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 1.12% max: 1.35% x̄: 1.28% x̃: 1.35% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -1.40% -1.15% Instructions are helped. total cycles in shared programs: 559569911 -> 559569809 (<.01%) cycles in affected programs: 5963 -> 5861 (-1.71%) helped: 6 HURT: 0 helped stats (abs) min: 16 max: 18 x̄: 17.00 x̃: 17 helped stats (rel) min: 1.45% max: 1.88% x̄: 1.73% x̃: 1.85% 95% mean confidence interval for cycles value: -18.15 -15.85 95% mean confidence interval for cycles %-change: -1.95% -1.51% Cycles are helped. Iron Lake and Sandy Bridge had similar results. (Iron Lake shown) total instructions in shared programs: 7780915 -> 7780913 (<.01%) instructions in affected programs: 246 -> 244 (-0.81%) helped: 2 HURT: 0 total cycles in shared programs: 177876108 -> 177876106 (<.01%) cycles in affected programs: 3636 -> 3634 (-0.06%) helped: 1 HURT: 0 GM45 total instructions in shared programs: 4799152 -> 4799151 (<.01%) instructions in affected programs: 126 -> 125 (-0.79%) helped: 1 HURT: 0 total cycles in shared programs: 122052654 -> 122052652 (<.01%) cycles in affected programs: 3640 -> 3638 (-0.05%) helped: 1 HURT: 0 Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* i965/vec4/dce: Don't narrow the write mask if the flags are usedIan Romanick2018-12-174-10/+208
| | | | | | | | | | | | | | | | | | | In an instruction sequence like cmp(8).ge.f0.0 vgrf17:D, vgrf2.xxxx:D, vgrf9.xxxx:D (+f0.0) sel(8) vgrf1:UD, vgrf8.xyzw:UD, vgrf1.xyzw:UD The other fields of vgrf17 may be unused, but the CMP still needs to generate the other flag bits. To my surprise, nothing in shader-db or any test suite appears to hit this. However, I have a change to brw_vec4_cmod_propagation that creates cases where this can happen. This fix prevents a couple dozen regressions in that patch. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]> Fixes: 5df88c20 ("i965/vec4: Rewrite dead code elimination to use live in/out.")
* i965/vec4: Silence unused parameter warnings in vec4 compiler testsIan Romanick2018-12-173-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | src/intel/compiler/test_vec4_copy_propagation.cpp: In member function ‘virtual brw::dst_reg* copy_propagation_vec4_visitor::make_reg_for_system_value(int)’: src/intel/compiler/test_vec4_copy_propagation.cpp:57:51: warning: unused parameter ‘location’ [-Wunused-parameter] virtual dst_reg *make_reg_for_system_value(int location) ^~~~~~~~ src/intel/compiler/test_vec4_copy_propagation.cpp: In member function ‘virtual void copy_propagation_vec4_visitor::emit_urb_write_header(int)’: src/intel/compiler/test_vec4_copy_propagation.cpp:77:43: warning: unused parameter ‘mrf’ [-Wunused-parameter] virtual void emit_urb_write_header(int mrf) ^~~ src/intel/compiler/test_vec4_copy_propagation.cpp: In member function ‘virtual brw::vec4_instruction* copy_propagation_vec4_visitor::emit_urb_write_opcode(bool)’: src/intel/compiler/test_vec4_copy_propagation.cpp:82:57: warning: unused parameter ‘complete’ [-Wunused-parameter] virtual vec4_instruction *emit_urb_write_opcode(bool complete) ^~~~~~~~ src/intel/compiler/test_vec4_register_coalesce.cpp: In member function ‘virtual brw::dst_reg* register_coalesce_vec4_visitor::make_reg_for_system_value(int)’: src/intel/compiler/test_vec4_register_coalesce.cpp:60:51: warning: unused parameter ‘location’ [-Wunused-parameter] virtual dst_reg *make_reg_for_system_value(int location) ^~~~~~~~ src/intel/compiler/test_vec4_register_coalesce.cpp: In member function ‘virtual void register_coalesce_vec4_visitor::emit_urb_write_header(int)’: src/intel/compiler/test_vec4_register_coalesce.cpp:80:43: warning: unused parameter ‘mrf’ [-Wunused-parameter] virtual void emit_urb_write_header(int mrf) ^~~ src/intel/compiler/test_vec4_register_coalesce.cpp: In member function ‘virtual brw::vec4_instruction* register_coalesce_vec4_visitor::emit_urb_write_opcode(bool)’: src/intel/compiler/test_vec4_register_coalesce.cpp:85:57: warning: unused parameter ‘complete’ [-Wunused-parameter] virtual vec4_instruction *emit_urb_write_opcode(bool complete) ^~~~~~~~ src/intel/compiler/test_vec4_cmod_propagation.cpp: In member function ‘virtual brw::dst_reg* cmod_propagation_vec4_visitor::make_reg_for_system_value(int)’: src/intel/compiler/test_vec4_cmod_propagation.cpp:60:51: warning: unused parameter ‘location’ [-Wunused-parameter] virtual dst_reg *make_reg_for_system_value(int location) ^~~~~~~~ src/intel/compiler/test_vec4_cmod_propagation.cpp: In member function ‘virtual void cmod_propagation_vec4_visitor::emit_urb_write_header(int)’: src/intel/compiler/test_vec4_cmod_propagation.cpp:85:43: warning: unused parameter ‘mrf’ [-Wunused-parameter] virtual void emit_urb_write_header(int mrf) ^~~ src/intel/compiler/test_vec4_cmod_propagation.cpp: In member function ‘virtual brw::vec4_instruction* cmod_propagation_vec4_visitor::emit_urb_write_opcode(bool)’: src/intel/compiler/test_vec4_cmod_propagation.cpp:90:57: warning: unused parameter ‘complete’ [-Wunused-parameter] virtual vec4_instruction *emit_urb_write_opcode(bool complete) ^~~~~~~~ Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* radv: Fix multiview depth clearsBas Nieuwenhuizen2018-12-171-8/+21
| | | | | | | | We were not using the view mask for depth clears, causing only the first view to be cleared. Fixes: 2e86f6b2597 "radv: Add multiview clears." Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Remove redundant format check.Bas Nieuwenhuizen2018-12-171-4/+0
| | | | | | | | | The switch directly after the check has a default case that returns NULL too, so the effective return value is not changed. Also this check is wrong once we start dealing with formats introduced by an extension (e.g. YUV formats). Reviewed-by: Samuel Pitoiset <[email protected]>
* nir: Fix clamping of uints for image store lowering.Eric Anholt2018-12-171-1/+1
| | | | | | | | | | I botched some copy-and-paste and clamped to signed int max instead of uint max. Fixes KHR-GL46.shader_image_load_store.multiple-uniforms on skl. Fixes: d3e046e76c06 ("nir: Pull some of intel's image load/store format conversion to nir_format.h") Reviewed-by: Jason Ekstrand <[email protected]>
* v3d: Fix the argument type for vir_BRANCH().Eric Anholt2018-12-171-1/+1
| | | | | Apparently this has been spewing warnings for Jason's clang, but not my gcc.
* vc4: Reuse nir_format_convert.h in our blend lowering.Eric Anholt2018-12-171-33/+3
| | | | | These helpers came along after and have effectively the same implementation.