summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* u_blit: (trivial) u_blit.h needs to include p_defines.hRoland Scheidegger2018-03-101-0/+1
| | | | | | | (For the pipe_tex_filter enum) Reviewed-by: Mathias Fröhlich <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* travis: bump libxcb version to 1.13Christian Gmeiner2018-03-101-2/+2
| | | | | | | | | Fixes following dependency problem: Native dependency xcb-dri3 found: NO found '1.11' but need: '>= 1.13' Signed-off-by: Christian Gmeiner <[email protected]> Reviewed-by: Daniel Stone <[email protected]> Fixes: c80c08e22603 ("vulkan/wsi/x11: Add support for DRI3 v1.2")
* mesa: Make gl_vertex_array contain pointers to first order VAO members.Mathias Fröhlich2018-03-1027-437/+480
| | | | | | | | | | | | | | | Instead of keeping a copy of the vertex array content in struct gl_vertex_array only keep pointers to the first order information originaly in the VAO. For that represent the current values by struct gl_array_attributes and struct gl_vertex_buffer_binding. v2: Change comments. Remove gl... prefix from variables except in the i965 directory where it was like that before. Reindent because of that. Reviewed-by: Brian Paul <[email protected]> Signed-off-by: Mathias Fröhlich <[email protected]>
* draw: fix alpha value for very short aa linesRoland Scheidegger2018-03-102-2/+24
| | | | | | | | The logic would not work correctly for line lengths smaller than 1.0, even a degenerated line with length 0 would still produce a fragment with anyhwere between alpha 0.0 and 0.5. Reviewed-by: Brian Paul <[email protected]>
* intel/vulkan: Hard code CS scratch_ids_per_subslice for CherryviewJordan Justen2018-03-091-17/+28
| | | | | | | | | | Ken suggested that we might be underallocating scratch space on HD 400. Allocating scratch space as though there was actually 8 EUs seems to help with a GPU hang seen on synmark CSDof. Cc: <[email protected]> Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Hard code CS scratch_ids_per_subslice for CherryviewJordan Justen2018-03-091-17/+27
| | | | | | | | | | | | | Ken suggested that we might be underallocating scratch space on HD 400. Allocating scratch space as though there was actually 8 EUs seems to help with a GPU hang seen on synmark CSDof. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104636 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105290 Cc: <[email protected]> Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Tested-by: Eero Tamminen <[email protected]>
* st/dri: fix OpenGL-OpenCL interop for GL_TEXTURE_BUFFERMarek Olšák2018-03-091-24/+34
| | | | | | | | Tested by our OpenCL team. Fixes: 9c499e6759b26c5e "st/mesa: don't invoke st_finalize_texture & st_convert_sampler for TBOs" Acked-by: Alex Deucher <[email protected]>
* radeonsi: add a workaround for GFX9 hang with init_config alignmentMarek Olšák2018-03-091-1/+2
| | | | | Fixes: 75c5d25f0f34cd702 "radeonsi: align command buffer starting address to fix some Raven hangs" Cc: 17.3 18.0 <[email protected]>
* ac/gpu_info: print ib_start_alignment, add assertionMarek Olšák2018-03-091-0/+2
|
* meson: Use system_has_kms_drm in default driver selectionGreg V2018-03-091-3/+5
| | | | | Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Dylan Baker <[email protected]>
* broadcom/vc4: Add an accelerated path to turn raster R8/RG88 into tiled.Eric Anholt2018-03-093-0/+211
| | | | | Drawing a 1080p YV12 video stream generated by MMAL goes from 10.5 FPS to 36.
* gallium: Add a util_blitter path for using a custom VS and FS.Eric Anholt2018-03-092-0/+69
| | | | | | | | | Like the r600 paths to use other custom states, we pass in a couple of parameters to customize the innards of the blitter. It's up to the caller to wrap other state necessary for its shaders (for example, constant buffers for the uniforms the shader uses). Reviewed-by: Marek Olšák <[email protected]>
* broadcom/vc4: Allow binding non-zero constant buffers.Eric Anholt2018-03-095-5/+53
| | | | | We're going to use UBO loads for implementing YUV linear-to-T-format blits.
* broadcom: Remove our defines of DRM_FORMAT_MOD_INVALID.Eric Anholt2018-03-092-8/+0
| | | | The imported drm_fourcc.h handles it now.
* broadcom: Suppress compiler warnings about enum pipe_tex_filter.Eric Anholt2018-03-092-0/+2
|
* egl/x11: Re-allocate buffers if format is suboptimalLouis-Francis Ratté-Boulianne2018-03-096-7/+44
| | | | | | | | | | If PresentCompleteNotify event says the pixmap was presented with mode PresentCompleteModeSuboptimalCopy, it means the pixmap could possibly have been flipped instead if allocated with a different format/modifier. Signed-off-by: Louis-Francis Ratté-Boulianne <[email protected]> Reviewed-by: Daniel Stone <[email protected]>
* egl/x11: Support DRI3 v1.1Louis-Francis Ratté-Boulianne2018-03-097-63/+383
| | | | | | | | | | | | Add support for DRI3 v1.1, which allows pixmaps to be backed by multi-planar buffers, or those with format modifiers. This is both for allocating render buffers, as well as EGLImage imports from a native pixmap (EGL_NATIVE_PIXMAP_KHR). Signed-off-by: Louis-Francis Ratté-Boulianne <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Daniel Stone <[email protected]>
* vulkan/wsi/x11: Return VK_SUBOPTIMAL_KHR for X11Louis-Francis Ratté-Boulianne2018-03-093-5/+63
| | | | | | | | | When it is detected that a window could have been flipped but has been copied because of suboptimal format/modifier. The Vulkan client should then re-create the swapchain. Signed-off-by: Louis-Francis Ratté-Boulianne <[email protected]> Reviewed-by: Daniel Stone <[email protected]>
* vulkan/wsi/x11: Add support for DRI3 v1.2Daniel Stone2018-03-093-19/+164
| | | | | | | | Adds support for multiple planes and buffer modifiers. v4: Rename "has_dri3_v1_1" to "has_dri3_modifiers" v12: Multi-planar/modifier support is now DRI3 v1.2; also update release versions
* autotools: include all meson.build filesDylan Baker2018-03-091-0/+2
| | | | | | | | | | Otherwise SWR cannot be built with meson from an autotools generated tarball, such as the 18.0.0-rc4 tarball. Fixes: 16bf81383080 ("meson/swr: re-shuffle generated files") Signed-off-by: Dylan Baker <[email protected]> Reviewed-by: George Kyriazis <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* st/mesa: gl_program::info.system_values_read is a 64-bit-fieldMichel Dänzer2018-03-092-7/+7
| | | | | | | | We were dropping the upper 32 bits, which caused assertion failures in some compute shader piglit tests with radeonsi since the commit below. Fixes: 752e96970303 ("compiler: Add two new system values for subgroups") Reviewed-by: Marek Olšák <[email protected]>
* swr/rast: Refactor memory gather operationsGeorge Kyriazis2018-03-092-6/+4
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Add KNOB_DISABLE_SPLIT_DRAWGeorge Kyriazis2018-03-092-8/+26
| | | | | | | | | | This is useful for archrast data collection. This greatly speeds up the post processing script since there is significantly less events generated. Finally, this is a simpler option to communicate to users than having them directly adjust MAX_PRIMS_PER_DRAW and MAX_TESS_PRIMS_PER_DRAW. Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Add VPOPCNTGeorge Kyriazis2018-03-092-0/+9
| | | | | | Supports popcnt on vector masks (e.g. <8 x i1>) Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Add tracking for stream out topologyGeorge Kyriazis2018-03-094-5/+8
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Add split draw and other state information to DrawInfoEvent.George Kyriazis2018-03-094-32/+22
| | | | | | Removed specific split draw events. Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Refactor api and worker event handlers.George Kyriazis2018-03-092-35/+52
| | | | | | | | | In the API event handler we want to share information between the core layer and the API. Specifically, around associating various ids with different kinds of events. For example, associate render pass id with draw ids, or command buffer ids with draw ids. Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Add support for generalized late and early z/stencil statsGeorge Kyriazis2018-03-092-0/+73
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Rasterized Subspans stats supportGeorge Kyriazis2018-03-094-0/+30
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Added commentGeorge Kyriazis2018-03-091-0/+1
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
* vulkan/wsi: clean up cleanup pathEric Engestrom2018-03-091-7/+7
| | | | | | Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Keith Packard <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* radv: Fix the autotools build take 2.Bas Nieuwenhuizen2018-03-091-1/+1
| | | | | | Forgot to remove a word.... Fixes: 04ffabf17a "radv: Fix autotools build."
* etnaviv: allow mixing different bit depths for color and depth surfacesLucas Stach2018-03-091-1/+1
| | | | | | | | Vivante hardware supports this just fine. There is no reason why this shouldn't be advertised as a valid combination. Signed-off-by: Lucas Stach <[email protected]> Reviewed-by: Christian Gmeiner <[email protected]>
* autotools: Add tegra to AM_DISTCHECK_CONFIGURE_FLAGSThierry Reding2018-03-091-1/+1
| | | | | | | | This allows the driver to be built on a make distcheck and makes sure that it properly builds when a distribution tarball is made. Suggested-by: Emil Velikov <[email protected]> Signed-off-by: Thierry Reding <[email protected]>
* tegra: Initial supportThierry Reding2018-03-0926-4/+2770
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Tegra K1 and later use a GPU that can be driven by the Nouveau driver. But the GPU is a pure render node and has no display engine, hence the scanout needs to happen on the Tegra display hardware. The GPU and the display engine each have a separate DRM device node exposed by the kernel. To make the setup appear as a single device, this driver instantiates a Nouveau screen with each instance of a Tegra screen and forwards GPU requests to the Nouveau screen. For purposes of scanout it will import buffers created on the GPU into the display driver. Handles that userspace requests are those of the display driver so that they can be used to create framebuffers. This has been tested with some GBM test programs, as well as kmscube and weston. All of those run without modifications, but I'm sure there is a lot that can be improved. Some fixes contributed by Hector Martin <[email protected]>. Changes in v2: - duplicate file descriptor in winsys to avoid potential issues - require nouveau when building the tegra driver - check for nouveau driver name on render node - remove unneeded dependency on libdrm_tegra - remove zombie references to libudev - add missing headers to C_SOURCES variable - drop unneeded tegra/ prefix for includes - open device files with O_CLOEXEC - update copyrights Changes in v3: - properly unwrap resources in ->resource_copy_region() - support vertex buffers passed by user pointer - allocate custom stream and const uploader - silence error message on pre-Tegra124 - support X without explicit PRIME Changes in v4: - ship Meson build files in distribution tarball - drop duplicate driver_tegra dependency Reviewed-by: Emil Velikov <[email protected]> Acked-by: Emil Velikov <[email protected]> Tested-by: Andre Heider <[email protected]> Reviewed-by: Dmitry Osipenko <[email protected]> Reviewed-by: Dylan Baker <[email protected]> Signed-off-by: Thierry Reding <[email protected]>
* nouveau: Add framebuffer modifier supportThierry Reding2018-03-096-5/+146
| | | | | | | | | | | | | | | | | | | | | | | This adds support for framebuffer modifiers to Nouveau. This will be used by the Tegra driver to share metadata about the format of buffers (such as the tiling mode or compression). Changes in v2: - remove unused parameters to nouveau_buffer_create() - move format modifier query code to nvc0 backend - restrict format modifiers to 2D textures - implement ->query_dmabuf_modifiers() Changes in v4: - add UAPI include path on meson builds Changes in v5: - remove unnecessary includes Acked-by: Emil Velikov <[email protected]> Tested-by: Andre Heider <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Signed-off-by: Thierry Reding <[email protected]>
* nouveau/nvc0: Extract common tile mode macroThierry Reding2018-03-091-6/+9
| | | | | | | | | | | Add a new macro that can be used to extract the tiling mode from a tile_mode value. This is will be used to determine the number of GOBs used in block linear mode. Acked-by: Emil Velikov <[email protected]> Tested-by: Andre Heider <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Signed-off-by: Thierry Reding <[email protected]>
* drm/tegra: Sanitize format modifiersThierry Reding2018-03-091-17/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The existing format modifier definitions were merged prematurely, and recent work has unveiled that the definitions are suboptimal in several ways: - The format specifiers, except for one, are not Tegra specific, but the names don't reflect that. - The number space is split into two, reserving 32 bits for some "parameter" which most of the modifiers are not going to have. - Symbolic names for the modifiers are not using the standard DRM_FORMAT_MOD_* prefix, which makes them awkward to use. - The vendor prefix NV is somewhat ambiguous. Fortunately, nobody's started using these modifiers, so we can still fix the above issues. Do so by using the standard prefix. Also, remove TEGRA from the name of those modifiers that exist on NVIDIA GPUs as well. In case of the block linear modifiers, make the "parameter" smaller (4 bits, though only 6 values are valid) and don't let that leak into any of the other modifiers. Finally, also use the more canonical NVIDIA instead of the ambiguous NV prefix. This is based on commit 268892cb63a822315921a8dab48ac3e4abf7dd03 from Linux v4.16-rc1. Acked-by: Emil Velikov <[email protected]> Tested-by: Andre Heider <[email protected]> Signed-off-by: Thierry Reding <[email protected]>
* drm/fourcc: Fix fourcc_mod_code() definitionThierry Reding2018-03-091-1/+1
| | | | | | | | | | | Avoid a compiler warnings when the val parameter is an expression. This is based on commit 5843f4e02fbe86a59981e35adc6cabebee46fdc0 from Linux v4.16-rc1. Acked-by: Emil Velikov <[email protected]> Tested-by: Andre Heider <[email protected]> Signed-off-by: Thierry Reding <[email protected]>
* radv: Fix autotools build.Bas Nieuwenhuizen2018-03-091-10/+7
| | | | | | | Forgot it again .... Fixes: b6347807a9 "radv: Generate icd files." Acked-by: Samuel Pitoiset <[email protected]>
* ac/nir: set number of channels for packed mrt exportsSamuel Pitoiset2018-03-091-0/+5
| | | | | | | | Bit 0 enables VSRC0 (R in low bits, G high) and bit 2 enables VSRC1 (B in low bits, A high). Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Update version to 1.1.70.Bas Nieuwenhuizen2018-03-091-2/+2
| | | | | | Turns out they did not reset the patch number on release. Reviewed-by: Dave Airlie <[email protected]>
* radv: Generate icd files.Bas Nieuwenhuizen2018-03-094-25/+70
| | | | | | | | If the api version is too low, the loader clamps the application requested version to the advertized version, which messes with which extensions are enabled. Reviewed-by: Dave Airlie <[email protected]>
* nir: Don't i2b a value that is already BooleanIan Romanick2018-03-081-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A bunch of shaders have sequences like: i2b(u2i(floatBitsToUint(intBitsToFloat(x == y ? -1 : 0)))) Other optimizations (and NIR's typeless nature) reduce this to i2b(x == y) which is silly. Skylake total instructions in shared programs: 14498698 -> 14497948 (<.01%) instructions in affected programs: 74480 -> 73730 (-1.01%) helped: 277 HURT: 0 helped stats (abs) min: 1 max: 32 x̄: 2.71 x̃: 2 helped stats (rel) min: 0.04% max: 13.79% x̄: 1.45% x̃: 0.68% 95% mean confidence interval for instructions value: -3.35 -2.06 95% mean confidence interval for instructions %-change: -1.74% -1.16% Instructions are helped. total cycles in shared programs: 532015500 -> 531999238 (<.01%) cycles in affected programs: 5943878 -> 5927616 (-0.27%) helped: 251 HURT: 74 helped stats (abs) min: 1 max: 13149 x̄: 127.89 x̃: 14 helped stats (rel) min: 0.01% max: 17.31% x̄: 1.55% x̃: 0.53% HURT stats (abs) min: 1 max: 4550 x̄: 214.04 x̃: 15 HURT stats (rel) min: <.01% max: 44.43% x̄: 2.81% x̃: 0.33% 95% mean confidence interval for cycles value: -158.51 58.43 95% mean confidence interval for cycles %-change: -1.07% -0.04% Inconclusive result (value mean confidence interval includes 0). total loops in shared programs: 4753 -> 4735 (-0.38%) loops in affected programs: 18 -> 0 helped: 18 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00% 95% mean confidence interval for loops value: -1.00 -1.00 95% mean confidence interval for loops %-change: -100.00% -100.00% Loops are helped. Haswell and Broadwell had simliar results. (Broadwell shown) total instructions in shared programs: 14791877 -> 14791127 (<.01%) instructions in affected programs: 77326 -> 76576 (-0.97%) helped: 278 HURT: 1 helped stats (abs) min: 1 max: 32 x̄: 2.70 x̃: 2 helped stats (rel) min: 0.04% max: 13.79% x̄: 1.42% x̃: 0.68% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.49% max: 0.49% x̄: 0.49% x̃: 0.49% 95% mean confidence interval for instructions value: -3.33 -2.05 95% mean confidence interval for instructions %-change: -1.70% -1.13% Instructions are helped. total cycles in shared programs: 558250067 -> 558252872 (<.01%) cycles in affected programs: 5806328 -> 5809133 (0.05%) helped: 235 HURT: 83 helped stats (abs) min: 1 max: 10630 x̄: 81.73 x̃: 16 helped stats (rel) min: 0.03% max: 18.58% x̄: 1.60% x̃: 0.51% HURT stats (abs) min: 1 max: 10590 x̄: 265.19 x̃: 20 HURT stats (rel) min: <.01% max: 15.28% x̄: 1.89% x̃: 0.54% 95% mean confidence interval for cycles value: -89.87 107.51 95% mean confidence interval for cycles %-change: -1.06% -0.32% Inconclusive result (value mean confidence interval includes 0). total loops in shared programs: 4735 -> 4717 (-0.38%) loops in affected programs: 18 -> 0 helped: 18 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00% 95% mean confidence interval for loops value: -1.00 -1.00 95% mean confidence interval for loops %-change: -100.00% -100.00% Loops are helped. total fills in shared programs: 83111 -> 83110 (<.01%) fills in affected programs: 28 -> 27 (-3.57%) helped: 1 HURT: 0 Ivy Bridge total instructions in shared programs: 11774173 -> 11773436 (<.01%) instructions in affected programs: 70819 -> 70082 (-1.04%) helped: 267 HURT: 0 helped stats (abs) min: 1 max: 48 x̄: 2.76 x̃: 2 helped stats (rel) min: 0.21% max: 19.51% x̄: 1.57% x̃: 0.63% 95% mean confidence interval for instructions value: -3.51 -2.01 95% mean confidence interval for instructions %-change: -1.94% -1.21% Instructions are helped. total cycles in shared programs: 257153833 -> 257148932 (<.01%) cycles in affected programs: 585341 -> 580440 (-0.84%) helped: 167 HURT: 100 helped stats (abs) min: 1 max: 1327 x̄: 44.89 x̃: 16 helped stats (rel) min: 0.04% max: 26.54% x̄: 2.41% x̃: 0.88% HURT stats (abs) min: 1 max: 200 x̄: 25.95 x̃: 16 HURT stats (rel) min: 0.04% max: 9.81% x̄: 1.34% x̃: 0.65% 95% mean confidence interval for cycles value: -33.25 -3.46 95% mean confidence interval for cycles %-change: -1.47% -0.54% Cycles are helped. total loops in shared programs: 3416 -> 3398 (-0.53%) loops in affected programs: 18 -> 0 helped: 18 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00% 95% mean confidence interval for loops value: -1.00 -1.00 95% mean confidence interval for loops %-change: -100.00% -100.00% Loops are helped. LOST: 2 GAINED: 0 Sandy Bridge total instructions in shared programs: 10499306 -> 10499094 (<.01%) instructions in affected programs: 6051 -> 5839 (-3.50%) helped: 43 HURT: 0 helped stats (abs) min: 1 max: 32 x̄: 4.93 x̃: 2 helped stats (rel) min: 0.39% max: 12.90% x̄: 4.29% x̃: 2.45% 95% mean confidence interval for instructions value: -7.66 -2.20 95% mean confidence interval for instructions %-change: -5.47% -3.12% Instructions are helped. total cycles in shared programs: 145862568 -> 145861370 (<.01%) cycles in affected programs: 61733 -> 60535 (-1.94%) helped: 36 HURT: 2 helped stats (abs) min: 16 max: 66 x̄: 36.61 x̃: 35 helped stats (rel) min: 0.45% max: 17.31% x̄: 4.92% x̃: 2.81% HURT stats (abs) min: 18 max: 102 x̄: 60.00 x̃: 60 HURT stats (rel) min: 1.10% max: 1.85% x̄: 1.48% x̃: 1.48% 95% mean confidence interval for cycles value: -41.28 -21.77 95% mean confidence interval for cycles %-change: -6.16% -3.00% Cycles are helped. total loops in shared programs: 1803 -> 1785 (-1.00%) loops in affected programs: 18 -> 0 helped: 18 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00% 95% mean confidence interval for loops value: -1.00 -1.00 95% mean confidence interval for loops %-change: -100.00% -100.00% Loops are helped. LOST: 4 GAINED: 0 No changes on Iron Lake of GM45. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* i965/vec4: Allow CSE on subset VF constant loadsIan Romanick2018-03-081-0/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | v2: Rewrite the code that generates the VF mask. Suggested by Ken. No changes on other platforms. Haswell, Ivy Bridge, and Sandy Bridge had similar results. (Haswell shown) total instructions in shared programs: 13059891 -> 13059884 (<.01%) instructions in affected programs: 431 -> 424 (-1.62%) helped: 7 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 1.19% max: 5.26% x̄: 2.05% x̃: 1.49% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -3.39% -0.71% Instructions are helped. total cycles in shared programs: 409260032 -> 409260018 (<.01%) cycles in affected programs: 4228 -> 4214 (-0.33%) helped: 7 HURT: 0 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.28% max: 2.04% x̄: 0.54% x̃: 0.28% 95% mean confidence interval for cycles value: -2.00 -2.00 95% mean confidence interval for cycles %-change: -1.15% 0.07% Inconclusive result (%-change mean confidence interval includes 0). Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vec4: Relax writemask condition in CSEIan Romanick2018-03-081-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the previously seen instruction generates more fields than the new instruction, still allow CSE to happen. This doesn't do much, but it also enables a couple more shaders in the next patch. It helped quite a bit in another change series that I have (at least for now) abandoned. v2: Add some extra comentary about the parameters to instructions_match. Suggested by Ken. No changes on Skylake, Broadwell, Iron Lake or GM45. Ivy Bridge and Haswell had similar results. (Ivy Bridge shown) total instructions in shared programs: 11780295 -> 11780294 (<.01%) instructions in affected programs: 302 -> 301 (-0.33%) helped: 1 HURT: 0 total cycles in shared programs: 257308315 -> 257308313 (<.01%) cycles in affected programs: 2074 -> 2072 (-0.10%) helped: 1 HURT: 0 Sandy Bridge total instructions in shared programs: 10506687 -> 10506686 (<.01%) instructions in affected programs: 335 -> 334 (-0.30%) helped: 1 HURT: 0 Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Merge CMP and SEL into CSEL on Gen8+Ian Romanick2018-03-082-0/+107
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | v2: Fix several problems handling inverted predicates. Add a much bigger comment around the BRW_CONDITIONAL_NZ case. v3: Allow uniforms and shader inputs as sources for the original SEL and CMP instructions. This enables a LOT more shaders to receive CSEL merging (5816 vs 8564 on SKL). v4: Report progress. Broadwell and Skylake had similar results. (Broadwell shown) helped: 8527 HURT: 0 helped stats (abs) min: 1 max: 27 x̄: 2.44 x̃: 1 helped stats (rel) min: 0.03% max: 17.80% x̄: 1.12% x̃: 0.70% 95% mean confidence interval for instructions value: -2.51 -2.36 95% mean confidence interval for instructions %-change: -1.15% -1.10% Instructions are helped. total cycles in shared programs: 559442317 -> 558288357 (-0.21%) cycles in affected programs: 372699860 -> 371545900 (-0.31%) helped: 6748 HURT: 1450 helped stats (abs) min: 1 max: 32000 x̄: 182.41 x̃: 12 helped stats (rel) min: <.01% max: 66.08% x̄: 3.42% x̃: 0.70% HURT stats (abs) min: 1 max: 2538 x̄: 53.08 x̃: 14 HURT stats (rel) min: <.01% max: 96.72% x̄: 3.32% x̃: 0.90% 95% mean confidence interval for cycles value: -179.01 -102.51 95% mean confidence interval for cycles %-change: -2.37% -2.08% Cycles are helped. LOST: 0 GAINED: 6 No changes on earlier platforms. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]> [v1] Reviewed-by: Kenneth Graunke <[email protected]> [v3] Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Add infrastructure for generating CSEL instructions.Kenneth Graunke2018-03-088-1/+34
| | | | | | | | | | | | | | | | v2 (idr): Don't allow CSEL with a non-float src2. v3 (idr): Add CSEL to fs_inst::flags_written. Suggested by Matt. v4 (idr): Only set BRW_ALIGN_16 on Gen < 10 (suggested by Matt). Don't reset the access mode afterwards (suggested by Samuel and Matt). Add support for CSEL not modifying the flags to more places (requested by Matt). Signed-off-by: Kenneth Graunke <[email protected]> Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]> [v3] Reviewed-by: Matt Turner <[email protected]>
* nir: Narrow some dot product operationsIan Romanick2018-03-081-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On vector platforms, this helps elide some constant loads. v2: Reorder the transformations. No changes on Broadwell or Skylake. Haswell total instructions in shared programs: 13093793 -> 13060163 (-0.26%) instructions in affected programs: 1277532 -> 1243902 (-2.63%) helped: 13216 HURT: 95 helped stats (abs) min: 1 max: 18 x̄: 2.56 x̃: 2 helped stats (rel) min: 0.21% max: 20.00% x̄: 3.63% x̃: 2.78% HURT stats (abs) min: 1 max: 6 x̄: 1.77 x̃: 1 HURT stats (rel) min: 0.09% max: 5.56% x̄: 1.25% x̃: 1.19% 95% mean confidence interval for instructions value: -2.57 -2.49 95% mean confidence interval for instructions %-change: -3.65% -3.54% Instructions are helped. total cycles in shared programs: 409580819 -> 409268463 (-0.08%) cycles in affected programs: 71730652 -> 71418296 (-0.44%) helped: 9898 HURT: 2352 helped stats (abs) min: 2 max: 16014 x̄: 37.08 x̃: 16 helped stats (rel) min: <.01% max: 35.55% x̄: 6.26% x̃: 4.50% HURT stats (abs) min: 2 max: 276 x̄: 23.25 x̃: 6 HURT stats (rel) min: <.01% max: 40.00% x̄: 3.54% x̃: 1.97% 95% mean confidence interval for cycles value: -33.19 -17.80 95% mean confidence interval for cycles %-change: -4.50% -4.26% Cycles are helped. total fills in shared programs: 82059 -> 82052 (<.01%) fills in affected programs: 21 -> 14 (-33.33%) helped: 7 HURT: 0 Sandy Bridge and Ivy Bridge had similar results (Ivy Bridge shown) total instructions in shared programs: 11811851 -> 11780605 (-0.26%) instructions in affected programs: 1155007 -> 1123761 (-2.71%) helped: 12304 HURT: 95 helped stats (abs) min: 1 max: 18 x̄: 2.55 x̃: 2 helped stats (rel) min: 0.21% max: 20.00% x̄: 3.69% x̃: 2.86% HURT stats (abs) min: 1 max: 6 x̄: 1.77 x̃: 1 HURT stats (rel) min: 0.09% max: 5.56% x̄: 1.25% x̃: 1.19% 95% mean confidence interval for instructions value: -2.56 -2.48 95% mean confidence interval for instructions %-change: -3.71% -3.59% Instructions are helped. total cycles in shared programs: 257618409 -> 257316805 (-0.12%) cycles in affected programs: 71999580 -> 71697976 (-0.42%) helped: 9155 HURT: 2380 helped stats (abs) min: 2 max: 16014 x̄: 38.44 x̃: 16 helped stats (rel) min: <.01% max: 35.75% x̄: 6.39% x̃: 4.62% HURT stats (abs) min: 2 max: 290 x̄: 21.14 x̃: 4 HURT stats (rel) min: <.01% max: 41.55% x̄: 3.14% x̃: 1.33% 95% mean confidence interval for cycles value: -34.32 -17.97 95% mean confidence interval for cycles %-change: -4.55% -4.29% Cycles are helped. GM45 and Iron Lake had nearly identical results (Iron Lake shown) total instructions in shared programs: 7886750 -> 7879944 (-0.09%) instructions in affected programs: 373781 -> 366975 (-1.82%) helped: 3715 HURT: 47 helped stats (abs) min: 1 max: 8 x̄: 1.86 x̃: 1 helped stats (rel) min: 0.22% max: 16.67% x̄: 2.88% x̃: 2.06% HURT stats (abs) min: 1 max: 6 x̄: 2.55 x̃: 2 HURT stats (rel) min: 1.09% max: 5.00% x̄: 1.93% x̃: 2.35% 95% mean confidence interval for instructions value: -1.85 -1.77 95% mean confidence interval for instructions %-change: -2.91% -2.73% Instructions are helped. total cycles in shared programs: 178114636 -> 178095452 (-0.01%) cycles in affected programs: 7227666 -> 7208482 (-0.27%) helped: 3349 HURT: 301 helped stats (abs) min: 2 max: 90 x̄: 6.55 x̃: 4 helped stats (rel) min: <.01% max: 14.18% x̄: 0.95% x̃: 0.63% HURT stats (abs) min: 2 max: 42 x̄: 9.13 x̃: 10 HURT stats (rel) min: 0.01% max: 11.19% x̄: 1.22% x̃: 1.50% 95% mean confidence interval for cycles value: -5.52 -4.99 95% mean confidence interval for cycles %-change: -0.81% -0.73% Cycles are helped. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]> [v1]
* i965: perf: consolidate unmapping oa perf bo outside accumulationLionel Landwerlin2018-03-081-4/+3
| | | | | | | | | Do this in one place outside the only caller of the accumulation function. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>