aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* meson: don't install windows headers on non-windows platformsMarc Dietrich2018-02-011-1/+3
| | | | | | | | Only dive into the windows subdir if windows platform is selected. Signed-off-by: Marc Dietrich <[email protected]> Fixes: 5ef75cb02b2b4db5506b8 "meson: build src/glx/windows" Reviewed-by: Eric Engestrom <[email protected]>
* radeonsi: use ac_build_buffer_load_format for image buffer loadsMarek Olšák2018-02-011-4/+10
| | | | Reviewed-by: Samuel Pitoiset <[email protected]>
* ac/nir: use ac_build_buffer_load_format for image buffer loadsMarek Olšák2018-02-011-8/+13
| | | | Reviewed-by: Samuel Pitoiset <[email protected]>
* ac: add glc parameter to ac_build_buffer_load_formatMarek Olšák2018-02-015-5/+7
| | | | Reviewed-by: Samuel Pitoiset <[email protected]>
* radeonsi: load the right number of components for VS inputs and TBOsMarek Olšák2018-02-014-5/+54
| | | | | | | | | | | | | | | | | | | | | | | The supported counts are 1, 2, 4. (3=4) The following snippet loads float, vec2, vec3, and vec4: Before: buffer_load_format_x v9, v4, s[0:3], 0 idxen ; E0002000 80000904 buffer_load_format_xyzw v[0:3], v5, s[8:11], 0 idxen ; E00C2000 80020005 s_waitcnt vmcnt(0) ; BF8C0F70 buffer_load_format_xyzw v[2:5], v6, s[12:15], 0 idxen ; E00C2000 80030206 s_waitcnt vmcnt(0) ; BF8C0F70 buffer_load_format_xyzw v[5:8], v7, s[4:7], 0 idxen ; E00C2000 80010507 After: buffer_load_format_x v10, v4, s[0:3], 0 idxen ; E0002000 80000A04 buffer_load_format_xy v[8:9], v5, s[8:11], 0 idxen ; E0042000 80020805 buffer_load_format_xyzw v[0:3], v6, s[12:15], 0 idxen ; E00C2000 80030006 s_waitcnt vmcnt(0) ; BF8C0F70 buffer_load_format_xyzw v[3:6], v7, s[4:7], 0 idxen ; E00C2000 80010307 Reviewed-by: Samuel Pitoiset <[email protected]>
* radeonsi: remove unused si_shader_context membersMarek Olšák2018-02-012-11/+0
| | | | Reviewed-by: Samuel Pitoiset <[email protected]>
* glx/apple: locate dispatch table functions to wrap by nameJon Turney2018-02-013-7/+22
| | | | | | | | | | | | | Avoid reaching into the dispatch table internals (and thus having to deal with the complexities of remap etc.) by identifying functions to wrap by name. See: https://lists.freedesktop.org/archives/mesa-dev/2015-June/086721.html et seq. https://bugs.freedesktop.org/show_bug.cgi?id=90311 Signed-off-by: Jon Turney <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* glx/apple: include util/debug.h for env_var_as_boolean prototypeJon Turney2018-02-012-0/+2
| | | | | | | | mesa/src/glx/glxcmds.c:1295:21: error: implicit declaration of function 'env_var_as_boolean' is invalid in C99 [-Werror,-Wimplicit-function-declaration] mesa/src/glx/apple/apple_visual.c:85:28: error: implicit declaration of function 'env_var_as_boolean' is invalid in C99 [-Werror,-Wimplicit-function-declaration] Signed-off-by: Jon Turney <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* osx: ld doesn't support --build-idJon Turney2018-02-012-1/+14
| | | | | Signed-off-by: Jon Turney <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* configure: Default to gbm=no on osxJon Turney2018-02-011-2/+2
| | | | | Signed-off-by: Jon Turney <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* mesa: remove usage of alloca in externalobjects.c v4Andres Rodriguez2018-02-011-12/+48
| | | | | | | | | | | | Don't want an overly large numBufferBarriers/numTextureBarriers to blow up the stack. v2: handle malloc errors v3: fix patch v4: initialize texObjs/bufObjs Suggested-by: Emil Velikov <[email protected]> Signed-off-by: Andres Rodriguez <[email protected]>
* radv: do not insert shaders in cache when it's disabledSamuel Pitoiset2018-02-011-5/+24
| | | | | | | | | | | | | When the application doesn't provide its own pipeline cache, the driver uses a in-memory cache but it shouldn't insert any entries when the cache is explicitely disabled by the user. Found while running my experimental pipeline-db tool with a ton of shaders, the memory footprint was just huge, and sometimes the process was even killed... Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: use separate bindings for graphics and compute descriptorsSamuel Pitoiset2018-02-013-53/+125
| | | | | | | | | | | | | The Vulkan spec says: "pipelineBindPoint is a VkPipelineBindPoint indicating whether the descriptors will be used by graphics pipelines or compute pipelines. There is a separate set of bind points for each of graphics and compute, so binding one does not disturb the other." Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104732 Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: store the bind point when creating descriptors with templatesSamuel Pitoiset2018-02-012-0/+2
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* r600/eg: make sure we allow vpm bit on other CF ops.Dave Airlie2018-02-011-0/+1
| | | | | | | the vpm bit wasn't being applied to the push/pop instructions. Reviewed-by: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* gallium/st/clover: remove unused PIPE_SHADER_IR_LLVMTimothy Arceri2018-02-017-19/+5
| | | | | | This has been unused since 100796c15c3a. Acked-by: Marek Olšák <[email protected]>
* r600/sb: just add some missing debug bitsDave Airlie2018-02-011-0/+15
| | | | Signed-off-by: Dave Airlie <[email protected]>
* r600: fix buffer resinfo opcode translation.Dave Airlie2018-02-012-2/+2
| | | | | | | | | The vtx operations never got translated, so things worked by 0 being equal to 0, translate them so we can use the proper buffer resinfo code. Reviewed-by: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* st/glsl_to_nir: add more nir opts to st_nir_opts()Timothy Arceri2018-02-011-16/+20
| | | | | | | | | | | | | | | | All of the current gallium nir driver use these optimisations but they do so in their backends. Having these called in the backend only can cause a number of problems: - Shader compile times are greater because the opts need to do significant passes over all shader variants. - The shader cache is partially defeated due to the significant optimisation passes over variants. - We might miss out on nir linking optimisation opportunities. Adding these passes to st_nir_opts() alleviates these problems. Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: perform 2 uploads with dual slot *64*PASSTHRU formats on gen<8Andres Gomez2018-01-311-5/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The emission of vertex attributes corresponding to dvec3 and dvec4 vertex shader input variables was not correct when the <size> passed to the VertexAttribL* commands was <= 2. In 61a8a55f557 ("i965/gen8: Fix vertex attrib upload for dvec3/4 shader inputs"), for gen8+ we needed to determine if the attrib was dual slot to emit 128 or 256-bit, independently of the VAO size. Similarly, for gen < 8 we also need to determine whether the attrib is dual slot to force the emission of 256-bits through 2 uploads. Additionally, we make use of the ISL_FORMAT_R32_FLOAT format in this second upload to fill these unspecified components with zeros, as we also do for gen8+. Fixes the following test on Haswell: KHR-GL46.vertex_attrib_binding.basic-inputL-case1 v2: Added more inline comments to explain why we are using ISL_FORMAT_R32_FLOAT and its consequences, as requested by Alejandro and Antía. Fixes: 75968a668e4 ("i965/gen7: expose OpenGL 4.2 on Haswell when supported") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103006 Cc: Alejandro Piñeiro <[email protected]> Cc: Juan A. Suarez Romero <[email protected]> Cc: Antia Puentes <[email protected]> Cc: Rafael Antognolli <[email protected]> Cc: Kenneth Graunke <[email protected]> Signed-off-by: Andres Gomez <[email protected]> Reviewed-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Antia Puentes <[email protected]> Reviewed-by: Rafael Antognolli <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Make texture validation code use texture objects, not units.Kenneth Graunke2018-01-312-16/+17
| | | | | | | | This requires moving the _MaxLevel handling up to the callers. Another user of intel_finalize_mipmap_tree will be added later that depends on _MaxLevel not being modified. Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Pass tObj into intel_update_max_level instead of intel_obj.Kenneth Graunke2018-01-311-3/+3
| | | | | | | We want both anyway, but this will simplify things a tiny bit in an upcoming patch. Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Delete more misleading comments.Kenneth Graunke2018-01-311-3/+0
| | | | | | | brw_bo_wait_rendering used to take a brw_context pointer for perf_debug messages about stalls. Chris eliminated that in 833108ac14ade91f54cc6e. This message about passing NULL to avoid those warnings is no longer relevant, and just adds confusion. So, drop it.
* docs/features: mark EXT_semaphore(_fd) as DONE v2Andres Rodriguez2018-01-312-3/+4
| | | | | | | | Support for these extensions is available in radeonsi. v2: also updated relnotes Signed-off-by: Andres Rodriguez <[email protected]>
* st/mesa: whitespace, formatting fixes in st_glsl_to_tgsi.cppBrian Paul2018-01-311-104/+169
| | | | Reviewed-by: Charmaine Lee <[email protected]>
* st/mesa: s/int/GLenum/ in st_glsl_to_tgsi.cppBrian Paul2018-01-311-5/+6
| | | | Reviewed-by: Charmaine Lee <[email protected]>
* svga: use opcode local var to simplify some codeBrian Paul2018-01-311-4/+2
| | | | Reviewed-by: Charmaine Lee <[email protected]>
* svga: s/unsigned/VGPU10_OPCODE_TYPE/Brian Paul2018-01-311-10/+11
| | | | Reviewed-by: Charmaine Lee <[email protected]>
* radv: do not dump meta shader statsSamuel Pitoiset2018-01-312-21/+18
| | | | | | | That's quite useless and that pollutes the output. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/nir: fix emission of ffract for 64-bitSamuel Pitoiset2018-01-311-7/+16
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* meson: dedup gallium-xa logicEric Engestrom2018-01-311-15/+10
| | | | | Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Dylan Baker <[email protected]>
* meson: dedup gallium-va logicEric Engestrom2018-01-311-20/+18
| | | | | Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Dylan Baker <[email protected]>
* meson: dedup gallium-omx logicEric Engestrom2018-01-311-19/+19
| | | | | Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Dylan Baker <[email protected]>
* meson: dedup gallium-xvmc logicEric Engestrom2018-01-311-20/+18
| | | | | Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Dylan Baker <[email protected]>
* meson: dedup gallium-vdpau logicEric Engestrom2018-01-311-22/+19
| | | | | Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Dylan Baker <[email protected]>
* Revert "mesa: add missing RGB9_E5 format in _mesa_base_fbo_format"Antia Puentes2018-01-311-3/+0
| | | | | | | | | | | | | | | | | | | | This reverts commit 513c2263cbff45edb105c7b46e58f316e06746ab. _mesa_base_fbo_format_ is used to validate the internalformat passed to RenderbufferStorage, which in the OpenGL 4.6 is said: "An INVALID_ENUM error is generated if internalformat is not one of the color-renderable, depth-renderable, or stencil-renderable formats defined in section 9.4." RGB9_E5 format is not renderable, as stated in the same specification (Bug 9338). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104794 Cc: Juan A. Suarez Romero <[email protected]> Cc: Kenneth Graunke <[email protected]> Reviewed-by: Juan A. Suarez <[email protected]>
* winsys/radeon: Compute is_displayable in surf_drm_to_winsysMichel Dänzer2018-01-311-0/+3
| | | | | | | | It was always 0, breaking (at least) DRI3 with Xwayland. Bugzilla: https://bugs.freedesktop.org/104306 Fixes: 5f2073be3282 ("ac/surface: add ac_surface::is_displayable") Reviewed-by: Marek Olšák <[email protected]>
* radv: remove predication on cache flushesMatthew Nicholls2018-01-314-18/+13
| | | | | | | | | This can lead to a situation where cache flushes could get conditionally disabled while still clearing the flush_bits, and thus flushes due to application pipeline barriers may never get executed. Fixes: a6c2001ace (radv: add support for cmd predication.) Signed-off-by: Dave Airlie <[email protected]>
* mesa: fix broken glGet*(GL_POLYGON_MODE) queryBrian Paul2018-01-302-3/+3
| | | | | | | | | This reverts part of the patch which introduced the GLenum16 change. Fixes a conform regression found by Roland. Fixes: f96a69f916aed405 ("mesa: replace GLenum with GLenum16 in common structures (v4)") Reviewed-by: Roland Scheidegger <[email protected]>
* virgl: also remove dimension on indirect.Dave Airlie2018-01-311-1/+0
| | | | | | | | | This fixes some dEQP tests that generated bad shaders. Fixes: b6f6ead19 (virgl: drop const dimensions on first block.) Reviewed-by: Gurchetan Singh <[email protected]> Tested-by: Gurchetan Singh <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* radeonsi: remove DBG_PRECOMPILEMarek Olšák2018-01-313-51/+0
| | | | | | it's useless and shader-db stats only report the main shader part. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: print shader-db stats for main parts, not final binariesMarek Olšák2018-01-313-13/+23
| | | | | | This is needed to get shader-db stats for LS,HS,ES,GS stages on gfx9. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: move max_simd_waves computation into a separate functionMarek Olšák2018-01-312-12/+23
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* mesa: fix glGet MAX_VERTEX_ATTRIB queriesMarek Olšák2018-01-311-3/+3
| | | | | | Broken by f96a69f916aed40519e755d0460a83940a587 Reviewed-by: Brian Paul <[email protected]>
* anv/cmd_buffer: Re-emit the pipeline at every subpassJason Ekstrand2018-01-301-0/+11
| | | | | | | | | | If we ever hit this edge-case, it can theoretically cause problem for CNL because we could end up changing render targets without re-emitting 3DSTATE_MULTISAMPLE which is part of the pipeline. Just get rid of the edge case. Cc: [email protected] Reviewed-by: Iago Toral Quiroga <[email protected]>
* nir: Distribute binary operations with constants into bcselIan Romanick2018-01-301-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This was specifically designed to simplify 1+mix(0, a-1, condition) to mix(1, a, condition) by pushing the 1+ inside. Skylake, Broadwell, and Haswell had similar results. Skylake shown. total instructions in shared programs: 14521753 -> 14521716 (<.01%) instructions in affected programs: 10619 -> 10582 (-0.35%) helped: 51 HURT: 14 helped stats (abs) min: 1 max: 12 x̄: 1.43 x̃: 1 helped stats (rel) min: 0.20% max: 3.58% x̄: 1.01% x̃: 0.95% HURT stats (abs) min: 1 max: 11 x̄: 2.57 x̃: 1 HURT stats (rel) min: 0.22% max: 1.75% x̄: 1.20% x̃: 1.32% 95% mean confidence interval for instructions value: -1.31 0.17 95% mean confidence interval for instructions %-change: -0.80% -0.27% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 533000205 -> 533003533 (<.01%) cycles in affected programs: 110610 -> 113938 (3.01%) helped: 43 HURT: 28 helped stats (abs) min: 6 max: 440 x̄: 27.12 x̃: 16 helped stats (rel) min: 0.39% max: 4.84% x̄: 1.60% x̃: 1.67% HURT stats (abs) min: 2 max: 3066 x̄: 160.50 x̃: 14 HURT stats (rel) min: 0.08% max: 77.78% x̄: 5.16% x̃: 0.62% 95% mean confidence interval for cycles value: -43.81 137.56 95% mean confidence interval for cycles %-change: -1.47% 3.60% Inconclusive result (value mean confidence interval includes 0). Ivy Bridge total instructions in shared programs: 10018840 -> 10018713 (<.01%) instructions in affected programs: 9431 -> 9304 (-1.35%) helped: 51 HURT: 3 helped stats (abs) min: 1 max: 80 x̄: 2.76 x̃: 1 helped stats (rel) min: 0.20% max: 16.43% x̄: 1.16% x̃: 0.81% HURT stats (abs) min: 1 max: 12 x̄: 4.67 x̃: 1 HURT stats (rel) min: 0.22% max: 1.33% x̄: 0.59% x̃: 0.22% 95% mean confidence interval for instructions value: -5.36 0.66 95% mean confidence interval for instructions %-change: -1.66% -0.46% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 87571944 -> 87572785 (<.01%) cycles in affected programs: 117234 -> 118075 (0.72%) helped: 42 HURT: 23 helped stats (abs) min: 2 max: 114 x̄: 51.90 x̃: 30 helped stats (rel) min: 0.11% max: 11.01% x̄: 4.45% x̃: 2.74% HURT stats (abs) min: 1 max: 2341 x̄: 131.35 x̃: 10 HURT stats (rel) min: 0.06% max: 37.11% x̄: 2.75% x̃: 0.61% 95% mean confidence interval for cycles value: -61.05 86.93 95% mean confidence interval for cycles %-change: -3.47% -0.33% Inconclusive result (value mean confidence interval includes 0). Sandy Bridge total instructions in shared programs: 10542933 -> 10542844 (<.01%) instructions in affected programs: 11487 -> 11398 (-0.77%) helped: 52 HURT: 3 helped stats (abs) min: 1 max: 40 x̄: 1.96 x̃: 1 helped stats (rel) min: 0.08% max: 8.16% x̄: 0.90% x̃: 0.72% HURT stats (abs) min: 1 max: 11 x̄: 4.33 x̃: 1 HURT stats (rel) min: 0.22% max: 1.22% x̄: 0.55% x̃: 0.22% 95% mean confidence interval for instructions value: -3.17 -0.07 95% mean confidence interval for instructions %-change: -1.13% -0.52% Instructions are helped. total cycles in shared programs: 146098397 -> 146097094 (<.01%) cycles in affected programs: 128140 -> 126837 (-1.02%) helped: 47 HURT: 8 helped stats (abs) min: 2 max: 333 x̄: 29.21 x̃: 18 helped stats (rel) min: 0.13% max: 5.04% x̄: 1.18% x̃: 0.95% HURT stats (abs) min: 1 max: 16 x̄: 8.75 x̃: 9 HURT stats (rel) min: 0.08% max: 0.43% x̄: 0.30% x̃: 0.34% 95% mean confidence interval for cycles value: -37.49 -9.90 95% mean confidence interval for cycles %-change: -1.22% -0.71% Cycles are helped. Iron Lake total instructions in shared programs: 7886711 -> 7886509 (<.01%) instructions in affected programs: 10425 -> 10223 (-1.94%) helped: 50 HURT: 2 helped stats (abs) min: 1 max: 78 x̄: 4.08 x̃: 1 helped stats (rel) min: 0.34% max: 15.38% x̄: 1.12% x̃: 0.54% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.86% max: 0.91% x̄: 0.89% x̃: 0.89% 95% mean confidence interval for instructions value: -8.05 0.28 95% mean confidence interval for instructions %-change: -1.83% -0.26% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 178115324 -> 178114612 (<.01%) cycles in affected programs: 765726 -> 765014 (-0.09%) helped: 39 HURT: 1 helped stats (abs) min: 2 max: 276 x̄: 18.31 x̃: 8 helped stats (rel) min: <.01% max: 8.47% x̄: 0.39% x̃: 0.04% HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 0.03% max: 0.03% x̄: 0.03% x̃: 0.03% 95% mean confidence interval for cycles value: -32.07 -3.53 95% mean confidence interval for cycles %-change: -0.86% 0.10% Inconclusive result (%-change mean confidence interval includes 0). GM45 total instructions in shared programs: 4857762 -> 4857661 (<.01%) instructions in affected programs: 5523 -> 5422 (-1.83%) helped: 25 HURT: 1 helped stats (abs) min: 1 max: 78 x̄: 4.08 x̃: 1 helped stats (rel) min: 0.34% max: 13.61% x̄: 1.04% x̃: 0.52% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.86% max: 0.86% x̄: 0.86% x̃: 0.86% 95% mean confidence interval for instructions value: -9.99 2.22 95% mean confidence interval for instructions %-change: -2.01% 0.08% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 122179674 -> 122179194 (<.01%) cycles in affected programs: 530162 -> 529682 (-0.09%) helped: 22 HURT: 1 helped stats (abs) min: 2 max: 292 x̄: 21.91 x̃: 7 helped stats (rel) min: <.01% max: 8.65% x̄: 0.44% x̃: 0.04% HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 0.03% max: 0.03% x̄: 0.03% x̃: 0.03% 95% mean confidence interval for cycles value: -46.56 4.82 95% mean confidence interval for cycles %-change: -1.20% 0.36% Inconclusive result (value mean confidence interval includes 0). Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Elie Tournier <[email protected]>
* nir: Rearrange logic op-compounded integer comparesIan Romanick2018-01-301-0/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Skylake and Broadwell had similar results. Skylake shown. total instructions in shared programs: 14521769 -> 14521753 (<.01%) instructions in affected programs: 8782 -> 8766 (-0.18%) helped: 16 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.12% max: 0.40% x̄: 0.20% x̃: 0.18% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -0.23% -0.16% Instructions are helped. total cycles in shared programs: 533000376 -> 533000205 (<.01%) cycles in affected programs: 447035 -> 446864 (-0.04%) helped: 9 HURT: 9 helped stats (abs) min: 2 max: 40 x̄: 35.78 x̃: 40 helped stats (rel) min: 0.02% max: 0.18% x̄: 0.10% x̃: 0.09% HURT stats (abs) min: 1 max: 52 x̄: 16.78 x̃: 10 HURT stats (rel) min: <.01% max: 1.11% x̄: 0.29% x̃: 0.12% 95% mean confidence interval for cycles value: -25.07 6.07 95% mean confidence interval for cycles %-change: -0.08% 0.27% Inconclusive result (value mean confidence interval includes 0). No changes on GM45, Iron Lake, Sandy Bridge, Ivy Bridge, or Haswell. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Elie Tournier <[email protected]>
* nir: Rearrange and-compounded float comparesIan Romanick2018-01-301-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If both comparisons are used as sources for instructions other than the iand, this transformation is detrimental. If the non-identical value in both compares is constant, the fmin or fmax will be constant-folded away, so the transformation is always a win. It is interesting to me that on Iron Lake only 81 shaders have instruction counts changed, but 726 shaders have cycle counts changed. shader-db results: Skylake total instructions in shared programs: 14525728 -> 14521017 (-0.03%) instructions in affected programs: 1164726 -> 1160015 (-0.40%) helped: 1692 HURT: 5 helped stats (abs) min: 1 max: 637 x̄: 2.79 x̃: 2 helped stats (rel) min: 0.07% max: 16.36% x̄: 0.81% x̃: 0.33% HURT stats (abs) min: 1 max: 12 x̄: 3.20 x̃: 1 HURT stats (rel) min: 0.38% max: 2.86% x̄: 2.36% x̃: 2.86% 95% mean confidence interval for instructions value: -3.52 -2.03 95% mean confidence interval for instructions %-change: -0.86% -0.74% Instructions are helped. total cycles in shared programs: 533115449 -> 532991404 (-0.02%) cycles in affected programs: 119401803 -> 119277758 (-0.10%) helped: 1145 HURT: 467 helped stats (abs) min: 1 max: 34644 x̄: 145.92 x̃: 18 helped stats (rel) min: <.01% max: 45.33% x̄: 1.58% x̃: 0.42% HURT stats (abs) min: 1 max: 1590 x̄: 92.15 x̃: 15 HURT stats (rel) min: <.01% max: 13.48% x̄: 1.26% x̃: 0.39% 95% mean confidence interval for cycles value: -122.16 -31.74 95% mean confidence interval for cycles %-change: -0.94% -0.57% Cycles are helped. total spills in shared programs: 9597 -> 9534 (-0.66%) spills in affected programs: 403 -> 340 (-15.63%) helped: 1 HURT: 1 total fills in shared programs: 13904 -> 13790 (-0.82%) fills in affected programs: 1627 -> 1513 (-7.01%) helped: 2 HURT: 1 LOST: 0 GAINED: 2 Broadwell total instructions in shared programs: 14816966 -> 14812590 (-0.03%) instructions in affected programs: 1499885 -> 1495509 (-0.29%) helped: 1672 HURT: 15 helped stats (abs) min: 1 max: 455 x̄: 2.70 x̃: 2 helped stats (rel) min: 0.05% max: 16.36% x̄: 0.81% x̃: 0.33% HURT stats (abs) min: 1 max: 21 x̄: 9.20 x̃: 8 HURT stats (rel) min: 0.08% max: 2.86% x̄: 1.06% x̃: 0.53% 95% mean confidence interval for instructions value: -3.14 -2.05 95% mean confidence interval for instructions %-change: -0.85% -0.73% Instructions are helped. total cycles in shared programs: 559353622 -> 559345595 (<.01%) cycles in affected programs: 139893703 -> 139885676 (<.01%) helped: 921 HURT: 697 helped stats (abs) min: 1 max: 42424 x̄: 143.45 x̃: 18 helped stats (rel) min: <.01% max: 36.23% x̄: 2.02% x̃: 0.87% HURT stats (abs) min: 1 max: 2370 x̄: 178.03 x̃: 38 HURT stats (rel) min: <.01% max: 17.35% x̄: 0.71% x̃: 0.14% 95% mean confidence interval for cycles value: -59.64 49.72 95% mean confidence interval for cycles %-change: -1.02% -0.66% Inconclusive result (value mean confidence interval includes 0). total spills in shared programs: 78902 -> 78861 (-0.05%) spills in affected programs: 2418 -> 2377 (-1.70%) helped: 1 HURT: 11 total fills in shared programs: 83782 -> 83678 (-0.12%) fills in affected programs: 3515 -> 3411 (-2.96%) helped: 2 HURT: 11 LOST: 0 GAINED: 5 Haswell and Ivy Bridge had similar results. Haswell shown. total instructions in shared programs: 9033898 -> 9032010 (-0.02%) instructions in affected programs: 308064 -> 306176 (-0.61%) helped: 921 HURT: 4 helped stats (abs) min: 1 max: 20 x̄: 2.05 x̃: 1 helped stats (rel) min: 0.17% max: 17.54% x̄: 0.80% x̃: 0.35% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 3.23% max: 3.23% x̄: 3.23% x̃: 3.23% 95% mean confidence interval for instructions value: -2.21 -1.87 95% mean confidence interval for instructions %-change: -0.88% -0.68% Instructions are helped. total cycles in shared programs: 84628949 -> 84620520 (<.01%) cycles in affected programs: 2164913 -> 2156484 (-0.39%) helped: 518 HURT: 359 helped stats (abs) min: 1 max: 440 x̄: 41.52 x̃: 20 helped stats (rel) min: <.01% max: 17.17% x̄: 1.95% x̃: 1.01% HURT stats (abs) min: 1 max: 586 x̄: 36.43 x̃: 8 HURT stats (rel) min: 0.04% max: 18.65% x̄: 1.47% x̃: 0.40% 95% mean confidence interval for cycles value: -15.17 -4.05 95% mean confidence interval for cycles %-change: -0.77% -0.32% Cycles are helped. LOST: 0 GAINED: 4 Sandy Bridge total instructions in shared programs: 10544860 -> 10542933 (-0.02%) instructions in affected programs: 360019 -> 358092 (-0.54%) helped: 931 HURT: 4 helped stats (abs) min: 1 max: 20 x̄: 2.07 x̃: 1 helped stats (rel) min: 0.11% max: 15.52% x̄: 0.68% x̃: 0.30% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 3.33% max: 3.33% x̄: 3.33% x̃: 3.33% 95% mean confidence interval for instructions value: -2.23 -1.89 95% mean confidence interval for instructions %-change: -0.76% -0.58% Instructions are helped. total cycles in shared programs: 146106820 -> 146098397 (<.01%) cycles in affected programs: 3435047 -> 3426624 (-0.25%) helped: 572 HURT: 329 helped stats (abs) min: 1 max: 1289 x̄: 32.52 x̃: 15 helped stats (rel) min: <.01% max: 26.29% x̄: 0.97% x̃: 0.33% HURT stats (abs) min: 1 max: 1714 x̄: 30.93 x̃: 6 HURT stats (rel) min: 0.02% max: 41.31% x̄: 1.13% x̃: 0.19% 95% mean confidence interval for cycles value: -16.85 -1.85 95% mean confidence interval for cycles %-change: -0.39% -0.01% Cycles are helped. LOST: 1 GAINED: 0 Iron Lake total instructions in shared programs: 7886925 -> 7886711 (<.01%) instructions in affected programs: 25763 -> 25549 (-0.83%) helped: 75 HURT: 6 helped stats (abs) min: 1 max: 13 x̄: 3.33 x̃: 1 helped stats (rel) min: 0.35% max: 17.57% x̄: 1.96% x̃: 0.53% HURT stats (abs) min: 1 max: 16 x̄: 6.00 x̃: 1 HURT stats (rel) min: 2.86% max: 4.79% x̄: 3.49% x̃: 2.86% 95% mean confidence interval for instructions value: -3.69 -1.60 95% mean confidence interval for instructions %-change: -2.54% -0.57% Instructions are helped. total cycles in shared programs: 178116888 -> 178115324 (<.01%) cycles in affected programs: 5858790 -> 5857226 (-0.03%) helped: 484 HURT: 242 helped stats (abs) min: 2 max: 76 x̄: 5.27 x̃: 6 helped stats (rel) min: 0.01% max: 10.70% x̄: 0.18% x̃: 0.06% HURT stats (abs) min: 2 max: 76 x̄: 4.07 x̃: 2 HURT stats (rel) min: 0.01% max: 3.99% x̄: 0.19% x̃: 0.03% 95% mean confidence interval for cycles value: -2.76 -1.55 95% mean confidence interval for cycles %-change: -0.12% 0.01% Inconclusive result (%-change mean confidence interval includes 0). GM45 total instructions in shared programs: 4857870 -> 4857762 (<.01%) instructions in affected programs: 13994 -> 13886 (-0.77%) helped: 39 HURT: 5 helped stats (abs) min: 1 max: 13 x̄: 3.28 x̃: 2 helped stats (rel) min: 0.33% max: 17.11% x̄: 1.86% x̃: 0.48% HURT stats (abs) min: 1 max: 16 x̄: 4.00 x̃: 1 HURT stats (rel) min: 2.86% max: 4.71% x̄: 3.23% x̃: 2.86% 95% mean confidence interval for instructions value: -3.86 -1.05 95% mean confidence interval for instructions %-change: -2.61% 0.04% Inconclusive result (%-change mean confidence interval includes 0). total cycles in shared programs: 122180744 -> 122179674 (<.01%) cycles in affected programs: 3686646 -> 3685576 (-0.03%) helped: 273 HURT: 141 helped stats (abs) min: 2 max: 76 x̄: 5.81 x̃: 6 helped stats (rel) min: 0.01% max: 10.70% x̄: 0.18% x̃: 0.06% HURT stats (abs) min: 2 max: 76 x̄: 3.66 x̃: 2 HURT stats (rel) min: 0.01% max: 3.99% x̄: 0.16% x̃: 0.02% 95% mean confidence interval for cycles value: -3.42 -1.75 95% mean confidence interval for cycles %-change: -0.15% 0.03% Inconclusive result (%-change mean confidence interval includes 0). Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Elie Tournier <[email protected]>
* nir: Separate a weird compare with zero to two compares with zeroIan Romanick2018-01-301-0/+2
| | | | | | | | | | | min(a+b, c+d) >= 0 becomes (a+b >= 0 && c+d >= 0). No shader-db changes, but it does prevent 6 to 12 instruction regressions in the next patch on all measured Intel platforms. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Elie Tournier <[email protected]>
* nir: Simplify min and max of b2fIan Romanick2018-01-301-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | v2: Rebase on almost 2 years. Require that one of the arguments to fmin or fmax be used only once. This prevents some regressions. shader-db results: Skylake and Broadwell had similar results. Skylake shown. total instructions in shared programs: 14526021 -> 14525913 (<.01%) instructions in affected programs: 4613 -> 4505 (-2.34%) helped: 31 HURT: 0 helped stats (abs) min: 1 max: 4 x̄: 3.48 x̃: 4 helped stats (rel) min: 0.62% max: 6.67% x̄: 3.31% x̃: 2.42% total cycles in shared programs: 533118710 -> 533118403 (<.01%) cycles in affected programs: 34334 -> 34027 (-0.89%) helped: 24 HURT: 0 helped stats (abs) min: 4 max: 24 x̄: 12.79 x̃: 14 helped stats (rel) min: 0.25% max: 2.40% x̄: 1.08% x̃: 1.03% No changes on GM45, Iron Lake, Sandy Bridge, Ivy Bridge, or Haswell. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Elie Tournier <[email protected]>