mesa.git - Unnamed repository; edit this file 'description' to name the repository.

	Commit message (Collapse)	Author	Age	Files	Lines
*	v3d: refactor some code from v3d40_vir_emit_image_load_store	Alejandro Piñeiro	2019-07-12	1	-33/+29
\| \| \| \| \| \| \|	And moved to new auxiliar method v3d40_image_load_store_tmu_op, equivalent to the nir_to_nir v3d_general_tmu_op, to clean-up a little. Reviewed-by: Eric Anholt <[email protected]>
*	v3d: use inc/dec tmu operation with atomic sub/add of 1	Alejandro Piñeiro	2019-07-12	2	-6/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Among other things, this avoid the need of loading 1/-1 constants (so one less operation). The removed comment suggest the option of adding support on NIR for inc/dec. Intel just uses an auxiliar method to get which hw operation is needed, so no lowering is needed. And at the same time, being so small, seems unreasonable to try to add a general one on NIR itself. It is more easy to just adapt the method here (that is what the patch does right now). It is worth to note that we are not getting any change on shader-db stats because all those methods are used on the usual shader-db set with shaders needing GLSL > 4.2. In general there aren't too many GLSL ES 3.1 tests. As an alternative, we captured the GLES3/GLSL31/GLS32 used on vk-gl-cts, even if that is not a real life usage of shaders. With those we get the following: total instructions in shared programs: 1217022 -> 1217013 (<.01%) instructions in affected programs: 117 -> 108 (-7.69%) helped: 6 HURT: 0 helped stats (abs) min: 1 max: 2 x̄: 1.50 x̃: 1 helped stats (rel) min: 3.57% max: 10.00% x̄: 8.09% x̃: 9.09% 95% mean confidence interval for instructions value: -2.07 -0.93 95% mean confidence interval for instructions %-change: -10.54% -5.64% Instructions are helped. Note that the shaders helped are really low because most of the vk-gl-cts tests using AtomicInc/Dec/Add are mostly used on compute shaders. Although right now there is a branch around with CS support, the usual is doing the stats against master. Reviewed-by: Eric Anholt <[email protected]>
*	v3d: remove redefinition of tmu operations on nir_to_vir	Alejandro Piñeiro	2019-07-12	1	-38/+21
\| \| \| \| \| \| \| \| \| \| \| \|	They are already defined, although is a slightly different format on the generated packet headers, so it was needed to change how it is used on nir_to_vir. In addition to allow to remove some duplicated headers, it will allow to define just one get_op_for_atomic_add aux method later to support using inc/dec instead of add of 1/-1. Reviewed-by: Eric Anholt <[email protected]>
*	v3d: tweak initial comment on pack generator script	Alejandro Piñeiro	2019-07-12	1	-1/+1
\| \| \| \| \| \| \|	As the files it mentions to use as reference has slightly different names. Reviewed-by: Eric Anholt <[email protected]>
*	glsl/link_varyings: Fix hash table leak	Yevhenii Kolesnikov	2019-07-12	1	-9/+8
\| \| \| \| \| \| \| \| \| \| \|	Hash tables were not destroyed at return. v2: Use ralloc_context (Eric Anholt) Signed-off-by: Yevhenii Kolesnikov <[email protected]> Acked-by: Lionel Landwerlin <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
*	iris: Simplify devinfo access in calculate_result_on_gpu()	Kenneth Graunke	2019-07-12	1	-3/+2
\| \| \| \|	We have devinfo, no need for screen->devinfo.
*	v3d: remove unused definitions	Iago Toral Quiroga	2019-07-12	1	-7/+0
\| \| \| \|	Reviewed-by: Eric Anholt <[email protected]>
*	v3d: move implementation of some intrinsics to separate helpers	Iago Toral Quiroga	2019-07-12	1	-78/+90
\| \| \| \|	Reviewed-by: Eric Anholt <[email protected]>
*	v3d: emit correct lowering for logic ops with RGB10A2 render targets	Iago Toral Quiroga	2019-07-12	1	-12/+64
\| \| \| \|	Reviewed-by: Eric Anholt <[email protected]>
*	v3d: emit correct lowering for logic ops with integer render targets	Iago Toral Quiroga	2019-07-12	2	-9/+47
\| \| \| \|	Reviewed-by: Eric Anholt <[email protected]>
*	v3d: add lowering for OpenGL logic operations	Iago Toral Quiroga	2019-07-12	4	-0/+279
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This implements support for OpenGL logic operations by emitting code to read from the TLB if needed and blending the fragment output accordingly. It is similar to VC4's blend lowering pass, but exclusive to logic operations, since blending is otherwise supported in hardware. The pass doesn't handle MSAA targets yet. Fixes the following piglit tests: spec/!opengl 1.0/gl-1.0-logicop/* spec/!opengl 1.1/gl-1.1-xor spec/!opengl 1.1/gl-1.1-xor-copypixels It also fixes text cursor rendering in Libreoffice with the GTK+2 theme, which is rendered via glamor using the XOR logic operation. v2: fix checks for allowed variable location and maximum render target (Eric) Reviewed-by: Eric Anholt <[email protected]>
*	v3d: acquire scoreboard lock before first tlb read	Iago Toral Quiroga	2019-07-12	4	-0/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Until now we have always been emitting our scoreboard locks on the last thread switch to improve parallelism. We did this by emitting our last thread switch right before our tlb writes at the very end of the program, where we know that we are outside control flow. Unfortunately, this strategy is not valid when we have tlb color reads too, as these will happen before this point in the program and can happen inside control flow. To fix this we always emit a thread switch before the first tlb load and if we see additional thread switches after that point, we change the strategy to lock on the first thread switch. v2: change the solution so it is expected to work in more scenarios (Eric). Reviewed-by: Eric Anholt <[email protected]>
*	v3d: implement tile buffer color read intrinsic	Iago Toral Quiroga	2019-07-12	1	-0/+100
\| \| \| \| \| \| \| \| \| \| \| \|	We will be emitting this intrinsic to signal TLB color loads when we implement OpenGL logic operations, where we need to blend the fragment shader color output with the existing color in the render target. Per-sample TLB reads are not supported yet. v2: fix the offset into the color_reads array (Eric). Reviewed-by: Eric Anholt <[email protected]>
*	nir: add a new v3d-specific intrinsic for tile buffer color reads	Iago Toral Quiroga	2019-07-12	1	-0/+9
\| \| \| \| \| \| \| \| \| \|	This is intended to be used, for example, with OpenGL logic operations. It takes a render target as source and a sample index in the base index for MSAA color reads. v2: drop the CAN_ELIMINATE and CAN_REORDER flags (Eric). Reviewed-by: Eric Anholt <[email protected]>
*	v3d: fix size of color_reads and sample_colors arrays	Iago Toral Quiroga	2019-07-12	1	-2/+2
\| \| \| \| \| \| \| \| \|	We need to scale the size of these arrays to consider up to V3D_MAX_DRAW_BUFFERS render targets and 4 components per color. v2: we want to store each color component separately, so scale by 4 too. Reviewed-by: Eric Anholt <[email protected]>
*	v3d: add color formats and swizzles to the fragment shader key	Iago Toral Quiroga	2019-07-12	2	-0/+20
\| \| \| \| \| \| \|	We are going to need these very soon to emit correct reads from the tlb to implement logic operations. Reviewed-by: Eric Anholt <[email protected]>
*	v3d: add helpers to emit ldtlb and ldtlbu signals	Iago Toral Quiroga	2019-07-12	1	-0/+24
\| \| \| \| \| \| \| \| \| \|	The ldtlbu version will read an implicit uniform with the TLB read specifier and should be used for the first read in a sequence of TLB reads (unless the default configuration is valid, in which case we can use ldtlb). The ldtlb version is used for any subsequent TLB read in the sequence. Reviewed-by: Eric Anholt <[email protected]>
*	v3d: handle tlb read dependency tracking as if they were writes	Iago Toral Quiroga	2019-07-12	1	-1/+1
\| \| \| \| \| \|	Tile buffer reads are emitted as ordered sequences and cannot be reordered. Reviewed-by: Eric Anholt <[email protected]>
*	v3d: instructions with the ldtlb and ldtlbu signals are tlb instructions	Iago Toral Quiroga	2019-07-12	1	-0/+3
\| \| \| \|	Reviewed-by: Eric Anholt <[email protected]>
*	v3d: tlb loads cannot be removed	Iago Toral Quiroga	2019-07-12	1	-0/+2
\| \| \| \| \| \| \|	Loads from the tile buffer are emitted in ordered sequences so we cannot eliminate or reorder any of them. Reviewed-by: Eric Anholt <[email protected]>
*	v3d: the ldtlbu signal reads an implicit uniform	Iago Toral Quiroga	2019-07-12	1	-0/+1
\| \| \| \|	Reviewed-by: Eric Anholt <[email protected]>
*	v3d: handle ldtlb and ldtlbu signals during disassembly	Iago Toral Quiroga	2019-07-12	1	-0/+2
\| \| \| \| \| \| \| \|	We already have code to print these signals but the early return in the code that checks if any signals are present present was missing the checks for them, so it would skip printing them unless they were paired with other signals. Reviewed-by: Eric Anholt <[email protected]>
*	radv: report shader stage name when dumping LLVM IR	Samuel Pitoiset	2019-07-12	1	-4/+17
\| \| \| \| \| \| \|	For debugging purposes. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	radv: tidy up radv_get_shader_name() and add NGG stages	Samuel Pitoiset	2019-07-12	3	-12/+32
\| \| \| \| \|	Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	radv/gfx10: update OVERWRITE_COMBINER_{MRT_SHARING,WATERMARK}	Samuel Pitoiset	2019-07-12	1	-16/+4
\| \| \| \| \| \| \|	DCC related, mirror RadeonSI. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Dave Airlie <[email protected]
*	radv/gfx10: do not set alignment on the ngg_emit pointer	Samuel Pitoiset	2019-07-12	1	-1/+0
\| \| \| \| \| \| \|	This is invalid and this fixes a crash in LLVM. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	radv/gfx10: fix exporting clip/cull distances for GS	Samuel Pitoiset	2019-07-12	1	-1/+2
\| \| \| \| \| \| \|	This fixes dEQP-VK.clipping.user_defined.clip_distance.geom. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	radv/gfx10: fix exporting the subpass view index for GS	Samuel Pitoiset	2019-07-12	1	-1/+15
\| \| \| \| \| \| \|	This fixes dEQP-VK.multiview.geometry. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	mesa: save/restore SSO flag when using ARB_get_program_binary	Timothy Arceri	2019-07-12	1	-0/+4
\| \| \| \| \| \| \| \| \| \|	Without this the restored program will fail the pipeline validation checks when we attempt to use an SSO program. Fixes: c20fd744fef1 ("mesa: Add Mesa ARB_get_program_binary helper functions") Reviewed-by: Jordan Justen <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111010
*	pan/midgard: Correct component count clamping PSIZ	Alyssa Rosenzweig	2019-07-11	1	-1/+2
\| \| \| \| \| \| \| \| \| \|	Kind of a funky corner case that does not (as far as I know) apply to organic shaders from GLES but does pop up in generated shaders from the fixed-function desktop pipeline. Fixes: bb483a91663f ("panfrost: Clamp point size") Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	panfrost: Remove unused display target field	Alyssa Rosenzweig	2019-07-11	2	-4/+0
\| \| \| \|	Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	panfrost/ci: Update expectations	Alyssa Rosenzweig	2019-07-11	1	-2/+1
\| \| \| \|	Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	radv: only enable the GS copy shader stage if GS is enabled	Samuel Pitoiset	2019-07-11	1	-1/+1
\| \| \| \| \| \|	Ooops. Signed-off-by: Samuel Pitoiset <[email protected]>
*	freedreno: Add dependency on the xml build to the winsys.	Eric Anholt	2019-07-11	1	-1/+4
\| \| \| \| \| \| \| \|	The screen header includes the common xml, and otherwise we might race to build before it's done. Fixes: e03259974e2f ("freedreno: Generate headers from xml files") Reviewed-by: Kristian H. Kristensen <[email protected]>
*	iris: Disable SIMD32 when using a 16x MSAA framebuffer.	Kenneth Graunke	2019-07-11	1	-7/+24
\| \| \| \|	We weren't doing this documented workaround because it's sorta painful.
*	nir/algebraic: Recognize open-coded flrp(a, b, a)	Ian Romanick	2019-07-11	1	-0/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	No shader-db changes Ice Lake, Iron Lake, or GM45 as these platforms lack a LRP instruction. v2: Remove flrp@64 cases. Since Gen11 removes flrp@32, it seems unlikely that we'll ever have a flrp@64. Should that occur, the cases can be added back. All Gen6-Gen9 platforms had similar results. (Skylake shown) total instructions in shared programs: 15041996 -> 15041184 (<.01%) instructions in affected programs: 71776 -> 70964 (-1.13%) helped: 312 HURT: 0 helped stats (abs) min: 2 max: 3 x̄: 2.60 x̃: 3 helped stats (rel) min: 0.36% max: 4.55% x̄: 1.75% x̃: 1.28% 95% mean confidence interval for instructions value: -2.66 -2.55 95% mean confidence interval for instructions %-change: -1.89% -1.61% Instructions are helped. total cycles in shared programs: 354303333 -> 354301807 (<.01%) cycles in affected programs: 433742 -> 432216 (-0.35%) helped: 206 HURT: 78 helped stats (abs) min: 2 max: 244 x̄: 21.02 x̃: 8 helped stats (rel) min: 0.06% max: 19.59% x̄: 1.72% x̃: 0.82% HURT stats (abs) min: 1 max: 220 x̄: 35.95 x̃: 10 HURT stats (rel) min: 0.07% max: 30.48% x̄: 2.53% x̃: 0.56% 95% mean confidence interval for cycles value: -10.68 -0.06 95% mean confidence interval for cycles %-change: -0.99% -0.12% Cycles are helped. Reviewed-by: Matt Turner <[email protected]>
*	nir/algebraic: Rearrange 1-((1-a) * (1-b)) into flrp-friendly form	Ian Romanick	2019-07-11	1	-0/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	No shader-db changes Ice Lake, Iron Lake, or GM45 as these platforms lack a LRP instruction. v2: Convert the pattern directly to flrp. There were negligible improvements on Gen4 and Gen5, and Gen11 was actually hurt. I believe the problem is this optimization conflicts with the (1-x)*y => ffma(-x, y, y) optimization on Gen11. Skylake total instructions in shared programs: 15046487 -> 15041996 (-0.03%) instructions in affected programs: 194681 -> 190190 (-2.31%) helped: 880 HURT: 20 helped stats (abs) min: 1 max: 19 x̄: 5.13 x̃: 4 helped stats (rel) min: 0.19% max: 36.36% x̄: 4.85% x̃: 3.33% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.11% max: 1.06% x̄: 0.28% x̃: 0.17% 95% mean confidence interval for instructions value: -5.25 -4.73 95% mean confidence interval for instructions %-change: -5.11% -4.36% Instructions are helped. total cycles in shared programs: 354340839 -> 354303333 (-0.01%) cycles in affected programs: 1753622 -> 1716116 (-2.14%) helped: 786 HURT: 182 helped stats (abs) min: 1 max: 1842 x̄: 56.52 x̃: 22 helped stats (rel) min: 0.03% max: 43.17% x̄: 3.90% x̃: 2.84% HURT stats (abs) min: 1 max: 440 x̄: 37.99 x̃: 9 HURT stats (rel) min: 0.03% max: 29.37% x̄: 1.96% x̃: 0.32% 95% mean confidence interval for cycles value: -45.90 -31.59 95% mean confidence interval for cycles %-change: -3.09% -2.50% Cycles are helped. All Gen6-Gen8 platforms had similar results. (Broadwell shown) total instructions in shared programs: 15055907 -> 15051466 (-0.03%) instructions in affected programs: 196370 -> 191929 (-2.26%) helped: 871 HURT: 26 helped stats (abs) min: 1 max: 19 x̄: 5.13 x̃: 4 helped stats (rel) min: 0.19% max: 36.36% x̄: 4.76% x̃: 3.27% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.11% max: 1.06% x̄: 0.24% x̃: 0.12% 95% mean confidence interval for instructions value: -5.21 -4.69 95% mean confidence interval for instructions %-change: -4.99% -4.24% Instructions are helped. total cycles in shared programs: 387729170 -> 387699745 (<.01%) cycles in affected programs: 1816409 -> 1786984 (-1.62%) helped: 788 HURT: 172 helped stats (abs) min: 1 max: 662 x̄: 47.29 x̃: 22 helped stats (rel) min: 0.03% max: 31.26% x̄: 3.55% x̃: 2.76% HURT stats (abs) min: 1 max: 404 x̄: 45.59 x̃: 14 HURT stats (rel) min: 0.03% max: 22.92% x̄: 1.53% x̃: 0.43% 95% mean confidence interval for cycles value: -35.69 -25.61 95% mean confidence interval for cycles %-change: -2.88% -2.40% Cycles are helped. total fills in shared programs: 34712 -> 34710 (<.01%) fills in affected programs: 7 -> 5 (-28.57%) helped: 1 HURT: 0 LOST: 0 GAINED: 2 Reviewed-by: Matt Turner <[email protected]>
*	nir/algebraic: Reassociate fadd into fmul in DPH-like pattern	Ian Romanick	2019-07-11	2	-0/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Moving the add to the other end of the sequence allows it to be fused into an FMA. Ice Lake total instructions in shared programs: 17173074 -> 16933147 (-1.40%) instructions in affected programs: 7938745 -> 7698818 (-3.02%) helped: 35583 HURT: 90 helped stats (abs) min: 1 max: 716 x̄: 6.75 x̃: 6 helped stats (rel) min: 0.10% max: 53.04% x̄: 5.29% x̃: 3.45% HURT stats (abs) min: 1 max: 41 x̄: 2.46 x̃: 1 HURT stats (rel) min: 0.32% max: 8.33% x̄: 1.41% x̃: 0.77% 95% mean confidence interval for instructions value: -6.80 -6.65 95% mean confidence interval for instructions %-change: -5.32% -5.22% Instructions are helped. total cycles in shared programs: 360881386 -> 359533568 (-0.37%) cycles in affected programs: 189489144 -> 188141326 (-0.71%) helped: 27250 HURT: 6707 helped stats (abs) min: 1 max: 21997 x̄: 62.15 x̃: 16 helped stats (rel) min: <.01% max: 70.69% x̄: 4.04% x̃: 2.35% HURT stats (abs) min: 1 max: 3507 x̄: 51.56 x̃: 14 HURT stats (rel) min: <.01% max: 77.26% x̄: 2.72% x̃: 1.27% 95% mean confidence interval for cycles value: -44.70 -34.68 95% mean confidence interval for cycles %-change: -2.75% -2.65% Cycles are helped. total spills in shared programs: 8943 -> 8829 (-1.27%) spills in affected programs: 625 -> 511 (-18.24%) helped: 6 HURT: 3 total fills in shared programs: 21815 -> 21719 (-0.44%) fills in affected programs: 1653 -> 1557 (-5.81%) helped: 7 HURT: 10 LOST: 11 GAINED: 3 Skylake and Broadwell had similar results. (Skylake shown) total instructions in shared programs: 15271996 -> 15040882 (-1.51%) instructions in affected programs: 7193699 -> 6962585 (-3.21%) helped: 33985 HURT: 30 helped stats (abs) min: 1 max: 260 x̄: 6.80 x̃: 6 helped stats (rel) min: 0.10% max: 30.00% x̄: 5.54% x̃: 3.85% HURT stats (abs) min: 1 max: 41 x̄: 4.00 x̃: 3 HURT stats (rel) min: 0.20% max: 2.16% x̄: 1.46% x̃: 1.72% 95% mean confidence interval for instructions value: -6.87 -6.72 95% mean confidence interval for instructions %-change: -5.59% -5.48% Instructions are helped. total cycles in shared programs: 355520785 -> 354253799 (-0.36%) cycles in affected programs: 185869148 -> 184602162 (-0.68%) helped: 25824 HURT: 6287 helped stats (abs) min: 1 max: 21997 x̄: 61.66 x̃: 16 helped stats (rel) min: <.01% max: 42.05% x̄: 4.18% x̃: 2.41% HURT stats (abs) min: 1 max: 3327 x̄: 51.76 x̃: 14 HURT stats (rel) min: <.01% max: 101.62% x̄: 2.80% x̃: 1.28% 95% mean confidence interval for cycles value: -44.70 -34.21 95% mean confidence interval for cycles %-change: -2.87% -2.76% Cycles are helped. total spills in shared programs: 8835 -> 8818 (-0.19%) spills in affected programs: 613 -> 596 (-2.77%) helped: 5 HURT: 2 total fills in shared programs: 21738 -> 21744 (0.03%) fills in affected programs: 1348 -> 1354 (0.45%) helped: 5 HURT: 11 LOST: 0 GAINED: 12 Haswell total instructions in shared programs: 13447102 -> 13381508 (-0.49%) instructions in affected programs: 3770735 -> 3705141 (-1.74%) helped: 11999 HURT: 29 helped stats (abs) min: 1 max: 409 x̄: 5.60 x̃: 3 helped stats (rel) min: 0.10% max: 20.00% x̄: 2.38% x̃: 1.87% HURT stats (abs) min: 3 max: 750 x̄: 54.90 x̃: 3 HURT stats (rel) min: 0.12% max: 125.30% x̄: 9.96% x̃: 1.82% 95% mean confidence interval for instructions value: -5.71 -5.19 95% mean confidence interval for instructions %-change: -2.39% -2.30% Instructions are helped. total cycles in shared programs: 376342236 -> 375690458 (-0.17%) cycles in affected programs: 155699021 -> 155047243 (-0.42%) helped: 8397 HURT: 2876 helped stats (abs) min: 1 max: 20248 x̄: 109.87 x̃: 18 helped stats (rel) min: <.01% max: 40.71% x̄: 2.23% x̃: 1.49% HURT stats (abs) min: 1 max: 15414 x̄: 94.15 x̃: 22 HURT stats (rel) min: <.01% max: 432.49% x̄: 3.15% x̃: 1.41% 95% mean confidence interval for cycles value: -67.64 -48.00 95% mean confidence interval for cycles %-change: -0.99% -0.74% Cycles are helped. total spills in shared programs: 23134 -> 23184 (0.22%) spills in affected programs: 1675 -> 1725 (2.99%) helped: 13 HURT: 11 total fills in shared programs: 34550 -> 34686 (0.39%) fills in affected programs: 1421 -> 1557 (9.57%) helped: 13 HURT: 11 LOST: 0 GAINED: 11 Ivy Bridge total instructions in shared programs: 12019642 -> 11987285 (-0.27%) instructions in affected programs: 1532236 -> 1499879 (-2.11%) helped: 5522 HURT: 110 helped stats (abs) min: 1 max: 312 x̄: 6.22 x̃: 3 helped stats (rel) min: 0.16% max: 20.00% x̄: 2.46% x̃: 1.88% HURT stats (abs) min: 1 max: 750 x̄: 18.07 x̃: 3 HURT stats (rel) min: 0.09% max: 125.30% x̄: 3.42% x̃: 1.15% 95% mean confidence interval for instructions value: -6.25 -5.24 95% mean confidence interval for instructions %-change: -2.43% -2.26% Instructions are helped. total cycles in shared programs: 180214667 -> 179761900 (-0.25%) cycles in affected programs: 31448723 -> 30995956 (-1.44%) helped: 7191 HURT: 2838 helped stats (abs) min: 1 max: 17680 x̄: 88.47 x̃: 17 helped stats (rel) min: <.01% max: 50.45% x̄: 2.16% x̃: 1.40% HURT stats (abs) min: 1 max: 15540 x̄: 64.63 x̃: 24 HURT stats (rel) min: 0.02% max: 435.17% x̄: 3.10% x̃: 1.51% 95% mean confidence interval for cycles value: -53.34 -36.95 95% mean confidence interval for cycles %-change: -0.81% -0.53% Cycles are helped. total spills in shared programs: 3599 -> 3642 (1.19%) spills in affected programs: 1180 -> 1223 (3.64%) helped: 12 HURT: 2 total fills in shared programs: 4031 -> 4162 (3.25%) fills in affected programs: 876 -> 1007 (14.95%) helped: 12 HURT: 2 LOST: 6 GAINED: 5 Sandy Bridge total instructions in shared programs: 10850686 -> 10822890 (-0.26%) instructions in affected programs: 1247986 -> 1220190 (-2.23%) helped: 4699 HURT: 102 helped stats (abs) min: 1 max: 104 x̄: 6.02 x̃: 3 helped stats (rel) min: 0.15% max: 17.65% x̄: 2.44% x̃: 1.88% HURT stats (abs) min: 1 max: 16 x̄: 4.70 x̃: 3 HURT stats (rel) min: 0.09% max: 3.85% x̄: 1.11% x̃: 1.10% 95% mean confidence interval for instructions value: -6.10 -5.47 95% mean confidence interval for instructions %-change: -2.42% -2.30% Instructions are helped. total cycles in shared programs: 154044149 -> 153920095 (-0.08%) cycles in affected programs: 26037392 -> 25913338 (-0.48%) helped: 5974 HURT: 2521 helped stats (abs) min: 1 max: 1802 x̄: 35.42 x̃: 16 helped stats (rel) min: <.01% max: 35.80% x̄: 1.43% x̃: 0.84% HURT stats (abs) min: 1 max: 862 x̄: 34.73 x̃: 20 HURT stats (rel) min: 0.01% max: 36.33% x̄: 1.67% x̃: 0.85% 95% mean confidence interval for cycles value: -16.31 -12.90 95% mean confidence interval for cycles %-change: -0.56% -0.45% Cycles are helped. total spills in shared programs: 2876 -> 2957 (2.82%) spills in affected programs: 592 -> 673 (13.68%) helped: 6 HURT: 35 total fills in shared programs: 3157 -> 3134 (-0.73%) fills in affected programs: 402 -> 379 (-5.72%) helped: 6 HURT: 0 LOST: 5 GAINED: 11 Reviewed-by: Matt Turner <[email protected]>
*	nir/algebraic: Recognize open-coded flrp(-1, 1, a) and flrp(1, -1, a)	Ian Romanick	2019-07-11	1	-0/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	v2: Remove flrp@64 cases. Since Gen11 removes flrp@32, it seems unlikely that we'll ever have a flrp@64. Should that occur, the cases can be added back. v3: Add a couple more patterns that just move the negation around. No shader-db changes Ice Lake, Iron Lake, or GM45 as these platforms lack a LRP instruction. Skylake total instructions in shared programs: 15279687 -> 15256058 (-0.15%) instructions in affected programs: 4344440 -> 4320811 (-0.54%) helped: 23455 HURT: 18 helped stats (abs) min: 1 max: 21 x̄: 1.01 x̃: 1 helped stats (rel) min: 0.02% max: 13.33% x̄: 0.86% x̃: 0.65% HURT stats (abs) min: 1 max: 2 x̄: 1.06 x̃: 1 HURT stats (rel) min: 0.13% max: 1.16% x̄: 0.43% x̃: 0.34% 95% mean confidence interval for instructions value: -1.01 -1.00 95% mean confidence interval for instructions %-change: -0.87% -0.85% Instructions are helped. total cycles in shared programs: 355593755 -> 355339981 (-0.07%) cycles in affected programs: 162089552 -> 161835778 (-0.16%) helped: 20467 HURT: 7158 helped stats (abs) min: 1 max: 2074 x̄: 29.00 x̃: 6 helped stats (rel) min: <.01% max: 35.71% x̄: 1.71% x̃: 0.58% HURT stats (abs) min: 1 max: 4814 x̄: 47.46 x̃: 11 HURT stats (rel) min: <.01% max: 125.43% x̄: 2.88% x̃: 0.98% 95% mean confidence interval for cycles value: -10.39 -7.98 95% mean confidence interval for cycles %-change: -0.57% -0.47% Cycles are helped. total spills in shared programs: 8843 -> 8835 (-0.09%) spills in affected programs: 190 -> 182 (-4.21%) helped: 2 HURT: 0 total fills in shared programs: 21738 -> 21738 (0.00%) fills in affected programs: 372 -> 372 (0.00%) helped: 1 HURT: 1 LOST: 12 GAINED: 22 Broadwell total instructions in shared programs: 15290523 -> 15266818 (-0.16%) instructions in affected programs: 4314738 -> 4291033 (-0.55%) helped: 23391 HURT: 11 helped stats (abs) min: 1 max: 119 x̄: 1.02 x̃: 1 helped stats (rel) min: 0.02% max: 13.33% x̄: 0.86% x̃: 0.65% HURT stats (abs) min: 1 max: 189 x̄: 18.09 x̃: 1 HURT stats (rel) min: 0.11% max: 5.39% x̄: 0.98% x̃: 0.50% 95% mean confidence interval for instructions value: -1.04 -0.99 95% mean confidence interval for instructions %-change: -0.87% -0.85% Instructions are helped. total cycles in shared programs: 388911660 -> 388830827 (-0.02%) cycles in affected programs: 172903324 -> 172822491 (-0.05%) helped: 15601 HURT: 13269 helped stats (abs) min: 1 max: 1986 x̄: 29.18 x̃: 6 helped stats (rel) min: <.01% max: 36.60% x̄: 1.74% x̃: 0.55% HURT stats (abs) min: 1 max: 14904 x̄: 28.21 x̃: 6 HURT stats (rel) min: <.01% max: 102.58% x̄: 1.77% x̃: 0.60% 95% mean confidence interval for cycles value: -4.20 -1.40 95% mean confidence interval for cycles %-change: -0.17% -0.08% Cycles are helped. total spills in shared programs: 23110 -> 23069 (-0.18%) spills in affected programs: 656 -> 615 (-6.25%) helped: 3 HURT: 1 total fills in shared programs: 34399 -> 34398 (<.01%) fills in affected programs: 905 -> 904 (-0.11%) helped: 3 HURT: 1 LOST: 6 GAINED: 23 Haswell total instructions in shared programs: 13465303 -> 13441142 (-0.18%) instructions in affected programs: 3726999 -> 3702838 (-0.65%) helped: 22139 HURT: 347 helped stats (abs) min: 1 max: 43 x̄: 1.11 x̃: 1 helped stats (rel) min: 0.03% max: 10.00% x̄: 1.01% x̃: 0.75% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.35% max: 11.11% x̄: 1.48% x̃: 1.12% 95% mean confidence interval for instructions value: -1.08 -1.07 95% mean confidence interval for instructions %-change: -0.99% -0.96% Instructions are helped. total cycles in shared programs: 376271308 -> 376273090 (<.01%) cycles in affected programs: 167496811 -> 167498593 (<.01%) helped: 13206 HURT: 13281 helped stats (abs) min: 1 max: 3864 x̄: 35.39 x̃: 8 helped stats (rel) min: <.01% max: 53.10% x̄: 2.31% x̃: 0.80% HURT stats (abs) min: 1 max: 3828 x̄: 35.32 x̃: 8 HURT stats (rel) min: <.01% max: 117.85% x̄: 2.88% x̃: 0.61% 95% mean confidence interval for cycles value: -1.33 1.47 95% mean confidence interval for cycles %-change: 0.22% 0.36% Inconclusive result (value mean confidence interval includes 0). total spills in shared programs: 23158 -> 23134 (-0.10%) spills in affected programs: 24 -> 0 helped: 3 HURT: 0 total fills in shared programs: 34580 -> 34550 (-0.09%) fills in affected programs: 30 -> 0 helped: 3 HURT: 0 LOST: 23 GAINED: 13 Ivy Bridge total instructions in shared programs: 12034154 -> 12014301 (-0.16%) instructions in affected programs: 3636209 -> 3616356 (-0.55%) helped: 18771 HURT: 459 helped stats (abs) min: 1 max: 43 x̄: 1.08 x̃: 1 helped stats (rel) min: 0.03% max: 10.00% x̄: 0.91% x̃: 0.68% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.34% max: 8.33% x̄: 1.43% x̃: 1.11% 95% mean confidence interval for instructions value: -1.04 -1.02 95% mean confidence interval for instructions %-change: -0.86% -0.84% Instructions are helped. total cycles in shared programs: 180186960 -> 180175147 (<.01%) cycles in affected programs: 44652745 -> 44640932 (-0.03%) helped: 12979 HURT: 11033 helped stats (abs) min: 1 max: 5836 x̄: 32.88 x̃: 6 helped stats (rel) min: <.01% max: 53.10% x̄: 2.19% x̃: 0.74% HURT stats (abs) min: 1 max: 4811 x̄: 37.61 x̃: 9 HURT stats (rel) min: <.01% max: 115.18% x̄: 2.99% x̃: 0.69% 95% mean confidence interval for cycles value: -2.29 1.31 95% mean confidence interval for cycles %-change: 0.11% 0.26% Inconclusive result (value mean confidence interval includes 0). total spills in shared programs: 3623 -> 3599 (-0.66%) spills in affected programs: 24 -> 0 helped: 3 HURT: 0 total fills in shared programs: 4061 -> 4031 (-0.74%) fills in affected programs: 30 -> 0 helped: 3 HURT: 0 LOST: 17 GAINED: 18 Sandy Bridge total instructions in shared programs: 10853968 -> 10834932 (-0.18%) instructions in affected programs: 3769957 -> 3750921 (-0.50%) helped: 17944 HURT: 204 helped stats (abs) min: 1 max: 3 x̄: 1.07 x̃: 1 helped stats (rel) min: 0.02% max: 10.00% x̄: 0.83% x̃: 0.60% HURT stats (abs) min: 1 max: 2 x̄: 1.01 x̃: 1 HURT stats (rel) min: 0.31% max: 9.09% x̄: 1.83% x̃: 0.93% 95% mean confidence interval for instructions value: -1.05 -1.04 95% mean confidence interval for instructions %-change: -0.81% -0.78% Instructions are helped. total cycles in shared programs: 153894864 -> 153885988 (<.01%) cycles in affected programs: 50643925 -> 50635049 (-0.02%) helped: 9361 HURT: 10534 helped stats (abs) min: 1 max: 1966 x̄: 19.42 x̃: 4 helped stats (rel) min: <.01% max: 34.97% x̄: 0.90% x̃: 0.22% HURT stats (abs) min: 1 max: 1371 x̄: 16.42 x̃: 5 HURT stats (rel) min: <.01% max: 55.10% x̄: 0.81% x̃: 0.27% 95% mean confidence interval for cycles value: -1.27 0.38 95% mean confidence interval for cycles %-change: -0.03% 0.04% Inconclusive result (value mean confidence interval includes 0). LOST: 6 GAINED: 24 Reviewed-by: Matt Turner <[email protected]>
*	nir: intel/vec4: Add flag to disable some algebraic optimizations	Ian Romanick	2019-07-11	2	-0/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A couple patches later in this series use the flag to avoid a few thousand shader-db regresions on all vec4 platforms. I'm not particularly enamored with the name of this flag. However, I suspect the Intel vec4 backend is the only backend that will benefit from it. Specifically, the cases where this helps are all cases where we want to prevent nir_opt_algebraic from rearranging instructions to create 3-source instructions, such as ffma and flrp, with additional immediate value or uniform sources. The earlier commit "intel/vec4: Try to emit a single load for multiple 3-src instruction operands" solves most of the problems caused by additional immediate values, but the restrictions on register strides that cause problems for uniforms and shader inputs persist. Reviewed-by: Matt Turner <[email protected]>
*	intel/vec4: Try to emit immediate sources for MOV	Ian Romanick	2019-07-11	1	-4/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Per the comment in vec4_visitor::nir_emit_load_const, further improvement is possible in this area. That case would be more complicated as I think we'd want to check that all users of the nir_load_const_instr result intended to use the value as float. No shader-db changes on any Gen8+ platform as these platforms do not use the vec4 backend. v2: Massive rebase on eeebeb211f1 ("intel/vec4: Try emitting non-scalar immediates"). This commit is about twice as helpful since b04beaf41d2 ("intel/vec4: Try both sources as candidates for being immediates"). Haswell and Ivy Bridge had similar results. (Haswell shown) total instructions in shared programs: 13478598 -> 13474068 (-0.03%) instructions in affected programs: 589452 -> 584922 (-0.77%) helped: 2773 HURT: 0 helped stats (abs) min: 1 max: 7 x̄: 1.63 x̃: 1 helped stats (rel) min: 0.16% max: 5.66% x̄: 0.96% x̃: 0.83% 95% mean confidence interval for instructions value: -1.67 -1.60 95% mean confidence interval for instructions %-change: -0.98% -0.94% Instructions are helped. total cycles in shared programs: 376386916 -> 376369392 (<.01%) cycles in affected programs: 16871628 -> 16854104 (-0.10%) helped: 2293 HURT: 523 helped stats (abs) min: 2 max: 812 x̄: 13.80 x̃: 2 helped stats (rel) min: <.01% max: 10.18% x̄: 1.02% x̃: 0.36% HURT stats (abs) min: 2 max: 316 x̄: 26.99 x̃: 14 HURT stats (rel) min: <.01% max: 19.34% x̄: 2.15% x̃: 1.43% 95% mean confidence interval for cycles value: -7.87 -4.58 95% mean confidence interval for cycles %-change: -0.52% -0.34% Cycles are helped. Sandy Bridge total instructions in shared programs: 10860328 -> 10857675 (-0.02%) instructions in affected programs: 335907 -> 333254 (-0.79%) helped: 1639 HURT: 0 helped stats (abs) min: 1 max: 5 x̄: 1.62 x̃: 1 helped stats (rel) min: 0.10% max: 5.26% x̄: 0.86% x̃: 0.70% 95% mean confidence interval for instructions value: -1.67 -1.57 95% mean confidence interval for instructions %-change: -0.89% -0.84% Instructions are helped. total cycles in shared programs: 153942720 -> 153934120 (<.01%) cycles in affected programs: 5604818 -> 5596218 (-0.15%) helped: 1494 HURT: 97 helped stats (abs) min: 2 max: 256 x̄: 7.84 x̃: 2 helped stats (rel) min: 0.01% max: 6.62% x̄: 0.35% x̃: 0.18% HURT stats (abs) min: 2 max: 160 x̄: 32.02 x̃: 20 HURT stats (rel) min: 0.02% max: 3.37% x̄: 0.88% x̃: 0.56% 95% mean confidence interval for cycles value: -6.45 -4.36 95% mean confidence interval for cycles %-change: -0.32% -0.23% Cycles are helped. Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8139378 -> 8137267 (-0.03%) instructions in affected programs: 265616 -> 263505 (-0.79%) helped: 1148 HURT: 0 helped stats (abs) min: 1 max: 5 x̄: 1.84 x̃: 1 helped stats (rel) min: 0.22% max: 4.76% x̄: 0.87% x̃: 0.62% 95% mean confidence interval for instructions value: -1.90 -1.78 95% mean confidence interval for instructions %-change: -0.90% -0.83% Instructions are helped. total cycles in shared programs: 188541756 -> 188537540 (<.01%) cycles in affected programs: 9807004 -> 9802788 (-0.04%) helped: 1143 HURT: 4 helped stats (abs) min: 2 max: 10 x̄: 3.70 x̃: 2 helped stats (rel) min: <.01% max: 3.01% x̄: 0.13% x̃: 0.06% HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 0.18% max: 0.18% x̄: 0.18% x̃: 0.18% 95% mean confidence interval for cycles value: -3.80 -3.55 95% mean confidence interval for cycles %-change: -0.14% -0.12% Cycles are helped. Reviewed-by: Matt Turner <[email protected]>
*	intel/vec4: Try to emit a VF source in try_immediate_source	Ian Romanick	2019-07-11	1	-12/+33
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit is also a pre-requisite for the next commit. No shader-db changes on any Gen8+ platform as these platforms do not use the vec4 backend. v2: Massive rebase on eeebeb211f1 ("intel/vec4: Try emitting non-scalar immediates"). This change is a lot less helpful since that commit landed (previously helped 1934 shaders on HSW) because, apparently, a lot of the cases helped by that commit were things like vector loads of { 1.0, 1.0, 1.0 } that were also helped by this commit. Haswell total instructions in shared programs: 13480095 -> 13478598 (-0.01%) instructions in affected programs: 229534 -> 228037 (-0.65%) helped: 1006 HURT: 0 helped stats (abs) min: 1 max: 7 x̄: 1.49 x̃: 1 helped stats (rel) min: 0.04% max: 3.45% x̄: 1.11% x̃: 1.09% 95% mean confidence interval for instructions value: -1.54 -1.43 95% mean confidence interval for instructions %-change: -1.15% -1.07% Instructions are helped. total cycles in shared programs: 376385734 -> 376386916 (<.01%) cycles in affected programs: 14101380 -> 14102562 (<.01%) helped: 941 HURT: 56 helped stats (abs) min: 2 max: 322 x̄: 5.62 x̃: 2 helped stats (rel) min: <.01% max: 7.74% x̄: 0.51% x̃: 0.42% HURT stats (abs) min: 2 max: 618 x̄: 115.50 x̃: 32 HURT stats (rel) min: 0.03% max: 4.62% x̄: 0.83% x̃: 0.44% 95% mean confidence interval for cycles value: -2.06 4.43 95% mean confidence interval for cycles %-change: -0.47% -0.39% Inconclusive result (value mean confidence interval includes 0). Ivy Bridge total instructions in shared programs: 12048004 -> 12046589 (-0.01%) instructions in affected programs: 217072 -> 215657 (-0.65%) helped: 934 HURT: 0 helped stats (abs) min: 1 max: 7 x̄: 1.51 x̃: 1 helped stats (rel) min: 0.04% max: 3.45% x̄: 1.14% x̃: 1.11% 95% mean confidence interval for instructions value: -1.57 -1.46 95% mean confidence interval for instructions %-change: -1.18% -1.10% Instructions are helped. total cycles in shared programs: 180285854 -> 180287608 (<.01%) cycles in affected programs: 14103824 -> 14105578 (0.01%) helped: 871 HURT: 53 helped stats (abs) min: 2 max: 322 x̄: 5.51 x̃: 2 helped stats (rel) min: <.01% max: 7.67% x̄: 0.50% x̃: 0.42% HURT stats (abs) min: 2 max: 618 x̄: 123.66 x̃: 32 HURT stats (rel) min: 0.03% max: 4.47% x̄: 0.92% x̃: 0.46% 95% mean confidence interval for cycles value: -1.60 5.39 95% mean confidence interval for cycles %-change: -0.46% -0.37% Inconclusive result (value mean confidence interval includes 0). Sandy Bridge total instructions in shared programs: 10861227 -> 10860328 (<.01%) instructions in affected programs: 92969 -> 92070 (-0.97%) helped: 624 HURT: 0 helped stats (abs) min: 1 max: 7 x̄: 1.44 x̃: 1 helped stats (rel) min: 0.11% max: 3.45% x̄: 1.05% x̃: 0.95% 95% mean confidence interval for instructions value: -1.52 -1.36 95% mean confidence interval for instructions %-change: -1.09% -1.01% Instructions are helped. total cycles in shared programs: 153944316 -> 153942720 (<.01%) cycles in affected programs: 1640956 -> 1639360 (-0.10%) helped: 601 HURT: 15 helped stats (abs) min: 2 max: 120 x̄: 3.56 x̃: 2 helped stats (rel) min: 0.02% max: 6.33% x̄: 0.18% x̃: 0.08% HURT stats (abs) min: 2 max: 72 x̄: 36.13 x̃: 36 HURT stats (rel) min: 0.05% max: 3.84% x̄: 1.95% x̃: 2.00% 95% mean confidence interval for cycles value: -3.44 -1.74 95% mean confidence interval for cycles %-change: -0.18% -0.09% Cycles are helped. Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8139924 -> 8139378 (<.01%) instructions in affected programs: 69776 -> 69230 (-0.78%) helped: 322 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 1.70 x̃: 1 helped stats (rel) min: 0.27% max: 3.23% x̄: 0.79% x̃: 0.54% 95% mean confidence interval for instructions value: -1.88 -1.51 95% mean confidence interval for instructions %-change: -0.85% -0.72% Instructions are helped. total cycles in shared programs: 188542864 -> 188541756 (<.01%) cycles in affected programs: 3031532 -> 3030424 (-0.04%) helped: 320 HURT: 0 helped stats (abs) min: 2 max: 20 x̄: 3.46 x̃: 2 helped stats (rel) min: <.01% max: 0.69% x̄: 0.06% x̃: 0.06% 95% mean confidence interval for cycles value: -3.85 -3.07 95% mean confidence interval for cycles %-change: -0.06% -0.05% Cycles are helped. Reviewed-by: Matt Turner <[email protected]>
*	intel/vec4: Try to emit a single load for multiple 3-src instruction operands	Ian Romanick	2019-07-11	2	-4/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If a 3-source instruction uses immediate values 1.0 and -1.0, just load 1.0 into a register. Use the negation source modifier to get -1.0. This has trivial impact now, but it prevents a few thousand regressions on vec4 platforms with "nir/algebraic: Recognize open-coded flrp(-1, 1, a) and flrp(1, -1, a)" All Gen6 and Gen7 platforms had similar results. (Haswell shown) total instructions in shared programs: 13487412 -> 13487406 (<.01%) instructions in affected programs: 541 -> 535 (-1.11%) helped: 6 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.36% max: 2.08% x̄: 1.65% x̃: 1.80% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -2.33% -0.97% Instructions are helped. total cycles in shared programs: 376402564 -> 376402500 (<.01%) cycles in affected programs: 10348 -> 10284 (-0.62%) helped: 10 HURT: 1 helped stats (abs) min: 2 max: 26 x̄: 7.00 x̃: 2 helped stats (rel) min: 0.13% max: 2.05% x̄: 0.89% x̃: 0.79% HURT stats (abs) min: 6 max: 6 x̄: 6.00 x̃: 6 HURT stats (rel) min: 0.29% max: 0.29% x̄: 0.29% x̃: 0.29% 95% mean confidence interval for cycles value: -11.72 0.08 95% mean confidence interval for cycles %-change: -1.20% -0.36% Inconclusive result (value mean confidence interval includes 0). No shader-db changes on any other Intel platform. Reviewed-by: Matt Turner <[email protected]>
*	intel/vec4: Refactor operand fixing for ffma and flrp	Ian Romanick	2019-07-11	2	-8/+16
\| \| \| \|	Reviewed-by: Matt Turner <[email protected]>
*	panfrost: Wire up GLES2-class polygon offset	Alyssa Rosenzweig	2019-07-11	1	-0/+11
\| \| \| \|	Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	pan/decode: Depth units/factor are identical to GL	Alyssa Rosenzweig	2019-07-11	2	-10/+2
\| \| \| \| \| \| \|	I'm not sure why I thoughtt here was an off-by-one, other than maybe bad data collection. Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	etnaviv: remove dead translate_ts_sampler_format(..) declaration	Christian Gmeiner	2019-07-11	1	-3/+0
\| \| \| \| \| \| \|	Fixes: 66411521ea9 ("etnaviv: combine translate_ts_sampler_format/translate_msaa_format") Signed-off-by: Christian Gmeiner <[email protected]> Reviewed-by: Jonathan Marek <[email protected]>
*	intel/fs: Add support for SLM fence in Gen11	Caio Marcelo de Oliveira Filho	2019-07-11	6	-12/+66
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Gen11 SLM is not on L3 anymore, so now the hardware has two separate fences. Add a way to control which fence types to use. At this time, we don't have enough information in NIR to control the visibility of the memory being fenced, so for now be conservative and assume that fences will need a stall. With more information later we'll be able to reduce those. Fixes Vulkan CTS tests in ICL: dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.device.payload_nonlocal.workgroup.guard_local.buffer.comp dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.device.payload_local.buffer.guard_nonlocal.workgroup.comp dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.device.payload_local.image.guard_nonlocal.workgroup.comp dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.workgroup.payload_local.buffer.guard_nonlocal.workgroup.comp dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.workgroup.payload_local.image.guard_nonlocal.workgroup.comp The whole set of supported tests in dEQP-VK.memory_model.* group should be passing in ICL now. v2: Pass BTI around instead of having an enum. (Jason) Emit two SHADER_OPCODE_MEMORY_FENCE instead of one that gets transformed into two. (Jason) List tests fixed. (Lionel) v3: For clarity, split the decision of which fences to emit from the emission code. (Jason) Reviewed-by: Jason Ekstrand <[email protected]> Acked-by: Lionel Landwerlin <[email protected]>
*	Revert "panfrost/midgard: Use _safe iterator"	Tomeu Vizoso	2019-07-11	1	-1/+1
\| \| \| \| \| \| \| \| \|	This reverts commit 812ce2ce9e5655613eae740926176509897122fa. We massively regress with the reverted patch. So in the meantime, take it out. Signed-off-by: Tomeu Vizoso <[email protected]>
*	panfrost: Don't lie about Z/S formats	Alyssa Rosenzweig	2019-07-11	4	-6/+34
\| \| \| \| \| \| \| \| \| \| \|	Only Z24S8 is properly supported right now, so let's be careful. Fixes a number of issues relating to improper Z/S handling. The most obvious is depth buffers with incorrect strides, which manifests in truly bizarre ways and can happen commonly with FBOs. Fixes WebGL (Aquarium runs, etc). Signed-off-by: Alyssa Rosenzweig <[email protected]>