authorAlyssa Rosenzweig <[email protected]>2020-05-11 10:02:49 -0400
committerMarge Bot <[email protected]>2020-06-30 16:21:33 +0000
commit54d7907c276f2e5ef428ead58721fd82e4b26f40 (patch)
treee51f5df08cf40b3ef5293481692010c01d6b01c5
parent3e3958c44f78e882468a092557ec6b0b1404bc54 (diff)
nir: Propagate *2*16 conversions into vectors
If we have code like:

   ('f2f16', ('vec2', ('f2f32', 'a@16'), '#b@32'))

we would like to eliminate the conversions, but the existing rules can't
see into the (heterogeneous) vector. So instead of trying to eliminate
them in one pass, we add opts to propagate the f2f16 into the vector.
Even if nothing further happens, this is often a win, since the created
vector is then smaller (half2 instead of float2). Hence the above gets
transformed to

   ('vec2', ('f2f16', ('f2f32', 'a@16')), ('f2f16', '#b@32'))

Then the existing f2f16(f2f32) rule will kick in for the first
component, constant folding will for the second, and we'll be left with

   ('vec2', 'a@16', '#b@16')

...eliminating all conversions.

v2: Predicate on !options->vectorize_vec2_16bit. As discussed, this
optimization helps greatly on true vector architectures (like Midgard)
but wreaks havoc on more modern SIMD-within-a-register architectures
(like Bifrost and modern AMD). So let's predicate on that.

v3: Extend to integers as well and add a comment explaining the
transforms.

Results on Midgard (unfortunately a true SIMD architecture):

total instructions in shared programs: 51359 -> 50963 (-0.77%)
instructions in affected programs: 4523 -> 4127 (-8.76%)
helped: 53
HURT: 0
helped stats (abs) min: 1 max: 86 x̄: 7.47 x̃: 6
helped stats (rel) min: 1.71% max: 28.00% x̄: 9.66% x̃: 7.34%
95% mean confidence interval for instructions value: -10.58 -4.36
95% mean confidence interval for instructions %-change: -11.45% -7.88%
Instructions are helped.

total bundles in shared programs: 25825 -> 25670 (-0.60%)
bundles in affected programs: 2057 -> 1902 (-7.54%)
helped: 53
HURT: 0
helped stats (abs) min: 1 max: 26 x̄: 2.92 x̃: 2
helped stats (rel) min: 2.86% max: 30.00% x̄: 8.64% x̃: 8.33%
95% mean confidence interval for bundles value: -3.93 -1.92
95% mean confidence interval for bundles %-change: -10.69% -6.59%
Bundles are helped.

total quadwords in shared programs: 41359 -> 41055 (-0.74%)
quadwords in affected programs: 3801 -> 3497 (-8.00%)
helped: 57
HURT: 0
helped stats (abs) min: 1 max: 57 x̄: 5.33 x̃: 4
helped stats (rel) min: 1.92% max: 21.05% x̄: 8.22% x̃: 6.67%
95% mean confidence interval for quadwords value: -7.35 -3.32
95% mean confidence interval for quadwords %-change: -9.54% -6.90%
Quadwords are helped.

total registers in shared programs: 3849 -> 3807 (-1.09%)
registers in affected programs: 167 -> 125 (-25.15%)
helped: 32
HURT: 1
helped stats (abs) min: 1 max: 3 x̄: 1.34 x̃: 1
helped stats (rel) min: 20.00% max: 50.00% x̄: 26.35% x̃: 20.00%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 16.67% max: 16.67% x̄: 16.67% x̃: 16.67%
95% mean confidence interval for registers value: -1.54 -1.00
95% mean confidence interval for registers %-change: -29.41% -20.69%
Registers are helped.

total threads in shared programs: 2471 -> 2520 (1.98%)
threads in affected programs: 49 -> 98 (100.00%)
helped: 25
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.96 x̃: 2
helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
95% mean confidence interval for threads value: 1.88 2.04
95% mean confidence interval for threads %-change: 100.00% 100.00%
Threads are helped.

total spills in shared programs: 168 -> 168 (0.00%)
spills in affected programs: 0 -> 0
helped: 0
HURT: 0

total fills in shared programs: 186 -> 186 (0.00%)
fills in affected programs: 0 -> 0
helped: 0
HURT: 0

Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4999>
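The rewrite described in the commit message can be sketched in plain Python on the same tuple notation nir_opt_algebraic uses. This is illustrative only, not Mesa code; `propagate_into_vec` is a hypothetical helper that pushes a downsizing conversion through a vecN constructor:

```python
def propagate_into_vec(expr):
    """Rewrite (conv, (vecN, c0, c1, ...)) as (vecN, (conv, c0), (conv, c1), ...)."""
    conv, arg = expr
    # Only applies when the conversion's operand is a vecN constructor.
    assert isinstance(arg, tuple) and arg[0].startswith('vec')
    vec_op, components = arg[0], arg[1:]
    # Distribute the conversion over each component of the vector.
    return (vec_op,) + tuple((conv, c) for c in components)

before = ('f2f16', ('vec2', ('f2f32', 'a@16'), '#b@32'))
after = propagate_into_vec(before)
# after == ('vec2', ('f2f16', ('f2f32', 'a@16')), ('f2f16', '#b@32'))
```

From there, the pre-existing f2f16(f2f32) rule and constant folding clean up the per-component conversions, as the message describes.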
-rw-r--r--  src/compiler/nir/nir_opt_algebraic.py | 36
1 file changed, 36 insertions(+), 0 deletions(-)
diff --git a/src/compiler/nir/nir_opt_algebraic.py b/src/compiler/nir/nir_opt_algebraic.py
index f4b621cbd65..575024c04bf 100644
--- a/src/compiler/nir/nir_opt_algebraic.py
+++ b/src/compiler/nir/nir_opt_algebraic.py
@@ -1777,6 +1777,42 @@ for op in ['frcp', 'frsq', 'fsqrt', 'fexp2', 'flog2', 'fsign', 'fsin', 'fcos']:
(('bcsel', a, (op, b), (op + '(is_used_once)', c)), (op, ('bcsel', a, b, c))),
]
+# This section contains optimizations to propagate downsizing conversions of
+# constructed vectors into vectors of downsized components. Whether this is
+# useful depends on the SIMD semantics of the backend. On a true SIMD machine,
+# this reduces the register pressure of the vector itself and often enables the
+# conversions to be eliminated via other algebraic rules or constant folding.
+# In the worst case on a SIMD architecture, the propagated conversions may be
+# revectorized via nir_opt_vectorize so instruction count is minimally
+# impacted.
+#
+# On a machine with SIMD-within-a-register only, this actually
+# counterintuitively hurts instruction count. These machines are the same that
+# require vectorize_vec2_16bit, so we predicate the optimizations on that flag
+# not being set.
+#
+# Finally for scalar architectures, there should be no difference in generated
+# code since it all ends up scalarized at the end, but it might minimally help
+# compile-times.
+
+for i in range(2, 4 + 1):
+   for T in ('f', 'u', 'i'):
+      vec_inst = ('vec' + str(i),)
+
+      indices = ['a', 'b', 'c', 'd']
+      suffix_in = tuple((indices[j] + '@32') for j in range(i))
+
+      to_16 = '{}2{}16'.format(T, T)
+      to_mp = '{}2{}mp'.format(T, T)
+
+      out_16 = tuple((to_16, indices[j]) for j in range(i))
+      out_mp = tuple((to_mp, indices[j]) for j in range(i))
+
+      optimizations += [
+         ((to_16, vec_inst + suffix_in), vec_inst + out_16, '!options->vectorize_vec2_16bit'),
+         ((to_mp, vec_inst + suffix_in), vec_inst + out_mp, '!options->vectorize_vec2_16bit')
+      ]
+
# This section contains "late" optimizations that should be run before
# creating ffmas and calling regular optimizations for the final time.
# Optimizations should go here if they help code generation and conflict
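The generator loop added by this patch can be exercised standalone to see the rule tuples it feeds into `optimizations`. This sketch (assuming the same construction as the patch, just run outside nir_opt_algebraic.py) builds the rule for the vec2/float case:

```python
# Reproduce one iteration of the patch's generator loop: i = 2, T = 'f'.
i, T = 2, 'f'
vec_inst = ('vec' + str(i),)
indices = ['a', 'b', 'c', 'd']
suffix_in = tuple((indices[j] + '@32') for j in range(i))  # ('a@32', 'b@32')
to_16 = '{}2{}16'.format(T, T)                             # 'f2f16'
out_16 = tuple((to_16, indices[j]) for j in range(i))
# (search, replace, condition) triple as nir_opt_algebraic expects:
rule = ((to_16, vec_inst + suffix_in), vec_inst + out_16,
        '!options->vectorize_vec2_16bit')
# rule == (('f2f16', ('vec2', 'a@32', 'b@32')),
#          ('vec2', ('f2f16', 'a'), ('f2f16', 'b')),
#          '!options->vectorize_vec2_16bit')
```

The `@32` suffix in the search pattern restricts the match to 32-bit sources, so the rule only fires where a downsizing conversion is actually being propagated.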