| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
| |
Previously, this could have made the resource divergent in code like
that which is genereated by nir_lower_non_uniform_access.
Fixes: da8ed68a ('nir: replace nir_move_load_const() with nir_opt_sink()')
Reviewed-by: Daniel Schürmann <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, for code like:
loop {
loop {
a = load_ubo()
}
use(a)
}
adjust_block_for_loops() would return the block before the first loop.
Now we compute the range of allowed blocks and then walk the dominance
tree directly, guaranteeing directly that we always choose a block that
dominates all the uses and is dominated by the definition.
Reviewed-by: Daniel Schürmann <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Dylan Baker <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Dylan Baker <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Dylan Baker <[email protected]>
|
|
|
|
|
|
|
| |
anywhere else
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Dylan Baker <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These can appear after loop unrolling.
v2: stylistic changes
v2: replace state->mem_ctx with state->shader
v2: add bounds checking
v3: use nir_intrinsic_range() for bounds checking
v3: fix issue where partially out-of-bounds reads are replaced with undefs
v4: fix merge conflicts during rebase
v5: split into two commits
v6: set constant_data to NULL after freeing (fixes nir_sweep()/Iris)
v7: don't remove the constant data if there are no constant loads
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Connor Abbott <[email protected]> (v6)
Acked-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
| |
Useful for load_constant folding.
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
| |
We only have the subgroup variant in NIR (equivalent to clockARB), so
only support that for now.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Working on the algebraic implementation, I was being driven nuts by my
editor not highlighting and handling indentation for the C code. It turns
out that it's basically not pass-specific code, and we can move it over to
the relevant .c file. Replaces 30KB of code with 34KB of data on my i965
build. No perf diff on shader-db (n=3)
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This lets us memoize range analysis work across instructions. Reduces
runtime of shader-db on Intel by -30.0288% +/- 2.1693% (n=3).
Fixes: 405de7ccb6cb ("nir/range-analysis: Rudimentary value range analysis pass")
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Having passes generate these is just making more work for copy
propagation (and thus probably calling more optimization passes)
later. Noticed while trying to debug nir_opt_algebraic()
top-to-bottom having O(n^2) behavior due to not finding new matches in
replacement code.
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
Similar to the previous commit, we should also initialize
needs_helper_invocations here.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
This matches what we do for uses_sample_qualifier, and what we
do in ir_set_program_inouts.cpp as well.
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
From EXT_demote_to_helper_invocation, implemented with the existing
nir_intrinsic_is_helper_invocation.
Such builtin is necessary when using `demote` because we can't
redefine the value of gl_HelperInvocation (since it is an input
variable).
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
When the EXT_demote_to_helper_invocation extension is enabled,
`demote` is treated as a keyword, and produces an ir_demote.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To represent the new `demote` keyword when using
EXT_demote_to_helper_invocation extension. Most of the changes are to
include it in the visitors.
Demote is not considered a control flow, so also include an empty
visit member function in ir_control_flow_visitor.
Only NIR actually supports `demote`, so assert the translations for
TGSI and Mesa's gl_program -- since the demote is not expected to
appear for those.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
These optimizations are already covered after lowering.
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are some optimizations which are only implemented for additions
and some optimizations which assume that subtractions have been lowered.
By lowering all subtractions first and later recombine for backends
which prefer this option, we don't have to implement them twice.
This patch also moves lower_negate to nir_opt_algebraic_late() to enable
these optimizations for backends which make use of it.
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Prerequisite to avoid following radv linking error happening with aco
FAILED: out/target/product/x86_64/obj_x86/SHARED_LIBRARIES/vulkan.radv_intermediates/LINKED/vulkan.radv.so
...
external/mesa/src/amd/compiler/aco_instruction_selection_setup.cpp:178:
error: undefined reference to 'nir_divergence_analysis'
clang.real: error: linker command failed with exit code 1 (use -v to see invocation)
Fixes: df86c5f ("nir: add divergence analysis pass.")
Signed-off-by: Mauro Rossi <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
glsl 4.4 spec section '5.9 expressions':
"The operator is multiply (*), where both operands are matrices or one operand is a vector and the
other a matrix. A right vector operand is treated as a column vector and a left vector operand as a
row vector. In all these cases, it is required that the number of columns of the left operand is equal
to the number of rows of the right operand. Then, the multiply (*) operation does a linear
algebraic multiply, yielding an object that has the same number of rows as the left operand and the
same number of columns as the right operand. Section 5.10 “Vector and Matrix Operations”
explains in more detail how vectors and matrices are operated on."
This fix disallows a multiplication of incompatible matrices like:
mat4x3(..) * mat4x3(..)
mat4x2(..) * mat4x2(..)
mat3x2(..) * mat3x2(..)
....
CC: <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111664
Signed-off-by: Andrii Simiklit <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We include shader_enums.h from freedreno's compiler for both GL and
Vulkan, and the main/config.h include resulted in polluting the
namespace with things like MAX_VIEWPORTS that other Vulkan drivers use
as their driver-specific maximums.
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
All Gen7+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 16328255 -> 16315391 (-0.08%)
instructions in affected programs: 218318 -> 205454 (-5.89%)
helped: 988
HURT: 0
helped stats (abs) min: 1 max: 72 x̄: 13.02 x̃: 10
helped stats (rel) min: 0.33% max: 16.04% x̄: 6.27% x̃: 4.88%
95% mean confidence interval for instructions value: -13.69 -12.35
95% mean confidence interval for instructions %-change: -6.55% -5.99%
Instructions are helped.
total cycles in shared programs: 363683977 -> 363615417 (-0.02%)
cycles in affected programs: 1475193 -> 1406633 (-4.65%)
helped: 923
HURT: 36
helped stats (abs) min: 1 max: 624 x̄: 75.78 x̃: 48
helped stats (rel) min: 0.08% max: 13.89% x̄: 5.20% x̃: 5.08%
HURT stats (abs) min: 1 max: 179 x̄: 38.58 x̃: 4
HURT stats (rel) min: 0.06% max: 16.56% x̄: 3.33% x̃: 0.29%
95% mean confidence interval for cycles value: -75.88 -67.10
95% mean confidence interval for cycles %-change: -5.10% -4.66%
Cycles are helped.
Sandy Bridge
total instructions in shared programs: 10785779 -> 10785654 (<.01%)
instructions in affected programs: 13855 -> 13730 (-0.90%)
helped: 67
HURT: 0
helped stats (abs) min: 1 max: 15 x̄: 1.87 x̃: 1
helped stats (rel) min: 0.20% max: 3.45% x̄: 0.97% x̃: 0.78%
95% mean confidence interval for instructions value: -2.47 -1.26
95% mean confidence interval for instructions %-change: -1.13% -0.81%
Instructions are helped.
total cycles in shared programs: 153704799 -> 153704481 (<.01%)
cycles in affected programs: 101509 -> 101191 (-0.31%)
helped: 38
HURT: 13
helped stats (abs) min: 1 max: 38 x̄: 12.53 x̃: 16
helped stats (rel) min: 0.07% max: 2.69% x̄: 0.87% x̃: 0.53%
HURT stats (abs) min: 1 max: 36 x̄: 12.15 x̃: 7
HURT stats (rel) min: 0.06% max: 2.53% x̄: 0.73% x̃: 0.44%
95% mean confidence interval for cycles value: -10.24 -2.24
95% mean confidence interval for cycles %-change: -0.75% -0.17%
Cycles are helped.
LOST: 2
GAINED: 0
No shader-db change on Iron Lake or GM45.
|
|
|
|
|
|
|
| |
This allows the reslut of mov and bcsel to be separately interpreted as
float or int depending on the use.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some shaders are hurt by this change because now a
load_const(0x00000000) is not recognized as eq_zero when loaded as a
float. This behavior is restored in a later patch (nir/range-analysis:
Use types to provide better ranges from bcsel and mov).
v2: Add a comment about reinterpretation of int/uint/bool. Suggested by
Caio. Rewrite condition the check for types being float versus checking
for types not being all the things that aren't float.
Fixes: 405de7ccb6c ("nir/range-analysis: Rudimentary value range analysis pass")
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
All Gen7+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 16327543 -> 16328255 (<.01%)
instructions in affected programs: 55928 -> 56640 (1.27%)
helped: 0
HURT: 208
HURT stats (abs) min: 1 max: 16 x̄: 3.42 x̃: 3
HURT stats (rel) min: 0.33% max: 6.74% x̄: 1.31% x̃: 1.12%
95% mean confidence interval for instructions value: 3.06 3.79
95% mean confidence interval for instructions %-change: 1.17% 1.46%
Instructions are HURT.
total cycles in shared programs: 363682759 -> 363683977 (<.01%)
cycles in affected programs: 325758 -> 326976 (0.37%)
helped: 44
HURT: 133
helped stats (abs) min: 1 max: 179 x̄: 33.61 x̃: 5
helped stats (rel) min: 0.06% max: 14.21% x̄: 2.47% x̃: 0.29%
HURT stats (abs) min: 1 max: 157 x̄: 20.28 x̃: 14
HURT stats (rel) min: 0.07% max: 14.44% x̄: 1.42% x̃: 0.73%
95% mean confidence interval for cycles value: 0.38 13.39
95% mean confidence interval for cycles %-change: -0.06% 0.96%
Inconclusive result (%-change mean confidence interval includes 0).
Sandy Bridge
total instructions in shared programs: 10787433 -> 10787443 (<.01%)
instructions in affected programs: 1842 -> 1852 (0.54%)
helped: 0
HURT: 10
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 0.33% max: 1.85% x̄: 0.73% x̃: 0.49%
95% mean confidence interval for instructions value: 1.00 1.00
95% mean confidence interval for instructions %-change: 0.36% 1.10%
Instructions are HURT.
total cycles in shared programs: 153724543 -> 153724563 (<.01%)
cycles in affected programs: 8407 -> 8427 (0.24%)
helped: 1
HURT: 3
helped stats (abs) min: 18 max: 18 x̄: 18.00 x̃: 18
helped stats (rel) min: 0.98% max: 0.98% x̄: 0.98% x̃: 0.98%
HURT stats (abs) min: 4 max: 18 x̄: 12.67 x̃: 16
HURT stats (rel) min: 0.21% max: 0.75% x̄: 0.56% x̃: 0.72%
95% mean confidence interval for cycles value: -21.31 31.31
95% mean confidence interval for cycles %-change: -1.11% 1.46%
Inconclusive result (value mean confidence interval includes 0).
No shader-db changes on Iron Lake or GM45.
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Erik Faye-Lund <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When handling two variables with overlapping locations, we process the
one with lower location first, and then extend the location ->
driver_location map to guarantee that it's contiguous for the second
variable too. But the loop had the wrong bound, so we weren't extending
the map 100%, which could lead to problems later such as an incorrect
num_inputs. The loop index i is an index into the slots of the variable,
so we need to stop at the final slot of the variable (var_size) instead
of the number of unassigned slots.
This fixes
spec@arb_enhanced_layouts@execution@component-layout@vs-fs-array-interleave-range
on radeonsi NIR.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Without this, we'll incorrectly round off huge values to the nearest
representable double instead of keeping it at the exact value as
we're supposed to.
Found by inspecting compiler-warnings.
Signed-off-by: Erik Faye-Lund <[email protected]>
Fixes: 85faf5082f ("glsl: Add 64-bit integer support for constant expressions")
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This can happen with loops with unreachable exits which are later
optimized away.
Fixes assertion in dEQP-VK.graphicsfuzz.unreachable-loops with RADV.
Cc: [email protected]
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes some piglit tests on radeonsi NIR where a varying is
initialized to a constant array in the vertex shader. Varying packing
after nir_lower_io_to_temporaries creates writemasked stores which
persist after pulling the constant initialization down into the fragment
shader.
While we're here, rewrite handle_constant_store() to do the loop over
components outside the switch, so that we don't have to duplicate the
writemask checking for every bitsize.
Fixes: 1235850522c ("nir: Add a large constants optimization pass")
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
| |
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
| |
It confuses radeonsi.
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In a3268599f3c9, I attempted to fix nir_repair_ssa for unreachable
blocks. However, that commit missed the possibility that the use is in
a block which, itself, is unreachable. In this case, we can end up in
an infinite loop trying to replace a def with itself. Even though a
no-op replacement is a fine operation, it keeps extending the end of the
uses list as we're walking it. Instead of explicitly checking for the
group of conditions, just check if the phi builder gives us a different
def. That's guaranteed to be 100% reliable and, while it lacks symmetry
with the is_valid checks, should be more reliable.
Fixes: a3268599 "nir/repair_ssa: Repair dominance for unreachable..."
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I observed this pattern in several shaders in Hand of Fate 2 while
investigating bugzilla #111490. This also led to the related
bugzilla #111578. The shaders from HoF2 are *not* in shader-db.
Reviewed-by: Kenneth Graunke <[email protected]>
Skylake and Ice Lake had similar results. (Ice Lake shown)
total instructions in shared programs: 16222621 -> 16205419 (-0.11%)
instructions in affected programs: 798418 -> 781216 (-2.15%)
helped: 548
HURT: 0
helped stats (abs) min: 2 max: 158 x̄: 31.39 x̃: 35
helped stats (rel) min: 0.45% max: 28.64% x̄: 2.83% x̃: 2.09%
95% mean confidence interval for instructions value: -33.22 -29.56
95% mean confidence interval for instructions %-change: -3.11% -2.56%
Instructions are helped.
total cycles in shared programs: 364676209 -> 363345763 (-0.36%)
cycles in affected programs: 112810504 -> 111480058 (-1.18%)
helped: 546
HURT: 7
helped stats (abs) min: 2 max: 118913 x̄: 2439.77 x̃: 2340
helped stats (rel) min: 0.08% max: 37.56% x̄: 1.46% x̃: 1.08%
HURT stats (abs) min: 2 max: 770 x̄: 238.00 x̃: 43
HURT stats (rel) min: 0.02% max: 11.24% x̄: 3.71% x̃: 0.35%
95% mean confidence interval for cycles value: -2884.33 -1927.41
95% mean confidence interval for cycles %-change: -1.59% -1.21%
Cycles are helped.
total spills in shared programs: 8870 -> 8514 (-4.01%)
spills in affected programs: 1230 -> 874 (-28.94%)
helped: 161
HURT: 0
total fills in shared programs: 21901 -> 21348 (-2.52%)
fills in affected programs: 2120 -> 1567 (-26.08%)
helped: 155
HURT: 5
Broadwell and Haswell had similar results. (Broadwell shown)
total instructions in shared programs: 14994910 -> 14975495 (-0.13%)
instructions in affected programs: 839033 -> 819618 (-2.31%)
helped: 548
HURT: 0
helped stats (abs) min: 2 max: 299 x̄: 35.43 x̃: 49
helped stats (rel) min: 0.39% max: 19.89% x̄: 2.91% x̃: 2.22%
95% mean confidence interval for instructions value: -37.46 -33.40
95% mean confidence interval for instructions %-change: -3.12% -2.70%
Instructions are helped.
total cycles in shared programs: 386032453 -> 384450722 (-0.41%)
cycles in affected programs: 117807357 -> 116225626 (-1.34%)
helped: 547
HURT: 6
helped stats (abs) min: 2 max: 22096 x̄: 2892.01 x̃: 3926
helped stats (rel) min: 0.17% max: 10.34% x̄: 1.56% x̃: 1.31%
HURT stats (abs) min: 4 max: 60 x̄: 32.83 x̃: 29
HURT stats (rel) min: 0.38% max: 12.79% x̄: 5.86% x̃: 4.65%
95% mean confidence interval for cycles value: -3060.28 -2660.27
95% mean confidence interval for cycles %-change: -1.59% -1.37%
Cycles are helped.
total spills in shared programs: 23372 -> 21869 (-6.43%)
spills in affected programs: 11730 -> 10227 (-12.81%)
helped: 352
HURT: 0
total fills in shared programs: 34747 -> 35351 (1.74%)
fills in affected programs: 11013 -> 11617 (5.48%)
helped: 3
HURT: 347
Ivy Bridge and Sandybridge had similar results. (Ivy Bridge shown)
total instructions in shared programs: 11956420 -> 11956126 (<.01%)
instructions in affected programs: 14898 -> 14604 (-1.97%)
helped: 98
HURT: 0
helped stats (abs) min: 3 max: 3 x̄: 3.00 x̃: 3
helped stats (rel) min: 1.30% max: 3.57% x̄: 2.08% x̃: 2.00%
95% mean confidence interval for instructions value: -3.00 -3.00
95% mean confidence interval for instructions %-change: -2.18% -1.98%
Instructions are helped.
total cycles in shared programs: 178791217 -> 178790792 (<.01%)
cycles in affected programs: 149763 -> 149338 (-0.28%)
helped: 91
HURT: 7
helped stats (abs) min: 3 max: 107 x̄: 20.63 x̃: 16
helped stats (rel) min: 0.13% max: 6.91% x̄: 1.40% x̃: 1.18%
HURT stats (abs) min: 3 max: 322 x̄: 207.43 x̃: 322
HURT stats (rel) min: 0.14% max: 19.85% x̄: 12.73% x̃: 17.41%
95% mean confidence interval for cycles value: -18.94 10.27
95% mean confidence interval for cycles %-change: -1.28% 0.49%
Inconclusive result (value mean confidence interval includes 0).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some shaders do not use 'invariant' in vertex and (possibly) geometry
shader stages on some outputs that are intended to be invariant. For
various reasons, this optimization may not be fully applied in all
shaders used for different rendering passes of the same geometry. This
can result in Z-fighting artifacts (at best). For now, disable this
optimization in these stages.
In tessellation stages applications seem to use 'precise' when
necessary, so allow the optimization in those stages.
Reviewed-by: Kenneth Graunke <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111490
Fixes: 09705747d72 ("nir/algebraic: Reassociate fadd into fmul in DPH-like pattern")
All Gen8+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 16194726 -> 16344745 (0.93%)
instructions in affected programs: 2855172 -> 3005191 (5.25%)
helped: 6
HURT: 20279
helped stats (abs) min: 1 max: 3 x̄: 1.33 x̃: 1
helped stats (rel) min: 0.44% max: 1.00% x̄: 0.54% x̃: 0.44%
HURT stats (abs) min: 1 max: 32 x̄: 7.40 x̃: 7
HURT stats (rel) min: 0.14% max: 42.86% x̄: 8.58% x̃: 6.56%
95% mean confidence interval for instructions value: 7.34 7.45
95% mean confidence interval for instructions %-change: 8.48% 8.67%
Instructions are HURT.
total cycles in shared programs: 364471296 -> 365014683 (0.15%)
cycles in affected programs: 32421530 -> 32964917 (1.68%)
helped: 2925
HURT: 16144
helped stats (abs) min: 1 max: 403 x̄: 18.39 x̃: 5
helped stats (rel) min: <.01% max: 22.61% x̄: 1.97% x̃: 1.15%
HURT stats (abs) min: 1 max: 18471 x̄: 36.99 x̃: 15
HURT stats (rel) min: 0.02% max: 52.58% x̄: 5.60% x̃: 3.87%
95% mean confidence interval for cycles value: 21.58 35.41
95% mean confidence interval for cycles %-change: 4.36% 4.52%
Cycles are HURT.
|
|
|
|
|
|
|
|
| |
There's nothing whatsoever compiler-specific about it other than that's
currently where it's used.
Reviewed-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Refactor the code to avoid calling a lot of time to auxiliary functions
when it is not really needed.
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
| |
New added cases "stole" the previous break.
Fixes: 420ad0a1a3d ("spirv: check support for SPV_KHR_float_controls capabilities")
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The pass assumed that "Most ALU ops produce an undefined result if any
source is undef" which is completely untrue. Due to how we lower if
statements to selects and then optimize on those selects later, we
simply cannot make that assumption. In particular this pass tried to
replace an ior of undef and true, which had been generated by
optimizing a select which itself came from flattening an if statement,
to undef causing a miscompilation for a CTS test with radeonsi NIR.
We fix this by always doing what the non-undef path did, i.e. duplicate
the instruction twice. If there are cases where the instruction before
the loop can be folded away due to having an undef source, we should add
these to opt_undef instead.
The comment above the pass says that if the phi source from before the
loop is undef, and we can fold the instruction before the loop to undef,
then we can ignore sources of the original instruction that don't
dominate the block before the loop because we don't need them to create
the instruction before the loop. This is incorrect, because the
instruction at the bottom of the loop would get those sources from the
wrong loop iteration. The code never actually did what the comment said,
so we only have to update the comment to match what the pass actually
does. We also update the example to more closely match what most actual
loops look like after vtn and peephole_select.
There are no shader-db changes with i965, radeonsi NIR, or radv. With
anv and my vkpipeline-db there's only one change:
total instructions in shared programs: 14125290 -> 14125300 (<.01%)
instructions in affected programs: 2598 -> 2608 (0.38%)
helped: 0
HURT: 1
total cycles in shared programs: 2051473437 -> 2051473397 (<.01%)
cycles in affected programs: 36697 -> 36657 (-0.11%)
helped: 1
HURT: 0
Fixes
KHR-GL45.shader_subroutine.control_flow_and_returned_subroutine_values_used_as_subroutine_input
with radeonsi NIR.
|
|
|
|
|
|
|
|
|
|
|
| |
Having Python and C variables sharing name in the same block of code
makes its understanding a bit confusing. Make it explicit that the
Python bit_size variable refers to the destination bit size.
Suggested-by: Caio Marcelo de Oliveira Filho <[email protected]>
Signed-off-by: Andres Gomez <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
| |
Until now, it was using the floating point version of fmin/fmax,
instead of the double version.
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
v2:
- Replace hard coded value with DBL_MIN (Connor).
v3:
- Have into account the FLOAT_CONTROLS_DENORM_PRESERVE_FP64
flag (Caio).
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Signed-off-by: Andres Gomez <[email protected]>
Reviewed-by: Connor Abbott <[email protected]> [v2]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
According to VK_KHR_shader_float_controls:
"Denormalized values obtained via unpacking an integer into a vector
of values with smaller bit width and interpreting those values as
floating-point numbers must: be flushed to zero, unless the entry
point is declared with the code:DenormPreserve execution mode."
v2:
- Add nir_op_unpack_half_2x16_flush_to_zero opcode (Connor).
v3:
- Adapt to use the new NIR lowering framework (Andres).
v4:
- Updated to renamed shader info member and enum values (Andres).
v5:
- Simplify flags logic operations (Caio).
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Signed-off-by: Andres Gomez <[email protected]>
Reviewed-by: Connor Abbott <[email protected]> [v2]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
execution mode
If FLOAT_CONTROLS_SIGNED_ZERO_INF_NAN_PRESERVE or
FLOAT_CONTROLS_DENORM_FLUSH_TO_ZERO are enabled, do not apply the
inexact optimizations so the VK_KHR_shader_float_controls execution
mode is respected.
v2:
- Do not apply inexact optimizations if SHADER_DENORM_FLUSH_TO_ZERO is
enabled (Andres).
v3:
- Updated to renamed shader info member (Andres).
v4:
- Directly access execution mode instead of dragging it by parameter (Caio).
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Signed-off-by: Andres Gomez <[email protected]>
Reviewed-by: Connor Abbott <[email protected]> [v1]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With the arrival of VK_KHR_shader_float_controls algebraic
optimizations for float types of the form (('fop', a, b), a) become
inexact depending on the execution mode.
For example, if we have activated SHADER_DENORM_FLUSH_TO_ZERO, in case
of a denorm value for the "a" parameter, we cannot return it still as
a denorm, it needs to be flushed to zero. Therefore, we mark now all
those operations as inexact.
Suggested-by: Samuel Iglesias Gonsálvez <[email protected]>
Signed-off-by: Andres Gomez <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
float16 destinations
v2:
- Move the op-code specific knowledge to nir_opcodes.py even if it
means a rount trip conversion (Connor).
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Signed-off-by: Andres Gomez <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|