| Commit message | Author | Age | Files | Lines |
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
One advantage of this is that we no longer need to run in a loop because
the new framework handles lowering instructions added by lowering.
Reviewed-by: Eric Anholt <[email protected]>
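The point about no longer needing an outer loop can be illustrated with a toy model. The sketch below is not the NIR lowering framework: the toy_* names and the three-level opcode scheme are hypothetical. It only shows that a single pass which revisits freshly emitted instructions also lowers the instructions added by lowering.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical opcodes: TOY_HIGH lowers to two TOY_MID,
 * TOY_MID lowers to two TOY_NATIVE, TOY_NATIVE is left alone. */
enum toy_op { TOY_NATIVE, TOY_MID, TOY_HIGH };

#define TOY_MAX_INSTRS 64

struct toy_shader {
   enum toy_op instrs[TOY_MAX_INSTRS];
   size_t count;
};

/* Replace instrs[i] with two copies of `lower`, shifting the tail up by one.
 * Returns false if the fixed-size array is full. */
static bool
toy_replace_with_pair(struct toy_shader *s, size_t i, enum toy_op lower)
{
   if (s->count + 1 > TOY_MAX_INSTRS)
      return false;

   memmove(&s->instrs[i + 2], &s->instrs[i + 1],
           (s->count - i - 1) * sizeof(s->instrs[0]));
   s->instrs[i] = lower;
   s->instrs[i + 1] = lower;
   s->count++;
   return true;
}

/* One pass over the shader. After a replacement the cursor is NOT advanced,
 * so the newly emitted instructions are visited and lowered in turn. */
static bool
toy_lower_instructions(struct toy_shader *s)
{
   bool progress = false;
   size_t i = 0;

   while (i < s->count) {
      enum toy_op op = s->instrs[i];
      if (op == TOY_NATIVE) {
         i++;
         continue;
      }

      enum toy_op lower = (op == TOY_HIGH) ? TOY_MID : TOY_NATIVE;
      if (!toy_replace_with_pair(s, i, lower))
         break;
      progress = true;
   }

   return progress;
}
```

A caller invokes toy_lower_instructions() once; a second call reports no progress because nothing lowerable is left.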
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
One advantage of this is that we no longer need to run in a loop because
the new framework handles lowering instructions added by lowering.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Instead of only lowering system values from variables, lower most to intrinsics
and let the lowering framework immediately lower the intrinsic. This
will result in a bit more instruction churn but it means that NIR code
builders can just use intrinsics instead of everything having to go
through variables.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Instead of having context-aware builder functions, just provide lowering
for the system value intrinsics and let nir_shader_lower_instructions
handle the recursion for us. This makes everything a bit simpler and
means that the lowering can also be used if something comes in as a
system value intrinsic rather than a load_deref.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
| |
The stride was already overridden when using
lower_workgroup_access_to_offsets, so elaborate on the comment there
a bit.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use alignment to calculate the stride associated with the pointer
types. That stride is used when the pointers are cast to arrays.
Note that size alone is not sufficient, e.g. struct { vec2 a; vec1 b;
} will have an element size of 12 bytes, but the stride needs
to be 16 bytes to respect the 8 byte alignment.
Fixes: 050eb6389a8 "spirv: Ignore ArrayStride in OpPtrAccessChain for Workgroup"
Reviewed-by: Jason Ekstrand <[email protected]>
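The 12-byte versus 16-byte example above amounts to rounding the element size up to a multiple of the alignment. A minimal sketch, assuming a nonzero alignment; align_up and pointer_stride are illustrative names, not functions from the SPIR-V front end:

```c
#include <stdint.h>

/* Round `value` up to the next multiple of `alignment` (alignment != 0). */
static uint32_t
align_up(uint32_t value, uint32_t alignment)
{
   return (value + alignment - 1) / alignment * alignment;
}

/* Stride between consecutive elements when a pointer is treated as an array:
 * the element size padded so that every element stays aligned. */
static uint32_t
pointer_stride(uint32_t size, uint32_t alignment)
{
   return align_up(size, alignment);
}
```

With size 12 and alignment 8 this gives 16, matching the struct { vec2 a; vec1 b; } case described in the message.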
|
|
|
|
|
|
|
|
|
| |
We need this when doing full software 64-bit emulation.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110309
Fixes: cbad201c2b3 "nir/algebraic: Add missing 64-bit extract_[iu]8..."
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
| |
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111071
Fixes: 2a74296f24ba "nir: add opt_if_loop_terminator()"
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We currently don't have cache support for SPIR-V shaders (from
ARB_gl_spirv). They are properly skipped today because they fall
on the ff shader code path (no key, no name), but it would be better
to update the current comments and add some guards.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Allocate UniformDataDefaults and fill in the data defaults when
linking a SPIR-V program. Among other things, this allows program
serialization to work.
It allows the following piglit test (when run in SPIR-V mode) to pass:
spec/arb_get_program_binary/execution/uniform-after-restore.shader_test
v2: use memcpy to initialize UniformDataDefaults
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
| |
and output variable names
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
| |
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
| |
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
From the ARB_program_interface_query specification:
"For the property TOP_LEVEL_ARRAY_SIZE, a single integer
identifying the number of active array elements of the top-level
shader storage block member containing to the active variable is
written to <params>. If the top-level block member is not
declared as an array, the value one is written to <params>. If
the top-level block member is an array with no declared size, the
value zero is written to <params>."
"For the property TOP_LEVEL_ARRAY_STRIDE, a single integer
identifying the stride between array elements of the top-level
shader storage block member containing the active variable is
written to <params>. For top-level block members declared as
arrays, the value written is the difference, in basic machine
units, between the offsets of the active variable for consecutive
elements in the top-level array. For top-level block members not
declared as an array, zero is written to <params>."
v2: move top_level_array_size and stride into nir_link_uniforms_state
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ARB_gl_spirv points out that the offset must be explicit; however, this is
only true for 'root' types. For complex types, like struct members or
arrays of arrays, it needs to be computed.
We are not using the offset stored in the gl_buffer_variables during
the uniform blocks linking because currently we do not have a way to
relate a gl_buffer_variable with its corresponding gl_uniform_storage.
The GLSL path uses the name for that, but we cannot rely on that
because names are optional in SPIR-V.
Notice that uniforms not backed by a buffer object will have an offset
equal to -1, like in the GLSL path.
v2: add offset and var_is_in_block as per-variable state in
nir_link_uniforms_state (Arcady)
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
| |
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
| |
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
| |
v2: use link_util_should_add_buffer_variable() (Arcady)
Signed-off-by: Arcady Goldmints-Orlov <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
| |
v2: added TODO comment hinting possible future refactoring of
nir_build_program_resource_list and build_program_resource_list,
to avoid code duplication (Alejandro, to explicitly reflect a
valid concern from Timothy during the review).
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
| |
v2: "nir/linker: Use the stageref when adding UBO/SSBO resources"
squashed on this one (Timothy)
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Binding comparison is used to determine the block the uniform is part
of. Note that to do the binding comparison we need the information in
UniformBlocks[] and ShaderStorageBlocks[] to be available, so we have
to call gl_nir_link_uniform_blocks() before linking the uniforms.
v2: add missing break (Timothy)
Reviewed-by: Timothy Arceri <[email protected]>
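As a rough illustration of locating a block by comparing bindings, the sketch below searches an already populated block list. The toy_block type and toy_find_block_by_binding are hypothetical stand-ins; they are not the linker's UniformBlocks[] or ShaderStorageBlocks[] structures.

```c
#include <stddef.h>

/* Hypothetical, stripped-down block record: only the binding matters here. */
struct toy_block {
   unsigned binding;
};

/* Returns the index of the first block with a matching binding, or -1.
 * The block list must already be filled in before this is called, which is
 * why the blocks have to be linked before the uniforms. */
static int
toy_find_block_by_binding(const struct toy_block *blocks, size_t num_blocks,
                          unsigned binding)
{
   int found = -1;

   for (size_t i = 0; i < num_blocks; i++) {
      if (blocks[i].binding == binding) {
         found = (int)i;
         break; /* stop at the first match */
      }
   }

   return found;
}
```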
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was probably not caught before because no supported test was
exercising the flrp lowering with a bit size other than 32.
With the arrival of VK_KHR_shader_float_controls we will have some of
those and, unless we keep the bit size, we will end up with something
like:
../src/compiler/nir/nir_builder.h:420: nir_builder_alu_instr_finish_and_insert: Assertion `src_bit_size == bit_size' failed.
Fixes: 158370ed2a0 ("nir/flrp: Add new lowering pass for flrp instructions")
Fixes: ae02622d8fd ("nir/flrp: Lower flrp(a, b, c) differently if another flrp(_, b, c) exists")
Signed-off-by: Andres Gomez <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Hash tables were not destroyed at return.
v2: Use ralloc_context (Eric Anholt)
Signed-off-by: Yevhenii Kolesnikov <[email protected]>
Acked-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This is intended to be used, for example, with OpenGL logic operations. It
takes a render target as source and a sample index in the base index for
MSAA color reads.
v2: drop the CAN_ELIMINATE and CAN_REORDER flags (Eric).
Reviewed-by: Eric Anholt <[email protected]>
|
| |
No shader-db changes on Ice Lake, Iron Lake, or GM45 as these platforms
lack a LRP instruction.
v2: Remove flrp@64 cases. Since Gen11 removes flrp@32, it seems
unlikely that we'll ever have a flrp@64. Should that occur, the cases
can be added back.
All Gen6-Gen9 platforms had similar results. (Skylake shown)
total instructions in shared programs: 15041996 -> 15041184 (<.01%)
instructions in affected programs: 71776 -> 70964 (-1.13%)
helped: 312
HURT: 0
helped stats (abs) min: 2 max: 3 x̄: 2.60 x̃: 3
helped stats (rel) min: 0.36% max: 4.55% x̄: 1.75% x̃: 1.28%
95% mean confidence interval for instructions value: -2.66 -2.55
95% mean confidence interval for instructions %-change: -1.89% -1.61%
Instructions are helped.
total cycles in shared programs: 354303333 -> 354301807 (<.01%)
cycles in affected programs: 433742 -> 432216 (-0.35%)
helped: 206
HURT: 78
helped stats (abs) min: 2 max: 244 x̄: 21.02 x̃: 8
helped stats (rel) min: 0.06% max: 19.59% x̄: 1.72% x̃: 0.82%
HURT stats (abs) min: 1 max: 220 x̄: 35.95 x̃: 10
HURT stats (rel) min: 0.07% max: 30.48% x̄: 2.53% x̃: 0.56%
95% mean confidence interval for cycles value: -10.68 -0.06
95% mean confidence interval for cycles %-change: -0.99% -0.12%
Cycles are helped.
Reviewed-by: Matt Turner <[email protected]>
|
| |
No shader-db changes on Ice Lake, Iron Lake, or GM45 as these platforms
lack a LRP instruction.
v2: Convert the pattern directly to flrp. There were negligible
improvements on Gen4 and Gen5, and Gen11 was actually hurt. I believe
the problem is this optimization conflicts with the (1-x)*y =>
ffma(-x, y, y) optimization on Gen11.
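The (1-x)*y => ffma(-x, y, y) rewrite mentioned here rests on the identity (1 - x) * y = -x * y + y. The sketch below only checks that identity on plain floats; toy_ffma is a hypothetical helper computed as an ordinary multiply followed by an add, not a hardware fused operation, and in general the two forms can round differently.

```c
/* Hypothetical stand-in for a 3-source multiply-add: a * b + c. */
static float
toy_ffma(float a, float b, float c)
{
   return a * b + c;
}

/* Original form: one subtract and one multiply. */
static float
one_minus_x_times_y(float x, float y)
{
   return (1.0f - x) * y;
}

/* Rewritten form: a single multiply-add with a negated first source. */
static float
ffma_form(float x, float y)
{
   return toy_ffma(-x, y, y);
}
```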
Skylake
total instructions in shared programs: 15046487 -> 15041996 (-0.03%)
instructions in affected programs: 194681 -> 190190 (-2.31%)
helped: 880
HURT: 20
helped stats (abs) min: 1 max: 19 x̄: 5.13 x̃: 4
helped stats (rel) min: 0.19% max: 36.36% x̄: 4.85% x̃: 3.33%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 0.11% max: 1.06% x̄: 0.28% x̃: 0.17%
95% mean confidence interval for instructions value: -5.25 -4.73
95% mean confidence interval for instructions %-change: -5.11% -4.36%
Instructions are helped.
total cycles in shared programs: 354340839 -> 354303333 (-0.01%)
cycles in affected programs: 1753622 -> 1716116 (-2.14%)
helped: 786
HURT: 182
helped stats (abs) min: 1 max: 1842 x̄: 56.52 x̃: 22
helped stats (rel) min: 0.03% max: 43.17% x̄: 3.90% x̃: 2.84%
HURT stats (abs) min: 1 max: 440 x̄: 37.99 x̃: 9
HURT stats (rel) min: 0.03% max: 29.37% x̄: 1.96% x̃: 0.32%
95% mean confidence interval for cycles value: -45.90 -31.59
95% mean confidence interval for cycles %-change: -3.09% -2.50%
Cycles are helped.
All Gen6-Gen8 platforms had similar results. (Broadwell shown)
total instructions in shared programs: 15055907 -> 15051466 (-0.03%)
instructions in affected programs: 196370 -> 191929 (-2.26%)
helped: 871
HURT: 26
helped stats (abs) min: 1 max: 19 x̄: 5.13 x̃: 4
helped stats (rel) min: 0.19% max: 36.36% x̄: 4.76% x̃: 3.27%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 0.11% max: 1.06% x̄: 0.24% x̃: 0.12%
95% mean confidence interval for instructions value: -5.21 -4.69
95% mean confidence interval for instructions %-change: -4.99% -4.24%
Instructions are helped.
total cycles in shared programs: 387729170 -> 387699745 (<.01%)
cycles in affected programs: 1816409 -> 1786984 (-1.62%)
helped: 788
HURT: 172
helped stats (abs) min: 1 max: 662 x̄: 47.29 x̃: 22
helped stats (rel) min: 0.03% max: 31.26% x̄: 3.55% x̃: 2.76%
HURT stats (abs) min: 1 max: 404 x̄: 45.59 x̃: 14
HURT stats (rel) min: 0.03% max: 22.92% x̄: 1.53% x̃: 0.43%
95% mean confidence interval for cycles value: -35.69 -25.61
95% mean confidence interval for cycles %-change: -2.88% -2.40%
Cycles are helped.
total fills in shared programs: 34712 -> 34710 (<.01%)
fills in affected programs: 7 -> 5 (-28.57%)
helped: 1
HURT: 0
LOST: 0
GAINED: 2
Reviewed-by: Matt Turner <[email protected]>
|
| |
Moving the add to the other end of the sequence allows it to be fused
into an FMA.
Ice Lake
total instructions in shared programs: 17173074 -> 16933147 (-1.40%)
instructions in affected programs: 7938745 -> 7698818 (-3.02%)
helped: 35583
HURT: 90
helped stats (abs) min: 1 max: 716 x̄: 6.75 x̃: 6
helped stats (rel) min: 0.10% max: 53.04% x̄: 5.29% x̃: 3.45%
HURT stats (abs) min: 1 max: 41 x̄: 2.46 x̃: 1
HURT stats (rel) min: 0.32% max: 8.33% x̄: 1.41% x̃: 0.77%
95% mean confidence interval for instructions value: -6.80 -6.65
95% mean confidence interval for instructions %-change: -5.32% -5.22%
Instructions are helped.
total cycles in shared programs: 360881386 -> 359533568 (-0.37%)
cycles in affected programs: 189489144 -> 188141326 (-0.71%)
helped: 27250
HURT: 6707
helped stats (abs) min: 1 max: 21997 x̄: 62.15 x̃: 16
helped stats (rel) min: <.01% max: 70.69% x̄: 4.04% x̃: 2.35%
HURT stats (abs) min: 1 max: 3507 x̄: 51.56 x̃: 14
HURT stats (rel) min: <.01% max: 77.26% x̄: 2.72% x̃: 1.27%
95% mean confidence interval for cycles value: -44.70 -34.68
95% mean confidence interval for cycles %-change: -2.75% -2.65%
Cycles are helped.
total spills in shared programs: 8943 -> 8829 (-1.27%)
spills in affected programs: 625 -> 511 (-18.24%)
helped: 6
HURT: 3
total fills in shared programs: 21815 -> 21719 (-0.44%)
fills in affected programs: 1653 -> 1557 (-5.81%)
helped: 7
HURT: 10
LOST: 11
GAINED: 3
Skylake and Broadwell had similar results. (Skylake shown)
total instructions in shared programs: 15271996 -> 15040882 (-1.51%)
instructions in affected programs: 7193699 -> 6962585 (-3.21%)
helped: 33985
HURT: 30
helped stats (abs) min: 1 max: 260 x̄: 6.80 x̃: 6
helped stats (rel) min: 0.10% max: 30.00% x̄: 5.54% x̃: 3.85%
HURT stats (abs) min: 1 max: 41 x̄: 4.00 x̃: 3
HURT stats (rel) min: 0.20% max: 2.16% x̄: 1.46% x̃: 1.72%
95% mean confidence interval for instructions value: -6.87 -6.72
95% mean confidence interval for instructions %-change: -5.59% -5.48%
Instructions are helped.
total cycles in shared programs: 355520785 -> 354253799 (-0.36%)
cycles in affected programs: 185869148 -> 184602162 (-0.68%)
helped: 25824
HURT: 6287
helped stats (abs) min: 1 max: 21997 x̄: 61.66 x̃: 16
helped stats (rel) min: <.01% max: 42.05% x̄: 4.18% x̃: 2.41%
HURT stats (abs) min: 1 max: 3327 x̄: 51.76 x̃: 14
HURT stats (rel) min: <.01% max: 101.62% x̄: 2.80% x̃: 1.28%
95% mean confidence interval for cycles value: -44.70 -34.21
95% mean confidence interval for cycles %-change: -2.87% -2.76%
Cycles are helped.
total spills in shared programs: 8835 -> 8818 (-0.19%)
spills in affected programs: 613 -> 596 (-2.77%)
helped: 5
HURT: 2
total fills in shared programs: 21738 -> 21744 (0.03%)
fills in affected programs: 1348 -> 1354 (0.45%)
helped: 5
HURT: 11
LOST: 0
GAINED: 12
Haswell
total instructions in shared programs: 13447102 -> 13381508 (-0.49%)
instructions in affected programs: 3770735 -> 3705141 (-1.74%)
helped: 11999
HURT: 29
helped stats (abs) min: 1 max: 409 x̄: 5.60 x̃: 3
helped stats (rel) min: 0.10% max: 20.00% x̄: 2.38% x̃: 1.87%
HURT stats (abs) min: 3 max: 750 x̄: 54.90 x̃: 3
HURT stats (rel) min: 0.12% max: 125.30% x̄: 9.96% x̃: 1.82%
95% mean confidence interval for instructions value: -5.71 -5.19
95% mean confidence interval for instructions %-change: -2.39% -2.30%
Instructions are helped.
total cycles in shared programs: 376342236 -> 375690458 (-0.17%)
cycles in affected programs: 155699021 -> 155047243 (-0.42%)
helped: 8397
HURT: 2876
helped stats (abs) min: 1 max: 20248 x̄: 109.87 x̃: 18
helped stats (rel) min: <.01% max: 40.71% x̄: 2.23% x̃: 1.49%
HURT stats (abs) min: 1 max: 15414 x̄: 94.15 x̃: 22
HURT stats (rel) min: <.01% max: 432.49% x̄: 3.15% x̃: 1.41%
95% mean confidence interval for cycles value: -67.64 -48.00
95% mean confidence interval for cycles %-change: -0.99% -0.74%
Cycles are helped.
total spills in shared programs: 23134 -> 23184 (0.22%)
spills in affected programs: 1675 -> 1725 (2.99%)
helped: 13
HURT: 11
total fills in shared programs: 34550 -> 34686 (0.39%)
fills in affected programs: 1421 -> 1557 (9.57%)
helped: 13
HURT: 11
LOST: 0
GAINED: 11
Ivy Bridge
total instructions in shared programs: 12019642 -> 11987285 (-0.27%)
instructions in affected programs: 1532236 -> 1499879 (-2.11%)
helped: 5522
HURT: 110
helped stats (abs) min: 1 max: 312 x̄: 6.22 x̃: 3
helped stats (rel) min: 0.16% max: 20.00% x̄: 2.46% x̃: 1.88%
HURT stats (abs) min: 1 max: 750 x̄: 18.07 x̃: 3
HURT stats (rel) min: 0.09% max: 125.30% x̄: 3.42% x̃: 1.15%
95% mean confidence interval for instructions value: -6.25 -5.24
95% mean confidence interval for instructions %-change: -2.43% -2.26%
Instructions are helped.
total cycles in shared programs: 180214667 -> 179761900 (-0.25%)
cycles in affected programs: 31448723 -> 30995956 (-1.44%)
helped: 7191
HURT: 2838
helped stats (abs) min: 1 max: 17680 x̄: 88.47 x̃: 17
helped stats (rel) min: <.01% max: 50.45% x̄: 2.16% x̃: 1.40%
HURT stats (abs) min: 1 max: 15540 x̄: 64.63 x̃: 24
HURT stats (rel) min: 0.02% max: 435.17% x̄: 3.10% x̃: 1.51%
95% mean confidence interval for cycles value: -53.34 -36.95
95% mean confidence interval for cycles %-change: -0.81% -0.53%
Cycles are helped.
total spills in shared programs: 3599 -> 3642 (1.19%)
spills in affected programs: 1180 -> 1223 (3.64%)
helped: 12
HURT: 2
total fills in shared programs: 4031 -> 4162 (3.25%)
fills in affected programs: 876 -> 1007 (14.95%)
helped: 12
HURT: 2
LOST: 6
GAINED: 5
Sandy Bridge
total instructions in shared programs: 10850686 -> 10822890 (-0.26%)
instructions in affected programs: 1247986 -> 1220190 (-2.23%)
helped: 4699
HURT: 102
helped stats (abs) min: 1 max: 104 x̄: 6.02 x̃: 3
helped stats (rel) min: 0.15% max: 17.65% x̄: 2.44% x̃: 1.88%
HURT stats (abs) min: 1 max: 16 x̄: 4.70 x̃: 3
HURT stats (rel) min: 0.09% max: 3.85% x̄: 1.11% x̃: 1.10%
95% mean confidence interval for instructions value: -6.10 -5.47
95% mean confidence interval for instructions %-change: -2.42% -2.30%
Instructions are helped.
total cycles in shared programs: 154044149 -> 153920095 (-0.08%)
cycles in affected programs: 26037392 -> 25913338 (-0.48%)
helped: 5974
HURT: 2521
helped stats (abs) min: 1 max: 1802 x̄: 35.42 x̃: 16
helped stats (rel) min: <.01% max: 35.80% x̄: 1.43% x̃: 0.84%
HURT stats (abs) min: 1 max: 862 x̄: 34.73 x̃: 20
HURT stats (rel) min: 0.01% max: 36.33% x̄: 1.67% x̃: 0.85%
95% mean confidence interval for cycles value: -16.31 -12.90
95% mean confidence interval for cycles %-change: -0.56% -0.45%
Cycles are helped.
total spills in shared programs: 2876 -> 2957 (2.82%)
spills in affected programs: 592 -> 673 (13.68%)
helped: 6
HURT: 35
total fills in shared programs: 3157 -> 3134 (-0.73%)
fills in affected programs: 402 -> 379 (-5.72%)
helped: 6
HURT: 0
LOST: 5
GAINED: 11
Reviewed-by: Matt Turner <[email protected]>
|
| |
v2: Remove flrp@64 cases. Since Gen11 removes flrp@32, it seems
unlikely that we'll ever have a flrp@64. Should that occur, the cases
can be added back.
v3: Add a couple more patterns that just move the negation around.
No shader-db changes on Ice Lake, Iron Lake, or GM45 as these platforms
lack a LRP instruction.
Skylake
total instructions in shared programs: 15279687 -> 15256058 (-0.15%)
instructions in affected programs: 4344440 -> 4320811 (-0.54%)
helped: 23455
HURT: 18
helped stats (abs) min: 1 max: 21 x̄: 1.01 x̃: 1
helped stats (rel) min: 0.02% max: 13.33% x̄: 0.86% x̃: 0.65%
HURT stats (abs) min: 1 max: 2 x̄: 1.06 x̃: 1
HURT stats (rel) min: 0.13% max: 1.16% x̄: 0.43% x̃: 0.34%
95% mean confidence interval for instructions value: -1.01 -1.00
95% mean confidence interval for instructions %-change: -0.87% -0.85%
Instructions are helped.
total cycles in shared programs: 355593755 -> 355339981 (-0.07%)
cycles in affected programs: 162089552 -> 161835778 (-0.16%)
helped: 20467
HURT: 7158
helped stats (abs) min: 1 max: 2074 x̄: 29.00 x̃: 6
helped stats (rel) min: <.01% max: 35.71% x̄: 1.71% x̃: 0.58%
HURT stats (abs) min: 1 max: 4814 x̄: 47.46 x̃: 11
HURT stats (rel) min: <.01% max: 125.43% x̄: 2.88% x̃: 0.98%
95% mean confidence interval for cycles value: -10.39 -7.98
95% mean confidence interval for cycles %-change: -0.57% -0.47%
Cycles are helped.
total spills in shared programs: 8843 -> 8835 (-0.09%)
spills in affected programs: 190 -> 182 (-4.21%)
helped: 2
HURT: 0
total fills in shared programs: 21738 -> 21738 (0.00%)
fills in affected programs: 372 -> 372 (0.00%)
helped: 1
HURT: 1
LOST: 12
GAINED: 22
Broadwell
total instructions in shared programs: 15290523 -> 15266818 (-0.16%)
instructions in affected programs: 4314738 -> 4291033 (-0.55%)
helped: 23391
HURT: 11
helped stats (abs) min: 1 max: 119 x̄: 1.02 x̃: 1
helped stats (rel) min: 0.02% max: 13.33% x̄: 0.86% x̃: 0.65%
HURT stats (abs) min: 1 max: 189 x̄: 18.09 x̃: 1
HURT stats (rel) min: 0.11% max: 5.39% x̄: 0.98% x̃: 0.50%
95% mean confidence interval for instructions value: -1.04 -0.99
95% mean confidence interval for instructions %-change: -0.87% -0.85%
Instructions are helped.
total cycles in shared programs: 388911660 -> 388830827 (-0.02%)
cycles in affected programs: 172903324 -> 172822491 (-0.05%)
helped: 15601
HURT: 13269
helped stats (abs) min: 1 max: 1986 x̄: 29.18 x̃: 6
helped stats (rel) min: <.01% max: 36.60% x̄: 1.74% x̃: 0.55%
HURT stats (abs) min: 1 max: 14904 x̄: 28.21 x̃: 6
HURT stats (rel) min: <.01% max: 102.58% x̄: 1.77% x̃: 0.60%
95% mean confidence interval for cycles value: -4.20 -1.40
95% mean confidence interval for cycles %-change: -0.17% -0.08%
Cycles are helped.
total spills in shared programs: 23110 -> 23069 (-0.18%)
spills in affected programs: 656 -> 615 (-6.25%)
helped: 3
HURT: 1
total fills in shared programs: 34399 -> 34398 (<.01%)
fills in affected programs: 905 -> 904 (-0.11%)
helped: 3
HURT: 1
LOST: 6
GAINED: 23
Haswell
total instructions in shared programs: 13465303 -> 13441142 (-0.18%)
instructions in affected programs: 3726999 -> 3702838 (-0.65%)
helped: 22139
HURT: 347
helped stats (abs) min: 1 max: 43 x̄: 1.11 x̃: 1
helped stats (rel) min: 0.03% max: 10.00% x̄: 1.01% x̃: 0.75%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 0.35% max: 11.11% x̄: 1.48% x̃: 1.12%
95% mean confidence interval for instructions value: -1.08 -1.07
95% mean confidence interval for instructions %-change: -0.99% -0.96%
Instructions are helped.
total cycles in shared programs: 376271308 -> 376273090 (<.01%)
cycles in affected programs: 167496811 -> 167498593 (<.01%)
helped: 13206
HURT: 13281
helped stats (abs) min: 1 max: 3864 x̄: 35.39 x̃: 8
helped stats (rel) min: <.01% max: 53.10% x̄: 2.31% x̃: 0.80%
HURT stats (abs) min: 1 max: 3828 x̄: 35.32 x̃: 8
HURT stats (rel) min: <.01% max: 117.85% x̄: 2.88% x̃: 0.61%
95% mean confidence interval for cycles value: -1.33 1.47
95% mean confidence interval for cycles %-change: 0.22% 0.36%
Inconclusive result (value mean confidence interval includes 0).
total spills in shared programs: 23158 -> 23134 (-0.10%)
spills in affected programs: 24 -> 0
helped: 3
HURT: 0
total fills in shared programs: 34580 -> 34550 (-0.09%)
fills in affected programs: 30 -> 0
helped: 3
HURT: 0
LOST: 23
GAINED: 13
Ivy Bridge
total instructions in shared programs: 12034154 -> 12014301 (-0.16%)
instructions in affected programs: 3636209 -> 3616356 (-0.55%)
helped: 18771
HURT: 459
helped stats (abs) min: 1 max: 43 x̄: 1.08 x̃: 1
helped stats (rel) min: 0.03% max: 10.00% x̄: 0.91% x̃: 0.68%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 0.34% max: 8.33% x̄: 1.43% x̃: 1.11%
95% mean confidence interval for instructions value: -1.04 -1.02
95% mean confidence interval for instructions %-change: -0.86% -0.84%
Instructions are helped.
total cycles in shared programs: 180186960 -> 180175147 (<.01%)
cycles in affected programs: 44652745 -> 44640932 (-0.03%)
helped: 12979
HURT: 11033
helped stats (abs) min: 1 max: 5836 x̄: 32.88 x̃: 6
helped stats (rel) min: <.01% max: 53.10% x̄: 2.19% x̃: 0.74%
HURT stats (abs) min: 1 max: 4811 x̄: 37.61 x̃: 9
HURT stats (rel) min: <.01% max: 115.18% x̄: 2.99% x̃: 0.69%
95% mean confidence interval for cycles value: -2.29 1.31
95% mean confidence interval for cycles %-change: 0.11% 0.26%
Inconclusive result (value mean confidence interval includes 0).
total spills in shared programs: 3623 -> 3599 (-0.66%)
spills in affected programs: 24 -> 0
helped: 3
HURT: 0
total fills in shared programs: 4061 -> 4031 (-0.74%)
fills in affected programs: 30 -> 0
helped: 3
HURT: 0
LOST: 17
GAINED: 18
Sandy Bridge
total instructions in shared programs: 10853968 -> 10834932 (-0.18%)
instructions in affected programs: 3769957 -> 3750921 (-0.50%)
helped: 17944
HURT: 204
helped stats (abs) min: 1 max: 3 x̄: 1.07 x̃: 1
helped stats (rel) min: 0.02% max: 10.00% x̄: 0.83% x̃: 0.60%
HURT stats (abs) min: 1 max: 2 x̄: 1.01 x̃: 1
HURT stats (rel) min: 0.31% max: 9.09% x̄: 1.83% x̃: 0.93%
95% mean confidence interval for instructions value: -1.05 -1.04
95% mean confidence interval for instructions %-change: -0.81% -0.78%
Instructions are helped.
total cycles in shared programs: 153894864 -> 153885988 (<.01%)
cycles in affected programs: 50643925 -> 50635049 (-0.02%)
helped: 9361
HURT: 10534
helped stats (abs) min: 1 max: 1966 x̄: 19.42 x̃: 4
helped stats (rel) min: <.01% max: 34.97% x̄: 0.90% x̃: 0.22%
HURT stats (abs) min: 1 max: 1371 x̄: 16.42 x̃: 5
HURT stats (rel) min: <.01% max: 55.10% x̄: 0.81% x̃: 0.27%
95% mean confidence interval for cycles value: -1.27 0.38
95% mean confidence interval for cycles %-change: -0.03% 0.04%
Inconclusive result (value mean confidence interval includes 0).
LOST: 6
GAINED: 24
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A couple of patches later in this series use the flag to avoid a few
thousand shader-db regressions on all vec4 platforms.
I'm not particularly enamored with the name of this flag. However, I
suspect the Intel vec4 backend is the only backend that will benefit
from it. Specifically, the cases where this helps are all cases where
we want to prevent nir_opt_algebraic from rearranging instructions to
create 3-source instructions, such as ffma and flrp, with additional
immediate value or uniform sources.
The earlier commit "intel/vec4: Try to emit a single load for multiple
3-src instruction operands" solves most of the problems caused by
additional immediate values, but the restrictions on register strides
that cause problems for uniforms and shader inputs persist.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
| |
The members of gl_DepthRangeParameters are declared to be highp in
GLSL ES specs.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
Adds a third constructor to glsl_struct_field which has an extra
parameter to specify the precision.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
| |
There are two constructors for glsl_struct_field with different
parameters. Instead of repeating them for both constructors, this
patch adds a convenience macro. This will make it easier to add a
third constructor in a later patch.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
All of the builtin variables mentioned in the GLSL ES spec and the
extensions include a precision declaration which is different
depending on what the variable is used for. This patch makes it set
the corresponding precision when creating the variable. This will make
a difference once we start using the precision information for
optimisation. Previously all of the builtin variables ended up with a
precision of NONE.
v2: Made gl_PointSize and gl_FragCoord highp since GLSL ES 3.00. Fixed
gl_MaxViewPorts to always be highp. (Eric Anholt)
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Drivers only use lower_io for modes where pointers don't have a
meaningful value, and dereferences can always be traced back to a
variable. But there can be other modes, like global mode with
VK_EXT_buffer_device_address, where pointers cannot be traced back to a
variable, and lower_io would segfault on loads/stores of these since
nir_deref_instr_get_variable() would return NULL.
Just use the mode on the deref itself to filter out these modes before
we try to get the variable.
Fixes: 118a66df990 ("radv: Use NIR barycentric coordinates")
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
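The fix described above can be pictured with a simplified model: check the mode stored on the deref first, and only look at the variable once the mode is known to be one where a variable must exist. The toy_* types below are hypothetical and far smaller than the real NIR structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mode bits. */
enum toy_mode {
   TOY_MODE_SHADER_IN  = 1 << 0,
   TOY_MODE_SHADER_OUT = 1 << 1,
   TOY_MODE_GLOBAL     = 1 << 2,
};

struct toy_variable {
   int driver_location;
};

struct toy_deref {
   enum toy_mode mode;
   /* NULL when the pointer cannot be traced back to a variable,
    * as can happen for global-mode pointers. */
   const struct toy_variable *var;
};

/* Lower an access only if its deref mode is in `modes`. The variable is
 * dereferenced strictly after the mode filter, so derefs without a
 * variable are skipped instead of crashing. */
static bool
toy_try_lower_io(const struct toy_deref *deref, unsigned modes,
                 int *location_out)
{
   if (!(deref->mode & modes))
      return false;

   *location_out = deref->var->driver_location;
   return true;
}
```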
|
|
|
|
| |
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit re-plumbs all of nir_loop_analyze to use nir_ssa_scalar for
all intermediate values so that we can properly handle swizzles. Even
though if conditions are required to be scalars, they may still consume
swizzles so you could have ((a.yzw < b.zzx).xz && c.xx).y == 0 as your
loop termination condition. The old code would just bail the moment it
saw its first non-zero swizzle but we can now properly chase the scalar
from the if condition all the way to a, b, and c.
Shader-db results on Kaby Lake:
total loops in shared programs: 4388 -> 4364 (-0.55%)
loops in affected programs: 29 -> 5 (-82.76%)
helped: 29
HURT: 5
Shader-db results on Haswell:
total loops in shared programs: 4370 -> 4373 (0.07%)
loops in affected programs: 2 -> 5 (150.00%)
helped: 2
HURT: 5
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This commit reworks both get_induction_and_limit_vars() and
try_find_trip_count_vars_in_iand to return true on success and not
modify their output parameters on failure. This makes their callers
significantly simpler.
Reviewed-by: Timothy Arceri <[email protected]>
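The calling convention described above (return true on success, leave the output parameters untouched on failure) might look like the following sketch. The toy_operand type and the selection rule are hypothetical; they are not the actual logic of get_induction_and_limit_vars().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical operand: all we track is whether it is an induction variable. */
struct toy_operand {
   bool is_induction_var;
};

/* Picks the induction variable and the limit out of two comparison operands.
 * On failure (zero or two induction variables) the outputs are not written,
 * so callers can test the return value and keep their previous state. */
static bool
toy_get_induction_and_limit(const struct toy_operand *lhs,
                            const struct toy_operand *rhs,
                            const struct toy_operand **ind_out,
                            const struct toy_operand **limit_out)
{
   if (lhs->is_induction_var == rhs->is_induction_var)
      return false;

   *ind_out = lhs->is_induction_var ? lhs : rhs;
   *limit_out = lhs->is_induction_var ? rhs : lhs;
   return true;
}
```

Callers can then write `if (!toy_get_induction_and_limit(...)) continue;` without resetting or re-checking the outputs.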
|
|
|
|
|
|
|
|
|
|
| |
There are various cases in which we want to chase SSA values through ALU
ops ranging from hand-written optimizations to back-end translation
code. In all these cases, it can be very tricky to do properly because
of swizzles. This set of helpers lets you easily work with a single
component of an SSA def and chase through ALU ops safely.
Reviewed-by: Timothy Arceri <[email protected]>
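To show why swizzles make chasing tricky, here is a toy version of the idea: a (def, component) pair is followed through a chain of mov-like instructions, remapping the component through each swizzle. The toy_* types are hypothetical and only model movs; the real helpers operate on NIR ALU instructions.

```c
#include <stdbool.h>

struct toy_def;

/* A source: which def is read, and which of its components feeds each
 * output component. */
struct toy_alu_src {
   const struct toy_def *def;
   unsigned swizzle[4];
};

/* A def is either a swizzled mov of another def or an opaque value. */
struct toy_def {
   bool is_mov;
   struct toy_alu_src src; /* only meaningful when is_mov is true */
};

/* A single component of a def. */
struct toy_scalar {
   const struct toy_def *def;
   unsigned comp;
};

/* Follow a scalar back through movs. The component index must be remapped
 * through the swizzle at every step before moving to the source def. */
static struct toy_scalar
toy_chase_movs(struct toy_scalar s)
{
   while (s.def->is_mov) {
      unsigned src_comp = s.def->src.swizzle[s.comp];
      s.def = s.def->src.def;
      s.comp = src_comp;
   }
   return s;
}
```

For example, with b = mov a.wzyx and c = mov b.yyzz, chasing c.x lands on a.z, which ignoring the swizzles would get wrong.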
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
None of the current code knows what to do with swizzles. Take the safe
option for now and bail if we see one. This does have a small shader-db
impact but it is at least safe.
Shader-db results on Kaby Lake:
total loops in shared programs: 4364 -> 4388 (0.55%)
loops in affected programs: 5 -> 29 (480.00%)
helped: 5
HURT: 29
Shader-db results on Haswell:
total loops in shared programs: 4373 -> 4370 (-0.07%)
loops in affected programs: 5 -> 2 (-60.00%)
helped: 5
HURT: 2
Fixes: 6772a17acc8ee "nir: Add a loop analysis pass"
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
| |
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The current code assumes everything is 32-bit which is very likely true
but not guaranteed by any means. Instead, use nir_eval_const_opcode to
do the calculations in a bit-size-agnostic way. We also use the new
constant constructors to build the correct size constants.
Fixes: 6772a17acc8ee "nir: Add a loop analysis pass"
Reviewed-by: Timothy Arceri <[email protected]>
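A bit-size-agnostic calculation, as opposed to one that assumes 32 bits, can be sketched as below. This is an illustration only: toy_eval_iadd is a hypothetical helper and does not reproduce the interface of nir_eval_const_opcode. It assumes bit_size is between 1 and 64.

```c
#include <stdint.h>

/* Keep only the low `bit_size` bits of `value` (1 <= bit_size <= 64). */
static uint64_t
toy_mask_to_bit_size(uint64_t value, unsigned bit_size)
{
   if (bit_size == 64)
      return value;
   return value & ((UINT64_C(1) << bit_size) - 1);
}

/* Integer add that wraps at the requested bit size instead of always
 * wrapping at 32 bits. */
static uint64_t
toy_eval_iadd(uint64_t a, uint64_t b, unsigned bit_size)
{
   return toy_mask_to_bit_size(a + b, bit_size);
}
```

The same inputs give different results depending on the bit size, which is exactly what a hard-coded 32-bit calculation would miss.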
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
One issue was that the original version didn't check that swizzles
matched when comparing ALU instructions so it could end up matching
very different instructions. Using the nir_instrs_equal function from
nir_instr_set.c which we use for CSE should be much more reliable.
Another was that the loop assumes it will only run two iterations which
may not be true. If there's something which guarantees that this case
only happens for phis after ifs, it wasn't documented.
Fixes: 9e6b39e1d521 "nir: detect more induction variables"
Reviewed-by: Timothy Arceri <[email protected]>
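The swizzle problem in the first paragraph can be shown with a toy comparison. The toy_alu type is hypothetical and models each source as a def id plus a single selected component; it is not the nir_instrs_equal implementation.

```c
#include <stdbool.h>

/* Hypothetical two-source ALU instruction. */
struct toy_alu {
   int op;
   int src_def[2];      /* ids of the defs being read */
   unsigned src_comp[2]; /* which component of each def is read */
};

/* Two instructions are only equal if the opcode, the source defs AND the
 * selected components all match. Skipping the component check would treat
 * op(x.x, y.y) and op(x.y, y.y) as the same instruction. */
static bool
toy_alu_equal(const struct toy_alu *a, const struct toy_alu *b)
{
   if (a->op != b->op)
      return false;

   for (int i = 0; i < 2; i++) {
      if (a->src_def[i] != b->src_def[i])
         return false;
      if (a->src_comp[i] != b->src_comp[i])
         return false;
   }

   return true;
}
```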
|