| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes a hang in shadertoy for radeonsi where a buffer was initialized with:
value -= value
with value being undefined.
In this case LLVM replace the operation with an assignment to NaN.
Cc: 19.1 19.2 <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111241
Reviewed-by: Marek Olšák <[email protected]>
(cherry picked from commit 47cc660d9c19572e5ef2dce7c8ae1766a2ac9885)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
enabled
This caused a problem on Sandybridge where an open-coded
bitfieldReverse() function could be optimized to a
nir_op_bitfield_reverse that would generate an unsupported BFREV
instruction in the backend. This was encountered in some Unreal4 tech
demos in shader-db. The bug was not previously noticed because we don't
actually try to run those demos on Sandybridge.
The fixes tag is a bit a lie. The actual bug was introduced about
26,000 commits earlier in 371c4b3c48f ("nir: Recognize open-coded
bitfield_reverse."). Without the NIR lowering pass, the flag needed to
avoid the optimization does not exist. Hopefully nobody will care to
fix this on an earlier Mesa release.
Reviewed-by: Matt Turner <[email protected]>
Fixes: 7afa26d4e39 ("nir: Add lowering for nir_op_bitfield_reverse.")
(cherry picked from commit d3fd1c761aab01e06665180ab86c9528c0b285b2)
|
|
|
|
|
|
|
|
|
|
| |
Without loop_prepare_for_unroll loops are losing phis.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111411
Fixes: 5db98195 "nir: add loop unroll support for wrapper loops"
Signed-off-by: Danylo Piliaiev <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
(cherry picked from commit 84b3ef6a96eabc28b18e8cdf1b0d61826b1a8a67)
|
|
|
|
|
|
| |
Reviewed-by: Jason Ekstrand <[email protected]>
Fixes: 414148cdc124 "nir: Support deref instructions in loop_analyze"
(cherry picked from commit 204846ad062fe4e154406fa2d9093cdab4461ea2)
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These were left after a rebase and happen to make
NIR_INTRINSIC_SWIZZLE_MASK == NIR_INTRINSIC_SRC_ACCESS, which is how it
was noticed.
Fixes: 6f20643b471a851c936f ("nir: Allow qualifiers on copy_deref and image instructions")
Cc: Connor Abbott <[email protected]>
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
(cherry picked from commit 5d7bcac4e711bc278eabf198d7d5016b77d9eb0e)
|
|
|
|
|
|
|
|
|
|
| |
We can have a access flag already set here so just augment the
existing ones.
Signed-off-by: Lionel Landwerlin <[email protected]>
Fixes: 0fb61dfdeb ("spirv: propagate access qualifiers through ssa & pointer")
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
(cherry picked from commit 7deb5ec0e89769382fb5dd86aa5305001ae413fa)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Not only variables can be flagged as NonUniformEXT but also
expressions. We're currently ignoring it in an expression such as :
imageLoad(data[nonuniformEXT(rIndex)], 0)
The associated SPIRV :
OpDecorate %69 NonUniformEXT
...
%69 = OpLoad %61 %68
This changes propagates access qualifiers through ssa & pointers so
that when it hits a OpLoad/OpStore style instructions, qualifiers are
not forgotten.
Fixes failure the following tests :
dEQP-VK.descriptor_indexing.*
Signed-off-by: Lionel Landwerlin <[email protected]>
Fixes: 8ed583fe523703 ("spirv: Handle the NonUniformEXT decoration")
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
(cherry picked from commit 0fb61dfdebac802e4b4c7b5dbebf3d7ba1e60dc2)
|
|
|
|
|
|
|
|
|
|
|
|
| |
This refactor allows for common code to apply decoration on all
ssa/pointer values. In particular this will allow to propagage access
qualifiers.
Signed-off-by: Lionel Landwerlin <[email protected]>
Suggested-by: Caio Marcelo de Oliveira Filho <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
(cherry picked from commit 86b53770e1ea6e79452ccc97bab829ad58ffc5fd)
[Lionel Landwerlin: patch adapted for 19.1 branch]
|
|
|
|
|
|
|
|
|
| |
In the next commit, we'll properly handle access qualifiers on struct
members by propagating them to load/store instructions, but these
instructions had no way to specify the qualifier.
Reviewed-by: Timothy Arceri <[email protected]>
(cherry picked from commit 6f20643b471a851c936fc8b569cf05dcd6e6e7fe)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SPIRV added the ability to access variables and have expressions non
dynamically uniform and because spirv_to_nir generates deref
instructions, we'll need to have that access there.
Signed-off-by: Lionel Landwerlin <[email protected]>
Cc: <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
(cherry picked from commit 8c330728f3094f2c836e022e57f003d0c82953ef)
[Juan A. Suarez: resolve trivial conflicts]
Signed-off-by: Juan A. Suarez Romero <[email protected]>
Conflicts:
src/compiler/nir/nir.c
|
|
|
|
|
|
|
|
|
| |
Semantically, the memory barrier has to come first to wait
for the completion of pending memory requests.
Afterwards, the workgroups can be synchronized.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
(cherry picked from commit e352b4d650d37730e5087792b9a74ef31d1974ab)
|
|
|
|
|
|
|
| |
Fixes: 14531d676b11999123c0 ("nir: make nir_const_value scalar")
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Karol Herbst <[email protected]>
(cherry picked from commit 3acc4278ad4138ad3a914085aefd7c47d46e1ad4)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit re-plumbs all of nir_loop_analyze to use nir_ssa_scalar for
all intermediate values so that we can properly handle swizzles. Even
though if conditions are required to be scalars, they may still consume
swizzles so you could have ((a.yzw < b.zzx).xz && c.xx).y == 0 as your
loop termination condition. The old code would just bail the moment it
saw its first non-zero swizzle but we can now properly chase the scalar
from the if condition to all the way to a, b, and c.
Shader-db results on Kaby Lake:
total loops in shared programs: 4388 -> 4364 (-0.55%)
loops in affected programs: 29 -> 5 (-82.76%)
helped: 29
HURT: 5
Shader-db results on Haswell:
total loops in shared programs: 4370 -> 4373 (0.07%)
loops in affected programs: 2 -> 5 (150.00%)
helped: 2
HURT: 5
Reviewed-by: Timothy Arceri <[email protected]>
(cherry picked from commit ff972c7a3a7e80a426b72f285902d35f6ca3b820)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are various cases in which we want to chase SSA values through ALU
ops ranging from hand-written optimizations to back-end translation
code. In all these cases, it can be very tricky to do properly because
of swizzles. This set of helpers lets you easily work with a single
component of an SSA def and chase through ALU ops safely.
Reviewed-by: Timothy Arceri <[email protected]>
(cherry picked from commit 8f7405ed9d473c1729d48c5add4f0d9fe147c75a)
[Juan A. Suarez: resolve trivial conflicts]
Signed-off-by: Juan A. Suarez Romero <[email protected]>
Conflicts:
src/compiler/nir/nir.h
|
|
|
|
|
|
|
|
|
|
| |
This commit reworks both get_induction_and_limit_vars() and
try_find_trip_count_vars_in_iand to return true on success and not
modify their output parameters on failure. This makes their callers
significantly simpler.
Reviewed-by: Timothy Arceri <[email protected]>
(cherry picked from commit 0333649e638a38258957fd8b7e0367d73bbc7a80)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Sources of phi instructions act as if they occur at the very end of the
predecessor block not the block in which the phi lives. In order to
handle them correctly, we have to skip phi sources on the normal
instruction walk and handle them as a separate walk over the successor
phis. While registers in phi instructions is a bit of an oddity it can
happen when we temporarily go out-of-SSA for control-flow manipulations.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111075
Cc: [email protected]
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
(cherry picked from commit 6fb685fe4b762c8030f86895707516e2481e9ece)
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use alignment to calculate the stride associated with the pointer
types. That stride is used when the pointers are casted to arrays.
Note that size alone is not sufficient, e.g. struct { vec2 a; vec1 b;
} will have element an element size of 12 bytes, but the stride needs
to be 16 bytes to respect the 8 byte alignment.
Fixes: 050eb6389a8 "spirv: Ignore ArrayStride in OpPtrAccessChain for Workgroup"
Reviewed-by: Jason Ekstrand <[email protected]>
(cherry picked from commit 026cfa10995ff3316476fa19507fa27adc531de5)
|
|
|
|
|
|
|
|
|
|
| |
We need this when doing full software 64-bit emulation.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110309
Fixes: cbad201c2b3 "nir/algebraic: Add missing 64-bit extract_[iu]8..."
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
(cherry picked from commit 0ba508d7a3b6a006b5b8db1e865d33efc8d0abd5)
|
|
|
|
|
|
|
| |
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111071
Fixes: 2a74296f24ba "nir: add opt_if_loop_terminator()"
Reviewed-by: Timothy Arceri <[email protected]>
(cherry picked from commit 7a19e05e8c84152af3a15868f5ef781142ac8e23)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
None of the current code knows what to do with swizzles. Take the safe
option for now and bail if we see one. This does have a small shader-db
impact but it is at least safe.
Shader-db results on Kaby Lake:
total loops in shared programs: 4364 -> 4388 (0.55%)
loops in affected programs: 5 -> 29 (480.00%)
helped: 5
HURT: 29
Shader-db results on Haswell:
total loops in shared programs: 4373 -> 4370 (-0.07%)
loops in affected programs: 5 -> 2 (-60.00%)
helped: 5
HURT: 2
Fixes: 6772a17acc8ee "nir: Add a loop analysis pass"
Reviewed-by: Timothy Arceri <[email protected]>
(cherry picked from commit 9a3cb6f5fec040dea4a229b93f789995b36f9c09)
|
|
|
|
|
|
|
|
|
|
|
| |
The current code assumes everything is 32-bit which is very likely true
but not guaranteed by any means. Instead, use nir_eval_const_opcode to
do the calculations in a bit-size-agnostic way. We also use the new
constant constructors to build the correct size constants.
Fixes: 6772a17acc8ee "nir: Add a loop analysis pass"
Reviewed-by: Timothy Arceri <[email protected]>
(cherry picked from commit 268ad47c1115be8a8444d8e0e40af71623f9d281)
|
|
|
|
|
| |
Reviewed-by: Timothy Arceri <[email protected]>
(cherry picked from commit ce5581e23e54be91e4c1ad6a6c5990eca6677ceb)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
One issue was that the original version didn't check that swizzles
matched when comparing ALU instructions so it could end up matching
very different instructions. Using the nir_instrs_equal function from
nir_instr_set.c which we use for CSE should be much more reliable.
Another was that the loop assumes it will only run two iterations which
may not be true. If there's something which guarantees that this case
only happens for phis after ifs, it wasn't documented.
Fixes: 9e6b39e1d521 "nir: detect more induction variables"
Reviewed-by: Timothy Arceri <[email protected]>
(cherry picked from commit 9f7ffe41dd185487479ea8846df1f5cdbf1b83a6)
|
|
|
|
|
| |
Reviewed-by: Timothy Arceri <[email protected]>
(cherry picked from commit 6e984bcb92cf5e8b7da7387bc73cf6519ea2f43d)
|
|
|
|
|
|
|
|
| |
This is simple now, but we're going to be adding a few more conditions
to this later.
Reviewed-by: Timothy Arceri <[email protected]>
(cherry picked from commit a1c737927c0d96f26ce487930aa9a2ed323814c9)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is important because, for example nir_op_fne has
dest.dest.ssa.bit_size == 1, but the source operands can be 16-, 32-, or
64-bits. Fixing this helps partial redundancy elimination for compares
in a few more shaders.
v2: Add unit tests for nir_opt_comparison_pre that are fixed by this
commit.
All Intel platforms had similar results.
total instructions in shared programs: 17179408 -> 17179081 (<.01%)
instructions in affected programs: 43958 -> 43631 (-0.74%)
helped: 118
HURT: 2
helped stats (abs) min: 1 max: 5 x̄: 2.87 x̃: 2
helped stats (rel) min: 0.06% max: 4.12% x̄: 1.19% x̃: 0.81%
HURT stats (abs) min: 6 max: 6 x̄: 6.00 x̃: 6
HURT stats (rel) min: 5.83% max: 6.06% x̄: 5.94% x̃: 5.94%
95% mean confidence interval for instructions value: -3.08 -2.37
95% mean confidence interval for instructions %-change: -1.30% -0.85%
Instructions are helped.
total cycles in shared programs: 360959066 -> 360942386 (<.01%)
cycles in affected programs: 774274 -> 757594 (-2.15%)
helped: 111
HURT: 4
helped stats (abs) min: 1 max: 1591 x̄: 169.49 x̃: 36
helped stats (rel) min: <.01% max: 24.43% x̄: 8.86% x̃: 2.24%
HURT stats (abs) min: 1 max: 2068 x̄: 533.25 x̃: 32
HURT stats (rel) min: 0.02% max: 5.10% x̄: 3.06% x̃: 3.56%
95% mean confidence interval for cycles value: -200.61 -89.47
95% mean confidence interval for cycles %-change: -10.32% -6.58%
Cycles are helped.
Reviewed-by: Jason Ekstrand <[email protected]> [v1]
Suggested-by: Jason Ekstrand <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Fixes: be1cc3552bc ("nir: Add nir_const_value_negative_equal")
(cherry picked from commit 0ac5ff9ecb26ebc07a48e4f15539f975cef9b82a)
|
|
|
|
|
|
|
|
|
|
| |
Each tests has a comment with the expected before and after NIR. The
tests don't actually check this. The tests only check whether or not
the optimization pass reported progress. I couldn't think of a robust,
future-proof way to check the before and after code.
Reviewed-by: Matt Turner <[email protected]>
(cherry picked from commit b08d7040518cdf76792952ceef72cadaa54d0179)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
From OpPtrAccessChain description in the SPIR-V spec (1.4 rev 1):
For objects in the Uniform, StorageBuffer, or PushConstant storage
classes, the element’s address or location is calculated using a
stride, which will be the Base-type’s Array Stride when the Base
type is decorated with ArrayStride. For all other objects, the
implementation will calculate the element’s address or location.
For non-CL shaders the driver should layout the Workgroup storage
class, so override any explicitly set ArrayStride in the shader. This
currently fixes only the lower_workgroup_access_to_offsets case, which
is used by anv.
Reviewed-by: Juan A. Suarez <[email protected]>
(cherry picked from commit 050eb6389a8867e6173644fbb6b2d13ad0db454b)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fix round64 function to handle round to nearest even cases specially
with positive and negative numbers with fraction part 0.5.
v2: 1) Simplify unused bits (Elie Tournier)
Fixes:
KHR-GL45.gpu_shader_fp64.builtin.round_dvec2
KHR-GL45.gpu_shader_fp64.builtin.round_dvec3
KHR-GL45.gpu_shader_fp64.builtin.round_dvec4
KHR-GL45.gpu_shader_fp64.builtin.roundeven_double
KHR-GL45.gpu_shader_fp64.builtin.roundeven_dvec2
KHR-GL45.gpu_shader_fp64.builtin.roundeven_dvec3
KHR-GL45.gpu_shader_fp64.builtin.roundeven_dvec4
Signed-off-by: Sagar Ghuge <[email protected]>
Reviewed-by: Elie Tournier <[email protected]>
Acked-by: Anuj Phogat <[email protected]>
(cherry picked from commit 06807e1948f1bced9806b00908c892f1e3c3db5b)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Incrementing the iteration count was intended to fix an off-by-one error
when the first terminator was superseded by a later terminator. If
there is no first terminator or later terminator, there is no off-by-one
error. Incrementing the loop count creates one. This can be seen in
loops like:
do {
if (something) {
// No breaks or continues here.
}
} while (false);
Reviewed-by: Timothy Arceri <[email protected]>
Tested-by: Abel Briggs <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110953
Fixes: 646621c66da ("glsl: make loop unrolling more like the nir unrolling path")
(cherry picked from commit ee1c69faddb3624ace6548dafaff50549a031380)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The VaryingNames array has NumVaryings entries. But BufferStride is
a small array of MAX_FEEDBACK_BUFFERS (4) entries. Programs with
more than 4 varyings would read out of bounds.
Also, BufferStride is set based on the shader itself, which means that
it's inherently already included in the hash, and doesn't need to be
included again. At the point when shader_cache_read_program_metadata
is called, the linker hasn't even set those fields yet. So, just drop
it entirely.
Fixes valgrind errors in KHR-GL45.transform_feedback.linking_errors_test.
Fixes: 6d830940f78 glsl/shader_cache: Allow shader cache usage with transform feedback
Reviewed-by: Timothy Arceri <[email protected]>
(cherry picked from commit 3c10a2726bcf686f03e31e79e40786e3894ff063)
|
|
|
|
|
|
|
|
| |
Fixes: 8410cf66d "nir/propagate_invariant: Skip unknown vars"
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
(cherry picked from commit d96878a66a559f6690f01e82f06fcf92ae958d3c)
|
|
|
|
|
|
|
|
|
|
| |
Found with Jasons new metadata rework (https://gitlab.freedesktop.org/mesa/mesa/merge_requests/950).
Fixes: af355aaa071 "nir: add nir_opt_move_load_ubo() optimization pass"
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
(cherry picked from commit e24a7840f60ac2290761ea2dc2831e8c3ba8bbfc)
|
|
|
|
|
|
|
|
|
|
|
|
| |
When we inlined cf_node_has_side_effects into node_is_dead, all the
conditions flipped and we forgot to flip one. Fortunately, it doesn't
matter right now because no one uses this pass on shaders with more than
one function.
Fixes: b50465d197 "nir/dead_cf: Inline cf_node_has_side_effects"
Reviewed-by: Dave Airlie <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
(cherry picked from commit 8948048c6f01209bac0051e41cd84c38853bd251)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes a problem where the same instruction gets replaced twice.
This was happening when the replaced instruction would be at the end
of a block.
Replacement of :
if ssa_8 {
....
intrinsic bindless_image_store (ssa_44, ssa_16, ssa_0, ssa_15) (5, 0, 34836, 32) /* image_dim=Buf */ /* image_array=false */ /* format=34836 */ /* access=32 */
}
Would be :
if ssa_8 {
loop {
vec1 32 ssa_47 = intrinsic read_first_invocation (ssa_44) ()
vec1 1 ssa_48 = ieq ssa_47, ssa_44
if ssa_48 {
loop {
vec1 32 ssa_49 = intrinsic read_first_invocation (ssa_44) ()
vec1 1 ssa_50 = ieq ssa_49, ssa_44
if ssa_50 {
intrinsic bindless_image_store (ssa_44, ssa_16, ssa_0, ssa_15) (5, 0, 34836, 32) /* image_dim=Buf */ /* image_array=false */ /* format=34836 */ /* access=32 */
break
} else {
....
}
Signed-off-by: Lionel Landwerlin <[email protected]>
Fixes: 3bd545764151 ("nir: Add a lowering pass for non-uniform resource access")
Reviewed-by: Jason Ekstrand <[email protected]>
(cherry picked from commit 366811bedb67ae7d31a02ea9b1f9fa942fb93602)
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When num_state_slots is 0, don't create the array. This was
triggering the following assert when running vkcube with
NIR_TEST_CLONE=1
vkcube: ../src/compiler/nir/nir_split_per_member_structs.c:66:
split_variable: Assertion `var->state_slots == NULL' failed.
Fixes: 9fbd390dd4b "nir: Add support for cloning shaders"
Reviewed-by: Jason Ekstrand <[email protected]>
(cherry picked from commit 005cc9ae37ca45960d87389dc9eace5ed29d1b99)
|
|
|
|
|
|
|
|
|
|
|
| |
src/compiler/glsl_types.cpp:577: uninit_member: Non-static class member "packed" is not initialized in this constructor nor in any functions that it calls.
from Coverity.
Fixes: 659f333b3a4 (glsl: add packed for struct types)
Acked-by: Ilia Mirkin <[email protected]>
(cherry picked from commit b2d4d08a5cae29759bdbd4ac4e942ea372fe7735)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
First, allow the case for negative powers of two. Then ensure that we
use the absolute value of the non-constant value to calculate the
quotient -- this was hinted in the code by the name 'uq'.
This fixes an issue when 'd' is positive and 'n' is negative. The
ishr will propagate the negative sign and we'll use nir_ineg() again,
incorrectly.
v2: First version used only ishr, but that isn't sufficient, since it
never can produce a zero as a result. (Jason)
Allow negative powers of two. (Caio)
Fixes: 74492ebad94 "nir: Add a pass for lowering integer division by constants"
Reviewed-by: Jason Ekstrand <[email protected]>
(cherry picked from commit 8a995f2b5e1e3f2a2eafd32870ebfb43b5cfdf27)
|
|
|
|
|
|
|
|
|
|
|
| |
This pass moves instructions around and adds control-flow in the
middle of blocks. We need to use nir_foreach_instr_safe to ensure that
we iterate over instructions correctly anyway.
Signed-off-by: Lionel Landwerlin <[email protected]>
Fixes: 3bd545764151 ("nir: Add a lowering pass for non-uniform resource access")
Reviewed-by: Jason Ekstrand <[email protected]>
(cherry picked from commit e04cf0b61269ca60b3260d81d94e625965d39901)
|
|
|
|
|
|
|
|
|
| |
Obviously missing the instruction insertion into the SSA list.
Signed-off-by: Lionel Landwerlin <[email protected]>
Fixes: 3bd545764151 ("nir: Add a lowering pass for non-uniform resource access")
Reviewed-by: Jason Ekstrand <[email protected]>
(cherry picked from commit 391a836e8fb1c84170f3aa7550f0b347d31528f3)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 7acc8652268205a266068ea4d059eccce43e1f78.
With these optimizations in place, the extra constant folding added in
the next commit extends some live ranges of 0.0 and ±1.0 constants, and
that causes several hundred shaders to have more spills and fills.
I believe this optimization we made basically irrelevant by 7725d609387
"intel/fs: Emit better code for b2f(inot(a)) and b2i(inot(a))".
All Gen7.5+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 17225303 -> 17224634 (<.01%)
instructions in affected programs: 879402 -> 878733 (-0.08%)
helped: 679
HURT: 1
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.03% max: 0.93% x̄: 0.24% x̃: 0.05%
HURT stats (abs) min: 10 max: 10 x̄: 10.00 x̃: 10
HURT stats (rel) min: 0.45% max: 0.45% x̄: 0.45% x̃: 0.45%
95% mean confidence interval for instructions value: -1.02 -0.95
95% mean confidence interval for instructions %-change: -0.26% -0.22%
Instructions are helped.
total cycles in shared programs: 360842595 -> 360828542 (<.01%)
cycles in affected programs: 110443594 -> 110429541 (-0.01%)
helped: 389
HURT: 265
helped stats (abs) min: 1 max: 7525 x̄: 162.81 x̃: 28
helped stats (rel) min: <.01% max: 18.66% x̄: 1.11% x̃: 0.11%
HURT stats (abs) min: 1 max: 7614 x̄: 185.96 x̃: 48
HURT stats (rel) min: <.01% max: 25.08% x̄: 0.95% x̃: 0.10%
95% mean confidence interval for cycles value: -75.65 32.67
95% mean confidence interval for cycles %-change: -0.49% -0.06%
Inconclusive result (value mean confidence interval includes 0).
total spills in shared programs: 12159 -> 12161 (0.02%)
spills in affected programs: 13 -> 15 (15.38%)
helped: 0
HURT: 1
total fills in shared programs: 25207 -> 25208 (<.01%)
fills in affected programs: 25 -> 26 (4.00%)
helped: 0
HURT: 1
Ivy Bridge
total instructions in shared programs: 12082019 -> 12082013 (<.01%)
instructions in affected programs: 1033 -> 1027 (-0.58%)
helped: 6
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.41% max: 0.83% x̄: 0.61% x̃: 0.59%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -0.78% -0.45%
Instructions are helped.
total cycles in shared programs: 179849270 -> 179849157 (<.01%)
cycles in affected programs: 4735 -> 4622 (-2.39%)
helped: 4
HURT: 0
helped stats (abs) min: 2 max: 74 x̄: 28.25 x̃: 18
helped stats (rel) min: 0.13% max: 6.53% x̄: 2.85% x̃: 2.36%
95% mean confidence interval for cycles value: -82.73 26.23
95% mean confidence interval for cycles %-change: -7.98% 2.28%
Inconclusive result (value mean confidence interval includes 0).
Sandy Bridge
total instructions in shared programs: 10882750 -> 10882748 (<.01%)
instructions in affected programs: 266 -> 264 (-0.75%)
helped: 2
HURT: 0
Iron Lake
total cycles in shared programs: 188609440 -> 188609448 (<.01%)
cycles in affected programs: 4320 -> 4328 (0.19%)
helped: 0
HURT: 2
GM45
total cycles in shared programs: 129016868 -> 129016872 (<.01%)
cycles in affected programs: 2302 -> 2306 (0.17%)
helped: 0
HURT: 1
Reviewed-by: Matt Turner <[email protected]>
(cherry picked from commit d2a9ba03e30602f040687da325470d72eeddef1a)
[Juan: resolve trivial conflicts]
Signed-off-by: Juan A. Suarez Romero <[email protected]>
Conflicts:
src/compiler/nir/nir_opt_algebraic.py
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In commit a99c360a4630 (nir: add pass to lower fb reads), a new
file was added that needs to also be added to the
Makefile.sources list used by the Android and SCons build system.
Cc: Rob Clark <[email protected]>
Cc: Emil Velikov <[email protected]>
Cc: Amit Pundir <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: Alistair Strachan <[email protected]>
Cc: Greg Hartman <[email protected]>
Cc: Tapani Pälli <[email protected]>
Cc: Jason Ekstrand <[email protected]>
Fixes: a99c360a463 ("nir: add pass to lower fb reads")
Reviewed-by: Tapani Pälli <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
Signed-off-by: John Stultz <[email protected]>
(cherry picked from commit c7f2145b4b1551d521de2303b0dc97b56a0e3907)
|
|
|
|
|
| |
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
with that we can simplify code where nir vectors are created
v2: merge both lines in nir_vec
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
v2: rename to nir_build_alu_src_arr
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
v2: use vtn_push_ssa and vtn_ssa_value
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This new pass (which isn't even compile-tested) attempts to determine
the ALU type of all the SSA values in a function impl. It takes a
greedy approach and assigns intness or floatness to everything it thinks
can possibly contain an int or a float. Some values will be labled as
both int and float and some will be labled as neither and it is up to
the caller to decide what to do with this information. However, for a
"nice" shader where the original source contained no bit-casts and no
implicit bit-casts were introduced by optimizations, there shouldn't be
any overlap in the two sets save for the odd CSEd zero constant.
Reviewed-by: Vasily Khoruzhick <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Just don't emit the transform array at all if there are no transforms
v2:
- Don't use len(array) > 0 (Dylan)
- Keep using ARRAY_SIZE to make the generated C code easier to read
(Jason).
|
|
|
|
|
|
|
| |
v2: - Use new with_shader_cache variable instead of
host_machine.system() == 'windows'
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|