| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of trusting the caller to already have created a softfp64
function shader and added all its functions to our shader, we simply
take the softfp64 shader as an argument and do the function inlining
ouselves. This means that there's no more nasty functions lying around
that the caller needs to worry about cleaning up.
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This pulls the guts of function inlining into a builder helper so that
it can be used elsewhere. The rest of the infrastructure is still
needed for most inlining cases to ensure that everything gets inlined
and only ever once. However, there are use-cases where you just want to
inline one little thing. This new helper also has a neat trick where it
can seamlessly inline a function from one nir_shader into another.
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This doesn't really change anything as the functions will all get
inlined anyway. However it does let us do a bit of the work earlier and
in a common place.
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The lowering we do for 64-bit instructions can cause a single NIR ALU
instruction to blow up into hundreds or thousands of instructions
potentially with control flow. If loop unrolling isn't aware of this,
it can unroll a loop 20 times which contains a nir_op_fsqrt which we
then lower to a full software implementation based on integer math.
Those 20 invocations suddenly get a lot more expensive than NIR loop
unrolling currently expects. By giving it an approximate estimate
function, we can prevent loop unrolling from going to town when it
shouldn't.
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We already have one internally for int64 but we don't have a similar one
for doubles so we'll have to make one.
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This is set to True only for numeric conversion opcodes.
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Replace done using:
find ./src -type f -exec sed -i -- \
's/glsl_type_is_struct(/glsl_type_is_struct_or_ifc(/g' {} \;
Acked-by: Karol Herbst <[email protected]>
Acked-by: Jason Ekstrand <[email protected]>
Acked-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
Acked-by: Karol Herbst <[email protected]>
Acked-by: Jason Ekstrand <[email protected]>
Acked-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Replace done using:
find ./src -type f -exec sed -i -- \
's/record_location_offset(/struct_location_offset(/g' {} \;
Acked-by: Karol Herbst <[email protected]>
Acked-by: Jason Ekstrand <[email protected]>
Acked-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Replace done using:
find ./src -type f -exec sed -i -- \
's/get_record_instance(/get_struct_instance(/g' {} \;
Acked-by: Karol Herbst <[email protected]>
Acked-by: Jason Ekstrand <[email protected]>
Acked-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Replace was done using:
find ./src -type f -exec sed -i -- \
's/is_record(/is_struct(/g' {} \;
Acked-by: Karol Herbst <[email protected]>
Acked-by: Jason Ekstrand <[email protected]>
Acked-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Not complete, mostly just adding things as I encounter them in CTS. But
not getting far enough yet to hit most of the OpenCL.std instructions.
Anyway, this is better than nothing and covers the most common builtins.
v2: add hadd proof from Jason
move some of the lowering into opt_algebraic and create new nir opcodes
simplify nextafter lowering
fix normalize lowering for inf
rework upsample to use nir_pack_bits
add missing files to build systems
v3: split lines of iadd/sub_sat expressions
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
v2: use formula with fewer operations
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
v2: add assert in else clause
make local group intrinsics 32 bit wide
v3: always use 32 bit constant for local_size
v4: add comment by Jason
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
v2: add some vtn_fail_ifs
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
we define it inside 'include/c99_math.h' so it is safe to use.
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Note that locations can be set in different units, and the multiplier
argument caters to supporting these different units. For example,
st_glsl_to_nir uses dwords (4 bytes) so the multiplier should be 4,
while tgsi_to_nir uses bytes, so the multiplier should be 16.
Signed-Off-By: Timur Kristóf <[email protected]>
Tested-by: Andre Heider <[email protected]>
Tested-by: Rob Clark <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The nir_lower_uniforms_to_ubo function is useful outside of
mesa/state_tracker, and in fact is needed to produce NIR for
drivers that have the PIPE_CAP_PACKED_UNIFORMS capability.
Signed-Off-By: Timur Kristóf <[email protected]>
Tested-by: Andre Heider <[email protected]>
Tested-by: Rob Clark <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds a shader_info field that tells the driver to use window
space coordinates for a given vertex shader. It also enables this feature
in radeonsi (the only NIR-capable driver that supported it in TGSI),
and makes tgsi_to_nir aware of it.
Signed-Off-By: Timur Kristóf <[email protected]>
Tested-by: Andre Heider <[email protected]>
Tested-by: Rob Clark <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This lets us emit the VPM_WRITEs directly from
nir_intrinsic_store_output() (useful once NIR scheduling is in place so
that we can reduce register pressure), and lets future NIR scheduling
schedule the math to generate them. Even in the meantime, it looks like
this lets NIR DCE some more code and make better decisions.
total instructions in shared programs: 6429246 -> 6412976 (-0.25%)
total threads in shared programs: 153924 -> 153934 (<.01%)
total loops in shared programs: 486 -> 483 (-0.62%)
total uniforms in shared programs: 2385436 -> 2388195 (0.12%)
Acked-by: Ian Romanick <[email protected]> (nir)
|
|
|
|
|
|
|
| |
We were printing only when the channel was exactly the start channel, so
scalarized loads/stores would be missing the name on the rest.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
| |
We need more space than just a 32-bit scalar and we have to burn all
that space anyway so we may as well expose it to the driver. This also
fixes a subtle bug when UBOs and SSBOs have different pointer types.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
| |
With the new deref changes, the old pointer_offset version may not be
the right one to call. Just call the generic one and let it sort it
out.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We can't pull it from the variable type because it might be an array of
blocks and not just the one block. While we're here, throw in some
error checking.
Reviewed-by: Lionel Landwerlin <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
|
|
| |
When we have a larger sampler index, we get into the "high sampler"
scenario and need an instruction header. Even in SIMD8, this pushes the
instruction over the sampler message size maximum of 11 registers.
Instead, we have to lower TXD to TXL.
Fixes: cb98e0755f8d "intel/fs: Support min_lod parameters on texture..."
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
No idea how this fell through the cracks besides the fact that the
sampler bound at 0 almost always works and the CTS isn't amazing. In
any case, this appears to have been broken for almost forever.
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
|
| |
Use new nir opcode nir_[i/u]mul_2x32_64 and extract lower and higher 32
bits as needed instead of emitting mul and mul_high.
v2: Surround the switch case with curly braces (Jason Ekstrand)
Signed-off-by: Sagar Ghuge <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Optimize a situation where we only need lower 32 bits from 64 bit
result.
Signed-off-by: Sagar Ghuge <[email protected]>
Suggested-by: Matt Turner <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Optimize mulExtended to use 32x32->64 multiplication.
Drivers which are not based on NIR, they can set the
MUL64_TO_MUL_AND_MUL_HIGH lowering flag in order to have same old
behavior.
v2: Add missing condition check (Jason Ekstrand)
Signed-off-by: Sagar Ghuge <[email protected]>
Suggested-by: Matt Turner <Matt Turner <[email protected]>
Suggested-by: Jason Ekstrand <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On Gen 8 and 9, "mul" instruction supports 64 bit destination type. We
can reduce our 64x64 int multiplication from 4 instructions to 3.
Also instead of emitting two mul instructions, we can emit single mul
instuction and extract low/high 32 bits from 64 bit result for
[i/u]mulExtended
v2: 1) Allow lower_mul_high64 to use new opcode (Jason Ekstrand)
2) Add lower_mul_2x32_64 flag (Matt Turner)
3) Remove associative property as bit size is different (Connor
Abbott)
v3: Fix indentation and variable naming convention (Jason Ekstrand)
Signed-off-by: Sagar Ghuge <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is purely for conformance, since it's not actually possible to do
XFB on TCS output varyings. However we do have to make sure we record
the names correctly, and this removes an extra level of array-ness from
the names in question.
Fixes KHR-GL45.tessellation_shader.single.xfb_captures_data_from_correct_stage
v2: Add comment to the new program_resource_visitor::process function.
(Ilia Mirkin)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108457
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: 19.0 <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Avoids regression on:
KHR-GLES*.core.tessellation_shader.single.xfb_captures_data_from_correct_stage
that is uncovered by the following patch.
"glsl: fix recording of variables for XFB in TCS shaders"
v2: Rebased over glsl: fix recording of variables for XFB in TCS shaders
v3: Move this patch before "glsl: fix recording of variables for XFB in TCS
shaders" to avoid temporal regressions. (Illia Mirkin)
Cc: 19.0 <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
EXT_texture_query_lod provides the same functionality for GLES like
the ARB extension with the same name for GL.
v2: Set ES 3.0 as minimum GLES version as required by the extension
Signed-off-by: Gert Wollny <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This will allow the options to be visible under nir_shader->options,
which will allow the gallium state_tracker to use the driver preferred
settings during glsl_to_nir.
Suggested-by: Kenneth Graunke <[email protected]>
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
| |
The b2f can only produce 0.0 or 1.0, so the fsat does nothing.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I noticed this while looking at a shader that was affected by Tim's
"more loop unrolling" series.
In review, Tim Arceri asked:
> Why the hurt on Gen6+ is this something that should be in the late
> optimisations pass?
As far as I can tell, it's just because our scheduler is terrible. In
all the fragment shaders that I looked at (some hurt shaders were from
other stages), only one of the SIMD8 or SIMD16 version would be hurt.
In many of those case, the other SIMD width is improved (e.g.,
shaders/closed/steam/brutal-legend/3990.shader_test).
Often it looks like the scheduler decides to differently schedule a SEND
the occurs somewhere early in the shader. Once that happens, everything
is different.
I looked at one vertex shader that was hurt (from Goat Simulator). In
that case, both the floor and fract are used. The optimization
eliminates the add, and it should allow better scheduling. In the area
of the FRC and RNDD instructions, the scheduler does the right thing.
However, later in the shader a MAD and and ADD get scheduled
differently, and that makes it slightly worse.
In light of this, I tried adding some "is_used_once" mark-up, and that
did not fix all the cycles regressions. It also did a lot more harm
than good on SKL (helped 82 vs. hurt 241).
All Gen6+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 15437001 -> 15435259 (-0.01%)
instructions in affected programs: 213651 -> 211909 (-0.82%)
helped: 988
HURT: 0
helped stats (abs) min: 1 max: 27 x̄: 1.76 x̃: 1
helped stats (rel) min: 0.15% max: 11.54% x̄: 1.14% x̃: 0.59%
95% mean confidence interval for instructions value: -1.89 -1.63
95% mean confidence interval for instructions %-change: -1.23% -1.05%
Instructions are helped.
total cycles in shared programs: 383007378 -> 382997063 (<.01%)
cycles in affected programs: 1650825 -> 1640510 (-0.62%)
helped: 679
HURT: 302
helped stats (abs) min: 1 max: 348 x̄: 23.39 x̃: 14
helped stats (rel) min: 0.04% max: 28.77% x̄: 1.61% x̃: 0.98%
HURT stats (abs) min: 1 max: 250 x̄: 18.43 x̃: 7
HURT stats (rel) min: 0.04% max: 25.86% x̄: 1.41% x̃: 0.53%
95% mean confidence interval for cycles value: -13.05 -7.98
95% mean confidence interval for cycles %-change: -0.86% -0.50%
Cycles are helped.
Iron Lake and GM45 had similar results. (GM45 shown)
total instructions in shared programs: 5043616 -> 5043010 (-0.01%)
instructions in affected programs: 119691 -> 119085 (-0.51%)
helped: 432
HURT: 0
helped stats (abs) min: 1 max: 27 x̄: 1.40 x̃: 1
helped stats (rel) min: 0.10% max: 8.11% x̄: 0.66% x̃: 0.39%
95% mean confidence interval for instructions value: -1.58 -1.23
95% mean confidence interval for instructions %-change: -0.72% -0.59%
Instructions are helped.
total cycles in shared programs: 128139812 -> 128135762 (<.01%)
cycles in affected programs: 3829724 -> 3825674 (-0.11%)
helped: 602
HURT: 0
helped stats (abs) min: 2 max: 486 x̄: 6.73 x̃: 6
helped stats (rel) min: 0.02% max: 4.85% x̄: 0.19% x̃: 0.10%
95% mean confidence interval for cycles value: -8.40 -5.05
95% mean confidence interval for cycles %-change: -0.22% -0.16%
Cycles are helped.
Reviewed-by: Elie Tournier <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I have not investigated the result of doing this during code
generation. That should be possible, but it would be a bit more
effort.
All Gen6+ platforms had nearly identical results. (Skylake shown)
total cycles in shared programs: 370961508 -> 370961367 (<.01%)
cycles in affected programs: 5174 -> 5033 (-2.73%)
helped: 2
HURT: 0
Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8206587 -> 8206589 (<.01%)
instructions in affected programs: 1325 -> 1327 (0.15%)
helped: 0
HURT: 2
total cycles in shared programs: 187657422 -> 187657428 (<.01%)
cycles in affected programs: 11566 -> 11572 (0.05%)
helped: 0
HURT: 2
This change has almost no effect right now. However, removing this
patch (but leaving the patch "intel/fs: Generate if instructions with
inverted conditions") after adding a patch that removes !(a < b) -> (a
>= b) optimizations (like
https://patchwork.freedesktop.org/patch/264787/) has the following
results on Skylake:
Skylake
total instructions in shared programs: 15071804 -> 15071806 (<.01%)
instructions in affected programs: 640 -> 642 (0.31%)
helped: 0
HURT: 2
total cycles in shared programs: 369914348 -> 369916569 (<.01%)
cycles in affected programs: 27900 -> 30121 (7.96%)
helped: 4
HURT: 15
helped stats (abs) min: 2 max: 112 x̄: 30.00 x̃: 3
helped stats (rel) min: 0.28% max: 12.28% x̄: 3.34% x̃: 0.40%
HURT stats (abs) min: 2 max: 758 x̄: 156.07 x̃: 81
HURT stats (rel) min: 0.20% max: 74.30% x̄: 16.29% x̃: 16.91%
95% mean confidence interval for cycles value: 12.68 221.11
95% mean confidence interval for cycles %-change: 3.09% 21.23%
Cycles are HURT.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
All of the helped shaders are in Deus Ex. I looked at a couple shaders,
and they have a pattern like:
vec1 32 ssa_373 = i2b32 ssa_345.w
vec1 32 ssa_374 = bcsel ssa_373, ssa_20, ssa_0
...
vec1 32 ssa_377 = ine ssa_345.w, ssa_0
if ssa_377 {
...
vec1 32 ssa_416 = i2b32 ssa_385.w
vec1 32 ssa_417 = bcsel ssa_416, ssa_386, ssa_374
...
}
The massive help occurs because the i2b32 is removed, then other passes
determine that ssa_374 must be ssa_20 inside the if-statement allowing
the first bcsel to also be deleted.
v2: Rebase on 1-bit Boolean changes.
v3: Fix i2b32 vs ine problem in if-statement replacement. Noticed by
Bas.
Skylake
total instructions in shared programs: 15241394 -> 15186287 (-0.36%)
instructions in affected programs: 890583 -> 835476 (-6.19%)
helped: 355
HURT: 0
helped stats (abs) min: 1 max: 497 x̄: 155.23 x̃: 149
helped stats (rel) min: 0.09% max: 16.49% x̄: 6.10% x̃: 6.59%
95% mean confidence interval for instructions value: -165.07 -145.39
95% mean confidence interval for instructions %-change: -6.42% -5.77%
Instructions are helped.
total cycles in shared programs: 373846583 -> 371023357 (-0.76%)
cycles in affected programs: 118972102 -> 116148876 (-2.37%)
helped: 343
HURT: 14
helped stats (abs) min: 45 max: 118284 x̄: 8332.32 x̃: 6089
helped stats (rel) min: 0.03% max: 38.19% x̄: 2.48% x̃: 1.77%
HURT stats (abs) min: 120 max: 4126 x̄: 2482.79 x̃: 3019
HURT stats (rel) min: 0.16% max: 17.37% x̄: 2.13% x̃: 1.11%
95% mean confidence interval for cycles value: -8723.28 -7093.12
95% mean confidence interval for cycles %-change: -2.57% -2.02%
Cycles are helped.
total spills in shared programs: 32401 -> 23465 (-27.58%)
spills in affected programs: 24457 -> 15521 (-36.54%)
helped: 343
HURT: 0
total fills in shared programs: 37866 -> 31765 (-16.11%)
fills in affected programs: 18889 -> 12788 (-32.30%)
helped: 343
HURT: 0
Broadwell and Haswell had similar results. (Haswell shown)
Haswell
total instructions in shared programs: 13764783 -> 13750679 (-0.10%)
instructions in affected programs: 1176256 -> 1162152 (-1.20%)
helped: 334
HURT: 21
helped stats (abs) min: 1 max: 358 x̄: 42.59 x̃: 47
helped stats (rel) min: 0.09% max: 11.81% x̄: 1.30% x̃: 1.37%
HURT stats (abs) min: 1 max: 61 x̄: 5.76 x̃: 1
HURT stats (rel) min: 0.03% max: 1.84% x̄: 0.17% x̃: 0.03%
95% mean confidence interval for instructions value: -43.99 -35.47
95% mean confidence interval for instructions %-change: -1.35% -1.08%
Instructions are helped.
total cycles in shared programs: 386511910 -> 385402528 (-0.29%)
cycles in affected programs: 143831110 -> 142721728 (-0.77%)
helped: 327
HURT: 39
helped stats (abs) min: 16 max: 25219 x̄: 3519.74 x̃: 3570
helped stats (rel) min: <.01% max: 10.26% x̄: 0.95% x̃: 0.96%
HURT stats (abs) min: 16 max: 4881 x̄: 1065.95 x̃: 997
HURT stats (rel) min: <.01% max: 16.67% x̄: 0.70% x̃: 0.24%
95% mean confidence interval for cycles value: -3375.59 -2686.60
95% mean confidence interval for cycles %-change: -0.92% -0.64%
Cycles are helped.
total spills in shared programs: 100480 -> 97846 (-2.62%)
spills in affected programs: 84702 -> 82068 (-3.11%)
helped: 316
HURT: 21
total fills in shared programs: 96877 -> 94369 (-2.59%)
fills in affected programs: 69167 -> 66659 (-3.63%)
helped: 316
HURT: 9
No changes on Ivy Bridge or earlier platforms.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Differently than the direct case, the indirect array derefs of vector
are handled like regular derefs, with the exception that we ignore any
vector entry that has SSA values when performing a load. Such SSA
values don't help loading of the indirect unless we emit an if-ladder.
Copy_derefs are supported for indirects.
Also enable two tests that now pass.
v2: Remove unnecessary temporaries. Be clearer when identifying the
case where copy_entry doesn't help when we are dealing with an
indirect array_deref (of a vector). (Jason)
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
When looking up an entry to use, always prefer an equal match, as it
more likely to contain reusable SSA or derefs to propagate.
This will be necessary when adding entries with array derefs of
vectors, because we don't want the vector if the equal entry (an array
deref of that vector) is present.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
Both on an actual array and on a vector, and an extra test on a vector
mixing direct and indirect access. The vector tests are disabled and
will be enabled by a later commit.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When direct array deref is used on a vector type (for loads and
stores), copy_prop_vars is now smart to propagate values it knows
about.
Given a 'vec4 v', storing to v[3] will update the copy entry for v and
it is equivalent to a write to v.w. Loading from v[1] will try first
to see if there's a known value for v.y -- and drop the load in that
case.
The copy entries still always refer to the entire vectors, so the
operations happen on the parent deref (the 'vector') and the values
are fixed accordingly.
It might be the case now that certain entries have not only different
SSA defs in each element but also those come from different components
than they are set to, because stores to individual elements always
come from a SSA definition with a single component.
Tests related to these cases are now enabled.
v2: Instead of asserting on invalid indices, "load" an undef and
remove the store. (Jason)
v3: Merge code path for the cases of is_array_deref_of_vector into the
regular code path. Add a base_index parameter to
value_set_from_value. (code changes by Jason)
v4: Removed the get_entry_for_deref helper, now being used only once.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Also replace uses of 0xf with the appropriate full mask created from
the number of components.
Note that an increase of MAX might make us change how the data is
stored later on, but for now at least we make sure the pass is not
hardcoded.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The name reflected this function role back when the pass also did dead
write elimination. So rename it to what it does now, which is setting
a value using another value; and narrow the argument list.
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When emitting a branch in a block, it does not make sense to continue
processing further instructions, as they will not be reachable.
This fixes a nasty case with a loop with a branch that both then-part
and else-part exits the loop:
%1 = OpLabel
OpLoopMerge %2 %3 None
OpBranchConditional %false %2 %2
%3 = OpLabel
OpBranch %1
%2 = OpLabel
[...]
We know that block %1 will branch always to block %2, which is the merge
block for the loop. And thus a break is emitted. If we keep continuing
processing further instructions, we will be processing the branch
conditional and thus emitting the proper NIR conditional, which leads to
instructions after the break.
This fixes dEQP-VK.graphicsfuzz.continue-and-merge.
CC: Jason Ekstrand <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some types of params such as some builtins are always padded. We
need to keep track of this so we can restore the list correctly.
Here we also remove a couple of cache entries that are not actually
required as they get rebuilt by the _mesa_add_parameter() calls.
This patch fixes a bunch of arb_texture_multisample and
arb_sample_shading piglit tests for the radeonsi NIR backend.
Fixes: edded1237607 ("mesa: rework ParameterList to allow packing")
Reviewed-by: Marek Olšák <[email protected]>
|