| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If we are outputting the same value to more than one output
component rewrite the inputs to read from a single component.
This will allow the duplicate varying components to be optimised
away by the existing opts.
shader-db results i965 (SKL):
total instructions in shared programs: 12869230 -> 12860886 (-0.06%)
instructions in affected programs: 322601 -> 314257 (-2.59%)
helped: 3080
HURT: 8
total cycles in shared programs: 317792574 -> 317730593 (-0.02%)
cycles in affected programs: 2584925 -> 2522944 (-2.40%)
helped: 2975
HURT: 477
shader-db results radeonsi (VEGA):
SGPRS: 31576 -> 31664 (0.28 %)
VGPRS: 17484 -> 17064 (-2.40 %)
Spilled SGPRs: 184 -> 167 (-9.24 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 583340 -> 569368 (-2.40 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 6162 -> 6270 (1.75 %)
Wait states: 0 -> 0 (0.00 %)
vkpipeline-db results RADV (VEGA):
Totals from affected shaders:
SGPRS: 14880 -> 15080 (1.34 %)
VGPRS: 10872 -> 10888 (0.15 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 674016 -> 668396 (-0.83 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 2708 -> 2704 (-0.15 %)
Wait states: 0 -> 0 (0.00 %
V2: bunch of tidy ups suggested by Jason
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
| |
This just cleans things up a little and make things more safe for
derefs.
Tested-by: Dieter Nützel <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
| |
This will be reused by the following patch.
Tested-by: Dieter Nützel <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The following patches will add support for an additional
optimisation so this function will no longer just optimise varying
constants.
Tested-by: Dieter Nützel <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The former expects to see SSA-only things, but the latter injects registers.
The assertions in the lowering where not seeing this because they asserted
on the bit_size values only, not on the is_ssa field, so add that assertion
too.
Fixes: 11dc1307794e "nir: Add a bool to int32 lowering pass"
CC: [email protected]
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When copy propagation handles a store/copy, it iterates the current
copy entries to remove aliases, but keeps the "equal" entry (if
exists) to be updated.
The removal step may swap the entries around (to ensure there are no
holes), invalidating previous iteration pointers. The bug was saving
such pointer to use later. Change the code to first perform the
removals and then find the remaining right entry.
This was causing updates to be lost since they were being made to an
entry that was not part of the current copies.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108624
Fixes: b3c61469255 "nir: Copy propagation between blocks"
Cc: [email protected]
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When updating a copy entry source value from a "non-SSA" (the data
come from a copy instruction) to a "SSA" (the data or parts of it come
from SSA values), it was possible to hold invalid data in ssa[0]
depending on the writemask. Because the union, ssa[0] could contain a
pointer to a nir_deref_instr left-over from previous non-SSA usage.
Change code to clean up the array before use to avoid invalid data
around.
Fixes: 62332d139c8 "nir: Add a local variable-based copy propagation pass"
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The quotation marks around 1.0 cause it to be treated as a string
instead of a floating point value. The generator then treats it as an
arbitrary variable replacement, so any iand involving a ('ineg', ('b2i',
a)) matches.
v2: Remove misleading comment about sized literals (suggested by
Timothy). Add assertion that the name of a varible is entierly
alphabetic (suggested by Jason).
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Tested-by: Timothy Arceri <[email protected]> [v1]
Reviewed-by: Timothy Arceri <[email protected]> [v1]
Fixes: 6bcd2af0863 ("nir/algebraic: Add some optimizations for D3D-style Booleans")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109075
|
|
|
|
|
|
|
|
|
| |
Tested on gen9.
v2: Rename lower_txd_3d_surafaces flag to lower_txd_3d (Jason Ekstrand)
Signed-off-by: Sagar Ghuge <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Instead of going all the way back to the variable, just look at the
deref. The modes are guaranteed to be the same by nir_validate whenever
the variable can be found. This fixes clear_unused_for_modes for
derefs that don't have an accessible variable.
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Instead of going all the way back to the variable, just look at the
deref. The modes are guaranteed to be the same by nir_validate whenever
the variable can be found. This fixes apply_barrier_for_modes for
derefs that don't have an accessible variable.
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This is instead of looking all the way back to the variable which may
not exist for all derefs. This makes this code properly ignore casts
with modes other than the mode[s] we care about (where casts aren't
allowed).
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This is instead of looking all the way back to the variable which may
not exist for all derefs. This makes this code properly ignore casts
with modes other than the mode[s] we care about (where casts aren't
allowed).
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This is instead of looking all the way back to the variable which may
not exist for all derefs. This makes this code properly ignore casts
with modes other than the mode[s] we care about (where casts aren't
allowed).
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This is instead of looking all the way back to the variable which may
not exist for all derefs. This makes this code properly ignore casts
with modes other than the mode[s] we care about (where casts aren't
allowed).
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
| |
If we can't find the variable from the deref, just assume it isn't
invariant and continue on. This can happen if, for instance, we're
writing to a deref that points into an SSBO.
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
"There's no point in walking the program if we're never going to
actually lower anything."
Except we might lower compacted local arrays. In that case, modes will
be 0, but there is still lowering to be done.
This reverts commit 7f75cf2a9408b9af562e033ef6c1d1fd15141421.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109081
Suggested-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
Tested-by: Clayton Craft <[email protected]>
Cc: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On some GPUs, especially older Intel GPUs, some math instructions are
very expensive. On those architectures, don't reduce flow control to a
csel if one of the branches contains one of these expensive math
instructions.
This prevents a bunch of cycle count regressions on pre-Gen6 platforms
with a later patch (intel/compiler: More peephole select for pre-Gen6).
v2: Remove stray #if block. Noticed by Thomas.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Thomas Helland <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
That flow control may be trying to avoid invalid loads. On at least
some platforms, those loads can also be expensive.
No shader-db changes on any Intel platform (even with the later patch
"intel/compiler: More peephole select").
v2: Add a 'indirect_load_ok' flag to nir_opt_peephole_select. Suggested
by Rob. See also the big comment in src/intel/compiler/brw_nir.c.
v3: Use nir_deref_instr_has_indirect instead of deref_has_indirect (from
nir_lower_io_arrays_to_elements.c).
v4: Fix inverted condition in brw_nir.c. Noticed by Lionel.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
I botched some copy-and-paste and clamped to signed int max instead of
uint max. Fixes KHR-GL46.shader_image_load_store.multiple-uniforms on
skl.
Fixes: d3e046e76c06 ("nir: Pull some of intel's image load/store format
conversion to nir_format.h")
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
nir_sweep already marks all metadata invalid, so it is safe to release
the memory here too.
mean soft fp64 using uint64: 1,342,759,331 => 1,010,670,475
gfxbench5 aztec ruins high 11: 63,555,571 => 61,889,811
deus ex mankind divided 148: 62,845,304 => 62,829,640
deus ex mankind divided 2890: 71,922,686 => 71,922,686
dirt showdown 676: 69,238,607 => 69,238,607
dolphin ubershaders 210: 77,822,072 => 77,822,072
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Found using pahole.
Changes in peak memory usage according to Valgrind massif:
mean soft fp64 using uint64: 1,343,991,403 => 1,342,759,331
gfxbench5 aztec ruins high 11: 63,619,971 => 63,555,571
deus ex mankind divided 148: 62,887,728 => 62,845,304
deus ex mankind divided 2890: 72,399,750 => 71,922,686
dirt showdown 676: 69,464,023 => 69,238,607
dolphin ubershaders 210: 78,359,728 => 77,822,072
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Replace the old array in each value with a hash table in each value.
Changes in peak memory usage according to Valgrind massif:
mean soft fp64 using uint64: 5,499,875,082 => 1,343,991,403
gfxbench5 aztec ruins high 11: 63,619,971 => 63,619,971
deus ex mankind divided 148: 62,887,728 => 62,887,728
deus ex mankind divided 2890: 72,402,222 => 72,399,750
dirt showdown 676: 74,466,431 => 69,464,023
dolphin ubershaders 210: 109,630,376 => 78,359,728
Run-time change for a full run on shader-db on my Haswell desktop (with
-march=native) is 1.22245% +/- 0.463879% (n=11). This is about +2.9
seconds on a 237 second run. The first time I sent this version of this
patch out, the run-time data was quite different. I had misconfigured
the script that ran the test, and none of the tests from higher GLSL
versions were run. These are generally more complex shaders, and they
are more affected by this change.
The previous version of this patch used a single hash table for the
whole phi builder. The mapping was from [value, block] -> def, so a
separate allocation was needed for each [value, block] tuple. There was
quite a bit of per-allocation overhead (due to ralloc), so the patch was
followed by a patch that added the use of the slab allocator. The
results of those two patches was not quite as good:
mean soft fp64 using uint64: 5,499,875,082 => 1,343,991,403
gfxbench5 aztec ruins high 11: 63,619,971 => 63,619,971
deus ex mankind divided 148: 62,887,728 => 62,887,728
deus ex mankind divided 2890: 72,402,222 => 72,402,222 *
dirt showdown 676: 74,466,431 => 72,443,591 *
dolphin ubershaders 210: 109,630,376 => 81,034,320 *
The * denote tests that are better now. In the tests that are the same
in both patches, the "after" peak memory usage was at a different
location. I did not check the local peaks.
Signed-off-by: Ian Romanick <[email protected]>
Suggested-by: Jason Ekstrand <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
D3D Booleans use a 32-bit 0/-1 representation. Because this previously
matched NIR exactly, we didn't have to really optimize for it. Now that
we have 1-bit Booleans, we need some specific optimizations to chew
through the D3D12-style Booleans.
Shader-db results on Kaby Lake:
total instructions in shared programs: 15136811 -> 14967944 (-1.12%)
instructions in affected programs: 2457021 -> 2288154 (-6.87%)
helped: 8318
HURT: 10
total cycles in shared programs: 373544524 -> 359701825 (-3.71%)
cycles in affected programs: 151029683 -> 137186984 (-9.17%)
helped: 7749
HURT: 682
total loops in shared programs: 4431 -> 4399 (-0.72%)
loops in affected programs: 32 -> 0
helped: 21
HURT: 0
total spills in shared programs: 10290 -> 10051 (-2.32%)
spills in affected programs: 2532 -> 2293 (-9.44%)
helped: 18
HURT: 18
total fills in shared programs: 22203 -> 21732 (-2.12%)
fills in affected programs: 3319 -> 2848 (-14.19%)
helped: 18
HURT: 18
Note that a large chunk of the improvement fixing regressions caused by
switching to 1-bit Booleans. Previously, our ability to optimize D3D
booleans was improved by using the D3D representation directly in NIR.
Now that NIR does 1-bit bools, we need a few more optimizations.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Tested-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Tested-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a squash of a few distinct changes:
glsl,spirv: Generate 1-bit Booleans
Revert "Use 32-bit opcodes in the NIR producers and optimizations"
Revert "nir/builder: Generate 32-bit bool opcodes transparently"
nir/builder: Generate 1-bit Booleans in nir_build_imm_bool
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Tested-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
| |
We also enable it in all of the NIR drivers.
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Tested-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We also have to add support for 1-bit integers while we're here so we
get 1-bit variants of iand, ior, and inot.
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Tested-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
| |
This just makes it nicely scale across bit sizes.
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Tested-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Tested-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit adds support for 1-bit Booleans and integers. Booleans
obviously take a value of true or false. Because we have to define the
semantics of 1-bit signed and unsigned integers, we define uint1_t to
take values of 0 and 1 and int1_t to take values of 0 and -1. 1-bit
arithmetic is then well-defined in the usual way, just with fewer bits.
The definition of int1_t and uint1_t doesn't usually matter but we do
need something for purposes of constant folding.
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Tested-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit contains three related changes. First, we define boolN_t
for N = 8, 16, and 64 and move the definition of boolN_vec to the loop
with the other vec definitions. Second, there's no reason why we need
the != 0 on the source because that happens implicitly when it's
converted to bool. Third, for destinations, we use a signed integer
type and just do -(int)bool_val which will give us the 0/-1 behavior we
want and neatly scales to all bit widths.
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Tested-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a squash of a bunch of individual changes:
nir/builder: Generate 32-bit bool opcodes transparently
nir/algebraic: Remap Boolean opcodes to the 32-bit variant
Use 32-bit opcodes in the NIR producers and optimizations
Generated with a little hand-editing and the following sed commands:
sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c
sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c
sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c
sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c
sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c
sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c
Use 32-bit opcodes in the NIR back-ends
Generated with a little hand-editing and the following sed commands:
sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c
sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c
sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c
sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c
sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c
sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Tested-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
| |
Later in this series, bool is not going to imply 32-bit.
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Tested-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This was originally added for the out-of-tree Mali driver but I think
we've all agreed it's easy enough for them to just do in their back-end.
Reviewed-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Tested-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Shader-db results on Kaby Lake:
total instructions in shared programs: 15072525 -> 15072525 (0.00%)
instructions in affected programs: 0 -> 0
helped: 0
HURT: 0
This helps prevent regressions in later commits.
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Tested-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of looking at input_sizes[i] which contains the number of
components for each source, we look at the bit size of input_types[i].
This fixes a regression in the 1-bit boolean series though I have no
idea how we haven't seen it before now.
Fixes: 35baee5dce5 "nir/constant_folding: fix incorrect bit-size check"
Fixes: 9076c4e289d "nir: update opcode definitions for different bit sizes"
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Tested-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The previous code was creating a boolean by doing an arithmetic right-
shift by 31 which produces a boolean which is true if the argument is
negative. This is the same as the expression r < 0 which is much
simpler and doesn't depend on NIR's representation of booleans.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
| |
nir_phi_builder_value_set_block_def too
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The pass did not correctly handle loops ending in:
if ssa_7 {
block block_8:
/* preds: block_7 */
continue
/* succs: block_1 */
} else {
block block_9:
/* preds: block_7 */
break
/* succs: block_11 */
}
The break will get eliminated by another opt but if this pass gets
called first (as it does on RADV) we ended up inserting
instructions after the break.
Fixes: 5921a19d4b0c ("nir: add if opt opt_if_loop_last_continue()")
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
| |
I needed the same function for v3d. This was originally in d3e046e76c06
("nir: Pull some of intel's image load/store format conversion to
nir_format.h") before we made am istake about simplifying the function.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
This helps a lot when debugging image load/store lowering on large
testcases. Unfortunately the Mesa enum name stuff is under src/mesa and
we can't get at it from the compiler.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
It's a reasonably well-known fact in the world of compilers that integer
divisions by constants can be replaced by a multiply, an add, and some
shifts. This commit adds such an optimization to NIR for easiest case
of udiv. Other division operations will be added in following commits.
In order to provide some additional driver control, the pass takes a
minimum bit size to optimize.
Reviewed-by: Ian Romanick [email protected]
|
|
|
|
| |
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
Reviewed-by: Ian Romanick [email protected]
|
|
|
|
| |
Reviewed-by: Ian Romanick [email protected]
|