| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Jason pointed out that variable dereferences in NIR are really part of
their parent instruction, and should have the same lifetime.
Unlike in GLSL IR, they're not used very often - just for intrinsic
variables, call parameters & return, and indirect samplers for
texturing. Also, nir_deref_var is the top-level concept, and
nir_deref_array/nir_deref_record are child nodes.
This patch attempts to allocate nir_deref_vars out of their parent
instruction, and any sub-dereferences out of their parent deref.
It enforces these restrictions in the validator as well.
This means that freeing an instruction should free its associated
dereference chain as well. The memory sweeper pass can also happily
ignore them.
v2: Rename make_deref to evaluate_deref and make it take a nir_instr *
instead of void *. This involves adding &instr->instr everywhere.
(Requested by Jason Ekstrand.)
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
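
The ownership rule described above (a deref chain hangs off its parent instruction, so freeing the instruction frees the chain) can be illustrated with a toy model. This is a minimal Python sketch, not ralloc itself: the `Ctx` class and the node names are hypothetical stand-ins for ralloc contexts and NIR objects.

```python
class Ctx:
    """Toy stand-in for a ralloc context: freeing a parent frees its children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.freed = False
        if parent is not None:
            parent.children.append(self)

    def free(self):
        """Free this context and, recursively, everything allocated out of it."""
        if self.parent is not None:
            self.parent.children.remove(self)
            self.parent = None
        self._release()

    def _release(self):
        for child in self.children:
            child.parent = None
            child._release()
        self.children = []
        self.freed = True


# Hypothetical hierarchy mirroring the commit message:
# shader -> instruction -> deref_var -> deref_array
shader = Ctx("shader")
instr = Ctx("intrinsic_instr", parent=shader)
deref_var = Ctx("deref_var", parent=instr)
deref_array = Ctx("deref_array", parent=deref_var)

instr.free()  # the whole dereference chain goes away with the instruction
```

Because the chain is reachable only through the instruction, a sweep over the shader's direct children never has to visit the derefs individually.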
|
|
|
|
|
|
|
|
|
|
| |
We can't allocate them out of the nir_ssa_def itself, because it may not
be ralloc'd (for example, nir_dest embeds a nir_ssa_def).
However, allocating them out of the instruction should work.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
Phi sources are part of the phi instruction and should have the same
lifetime.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
The lifetime of the params array needs to match the nir_call_instr
itself. So, allocate it using the instruction itself as the context.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
These don't work in MSVC or in older versions of GCC.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89899
Reviewed-by: Mark Janes <[email protected]>
|
|
|
|
| |
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
These were added in commit f2616e56, presumably in preparation for
translating ARB vp/fp into GLSL IR. That never happened, and neither did
a lowering pass that actually generated these instructions.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Based on the algo from NV50LegalizeSSA::handleDIV() and handleMOD().
See also trans_idiv() in freedreno/ir3/ir3_compiler.c (which was an
adaptation of the nv50 code from Ilia Mirkin).
A python/numpy script which implements the same algorithm (and is
possibly useful for debugging or analysis) can be found here:
http://people.freedesktop.org/~robclark/div-lowering.py
I've tested this on i965 hacked up to insert the idiv lowering pass,
and on freedreno with NIR frontend.
Signed-off-by: Rob Clark <[email protected]>
Tested-by: Eric Anholt <[email protected]> (vc4)
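
The general shape of such a lowering can be sketched as follows: estimate the unsigned quotient with a single-precision reciprocal, repair the estimate with integer arithmetic, and handle signs separately. This is only an illustration in Python, not the NV50 or ir3 instruction sequence. The helper names (`f32`, `udiv_via_rcp`, `umod_via_rcp`, `idiv_lowered`) are hypothetical, the correction loops stand in for the fixed fix-up steps a real lowering would use, and signed division is assumed to truncate toward zero.

```python
import struct

MASK32 = 0xFFFFFFFF


def f32(x):
    """Round a Python float to single precision, as a 32-bit float op would."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def udiv_via_rcp(a, b):
    """Unsigned 32-bit division: float reciprocal estimate plus integer fix-up."""
    assert 0 <= a <= MASK32 and 0 < b <= MASK32
    rcp = f32(1.0 / f32(float(b)))
    # The estimate can be off in either direction because of float rounding.
    q = min(int(f32(f32(float(a)) * rcp)), MASK32)
    while q * b > a:          # estimate too high
        q -= 1
    while (q + 1) * b <= a:   # estimate too low
        q += 1
    return q


def umod_via_rcp(a, b):
    """Unsigned remainder derived from the quotient: a - b * (a / b)."""
    return a - b * udiv_via_rcp(a, b)


def idiv_lowered(a, b):
    """Signed division built on the unsigned one (assumed to truncate toward zero)."""
    q = udiv_via_rcp(abs(a), abs(b))
    return -q if (a < 0) != (b < 0) else q
```

For example, `idiv_lowered(-7, 2)` divides 7 by 2 as unsigned values and then negates the quotient because the operand signs differ.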
|
|
|
|
|
|
|
|
| |
v2: discovered that i2b/b2i are also confused
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In freedreno these get implemented as the matching f* instruction plus a
u2f to convert the result to float 1.0/0.0. But it takes fewer lines of code
to just let nir_opt_algebraic handle this for us, and it opens up a small
window for other opt passes to improve things (e.g. if some shader ended up
with both a flt and an slt with the same src args).
v2: use b2f rather than u2f
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
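
The rewrite described here amounts to expressing a set-on-compare opcode as the matching boolean comparison followed by b2f. A minimal Python model of that equivalence for slt, with hypothetical helper names and booleans modelled as Python bools:

```python
def b2f(cond):
    """Boolean to float: 1.0 for true, 0.0 for false."""
    return 1.0 if cond else 0.0


def flt(a, b):
    """Float less-than producing a boolean."""
    return a < b


def slt_lowered(a, b):
    """slt expressed as b2f(flt(a, b))."""
    return b2f(flt(a, b))
```

Once slt is written this way, a shader that also contains flt on the same operands exposes a shared subexpression that later passes can merge.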
|
|
|
|
|
|
|
|
| |
As of da5ec2a, we allocate instruction sources out of the instruction
itself. When we realloc the texture sources, we need to use the right
memory context or ralloc will get angry and assert-fail.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit adds a pass to L1-normalize cube-map coordinates. Some hardware,
such as i965, requires that the largest cube-map coordinate be +-1. We had a
pass to perform this normalization in GLSL IR, but we need it in NIR for
cube maps on ARB programs to work correctly.
Reviewed-by: Jordan Justen <[email protected]>
v2 (Suggested by Eric):
- Do a vector fabs and split into components later
- Move to core NIR
Reviewed-by: Eric Anholt <[email protected]>
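
The normalization described above scales the coordinate so that its largest component has magnitude 1. A minimal Python sketch of that arithmetic, assuming a three-component coordinate; the function name is hypothetical and the all-zero vector is not handled:

```python
def normalize_cube_coord(coord):
    """Scale (x, y, z) so the component with the largest magnitude becomes +-1."""
    # Take the absolute value of the whole vector first, then pick components,
    # mirroring the "vector fabs, split later" note in v2.
    abs_coord = tuple(abs(c) for c in coord)
    largest = max(abs_coord)  # raises ZeroDivisionError below if all are zero
    return tuple(c / largest for c in coord)
```

For instance, `(2.0, -4.0, 1.0)` becomes `(0.5, -1.0, 0.25)`: the direction is unchanged and the dominant axis lands exactly on the cube face.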
|
|
|
|
| |
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
| |
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
| |
Not much hardware wants them these days, and it might give us a chance to
do CSE or algebraic at the NIR level.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
We use nir_ssa_defs for nir_builder args, so this takes a nir_src and
makes one so it can be passed in.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
So far we'd only used nir_builder to build brand new programs. But if
we're doing modifications to instructions (like in a lowering pass), then
we want to generate new stuff before the instruction we're modifying.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
The lifetime of the sources array needs to match the nir_tex_instr
itself. So, allocate it using the instruction itself as the context.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
These sets are part of the block, and their lifetime needs to match the
block itself. So, allocate them using the block itself as the context.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The lifetime of each register's use/def/if_use sets needs to match the
register itself. So, allocate them using the register itself as the
context.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
glsl_to_nir passes in the ir_function's name field; we were copying the
pointer, but not duplicating the memory.
We want to be able to free the linked GLSL IR program after translating
to NIR, so we'll need to create a copy of the function name that the NIR
shader actually owns.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
We can just pass a pointer to the list of variables, and reuse the code.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
Acked-by: Matt Turner <[email protected]>
|
|
|
|
| |
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
| |
No shader-db changes, probably because they're all removed by the GLSL
compiler optimization added in commit 69ad5fd4.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Doesn't work for analogous && cases, because of NaNs.
total instructions in shared programs: 6195712 -> 6194829 (-0.01%)
instructions in affected programs: 42000 -> 41117 (-2.10%)
helped: 403
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
instructions in affected programs: 2858 -> 2808 (-1.75%)
helped: 12
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The rcp(log(x)) pattern affects instruction counts.
instructions in affected programs: 144 -> 138 (-4.17%)
helped: 6
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
| |
No changes in shader-db.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
| |
total instructions in shared programs: 6195924 -> 6195768 (-0.00%)
instructions in affected programs: 4876 -> 4720 (-3.20%)
helped: 58
HURT: 10
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
| |
total instructions in shared programs: 6197614 -> 6195924 (-0.03%)
instructions in affected programs: 34773 -> 33083 (-4.86%)
helped: 147
HURT: 6
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
shader-db results for fragment shaders on Haswell:
total instructions in shared programs: 4395688 -> 4389623 (-0.14%)
instructions in affected programs: 355876 -> 349811 (-1.70%)
helped: 1455
HURT: 14
GAINED: 5
LOST: 0
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
i965/nir: Use the dedicated ffma peephole
total instructions in shared programs: 4418748 -> 4394618 (-0.55%)
instructions in affected programs: 1292790 -> 1268660 (-1.87%)
helped: 5999
HURT: 457
GAINED: 4
LOST: 9
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
total instructions in shared programs: 4422307 -> 4422363 (0.00%)
instructions in affected programs: 4230 -> 4286 (1.32%)
helped: 0
HURT: 12
While this does hurt some things, the losses are minor and it prevents the
compare-with-zero optimization from fighting with ffma, which is much more
important.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
| |
i965/nir: Use the late optimizations
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
| |
This optimization is repeated verbatim above
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
Previously, we couldn't generate two algebraic passes in the same file
because of multiple structure definitions. To solve this, we play the
age-old header file trick and just #define around it.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
Previously, NIR would just print 4 swizzle components if the swizzle was
anything other than foo.xyzw. This creates lots of noise if, for example,
you have a one-component element with a swizzle of foo.xxxx.
Reviewed-by: Kenneth Graunke <[email protected]>
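
One plausible reading of the new printing rule is: look only at the components the consumer actually reads, print nothing if those are the identity swizzle, and otherwise print just those components. The Python sketch below follows that reading. The function name and the exact rule are assumptions for illustration, not the printer's actual code.

```python
def format_swizzle(swizzle, used_components):
    """Return the swizzle suffix for a source, or "" if it is the identity.

    swizzle: sequence of component indices (0..3 for x..w)
    used_components: how many leading components are actually read
    """
    used = list(swizzle[:used_components])
    if used == list(range(used_components)):
        return ""  # identity over the used components: nothing to print
    return "." + "".join("xyzw"[c] for c in used)
```

Under this rule a one-component read with a stored swizzle of `(1, 1, 1, 1)` prints as `.y` rather than `.yyyy`.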
|
|
|
|
|
|
|
|
|
|
|
|
| |
TGSI's conditional discards take a float arg and negate it, so GLSL to TGSI
generates a b2f and negates that value. In NIR, however, we want a proper
bool again, so we compare with 0. That adds up to a lot of pointless extra
instructions.
total instructions in shared programs: 39735 -> 39702 (-0.08%)
instructions in affected programs: 1342 -> 1309 (-2.46%)
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
| |
Since we have patterns based on b2f, generate them if we see the b2f
equivalent using an iand. This is common when generating NIR from TGSI.
Reviewed-by: Connor Abbott <[email protected]>
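
Why an iand can stand in for b2f is easiest to see at the bit level. The sketch below assumes booleans are 32-bit values that are either 0 or all ones, and that the iand in question masks with the bit pattern of 1.0f; both are assumptions made for illustration, and the helper names are hypothetical.

```python
import struct

NIR_TRUE = 0xFFFFFFFF   # assumed boolean representation: all bits set
NIR_FALSE = 0x00000000

# Bit pattern of the single-precision constant 1.0
ONE_F32_BITS = struct.unpack("<I", struct.pack("<f", 1.0))[0]


def bits_to_float(bits):
    """Reinterpret a 32-bit integer as a single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def b2f(b):
    """Boolean to float: 1.0 for true, 0.0 for false."""
    return 1.0 if b else 0.0


def iand_form(b):
    """The TGSI-style idiom: mask the boolean with the bits of 1.0f."""
    return bits_to_float(b & ONE_F32_BITS)
```

With an all-ones true value the mask passes the bits of 1.0 through unchanged, and with 0 it yields +0.0, so both forms agree on every boolean input.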
|
|
|
|
|
|
|
|
|
|
| |
These are nir_cf_nodes, not ALU instructions.
Also, use unreachable() to preempt said review feedback.
v2: Do it right (thanks Ilia).
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
prog->nir will generate fsub opcodes, but i965 doesn't implement them.
We may as well lower them at the NIR level, since it's trivial to do.
Suggested by Connor Abbott.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
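
Lowering a subtraction presumably means rewriting `a - b` as `a + (-b)`. The Python sketch below shows that rewrite on a toy expression tree made of nested tuples; the tuple encoding and the function name `lower_fsub` are hypothetical and only loosely modelled on how algebraic rewrites are usually written down.

```python
def lower_fsub(expr):
    """Rewrite every ("fsub", a, b) node into ("fadd", a, ("fneg", b))."""
    if not isinstance(expr, tuple):
        return expr  # a leaf: variable name or constant
    op, *srcs = expr
    srcs = [lower_fsub(s) for s in srcs]  # lower the operands first
    if op == "fsub":
        a, b = srcs
        return ("fadd", a, ("fneg", b))
    return (op, *srcs)
```

A backend that consumes the lowered tree only needs fadd and fneg, which is the point of doing the rewrite before the backend sees the shader.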
|
|
|
|
|
|
|
|
|
|
| |
These will be useful for prog->nir and tgsi->nir.
v2: Don't forget to mark nir_swizzle as inline (Eric).
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
| |
Both prog->nir and tgsi->nir will want to use these.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Transform this into b2f(or(a, b)).
instructions in affected programs: 432 -> 430 (-0.46%)
helped: 2
Acked-by: Ian Romanick <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Transform this into b2f(and(a, b)).
total instructions in shared programs: 6205448 -> 6204391 (-0.02%)
instructions in affected programs: 284030 -> 282973 (-0.37%)
helped: 903
HURT: 6
Acked-by: Ian Romanick <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
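
The source patterns for this rewrite and for the earlier b2f(or(a, b)) one are not shown in this log. Purely as an illustration, assume they are a float min and a float max of two b2f results. The Python sketch below checks over the full truth table that those assumed patterns equal the b2f(and(a, b)) and b2f(or(a, b)) replacements; all function names are hypothetical.

```python
from itertools import product


def b2f(cond):
    """Boolean to float: 1.0 for true, 0.0 for false."""
    return 1.0 if cond else 0.0


def and_pattern(a, b):
    """Assumed source pattern for the 'and' rewrite: min of two b2f results."""
    return min(b2f(a), b2f(b))


def or_pattern(a, b):
    """Assumed source pattern for the 'or' rewrite: max of two b2f results."""
    return max(b2f(a), b2f(b))


def rewrites_agree():
    """Check both rewrites against all four boolean input combinations."""
    return all(
        and_pattern(a, b) == b2f(a and b) and or_pattern(a, b) == b2f(a or b)
        for a, b in product([False, True], repeat=2)
    )
```

Each replacement trades two conversions and a float operation for one logical operation and a single conversion, which is consistent with the small instruction-count gains reported.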
|