| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
| |
Tested using the GLSL 1.30 tests for integer abs(). Not currently used,
but it was one of the new opcodes used by robclark's idiv lowering.
|
|
|
|
| |
Reviewed-by: Mark Janes <[email protected]>
|
|
|
|
| |
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
| |
The hardware only supports 4 MRTs. It should be possible to emulate
support for 8, but doesn't seem worth the trouble.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
This will enable the driver to tell which regids to link up to which
MRT outputs.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
| |
This is needed for MRT support
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
This complication is unnecessary and makes MRTs more complicated and
likely to generate tons of variants.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
This adds all the plumbing to get EGL_EXT_image_dma_buf_import in
i915g.
Signed-off-by: Stéphane Marchesin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
The thing we want to avoid is int/float comparisons, but int/unsigned
comparisons with 0 are equivalent.
total instructions in shared programs: 6194829 -> 6193996 (-0.01%)
instructions in affected programs: 117192 -> 116359 (-0.71%)
helped: 471
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
No shader-db changes, probably because they're all removed by the GLSL
compiler optimization added in commit 69ad5fd4.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Doesn't work for analogous && cases, because of NaNs.
total instructions in shared programs: 6195712 -> 6194829 (-0.01%)
instructions in affected programs: 42000 -> 41117 (-2.10%)
helped: 403
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
instructions in affected programs: 2858 -> 2808 (-1.75%)
helped: 12
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The rcp(log(x)) pattern affects instruction counts.
instructions in affected programs: 144 -> 138 (-4.17%)
helped: 6
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
| |
No changes in shader-db.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
| |
total instructions in shared programs: 6195924 -> 6195768 (-0.00%)
instructions in affected programs: 4876 -> 4720 (-3.20%)
helped: 58
HURT: 10
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
| |
total instructions in shared programs: 6197614 -> 6195924 (-0.03%)
instructions in affected programs: 34773 -> 33083 (-4.86%)
helped: 147
HURT: 6
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
InputsRead is a 64-bit bitfield. Using _mesa_fls would silently
truncate off the high bits, claiming inputs 32..56 (VARYING_SLOT_MAX)
were never read.
Using <= here was a hack I threw in at the last minute to fix programs
which happened to use input slot 32. Switch back to using < now that
the underlying problem is fixed.
Fixes crashes in "Euro Truck Simulator 2" when using prog->nir, which
uses input slot 33.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
| |
This is _mesa_fls() for 64-bit values.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
ptn_move_dest and nir_fadd already take care of replicating the last
channel out, so we can just use a scalar and skip splatting it.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
If we tell NIR to split ffma's, then we don't need seperate options
anymore.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
| |
We run lowering and optimization passes that might leave garbage lying
around. This keeps the FS cse from having to clean it up.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The idea here is that fusing multiply-add combinations too early can reduce
our ability to perform CSE and value-numbering. Instead, we split ffma
opcodes up-front, hope CSE cleans up, and then fuse after-the-fact.
Unless an algebraic pass does something silly where it inserts something
between the multiply and the add, splitting and re-fusing should never
cause a problem. We run the late algebraic optimizations after this so
that things like compare-with-zero don't hurt our ability to fuse things.
shader-db results for fragment shaders on Haswell:
total instructions in shared programs: 4390538 -> 4379236 (-0.26%)
instructions in affected programs: 989359 -> 978057 (-1.14%)
helped: 5308
HURT: 97
GAINED: 78
LOST: 5
This does, unfortunately, cause some substantial hurt to a shader in Kerbal
Space Program. However, the damage is caused by changing a single
instruction from a ffma to an add. This, in turn, *decreases* register
pressure in one part of the program causing it to fail to register allocate
and spill. Given the overwhelmingly positive results in other shaders and
the fact that the NIR for the Kerbal shaders is actually better, this
should be considered a positive.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
shader-db results for fragment shaders on Haswell:
total instructions in shared programs: 4395688 -> 4389623 (-0.14%)
instructions in affected programs: 355876 -> 349811 (-1.70%)
helped: 1455
HURT: 14
GAINED: 5
LOST: 0
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
i965/nir: Use the dedicated ffma peephole
total instructions in shared programs: 4418748 -> 4394618 (-0.55%)
instructions in affected programs: 1292790 -> 1268660 (-1.87%)
helped: 5999
HURT: 457
GAINED: 4
LOST: 9
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
total instructions in shared programs: 4422307 -> 4422363 (0.00%)
instructions in affected programs: 4230 -> 4286 (1.32%)
helped: 0
HURT: 12
While this does hurt some things, the losses are minor and it prevents the
compare-with-zero optimization from fighting with ffma which is much more
important.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
| |
i965/nir: Use the late optimizations
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
| |
This optimization is repeated verbatim above
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
Previously, we couldn't generate two algebraic passes in the same file
because of multiple structure definitions. To solve this, we play the
age-old header file trick and just #define around it.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
Previously, NIR would just print 4 swizzle components if the swizzle was
anything other than foo.xyzw. This creates lots of noise if, for example,
you have a one-component element with a swizzle of foo.xxxx.
Reviewed-by: Kenneth Grunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Unused as of commit 630ab0d27ba(mesa: remove last of MAX_WIDTH,
MAX_HEIGHT). Update all the remaining references to the defines.
v2: Use the correct variable name in the comments
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
| |
Acked-by: Matt Turner <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
In case of using a distribution tarball (or a dirty git tree) one can
have the generated sources locally. Make configure.ac error out
otherwise, to alert that about the unmet requirement(s) of python/mako.
v2: Check only for a single file for each dependency.
Suggested-by: Matt Turner <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
| |
Found by Coverity.
|
|
|
|
|
|
| |
Ilia Mirkin found that I had forgotten to free the mutex in the error case.
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
| |
SCons does not build NIR yet.
Trivial.
|
|
|
|
|
|
| |
Add include path for generated nir_opcodes.h.
Trivial.
|
|
|
|
|
| |
Acked-by: Matt Turner <[email protected]>
Reviewed-by: Mark Janes <[email protected]>
|
|
|
|
|
| |
Acked-by: Matt Turner <[email protected]>
Reviewed-by: Mark Janes <[email protected]>
|
|
|
|
|
| |
Acked-by: Matt Turner <[email protected]>
Reviewed-by: Mark Janes <[email protected]>
|
|
|
|
|
| |
Acked-by: Matt Turner <[email protected]>
Reviewed-by: Mark Janes <[email protected]>
|
|
|
|
|
| |
Acked-by: Matt Turner <[email protected]>
Reviewed-by: Mark Janes <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
TGSI's conditional discards take float arg and negate it, so GLSL to TGSI
generates a b2f and negates that value. Only, in NIR we want a proper
bool once again, so we compare with 0. This is a lot of pointless extra
instructions.
total instructions in shared programs: 39735 -> 39702 (-0.08%)
instructions in affected programs: 1342 -> 1309 (-2.46%)
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
| |
Since we have patterns based on b2f, generate them if we see the b2f
equivalent using an iand. This is common when generating NIR from TGSI.
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
|
| |
I was previously using temporary disables of VC4 optimization to show the
benefits of improved NIR optimization, but this can get me quick and dirty
numbers for NIR-only improvements without having to add hacks to disable
VC4's code (disabling of which might hide ways that the NIR changes would
hurt actual VC4 codegen).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
NIR brings us better optimization than I would have bothered to write
within the driver, developers sharing future optimization work, and the
ability to share device-specific lowering code that we and other
GLES2-level drivers need.
total uniforms in shared programs: 13421 -> 13422 (0.01%)
uniforms in affected programs: 62 -> 63 (1.61%)
total instructions in shared programs: 39961 -> 39707 (-0.64%)
instructions in affected programs: 15494 -> 15240 (-1.64%)
v2: Add missing imov support, and assert that there are no dest saturates.
v3: Rebase on the target-specific algebraic series.
v4: Rebase on gallium-includes-from-NIR changes in mater.
v5: Rebase on variables being in lists instead of hash tables.
v6: Squash in intermediate changes that used the NIR-to-TGSI pass (which
I'm not committing)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This will be used by the VC4 driver for doing device-independent
optimization, and hopefully eventually replacing its whole IR. It also
may be useful to other drivers for the same reason.
v2: Add all of the instructions I was relying on tgsi_lowering to remove,
and more.
v3: Rebase on SSA rework of the builder.
v4: Use the NIR ineg operation instead of doing a src modifier.
v5: Don't use ineg for fnegs. (infer_src_type on MOV doesn't do what I
expect, again).
v6: Fix handling of multi-channel KILL_IF sources.
v7: Make ttn_get_f() return a swizzle of a scalar load_const, rather than
a vector load_const. CSE doesn't recognize that srcs out of those
channels are actually all the same.
v8: Rebase on nir_builder auto-sizing, make the scalar arguments to
non-ALU instructions actually be scalars.
v9: Add support for if/loop instructions, additional texture targets, and
untested support for indirect addressing on temps.
v10: Rebase on master, drop bad comment about control flow and just choose
the X channel, use int comparison opcodes in LIT for now, drop unused
pipe_context argument..
v11: Fix translation of LRP (previously missed because I mis-translated
back out), use nir_builder init helpers.
v12: Rebase on master, adding explicit include of mtypes.h to get
INTERP_QUALIFIER_*
v13: Rebase on variables being in lists instead of hash tables, drop use
of mtypes.h in favor of util/pipeline.h. Use Ken's nir_builder
swizzle and fmov/imov_alu helpers, drop "struct" in front of
nir_builder, use nir_builder directly as the function arg in a lot of
cases, drop redundant members of ttn_compile that are also in
nir_builder, drop some half-baked malloc failure handling.
v14: The indirect uniform src0 should be scalar, not vector (noticed as
odd by robclark, confirmed by cwabbott). Apply Ken's review to
initialize s->num_uniforms and friends, skip ttn_channel for dot
products, and use the simpler discard_if intrinsic.
Reviewed-by: Kenneth Graunke <[email protected]> (v13)
Acked-by: Rob Clark <[email protected]>
|