| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
| |
This is in preparation for these functions to be called from other
files.
This commit is intended to have no functional change. It exists in
preparation for some upcoming code movement in preparation for the
shader cache.
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The dirty-bit checking from each brw_upload_<stage>_prog function is
split out into its a new brw_<stage>_state_dirty function.
This commit is intended to have no functional change. It exists in
preparation for some upcoming code movement in preparation for the
shader cache.
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit splits portions of the existing brw_upload_vs_prog and
brw_upload_gs_prog function into new brw_vs_populate_key and
brw_gs_populate_key functions. This follows the same style as is
already present for all other stages, (see brw_wm_populate_key, etc.).
This commit is intended to have no functional change. It exists in
preparation for some upcoming code movement in preparation for the
shader cache.
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
Commit 09ee907266 added logic to fold immediates into mad operations,
but the emission code is only there for fmad. Only allow it on float
types.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
Commit fb63df22151f added 4-byte mad support, but only supported
emission for floats. Disable it for ints for now.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
| |
The lifetime of the sources array needs to be match the nir_tex_instr
itself. So, allocate it using the instruction itself as the context.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
These sets are part of the block, and their lifetime needs to match the
block itself. So, allocate them using the block itself as the context.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The lifetime of each register's use/def/if_use sets needs to match the
register itself. So, allocate them using the register itself as the
context.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
glsl_to_nir passes in the ir_function's name field; we were copying the
pointer, but not duplicating the memory.
We want to be able to free the linked GLSL IR program after translating
to NIR, so we'll need to create a copy of the function name that the NIR
shader actually owns.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
We can just pass a pointer to the list of variables, and reuse the code.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ralloc_adopt() reparents all children from one context to another.
Conceptually, ralloc_adopt(new_ctx, old_ctx) behaves like this
pseudocode:
foreach child of old_ctx:
ralloc_steal(new_ctx, child)
However, ralloc provides no way to iterate over a memory context's
children, and ralloc_adopt does this task more efficiently anyway.
One potential use of this is to implement a memory-sweeper pass: first,
steal all of a context's memory to a temporary context. Then, walk over
anything that should be kept, and ralloc_steal it back to the original
context. Finally, free the temporary context. This works when the
context is something that can't be freed (i.e. an important structure).
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
Acked-by: Matt Turner <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
| |
Tested using the GLSL 1.30 tests for integer abs(). Not currently used,
but it was one of the new opcodes used by robclark's idiv lowering.
|
|
|
|
| |
Reviewed-by: Mark Janes <[email protected]>
|
|
|
|
| |
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
| |
The hardware only supports 4 MRTs. It should be possible to emulate
support for 8, but doesn't seem worth the trouble.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
This will enable the driver to tell which regids to link up to which
MRT outputs.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
| |
This is needed for MRT support
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
This complication is unnecessary and makes MRTs more complicated and
likely to generate tons of variants.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
This adds all the plumbing to get EGL_EXT_image_dma_buf_import in
i915g.
Signed-off-by: Stéphane Marchesin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
The thing we want to avoid is int/float comparisons, but int/unsigned
comparisons with 0 are equivalent.
total instructions in shared programs: 6194829 -> 6193996 (-0.01%)
instructions in affected programs: 117192 -> 116359 (-0.71%)
helped: 471
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
No shader-db changes, probably because they're all removed by the GLSL
compiler optimization added in commit 69ad5fd4.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Doesn't work for analogous && cases, because of NaNs.
total instructions in shared programs: 6195712 -> 6194829 (-0.01%)
instructions in affected programs: 42000 -> 41117 (-2.10%)
helped: 403
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
instructions in affected programs: 2858 -> 2808 (-1.75%)
helped: 12
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The rcp(log(x)) pattern affects instruction counts.
instructions in affected programs: 144 -> 138 (-4.17%)
helped: 6
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
| |
No changes in shader-db.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
| |
total instructions in shared programs: 6195924 -> 6195768 (-0.00%)
instructions in affected programs: 4876 -> 4720 (-3.20%)
helped: 58
HURT: 10
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
| |
total instructions in shared programs: 6197614 -> 6195924 (-0.03%)
instructions in affected programs: 34773 -> 33083 (-4.86%)
helped: 147
HURT: 6
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
InputsRead is a 64-bit bitfield. Using _mesa_fls would silently
truncate off the high bits, claiming inputs 32..56 (VARYING_SLOT_MAX)
were never read.
Using <= here was a hack I threw in at the last minute to fix programs
which happened to use input slot 32. Switch back to using < now that
the underlying problem is fixed.
Fixes crashes in "Euro Truck Simulator 2" when using prog->nir, which
uses input slot 33.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
| |
This is _mesa_fls() for 64-bit values.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
ptn_move_dest and nir_fadd already take care of replicating the last
channel out, so we can just use a scalar and skip splatting it.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
If we tell NIR to split ffma's, then we don't need seperate options
anymore.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
| |
We run lowering and optimization passes that might leave garbage lying
around. This keeps the FS cse from having to clean it up.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The idea here is that fusing multiply-add combinations too early can reduce
our ability to perform CSE and value-numbering. Instead, we split ffma
opcodes up-front, hope CSE cleans up, and then fuse after-the-fact.
Unless an algebraic pass does something silly where it inserts something
between the multiply and the add, splitting and re-fusing should never
cause a problem. We run the late algebraic optimizations after this so
that things like compare-with-zero don't hurt our ability to fuse things.
shader-db results for fragment shaders on Haswell:
total instructions in shared programs: 4390538 -> 4379236 (-0.26%)
instructions in affected programs: 989359 -> 978057 (-1.14%)
helped: 5308
HURT: 97
GAINED: 78
LOST: 5
This does, unfortunately, cause some substantial hurt to a shader in Kerbal
Space Program. However, the damage is caused by changing a single
instruction from a ffma to an add. This, in turn, *decreases* register
pressure in one part of the program causing it to fail to register allocate
and spill. Given the overwhelmingly positive results in other shaders and
the fact that the NIR for the Kerbal shaders is actually better, this
should be considered a positive.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
shader-db results for fragment shaders on Haswell:
total instructions in shared programs: 4395688 -> 4389623 (-0.14%)
instructions in affected programs: 355876 -> 349811 (-1.70%)
helped: 1455
HURT: 14
GAINED: 5
LOST: 0
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
i965/nir: Use the dedicated ffma peephole
total instructions in shared programs: 4418748 -> 4394618 (-0.55%)
instructions in affected programs: 1292790 -> 1268660 (-1.87%)
helped: 5999
HURT: 457
GAINED: 4
LOST: 9
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
total instructions in shared programs: 4422307 -> 4422363 (0.00%)
instructions in affected programs: 4230 -> 4286 (1.32%)
helped: 0
HURT: 12
While this does hurt some things, the losses are minor and it prevents the
compare-with-zero optimization from fighting with ffma which is much more
important.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
| |
i965/nir: Use the late optimizations
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
| |
This optimization is repeated verbatim above
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
Previously, we couldn't generate two algebraic passes in the same file
because of multiple structure definitions. To solve this, we play the
age-old header file trick and just #define around it.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
Previously, NIR would just print 4 swizzle components if the swizzle was
anything other than foo.xyzw. This creates lots of noise if, for example,
you have a one-component element with a swizzle of foo.xxxx.
Reviewed-by: Kenneth Grunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Unused as of commit 630ab0d27ba(mesa: remove last of MAX_WIDTH,
MAX_HEIGHT). Update all the remaining references to the defines.
v2: Use the correct variable name in the comments
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
| |
Acked-by: Matt Turner <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
In case of using a distribution tarball (or a dirty git tree) one can
have the generated sources locally. Make configure.ac error out
otherwise, to alert that about the unmet requirement(s) of python/mako.
v2: Check only for a single file for each dependency.
Suggested-by: Matt Turner <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
| |
Found by Coverity.
|