| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We were already do this for ALU operations but we haven't for non-ALU
operations. This changes that.
total NIR instructions in shared programs: 2039883 -> 2022338 (-0.86%)
NIR instructions in affected programs: 1768850 -> 1751305 (-0.99%)
helped: 14244
HURT: 124
total FS instructions in shared programs: 4083960 -> 4084036 (0.00%)
FS instructions in affected programs: 7302 -> 7378 (1.04%)
helped: 12
HURT: 51
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The round-robin allocation strategy is expected to decrease the amount
of false dependencies created by the register allocator and give the
post-RA scheduling pass more freedom to move instructions around. On
the other hand it has the disadvantage of increasing fragmentation and
decreasing the number of equally-colored nearby nodes, what increases
the likelihood of failure in presence of optimistically colorable
nodes.
This patch disables the round-robin strategy for optimistically
colorable nodes. These typically arise in situations of high register
pressure or for registers with large live intervals, in both cases the
task of the instruction scheduler shouldn't be constrained excessively
by the dense packing of those nodes, and a spill (or on Intel hardware
a fall-back to SIMD8 mode) is invariably worse than a slightly less
optimal scheduling.
Shader-db results on the i965 driver:
total instructions in shared programs: 5488539 -> 5488489 (-0.00%)
instructions in affected programs: 1121 -> 1071 (-4.46%)
helped: 1
HURT: 0
GAINED: 49
LOST: 5
v2: Re-enable round-robin already for the lowest one of the nodes
pushed optimistically onto the sack (Connor).
v3: Use UINT_MAX instead of ~0, open-code MIN2 (Jason, Connor).
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
| |
immediates and uniforms.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
Fixes metadata guess when instructions in the program specify a
destination register with non-zero reg_offset and when the payload of
a LOAD_PAYLOAD spans several registers.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
MRFs cannot be read from anyway so they cannot possibly be a valid
source of LOAD_PAYLOAD.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
It's perfectly fine to read the second half of a register written with
force_writemask_all from a first half MOV instruction or vice versa, and
lower_load_payload shouldn't mark the whole MOV as belonging to the second
half in that case. Replicate the same metadata to both halves of the
destination when writemasking is disabled.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
| |
If your stdlib.h doesn't define this you should fix your stdlib.h.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Acked-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
There's some commentary about how it's defined by other "modules", and
maybe that was true in 2000 when the code was added.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
... and util_le{16,32}_to_cpu. I think I've used the right ones for
describing the actual operation performed (even though they're both just
"byte-swap this if I'm on big-endian").
The Linux Kernel has typedefs __le32/__be32 and friends that static
analysis tools can use to check that byte-orderings are correct. It
might be interesting to apply that here as well.
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
| |
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
| |
Cc: "10.5" <[email protected]>
Bugzilla: https://bugs.gentoo.org/show_bug.cgi?id=540962
|
|
|
|
|
| |
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89260
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
| |
Also, wrapping the array in #ifdef DEBUG / #endif doesn't seem necessary.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
| |
Add add missing PIPE_TRANSFER_PERSISTENT, PIPE_TRANSFER_COHERENT flags.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This corrects a trivial error introduced in commit
19252fee46b835cb4f6b1cce18d7737d62b64a2e. That patch was merged recently
and omits one condition (that 'samples' is greater than zero) in one of
the error checks. That error will definitely cause regressions.
Also corrects the reference to the specification above the error check,
which was wrongly quoting OpenGL instead of OpenGL-ES.
Reviewed-by: Martin Peres <[email protected]>
|
|
|
|
| |
Cc: 10.5 10.4 <[email protected]>
|
|
|
|
|
|
|
|
| |
Broken by a27b74819ad375e8c0bc88e13f42c951d2b5cd6a.
This fix is critical and should be ported to stable ASAP.
Cc: 10.5 10.4 <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When under dispatch_width=16 the previous code would allocate 2 registers for
the payload when only one is needed. This manifested itself through bugs on SKL
which needs to mess with this instruction.
Ken though this might impact shader-db, but apparently it doesn't
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89118
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88999
Signed-off-by: Ben Widawsky <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Tested-by: Timo Aaltonen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
I initially wrote this based on the "(('fneg', ('fneg', a)), a)" above,
but we can generalize it and make it more potentially useful. In the
specific original case of a 0 for our new 'a' argument, it'll get further
algebraic optimization once the 0 is an argument to the new add.
No shader-db effects.
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
|
|
| |
vc4 results:
total instructions in shared programs: 39881 -> 39794 (-0.22%)
instructions in affected programs: 6302 -> 6215 (-1.38%)
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
|
| |
total NIR instructions in shared programs: 39426 -> 39411 (-0.04%)
NIR instructions in affected programs: 3748 -> 3733 (-0.40%)
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We have some useful optimizations to drop things like 'ine a, 0' on a
boolean argument, but if 'a' came from logical operations on bools, it
couldn't tell. These kinds of constructs appear as a result of TGSI->NIR
quite frequently (at least with if flattening), so being a little more
aggressive in detecting booleans can pay off.
v2: Add ixor as a booleanness-preserving op (Suggestion by Connor).
vc4 results:
total instructions in shared programs: 40207 -> 39881 (-0.81%)
instructions in affected programs: 6677 -> 6351 (-4.88%)
Reviewed-by: Matt Turner <[email protected]> (v1)
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
|
| |
vc4 was already cleaning these up, but it does shave 4 NIR instructions in
shader-db.
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
| |
May make life easier for tools like Coverity.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
Spotted by Coverity.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
Fixes xonotic, some webgl stuff, and really pretty much anything with
more than 4 varyings.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
If there is no pci-id, which is valid for vc4 and freedreno, just emit
an info msg. Keep malformed but existing pci-id's as a warning.
Mostly just to clean up a warning that confuses users for the non-pci
devices.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
I never actually implemented the stubbed out fence stuff back in the
early days. Fix that.
We'll need a few libdrm_freedreno changes to handle timeout properly,
so ignore that for now to avoid a libdrm_freedreno dependency bump.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
| |
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88883
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
The brw_imm_ud will yield a HW_REG which then will introduce a barrier
for certain optimization opportunities.
No piglit regressions seen with gen8 (simd8vs).
Suggested-by: Matt Turner <[email protected]>
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For fragment programs, we pull this mask from the payload header. The same
mask doesn't exist for compute shaders, so we set all bits to enabled.
Previously we were setting 0xff to support SIMD8 VS, but with CS we
support SIMD16, and therefore we change this to 0xffff.
Related commits for SIMD8 VS:
commit d9cd982d556be560af3bcbcdaf62b6b93eb934a5
Author: Ben Widawsky <[email protected]>
Date: Sun Feb 15 20:06:59 2015 -0800
i965/simd8vs: Fix SIMD8 atomics
commit 4a95be9772a255776309f23180519a4a8560f2dd
Author: Jordan Justen <[email protected]>
Date: Tue Feb 17 09:57:35 2015 -0800
i965/simd8vs: Fix SIMD8 atomics (read-only)
Note: this mask is ANDed with the execution mask, so some channels may not end
up issuing the atomic operation.
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Ben Widawsky <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
| |
Gen8+ must use VALIGN_4. Unlike prior Gens, R32G32B32_FLOAT should supposedly
support VALIGN_4.
|
|
|
|
| |
The restriction is lifted.
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Because the TGSI interface creates merges for each instruction source
and then splits them back out, there are a lot of unnecessary
merge/split pairs which do essentially nothing. The various modifier/etc
propagation doesn't know how to walk though those, so just remove them
when they're unnecessary.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|