author | Iago Toral Quiroga <[email protected]> | 2016-08-29 10:41:45 +0200 |
---|---|---|
committer | Samuel Iglesias Gonsálvez <[email protected]> | 2017-01-03 11:26:51 +0100 |
commit | 58767f0fec7809c3408adbc4d147dd56f2ee3d4d | |
tree | 666cde98a6a44b5e0651d03f560efca949996682 /src/util | |
parent | 945269ab7280b772807e573dfefc0b4f967ec522 |
i965/vec4: add a SIMD lowering pass

Generally, instructions in Align16 mode only ever write to a single
register and don't need any form of SIMD splitting, which is why we
have never had a SIMD splitting pass in the vec4 backend. However,
double-precision instructions typically write 2 registers, and in
some cases they run into hardware bugs and limitations that we need
to work around by splitting the instructions so that we only write
to 1 register at a time. This patch implements a SIMD splitting
pass similar to the one in the scalar backend.

Because we only use double-precision instructions in Align16 mode
on gen7 (gen8+ is fully scalar and gens < 7 do not implement fp64),
the pass should be a no-op on any other generation.

For now the pass only handles the gen7 restriction where any
instruction that writes 2 registers also needs to read 2 registers.
This affects double-precision instructions reading uniforms, for
example. Later patches will extend the lowering pass with a few
more cases.
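As an illustration of the gen7 restriction described above, here is a minimal sketch of how a lowered width could be chosen. It is not the code added by this patch: `SketchInst`, `DF_CHANNELS_PER_REG` and `lowered_simd_width` are hypothetical names, and the model assumes a 32-byte register that holds four 64-bit channels.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical, simplified stand-in for a vec4 instruction (not Mesa's type).
struct SketchInst {
    int gen;                 // hardware generation
    unsigned exec_size;      // channels executed
    unsigned regs_written;   // registers written by the destination
    unsigned regs_read[3];   // registers read per source, 0 = source unused
};

// Assumption: a 32-byte register holds four 64-bit channels.
constexpr unsigned DF_CHANNELS_PER_REG = 4;

// Returns the execution width the instruction should be split down to.
// Returning exec_size means "no lowering needed".
unsigned lowered_simd_width(const SketchInst &inst)
{
    assert(inst.exec_size > 0);
    unsigned width = inst.exec_size;

    // Only gen7 is handled; every other generation is left untouched.
    if (inst.gen == 7 && inst.regs_written >= 2) {
        for (unsigned regs : inst.regs_read) {
            if (regs == 0)
                continue;   // unused source
            if (regs < 2)   // source spans one register, dst spans two
                width = std::min(width, DF_CHANNELS_PER_REG);
        }
    }
    return width;
}
```

In this model an 8-wide instruction that writes 2 registers but reads a single-register source (such as a uniform) is lowered to width 4, while the same instruction on any other generation keeps its original width.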
v2:
- Move the SIMD lowering pass after the main optimization loop and
  run copy-propagation and DCE if it reports progress (Curro)
- Compute number of registers written instead of fixing it to 1 (Iago)
- Use group from backend_instruction (Iago)
- Drop assertion that checked that we only split 8-wide instructions
  into 4-wide (Curro)
- Don't assume that instructions can only be 8-wide, we might want
  to use 16-wide instructions in the future too (Curro)
- Wrap gen7 workarounds in a conditional to ease adding workarounds
  for other gens in the future (Curro)
- Handle dst/src overlap hazard (Curro)
- Use the horiz_offset() helper to simplify the implementation (Curro)
- Drop the assertion that checks that each split instruction writes
  exactly one register (Curro)
- Use the copy constructor to generate split instructions with all
  the relevant fields initialized to the values in the original
  instruction instead of copying only a handful of them manually (Curro)
v3 (Iago):
- When copying to a temporary, allocate the number of registers required
  for the copy based on the size written by the lowered instruction
  instead of assuming that all lowered instructions produce
  single-register writes
- Adapt to changes in offset()
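
The dst/src overlap hazard and the temporary sizing mentioned in the v2 and v3 notes can be sketched with a toy model. This is not the code in the patch: `temp_regs_needed` and `split_add_one` are hypothetical helpers, the "register file" is just a flat array of channels, the operation is fixed to dst[i] = src[i] + 1, and 32-byte registers are assumed. The sketch also writes every chunk to a temporary before copying anything back, which is a simplification chosen for clarity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr unsigned REG_SIZE_BYTES = 32; // assumption: 32-byte registers

// Registers needed by a temporary that holds one lowered instruction's
// result, derived from the size written rather than assumed to be 1.
unsigned temp_regs_needed(unsigned lowered_width, unsigned type_size_bytes)
{
    const unsigned size_written = lowered_width * type_size_bytes;
    return (size_written + REG_SIZE_BYTES - 1) / REG_SIZE_BYTES; // round up
}

// Toy model: executes dst[i] = src[i] + 1 over exec_size channels in
// chunks of lowered_width channels. dst and src are channel offsets.
void split_add_one(std::vector<double> &regs, std::size_t dst, std::size_t src,
                   unsigned exec_size, unsigned lowered_width)
{
    assert(lowered_width > 0 && exec_size % lowered_width == 0);
    assert(dst + exec_size <= regs.size() && src + exec_size <= regs.size());

    // If the regions overlap, an early chunk could overwrite channels that
    // a later chunk still has to read.
    const bool overlap = dst < src + exec_size && src < dst + exec_size;
    std::vector<double> temp(overlap ? exec_size : 0);

    for (unsigned group = 0; group < exec_size; group += lowered_width) {
        for (unsigned c = group; c < group + lowered_width; c++) {
            const double result = regs[src + c] + 1.0;
            if (overlap)
                temp[c] = result;       // keep sources intact for later chunks
            else
                regs[dst + c] = result; // safe to write directly
        }
    }

    // Copy the results back once every chunk has read its sources.
    for (unsigned c = 0; overlap && c < exec_size; c++)
        regs[dst + c] = temp[c];
}
```

With src at channel 0, dst at channel 4 and an 8-wide operation split into two 4-wide chunks, writing the first chunk directly would clobber channels 4..7 before the second chunk reads them; routing the writes through the temporary avoids that.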
Reviewed-by: Matt Turner <[email protected]>
Diffstat (limited to 'src/util')
0 files changed, 0 insertions, 0 deletions