author | Rob Clark <[email protected]> | 2018-01-11 16:08:47 -0500 |
---|---|---|
committer | Rob Clark <[email protected]> | 2018-01-14 16:14:19 -0500 |
commit | f10bd0a0e1f7cba65a4b462016d3869351b20106 (patch) | |
tree | 04b9623456c9a7da0d456b948f0554786e61ad08 /src/gallium/drivers/r600/r600_shader.c | |
parent | 50f9a9aa960b6340b84aae2fa0e86e14c0e40fa8 (diff) |
freedreno/ir3: "soft" depth scheduling for SFU instructions
First try with a "soft" depth, to try to schedule sfu instructions
further from their consumers, but fall back to hard depth (which might
result in stalling) if nothing else is available to schedule.
Previously the consumer of a sfu instruction could end up scheduled
immediately after (since "hard" depth from sfu to consumer would be 0).
This works because the legalize pass would insert a (ss) sync bit, but it
is sub-optimal since it would cause a stall.
Instead prioritize other instructions for 4 cycles if they would not
cause a nop to be inserted. This minimizes the stalling. There is a
slight penalty in general to overall # of instructions in shader (since
we could end up needing nops later due to scheduling the "deeper" sfu
consumer later), but ends up being a wash on register pressure.
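The two-pass selection described above can be sketched roughly as follows. This is a simplified illustration, not the actual ir3 scheduler code: the `struct candidate` type, its fields, and the `soft_delay()` / `pick()` helpers are hypothetical names invented for this example, and only the soft depth of 4 comes from the description above.

```c
#include <stdbool.h>
#include <stddef.h>

/* "Soft" depth between an sfu instruction and its consumer (value from
 * the commit message; the macro name is an assumption). */
#define SFU_SOFT_DELAY 4

/* Hypothetical, simplified view of a schedulable instruction. */
struct candidate {
    unsigned hard_delay;        /* cycles still required for correctness;
                                 * scheduling earlier would need a nop */
    unsigned cycles_since_sfu;  /* cycles since the sfu producer issued */
    bool consumes_sfu;          /* true if a source comes from an sfu instr */
};

/* Delay under the "soft" rule: sfu consumers are additionally held back
 * until SFU_SOFT_DELAY cycles have passed since their producer. */
static unsigned soft_delay(const struct candidate *c)
{
    unsigned d = c->hard_delay;

    if (c->consumes_sfu && c->cycles_since_sfu < SFU_SOFT_DELAY) {
        unsigned s = SFU_SOFT_DELAY - c->cycles_since_sfu;
        if (s > d)
            d = s;
    }
    return d;
}

/* Returns the index of the instruction to schedule next, or -1 if every
 * candidate would need a nop first. */
static int pick(const struct candidate *cands, size_t n)
{
    size_t i;

    /* Pass 1: prefer anything that is ready under the soft depth. */
    for (i = 0; i < n; i++)
        if (soft_delay(&cands[i]) == 0)
            return (int)i;

    /* Pass 2: fall back to the hard depth. An sfu consumer chosen here
     * may stall on the (ss) sync bit, but no nop is needed. */
    for (i = 0; i < n; i++)
        if (cands[i].hard_delay == 0)
            return (int)i;

    return -1;
}
```

With one fresh sfu consumer and one unrelated ready instruction, `pick()` returns the unrelated instruction; if the sfu consumer is the only candidate with zero hard delay, it is still chosen in the second pass rather than inserting a nop.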
Overall this seems to be worth a 10+% gain in fps. Increasing the
"soft" depth of sfu consumer beyond 4 helps a bit in some cases, but 4
seems to be a good trade-off between getting 99% of the gain and not
increasing instruction count of shaders too much.
It's possible a similar approach could help for tex/mem instructions,
but the (sy) sync bit seems to trigger a switch to a different thread-
group to hide memory latency (possibly with some limits depending on
number of registers used?).
Signed-off-by: Rob Clark <[email protected]>
Diffstat (limited to 'src/gallium/drivers/r600/r600_shader.c')
0 files changed, 0 insertions, 0 deletions