author Vadim Girlin <[email protected]> 2013-07-17 18:29:56 +0400
committer Vadim Girlin <[email protected]> 2013-07-17 18:29:56 +0400
commit 07baf9cfd16b38872be952382ae5a705057cbec2
tree 984159e130f9228e113b48253d4941505135e090
parent ba7fa4c4c93e67fec798d837005a3041adda3d5b
r600g/sb: improve alu packing on cayman
Scheduler/register allocator in r600-sb was developed and optimized on evergreen (VLIW-5) hardware, so currently it's not optimal for VLIW-4 chips.

This patch should improve performance on cayman gpus due to better alu packing, but also it tends to increase register usage, so overall positive effect on performance has to be proven by real benchmarks yet.

Some results with bfgminer kernel on cayman:

source bytecode:        60 gprs, 3905 alu groups,
sbcl before the patch:  45 gprs, 4088 alu groups,
sbcl with this patch:   55 gprs, 3474 alu groups.

Signed-off-by: Vadim Girlin <[email protected]>
Diffstat (limited to 'src/gallium/drivers/r600/sb/sb_pass.h')
-rw-r--r-- src/gallium/drivers/r600/sb/sb_pass.h | 26
1 file changed, 25 insertions, 1 deletion
diff --git a/src/gallium/drivers/r600/sb/sb_pass.h b/src/gallium/drivers/r600/sb/sb_pass.h
index c3ea8734de3..95d2a203a60 100644
--- a/src/gallium/drivers/r600/sb/sb_pass.h
+++ b/src/gallium/drivers/r600/sb/sb_pass.h
@@ -507,12 +507,36 @@ class ra_init : public pass {
 public:
-	ra_init(shader &sh) : pass(sh) {}
+	ra_init(shader &sh) : pass(sh), prev_chans() {
+
+		// The parameter below affects register channels distribution.
+		// For cayman (VLIW-4) we're trying to distribute the channels
+		// uniformly, this means significantly better alu slots utilization
+		// at the expense of higher gpr usage. Hopefully this will improve
+		// performance, though it has to be proven with real benchmarks yet.
+		// For VLIW-5 this method could also slightly improve slots
+		// utilization, but increased register pressure seems more significant
+		// and overall performance effect is negative according to some
+		// benchmarks, so it's not used currently. Basically, VLIW-5 doesn't
+		// really need it because trans slot (unrestricted by register write
+		// channel) allows to consume most deviations from uniform channel
+		// distribution.
+		// Value 3 means that for new allocation we'll use channel that differs
+		// from 3 last used channels. 0 for VLIW-5 effectively turns this off.
+
+		ra_tune = sh.get_ctx().is_cayman() ? 3 : 0;
+	}
 	virtual int run();
 private:
+	unsigned prev_chans;
+	unsigned ra_tune;
+
+	void add_prev_chan(unsigned chan);
+	unsigned get_preferable_chan_mask();
+
 	void ra_node(container_node *c);
 	void process_op(node *n);
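
The hunk above only declares `prev_chans`, `ra_tune`, `add_prev_chan()` and `get_preferable_chan_mask()`; their definitions are not part of this header diff. As a rough illustration of the rule described in the comment ("use channel that differs from 3 last used channels"), here is a minimal standalone sketch. The `chan_history` struct is hypothetical, and the encoding (one 4-bit one-hot nibble per allocation, most recent in the low nibble) is an assumption made for this example, not something the diff confirms.

```cpp
#include <cassert>

// Hypothetical stand-in for the fields and helpers declared in the patch.
// Assumption: the history is packed into a single unsigned, one 4-bit
// one-hot nibble per allocation, with the most recent in the low nibble.
struct chan_history {
    unsigned prev_chans = 0;  // packed history of recently used channels
    unsigned ra_tune;         // how many recent allocations to avoid (0 = off)

    explicit chan_history(unsigned tune) : ra_tune(tune) {}

    // Record that channel `chan` (0..3, i.e. x, y, z, w) was just allocated.
    void add_prev_chan(unsigned chan) {
        assert(chan < 4);
        prev_chans = (prev_chans << 4) | (1u << chan);
    }

    // Return a 4-bit mask of channels that were NOT used by any of the
    // last `ra_tune` allocations. With ra_tune == 0 every channel is allowed.
    unsigned get_preferable_chan_mask() const {
        unsigned used = 0;
        unsigned hist = prev_chans;
        for (unsigned i = 0; i < ra_tune; ++i) {
            used |= hist;   // higher nibbles are masked off below
            hist >>= 4;
        }
        return ~used & 0xFu;
    }
};
```

With `ra_tune` set to 3, recording channels 0, 1 and 2 in turn leaves only bit 3 set in the returned mask, so the next allocation would be steered to the fourth channel. With `ra_tune` set to 0 the mask is always `0xF`, which matches the comment's statement that 0 effectively turns the mechanism off.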