r600g/sb: improve alu packing on cayman

Scheduler/register allocator in r600-sb was developed and optimized on evergreen (VLIW-5) hardware, so currently it's not optimal for VLIW-4 chips.

This patch should improve performance on cayman gpus due to better alu packing, but also it tends to increase register usage, so overall positive effect on performance has to be proven by real benchmarks yet.

Some results with bfgminer kernel on cayman:

source bytecode:        60 gprs, 3905 alu groups,
sbcl before the patch:  45 gprs, 4088 alu groups,
sbcl with this patch:   55 gprs, 3474 alu groups.

Signed-off-by: Vadim Girlin <[email protected]>
author: Vadim Girlin <[email protected]> 2013-07-17 18:29:56 +0400
committer: Vadim Girlin <[email protected]> 2013-07-17 18:29:56 +0400
commit: 07baf9cfd16b38872be952382ae5a705057cbec2 (patch)
tree: 984159e130f9228e113b48253d4941505135e090 /src/gallium/drivers/r600/sb/sb_pass.h
parent: ba7fa4c4c93e67fec798d837005a3041adda3d5b (diff)
1 files changed, 25 insertions, 1 deletions
diff --git a/src/gallium/drivers/r600/sb/sb_pass.h b/src/gallium/drivers/r600/sb/sb_pass.h
index c3ea8734de3..95d2a203a60 100644
--- a/src/gallium/drivers/r600/sb/sb_pass.h
+++ b/src/gallium/drivers/r600/sb/sb_pass.h
@@ -507,12 +507,36 @@ class ra_init : public pass {
 
 public:
 
-	ra_init(shader &sh) : pass(sh) {}
+	ra_init(shader &sh) : pass(sh), prev_chans() {
+
+		// The parameter below affects register channels distribution.
+		// For cayman (VLIW-4) we're trying to distribute the channels
+		// uniformly, this means significantly better alu slots utilization
+		// at the expense of higher gpr usage. Hopefully this will improve
+		// performance, though it has to be proven with real benchmarks yet.
+		// For VLIW-5 this method could also slightly improve slots
+		// utilization, but increased register pressure seems more significant
+		// and overall performance effect is negative according to some
+		// benchmarks, so it's not used currently. Basically, VLIW-5 doesn't
+		// really need it because trans slot (unrestricted by register write
+		// channel) allows to consume most deviations from uniform channel
+		// distribution.
+		// Value 3 means that for new allocation we'll use channel that differs
+		// from 3 last used channels. 0 for VLIW-5 effectively turns this off.
+
+		ra_tune = sh.get_ctx().is_cayman() ? 3 : 0;
+	}
 
 	virtual int run();
 
 private:
 
+	unsigned prev_chans;
+	unsigned ra_tune;
+
+	void add_prev_chan(unsigned chan);
+	unsigned get_preferable_chan_mask();
+
 	void ra_node(container_node *c);
 	void process_op(node *n);
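
The header only declares add_prev_chan() and get_preferable_chan_mask(); their bodies are not shown in this diff. As a rough illustration of the idea described in the constructor comment (prefer a channel that differs from the last ra_tune channels used), here is one way such a history could be kept. This is a sketch, not the code from the patch: the stand-alone struct chan_history, the encoding of 4 bits per remembered allocation, and the channel numbering 0..3 for x, y, z, w are all assumptions made for the example.

```cpp
#include <cassert>

// Hypothetical stand-alone model of the channel history.
// Not the Mesa implementation; the names mirror the declarations in the diff.
struct chan_history {
	unsigned prev_chans; // assumed encoding: one 4-bit mask per allocation, newest in the low nibble
	unsigned ra_tune;    // how many recent allocations to avoid; 0 turns the heuristic off

	explicit chan_history(unsigned tune) : prev_chans(0), ra_tune(tune) {}

	// Remember that channel `chan` (0..3, assumed to mean x, y, z, w) was just allocated.
	void add_prev_chan(unsigned chan) {
		assert(chan < 4);
		prev_chans = (prev_chans << 4) | (1u << chan);
	}

	// Return a 4-bit mask of the channels that were not used
	// by any of the last ra_tune allocations.
	unsigned get_preferable_chan_mask() const {
		unsigned used = 0;
		unsigned chans = prev_chans;
		for (unsigned i = 0; i < ra_tune; ++i) {
			used |= chans; // only the low nibble matters after the final mask
			chans >>= 4;   // step back to the previous allocation
		}
		return ~used & 0xFu;
	}
};
```

With ra_tune set to 3, recording channels 0, 1 and 2 leaves only channel 3 in the preferred mask, so four consecutive allocations rotate through all four channels. With ra_tune set to 0 the loop never runs and every channel stays preferred, which matches the comment's statement that 0 effectively turns the heuristic off.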