i965/fs: Emit better b2f of an expression on GEN4 and GEN5 - mesa.git - Unnamed repository; edit this file 'description' to name the repository.

diff options

author	Ian Romanick <[email protected]>	2015-01-23 17:31:12 -0800
committer	Ian Romanick <[email protected]>	2015-03-19 10:21:08 -0700
commit	b616164c95cb495ce43f6b61dc805ed911a85e89 (patch)
tree	a8c91e6551b6d01b0f714d69a8d400b01111999a /common.py
parent	036e347f3c129bb547137aed955e75062fca09b8 (diff)

i965/fs: Emit better b2f of an expression on GEN4 and GEN5

On platforms that do not natively generate 0u and ~0u for Boolean results, b2f expressions that look like f = b2f(expr cmp 0) will generate better code by pretending the expression is f = ir_triop_sel(0.0, 1.0, expr cmp 0) This is because the last instruction of "expr" can generate the condition code for the "cmp 0". This avoids having to do the "-(b & 1)" trick to generate 0u or ~0u for the Boolean result. This means code like mov(16) g16<1>F 1F mul.ge.f0(16) null g6<8,8,1>F g14<8,8,1>F (+f0) sel(16) m6<1>F g16<8,8,1>F 0F will be generated instead of mul(16) g2<1>F g12<8,8,1>F g4<8,8,1>F cmp.ge.f0(16) g2<1>D g4<8,8,1>F 0F and(16) g4<1>D g2<8,8,1>D 1D and(16) m6<1>D -g4<8,8,1>D 0x3f800000UD v2: When the comparison is either == 0.0 or != 0.0 use the knowledge that the true (or false) case already results in zero would allow better code generation by possibly avoiding a load-immediate instruction. v3: Apply the optimization even when neither comparitor is zero. Shader-db results: GM45 (0x2A42): total instructions in shared programs: 3551002 -> 3550829 (-0.00%) instructions in affected programs: 33269 -> 33096 (-0.52%) helped: 121 Iron Lake (0x0046): total instructions in shared programs: 4993327 -> 4993146 (-0.00%) instructions in affected programs: 34199 -> 34018 (-0.53%) helped: 129 No change on other platforms. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Tapani Palli <[email protected]>

Diffstat (limited to 'common.py')

0 files changed, 0 insertions, 0 deletions


context:
space:
mode: