i965/vec4/nir: do not emit 64-bit MAD

RepCtrl=1 does not work with 64-bit operands, so we need to use RepCtrl=0. In that situation, the regioning generated for the sources seems to be equivalent to <4,4,1>:DF, so it only works for components X and Y. This means that we have to move any other swizzle to a temporary so that MAD can source from channel X (or Y), and we also need to split the instruction. (We already scalarize DF instructions, but there is room for improvement there, and with MAD we would be more restricted in that area.)

Also, it seems that MAD operations like this only write proper output for channels X and Y, so writes to Z and W also need to go to a temporary using channels X/Y, which is then moved to channels Z or W of the actual dst.

As a result, the code we produce for native 64-bit MAD instructions is rather bad, and much worse than just emitting MUL+ADD. For reference, a simple case of a fully scalarized dvec4 MAD operation requires 15 instructions if we use native MAD and 8 instructions if we emit MUL+ADD instead. There are some improvements we could make to the emission of MAD that might bring the instruction count down in some cases, but they come at the expense of a more complex implementation, so it does not seem worth it, at least initially.

This patch makes the translation of NIR's 64-bit FFMA instructions produce MUL+ADD instead of MAD. Currently, nothing else in the vec4 backend emits MAD instructions, so this is sufficient, and it helps optimization passes see MUL+ADD from the get-go.

Reviewed-by: Matt Turner <[email protected]>
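The 8-instruction figure for the MUL+ADD path can be reproduced with a small model. This is a hypothetical sketch: `ToyInst` and `lower_scalarized_dffma` are illustrative names, not Mesa code, and the model assumes each DF component ends up as exactly one MUL into a temporary followed by one ADD. It does not try to model the 15-instruction native MAD sequence, whose exact shape the message does not spell out.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy IR: one entry per emitted instruction. None of these
// names exist in Mesa; they only model the instruction count.
struct ToyInst {
  std::string opcode;  // "MUL" or "ADD"
  char channel;        // component written: 'x', 'y', 'z' or 'w'
};

// Lower a fully scalarized 64-bit ffma, dst = a * b + c, into MUL + ADD.
// Assumes each component becomes exactly one MUL (into a temporary) and
// one ADD (temporary + c) after scalarization.
std::vector<ToyInst> lower_scalarized_dffma(int num_components) {
  assert(num_components >= 1 && num_components <= 4);
  static const char kChannels[] = {'x', 'y', 'z', 'w'};
  std::vector<ToyInst> out;
  for (int i = 0; i < num_components; ++i) {
    out.push_back({"MUL", kChannels[i]});  // tmp.c = a.c * b.c
    out.push_back({"ADD", kChannels[i]});  // dst.c = tmp.c + c.c
  }
  return out;
}
```

For a dvec4, `lower_scalarized_dffma(4)` returns eight entries, alternating MUL and ADD from `.x` through `.w`.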
author: Iago Toral Quiroga <[email protected]> 2016-06-08 11:05:51 +0200
committer: Samuel Iglesias Gonsálvez <[email protected]> 2017-01-03 11:26:51 +0100
commit: 82e9dda8bf8875d232840585f48763c7a7092918 (patch)
tree: 3aedb830581413e323c86ac3467022137aaecc0c /src/mesa/drivers
parent: 83dcd146020f5e54d1e0a46c585ed672e75abaa0 (diff)
1 file changed, 12 insertions, 5 deletions
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index d8aa64023c3..fe82ed8f15b 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -1892,12 +1892,19 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
       break;
 
    case nir_op_ffma:
-      op[0] = fix_3src_operand(op[0]);
-      op[1] = fix_3src_operand(op[1]);
-      op[2] = fix_3src_operand(op[2]);
+      if (type_sz(dst.type) == 8) {
+         dst_reg mul_dst = dst_reg(this, glsl_type::dvec4_type);
+         emit(MUL(mul_dst, op[1], op[0]));
+         inst = emit(ADD(dst, src_reg(mul_dst), op[2]));
+         inst->saturate = instr->dest.saturate;
+      } else {
+         op[0] = fix_3src_operand(op[0]);
+         op[1] = fix_3src_operand(op[1]);
+         op[2] = fix_3src_operand(op[2]);
 
-      inst = emit(MAD(dst, op[2], op[1], op[0]));
-      inst->saturate = instr->dest.saturate;
+         inst = emit(MAD(dst, op[2], op[1], op[0]));
+         inst->saturate = instr->dest.saturate;
+      }
       break;
 
    case nir_op_flrp:
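To make the operand ordering in the diff concrete, here is a scalar model of both branches. It is a hypothetical sketch: `emit_ffma`, `mad` and `apply_saturate` are illustrative names, not i965 functions, and both paths are modelled with `double`. It relies on the Gen MAD definition dst = src0 + src1 * src2, which is why the 32-bit path passes `op[2]` (the addend) first, and it applies saturate to the final instruction that writes dst, as the patch does with the ADD.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical scalar model of the two nir_op_ffma paths in the patch.
// None of these functions exist in Mesa; values are modelled as double.

// Models the saturate modifier: clamp the result to [0, 1].
double apply_saturate(double v) {
  return v < 0.0 ? 0.0 : (v > 1.0 ? 1.0 : v);
}

// Gen MAD semantics: dst = src0 + src1 * src2.
double mad(double src0, double src1, double src2) {
  return src0 + src1 * src2;
}

// NIR ffma(op[0], op[1], op[2]) computes op[0] * op[1] + op[2].
double emit_ffma(std::size_t type_size, const double op[3], bool saturate) {
  assert(type_size == 4 || type_size == 8);
  double result;
  if (type_size == 8) {
    // 64-bit path: MUL into a temporary, then ADD with the addend.
    double mul_dst = op[1] * op[0];
    result = mul_dst + op[2];
  } else {
    // 32-bit path: one MAD, so the addend op[2] goes in src0.
    result = mad(op[2], op[1], op[0]);
  }
  // In both paths saturate is applied to the instruction that writes dst.
  return saturate ? apply_saturate(result) : result;
}
```

With `op = {2.0, 3.0, 4.0}` both paths compute 2 * 3 + 4 = 10, or 1.0 with saturate enabled. The inputs are exactly representable, so the paths agree here; in general, a separate MUL and ADD round the product before the addition, so results need not be bit-identical to a fused multiply-add.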