diff options
author | Marek Olšák <[email protected]> | 2016-07-26 22:34:03 +0200 |
---|---|---|
committer | Marek Olšák <[email protected]> | 2016-08-03 17:46:46 +0200 |
commit | db2d31dab1ce8b91b5563b5cc63df2863c4f7e4b (patch) | |
tree | f3278d269edc9f3f04c209938549bd8f0c556161 | |
parent | 4c4bfed6709c7c35de4b2268ca2e73ed75c30f50 (diff) |
radeonsi: use v_mad_f32 for fma
v_fma_f32 runs at FP64 rate (= slow). Alien Isolation and F1 2015 seem
to use fma for all d3d multiply-add instructions, which is silly.
This tries to restore performance for those games.
The main difference between v_mad_f32 and v_fma_f32 is that v_mad doesn't
support denormals, which we don't enable anyway, because they are slow too.
Also, there is code size reduction:
Totals from affected shaders:
VGPRS: 109796 -> 109808 (0.01 %)
Spilled SGPRs: 29995 -> 30022 (0.09 %)
Spilled VGPRs: 12 -> 13 (8.33 %) <-- it's just one shader going from 12 to 13
Code Size: 6667596 -> 6476356 (-2.87 %) bytes
Max Waves: 26931 -> 26899 (-0.12 %)
I've not actually tested real performance.
Reviewed-by: Nicolai Hähnle <[email protected]>
-rw-r--r-- | src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 4 |
1 files changed, 2 insertions, 2 deletions
diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c index d382496387f..4c603fb1aad 100644 --- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c +++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c @@ -1769,8 +1769,8 @@ void radeon_llvm_context_init(struct radeon_llvm_context *ctx, const char *tripl HAVE_LLVM >= 0x0308 ? "llvm.exp2.f32" : "llvm.AMDIL.exp."; bld_base->op_actions[TGSI_OPCODE_FLR].emit = build_tgsi_intrinsic_nomem; bld_base->op_actions[TGSI_OPCODE_FLR].intr_name = "llvm.floor.f32"; - bld_base->op_actions[TGSI_OPCODE_FMA].emit = build_tgsi_intrinsic_nomem; - bld_base->op_actions[TGSI_OPCODE_FMA].intr_name = "llvm.fma.f32"; + bld_base->op_actions[TGSI_OPCODE_FMA].emit = + bld_base->op_actions[TGSI_OPCODE_MAD].emit; bld_base->op_actions[TGSI_OPCODE_FRC].emit = emit_frac; bld_base->op_actions[TGSI_OPCODE_F2I].emit = emit_f2i; bld_base->op_actions[TGSI_OPCODE_F2U].emit = emit_f2u; |