gallivm: Improve lp_build_rcp_refine.

Use the alternative more accurate expression from
https://en.wikipedia.org/wiki/Division_algorithm#Newton%E2%80%93Raphson_division

v2: Use lp_build_fmuladd as suggested by Roland

Tested by enabling this code path, and running lp_test_arit.

Reviewed-by: Roland Scheidegger <[email protected]>
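In exact arithmetic the old and new expressions are the same Newton-Raphson step. The short derivation below writes e_i for the residual 1 - a * x_i. This notation is introduced here for illustration and does not appear in the patch. The derivation shows that the residual of the next iterate is the square of the current one, so each step roughly doubles the number of correct bits. Any accuracy gain from the rewrite must therefore come from how the expression is rounded, not from the algebra.

```latex
\begin{aligned}
e_i &= 1 - a x_i \\
x_{i+1} &= x_i + x_i e_i = x_i \,(2 - a x_i) \\
1 - a x_{i+1} &= 1 - (1 - e_i)(1 + e_i) = e_i^2
\end{aligned}
```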
author: Jose Fonseca <[email protected]> 2019-05-31 17:10:40 +0100
committer: Jose Fonseca <[email protected]> 2019-06-28 11:48:12 +0100
commit: 35734129819ca7795adb9145123eb4eafea3a562
tree: 68b4fa3f014edafd6a61d1fa3182ed249bb87abe /src/gallium/auxiliary
parent: 0ec8a292fb1e3b0e2f1b6c9d2201ac3bfd993295
1 file changed, 6 insertions, 6 deletions
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_arit.c b/src/gallium/auxiliary/gallivm/lp_bld_arit.c
index 02fb81afe51..c4931c0b230 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_arit.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_arit.c
@@ -2707,11 +2707,11 @@ lp_build_sqrt(struct lp_build_context *bld,
 /**
  * Do one Newton-Raphson step to improve reciprocate precision:
  *
- *   x_{i+1} = x_i * (2 - a * x_i)
+ *   x_{i+1} = x_i + x_i * (1 - a * x_i)
  *
  * XXX: Unfortunately this won't give IEEE-754 conformant results for 0 or
  * +/-Inf, giving NaN instead.  Certain applications rely on this behavior,
- * such as Google Earth, which does RCP(RSQRT(0.0) when drawing the Earth's
+ * such as Google Earth, which does RCP(RSQRT(0.0)) when drawing the Earth's
  * halo. It would be necessary to clamp the argument to prevent this.
  *
  * See also:
@@ -2724,12 +2724,12 @@ lp_build_rcp_refine(struct lp_build_context *bld,
                     LLVMValueRef rcp_a)
 {
    LLVMBuilderRef builder = bld->gallivm->builder;
-   LLVMValueRef two = lp_build_const_vec(bld->gallivm, bld->type, 2.0);
+   LLVMValueRef neg_a;
    LLVMValueRef res;
 
-   res = LLVMBuildFMul(builder, a, rcp_a, "");
-   res = LLVMBuildFSub(builder, two, res, "");
-   res = LLVMBuildFMul(builder, rcp_a, res, "");
+   neg_a = LLVMBuildFNeg(builder, a, "");
+   res = lp_build_fmuladd(builder, neg_a, rcp_a, bld->one);
+   res = lp_build_fmuladd(builder, res, rcp_a, rcp_a);
 
    return res;
 }