author    | Roland Scheidegger <[email protected]> | 2013-08-15 18:40:32 +0200
committer | Roland Scheidegger <[email protected]> | 2013-08-15 18:42:20 +0200
commit    | 5626a84a002cb8565b527ebc1fca73a8497019db
tree      | 545b22ad32851b633dd48ae3920a6f31afa966d9 /src/gallium/auxiliary/gallivm/lp_bld_arit.c
parent    | 3b2f3f90ace68e0a4777661f8cbf07438855edcb
gallivm: do per-sample depth comparison instead of doing it post-filter
Doing the comparisons pre-filter is highly recommended by OpenGL (and d3d9)
and definitely required by d3d10.
Strictly speaking, this doesn't do the comparison pre-filter but rather
"in-filter", since doing it truly pre-filter would require pushing the
comparisons even further down into the fetch code; this placement also
trivially allows using a somewhat cheaper lerp.
Doing it pre-filter would actually have some performance advantage for UNORM
formats (since the comparisons should be done in the texture format, we'd only
need to convert the shadow ref coord to the texture format once, and in turn
would save converting the per-sample texture values to floats), but it gets a
bit messy because it has implications for border color handling as well: that
needs to happen prior to the depth comparison, hence the border color would
also need to be converted to the texture format too, or some other trick used
(such as doing a separate border color / shadow ref comparison and simply
using that result directly when doing border replacement).
Should make no difference for nearest filtering, and performance for linear
filtering should be mostly the same too: there is essentially one more
comparison instruction per sample, and the sub/mul/add lerp is replaced with a
sub/and/and/add special "lerp" (see the sketch below), which all in all
shouldn't make much of a difference.
v2: get rid of old code completely
Reviewed-by: Zack Rusin <[email protected]>
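
For illustration, here is a minimal standalone sketch of why the
sub/and/and/add "lerp" works. This is not the actual gallivm code (which emits
LLVM IR rather than intrinsics); the function name and the SSE lowering are
assumptions. The key point is that the per-sample depth comparisons produce
all-ones/all-zeros bit masks, so ANDing a mask with a float weight yields
either the weight itself or 0.0, and the values being filtered are known to be
exactly 0.0 or 1.0:

#include <stdio.h>
#include <xmmintrin.h>

/* Hypothetical sketch: cmp0/cmp1 are per-sample comparison masks
 * (all-ones bits for pass, all-zeros for fail), e.g. from _mm_cmple_ps.
 * The usual lerp  v0 + w*(v1 - v0)  (sub/mul/add) collapses to
 *   (cmp0 & (1-w)) + (cmp1 & w)     (sub/and/and/add)
 * because the filtered values are exactly 0.0 or 1.0. */
static __m128 shadow_lerp_sketch(__m128 cmp0, __m128 cmp1, __m128 w)
{
   __m128 one   = _mm_set1_ps(1.0f);
   __m128 inv_w = _mm_sub_ps(one, w);       /* sub */
   __m128 t0    = _mm_and_ps(cmp0, inv_w);  /* and: (1-w) where cmp0 passed */
   __m128 t1    = _mm_and_ps(cmp1, w);      /* and: w where cmp1 passed */
   return _mm_add_ps(t0, t1);               /* add */
}

int main(void)
{
   /* ref <= texel comparison for two neighboring samples */
   __m128 ref    = _mm_set1_ps(0.5f);
   __m128 texel0 = _mm_set_ps(0.9f, 0.1f, 0.9f, 0.1f);
   __m128 texel1 = _mm_set_ps(0.9f, 0.9f, 0.1f, 0.1f);
   __m128 cmp0   = _mm_cmple_ps(ref, texel0);
   __m128 cmp1   = _mm_cmple_ps(ref, texel1);
   __m128 w      = _mm_set1_ps(0.25f);

   float out[4];
   _mm_storeu_ps(out, shadow_lerp_sketch(cmp0, cmp1, w));
   printf("%g %g %g %g\n", out[0], out[1], out[2], out[3]);
   /* expected: 0 0.75 0.25 1 */
   return 0;
}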
Diffstat (limited to 'src/gallium/auxiliary/gallivm/lp_bld_arit.c')
-rw-r--r-- | src/gallium/auxiliary/gallivm/lp_bld_arit.c | 13
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_arit.c b/src/gallium/auxiliary/gallivm/lp_bld_arit.c
index 98409c3be86..ee30a02d78c 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_arit.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_arit.c
@@ -1411,8 +1411,19 @@ lp_build_clamp(struct lp_build_context *bld,
    assert(lp_check_value(bld->type, min));
    assert(lp_check_value(bld->type, max));
 
-   a = lp_build_min(bld, a, max);
+   /*
+    * XXX dark magic warning: The order of min/max here matters (!).
+    * The reason is that a typical use case is clamp(a, 0.0, 1.0)
+    * (for example for float->unorm conversion), and on x86 sse2 doing
+    * max first will give 0.0 for NaNs, whereas doing min first will
+    * give 1.0 for NaN, which makes d3d10 angry...
+    * This is very much not guaranteed behavior though, which just
+    * happens to work on x86 sse2 (and up), and obviously won't do
+    * anything for other non-zero clamps (say -1.0/1.0 in a SNORM
+    * conversion) either, so this needs to be fixed for real...
+    */
    a = lp_build_max(bld, a, min);
+   a = lp_build_min(bld, a, max);
    return a;
 }
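
As a side note on the min/max ordering in the hunk above, here is a minimal
standalone C demonstration (a sketch, not Mesa code) of the documented x86
behavior the comment relies on: minss/maxss return the second source operand
when an input is NaN, so the clamp order decides what NaN maps to:

#include <stdio.h>
#include <math.h>
#include <xmmintrin.h>

/* Order used by the patch: max first, then min. */
static float clamp_max_first(float x)
{
   __m128 v = _mm_set_ss(x);
   v = _mm_max_ss(v, _mm_set_ss(0.0f)); /* NaN -> 0.0 (second operand) */
   v = _mm_min_ss(v, _mm_set_ss(1.0f)); /* 0.0 stays 0.0 */
   return _mm_cvtss_f32(v);
}

/* Old order: min first, then max. */
static float clamp_min_first(float x)
{
   __m128 v = _mm_set_ss(x);
   v = _mm_min_ss(v, _mm_set_ss(1.0f)); /* NaN -> 1.0 (second operand) */
   v = _mm_max_ss(v, _mm_set_ss(0.0f)); /* 1.0 stays 1.0 */
   return _mm_cvtss_f32(v);
}

int main(void)
{
   printf("max-first: %f\n", clamp_max_first(NAN)); /* 0.000000, as d3d10 wants */
   printf("min-first: %f\n", clamp_min_first(NAN)); /* 1.000000 */
   return 0;
}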