gallivm: handle 16bit float fetches in lp_build_fetch_rgba_soa

Note that we really want to _never_ reach the bottom of the function, which resorts to AoS fetch. Half floats can be handled just like other formats which fit into 32bit vectors (so, only 1x16 and 2x16 formats, although with more channels things are not THAT bad), with minimal plumbing.

I've seen code size go down by nearly a factor of 3 for a complete texture sampling function (including bilinear filtering) using R16F.

(What we should do for everything not special cased is to do an AoS gather, shuffle/shift things into SoA vectors, and then do the conversion there. Otherwise it's particularly bad with 1 or 2 channel formats - that r16f format with either 4 or 8-wide vectors was still doing one element at a time, essentially doing exactly the same work as for rgba16f. Also, replacing the channels with SWIZZLE0/1 (particularly the latter) adds even more work, as it has to be done per AoS vector, and not just straightforwardly at the end with the SoA vector.)

Reviewed-by: Jose Fonseca <[email protected]>
author: Roland Scheidegger <[email protected]> 2016-12-03 17:10:46 +0100
committer: Roland Scheidegger <[email protected]> 2016-12-06 20:06:06 +0100
commit: fd5f420fbb237a9662532a4111e409f5ec2eba8a (patch)
tree: 93f30b1a83d76e007344d37fe97aebc59550ecc6
parent: 775a2446450fc71cb43e48ece9b59f0412c067fd (diff)
1 files changed, 18 insertions, 4 deletions
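
For illustration only, here is a scalar sketch of what the new 16-bit float path does for one lane of the packed vector: shift the channel down by its start bit, truncate to 16 bits, then widen the half to a 32-bit float. The names unpack_half_channel and half_to_float_scalar are hypothetical and not part of gallivm; the real code emits LLVM IR over whole vectors through LLVMBuildLShr, LLVMBuildTrunc and lp_build_half_to_float, and its conversion may be implemented differently.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar stand-in for lp_build_half_to_float():
 * widen one IEEE 754 binary16 value to binary32. */
static float half_to_float_scalar(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp = (h >> 10) & 0x1fu;
    uint32_t man = h & 0x3ffu;
    uint32_t bits;
    float f;

    if (exp == 0x1fu) {
        /* Inf or NaN: all-ones exponent, keep the mantissa payload. */
        bits = sign | 0x7f800000u | (man << 13);
    } else if (exp != 0) {
        /* Normal number: rebias the exponent from 15 to 127. */
        bits = sign | ((exp + 112u) << 23) | (man << 13);
    } else if (man == 0) {
        /* Signed zero. */
        bits = sign;
    } else {
        /* Subnormal half: shift left until the implicit bit appears. */
        uint32_t shift = 0;
        do {
            man <<= 1;
            shift++;
        } while ((man & 0x400u) == 0);
        bits = sign | ((113u - shift) << 23) | ((man & 0x3ffu) << 13);
    }
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* One lane of the unpack step: "packed" holds a whole pixel (at most
 * 32 bits), "start" is the bit offset of the 16-bit float channel. */
static float unpack_half_channel(uint32_t packed, unsigned start)
{
    assert(start <= 16);
    if (start) {
        packed >>= start;              /* LLVMBuildLShr in the patch */
    }
    return half_to_float_scalar((uint16_t)packed); /* trunc + convert */
}
```

For a two-channel 16-bit float pixel packed as 0xc0003c00, unpack_half_channel(0xc0003c00, 0) gives 1.0f (low half 0x3c00) and unpack_half_channel(0xc0003c00, 16) gives -2.0f (high half 0xc000).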
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_format_soa.c b/src/gallium/auxiliary/gallivm/lp_bld_format_soa.c
index 7fc4e8d24fd..7444c518e42 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_format_soa.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_format_soa.c
@@ -239,9 +239,22 @@ lp_build_unpack_rgba_soa(struct gallivm_state *gallivm,
 
       case UTIL_FORMAT_TYPE_FLOAT:
          if (type.floating) {
-            assert(start == 0);
-            assert(stop == 32);
-            assert(type.width == 32);
+            if (format_desc->channel[chan].size == 16) {
+               struct lp_type f16i_type = type;
+               f16i_type.width /= 2;
+               f16i_type.floating = 0;
+               if (start) {
+                  input = LLVMBuildLShr(builder, input,
+                             lp_build_const_int_vec(gallivm, type, start), "");
+               }
+               input = LLVMBuildTrunc(builder, input,
+                                      lp_build_vec_type(gallivm, f16i_type), "");
+               input = lp_build_half_to_float(gallivm, input);
+            } else {
+               assert(start == 0);
+               assert(stop == 32);
+               assert(type.width == 32);
+            }
             input = LLVMBuildBitCast(builder, input, lp_build_vec_type(gallivm, type), "");
          }
          else {
@@ -369,7 +382,8 @@ lp_build_fetch_rgba_soa(struct gallivm_state *gallivm,
        format_desc->block.height == 1 &&
        format_desc->block.bits <= type.width &&
        (format_desc->channel[0].type != UTIL_FORMAT_TYPE_FLOAT ||
-        format_desc->channel[0].size == 32))
+        format_desc->channel[0].size == 32 ||
+        format_desc->channel[0].size == 16))
    {
       /*
        * The packed pixel fits into an element of the destination format. Put