[compiler-rt] b291182 - [compiler-rt][BF16] "bfloat -> float -> bfloat" round-trip conversions

Tue Aug 29 09:59:58 PDT 2023

Author: V Donaldson
Date: 2023-08-29T09:59:23-07:00
New Revision: b291182b3f2400532806dce457d7cb2ad8dd03b0

URL: https://github.com/llvm/llvm-project/commit/b291182b3f2400532806dce457d7cb2ad8dd03b0
DIFF: https://github.com/llvm/llvm-project/commit/b291182b3f2400532806dce457d7cb2ad8dd03b0.diff

LOG: [compiler-rt][BF16] "bfloat -> float -> bfloat" round-trip conversions

Invoking the compiler-rt function __truncsfbf2 to convert a 32-bit float
zero (0x00000000) to a 16-bit bfloat currently generates the denormal
value 0x0040 rather than 0x0000. Negative zero (0x80000000) is converted
to the denormal 0x8040 rather than 0x8000.

This behavior is seen in flang code under development (not yet integrated)
that converts bfloat/REAL(KIND=3) argument values to float/REAL(KIND=4)
values and then converts those values back to bfloat/REAL(KIND=3). The
problem has other instances: a round-trip conversion of a denormal
through __truncsfbf2 generates a different denormal, and an sNaN is
converted to a qNaN.
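
The round-trip expectation described above can be illustrated at the bit
level. The following is a minimal sketch, not compiler-rt code: it assumes
that widening bfloat to float is a plain 16-bit left shift of the bit
pattern, and every function name in it is hypothetical. Under that
assumption every widened value has 16 zero tail bits, so a narrowing step
that keeps the upper 16 bits returns the original pattern, including
zeros, denormals, and sNaNs.

```c
#include <stdint.h>

// Assumption: bfloat -> float widening is a plain 16-bit left shift of
// the bit pattern (bfloat is the upper half of an IEEE binary32 value).
static uint32_t bf16_bits_to_f32_bits(uint16_t b) {
  return (uint32_t)b << 16;
}

// Bit-preserving narrowing. Only exact when the low 16 bits of f are 0,
// which always holds for values produced by the widening above.
static uint16_t f32_bits_to_bf16_bits_exact(uint32_t f) {
  return (uint16_t)(f >> 16);
}

// Counts the bfloat bit patterns that do not survive
// bfloat -> float -> bfloat when narrowing is bit-preserving.
static unsigned count_round_trip_mismatches(void) {
  unsigned mismatches = 0;
  for (uint32_t b = 0; b <= 0xFFFFu; ++b) {
    uint32_t wide = bf16_bits_to_f32_bits((uint16_t)b);
    uint16_t back = f32_bits_to_bf16_bits_exact(wide);
    if (back != (uint16_t)b)
      ++mismatches;
  }
  return mismatches;
}
```

With these helpers no bit pattern changes across the round trip, which is
the behavior the flang use case expects from __truncsfbf2.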

The problem is addressed in the generic conversion code in
fp_trunc_impl.inc by dropping the trailing significand bits when they are
all zero and the source and destination formats are identical except for
the significand size. The format condition is met only for
float -> bfloat conversions.
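
As a rough standalone illustration of that check, the sketch below
specializes it to float -> bfloat, where the exponent widths already match
and only the tail test remains. The helper name trunc_exact_tail and the
constants are hypothetical and are not part of compiler-rt; the real code
uses the generic srcBits, dstBits, and exponent-width parameters of
fp_trunc_impl.inc.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical constants for the float -> bfloat case only.
enum { kSrcBits = 32, kDstBits = 16, kTailBits = kSrcBits - kDstBits };

// Returns true and stores the bfloat bit pattern in *out when the low
// kTailBits bits of the float bit pattern are all zero, so dropping them
// loses no information. Returns false when rounding would be needed;
// that path is deliberately left out of this sketch.
static bool trunc_exact_tail(uint32_t aRep, uint16_t *out) {
  if (((aRep >> kTailBits) << kTailBits) != aRep)
    return false; // nonzero tail: fall through to the rounding path
  *out = (uint16_t)(aRep >> kTailBits);
  return true;
}
```

For the inputs named in this log, 0x00000000 yields 0x0000 and 0x80000000
yields 0x8000, while an input such as 0x3F800001 has a nonzero tail and
would still take the existing rounding path.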

Round-trip conversions for at least some other type pairs have the same
problem. A solution in those cases would need to account for exponent
size differences. Those cases are not relevant to flang compilations
and are not addressed here. A broader solution might subsume this fix,
or this fix might remain useful as is.

There are no existing tests of bfloat conversion functionality in the
compiler-rt test directory. Tests for other conversions use a common
infrastructure that does not currently have support for bfloat conversions.
This patch does not attempt to add that infrastructure for this new case.
CodeGen test bfloat.ll checks bfloat adds and other operations that invoke
__truncsfbf2.

Added: 
    

Modified: 
    compiler-rt/lib/builtins/fp_trunc_impl.inc

Removed: 
    


################################################################################
diff  --git a/compiler-rt/lib/builtins/fp_trunc_impl.inc b/compiler-rt/lib/builtins/fp_trunc_impl.inc
index 6662be7607e70e..e235f45965a727 100644

--- a/compiler-rt/lib/builtins/fp_trunc_impl.inc
+++ b/compiler-rt/lib/builtins/fp_trunc_impl.inc
@@ -75,6 +75,13 @@ static __inline dst_t __truncXfYf2__(src_t a) {
   const src_rep_t sign = aRep & srcSignMask;
   dst_rep_t absResult;
 
+  const int tailBits = srcBits - dstBits;
+  if (srcExpBits == dstExpBits && ((aRep >> tailBits) << tailBits) == aRep) {
+    // Same size exponents and a's significand tail is 0. Remove tail.
+    dst_rep_t result = aRep >> tailBits;
+    return dstFromRep(result);
+  }
+
   if (aAbs - underflow < aAbs - overflow) {
     // The exponent of a is within the range of normal numbers in the
     // destination format.  We can convert by simply right-shifting with