[flang-commits] [PATCH] D159005: [compiler-rt][BF16] "bfloat -> float -> bfloat" round-trip conversions

Mon Aug 28 09:57:25 PDT 2023

vdonaldson created this revision.
vdonaldson added a project: Flang.
Herald added subscribers: Enna1, jdoerfert, dberris.
Herald added a project: All.
vdonaldson requested review of this revision.

Invoking the compiler-rt function __truncsfbf2 to convert a 32-bit float
zero (0x00000000) to a 16-bit bfloat currently produces the denormal value
0x0040 rather than 0x0000. Negative zero (0x80000000) is converted to the
denormal 0x8040 rather than 0x8000.

This behavior is seen in flang code under development (not yet integrated)
that converts bfloat/REAL(KIND=3) argument values to float/REAL(KIND=4)
values and then converts those values back to bfloat/REAL(KIND=3). Zero is
not the only affected value: a round-trip conversion of a denormal through
__truncsfbf2 produces a different denormal, and an sNaN is converted to a
qNaN.
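
To make the round-trip expectation concrete, here is a minimal sketch at the
bit level. It assumes the widening step (bfloat -> float) is a pure bit
extension, which holds because bfloat uses the same sign and 8-bit exponent
layout as float. The helper names are hypothetical and are not part of
compiler-rt.

```c
#include <stdint.h>

// Hypothetical helper (not a compiler-rt function): widen a bfloat bit
// pattern to the float bit pattern of the same value. Because the sign and
// exponent fields line up, widening just appends 16 zero significand bits.
static uint32_t widen_bf16_bits(uint16_t bits) {
  return (uint32_t)bits << 16;
}

// Hypothetical helper: the result a lossless round trip should give.
// Every widened value has an all-zero 16-bit tail, so dropping that tail
// must return the original bfloat bits unchanged.
static uint16_t expected_round_trip(uint16_t bits) {
  return (uint16_t)(widen_bf16_bits(bits) >> 16);
}
```

Under this assumption, zero (0x0000), negative zero (0x8000), a denormal
such as 0x0001, and an sNaN such as 0x7F81 should all come back unchanged.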

The problem is addressed in the generic truncation implementation,
fp_trunc_impl.inc, by adding an early exit that simply drops the trailing
significand bits when they are all zero and the source and destination
formats differ only in significand size. The format condition is met only
for float -> bfloat conversions.
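
As an illustration only, the check can be sketched as a standalone function
specialized to float -> bfloat (32 -> 16 bits). The name try_exact_trunc and
the fixed-width constants are assumptions made for this example. The actual
change uses the srcBits, dstBits, srcExpBits, and dstExpBits definitions
already available in fp_trunc_impl.inc.

```c
#include <stdint.h>

// Widths assumed for the float -> bfloat case; both formats have 8
// exponent bits, so the exponent-size condition is taken as already met.
enum { SRC_BITS = 32, DST_BITS = 16, TAIL_BITS = SRC_BITS - DST_BITS };

// Hypothetical standalone version of the early exit. Returns 1 and stores
// the truncated bits in *out when the low TAIL_BITS of aRep are all zero,
// so the conversion is exact. Returns 0 otherwise, in which case the
// caller would fall through to the general rounding path.
static int try_exact_trunc(uint32_t aRep, uint16_t *out) {
  if (((aRep >> TAIL_BITS) << TAIL_BITS) == aRep) {
    *out = (uint16_t)(aRep >> TAIL_BITS);
    return 1;
  }
  return 0;
}
```

Because the shift keeps the sign, exponent, and leading significand bits
as they are, zeros, denormals, and NaN payloads pass through bit for bit
whenever the tail is zero.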

Round-trip conversions for at least some other type pairs have the same
problem. A solution in those cases would need to account for exponent
size differences. Those cases are not relevant to flang compilations
and are not addressed here. A broader solution might subsume this fix,
or this fix might remain useful as is.

There are no existing tests of bfloat conversion functionality in the
compiler-rt test directory. Tests for other conversions use a common
infrastructure that does not currently have support for bfloat conversions.
This patch does not attempt to add that infrastructure for this new case.
The CodeGen test bfloat.ll checks bfloat adds and other operations that
invoke __truncsfbf2.


https://reviews.llvm.org/D159005

Files:
  compiler-rt/lib/builtins/fp_trunc_impl.inc


Index: compiler-rt/lib/builtins/fp_trunc_impl.inc
===================================================================

--- compiler-rt/lib/builtins/fp_trunc_impl.inc
+++ compiler-rt/lib/builtins/fp_trunc_impl.inc
@@ -75,6 +75,13 @@
   const src_rep_t sign = aRep & srcSignMask;
   dst_rep_t absResult;
 
+  const int tailBits = srcBits - dstBits;
+  if (srcExpBits == dstExpBits && ((aRep >> tailBits) << tailBits) == aRep) {
+    // Same size exponents and a's significand tail is 0. Remove tail.
+    dst_rep_t result = aRep >> tailBits;
+    return dstFromRep(result);
+  }
+
   if (aAbs - underflow < aAbs - overflow) {
     // The exponent of a is within the range of normal numbers in the
     // destination format.  We can convert by simply right-shifting with

