[PATCH] Fix a bug in APFloat::fusedMultiplyAdd()

Sat May 11 12:13:56 PDT 2013

The original patch has a minor problem -- after the test is done, I 
intentionally modified
the unittest to see if I can see a failure report.  I forget to revert 
the that change.

Please use updated patch.

On 5/11/13 11:57 AM, Shuxin Yang wrote:
> Hi, There:
>
>   The attached patch is to fix the bug that APFloat::fusedMultiplyAdd()
> mistakenly evaluate "14.5f * -14.5f + 225.0f" to 225.0f. It is tracked by
> rdar://13863270.
>
>   The real culprit is "multiplySignificand(const APFloat &rhs, const 
> APFloat *addend)".
> While this function is called in many places, it is never called with 
> non-null addend
> until recently Owen committed a change to Instruction Selection.
>
>   The logic of multiplySignificand() is sketched bellow, assuming all 
> operands are
> single-precion.
>
>  ----------------------------------------------
>     multiplySignificand(APFloat B, APFloat *C) {
>       A = *this;
>
>       assume A = a23 . a22 ... a0 * 2^e1
>       assume B = b23 . b22 ... b0 * 2^e2
>
>       step 1: compute T = A * B, suppose the result is
>           T = t47 t46 . t45 ... t0 * 2^(e1+e2)
>              Note that there are two significant bits at the LHS of
>              radix point.
>
>       step 2: if (C) {
>         step 2.1) Normalize T such that the most-significant-bit (MSB)
>               of T is right before radix point. and this is done by
>                 shift-left(T's-significant, 47 - position-of-MSB)
>
>         step 2.2) Convert C such that it has 47 signicant bit
>                 to be consistent with T.
>
>
>         step 2.3) call add-sub helper routine to evaluate T = T + C.
>       }
>
>       step 3: convert T, which have 48 significant bits back to
>               the one having 24 bits (i.e. convert back to float-type)
>     }
>  ----------------------------------------------
>
> There are couple of problem over here:
> ========================================
>     p1) in step 2.1, it blindly assume "position-of-MSB <= 47",
>         actually position-of-MSB("14.5f * 14.5f") == 48.
>
>         So, in this case, step 2.1 actually evaluate
>             T = shift-left(T, -1) = shift-left(T, 0xff...ff) = 0
>         which explain why "14.5f * -14.5f + whatever" = whatever
>
>     p2. there are two bit before radix point, I don't know for sure
>         if step 2.3 always give correct result. It seems to be
>         pretty tricky here.
>
> The fix:
> ========
>     a) Insert step between 1 and 2, by move the radix point left by 
> one position.
>        i.e.
>        let T = t47 . t46 ... t0 ^ 2^(e1+e2+1)
>
>     b) replace all 47 with 48 in step 2.*
>
> Thanks
> Shuxin
>

-------------- next part --------------
Index: unittests/ADT/APFloatTest.cpp
===================================================================

--- unittests/ADT/APFloatTest.cpp	(revision 181662)
+++ unittests/ADT/APFloatTest.cpp	(working copy)
@@ -33,6 +33,29 @@
 
 namespace {
 
+TEST(APFloatTest, FMA) {
+  APFloat::roundingMode rdmd = APFloat::rmNearestTiesToEven;
+
+  {
+    APFloat f1(14.5f);
+    APFloat f2(-14.5f);
+    APFloat f3(225.0f);
+    f1.fusedMultiplyAdd(f2, f3, APFloat::rmNearestTiesToEven);
+    EXPECT_EQ(14.75f, f1.convertToFloat());
+  }
+
+  {
+    APFloat Val2(2.0f);
+    APFloat f1((float)1.17549435e-38F);
+    APFloat f2((float)1.17549435e-38F);
+    f1.divide(Val2, rdmd);
+    f2.divide(Val2, rdmd);
+    APFloat f3(12.0f);
+    f1.fusedMultiplyAdd(f2, f3, APFloat::rmNearestTiesToEven);
+    EXPECT_EQ(12.0f, f1.convertToFloat());
+  }
+}
+
 TEST(APFloatTest, Denormal) {
   APFloat::roundingMode rdmd = APFloat::rmNearestTiesToEven;
 
Index: lib/Support/APFloat.cpp
===================================================================
--- lib/Support/APFloat.cpp	(revision 181662)
+++ lib/Support/APFloat.cpp	(working copy)
@@ -872,7 +872,21 @@
   omsb = APInt::tcMSB(fullSignificand, newPartsCount) + 1;
   exponent += rhs.exponent;
 
+  // Assume the operands involved in the multiplication are single-precision
+  // FP, and the two multiplicants are:
+  //   *this = a23 . a22 ... a0 * 2^e1
+  //     rhs = b23 . b22 ... b0 * 2^e2
+  // the result of multiplication is:
+  //   *this = c47 c46 . c45 ... c0 * 2^(e1+e2)
+  // Note that there are two significant bits at the left-hand side of the 
+  // radix point. Move the radix point toward left by one bit, and adjust
+  // exponent accordingly.
+  exponent += 1;
+
   if (addend) {
+    // The intermediate result of the multiplication has "2 * precision" 
+    // signicant bit; adjust the addend to be consistent with mul result.
+    //
     Significand savedSignificand = significand;
     const fltSemantics *savedSemantics = semantics;
     fltSemantics extendedSemantics;
@@ -880,8 +894,9 @@
     unsigned int extendedPrecision;
 
     /* Normalize our MSB.  */
-    extendedPrecision = precision + precision - 1;
+    extendedPrecision = 2 * precision;
     if (omsb != extendedPrecision) {
+      assert(extendedPrecision > omsb);
       APInt::tcShiftLeft(fullSignificand, newPartsCount,
                          extendedPrecision - omsb);
       exponent -= extendedPrecision - omsb;
@@ -912,8 +927,18 @@
     omsb = APInt::tcMSB(fullSignificand, newPartsCount) + 1;
   }
 
-  exponent -= (precision - 1);
+  // Convert the result having "2 * precision" significant-bits back to the one
+  // having "precision" significant-bits. First, move the radix point from 
+  // poision "2*precision - 1" to "precision - 1". The exponent need to be
+  // adjusted by "2*precision - 1" - "precision - 1" = "precision".
+  exponent -= precision;
 
+  // In case MSB resides at the left-hand side of radix point, shift the
+  // mantissa right by some amount to make sure the MSB reside right before
+  // the radix point (i.e. "MSB . rest-significant-bits").
+  //
+  // Note that the result is not normalized when "omsb < precision". So, the
+  // caller needs to call APFloat::normalize() if normalized value is expected.
   if (omsb > precision) {
     unsigned int bits, significantParts;
     lostFraction lf;