[PATCH] D69275: Add constrained int->FP intrinsics

Fri Nov 15 10:06:04 PST 2019

uweigand added a comment.

Some comments on verifying the various algorithms for correctly respecting constrained FP semantics.  I have not looked at the X86 back-end.

================
Comment at: llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp:1020
+                                    Node->getOperand(1).getValueType());
+    break;
   case ISD::SIGN_EXTEND_INREG: {
----------------
kpn wrote:
> cameron.mcinally wrote:
> > cameron.mcinally wrote:
> > > kpn wrote:
> > > > cameron.mcinally wrote:
> > > > > Should these be clustered with STRICT_LRINT/etc?
> > > > The regular U/SINT_TO_FP nodes are handled a couple of lines above. That makes these clustered close to the matching non-constrained versions, which seemed appropriate.
> > > > 
> > > > Is there a better place?
> > > I think the STRICT_LRINT/etc cases should be moved up and combined with these new cases. All the non-strict variants are clustered together. It would make sense to cluster the strict variants directly under those.
> > I suppose STRICT_LRINT/etc could be moved in a separate NFCI patch and then this patch just rebased. That's probably best. Sorry, just thinking aloud...
> When one does something like this one notices that I'm calling TLI.getOperationAction() instead of TLI.getStrictFPOperationAction(). That seems like a bug in one of the two places. Hmmm.
Yes, the code below  for STRICT_LRINT etc seems buggy, it should definitely call TLI.getOperationAction instead.

================
Comment at: llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp:2425
+      Result = DAG.getFPExtendOrRound(Sub, dl, DestVT);
     return Result;
   }
----------------
kpn wrote:
> cameron.mcinally wrote:
> > Nit-picky: does this preserve the rounding results and flags raised?
> > 
> > If the target doesn't support the itofp instruction, I'm not sure if we can do better anyway. Just wondering if anyone had thought this through...
> Well, we don't have the metadata arguments at this point, so they can't be carried through to the extend/round instruction.
> 
> And I don't know how a target that doesn't have an itofp instruction would handle a round instruction. Maybe? I don't know. 
I don't believe we need the metadata arguments here.  Instead, we need to verify that this code sequence always rounds correctly according to the current rounding mode, no matter what it is, and also that this code sequence raises the same set of exceptions that a real IEEE compliant itofp operation would raise.

So looking at the algorithm, it first constructs two double values "by hand", using only integer operations, which is always fine.  Then those two are subtracted from one other.  Due to the construction, neither can be NaN, and also the difference is always exactly representable.  This means that the FSUB never raises any exception and does not depend on the current rounding mode, so this should also be fine.  (So possibly, we don't even need a STRICT_FSUB here but can continue using a normal FSUB?)

Finally, the result may be extended or rounded to another floating-point type.  Extension should be fine since it never rounds and the input cannot be a NaN.  Rounding should also be fine since it uses the correct (current) rounding mode, due to the construction of the input can never raise underflow or overflow exceptions, and raises the inexact exception exactly when it ought to be raised.

================
Comment at: llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp:2448
+    Tmp1 = DAG.getNode(ISD::SINT_TO_FP, dl, DestVT, Op0);

   SDValue SignSet = DAG.getSetCC(dl, getSetCCResultType(SrcVT), Op0,
----------------
This is actually the more problematic case, because this may round the wrong way depending on the current rounding mode.

For example, if the input is a large i64 value that does not exactly fit into the result type, and the rounding mode is "towards zero", then the result should be the FP value immediately smaller than the exact result.  But since we're actually doing a signed conversion, the input will be interpreted as a negative value (of large absolute value), and rounded "towards zero", i.e. to the FP value immediately **larger** than the exact result of converting the negative value.   After adding back the bias, we'll then have the FP value immediately **larger** than the exact result of an (unsigned) conversion of the original value, which is incorrect.

It actually looks like this this has nothing to do with constrained FP semantics, this code simply gives the wrong result even for regular FP operations ...

(GCC avoid this by either converting to a larger intermediate FP type first, if possible, or else avoiding the rounding issue by ensuring only positive values are passed to the signed-to-FP intermediate conversion.)

================
Comment at: llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp:1272
+    DAG.ReplaceAllUsesOfValueWith(Op.getValue(1), SDValue(Result.getNode(), 1));
+    return Result;
+  }
----------------
This algorithm however looks fine to me.  The SINT_TO_FP operations always operate on positive values, and even more so, since they only use a half-word, the result is guaranteed to fit exactly into the target FP type, so there is never any rounding or exception.  (These then don't really need to be strict operations.)  The FMUL only increments the exponent, so again there is no rounding or exception.

The FADD at the end rounds correctly, and raises the correct exceptions, so this looks all good.

================
Comment at: llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp:2653
+    return Res;
+  }
+
----------------
This confuses me again.   It seems this may generate a SINT_TO_FP -> HalfVT -> FP_ROUND -> OutVT chain, which introduces a potential double rounding that can lead to incorrect results even disregarding any constrained FP semantics ...

================
Comment at: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:6076
+      Slow = DAG.getNode(ISD::STRICT_FADD, dl, { DstVT, MVT::Other },
+                         { SignCvt.getValue(1), SignCvt, SignCvt });
+      Chain = Slow.getValue(1);
----------------
This also looks correct to me.  The STRICT_SINT_TO_FP will round correctly in any rounding mode, if my understanding of the algorithm is correct, and it will also raise the inexact exception if appropriate.  The STRICT_FADD is just a multiply by two, which does not depend on rounding and cannot raise any exceptions given the input (so it might as well be just a plain FADD).

But what confuses me a bit again is why we have two algorithms for UINT_TO_FP expansion: a correct one here, and the incorrect one above in ExpandLegalINT_TO_FP?  Under what circumstances will we ever end up using the incorrect one?

================
Comment at: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:6129
+      Result = DAG.getNode(ISD::STRICT_FADD, dl, {DstVT, MVT::Other},
+                           {HiSub.getValue(1), LoFlt, HiSub});
+      Chain = Result.getValue(1);
----------------
Same comment as above.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69275/new/

https://reviews.llvm.org/D69275