[PATCH] D43515: More math intrinsics for conservative math handling

Mon Feb 26 12:26:39 PST 2018

kpn added inline comments.

================
Comment at: docs/LangRef.rst:13213
+
+The result produced is an unsigned integer converted from the floating
+point operand.
----------------
andrew.w.kaylor wrote:
> kpn wrote:
> > andrew.w.kaylor wrote:
> > > I think you need to say more about the semantics. The LLVM IR fptoui and fptosi are specifically documented as rounding to zero. I don't think we want that with the constrained intrinsics so we need to very specifically document how they will be different from the standard instructions.
> > What do we want these intrinsics to do that is different from the normal IR fptoui and fptosi?
> > 
> > I don't think any of the constrained intrinsics _today_ are doing anything with the rounding and exception metadata. That makes it hard for me to say much about it in documentation, today.
> > 
> > Well, unless I misunderstood the current code. That's totally possible.
> You're right, none of the other intrinsics are doing anything specific with the rounding mode. The purpose of the intrinsics is to prevent the optimizer from doing things that would introduce compile-time rounding. However, the general assumption of the intrinsics is that whatever you set the rounding mode to at runtime is the rounding behavior you will get.
> 
> I'd want to consult a front end expert as to exactly what should be happening. A quick glance at the C99 standard tells me that when a floating point number is converted to an integer it is truncated toward zero. I think that means the same thing as the LLVM language reference claim that fptoui and fptosi round the number toward zero.
> 
> So I think that we don't want the runtime rounding mode to change the behavior of these intrinsics, and so we should document it as such.
Our front end guy here confirms truncation. I've updated my working copy to state that fptoui and fptosi truncate.

================
Comment at: docs/LangRef.rst:13260
+      declare <type>
+      @llvm.experimental.constrained.round(<type> <op>,
+                                          metadata <rounding mode>,
----------------
andrew.w.kaylor wrote:
> kpn wrote:
> > andrew.w.kaylor wrote:
> > > This intrinsic is a bit troublesome. The llvm.round intrinsic says that it "returns the same values as the libm round functions would, and handles error conditions in the same way." The libm round function, in turn, is documented as rounding to the nearest integer (and away from zero in halfway cases) regardless of the current rounding mode.
> > > 
> > > So what do we want the constrained form of the intrinsic to do? I think it needs to ignore the rounding mode. I'm not sure about exception behavior. If it doesn't respect exception behavior then we probably don't want to have the constrained form of this intrinsic at all.
> > We'll still need the constrained intrinsic to avoid getting reordered.
> > 
> > How about if I rename the intrinsic to be fptrunc instead of round? Then the rounding would be explicit.
> No, fptrunc does something else, right? My concern is that if this is preserving the behavior of the round library function (and I think we need to) then the rounding mode argument isn't relevant, so maybe it should be omitted in this case. The key thing is to be explicit about its behavior in the documentation and then make sure the implementation actually does what we say it will.
Well, if I'm reading  SelectionDAGBuilder::visitFPTrunc() correctly then fptrunc gets turned into ISD::FP_ROUND. So, no, renaming this intrinsic to be fptrunc would not be wrong. Contrast this with llvm.round getting turned into ISD::FROUND. I think we need constrained intrinsics for both, so this one in this patch should be renamed after fptrunc.

I haven't touched the ISD::FROUND node yet, and that's what llvm.round gets turned into from the looks of SelectionDAGBuilder::visitIntrinsicCall(). That could be a later patch.

I agree that the rounding mode should be ignored and shouldn't be in the intrinsic's metadata.

WDYT?

================
Comment at: docs/LangRef.rst:13296
+      declare <type>
+      @llvm.experimental.constrained.round.inreg(<type> <op1>,
+                                          <type> <op2>,
----------------
andrew.w.kaylor wrote:
> kpn wrote:
> > andrew.w.kaylor wrote:
> > > This seems to be leaking SelectionDAG implementation details into the IR space. How is this used?
> > It seems to exist for the MVT::ppcf128 type. A quick grep doesn't show any other users. 
> > 
> > Test coverage is lacking. I added a test using the intrinsic, but I had to mark it expected fail since I couldn't get it to work.
> > 
> > To avoid the risk of bugs getting introduced later I went ahead and implemented the intrinsic. Would it be better to not have the intrinsic and to instead have a pass that replaces the non-STRICT SDNode with a STRICT version? That would avoid said leaking into the IR space. It would, however, mean that llvm would have an opinion on when STRICT nodes should be used. I'm not sure that's a good thing. 
> I think it's better not to have the intrinsic for now. I don't understand what the SD node is doing well enough to say much more, but it looks to me like the SD node shouldn't be there either. It's a target-specific hack that leaked into the target-independent code if my understanding is correct.
Done. It will be removed in the next diff.

================
Comment at: docs/LangRef.rst:13337
+      declare <type>
+      @llvm.experimental.constrained.extend(<type> <op>,
+                                          metadata <rounding mode>,
----------------
andrew.w.kaylor wrote:
> kpn wrote:
> > andrew.w.kaylor wrote:
> > > This is a replacement for fpext, right? I think you should say that somewhere.
> > I think I picked names for the intrinsics that matched the SelectionDAG node enum. Perhaps it would be better to match the bitcode language names? In which case this intrinsic would be "fpext" instead of "extend".
> > 
> > Either way that's a good idea for the documentation to at least mention fpext.
> These intrinsics should be driven by what the front end needs. If no front end is generating an equivalent now then we don't want an intrinsic. So, yes, please match the bitcode language names.
Will do.

================
Comment at: lib/CodeGen/SelectionDAG/LegalizeDAG.cpp:968
     case ISD::STRICT_FNEARBYINT: EqOpc = ISD::FNEARBYINT; break;
+    case ISD::STRICT_FP_TO_SINT: EqOpc = ISD::FP_TO_SINT; break;
+    case ISD::STRICT_FP_TO_UINT: EqOpc = ISD::FP_TO_UINT; break;
----------------
andrew.w.kaylor wrote:
> kpn wrote:
> > andrew.w.kaylor wrote:
> > > STRICT_FP_TO_SINT needs to do something more than this. The default lowering of FP_TO_SINT is known to raise spurious FE_INEXACT exceptions because it involves speculative execution.
> > I have seen FP_TO_SINT cause traps when it shouldn't. Would having that default lowering use the chain in the STRICT_ case solve that issue?
> No, I don't think the chain will fix this. We need to implement strict lowering that does something different.
Is there something we can do at this level to fix this? If there is then I'm all for it, but if there isn't then we should probably still put the intrinsic in. 

We'll need it eventually, and currently none of the constrained intrinsics solve the complete optimization problem. So I wonder if this intrinsic is really all that different from the other experimental constrained intrinsics.

If a backend models FP side effects then wouldn't the existing default lowering work correctly?

================
Comment at: lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:6134
   ComputeValueVTs(TLI, DAG.getDataLayout(), FPI.getType(), ValueVTs);
-  ValueVTs.push_back(MVT::Other); // Out chain
+  if (!(Opcode == ISD::STRICT_FP_TO_UINT || 
+        Opcode == ISD::STRICT_FP_TO_SINT ||
----------------
andrew.w.kaylor wrote:
> kpn wrote:
> > andrew.w.kaylor wrote:
> > > Why are you not attaching these nodes to the chain?
> > Because they need to match what the default lowering is expecting. Otherwise a variety of failures happen.
> There is code that removes the chain when the strict node is mutated to a non-strict node. That should be preventing lowering problems. I believe the chain was necessary to prevent re-ordering prior to final instruction selection.
Mutation happens too late in some cases. 

If it happened early enough then there wouldn't be any need for the strict intrinsics to be mentioned in SelectionDAGLegalize::ExpandNode(). Since it doesn't we can't use the chain.

Should that mutation happen earlier?

Repository:
  rL LLVM

https://reviews.llvm.org/D43515