[PATCH] D36335: Add ‘llvm.experimental.constrained.fma‘ Intrinsic
Andy Kaylor via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Aug 18 14:54:11 PDT 2017
andrew.w.kaylor added inline comments.
================
Comment at: lib/Target/X86/X86ISelDAGToDAG.cpp:2015
default: break;
+ case ISD::FMA: {
+ SDValue ISDFMA = CurDAG->getNode(X86ISD::FMADD, SDLoc(Node),
----------------
wdng wrote:
> andrew.w.kaylor wrote:
> > wdng wrote:
> > > andrew.w.kaylor wrote:
> > > > wdng wrote:
> > > > > andrew.w.kaylor wrote:
> > > > > > Can you explain why this was necessary? I would have expected there to have been handling already in place for ISD::FMA.
> > > > > No it doesn't, looks like X86 doesn't handle ISD:FMA automatically unless we there is -mattr=+fma option. Without this, CodeGen/X86/fp-intrinsics.ll will fail in instruction selection.
> > > > I still don't understand. What happens when -mattr=+fma is used?
> > > >
> > > > The CodeGen/X86/fma.ll test uses that option. This case should work in the same way.
> > > I think I made a mistake when describing the problem in my early comments. Let me rephrase and explain it there.
> > >
> > > 1. Without -mattr=+fma, a FMA libcall will be generated
> > > 2. With -mattr=+fma, we are expecting the corresponding FMA instruction to be generated.
> > >
> > > In fma.ll, all fma tests are *not* constrained fp operations, during the during the X86ISelLowering phase, the FMA node has been lowered to X86ISD::FMADD. So there is no ISD::FMA at this phase since it has already been changed to X86ISD::FMADD before the instruction selection starts. Please refer to the following dump.
> > >
> > > ```
> > > (gdb) p CurDAG->dump()
> > > SelectionDAG has 12 nodes:
> > > t0: ch = EntryToken
> > > t2: f64,ch = CopyFromReg t0, Register:f64 %vreg0
> > > t4: f64,ch = CopyFromReg t0, Register:f64 %vreg1
> > > t6: f64,ch = CopyFromReg t0, Register:f64 %vreg2
> > > t12: f64 = X86ISD::FMADD t2, t4, t6
> > > t10: ch,glue = CopyToReg t0, Register:f64 %XMM0, t12
> > > t11: ch = X86ISD::RET_FLAG t10, TargetConstant:i32<0>, Register:f64 %XMM0, t10:1
> > >
> > > ```
> > > However, for the constrained fma, we use mutateStrictFPToFP( ) function to mutate constrained_fma to normal fma, namely ISD::FMA before the instruction selction starts. The X86 backend cannot recognize the ISD::FMA, so we have to add codes to convert ISD::FMA to X86ISD::FMADD during the instruction selection.
> > >
> > >
> > I'm still not sure I understand this, but it sounds to me like this should be happening somewhere else.
> >
> > Are you saying that if -mattr=+fma is not used the ISD::STRICT_FMA will be expanded to a libcall before we reach mutateStrictFpToFP and so this code will never be reached in that case? And if so, are you further saying that when -mattr=+fma is used we will reach this code only after mutateStrictFpToFp() has converted ISD::STRICT_FMA to ISD::FMA?
> >
> > My concern is that this is adding a generic (not constrained-specific) handler to handle the constrained case. I would much rather figure out a way to get ISD::STRICT_FMA to follow the existing path.
> Are you saying that if -mattr=+fma is not used the ISD::STRICT_FMA will be expanded to a libcall before we reach mutateStrictFpToFP and so this code will never be reached in that case? And if so, are you further saying that when -mattr=+fma is used we will reach this code only after mutateStrictFpToFp() has converted ISD::STRICT_FMA to ISD::FMA?
>
> --> Yes
>
> My concern is that this is adding a generic (not constrained-specific) handler to handle the constrained case. I would much rather figure out a way to get ISD::STRICT_FMA to follow the existing path.
>
> ---> I once tried to move the "mutateStrictFPToFP( )" to the LegalizeDAG phase, like the following code shows and I found it works and there is no need to add codes into X86 backend instruction selector:
>
> ```
> switch (Action) {
> case TargetLowering::Legal:
> if (Node->isStrictFPOpcode())
> Node = DAG.mutateStrictFPToFP(Node);
> return;
> ```
> So once those strict fp operator haven legalized to legal, we can directly mutate them to their corresponding normal fp operator.
>
> However, here comes a problem that non-default FP (or constrained fp operations) exception behaviors are target-specific, which means we have to leave it to each sub-target selectors to handle them. So I would not suggest mutating those instructions at somewhere. What do you think?
>
I do think the mutate needs to be done as late as possible. I'm not even entirely certain that we won't need to figure out a way to communicate the FP constraints beyond instruction selection.
Would it be possible to have the mutateStrictFPToFP call (in its current location) call a target-specific hook to get a target-specific mutated node, so that we could convert directly to X86ISD::FMADD there?
Also, have you considered how non-X86 architectures need to handle this case?
Repository:
rL LLVM
https://reviews.llvm.org/D36335
More information about the llvm-commits
mailing list