<html><body>


<p><font size="2" face="sans-serif">Thanks Hal.</font><br>


<br>


<font size="2" face="sans-serif">Olivier</font><br>


<br>


<i>(See attached file: fma.diff)</i><br>


<br>


<img width="16" height="16" src="cid:2__=0ABBF7C6DFE815C08f9e8a93df938@us.ibm.com" border="0" alt="Inactive hide details for Hal Finkel ---09/16/2014 06:16:08 PM-------- Original Message ----- > From: "Olivier H Sallenave" <oh"><font size="2" color="#424282" face="sans-serif">Hal Finkel ---09/16/2014 06:16:08 PM-------- Original Message ----- > From: "Olivier H Sallenave" <ohsallen@us.ibm.com></font><br>


<br>


<font size="1" color="#5F5F5F" face="sans-serif">From:      </font><font size="1" face="sans-serif">Hal Finkel <hfinkel@anl.gov></font><br>


<font size="1" color="#5F5F5F" face="sans-serif">To:        </font><font size="1" face="sans-serif">Olivier H Sallenave/Watson/IBM@IBMUS</font><br>


<font size="1" color="#5F5F5F" face="sans-serif">Cc:        </font><font size="1" face="sans-serif"><llvm-commits@cs.uiuc.edu>, Samuel F Antao/Watson/IBM@IBMUS</font><br>


<font size="1" color="#5F5F5F" face="sans-serif">Date:      </font><font size="1" face="sans-serif">09/16/2014 06:16 PM</font><br>


<font size="1" color="#5F5F5F" face="sans-serif">Subject:   </font><font size="1" face="sans-serif">Re: Improve FMA folding</font><br>


<hr width="100%" size="2" align="left" noshade style="color:#8091A5; "><br>


<br>


<br>


<tt><font size="2">----- Original Message -----<br>


> From: "Olivier H Sallenave" <ohsallen@us.ibm.com><br>


> To: "Hal Finkel" <hfinkel@anl.gov><br>


> Cc: llvm-commits@cs.uiuc.edu, "Samuel F Antao" <sfantao@us.ibm.com><br>


> Sent: Tuesday, September 16, 2014 5:08:37 PM<br>


> Subject: Re: Improve FMA folding<br>


> <br>


> <br>


> <br>


> Hal,<br>


> <br>


> Thanks for the quick review. I don't have commit access.<br>


<br>


Okay, please send an updated patch and I'll commit for you.<br>


<br>


 -Hal<br>


<br>


> <br>


> Olivier<br>


> <br>


> Sent from my iPhone<br>


> <br>


> > On Sep 16, 2014, at 5:41 PM, "Hal Finkel" <hfinkel@anl.gov> wrote:<br>


> > <br>


> > Hi,<br>


> > <br>


> > +bool PPCTargetLowering::enableAggressiveFMAFusion(EVT VT) const {<br>


> > + return VT.isFloatingPoint();<br>


> > +}<br>


> > <br>


> > This should never be called on anything but floating-point types<br>


> > (we don't have integer FMA nodes). I'd write this as:<br>


> > <br>


> > bool PPCTargetLowering::enableAggressiveFMAFusion(EVT VT) const {<br>


> > assert(VT.isFloatingPoint() && "Non-floating-point FMA?");<br>


> > return true;<br>


> > }<br>


> > <br>


> > otherwise, LGTM. Do you have commit access?<br>


> > <br>


> > -Hal<br>


> > <br>


> > ----- Original Message -----<br>


> > > From: "Olivier H Sallenave" <ohsallen@us.ibm.com><br>


> > > To: "Hal Finkel" <hfinkel@anl.gov><br>


> > > Cc: llvm-commits@cs.uiuc.edu, "Samuel F Antao"<br>


> > > <sfantao@us.ibm.com><br>


> > > Sent: Tuesday, September 16, 2014 4:28:27 PM<br>


> > > Subject: Re: Improve FMA folding<br>


> > > <br>


> > > <br>


> > > <br>


> > > Hi Hal,<br>


> > > <br>


> > > Thanks for the advice. Attached is the suggested change.<br>


> > > <br>


> > > Cheers,<br>


> > > Olivier<br>


> > > <br>


> > > (See attached file: fma.diff)<br>


> > > <br>


> > > Inactive hide details for Hal Finkel ---09/16/2014 02:02:07<br>


> > > AM---Hi<br>


> > > Olivier, I don't want to duplicate the DAGCombine logic in Hal<br>


> > > Finkel ---09/16/2014 02:02:07 AM---Hi Olivier, I don't want to<br>


> > > duplicate the DAGCombine logic in the PPC backend. Instead,<br>


> > > please<br>


> > > add a<br>


> > > <br>


> > > From: Hal Finkel <hfinkel@anl.gov><br>


> > > To: Olivier H Sallenave/Watson/IBM@IBMUS<br>


> > > Cc: Samuel F Antao/Watson/IBM@IBMUS, <llvm-commits@cs.uiuc.edu><br>


> > > Date: 09/16/2014 02:02 AM<br>


> > > Subject: Re: Improve FMA folding<br>


> > > <br>


> > > <br>


> > > <br>


> > > <br>


> > > Hi Olivier,<br>


> > > <br>


> > > I don't want to duplicate the DAGCombine logic in the PPC<br>


> > > backend.<br>


> > > Instead, please add a TLI callback to optionally enable the<br>


> > > transformation in DAGCombine. Make this callback take the value<br>


> > > type<br>


> > > as a parameter, and return false for all types by default. Then<br>


> > > override this callback in PPCISelLowering to return true (perhaps<br>


> > > for all types). You can call it something like<br>


> > > enableAggressiveFMAFormation, or<br>


> > > enablePartialReplacementWithFMAs,<br>


> > > etc.<br>


> > > <br>


> > > Thanks again,<br>


> > > Hal<br>


> > > <br>


> > > ----- Original Message -----<br>


> > > > From: "Olivier H Sallenave" <ohsallen@us.ibm.com><br>


> > > > To: llvm-commits@cs.uiuc.edu<br>


> > > > Cc: "Hal Finkel" <hfinkel@anl.gov>, "Samuel F Antao"<br>


> > > > <sfantao@us.ibm.com><br>


> > > > Sent: Wednesday, September 10, 2014 10:41:01 AM<br>


> > > > Subject: Fw: Improve FMA folding<br>


> > > > <br>


> > > > <br>


> > > > <br>


> > > > Hi,<br>


> > > > <br>


> > > > We should indeed check that the number of cycles has been<br>


> > > > reduced<br>


> > > > (through TTI::getArithmeticInstrCost). However, AArch64 doesn't<br>


> > > > override the cost of FMA, FADD and FMUL, therefore they all<br>


> > > > cost 2<br>


> > > > by default.<br>


> > > > <br>


> > > > Instead, here is a patch specific to the PPC backend. I<br>


> > > > followed<br>


> > > > Kevin's proposal to always combine, which should be fine for<br>


> > > > this<br>


> > > > target.<br>


> > > > <br>


> > > > Olivier<br>


> > > > <br>


> > > > (See attached file: fma-ppc.diff)<br>


> > > > <br>


> > > > ----- Forwarded by Olivier H Sallenave/Watson/IBM on 09/10/2014<br>


> > > > 11:26<br>


> > > > AM -----<br>


> > > > <br>


> > > > From: Olivier H Sallenave/Watson/IBM<br>


> > > > To: Hal Finkel <hfinkel@anl.gov><br>


> > > > Cc: llvm-commits@cs.uiuc.edu<br>


> > > > Date: 09/04/2014 05:26 PM<br>


> > > > Subject: Re: Improve FMA folding<br>


> > > > <br>


> > > > <br>


> > > > <br>


> > > > Hi Hal,<br>


> > > > <br>


> > > > Thanks for your feedback. Attached is the patch with the<br>


> > > > corrections<br>


> > > > and the regression test.<br>


> > > > <br>


> > > > Regarding the recursive call, I believe it is fine since it's<br>


> > > > guarded<br>


> > > > by some conditions which ensure it can only take an FNEG node<br>


> > > > as<br>


> > > > parameter — therefore the depth of the recursion cannot be<br>


> > > > greater<br>


> > > > than 1. If you still don't like it, we can replace the<br>


> > > > recursive<br>


> > > > call with an iteration over the uses of the FNEG node.<br>


> > > > <br>


> > > > More importantly, I found out that the following tests now<br>


> > > > fail:<br>


> > > > CodeGen/AArch64/fp-dp3.ll<br>


> > > > CodeGen/AArch64/neon-fma.ll<br>


> > > > <br>


> > > > The failing tests are basically the same, and check explicitly<br>


> > > > that<br>


> > > > FMUL isn't combined with two distinct FADD (see below). I am<br>


> > > > wondering why ARM doesn't want to combine here. It seems that<br>


> > > > FMUL<br>


> > > > and FMADD take 2 cycles, whereas FADD takes 1 cycle. Could that<br>


> > > > be<br>


> > > > the reason? If so, maybe we could check the latency of the<br>


> > > > instructions we combine?<br>


> > > > <br>


> > > > ; Another set of tests that check for multiply single use<br>


> > > > <br>


> > > > define float @test_fmadd_unfused_su(float %a, float %b, float<br>


> > > > %c) {<br>


> > > > ; CHECK-LABEL: test_fmadd_unfused_su:<br>


> > > > %prod = fmul float %b, %c<br>


> > > > %sum = fadd float %a, %prod<br>


> > > > %res = fadd float %sum, %prod<br>


> > > > ; CHECK-NOT: fmadd {{s[0-9]+}}, {{s[0-9]+}}, {{s[0-9]+}},<br>


> > > > {{s[0-9]+}}<br>


> > > > ; CHECK: fmul {{s[0-9]+}}, {{s[0-9]+}}, {{s[0-9]+}}<br>


> > > > ; CHECK: fadd {{s[0-9]+}}, {{s[0-9]+}}, {{s[0-9]+}}<br>


> > > > ; CHECK: fadd {{s[0-9]+}}, {{s[0-9]+}}, {{s[0-9]+}}<br>


> > > > ret float %res<br>


> > > > }<br>


> > > > <br>


> > > > Thanks very much for your help,<br>


> > > > Olivier<br>


> > > > <br>


> > > > (See attached file: fold-fma.diff)<br>


> > > > <br>


> > > > Inactive hide details for Hal Finkel ---09/03/2014 03:29:08<br>


> > > > PM---Hi<br>


> > > > Olivier, Thanks for working on this. A few comments:Hal Finkel<br>


> > > > ---09/03/2014 03:29:08 PM---Hi Olivier, Thanks for working on<br>


> > > > this.<br>


> > > > A few comments:<br>


> > > > <br>


> > > > From: Hal Finkel <hfinkel@anl.gov><br>


> > > > To: Olivier H Sallenave/Watson/IBM@IBMUS<br>


> > > > Cc: <llvm-commits@cs.uiuc.edu><br>


> > > > Date: 09/03/2014 03:29 PM<br>


> > > > Subject: Re: Improve FMA folding<br>


> > > > <br>


> > > > <br>


> > > > <br>


> > > > <br>


> > > > Hi Olivier,<br>


> > > > <br>


> > > > Thanks for working on this. A few comments:<br>


> > > > <br>


> > > > +// operation. This is used to ensure that combining into FMA<br>


> > > > will<br>


> > > > actually<br>


> > > > <br>


> > > > into FMA -> into an FMA<br>


> > > > <br>


> > > > + if (N->getOpcode() != ISD::FMUL &&<br>


> > > > + N->getOpcode() != ISD::FNEG)<br>


> > > > + return false;<br>


> > > > <br>


> > > > These conditions guard every call to the function, so this can<br>


> > > > be<br>


> > > > an<br>


> > > > assert; something like this:<br>


> > > > <br>


> > > > assert((N->getOpcode() == ISD::FMUL || N->getOpcode() ==<br>


> > > > ISD::FNEG)<br>


> > > > &&<br>


> > > > "Expecting fmul or fneg node");<br>


> > > > <br>


> > > > + else if (!isAlwaysFoldedIntoFMA(*UI))<br>


> > > > + return false;<br>


> > > > <br>


> > > > Recursion here seems overly-general. Is that the best way to do<br>


> > > > it?<br>


> > > > Should you have a depth limit?<br>


> > > > <br>


> > > > And finally, you'll need an LLVM IR-level CodeGen regression<br>


> > > > test.<br>


> > > > Look in test/CodeGen/PowerPC or test/CodeGen/X86 (it should be<br>


> > > > something like test/CodeGen/PowerPC/fma.ll).<br>


> > > > <br>


> > > > Thanks,<br>


> > > > Hal<br>


> > > > <br>


> > > > ----- Original Message -----<br>


> > > > > From: "Olivier H Sallenave" <ohsallen@us.ibm.com><br>


> > > > > To: llvm-commits@cs.uiuc.edu<br>


> > > > > Sent: Wednesday, September 3, 2014 11:21:53 AM<br>


> > > > > Subject: Improve FMA folding<br>


> > > > > <br>


> > > > > <br>


> > > > > <br>


> > > > > <br>


> > > > > Hi,<br>


> > > > > <br>


> > > > > Here is a patch to improve FMA folding in DAGCombiner.cpp.<br>


> > > > > <br>


> > > > > Previously, when looking at a FMUL and a FADD, it would<br>


> > > > > combine<br>


> > > > > them<br>


> > > > > into FMA if the FMUL is only used by the FADD. This was to<br>


> > > > > ensure<br>


> > > > > that the FMUL will be removed, meaning that the number of<br>


> > > > > instructions will actually be reduced. However, if the FMUL<br>


> > > > > has<br>


> > > > > many<br>


> > > > > uses but can be combined with each one of them (i.e. they are<br>


> > > > > either<br>


> > > > > FADD or FSUB), it should still be combined. Attached is a<br>


> > > > > patch<br>


> > > > > which checks all uses of the FMUL.<br>


> > > > > <br>


> > > > > Olivier<br>


> > > > > <br>


> > > > > <br>


> > > > > (See attached file: fma.diff) (See attached file: test-fma.c)<br>


> > > > > <br>


> > > > > ----- Forwarded by Olivier H Sallenave/Watson/IBM on<br>


> > > > > 09/03/2014<br>


> > > > > 12:08<br>


> > > > > PM -----<br>


> > > > > <br>


> > > > > From: Hal Finkel <hfinkel@anl.gov><br>


> > > > > To: Kevin K O'Brien/Watson/IBM@IBMUS<br>


> > > > > Cc: Samuel F Antao/Watson/IBM@IBMUS, <llvmdev@cs.uiuc.edu>,<br>


> > > > > Olivier<br>


> > > > > H<br>


> > > > > Sallenave/Watson/IBM@IBMUS<br>


> > > > > Date: 08/26/2014 11:33 AM<br>


> > > > > S ubject: Re: [LLVMdev] Multiply-add combining<br>


> > > > > <br>


> > > > > <br>


> > > > > <br>


> > > > > <br>


> > > > > Kevin, Olivier,<br>


> > > > > <br>


> > > > > The target independent heuristic could certainly check all<br>


> > > > > uses,<br>


> > > > > patches welcome. We could also consider each case separately,<br>


> > > > > as<br>


> > > > > Kevin suggests, but that might not be optimal on targets with<br>


> > > > > only<br>


> > > > > one floating-point pipeline, so we'd need to make it opt-in.<br>


> > > > > You<br>


> > > > > should also look at the MachineCombiner Pass (added in<br>


> > > > > r214832,<br>


> > > > > currently only used by the ARM backend I think) the tries to<br>


> > > > > solve<br>


> > > > > this problem in a more-sophisticated way.<br>


> > > > > <br>


> > > > > -Hal<br>


> > > > > <br>


> > > > > ----- Original Message -----<br>


> > > > > > From: "Kevin K O'Brien" <caomhin@us.ibm.com><br>


> > > > > > To: "Olivier H Sallenave" <ohsallen@us.ibm.com><br>


> > > > > > Cc: "Samuel F Antao" <sfantao@us.ibm.com>,<br>


> > > > > > llvmdev@cs.uiuc.edu<br>


> > > > > > Sent: Tuesday, August 26, 2014 10:16:23 AM<br>


> > > > > > Subject: Re: [LLVMdev] Multiply-add combining<br>


> > > > > > <br>


> > > > > > <br>


> > > > > > <br>


> > > > > > <br>


> > > > > > Hi Olivier,<br>


> > > > > > I think we discussed this last Thursday? My feeling is that<br>


> > > > > > each<br>


> > > > > > use<br>


> > > > > > of the multiply can be considered separately. If it can be<br>


> > > > > > combined,<br>


> > > > > > then we should do so. The multiply should be left in place<br>


> > > > > > and<br>


> > > > > > removed by a dead code elimination pass sometime later.<br>


> > > > > > This is<br>


> > > > > > what<br>


> > > > > > TOBEY does. If you want me to explain the XL method in more<br>


> > > > > > detail,<br>


> > > > > > come talk to me.<br>


> > > > > > <br>


> > > > > > Kevin<br>


> > > > > > ----------------------------------------------<br>


> > > > > > Kevin O'Brien<br>


> > > > > > Manager, Advanced Compiler Technology<br>


> > > > > > IBM T.J Watson Research Center, Yorktown Heights, NY<br>


> > > > > > <br>


> > > > > > Inactive hide details for Olivier H Sallenave---08/26/2014<br>


> > > > > > 11:12:04<br>


> > > > > > AM---Hi, I tried to compile the following using<br>


> > > > > > -ffp-contraOlivier<br>


> > > > > > H<br>


> > > > > > Sallenave---08/26/2014 11:12:04 AM---Hi, I tried to compile<br>


> > > > > > the<br>


> > > > > > following using -ffp-contract=fast:<br>


> > > > > > <br>


> > > > > > <br>


> > > > > > <br>


> > > > > > <br>


> > > > > > From:<br>


> > > > > > Olivier H Sallenave/Watson/IBM<br>


> > > > > > <br>


> > > > > > <br>


> > > > > > <br>


> > > > > > To:<br>


> > > > > > llvmdev@cs.uiuc.edu,<br>


> > > > > > <br>


> > > > > > <br>


> > > > > > <br>


> > > > > > Cc:<br>


> > > > > > Samuel F Antao/Watson/IBM@IBMUS, Kevin K<br>


> > > > > > O'Brien/Watson/IBM@IBMUS<br>


> > > > > > <br>


> > > > > > <br>


> > > > > > <br>


> > > > > > Date:<br>


> > > > > > 08/26/2014 11:12 AM<br>


> > > > > > <br>


> > > > > > <br>


> > > > > > <br>


> > > > > > Subject:<br>


> > > > > > Multiply-add combining<br>


> > > > > > <br>


> > > > > > <br>


> > > > > > Hi,<br>


> > > > > > <br>


> > > > > > I tried to compile the following using -ffp-contract=fast:<br>


> > > > > > <br>


> > > > > > %mul = fmul double %sub5, %x<br>


> > > > > > %add = fadd double %add6, %mul<br>


> > > > > > %sub = fsub double %sub5, %mul<br>


> > > > > > <br>


> > > > > > I expected fadd and fsub to be contracted with fmul, which<br>


> > > > > > didn't<br>


> > > > > > happen.<br>


> > > > > > <br>


> > > > > > When looking in DAGCombiner.cpp, it appears the result of<br>


> > > > > > the<br>


> > > > > > fmul<br>


> > > > > > needs to be used only once, which isn't the case here as it<br>


> > > > > > is<br>


> > > > > > used<br>


> > > > > > by both the fadd and the fsub:<br>


> > > > > > <br>


> > > > > > // fold (fadd (fmul x, y), z) -> (fma x, y, z)<br>


> > > > > > if (N0.getOpcode() == ISD::FMUL && N0.hasOneUse())<br>


> > > > > > return DAG.getNode(ISD::FMA, SDLoc(N), VT,<br>


> > > > > > N0.getOperand(0),<br>


> > > > > > N0.getOperand(1), N1);<br>


> > > > > > <br>


> > > > > > This heuristic looks a little conservative, could we<br>


> > > > > > instead<br>


> > > > > > check<br>


> > > > > > that every instruction using the result of the fmul are<br>


> > > > > > combinable<br>


> > > > > > (i.e., they are either fadd or fsub)?<br>


> > > > > > <br>


> > > > > > <br>


> > > > > > Thanks in advance,<br>


> > > > > > Olivier<br>


> > > > > > <br>


> > > > > > <br>


> > > > > > _______________________________________________<br>


> > > > > > LLVM Developers mailing list<br>


> > > > > > LLVMdev@cs.uiuc.edu </font></tt><tt><font size="2"><a href="http://llvm.cs.uiuc.edu">http://llvm.cs.uiuc.edu</a></font></tt><tt><font size="2"><br>


> > > > > > </font></tt><tt><font size="2"><a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a></font></tt><tt><font size="2"><br>


> > > > > > <br>


> > > > > <br>


> > > > > --<br>


> > > > > Hal Finkel<br>


> > > > > Assistant Computational Scientist<br>


> > > > > Leadership Computing Facility<br>


> > > > > Argonne National Laboratory<br>


> > > > > <br>


> > > > > <br>


> > > > > _______________________________________________<br>


> > > > > llvm-commits mailing list<br>


> > > > > llvm-commits@cs.uiuc.edu<br>


> > > > > </font></tt><tt><font size="2"><a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits</a></font></tt><tt><font size="2"><br>


> > > > > <br>


> > > > <br>


> > > > --<br>


> > > > Hal Finkel<br>


> > > > Assistant Computational Scientist<br>


> > > > Leadership Computing Facility<br>


> > > > Argonne National Laboratory<br>


> > > > <br>


> > > > <br>


> > > > <br>


> > > <br>


> > > --<br>


> > > Hal Finkel<br>


> > > Assistant Computational Scientist<br>


> > > Leadership Computing Facility<br>


> > > Argonne National Laboratory<br>


> > > <br>


> > > <br>


> > > <br>


> > <br>


> > --<br>


> > Hal Finkel<br>


> > Assistant Computational Scientist<br>


> > Leadership Computing Facility<br>


> > Argonne National Laboratory<br>


> > <br>


> <br>


<br>


-- <br>


Hal Finkel<br>


Assistant Computational Scientist<br>


Leadership Computing Facility<br>


Argonne National Laboratory<br>


<br>


</font></tt><br>


</body></html>