Improve FMA folding

Hal Finkel hfinkel at anl.gov
Tue Sep 16 15:16:03 PDT 2014


----- Original Message -----
> From: "Olivier H Sallenave" <ohsallen at us.ibm.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: llvm-commits at cs.uiuc.edu, "Samuel F Antao" <sfantao at us.ibm.com>
> Sent: Tuesday, September 16, 2014 5:08:37 PM
> Subject: Re: Improve FMA folding
> 
> 
> 
> Hal,
> 
> Thanks for the quick review. I don't have commit access.

Okay, please send an updated patch and I'll commit for you.

 -Hal

> 
> Olivier
> 
> Sent from my iPhone
> 
> > On Sep 16, 2014, at 5:41 PM, "Hal Finkel" <hfinkel at anl.gov> wrote:
> > 
> > Hi,
> > 
> > +bool PPCTargetLowering::enableAggressiveFMAFusion(EVT VT) const {
> > + return VT.isFloatingPoint();
> > +}
> > 
> > This should never be called on anything but floating-point types
> > (we don't have integer FMA nodes). I'd write this as:
> > 
> > bool PPCTargetLowering::enableAggressiveFMAFusion(EVT VT) const {
> > assert(VT.isFloatingPoint() && "Non-floating-point FMA?");
> > return true;
> > }
> > 
> > otherwise, LGTM. Do you have commit access?
> > 
> > -Hal
> > 
> > ----- Original Message -----
> > > From: "Olivier H Sallenave" <ohsallen at us.ibm.com>
> > > To: "Hal Finkel" <hfinkel at anl.gov>
> > > Cc: llvm-commits at cs.uiuc.edu, "Samuel F Antao"
> > > <sfantao at us.ibm.com>
> > > Sent: Tuesday, September 16, 2014 4:28:27 PM
> > > Subject: Re: Improve FMA folding
> > > 
> > > 
> > > 
> > > Hi Hal,
> > > 
> > > Thanks for the advice. Attached is the suggested change.
> > > 
> > > Cheers,
> > > Olivier
> > > 
> > > (See attached file: fma.diff)
> > > 
> > > Inactive hide details for Hal Finkel ---09/16/2014 02:02:07
> > > AM---Hi
> > > Olivier, I don't want to duplicate the DAGCombine logic in Hal
> > > Finkel ---09/16/2014 02:02:07 AM---Hi Olivier, I don't want to
> > > duplicate the DAGCombine logic in the PPC backend. Instead,
> > > please
> > > add a
> > > 
> > > From: Hal Finkel <hfinkel at anl.gov>
> > > To: Olivier H Sallenave/Watson/IBM at IBMUS
> > > Cc: Samuel F Antao/Watson/IBM at IBMUS, <llvm-commits at cs.uiuc.edu>
> > > Date: 09/16/2014 02:02 AM
> > > Subject: Re: Improve FMA folding
> > > 
> > > 
> > > 
> > > 
> > > Hi Olivier,
> > > 
> > > I don't want to duplicate the DAGCombine logic in the PPC
> > > backend.
> > > Instead, please add a TLI callback to optionally enable the
> > > transformation in DAGCombine. Make this callback take the value
> > > type
> > > as a parameter, and return false for all types by default. Then
> > > override this callback in PPCISelLowering to return true (perhaps
> > > for all types). You can call it something like
> > > enableAggressiveFMAFormation, or
> > > enablePartialReplacementWithFMAs,
> > > etc.
> > > 
> > > Thanks again,
> > > Hal
> > > 
> > > ----- Original Message -----
> > > > From: "Olivier H Sallenave" <ohsallen at us.ibm.com>
> > > > To: llvm-commits at cs.uiuc.edu
> > > > Cc: "Hal Finkel" <hfinkel at anl.gov>, "Samuel F Antao"
> > > > <sfantao at us.ibm.com>
> > > > Sent: Wednesday, September 10, 2014 10:41:01 AM
> > > > Subject: Fw: Improve FMA folding
> > > > 
> > > > 
> > > > 
> > > > Hi,
> > > > 
> > > > We should indeed check that the number of cycles has been
> > > > reduced
> > > > (through TTI::getArithmeticInstrCost). However, AArch64 doesn't
> > > > override the cost of FMA, FADD and FMUL, therefore they all
> > > > cost 2
> > > > by default.
> > > > 
> > > > Instead, here is a patch specific to the PPC backend. I
> > > > followed
> > > > Kevin's proposal to always combine, which should be fine for
> > > > this
> > > > target.
> > > > 
> > > > Olivier
> > > > 
> > > > (See attached file: fma-ppc.diff)
> > > > 
> > > > ----- Forwarded by Olivier H Sallenave/Watson/IBM on 09/10/2014
> > > > 11:26
> > > > AM -----
> > > > 
> > > > From: Olivier H Sallenave/Watson/IBM
> > > > To: Hal Finkel <hfinkel at anl.gov>
> > > > Cc: llvm-commits at cs.uiuc.edu
> > > > Date: 09/04/2014 05:26 PM
> > > > Subject: Re: Improve FMA folding
> > > > 
> > > > 
> > > > 
> > > > Hi Hal,
> > > > 
> > > > Thanks for your feedback. Attached is the patch with the
> > > > corrections
> > > > and the regression test.
> > > > 
> > > > Regarding the recursive call, I believe it is fine since it's
> > > > guarded
> > > > by some conditions which ensure it can only take an FNEG node
> > > > as
> > > > parameter — therefore the depth of the recursion cannot be
> > > > greater
> > > > than 1. If you still don't like it, we can replace the
> > > > recursive
> > > > call with an iteration over the uses of the FNEG node.
> > > > 
> > > > More importantly, I found out that the following tests now
> > > > fail:
> > > > CodeGen/AArch64/fp-dp3.ll
> > > > CodeGen/AArch64/neon-fma.ll
> > > > 
> > > > The failing tests are basically the same, and check explicitly
> > > > that
> > > > FMUL isn't combined with two distinct FADD (see below). I am
> > > > wondering why ARM doesn't want to combine here. It seems that
> > > > FMUL
> > > > and FMADD take 2 cycles, whereas FADD takes 1 cycle. Could that
> > > > be
> > > > the reason? If so, maybe we could check the latency of the
> > > > instructions we combine?
> > > > 
> > > > ; Another set of tests that check for multiply single use
> > > > 
> > > > define float @test_fmadd_unfused_su(float %a, float %b, float
> > > > %c) {
> > > > ; CHECK-LABEL: test_fmadd_unfused_su:
> > > > %prod = fmul float %b, %c
> > > > %sum = fadd float %a, %prod
> > > > %res = fadd float %sum, %prod
> > > > ; CHECK-NOT: fmadd {{s[0-9]+}}, {{s[0-9]+}}, {{s[0-9]+}},
> > > > {{s[0-9]+}}
> > > > ; CHECK: fmul {{s[0-9]+}}, {{s[0-9]+}}, {{s[0-9]+}}
> > > > ; CHECK: fadd {{s[0-9]+}}, {{s[0-9]+}}, {{s[0-9]+}}
> > > > ; CHECK: fadd {{s[0-9]+}}, {{s[0-9]+}}, {{s[0-9]+}}
> > > > ret float %res
> > > > }
> > > > 
> > > > Thanks very much for your help,
> > > > Olivier
> > > > 
> > > > (See attached file: fold-fma.diff)
> > > > 
> > > > Inactive hide details for Hal Finkel ---09/03/2014 03:29:08
> > > > PM---Hi
> > > > Olivier, Thanks for working on this. A few comments:Hal Finkel
> > > > ---09/03/2014 03:29:08 PM---Hi Olivier, Thanks for working on
> > > > this.
> > > > A few comments:
> > > > 
> > > > From: Hal Finkel <hfinkel at anl.gov>
> > > > To: Olivier H Sallenave/Watson/IBM at IBMUS
> > > > Cc: <llvm-commits at cs.uiuc.edu>
> > > > Date: 09/03/2014 03:29 PM
> > > > Subject: Re: Improve FMA folding
> > > > 
> > > > 
> > > > 
> > > > 
> > > > Hi Olivier,
> > > > 
> > > > Thanks for working on this. A few comments:
> > > > 
> > > > +// operation. This is used to ensure that combining into FMA
> > > > will
> > > > actually
> > > > 
> > > > into FMA -> into an FMA
> > > > 
> > > > + if (N->getOpcode() != ISD::FMUL &&
> > > > + N->getOpcode() != ISD::FNEG)
> > > > + return false;
> > > > 
> > > > These conditions guard every call to the function, so this can
> > > > be
> > > > an
> > > > assert; something like this:
> > > > 
> > > > assert((N->getOpcode() == ISD::FMUL || N->getOpcode() ==
> > > > ISD::FNEG)
> > > > &&
> > > > "Expecting fmul or fneg node");
> > > > 
> > > > + else if (!isAlwaysFoldedIntoFMA(*UI))
> > > > + return false;
> > > > 
> > > > Recursion here seems overly-general. Is that the best way to do
> > > > it?
> > > > Should you have a depth limit?
> > > > 
> > > > And finally, you'll need an LLVM IR-level CodeGen regression
> > > > test.
> > > > Look in test/CodeGen/PowerPC or test/CodeGen/X86 (it should be
> > > > something like test/CodeGen/PowerPC/fma.ll).
> > > > 
> > > > Thanks,
> > > > Hal
> > > > 
> > > > ----- Original Message -----
> > > > > From: "Olivier H Sallenave" <ohsallen at us.ibm.com>
> > > > > To: llvm-commits at cs.uiuc.edu
> > > > > Sent: Wednesday, September 3, 2014 11:21:53 AM
> > > > > Subject: Improve FMA folding
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > Hi,
> > > > > 
> > > > > Here is a patch to improve FMA folding in DAGCombiner.cpp.
> > > > > 
> > > > > Previously, when looking at a FMUL and a FADD, it would
> > > > > combine
> > > > > them
> > > > > into FMA if the FMUL is only used by the FADD. This was to
> > > > > ensure
> > > > > that the FMUL will be removed, meaning that the number of
> > > > > instructions will actually be reduced. However, if the FMUL
> > > > > has
> > > > > many
> > > > > uses but can be combined with each one of them (i.e. they are
> > > > > either
> > > > > FADD or FSUB), it should still be combined. Attached is a
> > > > > patch
> > > > > which checks all uses of the FMUL.
> > > > > 
> > > > > Olivier
> > > > > 
> > > > > 
> > > > > (See attached file: fma.diff) (See attached file: test-fma.c)
> > > > > 
> > > > > ----- Forwarded by Olivier H Sallenave/Watson/IBM on
> > > > > 09/03/2014
> > > > > 12:08
> > > > > PM -----
> > > > > 
> > > > > From: Hal Finkel <hfinkel at anl.gov>
> > > > > To: Kevin K O'Brien/Watson/IBM at IBMUS
> > > > > Cc: Samuel F Antao/Watson/IBM at IBMUS, <llvmdev at cs.uiuc.edu>,
> > > > > Olivier
> > > > > H
> > > > > Sallenave/Watson/IBM at IBMUS
> > > > > Date: 08/26/2014 11:33 AM
> > > > > S ubject: Re: [LLVMdev] Multiply-add combining
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > Kevin, Olivier,
> > > > > 
> > > > > The target independent heuristic could certainly check all
> > > > > uses,
> > > > > patches welcome. We could also consider each case separately,
> > > > > as
> > > > > Kevin suggests, but that might not be optimal on targets with
> > > > > only
> > > > > one floating-point pipeline, so we'd need to make it opt-in.
> > > > > You
> > > > > should also look at the MachineCombiner Pass (added in
> > > > > r214832,
> > > > > currently only used by the ARM backend I think) the tries to
> > > > > solve
> > > > > this problem in a more-sophisticated way.
> > > > > 
> > > > > -Hal
> > > > > 
> > > > > ----- Original Message -----
> > > > > > From: "Kevin K O'Brien" <caomhin at us.ibm.com>
> > > > > > To: "Olivier H Sallenave" <ohsallen at us.ibm.com>
> > > > > > Cc: "Samuel F Antao" <sfantao at us.ibm.com>,
> > > > > > llvmdev at cs.uiuc.edu
> > > > > > Sent: Tuesday, August 26, 2014 10:16:23 AM
> > > > > > Subject: Re: [LLVMdev] Multiply-add combining
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > Hi Olivier,
> > > > > > I think we discussed this last Thursday? My feeling is that
> > > > > > each
> > > > > > use
> > > > > > of the multiply can be considered separately. If it can be
> > > > > > combined,
> > > > > > then we should do so. The multiply should be left in place
> > > > > > and
> > > > > > removed by a dead code elimination pass sometime later.
> > > > > > This is
> > > > > > what
> > > > > > TOBEY does. If you want me to explain the XL method in more
> > > > > > detail,
> > > > > > come talk to me.
> > > > > > 
> > > > > > Kevin
> > > > > > ----------------------------------------------
> > > > > > Kevin O'Brien
> > > > > > Manager, Advanced Compiler Technology
> > > > > > IBM T.J Watson Research Center, Yorktown Heights, NY
> > > > > > 
> > > > > > Inactive hide details for Olivier H Sallenave---08/26/2014
> > > > > > 11:12:04
> > > > > > AM---Hi, I tried to compile the following using
> > > > > > -ffp-contraOlivier
> > > > > > H
> > > > > > Sallenave---08/26/2014 11:12:04 AM---Hi, I tried to compile
> > > > > > the
> > > > > > following using -ffp-contract=fast:
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > From:
> > > > > > Olivier H Sallenave/Watson/IBM
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > To:
> > > > > > llvmdev at cs.uiuc.edu,
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > Cc:
> > > > > > Samuel F Antao/Watson/IBM at IBMUS, Kevin K
> > > > > > O'Brien/Watson/IBM at IBMUS
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > Date:
> > > > > > 08/26/2014 11:12 AM
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > Subject:
> > > > > > Multiply-add combining
> > > > > > 
> > > > > > 
> > > > > > Hi,
> > > > > > 
> > > > > > I tried to compile the following using -ffp-contract=fast:
> > > > > > 
> > > > > > %mul = fmul double %sub5, %x
> > > > > > %add = fadd double %add6, %mul
> > > > > > %sub = fsub double %sub5, %mul
> > > > > > 
> > > > > > I expected fadd and fsub to be contracted with fmul, which
> > > > > > didn't
> > > > > > happen.
> > > > > > 
> > > > > > When looking in DAGCombiner.cpp, it appears the result of
> > > > > > the
> > > > > > fmul
> > > > > > needs to be used only once, which isn't the case here as it
> > > > > > is
> > > > > > used
> > > > > > by both the fadd and the fsub:
> > > > > > 
> > > > > > // fold (fadd (fmul x, y), z) -> (fma x, y, z)
> > > > > > if (N0.getOpcode() == ISD::FMUL && N0.hasOneUse())
> > > > > > return DAG.getNode(ISD::FMA, SDLoc(N), VT,
> > > > > > N0.getOperand(0),
> > > > > > N0.getOperand(1), N1);
> > > > > > 
> > > > > > This heuristic looks a little conservative, could we
> > > > > > instead
> > > > > > check
> > > > > > that every instruction using the result of the fmul are
> > > > > > combinable
> > > > > > (i.e., they are either fadd or fsub)?
> > > > > > 
> > > > > > 
> > > > > > Thanks in advance,
> > > > > > Olivier
> > > > > > 
> > > > > > 
> > > > > > _______________________________________________
> > > > > > LLVM Developers mailing list
> > > > > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> > > > > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> > > > > > 
> > > > > 
> > > > > --
> > > > > Hal Finkel
> > > > > Assistant Computational Scientist
> > > > > Leadership Computing Facility
> > > > > Argonne National Laboratory
> > > > > 
> > > > > 
> > > > > _______________________________________________
> > > > > llvm-commits mailing list
> > > > > llvm-commits at cs.uiuc.edu
> > > > > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> > > > > 
> > > > 
> > > > --
> > > > Hal Finkel
> > > > Assistant Computational Scientist
> > > > Leadership Computing Facility
> > > > Argonne National Laboratory
> > > > 
> > > > 
> > > > 
> > > 
> > > --
> > > Hal Finkel
> > > Assistant Computational Scientist
> > > Leadership Computing Facility
> > > Argonne National Laboratory
> > > 
> > > 
> > > 
> > 
> > --
> > Hal Finkel
> > Assistant Computational Scientist
> > Leadership Computing Facility
> > Argonne National Laboratory
> > 
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory




More information about the llvm-commits mailing list