[LLVMdev] FPOpFusion = Fast and Multiply-and-add combines

Wed Aug 6 11:30:17 PDT 2014

Hi Samuel,

I don't think clang follows what gcc does regarding FMA - at least by
default. I don't have a PPC compiler to test with, but for x86-64 using
clang trunk and gcc 4.9:

$ cat fma.c
float foo(float x, float y, float z) { return x * y + z; }

$ ./clang -march=core-avx2 -O2 -S fma.c -o - | grep ss
    vmulss    %xmm1, %xmm0, %xmm0
    vaddss    %xmm2, %xmm0, %xmm0

$ ./gcc -march=core-avx2 -O2 -S fma.c -o - | grep ss
    vfmadd132ss    %xmm1, %xmm2, %xmm0
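
The two sequences above can return different values for the same inputs. clang's vmulss + vaddss rounds the product to float before the add, while gcc's vfmadd132ss rounds only once. Below is a minimal C sketch with hypothetical function names that exhibits such inputs without needing FMA hardware or libm. It assumes the exact product of two floats fits in a double, which holds because it has at most 48 significant bits. It also uses a volatile temporary to keep the compiler from contracting the separate multiply and add. Overflow and underflow are ignored.

```c
/* Separate multiply and add, like clang's vmulss + vaddss output:
   the product is rounded to float, then the sum is rounded again.
   Storing the product in a volatile variable keeps the compiler
   from contracting the two operations into a single FMA. */
float mul_then_add(float x, float y, float z) {
    volatile float prod = x * y;  /* rounding #1 */
    return prod + z;              /* rounding #2 */
}

/* Returns x*y minus its float-rounded value p. This equals what a
   fused fmaf(x, y, -p) returns, because the rounding error of a float
   product is itself representable as a float.
   Assumption: the exact product of two floats (at most 48 significant
   bits) fits in a double, so 'exact' and the subtraction are exact.
   Overflow and underflow are not handled. */
float product_rounding_error(float x, float y) {
    volatile float prod = x * y;
    double exact = (double)x * (double)y;
    return (float)(exact - (double)prod);
}
```

With x = 1 + 2^-12 (0x1.001p+0f), the exact square 1 + 2^-11 + 2^-24 lies exactly halfway between two floats. Round-to-nearest-even therefore gives p = 1 + 2^-11 (0x1.002p+0f). So `mul_then_add(x, x, -p)` returns 0, while a fused multiply-add of the same operands returns 2^-24. Code that expects the separately rounded value, as carefully written floating-point code often does, sees a different result when the compiler fuses the operations.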

----------------------------------------------------------------------

This was brought up in Dec 2013 on this list:
http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-December/068868.html

I don't see an answer as to whether this is a bug for all the other
compilers, a deficiency in clang's default settings, or just an
implementation choice.

Sanjay

On Thu, Jul 31, 2014 at 9:50 AM, Samuel F Antao <sfantao at us.ibm.com> wrote:

> Hi Tim,
>
> Thanks for the thorough explanation. It makes perfect sense.
>
> I was not aware that, outside fast-math mode, the compiler is supposed to
> avoid using more precision than what the standard specifies.
>
> I came across this issue while looking into the output of different
> compilers. The XL and Microsoft compilers seem to have that turned on by
> default, but I assume that clang follows what gcc does and has it turned
> off.
>
> Thanks again,
> Samuel
>
> Tim Northover <t.p.northover at gmail.com> wrote on 07/31/2014 09:54:55 AM:
>
> > From: Tim Northover <t.p.northover at gmail.com>
> > To: Samuel F Antao/Watson/IBM at IBMUS
> > Cc: "llvmdev at cs.uiuc.edu" <llvmdev at cs.uiuc.edu>, Olivier H
> > Sallenave/Watson/IBM at IBMUS
> > Date: 07/31/2014 09:55 AM
> > Subject: Re: [LLVMdev] FPOpFusion = Fast and Multiply-and-add combines
>
> >
> > Hi Samuel,
> >
> > On 30 July 2014 22:37, Samuel F Antao <sfantao at us.ibm.com> wrote:
> > > In the DAGCombiner, during the combination of mul and add/subtract into
> > > multiply-and-add/subtract, this option (FPOpFusion) is expected to be
> > > Fast in order to enable the combine. This means that by default no
> > > multiply-and-add opcodes are going to be generated. If I understand it
> > > correctly, this is undesirable given that multiply-and-add for targets
> > > like PPC (I am not sure about all the other targets) does not pose any
> > > rounding problem and can even be more accurate than performing the two
> > > operations separately.
> >
> > That extra precision is actually what we're being very careful to
> > avoid unless specifically told we're allowed. It can be just as
> > harmful to carefully written floating-point code as dropping precision
> > would be.
> >
> > > Also, in TargetOptions.h I read:
> > >
> > > Standard, // Only allow fusion of 'blessed' ops (currently just fmuladd)
> > >
> > > which made me suspect that the check against Fast in the DAGCombiner is
> > > not correct.
> >
> > I think it's OK. In the IR there are 3 different ways to express mul + add:
> >
> > 1. fmul + fadd. This must not be fused into a single step without
> > intermediate rounding (unless we're in Fast mode).
> > 2. call @llvm.fmuladd. This *may* be fused or not, depending on
> > profitability (unless we're in Strict mode, in which case it's
> > separate).
> > 3. call @llvm.fma. This must not be split into two operations (unless
> > we're in Fast mode).
> >
> > That middle one is there because C actually lets you enable and
> > disable contraction within a limited region with "#pragma STDC
> > FP_CONTRACT ON" (or OFF). So we need a way to represent the idea that
> > it's not usually OK to fuse them (i.e. not Fast mode), but this
> > particular one actually is OK.
> >
> > Cheers.
> >
> > Tim.
> >
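
Tim's three cases can be summarized as a small decision table. The sketch below is a hypothetical C model of that table, not LLVM's actual DAGCombiner code. The enum values only mirror the names Fast, Standard and Strict from TargetOptions.h. `fusion_profitable` is an assumed stand-in for whatever cost check a target performs. "Fused" here means evaluated with a single rounding.

```c
#include <stdbool.h>

/* Hypothetical mirror of the Fast / Standard / Strict fusion modes. */
typedef enum { FUSION_FAST, FUSION_STANDARD, FUSION_STRICT } fusion_mode;

/* The three IR spellings of "multiply then add". */
typedef enum {
    IR_FMUL_FADD,  /* separate fmul and fadd instructions */
    IR_FMULADD,    /* call @llvm.fmuladd */
    IR_FMA         /* call @llvm.fma */
} muladd_form;

/* Returns true if the operation is evaluated with a single rounding,
   false if it is emitted as a separate multiply and add.
   'fusion_profitable' is a hypothetical stand-in for the target's
   cost decision. */
bool emit_fused(muladd_form form, fusion_mode mode, bool fusion_profitable) {
    switch (form) {
    case IR_FMUL_FADD:
        /* Fusing changes the result, so only Fast mode permits it. */
        return mode == FUSION_FAST && fusion_profitable;
    case IR_FMULADD:
        /* Fusion is optional (a profitability call), except that
           Strict mode always keeps the operations separate. */
        return mode != FUSION_STRICT && fusion_profitable;
    case IR_FMA:
        /* Must stay a single-rounding operation; only Fast mode
           may split it into a multiply and an add. */
        return mode != FUSION_FAST || fusion_profitable;
    }
    return false;
}
```

For example, `emit_fused(IR_FMULADD, FUSION_STANDARD, true)` is true while `emit_fused(IR_FMUL_FADD, FUSION_STANDARD, true)` is false. This is the distinction behind the Fast check Samuel asked about. In Standard mode only the "blessed" fmuladd form may be fused.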
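
Tim's point about "#pragma STDC FP_CONTRACT" can be illustrated in C. The following is a minimal sketch with hypothetical function names. When the pragma is placed at the start of a compound statement, it applies until the end of that statement. The two functions below therefore differ only in whether the compiler may contract `a * b + c`. Following Tim's description, clang can represent the contractable one with @llvm.fmuladd and the other with separate fmul and fadd. GCC has historically not implemented this pragma and controls contraction with -ffp-contract instead, so there the behavior depends on that flag and on the target.

```c
/* Contraction allowed: the compiler may evaluate a * b + c with a
   single rounding (for example, one FMA instruction). */
float mul_add_contractable(float a, float b, float c) {
#pragma STDC FP_CONTRACT ON
    return a * b + c;
}

/* Contraction forbidden: on compilers that honor the pragma, a * b
   is rounded to float before the add. */
float mul_add_separate(float a, float b, float c) {
#pragma STDC FP_CONTRACT OFF
    return a * b + c;
}
```

With operands whose product is exact, such as 2, 3 and 1, both functions return 7. They can differ only when a * b is not exactly representable as a float.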