[LLVMdev] FPOpFusion = Fast and Multiply-and-add combines

Thu Jul 31 06:54:55 PDT 2014

Hi Samuel,

On 30 July 2014 22:37, Samuel F Antao <sfantao at us.ibm.com> wrote:
> In the DAGCombiner, during the combination of mul and add/subtract into
> multiply-and-add/subtract, this option is expected to be Fast in order to
> enable the combine. This means, that by default no multiply-and-add opcodes
> are going to be generated. If I understand it correctly, this is undesirable
> given that multiply-and-add for targets like PPC (I am not sure about all
> the other targets) does not pose any rounding problem and it can even be
> more accurate than performing the two operations separately.

That extra precision is actually what we're being very careful to
avoid unless specifically told we're allowed. It can be just as
harmful to carefully written floating-point code as dropping precision
would be.

> Also, in TargetOptions.h I read:
>
> Standard, // Only allow fusion of 'blessed' ops (currently just fmuladd)
>
> which made me suspect that the check against Fast in the DAGCombiner is not
> correct.

I think it's OK. In the IR there are 3 different ways to express mul + add:

1. fmul + fadd. This must not be fused into a single step without
intermediate rounding (unless we're in Fast mode).
2. call @llvm.fmuladd. This *may* be fused or not, depending on
profitability (unless we're in Strict mode, in which case it's kept as
a separate multiply and add).
3. call @llvm.fma. This must not be split into two operations (unless
we're in Fast mode).

That middle one is there because C actually lets you enable and
disable contraction within a limited region with "#pragma STDC
FP_CONTRACT ON" (or OFF). So we need a way to represent the idea that
it's not usually OK to fuse them (i.e. not Fast mode), but this
particular one actually is OK.

Cheers.

Tim.