[cfe-dev] question about fused multiply add and Clang GNU modes

Mon Sep 21 07:39:28 PDT 2015

> On Sep 21, 2015, at 9:06 AM, Joerg Sonnenberger via cfe-dev <cfe-dev at lists.llvm.org> wrote:
> 
> On Mon, Sep 21, 2015 at 07:37:50AM -0500, Hal Finkel via cfe-dev wrote:
>> We don't track the C-level expressions in the IR, but Clang will
>> directly form @llvm.fmuladd intrinsics where allowed. CodeGen then
>> converts these into FMA nodes, or expands them into ADD + MUL depending
>> on target hooks.
> 
> Wouldn't that be suboptimal from a CSE PoV? Consider something like:
> 
> r = a + b * c + b * c * d;
> 
> If we are greedy, the b * c would end up as (a + b * c) FMA intrinsic
> and the multiplication computed twice?

For what I would call “modern” hardware FMA implementations (where an FMA costs no more than a multiply, and is often as cheap as an add), this can never be too bad. Adding the different addends to a common product isn’t actually significantly cheaper than doing partially redundant FMAs, and if the product is reused in a non-FMA expression, computing an FMA plus the product is no more expensive than computing the product plus a sum.

However, there definitely exist some FMA implementations where an FMA is as expensive as a separate multiply and add, and on those machines this *can* indeed be a hazard.

– Steve