[cfe-dev] question about fused multiply add and Clang GNU modes

Mon Sep 21 09:30:30 PDT 2015

On Mon, Sep 21, 2015 at 10:39:28AM -0400, Stephen Canon wrote:
> 
> > On Sep 21, 2015, at 9:06 AM, Joerg Sonnenberger via cfe-dev <cfe-dev at lists.llvm.org> wrote:
> > 
> > On Mon, Sep 21, 2015 at 07:37:50AM -0500, Hal Finkel via cfe-dev wrote:
> >> We don't track the C-level expressions in the IR, but Clang will
> >> directly form @llvm.fmuladd intrinsics where allowed. CodeGen then
> >> converts these into FMA nodes, or expands them into ADD + MUL depending
> >> on target hooks.
> > 
> > Wouldn't that be suboptimal from a CSE PoV? Consider something like:
> > 
> > r = a + b * c + b * c * d;
> > 
> > If we are greedy, the b * c would end up as an (a + b * c) FMA intrinsic
> > and the multiplication would be computed twice?
> 
> For what I would call “modern” hardware FMA implementations (where fma
> is no more costly than multiply, and often as cheap as an add), this
> can never be too bad, because adding the different addends to the
> common products isn’t actually significantly cheaper than doing
> partially-redundant FMAs, and if the product is re-used in a non-FMA
> expression, computing FMA and product is no more expensive than product and sum.

As long as FMA and plain multiply are more expensive than add, the above
can be trivially extended by another term or two to still highlight the
problem. But this goes back to the core of the issue: which form is
better is target-specific, and C -> IR is too early for that decision
to be made.

Joerg