[cfe-dev] question about fused multiply add and Clang GNU modes

Mon Sep 21 09:37:18 PDT 2015

> On Sep 21, 2015, at 12:30 PM, Joerg Sonnenberger via cfe-dev <cfe-dev at lists.llvm.org> wrote:
> 
> On Mon, Sep 21, 2015 at 10:39:28AM -0400, Stephen Canon wrote:
>> 
>>> On Sep 21, 2015, at 9:06 AM, Joerg Sonnenberger via cfe-dev <cfe-dev at lists.llvm.org> wrote:
>>> 
>>> On Mon, Sep 21, 2015 at 07:37:50AM -0500, Hal Finkel via cfe-dev wrote:
>>>> We don't track the C-level expressions in the IR, but Clang will
>>>> directly form @llvm.fmuladd intrinsics where allowed. CodeGen then
>>>> converts these into FMA nodes, or expands them into ADD + MUL depending
>>>> on target hooks.
>>> 
>>> Wouldn't that be suboptimal from a CSE PoV? Consider something like:
>>> 
>>> r = a + b * c + b * c * d;
>>> 
>>> If we are greedy, b * c would end up inside an FMA intrinsic for
>>> (a + b * c), and the multiplication would be computed twice?
>> 
>> For what I would call “modern” hardware FMA implementations (where an FMA
>> is no more costly than a multiply, and often as cheap as an add), this
>> can never be too bad: adding the different addends to the common
>> products isn’t significantly cheaper than doing partially redundant
>> FMAs, and if the product is reused in a non-FMA expression, computing
>> the FMA and the product is no more expensive than computing the product and the sum.
> 
> As long as FMA and plain multiply are more expensive than add, the above
> can be trivially extended by another term or two to still highlight the
> problem. But this goes back to the core of the issue: which choice is
> better is target-specific, and C -> IR is too early to make that
> decision.

Semantically, llvm.fmuladd is just a “fusable” multiply-add pair. The decision to fuse it hasn’t been made yet.

– Steve