[cfe-dev] question about fused multiply add and Clang GNU modes
Stephen Canon via cfe-dev
cfe-dev at lists.llvm.org
Mon Sep 21 09:37:18 PDT 2015
> On Sep 21, 2015, at 12:30 PM, Joerg Sonnenberger via cfe-dev <cfe-dev at lists.llvm.org> wrote:
>
> On Mon, Sep 21, 2015 at 10:39:28AM -0400, Stephen Canon wrote:
>>
>>> On Sep 21, 2015, at 9:06 AM, Joerg Sonnenberger via cfe-dev <cfe-dev at lists.llvm.org> wrote:
>>>
>>> On Mon, Sep 21, 2015 at 07:37:50AM -0500, Hal Finkel via cfe-dev wrote:
>>>> We don't track the C-level expressions in the IR, but Clang will
>>>> directly form @llvm.fmuladd intrinsics where allowed. CodeGen then
>>>> converts these into FMA nodes, or expands them into ADD + MUL depending
>>>> on target hooks.
>>>
>>> Wouldn't that be suboptimal from a CSE PoV? Consider something like:
>>>
>>> r = a + b * c + b * c * d;
>>>
>>> If we are greedy, the b * c would end up in an (a + b * c) FMA intrinsic
>>> and the multiplication would be computed twice?
>>
>> For what I would call “modern” hardware FMA implementations (where fma
>> is no more costly than multiply, and often as cheap as an add), this
>> can never be too bad, because adding the different addends to the
>> common products isn’t actually significantly cheaper than doing
>> partially-redundant FMAs, and if the product is re-used in a non-FMA
>> expression, computing FMA and product is no more expensive than product and sum.
>
> As long as FMA and plain multiply are more expensive than add, the above
> can be trivially extended by another term or two to still highlight the
> problem. But this goes back to the core of the issue: It is a target
> specific issue what is better and C -> IR is too early for that decision
> to be made.
Semantically llvm.fmuladd is just a “fusable” multiply-add pair. The decision hasn’t been made yet.
– Steve
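
For readers following the thread, here is a minimal C sketch of the expression Joerg brought up, written two ways. The function names are hypothetical and only for illustration. It assumes contraction is enabled within expressions (for example with Clang's -ffp-contract=on). In that mode the first form gives the front end the chance to emit llvm.fmuladd for the multiply-adds inside the single expression. The second form splits the work across statements, so the shared product is computed once and there is nothing left to fuse within any one expression. Whether an fmuladd ends up as a hardware FMA or as a separate multiply and add is still decided later, per target.

```c
/* Form 1: one C expression, parsed as (a + b*c) + (b*c)*d.
   With contraction enabled, each multiply that feeds an add inside
   this expression is a candidate for llvm.fmuladd. */
double eval_single_expression(double a, double b, double c, double d) {
    return a + b * c + b * c * d;
}

/* Form 2: the common product is named and reused.
   Every multiply and add sits in its own statement, so contraction
   limited to single expressions has nothing to fuse here. */
double eval_shared_product(double a, double b, double c, double d) {
    double p = b * c;   /* shared product, computed once */
    double s = a + p;   /* first sum */
    double q = p * d;   /* third term */
    return s + q;
}
```

The two forms agree whenever every intermediate value is exactly representable, e.g. for (a, b, c, d) = (1, 2, 3, 4). They can differ in the last bit for other inputs, because a fused multiply-add rounds once where a separate multiply and add round twice.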