[llvm-dev] defaults for FP contraction [e.g. fused multiply-add]: suggestion and patch to be slightly more aggressive and to make Clang's optimization settings closer to having the same meaning as when they are given to GCC [at least for "-O3"]

Steve Canon via llvm-dev llvm-dev at lists.llvm.org
Sat Sep 10 03:33:28 PDT 2016




> On Sep 9, 2016, at 10:40 PM, Chris Lattner <clattner at apple.com> wrote:
> 
> 
>> On Sep 9, 2016, at 3:27 PM, Steve Canon via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> 
>> 
>> 
>> 
>>> On Sep 9, 2016, at 6:21 PM, Abe Skolnik <a.skolnik at samsung.com> wrote:
>>> 
>>>> On 09/09/2016 04:31 PM, Stephen Canon wrote:
>>>> 
>>>> Gating this on -Owhatever is dangerous.  We should simply default to the pragma “on” state universally.
>>> 
>>> Why so?  [honestly asking, not arguing]
>>> 
>>> My guess: because we don't want programs to give different results when compiled at different "-O<...>" settings, with the exception of "-Ofast".
>> 
>> Pretty much.  In particular, imagine a user trying to debug an unexpected floating point result caused by conversion of a*b + c into fma(a, b, c).
> 
> I think that’s unavoidable, because of the way the optimization levels work.  Even if fma contraction is on by default (something I’d like to see), at -O0, we wouldn't be doing contraction for:
> 
> auto x = a*b;
> auto y = x+c;
> 
> but we would do that at -O2 since we do mem2reg on x.

In C, we don't contract (the equivalent of) this unless we're passed -ffp-contract=fast.  The FP_CONTRACT pragma only licenses contraction within a statement.

IIRC, the situation in C++ is somewhat different, and the standard allows contraction across statement boundaries, though I don't think we take advantage of it at present.
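
To make the statement-boundary distinction concrete, here is a minimal sketch in C. The function names are made up for illustration, and whether a given compiler honors the pragma (or actually emits an fma) depends on the compiler and target, so treat the comments as describing what is permitted rather than what any particular build does.

```c
/* Ask for the "on" state: contraction is permitted inside one expression. */
#pragma STDC FP_CONTRACT ON

/* Hypothetical helper: multiply and add appear in a single expression,
   so the compiler MAY contract them into one fused multiply-add. */
double within_statement(double a, double b, double c) {
    return a * b + c;
}

/* Hypothetical helper: the product is assigned to x in its own statement,
   so the "on" state does not license fusing it with the following add.
   Only a "fast" contraction mode would allow that. */
double across_statements(double a, double b, double c) {
    double x = a * b;
    double y = x + c;
    return y;
}
```

For inputs where every intermediate is exactly representable (say 2.0, 3.0, 1.0) both functions return the same value whether or not contraction happens; they can only differ when a*b needs rounding.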

You're definitely correct that there will still be differences; e.g.:

    x = a*b + c;
    y = a*b;

It might be that at some optimization level we prove y is unused / constant / etc.  When targeting a machine where fma is costlier than mul, we then generate mul+add in one case (y is live, so the product is computed anyway and can be reused) and fma in the other (y is gone).  Such cases are necessarily rarer than the differences we would get by gating contraction on optimization level, however.  (And we want the perf win at -O0 anyway.)

TLDR: yeah, let's do this.