[llvm-dev] defaults for FP contraction [e.g. fused multiply-add]: suggestion and patch to be slightly more aggressive and to make Clang's optimization settings closer to having the same meaning as when they are given to GCC [at least for "-O3"]
Steve Canon via llvm-dev
llvm-dev at lists.llvm.org
Sat Sep 10 03:33:28 PDT 2016
> On Sep 9, 2016, at 10:40 PM, Chris Lattner <clattner at apple.com> wrote:
>> On Sep 9, 2016, at 3:27 PM, Steve Canon via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>> On Sep 9, 2016, at 6:21 PM, Abe Skolnik <a.skolnik at samsung.com> wrote:
>>>> On 09/09/2016 04:31 PM, Stephen Canon wrote:
>>>> Gating this on -Owhatever is dangerous. We should simply default to the pragma “on” state universally.
>>> Why so? [honestly asking, not arguing]
>>> My guess: b/c we don't want programs to give different results when compiled at different "-O<...>" settings with the exception of "-Ofast".
>> Pretty much. In particular, imagine a user trying to debug an unexpected floating point result caused by conversion of a*b + c into fma(a, b, c).
> I think that’s unavoidable, because of the way the optimization levels work. Even if fma contraction is on by default (something I’d like to see), at -O0 we wouldn't be doing contraction for:
> auto x = a*b;
> auto y = x+c;
> but we would do that at -O2 since we do mem2reg on x.
In C, we don't contract (the equivalent of) this unless we're passed -ffp-contract=fast. The pragma only licenses contraction within a statement.
IIRC, the situation in C++ is somewhat different, and the standard allows contraction across statement boundaries, though I don't think we take advantage of it at present.
You're definitely correct that there will still be differences; e.g.:
x = a*b + c;
y = a*b;
It might be that at some optimization level we prove y is unused / constant / etc. When targeting a machine where fma is costlier than mul, we generate mul+add in one case and fma in the other. These cases are necessarily rarer than if we gate it on optimization level, however. (And we want the perf win for -O0 anyway).
TLDR: yeah, let's do this.