[cfe-dev] Clang ignoring --fast-math for complex division, serious performance hit
Alex L via cfe-dev
cfe-dev at lists.llvm.org
Thu Nov 9 16:40:37 PST 2017
Sorry about the delay, I saw the thread only today.
There's an existing bug report about this already btw:
https://bugs.llvm.org/show_bug.cgi?id=31872. When I last looked into it,
ICC and GCC had different fast division implementations.
With -fp-model fast=1, ICC promotes floats to doubles and doubles to 80-bit
long doubles to avoid loss of precision. With -fp-model fast=2, ICC keeps the
original type, but for complex float it does a single division to get a
reciprocal instead of doing two divisions, and for complex double it does two
divisions (in one instruction). GCC just uses two divisions every time,
without any type promotion.
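
To make the difference between these strategies concrete, here is a rough C sketch of the three variants for complex float. This is only my illustration of the arithmetic, not what ICC or GCC actually emit; the function names are made up, and I build the result as re + im * I, which is only safe for finite values (fine in a fast-math setting).

```c
#include <complex.h>

/* Textbook formula in the original type: two divisions, no promotion. */
static float complex div_two_divisions(float complex x, float complex y) {
    float a = crealf(x), b = cimagf(x);
    float c = crealf(y), d = cimagf(y);
    float denom = c * c + d * d;
    float re = (a * c + b * d) / denom;
    float im = (b * c - a * d) / denom;
    return re + im * I; /* assumes finite re and im */
}

/* Same formula, but one division to form a reciprocal, then two multiplies. */
static float complex div_one_reciprocal(float complex x, float complex y) {
    float a = crealf(x), b = cimagf(x);
    float c = crealf(y), d = cimagf(y);
    float recip = 1.0f / (c * c + d * d);
    float re = (a * c + b * d) * recip;
    float im = (b * c - a * d) * recip;
    return re + im * I; /* assumes finite re and im */
}

/* Promote to double for the intermediate math, then narrow the result. */
static float complex div_promoted(float complex x, float complex y) {
    double a = crealf(x), b = cimagf(x);
    double c = crealf(y), d = cimagf(y);
    double denom = c * c + d * d;
    float re = (float)((a * c + b * d) / denom);
    float im = (float)((b * c - a * d) / denom);
    return re + im * I; /* assumes finite re and im */
}
```

The reciprocal version trades a division for a multiply at the cost of one extra rounding step. The promoted version avoids overflow in c*c + d*d for any finite float inputs, because the squares fit comfortably in a double.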
On 9 November 2017 at 16:19, Richard Campbell via cfe-dev <
cfe-dev at lists.llvm.org> wrote:
> > On Nov 9, 2017, at 4:05 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> > On 11/06/2017 12:29 PM, Hal Finkel via cfe-dev wrote:
> >> (actually cc'ing Alex this time)
> >> On 11/06/2017 12:18 PM, Hal Finkel via cfe-dev wrote:
> >>> [+Alex]
> >>> On 11/06/2017 11:59 AM, John McCall wrote:
> >>> I'd like to add that Alex L. looked at this in some detail in 2013.
> For some relevant notes, see PR17248 (and how divide is handled in
> There are indeed more- and less-numerically-stable ways to implement
> complex division. For an extended discussion, I recommend looking at
> https://arxiv.org/pdf/1210.4539v2.pdf -- There are certainly versions
> that are reasonable to inline, especially in the fast-math context, and I
> support doing so. Alex found that we had to use Smith's algorithm in order
> to pass LAPACK's regression tests.
Smith's might be too slow for -ffast-math though, especially if __divsc3 is
doing a regular division.
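
For reference, a minimal sketch of Smith's algorithm in C, to show where the cost comes from. This is the textbook form as I understand it, not the actual __divsc3 source; the name smith_div is hypothetical, and the inf/nan handling that a conforming implementation needs is left out.

```c
#include <complex.h>
#include <math.h>

/* Smith's algorithm: scale by the larger component of the divisor so that
 * c*c + d*d is never formed directly. Costs three divisions and a branch. */
static double complex smith_div(double complex x, double complex y) {
    double a = creal(x), b = cimag(x);
    double c = creal(y), d = cimag(y);
    double re, im;
    if (fabs(c) >= fabs(d)) {
        double r = d / c;         /* |r| <= 1 */
        double den = c + d * r;   /* equals (c*c + d*d) / c */
        re = (a + b * r) / den;
        im = (b - a * r) / den;
    } else {
        double r = c / d;         /* |r| < 1 */
        double den = c * r + d;   /* equals (c*c + d*d) / d */
        re = (a * r + b) / den;
        im = (b * r - a) / den;
    }
    return re + im * I; /* assumes finite re and im */
}
```

Dividing 1e200 + 1e200i by itself shows the benefit: the naive formula overflows when squaring the divisor's components, while this version never forms those squares. The price is three divisions and a data-dependent branch per complex division.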
> > One more thing, we can use the cheaper (but less numerically stable)
> formula when we have #pragma STDC CX_LIMITED_RANGE ON.
> > -Hal
> Again, by far the biggest performance problem right now is that it’s
> making a function call, rather than the specifics of the arithmetic. One
> would hope that simply inlining the existing __divsc3() and allowing the
> compiler to eliminate the inf and nan branches (which it should do
> automatically with -ffinite-math-only or CX_LIMITED_RANGE) would result in
> performance not noticeably slower than the previous baseline.
> Unfortunately, no one seems to be able to tell why this problem went away
> for __mulsc3 in July 2015 (a change in CGExprComplex.cpp does NOT seem to
> have been involved) after Chandler’s earlier mods in r219557 originally
> caused __mulsc3 to start issuing function calls. If we knew why __mulsc3
> started being inlined again, we’d be able to apply the same fix to __divsc3.