[cfe-dev] Clang ignoring --fast-math for complex division, serious performance hit
Hal Finkel via cfe-dev
cfe-dev at lists.llvm.org
Thu Nov 9 16:40:59 PST 2017
On 11/09/2017 06:19 PM, Richard Campbell wrote:
>> On Nov 9, 2017, at 4:05 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>> On 11/06/2017 12:29 PM, Hal Finkel via cfe-dev wrote:
>>> (actually cc'ing Alex this time)
>>> On 11/06/2017 12:18 PM, Hal Finkel via cfe-dev wrote:
>>>> On 11/06/2017 11:59 AM, John McCall wrote:
>>>> I'd like to add that Alex L. looked at this in some detail in 2013. For some relevant notes, see PR17248 (and how divide is handled in https://github.com/hyp/flang/blob/master/lib/CodeGen/CGExprComplex.cpp). There are indeed more- and less-numerically-stable ways to implement complex division. For an extended discussion, I recommend looking at https://arxiv.org/pdf/1210.4539v2.pdf -- There are certainly versions that are reasonable to inline, especially in the fast-math context, and I support doing so. Alex found that we had to use Smith's algorithm in order to pass LAPACK's regression tests.
>> One more thing: we can use the cheaper (but less numerically stable) formula when we have #pragma STDC CX_LIMITED_RANGE ON.
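For readers unfamiliar with the algorithm named above, here is a minimal sketch of Smith's method for (a + ib) / (c + id). It is an illustration only, not the code from PR17248, flang, or compiler-rt. The `cplx` struct and the name `smith_div` are hypothetical, chosen so the example does not depend on the compiler's own lowering of complex division.

```c
#include <math.h>

/* Hypothetical helper type for illustration; not a Clang or libm type. */
typedef struct { double re, im; } cplx;

/* Smith's algorithm for x / y with x = a + ib and y = c + id.
   The divisor is scaled by the ratio of its smaller component to its
   larger one, so c*c + d*d is never formed explicitly. This sketch
   does no special handling of infinities, NaNs, or a zero divisor. */
static cplx smith_div(cplx x, cplx y)
{
    double a = x.re, b = x.im;
    double c = y.re, d = y.im;
    cplx q;

    if (fabs(c) >= fabs(d)) {
        double r = d / c;           /* |r| <= 1 */
        double den = c + d * r;     /* equals (c*c + d*d) / c */
        q.re = (a + b * r) / den;
        q.im = (b - a * r) / den;
    } else {
        double r = c / d;           /* |r| < 1 */
        double den = c * r + d;     /* equals (c*c + d*d) / d */
        q.re = (a * r + b) / den;
        q.im = (b * r - a) / den;
    }
    return q;
}
```

The extra cost over the textbook formula is one branch and one additional division, which is why it is still a plausible candidate for inlining.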
> Again, by far the biggest performance problem right now is that it’s making a function call, rather than the specifics of the arithmetic.
I think that we're all well aware of that.
> One would hope that simply inlining the existing __divsc3() and allowing the compiler to eliminate the inf and nan branches (which it should do automatically with -ffinite-math-only or CX_LIMITED_RANGE) would result in performance not noticeably slower than the previous baseline.
To be clear, CX_LIMITED_RANGE and -ffinite-math-only are different in
this regard. C says, "The usual mathematical formulas for complex
multiply, divide, and absolute value are problematic because of their
treatment of infinities and because of undue overflow and underflow."
It's the numerical stability, the "undue overflow and underflow" part,
that makes them different.
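A small sketch of the "undue overflow" case may help. It uses the textbook formula with entirely finite operands, so the problem is not one of infinities or NaNs in the inputs. The `cplx` struct and the name `naive_div` are hypothetical, and the sketch assumes IEEE-754 double arithmetic without extended-precision intermediates (for example, SSE2 on x86-64).

```c
/* Hypothetical helper type for illustration; not a Clang or libm type. */
typedef struct { double re, im; } cplx;

/* Textbook complex division:
   (a + ib) / (c + id) = ((ac + bd) + i(bc - ad)) / (c*c + d*d).
   This is the cheaper, less numerically stable formula. */
static cplx naive_div(cplx x, cplx y)
{
    /* For |y| around 1e200 this sum overflows to +inf even though
       both components of y are finite. */
    double den = y.re * y.re + y.im * y.im;
    cplx q;
    q.re = (x.re * y.re + x.im * y.im) / den;
    q.im = (x.im * y.re - x.re * y.im) / den;
    return q;
}
```

With x = 1 + 1i and y = 1e200 + 1e200i, the exact quotient is 1e-200, which is representable as a double. Under the assumptions above, the denominator overflows and `naive_div` returns 0 + 0i instead. No operand was infinite or NaN, so assuming finite math does not make this formula safe; that is the extra licence CX_LIMITED_RANGE grants.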
> Unfortunately, no one seems to be able to tell why this problem went away for __mulsc3 in July 2015 (a change in CGExprComplex.cpp does NOT seem to have been involved) after Chandler’s earlier mods in r219557 originally caused __mulsc3 to start issuing function calls. If we knew why __mulsc3 started being inlined again, we’d be able to apply the same fix to __divsc3.
I vaguely recall when this happened, and I'm pretty sure that there was
indeed a change to CodeGen somewhere. I do plan to look at this.
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory