[cfe-dev] Clang ignoring --fast-math for complex division, serious performance hit
Richard Campbell via cfe-dev
cfe-dev at lists.llvm.org
Thu Nov 9 16:19:22 PST 2017
> On Nov 9, 2017, at 4:05 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> On 11/06/2017 12:29 PM, Hal Finkel via cfe-dev wrote:
>> (actually cc'ing Alex this time)
>> On 11/06/2017 12:18 PM, Hal Finkel via cfe-dev wrote:
>>> On 11/06/2017 11:59 AM, John McCall wrote:
>>> I'd like to add that Alex L. looked at this in some detail in 2013. For some relevant notes, see PR17248 (and how divide is handled in https://github.com/hyp/flang/blob/master/lib/CodeGen/CGExprComplex.cpp). There are indeed more- and less-numerically-stable ways to implement complex division. For an extended discussion, I recommend looking at https://arxiv.org/pdf/1210.4539v2.pdf -- There are certainly versions that are reasonable to inline, especially in the fast-math context, and I support doing so. Alex found that we had to use Smith's algorithm in order to pass LAPACK's regression tests.
> One more thing: we can use the cheaper (but less numerically stable) formula when we have #pragma STDC CX_LIMITED_RANGE ON.
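For readers following along, a minimal sketch of the two kinds of formula contrasted in the quoted text. The names `naive_div`, `smith_div` and `cf_pair` are hypothetical, and this is neither the PR17248 code nor Clang's codegen. Each function takes the four float components in the same order as __divsc3(a, b, c, d) and computes (a+bi)/(c+di). The naive one is the "cheaper" textbook formula that CX_LIMITED_RANGE would permit; Smith's algorithm divides through by the larger of |c| and |d| so that no c*c + d*d term is formed. Neither does any Inf/NaN recovery.

```c
#include <math.h>

/* Hypothetical result type; real code would use float _Complex. */
typedef struct { float re, im; } cf_pair;

/* Textbook formula: (a+bi)/(c+di) = ((ac+bd) + (bc-ad)i) / (c*c + d*d).
   Cheap, but c*c + d*d overflows or underflows long before the true
   quotient does. */
static cf_pair naive_div(float a, float b, float c, float d) {
    float denom = c * c + d * d;
    cf_pair z = { (a * c + b * d) / denom, (b * c - a * d) / denom };
    return z;
}

/* Smith's algorithm: scale by the ratio of the smaller to the larger
   divisor component, so |r| <= 1 and no squares are computed. */
static cf_pair smith_div(float a, float b, float c, float d) {
    cf_pair z;
    if (fabsf(c) >= fabsf(d)) {
        float r = d / c;            /* |r| <= 1 */
        float den = c + d * r;      /* (c*c + d*d) / c */
        z.re = (a + b * r) / den;
        z.im = (b - a * r) / den;
    } else {
        float r = c / d;            /* |r| < 1 */
        float den = c * r + d;      /* (c*c + d*d) / d */
        z.re = (a * r + b) / den;
        z.im = (b * r - a) / den;
    }
    return z;
}
```

For example, with a = b = c = d = 1e30f, on a target that evaluates float in single precision (FLT_EVAL_METHOD == 0), naive_div's denominator overflows to infinity and both parts come out NaN, whereas smith_div returns 1 + 0i. For (1+2i)/(3+4i) both return roughly 0.44 + 0.08i.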
Again, by far the biggest performance problem right now is the function call itself, not the specifics of the arithmetic. One would hope that simply inlining the existing __divsc3() and letting the compiler eliminate the Inf and NaN branches (which it should do automatically under -ffinite-math-only or CX_LIMITED_RANGE) would give performance not noticeably slower than the previous baseline.
Unfortunately, no one seems able to explain why this problem went away for __mulsc3 in July 2015 (a change in CGExprComplex.cpp does NOT seem to have been involved), after Chandler's earlier changes in r219557 had first caused complex multiplication to be emitted as __mulsc3 function calls. If we knew why __mulsc3 started being inlined again, we'd be able to apply the same fix to __divsc3.