[cfe-dev] Clang ignoring --fast-math for complex division, serious performance hit
Hal Finkel via cfe-dev
cfe-dev at lists.llvm.org
Mon Nov 6 10:29:38 PST 2017
(actually cc'ing Alex this time)
On 11/06/2017 12:18 PM, Hal Finkel via cfe-dev wrote:
> [+Alex]
>
> On 11/06/2017 11:59 AM, John McCall wrote:
>>> On Nov 6, 2017, at 12:21 PM, Richard Campbell <rlcamp.pdx at gmail.com>
>>> wrote:
>>>> On Nov 6, 2017, at 12:20 AM, John McCall <rjmccall at apple.com> wrote:
>>>>
>>>>> On Nov 6, 2017, at 2:47 AM, Richard Campbell
>>>>> <rlcamp.pdx at gmail.com> wrote:
>>>>> The much bigger issue is not one division or two, but rather zero
>>>>> function calls or one. The function call overhead, and the
>>>>> resulting inability to make any other refactoring optimisations,
>>>>> far outweighs the choice of instructions used.
>>>> By "refactoring optimizations", do you mean reordering and
>>>> potentially CSE'ing the component arithmetic with operations
>>>> outside of the division, or do you mean the compiler-barrier costs
>>>> of emitting an opaque function call in the frontend instead of
>>>> something that can be CSE'ed / reordered itself? Because the latter
>>>> is a problem that can be fixed for non-fast-math arithmetic as well.
>>>>
>>>> My general impression is that there is a lot of low-hanging fruit
>>>> in optimizing complex math in LLVM for one simple reason: it's not
>>>> widely used, so it's an accordingly low priority for most of our
>>>> current contributors. If this is something that interests you,
>>>> we'd be very open to contributions.
>>>>
>>>>
>>>> John.
>>> I suppose I mean both of those optimisations, although I don’t know
>>> the actual breakdown of the performance hit of one vs the other vs
>>> just the fact of the function call. When one writes a critical inner
>>> loop that doesn’t contain any function calls, one should reasonably
>>> expect the compiler not to add them.
>> Complex divide is a large, complicated operation when full precision
>> and infinity-correctness are required. We appreciate that you have
>> performance constraints, but implementing it with an outlined
>> function is not an unreasonable choice.
>>
>>> While there may be more low hanging fruit, I don’t want it to get in
>>> the way of fixing this. My main concern is that there not be
>>> noticeable regressions. This particular regression could have made
>>> certain calculations take HOURS longer than expected, had I not
>>> already been hacking my way around it. I would greatly
>>> prefer to write simple maintainable code and let the compiler do the
>>> right thing on the hardware of today and tomorrow.
>> Richard, let me be clear about your options here. If you're
>> interested in working on this, that would be great, and I'd be happy
>> to review your patches. If you're not interested in working on this,
>> then you should file a bug and hope that someone else has the
>> motivation to pick it up.
>
> I'd like to add that Alex L. looked at this in some detail in 2013.
> For some relevant notes, see PR17248 (and how divide is handled in
> https://github.com/hyp/flang/blob/master/lib/CodeGen/CGExprComplex.cpp).
> There are indeed more- and less-numerically-stable ways to implement
> complex division. For an extended discussion, I recommend looking at
> https://arxiv.org/pdf/1210.4539v2.pdf -- There are certainly versions
> that are reasonable to inline, especially in the fast-math context,
> and I support doing so. Alex found that we had to use Smith's
> algorithm in order to pass LAPACK's regression tests.
>
> -Hal
>
>>
>> John.
>
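For readers following the numerical argument, here is a minimal C sketch that contrasts the textbook formula with Smith's algorithm, which Hal mentions as necessary for passing LAPACK's regression tests. The `cplx` struct and the `cdiv_naive` / `cdiv_smith` names are hypothetical and were chosen only for this illustration. This is not Clang's actual lowering, and it is not the code in the flang repository linked above. Both versions also omit the C99 Annex G recovery for infinities, NaNs, and zero divisors that an outlined runtime routine such as compiler-rt's `__divdc3` performs. That recovery is much of what John means by full precision and infinity-correctness.

```c
#include <math.h>

/* Hypothetical helper type for this sketch; real code would use
   double _Complex from <complex.h>. */
typedef struct {
    double re;
    double im;
} cplx;

/* Textbook formula:
   (a+bi)/(c+di) = ((ac+bd) + (bc-ad)i) / (c*c + d*d).
   Cheap and trivially inlinable, but c*c + d*d can overflow to inf
   (or underflow to 0) even when the true quotient is representable. */
static cplx cdiv_naive(cplx x, cplx y) {
    double denom = y.re * y.re + y.im * y.im;
    cplx q = {
        (x.re * y.re + x.im * y.im) / denom,
        (x.im * y.re - x.re * y.im) / denom,
    };
    return q;
}

/* Smith's algorithm (1962): divide numerator and denominator by the
   larger-magnitude component of y, so the ratio r satisfies |r| <= 1
   and intermediate products stay in range for many inputs where the
   textbook formula overflows. Short enough to inline, but it does NOT
   implement C99 Annex G special cases (a zero divisor yields NaN here,
   not infinity). */
static cplx cdiv_smith(cplx x, cplx y) {
    cplx q;
    if (fabs(y.re) >= fabs(y.im)) {
        double r = y.im / y.re;          /* |r| <= 1 */
        double den = y.re + y.im * r;    /* (c*c + d*d) / c */
        q.re = (x.re + x.im * r) / den;
        q.im = (x.im - x.re * r) / den;
    } else {
        double r = y.re / y.im;          /* |r| < 1 */
        double den = y.re * r + y.im;    /* (c*c + d*d) / d */
        q.re = (x.re * r + x.im) / den;
        q.im = (x.im * r - x.re) / den;
    }
    return q;
}
```

For example, with x = y = 1e300 + 1e300i the exact quotient is 1, but `cdiv_naive` forms c*c + d*d = inf and returns NaN in both components, while `cdiv_smith` scales by r = 1 and returns exactly 1 + 0i. For ordinary inputs such as (1 + 2i) / (3 + 4i), both give 0.44 + 0.08i. A fast-math lowering could reasonably inline either short sequence instead of calling a runtime helper.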
--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory