[cfe-dev] Clang ignoring --fast-math for complex division, serious performance hit

Mon Nov 6 09:21:08 PST 2017

> On Nov 6, 2017, at 12:20 AM, John McCall <rjmccall at apple.com> wrote:
> 
>> On Nov 6, 2017, at 2:47 AM, Richard Campbell <rlcamp.pdx at gmail.com> wrote:
>> The much bigger issue is not one division or two, but rather zero function calls or one. The function call overhead, and the resulting inability to make any other refactoring optimisations, far outweighs the choice of instructions used.
> 
> By "refactoring optimizations", do you mean reordering and potentially CSE'ing the component arithmetic with operations outside of the division, or do you mean the compiler-barrier costs of emitting an opaque function call in the frontend instead of something that can be CSE'ed / reordered itself?  Because the latter is a problem that can be fixed for non-fast-math arithmetic as well.
> 
> My general impression is that there is a lot of low-hanging fruit in optimizing complex math in LLVM for one simple reason: it's not widely used, so it's an accordingly low priority for most of our current contributors.  If this is something that interests you, we'd be very open to contributions.
> 
> 
> John.

I suppose I mean both of those optimisations, although I don't know how the performance hit breaks down between the two, or how much of it is simply the cost of the call itself. When one writes a critical inner loop that doesn't contain any function calls, one should reasonably expect the compiler not to add them.

While there may be more low-hanging fruit, I don't want it to get in the way of fixing this. My main concern is that there be no noticeable regressions. This particular regression could have made certain calculations take HOURS longer than expected, had I not already been hacking my way around it. I would much rather write simple, maintainable code and let the compiler do the right thing on the hardware of today and tomorrow.
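The post doesn't say what the workaround was. A minimal sketch, assuming it amounts to spelling out the textbook formula by hand so the division itself is plain real arithmetic instead of a call to the runtime helper (`__divdc3` for `double complex`). The name `cdiv_naive` is hypothetical and not from the thread.

```c
#include <complex.h>

/* Hypothetical helper (not from the original post): textbook complex
 * division, (a + bi) / (c + di) = ((ac + bd) + (bc - ad)i) / (c^2 + d^2).
 * The division uses only real multiplies, adds and divides, which the
 * optimiser can inline, reorder and CSE with the surrounding loop body.
 * Unlike the runtime helper it does no scaling (so c*c + d*d can overflow
 * or underflow) and no C99 Annex G infinity/NaN recovery. */
static inline double complex cdiv_naive(double complex x, double complex y)
{
    double a = creal(x), b = cimag(x);
    double c = creal(y), d = cimag(y);
    double denom = c * c + d * d;          /* |y|^2 */
    double re = (a * c + b * d) / denom;
    double im = (b * c - a * d) / denom;
    return re + im * I;
}
```

In an inner loop, `z = cdiv_naive(x, y);` would stand in for `z = x / y;`. The trade-off is exactly the one fast-math is meant to permit: speed in exchange for range and special-value handling. Smith's algorithm is a more robust inline alternative if `denom` overflowing is a concern.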