[cfe-dev] Clang ignoring --fast-math for complex division, serious performance hit

Richard Campbell via cfe-dev cfe-dev at lists.llvm.org
Sun Nov 5 21:43:57 PST 2017

/* Complex divide (ar + ai*i) / (br + bi*i) using one real divide: the reciprocal of |b|^2. */
__attribute__((always_inline)) static inline float _Complex __divsc3(const float ar, const float ai, const float br, const float bi) {
    const float one_over_denominator = 1.0f / (br * br + bi * bi);
    return (float _Complex){ (ar * br + ai * bi) * one_over_denominator, (ai * br - ar * bi) * one_over_denominator };
}

This uses a single real divide and handles the general case (complex divided by complex). I proposed it two years ago for -ffast-math/-freciprocal-math but didn’t get any takers. Until this regression, Clang and gcc both normally implemented a complex divide using two divide instructions.
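To make the comparison concrete, here is a minimal sketch of the two forms side by side. The names cdiv_two_divides, cdiv_one_reciprocal and cdiv_example are made up for illustration and are not anything Clang or gcc emits. Neither version does the overflow or NaN handling of the library routine, so this is only what one might accept under -ffast-math/-freciprocal-math.

```c
#include <assert.h>
#include <complex.h>

/* Hypothetical helper: divide each component by |b|^2 separately (two real divides). */
static inline float _Complex cdiv_two_divides(float _Complex a, float _Complex b) {
    const float ar = crealf(a), ai = cimagf(a);
    const float br = crealf(b), bi = cimagf(b);
    const float d = br * br + bi * bi;
    return (ar * br + ai * bi) / d + ((ai * br - ar * bi) / d) * I;
}

/* Hypothetical helper: compute 1/|b|^2 once, then multiply (one real divide). */
static inline float _Complex cdiv_one_reciprocal(float _Complex a, float _Complex b) {
    const float ar = crealf(a), ai = cimagf(a);
    const float br = crealf(b), bi = cimagf(b);
    const float inv = 1.0f / (br * br + bi * bi);
    return (ar * br + ai * bi) * inv + ((ai * br - ar * bi) * inv) * I;
}

/* (4 + 2i) / (2i) = 1 - 2i; every intermediate here is exact in float. */
void cdiv_example(void) {
    const float _Complex a = 4.0f + 2.0f * I;
    const float _Complex b = 2.0f * I;
    const float _Complex q1 = cdiv_one_reciprocal(a, b);
    const float _Complex q2 = cdiv_two_divides(a, b);
    assert(crealf(q1) == 1.0f && cimagf(q1) == -2.0f);
    assert(crealf(q2) == 1.0f && cimagf(q2) == -2.0f);
}
```

For general inputs the two forms can differ in the last bit or so, since the reciprocal version rounds 1/|b|^2 before multiplying; that is the usual trade made by -freciprocal-math.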

> On Nov 5, 2017, at 9:28 PM, John McCall <rjmccall at apple.com> wrote:
>> On Nov 5, 2017, at 12:43 PM, Richard Campbell via cfe-dev <cfe-dev at lists.llvm.org> wrote:
>> Similarly to a problem that occurred two years ago with multiplication (http://lists.llvm.org/pipermail/cfe-dev/2015-July/043792.html), production Clang (Apple LLVM version 9.0.0 (clang-900.0.38)) is now issuing __divsc3 function calls anywhere complex division occurs, irrespective of the -ffast-math setting. This can cause a single division (which should be one or at most two divss instructions, with minimal performance impact) to slow down a performance-critical section of code (single-precision complex tridiagonal matrix solving) significantly. I’m not sure how long this has been the case - for some time now, in my critical loop, I instead call an inline function which explicitly does the single divss required. When I take out this hack, the speed of the entire application is cut in half.
>> Can we fix this (again) and maybe add a code comment wherever it needs to be such that it doesn’t keep getting broken after a couple years?
> These don't sound like the same bug; they're just the same overall problem observed in different patterns of source code.  The best way to prevent regressions in cases like this is to add some tests that the source patterns generate the desired output.
> I can't imagine how the general case of complex division could possibly be implemented in "one or two divss instructions", so I suspect you're talking about the special case of dividing a complex number by a scalar (either real or imaginary).  I wouldn't be that surprised if attempts got broken if we didn't have a good, exhaustive set of tests for it.
> John.
