[cfe-dev] Clang ignoring --fast-math for complex division, serious performance hit
John McCall via cfe-dev
cfe-dev at lists.llvm.org
Sun Nov 5 21:28:40 PST 2017
> On Nov 5, 2017, at 12:43 PM, Richard Campbell via cfe-dev <cfe-dev at lists.llvm.org> wrote:
> Similarly to a problem that occurred two years ago with multiplication (http://lists.llvm.org/pipermail/cfe-dev/2015-July/043792.html), production Clang (Apple LLVM version 9.0.0 (clang-900.0.38)) is now issuing __divsc3 function calls anywhere complex division occurs, irrespective of the -ffast-math setting. This can cause a single division (which should be one or at most two divss instructions, with minimal performance impact) to slow down a performance-critical section of code (single-precision complex tridiagonal matrix solving) significantly. I’m not sure how long this has been the case - for some time now, in my critical loop, I instead call an inline function which explicitly does the single divss required. When I take out this hack, the speed of the entire application is cut in half.
> Can we fix this (again) and maybe add a code comment wherever it needs to be such that it doesn’t keep getting broken after a couple years?
These don't sound like the same bug; they're the same overall problem observed in different patterns of source code. The best way to prevent regressions in cases like this is to add tests that check that those source patterns generate the desired output.
I can't imagine how the general case of complex division could possibly be implemented in "one or two divss instructions", so I suspect you're talking about the special case of dividing a complex number by a scalar (either real or imaginary). I wouldn't be that surprised if that special case got broken, if it turns out we don't have a good, exhaustive set of tests for it.
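To make the distinction concrete, here is a minimal sketch in C of the arithmetic involved. The function names are made up for illustration, and the general case uses the naive textbook formula, without the overflow and NaN/infinity handling that a runtime routine like __divsc3 is there to provide:

```c
#include <complex.h>

/* General case: (a + bi) / (c + di)
     = ((ac + bd) + (bc - ad)i) / (c^2 + d^2).
   Naive formula: several multiplies plus the divides, and no protection
   against overflow in c*c + d*d. */
float complex div_general(float complex x, float complex y) {
    float a = crealf(x), b = cimagf(x);
    float c = crealf(y), d = cimagf(y);
    float denom = c * c + d * d;
    return (a * c + b * d) / denom + ((b * c - a * d) / denom) * I;
}

/* Special case, real divisor: (a + bi) / s = a/s + (b/s)i.
   Two scalar divides and nothing else. */
float complex div_by_real(float complex x, float s) {
    return crealf(x) / s + (cimagf(x) / s) * I;
}

/* Special case, purely imaginary divisor: (a + bi) / (di) = b/d - (a/d)i.
   Again two scalar divides, plus a swap and a negation. */
float complex div_by_imag(float complex x, float d) {
    return cimagf(x) / d + (-crealf(x) / d) * I;
}
```

In the two scalar cases, a compiler that is allowed to use reciprocals (as -ffast-math permits) could in principle get down to one divide followed by two multiplies.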