[cfe-dev] Clang ignoring --fast-math for complex division, serious performance hit
Richard Campbell via cfe-dev
cfe-dev at lists.llvm.org
Sun Nov 5 09:43:56 PST 2017
Similar to a problem that occurred two years ago with multiplication (http://lists.llvm.org/pipermail/cfe-dev/2015-July/043792.html), production Clang (Apple LLVM version 9.0.0 (clang-900.0.38)) now emits a __divsc3 function call wherever complex division occurs, irrespective of the -ffast-math setting. A single division, which should compile to one or at most two divss instructions with minimal performance impact, can therefore significantly slow down a performance-critical section of code (in my case, single-precision complex tridiagonal matrix solving). I'm not sure how long this has been the case: for some time now, my critical loop has instead called an inline function that explicitly does the single divss required. When I take out this hack, the speed of the entire application is cut in half.
Can we fix this (again), and maybe add a code comment wherever it belongs so that it doesn't get broken again in a couple of years?
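For readers who want to see what such a workaround might look like, here is a minimal sketch in C. It assumes the textbook formula (a+bi)/(c+di) = ((ac+bd) + (bc-ad)i)/(c²+d²) and computes one reciprocal so that only a single scalar division is needed. The helper name fast_cdivf is hypothetical and does not come from the original post, and the sketch deliberately omits the overflow, infinity, and NaN handling that __divsc3 performs, so it is only appropriate where those cases cannot occur.

```c
#include <complex.h>
#include <string.h>

/* Hypothetical helper (name not from the original post): naive
 * single-precision complex division using one scalar divide.
 * Assumption: inputs are finite and the divisor is nonzero; no
 * scaling is done to guard against overflow or underflow. */
static inline float complex fast_cdivf(float complex a, float complex b)
{
    float ar = crealf(a), ai = cimagf(a);
    float br = crealf(b), bi = cimagf(b);

    /* The only division: reciprocal of |b|^2. */
    float inv = 1.0f / (br * br + bi * bi);

    /* Real and imaginary parts of a * conj(b), scaled by 1/|b|^2. */
    float parts[2];
    parts[0] = (ar * br + ai * bi) * inv;
    parts[1] = (ai * br - ar * bi) * inv;

    /* C99 guarantees float complex has the layout of float[2],
     * so copying the two parts builds the result without any
     * complex arithmetic. */
    float complex result;
    memcpy(&result, parts, sizeof result);
    return result;
}
```

In a hot loop, a call such as fast_cdivf(x, y) would take the place of x / y. Whether the compiler emits exactly one divss for it depends on the compiler version and flags, so the generated assembly is worth checking.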