[cfe-dev] Complex arithmetic ignores -ffast-math after clang r219557, serious performance regressions
Matthijs van Duin
matthijsvanduin at gmail.com
Fri Jul 3 19:44:47 PDT 2015
A temporary workaround is to define __mulsc3 in your own code. Clang seems
to pick it up correctly, e.g.:
__attribute__(( always_inline ))
static inline float _Complex
__mulsc3( float ar, float ai, float br, float bi )
{
    return (float _Complex){ ar * br - ai * bi, ar * bi + ai * br };
}
I've noticed it really needs to be both static and always_inline to get
optimized properly, at least with the latest clang-3.7 from Debian sid using:
  -target arm-linux-gnueabihf -mfloat-abi=hard -mcpu=cortex-a8 -mfpu=neon -Ofast
Different storage-class specifiers give fascinating differences, even
for a function as simple as "return a * b;", where a and b are its complex
float arguments.
Two curious observations:
* If my __mulsc3 is declared "extern inline", clang nevertheless emits code
for it. I had expected any non-inlineable uses to become references to the
standard one.
* If it is declared static (inline or not) it acquires soft float ABI
calling conventions (with associated terrible overhead), and it still gets
called in places where __mulsc3 would normally get called. Using
always_inline avoids this.
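
To make the comparison concrete, here is a minimal sketch of the kind of test file described above: the same one-line body under several storage-class specifiers. The function names (cmul_static and so on) are hypothetical and only serve as labels. One point that may bear on the first observation: under C99 inline semantics an "extern inline" definition is an external definition, so code is emitted for it, whereas the older GNU89 semantics behave the other way round.

```c
#include <assert.h>   /* only needed for the quick check mentioned below */
#include <complex.h>  /* provides the I macro used in that check */

/* Hypothetical names: every function has the same body and differs only
   in its storage-class / inline specifiers. */

static float _Complex cmul_static(float _Complex a, float _Complex b)
{
    return a * b;
}

static inline float _Complex cmul_static_inline(float _Complex a,
                                                float _Complex b)
{
    return a * b;
}

/* With C99 inline semantics this is an external definition. */
extern inline float _Complex cmul_extern_inline(float _Complex a,
                                                float _Complex b)
{
    return a * b;
}

__attribute__((always_inline))
static inline float _Complex cmul_always_inline(float _Complex a,
                                                float _Complex b)
{
    return a * b;
}
```

All four should compute the same value, e.g. assert(cmul_static(1.0f + 2.0f * I, 3.0f + 4.0f * I) == -5.0f + 10.0f * I). The interesting part is the generated code: compiling with clang -S and the flags quoted earlier should let you compare the variants side by side. The sketch itself does not reproduce the results reported here.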
(Since you're declaring complex mul, you can of course take the opportunity
to see if there's any benefit in a different implementation of complex
multiply, e.g.
    float t = ai * ( br - bi );
    return (float _Complex){ br * (ar - ai) + t, bi * (ar + ai) + t };
or one of its many variants. Probably not unless your target has a slow
multiplier or the relevant sums/differences are needed already anyway, but
who knows...)
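
As a sanity check on the three-multiply variant, expanding br * (ar - ai) + ai * (br - bi) gives ar * br - ai * bi, and bi * (ar + ai) + ai * (br - bi) gives ar * bi + ai * br, so both forms agree algebraically. The following is a minimal sketch that evaluates them side by side. The names cpair, mul4, mul3 and variants_agree are hypothetical, and a plain struct is used instead of float _Complex only so the sketch does not depend on how a given compiler handles brace initialisation of complex values.

```c
#include <assert.h>  /* only needed for the quick check mentioned below */

/* Hypothetical helper type: real and imaginary parts as a plain pair. */
typedef struct { float re, im; } cpair;

/* Textbook form: four multiplies, two additions/subtractions. */
static cpair mul4(float ar, float ai, float br, float bi)
{
    cpair r = { ar * br - ai * bi, ar * bi + ai * br };
    return r;
}

/* Variant from the message: three multiplies, five additions/subtractions. */
static cpair mul3(float ar, float ai, float br, float bi)
{
    float t = ai * (br - bi);
    cpair r = { br * (ar - ai) + t, bi * (ar + ai) + t };
    return r;
}

/* Absolute difference without pulling in libm. */
static float absdiff(float x, float y)
{
    float d = x - y;
    return d < 0.0f ? -d : d;
}

/* Returns 1 if both forms agree within tol on this input, else 0.
   The two forms round differently in general, hence the tolerance. */
static int variants_agree(float ar, float ai, float br, float bi, float tol)
{
    cpair p = mul4(ar, ai, br, bi);
    cpair q = mul3(ar, ai, br, bi);
    return absdiff(p.re, q.re) <= tol && absdiff(p.im, q.im) <= tol;
}
```

For example, (1 + 2i) * (3 + 4i) = -5 + 10i, and assert(variants_agree(1.0f, 2.0f, 3.0f, 4.0f, 0.0f)) holds because every intermediate is a small integer. For general inputs the two forms can differ in the last bits, and the three-multiply form can lose accuracy through cancellation, so a nonzero tolerance is the safer choice.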
Matthijs