[cfe-dev] Complex arithmetic ignores -ffast-math after clang r219557, serious performance regressions

Fri Jul 3 19:44:47 PDT 2015

A temporary workaround is to define __mulsc3 in your own code; clang seems
to pick it up correctly, e.g.:

__attribute__(( always_inline ))
static inline  float _Complex
__mulsc3( float ar, float ai, float br, float bi)
{
	return (float _Complex){  ar * br - ai * bi,  ar * bi + ai * br  };
}

I've noticed it really needs to be both static and always_inline to get
optimized properly, at least with the latest clang-3.7 from Debian sid and
these flags:
-target arm-linux-gnueabihf -mfloat-abi=hard -mcpu=cortex-a8 -mfpu=neon
-Ofast

Different storage-class specifiers lead to fascinating differences in the
generated code, even for a function as simple as return a * b; where a and b
are its complex float arguments.
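
A minimal test case of that kind might look like the sketch below. The name
cmul is hypothetical (it is not from the original report). The idea is to
vary the storage-class specifier on it, compile with -S, and check whether a
call to __mulsc3 remains in the assembly.

```c
#include <complex.h>

/* Hypothetical test function: plain external linkage here. Swap in
   "static", "static inline", "extern inline", or add
   __attribute__((always_inline)) to compare the generated code. */
float _Complex cmul(float _Complex a, float _Complex b)
{
	return a * b;
}
```

For finite inputs such as (1 + 2i) * (3 + 4i) the result should be -5 + 10i
regardless of which code path the compiler picks.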

Two curious observations:
* If my __mulsc3 is declared "extern inline", clang nevertheless emits code
for it. I had expected any non-inlineable uses to become references to the
standard one.
* If it is declared static (inline or not) it acquires soft float ABI
calling conventions (with associated terrible overhead), and it still gets
called in places where __mulsc3 would normally get called. Using
always_inline avoids this.
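
The first observation looks consistent with C99 inline rules, assuming clang
was compiling in a C99-or-later mode: there, "extern inline" on a definition
makes it the external definition, so code has to be emitted. The "inline
only, otherwise use the external one" meaning belongs to the older GNU89
rules, which can be requested per function with the gnu_inline attribute. A
sketch with hypothetical names (mul_c99 and mul_gnu89 are illustrative, not
from the original report):

```c
#include <complex.h>

/* C99 and later: "extern inline" makes this the external definition,
   so an out-of-line copy is emitted in this translation unit. */
extern inline float _Complex mul_c99(float _Complex a, float _Complex b)
{
	return a * b;
}

/* GNU89 semantics via attribute: the body is used for inlining only and
   no out-of-line copy is emitted. always_inline is added so the sketch
   still links when the optimizer is off. */
extern inline __attribute__((gnu_inline, always_inline))
float _Complex mul_gnu89(float _Complex a, float _Complex b)
{
	return a * b;
}
```

Whether this also removes the emitted copy for a function actually named
__mulsc3 on the ARM target above has not been checked here.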

(Since you're declaring complex mul, you can of course take the opportunity
to see if there's any benefit in a different implementation of complex
multiply, e.g.

	float t = ai * ( br - bi );
	return (float _Complex){  br * (ar - ai) + t,  bi * (ar + ai) + t  };

or one of its many variants. Probably not unless your target has a slow
multiplier or the relevant sums/differences are needed already anyway, but
who knows...)
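
To see that the variant computes the same product, expand it: the real part
is br*(ar - ai) + ai*(br - bi) = ar*br - ai*bi, and the imaginary part is
bi*(ar + ai) + ai*(br - bi) = ar*bi + ai*br. A small sketch for comparing
the two forms side by side follows; struct cpx, mul_direct and mul_3mult are
hypothetical names used only so the example does not depend on how a
float _Complex value is constructed.

```c
/* Hypothetical helper type holding the real and imaginary parts. */
struct cpx { float re, im; };

/* Direct form: 4 multiplies, 2 adds/subtracts. */
static struct cpx mul_direct(float ar, float ai, float br, float bi)
{
	struct cpx r = { ar * br - ai * bi, ar * bi + ai * br };
	return r;
}

/* Variant from the text: 3 multiplies, 5 adds/subtracts. */
static struct cpx mul_3mult(float ar, float ai, float br, float bi)
{
	float t = ai * (br - bi);	/* term shared by both parts */
	struct cpx r = { br * (ar - ai) + t, bi * (ar + ai) + t };
	return r;
}
```

The two forms agree exactly on small integer-valued inputs, but they round
differently in general, so results may differ in the last bits.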

Matthijs