[LLVMdev] LLVM ARM VMLA instruction

Thu Dec 19 00:35:19 PST 2013

> cortex-a8 vfpv4 with ffp-contract=fast: vfma instruction emitted (this
> seems like a bug to me!! If cortex-a8 doesn't come with vfpv4, then the vfma
> instructions generated will be invalid.)

If I'm understanding correctly, you've specifically told it this
Cortex-A8 *does* come with vfpv4. Those kinds of odd combinations can
be useful sometimes (if only for tests), so I'm not sure policing them
is a good idea.

> cortex-a15 vfpv4: vmla instruction emitted (which is a NEON instruction)

I get a VFP vmla here rather than a NEON one (clang -target
armv7-linux-gnueabihf -mcpu=cortex-a15): "vmla.f32 s0, s1, s2". Are
you seeing something different?
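For anyone trying to reproduce the comparison, a minimal test case is just a scalar multiply-accumulate. This is a sketch: the function name `mla` and file name `mla.c` are made up for illustration. Compiling it with something like `clang -target armv7-linux-gnueabihf -mcpu=cortex-a15 -O2 -S mla.c` and reading the generated assembly should show which instruction gets picked; adding `-ffp-contract=fast` is what allows the add and multiply to be fused into a vfma when VFPv4 is available (e.g. `-mcpu=cortex-a8 -mfpu=vfpv4`). The exact output will depend on the LLVM version.

```c
/* Hypothetical reproducer (mla.c): a scalar float multiply-accumulate.
 * Without contraction the ARM backend may select VFP vmla.f32 (product and
 * sum each rounded); with -ffp-contract=fast and VFPv4 it may select
 * vfma.f32 (single rounding). */
float mla(float acc, float a, float b)
{
    /* acc + a * b: two roundings unless the compiler contracts it into an FMA */
    return acc + a * b;
}
```

The inputs worth using when comparing fused and unfused code are ones where the product is not exactly representable; for simple values like `mla(1.0f, 2.0f, 3.0f)` both forms give exactly 7.0f.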

> However, if gcc emits a vmla (NEON) instruction for cortex-a8, then shouldn't
> LLVM also emit a vmla (NEON) instruction?

It appears we've decided in the past that vmla just isn't worth it on
Cortex-A8. There's this comment in the source:

// Some processors have FP multiply-accumulate instructions that don't
// play nicely with other VFP / NEON instructions, and it's generally better
// to just not use them.

Sufficient benchmarking evidence could overturn that decision, but I
assume the people who added it in the first place didn't do so on a
whim.

> The performance gain with vmla instruction is huge.

Is it, on Cortex-A8? The TRM refers to them jumping across pipelines
in odd ways, and that was a very primitive core, so a vmla is almost
certainly not going to be as cheap as a vmul. In fact, if I'm
reading it correctly, it takes pretty much exactly the same time as
separate vmul and vadd instructions: 10 cycles vs. 2 * 5.

Cheers.

Tim.