[LLVMdev] LLVM ARM VMLA instruction
Tim Northover
t.p.northover at gmail.com
Thu Dec 19 00:35:19 PST 2013
> cortex-a8 vfpv4 with ffp-contract=fast : vfma instruction emitted ( this
> seems a bug to me!! If cortex-a8 doesn't come with vfpv4 then vfma
> instructions generated will be invalid )
If I'm understanding correctly, you've specifically told it this
Cortex-A8 *does* come with vfpv4. Those kinds of odd combinations can
be useful sometimes (if only for tests), so I'm not sure policing them
is a good idea.
> cortex-a15 vfpv4 : vmla instruction emitted (which is a NEON instruction)
I get a VFP vmla here rather than a NEON one (clang -target
armv7-linux-gnueabihf -mcpu=cortex-a15): "vmla.f32 s0, s1, s2". Are
you seeing something different?
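
For reference, here is a minimal C sketch of the kind of source that would be expected to lower to a single scalar multiply-accumulate such as the one quoted above. The function name mul_acc is hypothetical and not from the original report. Which instruction is actually selected depends on the compiler version, -mcpu, the FPU setting and -ffp-contract.

```c
/* Hypothetical test function: acc + a * b is the scalar
 * multiply-accumulate pattern. With a hard-float ABI the three
 * arguments would arrive in s0, s1 and s2, which is consistent with
 * the operands of "vmla.f32 s0, s1, s2". */
float mul_acc(float acc, float a, float b)
{
    return acc + a * b;
}
```

Compiling this with -S and the target options given above, then reading the generated assembly, is one way to check whether the vmla is the VFP form (s registers) or a NEON form (d or q registers).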
> However, if gcc emits vmla (NEON) instruction with cortex-a8 then shouldn't
> LLVM also emit vmla (NEON) instruction?
It appears we've decided in the past that vmla just isn't worth it on
Cortex-A8. There's this comment in the source:
// Some processors have FP multiply-accumulate instructions that don't
// play nicely with other VFP / NEON instructions, and it's generally better
// to just not use them.
Sufficient benchmarking evidence could overturn that decision, but I
assume the people who added it in the first place didn't do so on a
whim.
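
As a rough illustration of what such benchmarking might exercise, here is a minimal sketch of a multiply-accumulate kernel. The function name dot_f32 and its signature are assumptions made for illustration. It is not taken from LLVM or from any existing benchmark suite.

```c
#include <stddef.h>

/* Hypothetical kernel: a plain dot product. Each iteration is one
 * multiply feeding one add, so the compiler may emit either a vmla or
 * a separate vmul + vadd pair for the loop body, depending on the
 * target CPU and its tuning. */
float dot_f32(const float *x, const float *y, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += x[i] * y[i];
    return acc;
}
```

Timing a loop like this on real Cortex-A8 hardware, built once so that vmla is used and once so that vmul and vadd are used, would give the kind of evidence the comment's authors presumably had in mind.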
> The performance gain with vmla instruction is huge.
Is it, on Cortex-A8? The TRM refers to them jumping across pipelines
in odd ways, and that was a very primitive core so it's almost
certainly not going to be just as good as a vmul (in fact if I'm
reading correctly, it takes pretty much exactly the same time as
separate vmul and vadd instructions, 10 cycles vs 2 * 5).
Cheers.
Tim.