[LLVMdev] LLVM ARM VMLA instruction

Thu Dec 19 00:50:13 PST 2013

Hi Tim,

> > cortex-a15 vfpv4 : vmla instruction emitted (which is a NEON instruction)
>
> I get a VFP vmla here rather than a NEON one (clang -target
> armv7-linux-gnueabihf -mcpu=cortex-a15): "vmla.f32 s0, s1, s2". Are
> you seeing something different?
>

As per Renato's comment above, vmla is a NEON instruction while
vfma is a VFP instruction. Correct me if I am wrong on this.

>
> > However, if gcc emits vmla (NEON) instruction with cortex-a8 then shouldn't
> > LLVM also emit vmla (NEON) instruction?
>
> It appears we've decided in the past that vmla just isn't worth it on
> Cortex-A8. There's this comment in the source:
>
> // Some processors have FP multiply-accumulate instructions that don't
> // play nicely with other VFP / NEON instructions, and it's generally better
> // to just not use them.
>
> Sufficient benchmarking evidence could overturn that decision, but I
> assume the people who added it in the first place didn't do so on a
> whim.
>
> > The performance gain with vmla instruction is huge.
>
> Is it, on Cortex-A8? The TRM refers to them jumping across pipelines
> in odd ways, and that was a very primitive core so it's almost
> certainly not going to be just as good as a vmul (in fact if I'm
> reading correctly, it takes pretty much exactly the same time as
> separate vmul and vadd instructions, 10 cycles vs 2 * 5).
>

It may seem that the total number of cycles is more or less the same for a
single vmla and for vmul+vadd. However, when the vmul+vadd combination is
used instead of vmla, an intermediate result is generated, which needs to be
stored in memory for future access. This leads to a lot of load/store ops
being inserted, which degrades performance. Correct me if I am wrong on
this, but my observations to date have shown this.
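
To make the comparison concrete, here is a minimal sketch in C of the kind of
multiply-accumulate loop under discussion. The function names dot and
dot_split and the file name dot.c are hypothetical, chosen only for
illustration. The -target and -mcpu values in the comment are the ones Tim
quoted; -O2 and -S are assumptions added to get optimized assembly output.
Whether the compiler picks vmla or vmul+vadd, and whether the product ends up
in a register or in memory, depends on the compiler and the selected CPU, so
the generated assembly has to be inspected rather than assumed.

```c
#include <stddef.h>

/*
 * Hypothetical example. One way to look at the generated code:
 *   clang -target armv7-linux-gnueabihf -mcpu=cortex-a15 -O2 -S dot.c
 * Changing -mcpu (for example to cortex-a8) lets the two lowerings
 * be compared side by side.
 */

/* Multiply-accumulate in a single expression: acc = acc + a[i] * b[i]. */
float dot(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        acc += a[i] * b[i];
    }
    return acc;
}

/* Same computation with the product held in an explicit temporary,
 * mirroring the separate multiply and add steps at the source level. */
float dot_split(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float prod = a[i] * b[i]; /* intermediate result */
        acc = acc + prod;
    }
    return acc;
}
```

Both functions compute the same value at the source level, so any difference
in the emitted instructions or in the number of loads and stores comes from
the code generator and not from the program itself.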

>
> Cheers.
>
> Tim.
>

-- 
With regards,
Suyog Sarda