[LLVMdev] LLVM ARM VMLA instruction

Wed Dec 18 01:42:37 PST 2013

> I was going through the LLVM code generation for ARM and came across the
> VMLA instruction hazards (floating-point multiply and accumulate). Comparing
> the assembly emitted by LLVM and GCC, I saw that GCC was happily using VMLA
> for floating point while LLVM never used it; instead it emitted a pair of
> VMUL and VADD instructions.

It looks like Clang allows VMLA formation by default, but you need to
be compiling for a CPU that actually supports the instruction (the key
feature is called "VFPv4"). That means one strictly newer than
cortex-a8: cortex-a7 (don't ask), cortex-a9, cortex-a12, cortex-a15 or
krait, I believe. With that I get:

$ cat tmp.c
float foo(float accum, float lhs, float rhs) {
  return accum + lhs*rhs;
}
$ clang -target armv7-linux-gnueabihf -mcpu=cortex-a15 -S -o- -O3 tmp.c
[...]
foo:                                    @ @foo
@ BB#0:                                 @ %entry
        vmla.f32        s0, s1, s2
        bx      lr
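
The same accumulate-a-product pattern shows up in loops, so a dot product
is a reasonable second thing to try. This is only a sketch: the function
name "dot" is made up for illustration, and whether the loop body becomes
vmla.f32 (rather than vmul/vadd or a vectorized form) depends on the
target CPU and optimization flags, so check the -S output yourself.

```c
#include <stddef.h>

/* Hypothetical example, not part of the compile shown above.
 * Each iteration performs accum + lhs*rhs, the same shape as foo(). */
float dot(const float *lhs, const float *rhs, size_t n) {
  float accum = 0.0f;
  for (size_t i = 0; i < n; ++i)
    accum += lhs[i] * rhs[i];   /* candidate for a multiply-accumulate */
  return accum;
}
```

Compiling it with the same command line as tmp.c should show which
instructions were chosen for the loop body.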

Cheers.

Tim.