[LLVMdev] LLVM ARM VMLA instruction

Tim Northover t.p.northover at gmail.com
Fri Dec 20 05:00:34 PST 2013


Hi Suyog,

> I tested it on A15. I don't have access to A8 right now, but I intend to test
> it for A8 as well.

That's extremely dodgy; the two processors are very different.

> I don't think I
> will get A8 hardware soon. Can someone please check it on A8 hardware as
> well (sorry for the trouble)?

I've got a BeagleBone hanging around, and tested Clang against a
hacked version of itself (without the VMLx disabling on Cortex-A8).
The results (for matmul_f64_4x4, -O3 -mcpu=cortex-a8) were:
1. vfpv3-d16, stock Clang: 96.2s
2. vfpv3-d16, Clang + vmla: 95.7s
3. vfpv3, stock Clang: 82.9s
4. vfpv3, Clang + vmla: 81.1s

Worth investigating more but, as the others have said, this is nowhere
near enough data on its own, especially since Evan clearly did some
benchmarking himself before specifically disabling the vmla formation.
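
For a sense of scale, here is a small sketch that turns the four timings
above into relative improvements. The function and constant names are made
up for illustration; only the numbers come from the run above.

```cpp
// Relative improvement of the patched compiler over stock Clang, in percent.
// Inputs are wall-clock times in seconds; lower is better.
// (improvementPercent is a hypothetical helper, not part of any test suite.)
constexpr double improvementPercent(double stockSeconds, double patchedSeconds) {
  return (stockSeconds - patchedSeconds) / stockSeconds * 100.0;
}

// Timings copied from the results list above.
constexpr double kImprovementD16 = improvementPercent(96.2, 95.7);   // vfpv3-d16
constexpr double kImprovementVfpv3 = improvementPercent(82.9, 81.1); // vfpv3
```

That works out to roughly 0.5% for vfpv3-d16 and roughly 2.2% for vfpv3,
from a single benchmark on a single board.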

> Also, I will
> be glad to know the code place where we start differentiating between
> cortex-a8 and cortex-a15 for code generation.

Probably most relevant is the combination of features given to each
processor in lib/Target/ARM/ARM.td. This vmul/vmla difference comes
from "FeatureHasSlowFPVMLx", via ARMSubtarget.h's useFPVMLx and
ARMInstrInfo.td's UseFPVMLx.
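
As a rough picture of how that chain fits together, here is a simplified,
self-contained model. It is not the LLVM source: SubtargetModel, the two
per-CPU constants, MulAddLowering and lowerMulAdd are all hypothetical names,
and the per-CPU flag values are assumptions based only on the behaviour
described in this thread (VMLx formation disabled on Cortex-A8, enabled on
Cortex-A15).

```cpp
// Simplified stand-in for the subtarget; NOT the real ARMSubtarget class.
struct SubtargetModel {
  // Assumed to be set when the CPU's feature list in ARM.td includes
  // FeatureHasSlowFPVMLx.
  bool SlowFPVMLx;

  // Mirrors the idea behind ARMSubtarget.h's useFPVMLx: multiply-accumulate
  // formation is allowed only when the CPU is not flagged as slow.
  constexpr bool useFPVMLx() const { return !SlowFPVMLx; }
};

// Hypothetical per-CPU settings, for illustration only.
constexpr SubtargetModel CortexA8Model{true};
constexpr SubtargetModel CortexA15Model{false};

// The two ways a floating-point a*b + c could be selected in this model.
enum class MulAddLowering { SeparateVMulVAdd, VMLA };

// Stands in for instruction patterns guarded by the UseFPVMLx predicate.
constexpr MulAddLowering lowerMulAdd(const SubtargetModel &ST) {
  return ST.useFPVMLx() ? MulAddLowering::VMLA
                        : MulAddLowering::SeparateVMulVAdd;
}
```

In this model lowerMulAdd(CortexA8Model) gives SeparateVMulVAdd and
lowerMulAdd(CortexA15Model) gives VMLA, which is the difference you saw.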

Cheers.

Tim.


