[llvm-commits] LLVM patch to support ARM fused multiply add/subtract instructions

Mon Jan 23 18:35:42 PST 2012

Hi Anton,

We have internal micro-benchmarks for matrix operations (multiply, transpose, etc.), and we see up to a 60% improvement when using the fused multiply add/subtract instructions.

Regarding accuracy, my understanding is that the Qualcomm implementation produces the result specified by IEEE 754-2008. The multiplication is performed without any loss of accuracy (i.e., no intermediate rounding), and then the addition/subtraction is applied. Only the final result is rounded, according to the rounding mode configured in the VFP unit.

Thanks for integrating the change. I hope others find it useful.

Ana.

-----Original Message-----
From: Anton Korobeynikov [mailto:anton at korobeynikov.info] 
Sent: Sunday, January 22, 2012 4:12 AM
To: Ana Pazos
Cc: llvm-commits at cs.uiuc.edu; rajav at codeaurora.org
Subject: Re: [llvm-commits] LLVM patch to support ARM fused multiply add/subtract instructions

Hi Ana,

Committed as r148658.

> Some ARMv7-A processor implementations (e.g., Qualcomm 8960, ARM Cortex-A5)
> support fused multiply add/subtract instructions (VFMA/VFMS) which have
> lower latency and greater accuracy than the chained multiply add/subtract
> instructions (VMLA/VMLS).
Just curious: what are the performance/accuracy wins for FMA here?

-- 
With best regards, Anton Korobeynikov
Faculty of Mathematics and Mechanics, Saint Petersburg State University