[llvm-commits] [llvm] r85697 - in /llvm/trunk: lib/Target/ARM/ARMInstrNEON.td test/CodeGen/ARM/fmacs.ll test/CodeGen/ARM/fnmacs.ll test/CodeGen/Thumb2/cross-rc-coalescing-2.ll

Tue Nov 3 14:07:35 PST 2009

On Nov 2, 2009, at 1:53 AM, David Conrad wrote:

> Thus even without modeling the special behaviour of vmla it's always
> better to use it: it'll always be at least as fast as a separate
> vmul+vadd. This applies to the integer versions as well.

Hi David,

Unfortunately, this turns out not to be the case. The NEON unit will
stall adjacent instructions in the presence of vmla to preserve
in-order retirement. If a RAW hazard is present, the stall is 8
(possibly 7) cycles; otherwise it is 4 cycles. It may be possible to
model this with the recent post-allocation scheduler improvements, but
for now it's better to just avoid the instructions when doing scalar
math.
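
As a rough illustration of the numbers above, here is a minimal sketch
that only tallies the stall cycles charged to instructions adjacent to
a vmla. The function names are hypothetical, the constants are just the
figures quoted in this message (taking the upper value of "8 (possibly
7)"), and nothing here models the rest of the pipeline.

```python
# Stall figures as quoted in this message; not taken from any other source.
STALL_RAW_HAZARD = 8  # "8 (possibly 7) cycles": the upper value is assumed
STALL_NO_HAZARD = 4


def vmla_stall_cycles(raw_hazard: bool) -> int:
    """Stall charged to an instruction adjacent to a vmla (hypothetical helper)."""
    return STALL_RAW_HAZARD if raw_hazard else STALL_NO_HAZARD


def total_stall(adjacent_hazards) -> int:
    """Sum the stalls for a run of vmla instructions.

    Each entry says whether the instruction next to that vmla has a
    RAW dependence on its result.
    """
    return sum(vmla_stall_cycles(h) for h in adjacent_hazards)
```

For example, total_stall([True, False, False]) adds one RAW-hazard
stall and two plain stalls. Whether that beats a separate vmul+vadd
depends on latencies this sketch does not model.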

Since we're using NEON vector instructions for scalar floating-point
math on the A8, having vmla intermingled with other NEON instructions
is not uncommon in generated code.

-Jim