[llvm-commits] [llvm] r85697 - in /llvm/trunk: lib/Target/ARM/ARMInstrNEON.td test/CodeGen/ARM/fmacs.ll test/CodeGen/ARM/fnmacs.ll test/CodeGen/Thumb2/cross-rc-coalescing-2.ll
Jim Grosbach
grosbach at apple.com
Tue Nov 3 14:07:35 PST 2009
On Nov 2, 2009, at 1:53 AM, David Conrad wrote:
> Thus even without modeling the special behaviour of vmla it's always
> better to use it: it'll always be at least as fast as a separate
> vmul+vadd. This applies to the integer versions as well.
Hi David,
Unfortunately, this turns out not to be the case. The NEON unit will
stall adjacent instructions in the presence of vmla to preserve
in-order retirement. If a RAW hazard is present, the stall is 8
(possibly 7) cycles; otherwise it is 4 cycles. It may be possible to
model this with the recent post-allocation scheduler improvements, but
for now it's better to just avoid these instructions when doing scalar
math.
Since we're using NEON vector instructions for scalar floating point
math on the A8, having vmla intermingled with other NEON instructions
is not uncommon in generated code.
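To make the cost argument concrete, here is a rough sketch that counts
only the extra vmla stall cycles quoted above. It is a hypothetical
model, not how LLVM's scheduler or the A8 pipeline actually works: each
vmla is charged once, based on the instruction that immediately follows
it, 8 cycles if that instruction reads the vmla's destination register
(the RAW case; the hardware may charge 7) and 4 cycles otherwise. The
Insn tuple and register names are illustrative, and the base latencies
of vmul/vadd are deliberately not modeled.

```python
from typing import NamedTuple


class Insn(NamedTuple):
    op: str     # mnemonic, e.g. "vmla", "vmul", "vadd"
    dst: str    # destination register, e.g. "d0"
    srcs: tuple  # source registers (vmla also reads its own dst)


# Stall figures from the message; the RAW case may be 7 rather than 8.
RAW_STALL = 8
NO_HAZARD_STALL = 4


def vmla_stall_cycles(seq):
    """Estimate extra stall cycles caused by vmla in a straight-line
    NEON sequence.

    Hypothetical model: every vmla that is immediately followed by
    another NEON instruction costs one stall, RAW_STALL if that next
    instruction reads the vmla's destination, NO_HAZARD_STALL otherwise.
    A vmla at the end of the sequence has no follower and costs nothing.
    """
    total = 0
    for cur, nxt in zip(seq, seq[1:]):
        if cur.op != "vmla":
            continue
        total += RAW_STALL if cur.dst in nxt.srcs else NO_HAZARD_STALL
    return total


# vmla result consumed by the next instruction: RAW case.
fused = [Insn("vmla", "d0", ("d0", "d1", "d2")),
         Insn("vadd", "d3", ("d0", "d4"))]
# Separate vmul + vadd: no vmla, so no vmla-specific stall.
split = [Insn("vmul", "d5", ("d1", "d2")),
         Insn("vadd", "d0", ("d0", "d5"))]
print(vmla_stall_cycles(fused), vmla_stall_cycles(split))
```

Under this model the fused sequence pays 8 extra cycles while the
vmul+vadd pair pays no vmla penalty (only its own, unmodeled, issue
latency), which is why the fused form is not automatically a win for
scalar code on the A8.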
-Jim