[llvm-commits] [llvm] r85697 - in /llvm/trunk: lib/Target/ARM/ARMInstrNEON.td test/CodeGen/ARM/fmacs.ll test/CodeGen/ARM/fnmacs.ll test/CodeGen/Thumb2/cross-rc-coalescing-2.ll
Evan Cheng
evan.cheng at apple.com
Sun Nov 1 10:06:27 PST 2009
On Oct 31, 2009, at 9:07 PM, Anton Korobeynikov wrote:
> Hello, Jim
>
>> vml[as].f32 cause stalls in following advanced SIMD instructions.
>> Avoid using them for scalar floating point operations for now.
> Basically, every VFP instruction causes a stall for the adjacent NEON
> instruction. The stall can be up to 20 cycles long and we don't have
> any proper way to model such stalls during scheduling.
This is not quite the same issue. The problem is that a vmla.f32
followed by a vadd.f32 or vmul.f32 will stall for 4 cycles. If the
following instruction has a RAW dependency on the vmla.f32 and is not
an f32 instruction, the stall is 8 cycles. As far as I know, this
problem is specific to vmla and vmls.
On the other hand, a vmla.f32 followed by another vmla.f32 is just
fine, and it is faster than vmul + vadd. I agree we should try to solve
this better, perhaps by expanding it before or during schedule2.
Evan
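
The stall rules described above can be written down as a small lookup,
which may help when reasoning about instruction order by hand. This is
only a sketch in Python, not LLVM code: the Inst type, MAC_OPS and
stall_after_mac are hypothetical names made up for illustration, and
the cycle counts are simply the ones quoted in this message. Any case
the message does not mention is assumed to cost 0 cycles.

```python
from typing import NamedTuple, Tuple


class Inst(NamedTuple):
    """Hypothetical, minimal description of one instruction."""
    opcode: str             # e.g. "vmla.f32"
    dest: str               # destination register, e.g. "d0"
    srcs: Tuple[str, ...]   # source registers


# The two multiply-accumulate opcodes the message singles out.
MAC_OPS = {"vmla.f32", "vmls.f32"}


def stall_after_mac(prev: Inst, nxt: Inst) -> int:
    """Return the stall (in cycles) that nxt sees when issued after prev.

    Encodes only the cases stated in the message; everything else is
    assumed (not claimed) to be stall-free.
    """
    if prev.opcode not in MAC_OPS:
        return 0
    if nxt.opcode in MAC_OPS:
        # vmla followed by another vmla is reported to be fine.
        return 0
    raw_dependent = prev.dest in nxt.srcs
    is_f32 = nxt.opcode.endswith(".f32")
    if raw_dependent and not is_f32:
        # RAW-dependent consumer that is not an f32 instruction.
        return 8
    if nxt.opcode in {"vadd.f32", "vmul.f32"}:
        return 4
    return 0  # assumption: case not covered by the message


if __name__ == "__main__":
    mac = Inst("vmla.f32", "d0", ("d1", "d2"))
    print(stall_after_mac(mac, Inst("vadd.f32", "d3", ("d0", "d4"))))
    print(stall_after_mac(mac, Inst("vmla.f32", "d5", ("d6", "d7"))))
```

A real fix would have to live in the scheduler or in a late expansion
pass; the table above only restates the observed behaviour.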
>
> However, according to the ARM docs, NEON vmla.f32 is cheaper than a
> pair of vmul + vadd. Are you sure there is no assembler bug here?
>
> --
> With best regards, Anton Korobeynikov
> Faculty of Mathematics and Mechanics, Saint Petersburg State
> University