[llvm-commits] [llvm] r85697 - in /llvm/trunk: lib/Target/ARM/ARMInstrNEON.td test/CodeGen/ARM/fmacs.ll test/CodeGen/ARM/fnmacs.ll test/CodeGen/Thumb2/cross-rc-coalescing-2.ll

Sun Nov 1 10:06:27 PST 2009

On Oct 31, 2009, at 9:07 PM, Anton Korobeynikov wrote:

> Hello, Jim
>
>> vml[as].f32 cause stalls in following advanced SIMD instructions. Avoid
>> using them for scalar floating point operations for now.
> Basically, every vfp instruction causes a stall for the adjacent neon
> instruction. The stall can be up to 20 cycles long, and we don't have
> any proper way to model such stalls during scheduling.

This is not quite the same issue. The problem is that a vmla.f32 followed
by a vadd.f32 or vmul.f32 will stall for 4 cycles. If the following
instruction is RAW-dependent on the vmla.f32 and is not an f32
instruction, it's an 8-cycle stall. As far as I know, this problem is
specific to vmla and vmls.
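
The two cases above can be written down as a small lookup. This is only a
minimal sketch: the 4- and 8-cycle figures come from this message, while the
function name, the opcode strings, and the choice to return 0 for every case
the message does not mention are assumptions made for illustration. It is not
how the LLVM scheduler models anything.

```python
# Illustrative stall model; names and the "0 for unmentioned cases" default
# are assumptions, only the 4- and 8-cycle figures come from the message.
VMLX = {"vmla.f32", "vmls.f32"}
F32_ARITH = {"vadd.f32", "vmul.f32"}


def stall_after_vmlx(prev_op, next_op, raw_dependent=False):
    """Return the assumed stall, in cycles, that next_op sees after prev_op."""
    if prev_op not in VMLX:
        return 0  # this model only covers stalls caused by vmla/vmls
    if next_op in VMLX:
        return 0  # back-to-back vmla/vmls is described in this message as fine
    if next_op in F32_ARITH:
        return 4  # vmla.f32 followed by vadd.f32 or vmul.f32
    if raw_dependent and not next_op.endswith(".f32"):
        return 8  # RAW-dependent follower that is not an f32 instruction
    return 0  # assumption: anything else is treated as stall-free
```

For example, `stall_after_vmlx("vmla.f32", "vadd.f32")` gives 4, and a
RAW-dependent non-f32 follower such as `"vadd.i32"` gives 8.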

On the other hand, a vmla.f32 followed by another vmla.f32 is just fine,
and it is faster than vmul + vadd. I agree we should try to solve this
better, perhaps by expanding it before or during schedule2.
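
As a rough sketch of what such an expansion could look like, the following
walks a straight-line instruction list and splits a vmla.f32 / vmls.f32 into
vmul.f32 plus vadd.f32 / vsub.f32 only when the instruction right after it is
a vadd.f32 or vmul.f32. Everything here is hypothetical: the `Inst` tuple, the
`expand_vmlx` name, and the scratch register `s_tmp` (assumed to be free) are
invented for illustration, and nothing in it reflects the actual LLVM passes.

```python
from typing import List, NamedTuple, Tuple


class Inst(NamedTuple):
    """Hypothetical instruction record: opcode, destination, sources."""
    op: str
    dst: str
    srcs: Tuple[str, ...]


# vmla computes d = d + a*b, vmls computes d = d - a*b.
EXPANSION = {"vmla.f32": "vadd.f32", "vmls.f32": "vsub.f32"}
STALLING_FOLLOWERS = {"vadd.f32", "vmul.f32"}


def expand_vmlx(insts: List[Inst], tmp: str = "s_tmp") -> List[Inst]:
    """Expand vmla/vmls that would be followed by vadd.f32 or vmul.f32.

    The list is walked back to front so each decision sees the instruction
    that will actually follow after later expansions have been applied.
    `tmp` is a scratch register assumed to be unused by the input.
    """
    rev: List[Inst] = []  # output, built in reverse program order
    for inst in reversed(insts):
        follower = rev[-1].op if rev else None
        if inst.op in EXPANSION and follower in STALLING_FOLLOWERS:
            a, b = inst.srcs
            # Appended in reverse: the add/sub first, then the multiply.
            rev.append(Inst(EXPANSION[inst.op], inst.dst, (inst.dst, tmp)))
            rev.append(Inst("vmul.f32", tmp, (a, b)))
        else:
            rev.append(inst)
    return rev[::-1]
```

A vmla.f32 followed by another vmla.f32 is left alone, matching the
observation that this pairing is fine. Note that one expansion can cascade:
the vmul.f32 it introduces may itself become a stalling follower for an
earlier vmla.f32.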

Evan

>
> However, according to the ARM docs, neon vmla.f32 is cheaper than a pair
> of vmul + vadd. Are you sure there is no, e.g., assembler bug here?
>
> -- 
> With best regards, Anton Korobeynikov
> Faculty of Mathematics and Mechanics, Saint Petersburg State University