[LLVMdev] speed up memcpy intrinsic using ARM Neon registers

Bob Wilson bob.wilson at apple.com
Wed Nov 11 09:20:24 PST 2009


On Nov 11, 2009, at 3:27 AM, Rodolph Perfetta wrote:
>
> If you know about the alignment, maybe use structured load/store
> (vst1.64/vld1.64 {dn-dm}). You may also want to work on whole cache lines
> (64 bytes on A8). You can find more in this discussion:
> http://groups.google.com/group/beagleboard/browse_thread/thread/12c7bd415fbc0993/e382202f1a92b0f8?lnk=gst&q=memcpy&pli=1
>
>> Even if it's not faster, it's still a code size win which is also
>> important.
>
> Yes but NEON will drive up your power consumption, so if you are not faster
> you will drain your battery faster (assuming you care of course).
>
> In general we wouldn't recommend writing memcpy using NEON unless you can
> detect the exact core you will be running on: on A9 NEON will not give you
> any speed up, you'll just end up using more power. NEON is a SIMD engine.
>
> If one wanted to write memcpy on A9 we would recommend something like:
> * do not use NEON
> * use PLD (3-6 cache lines ahead, to be tuned)
> * ldm/stm whole cache lines (32 bytes on A9)
> * align destination
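
The A9 recommendations quoted above can be sketched in C roughly as follows.
This is an illustration only, not code from this thread. The function name
copy_by_cache_lines and the constants are hypothetical; PREFETCH_LINES = 4 is
an arbitrary pick from the suggested 3-6 range and would need tuning. The
GCC/Clang builtin __builtin_prefetch stands in for PLD, and the fixed-size
32-byte memcpy stands in for an ldm/stm pair, assuming the compiler expands it
inline.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed values: 32-byte lines as quoted for A9; prefetch distance untuned. */
enum { LINE = 32, PREFETCH_LINES = 4 };

/* Hypothetical sketch: align the destination, then copy whole cache lines
 * while prefetching the source a few lines ahead. No NEON is used. */
static void *copy_by_cache_lines(void *dst, const void *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));

    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Head: copy bytes until the destination is line-aligned. */
    size_t head = (size_t)(-(uintptr_t)d & (uintptr_t)(LINE - 1));
    if (head > n)
        head = n;
    memcpy(d, s, head);
    d += head;
    s += head;
    n -= head;

    /* Body: one whole line per iteration. The prefetch is only a hint,
     * so an address past the end of the source is harmless. */
    while (n >= LINE) {
        __builtin_prefetch((const void *)((uintptr_t)s + PREFETCH_LINES * LINE));
        memcpy(d, s, LINE);
        d += LINE;
        s += LINE;
        n -= LINE;
    }

    /* Tail: whatever is left, fewer than LINE bytes. */
    memcpy(d, s, n);
    return dst;
}
```

Only the destination is aligned here, as in the quoted list; the source may
stay unaligned, which a real implementation would have to account for when
choosing load instructions.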

Thanks, Rodolph.  That is very helpful.

Can you comment on David Conrad's message in this thread regarding a  
~20 cycle penalty for an ARM store following a NEON store to the same  
16-byte block?  If the memcpy size is not a multiple of 8, we need  
some ARM load/store instructions to copy the tail end of it.  The  
context here is LLVM generating inline code for small copies, so if  
there is a penalty like that, it is probably not worthwhile to use  
NEON unless the alignment shows that the tail will be in a separate
16-byte block.  (And what's up with the 16-byte divisions?  I thought the
cache lines are 64 bytes....)
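
To make the alignment condition concrete, here is a small C sketch of the
check described above. It assumes the copy is done in 8-byte NEON chunks
followed by ARM stores for the remainder, and that the reported penalty
applies per 16-byte block; both figures come from this thread and are not
verified here. The helper name tail_shares_block_with_neon is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed sizes: 8-byte NEON stores, 16-byte penalty granule (as reported). */
enum { NEON_CHUNK = 8, PENALTY_BLOCK = 16 };

/* Returns true if the ARM tail stores would touch the same 16-byte block
 * as the last NEON store, i.e. the case where the penalty could apply. */
static bool tail_shares_block_with_neon(uintptr_t dst, size_t len)
{
    assert(len <= UINTPTR_MAX - dst); /* address range must not wrap */

    size_t tail_bytes = len % NEON_CHUNK;
    size_t neon_bytes = len - tail_bytes;

    /* Nothing to compare if either part of the copy is empty. */
    if (neon_bytes == 0 || tail_bytes == 0)
        return false;

    uintptr_t last_neon_byte = dst + neon_bytes - 1;
    uintptr_t first_tail_byte = dst + neon_bytes;

    /* The two parts are adjacent, so they share a block unless the
     * boundary between them falls exactly on a 16-byte boundary. */
    return last_neon_byte / PENALTY_BLOCK == first_tail_byte / PENALTY_BLOCK;
}
```

For a 12-byte copy to a 16-byte-aligned destination the tail lands in the
same block as the NEON store, while a 20-byte copy to the same destination
puts the tail in the next block. In a compiler the destination address is not
known, so the same test would have to be made on the known alignment and
size rather than on a concrete address.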