[libc-commits] [PATCH] D114637: [libc] Optimized version of memmove

Andre Vieira via Phabricator via libc-commits libc-commits at lists.llvm.org
Mon Feb 7 08:58:51 PST 2022

avieira added a comment.

In D114637#3231652 <https://reviews.llvm.org/D114637#3231652>, @gchatelet wrote:

> In D114637#3199471 <https://reviews.llvm.org/D114637#3199471>, @avieira wrote:
>> Hi @gchatelet,
>> I'm working on an aarch64 optimised version and I came across something that might be of use to you too. I found that the Repeated implementation of Move was yielding sub-optimal code in the large loop, it would load a _64 element in reverse (last 32-bytes first), I believe this was a side-effect of how it was stacking the loads and stores in opposite order like:
>> Load (src)
>> Load (src + 8)
>> Load (src + 16)
>> Load (src + 32)
>> Store (src + 32)
>> Store (src + 16)
>> ...
> Do you have an idea of why this is yielding suboptimal results?
> In the code I generated for x86-64, using this pyramid shape offset pattern reduced the number of instructions (the compiler could outline the last store across different functions).
> I'm not sure this translated into better performance though, only slightly smaller function size.
>> I found that changing the implementation of the Repeated Move to a for-loop of loads followed by a for-loop of stores from 0 to ElementCount solved it and gave me a speed up on larger memmoves.
> Could you share the resulting asm?
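The two orderings in the quoted exchange can be modelled in a few lines of C++. The model moves a 64-byte block as four 16-byte elements. `Element`, `pyramid_move`, `loop_move` and the `g_store_trace` recorder are hypothetical names for illustration; they are not the LLVM libc `Repeated`/`Move` API. The order of stores in the source also does not by itself guarantee the order in which a compiler emits them. Both variants perform every load before any store, so both are safe when the buffers overlap. They differ only in the order in which the stores are issued.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical 16-byte element, standing in for one AArch64 q register.
struct Element {
  unsigned char bytes[16];
};

// Records the byte offset of each store, in the order the stores are issued.
std::vector<std::size_t> g_store_trace;

inline Element load(const unsigned char *src, std::size_t offset) {
  Element e;
  std::memcpy(e.bytes, src + offset, sizeof(Element));
  return e;
}

inline void store(unsigned char *dst, std::size_t offset, const Element &e) {
  g_store_trace.push_back(offset);
  std::memcpy(dst + offset, e.bytes, sizeof(Element));
}

// "Pyramid" shape: load element I, recurse on the remaining elements, then
// store element I. All loads still precede all stores, but the stores come
// out in reverse offset order (last element first).
template <std::size_t I, std::size_t Count>
void pyramid_move(unsigned char *dst, const unsigned char *src) {
  if constexpr (I < Count) {
    const Element e = load(src, I * sizeof(Element));
    pyramid_move<I + 1, Count>(dst, src);
    store(dst, I * sizeof(Element), e);
  }
}

// Loop shape: one loop of loads, then one loop of stores, both running from
// 0 to Count, so the stores are issued at ascending offsets.
template <std::size_t Count>
void loop_move(unsigned char *dst, const unsigned char *src) {
  std::array<Element, Count> tmp;
  for (std::size_t i = 0; i < Count; ++i)
    tmp[i] = load(src, i * sizeof(Element));
  for (std::size_t i = 0; i < Count; ++i)
    store(dst, i * sizeof(Element), tmp[i]);
}
```

With 64-byte buffers, `pyramid_move<0, 4>` records store offsets 48, 32, 16, 0. `loop_move<4>` records 0, 16, 32, 48.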

Sorry I hadn't seen this earlier; the notification must have fallen through the cracks. I'll share here what I already shared with you. I won't post the full memmove function, as that is a lot of code. The difference in codegen between the code before and after your change is that, in both the forward and backward loops, the stores go from:
 40fb14:       ad011da6        stp     q6, q7, [x13, #32]
 40fb18:       ad0015a4        stp     q4, q5, [x13]

to:
 40fb14:       ad0015a4        stp     q4, q5, [x13]
 40fb18:       ad011da6        stp     q6, q7, [x13, #32]

The latter, which stores the two register pairs in ascending address order, is preferred on AArch64.
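The forward and backward loops mentioned above can be sketched in C++. The sketch assumes a 64-byte block moved with the loop-ordered pattern: all loads first, then stores at ascending offsets. `sketch_memmove` and `move_block` are hypothetical names, and this is not the LLVM libc implementation. The remainder is handled byte by byte, where an optimized version would likely use overlapping head/tail moves. The 16-byte `std::memcpy` calls stand in for q-register loads and stores.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>

// Hypothetical block: four 16-byte elements, i.e. 64 bytes per iteration.
constexpr std::size_t kElementSize = 16;
constexpr std::size_t kElementCount = 4;
constexpr std::size_t kBlockSize = kElementSize * kElementCount;

// Moves one block: all loads first, then stores at ascending offsets, so the
// block is copied correctly even if src and dst overlap.
inline void move_block(unsigned char *dst, const unsigned char *src) {
  unsigned char tmp[kElementCount][kElementSize];
  for (std::size_t i = 0; i < kElementCount; ++i)
    std::memcpy(tmp[i], src + i * kElementSize, kElementSize);
  for (std::size_t i = 0; i < kElementCount; ++i)
    std::memcpy(dst + i * kElementSize, tmp[i], kElementSize);
}

void *sketch_memmove(void *dst_v, const void *src_v, std::size_t n) {
  auto *dst = static_cast<unsigned char *>(dst_v);
  const auto *src = static_cast<const unsigned char *>(src_v);
  if (n == 0 || dst == src) return dst_v;
  // std::less gives a total order over pointers, unlike the built-in '<'.
  if (std::less<const unsigned char *>{}(dst, src)) {
    // Forward loop: walk from the start so unread source bytes ahead of the
    // current block are never overwritten.
    std::size_t i = 0;
    for (; i + kBlockSize <= n; i += kBlockSize) move_block(dst + i, src + i);
    for (; i < n; ++i) dst[i] = src[i];
  } else {
    // Backward loop: walk from the end so unread source bytes behind the
    // current block are never overwritten.
    std::size_t i = n;
    for (; i >= kBlockSize; i -= kBlockSize)
      move_block(dst + i - kBlockSize, src + i - kBlockSize);
    while (i > 0) {
      --i;
      dst[i] = src[i];
    }
  }
  return dst_v;
}
```

In both directions, the stores inside each block go to ascending addresses, which mirrors the preferred `stp q4, q5, [x13]` then `stp q6, q7, [x13, #32]` sequence. Only the order in which blocks are visited differs between the forward and backward loops.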
