[libc-commits] [PATCH] D92236: [LIBC] Add optimized memcpy routine for AArch64

Thu Dec 3 10:09:17 PST 2020

avieira added a comment.

Hi Guillaume,

Thanks for the review! I left some further questions/comments as replies. I agree with all other changes I didn't reply to. I'll update the patch once I've figured out the pending queries.

================
Comment at: libc/src/string/CMakeLists.txt:84
   set(MEMCPY_SRC ${LIBC_SOURCE_DIR}/src/string/x86/memcpy.cpp)
+elseif(${LIBC_TARGET_MACHINE} STREQUAL "aarch64")
+  set(LIBC_STRING_TARGET_ARCH "aarch64")
----------------
gchatelet wrote:
> This change is not needed: it will be handled by the `else()` clause.
> We have a special case for x86 to be able to support 32 and 64 bits architectures with the same code.
I'm confused by this one. If I don't do this, then a call to __llvm_libc::memcpy in the benchmark will lead to the 'default' 'src/string/memcpy.cpp' implementation rather than the aarch64 one.

================
Comment at: libc/src/string/aarch64/memcpy.cpp:71-103
+  // Large copy
+  // Copy 16 bytes and then align src to 16-byte alignment.
+  CopyBlock<16>(dst, src);
+
+  // Align to either source or destination depending on target.
+  // Default aligns to source, define 'ALIGN_DST' to align to destination.
+#if ALIGN_DST
----------------
gchatelet wrote:
> I think the code here could be replaced with a call to `CopyAlignedBlocks<64>` from `src/string/memory_utils/memcpy_utils.h` which essentially aligns the destination pointers, copies blocks of 64 bytes and handles the trailing bytes with an overlapping copy.
> 
> If it's important that the alignment is 16B instead of 64B I can change the implementation but since the code is heavily tested I'd be in favor of reusing it as much as possible.
> 
> Note: I believe that the compiler will generate the same code whether using two `CopyBlock<32>` or a single `CopyBlock<64>`.
I am looking at this one, the reason I wrote out this code was because we want to align for source, not destination. I will have a look at the implementation of CopyAlignedBlocks, maybe the answer is to further template this to allow a switch between src and dst alignment?

I am also looking at a potential codegen difference between 2* CopyBlock<32> and CopyBlock<64>, will get back to you next week on this.

================
Comment at: libc/src/string/aarch64/memcpy.cpp:94
+  // signed comparison.
+  int64_t count2 = count - (144 - misalign);
+  while (count2 > 0) {
----------------
gchatelet wrote:
> `const`
Can't make this const as it is updated in the loop below.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D92236/new/

https://reviews.llvm.org/D92236