[libc-commits] [PATCH] D92236: [LIBC] Add optimized memcpy routine for AArch64
Andre Vieira via Phabricator via libc-commits
libc-commits at lists.llvm.org
Thu Dec 3 10:09:17 PST 2020
avieira added a comment.
Hi Guillaume,
Thanks for the review! I left some further questions/comments as replies. I agree with all other changes I didn't reply to. I'll update the patch once I've figured out the pending queries.
================
Comment at: libc/src/string/CMakeLists.txt:84
set(MEMCPY_SRC ${LIBC_SOURCE_DIR}/src/string/x86/memcpy.cpp)
+elseif(${LIBC_TARGET_MACHINE} STREQUAL "aarch64")
+ set(LIBC_STRING_TARGET_ARCH "aarch64")
----------------
gchatelet wrote:
> This change is not needed: it will be handled by the `else()` clause.
> We have a special case for x86 to be able to support 32 and 64 bits architectures with the same code.
I'm confused by this one. If I don't do this, then a call to __llvm_libc::memcpy in the benchmark will lead to the 'default' 'src/string/memcpy.cpp' implementation rather than the aarch64 one.
================
Comment at: libc/src/string/aarch64/memcpy.cpp:71-103
+ // Large copy
+ // Copy 16 bytes and then align src to 16-byte alignment.
+ CopyBlock<16>(dst, src);
+
+ // Align to either source or destination depending on target.
+ // Default aligns to source, define 'ALIGN_DST' to align to destination.
+#if ALIGN_DST
----------------
gchatelet wrote:
> I think the code here could be replaced with a call to `CopyAlignedBlocks<64>` from `src/string/memory_utils/memcpy_utils.h` which essentially aligns the destination pointers, copies blocks of 64 bytes and handles the trailing bytes with an overlapping copy.
>
> If it's important that the alignment is 16B instead of 64B I can change the implementation but since the code is heavily tested I'd be in favor of reusing it as much as possible.
>
> Note: I believe that the compiler will generate the same code whether using two `CopyBlock<32>` or a single `CopyBlock<64>`.
I am looking at this one, the reason I wrote out this code was because we want to align for source, not destination. I will have a look at the implementation of CopyAlignedBlocks, maybe the answer is to further template this to allow a switch between src and dst alignment?
I am also looking at a potential codegen difference between 2* CopyBlock<32> and CopyBlock<64>, will get back to you next week on this.
================
Comment at: libc/src/string/aarch64/memcpy.cpp:94
+ // signed comparison.
+ int64_t count2 = count - (144 - misalign);
+ while (count2 > 0) {
----------------
gchatelet wrote:
> `const`
Can't make this const as it is updated in the loop below.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D92236/new/
https://reviews.llvm.org/D92236
More information about the libc-commits
mailing list