[libc-commits] [PATCH] D92236: [LIBC] Add optimized memcpy routine for AArch64

Tue Dec 1 04:20:42 PST 2020

gchatelet added inline comments.

================
Comment at: libc/src/string/CMakeLists.txt:84
   set(MEMCPY_SRC ${LIBC_SOURCE_DIR}/src/string/x86/memcpy.cpp)
+elseif(${LIBC_TARGET_MACHINE} STREQUAL "aarch64")
+  set(LIBC_STRING_TARGET_ARCH "aarch64")
----------------
This change is not needed: it will be handled by the `else()` clause.
We have a special case for x86 to be able to support 32 and 64 bits architectures with the same code.

================
Comment at: libc/src/string/aarch64/memcpy.cpp:38
+  const char *src_m = src + count;
+  if (__builtin_expect(count < 128, 1)) {
+    if (__builtin_expect(count > 32, 0)) {
----------------
Please use the `likely` and `unlikely` macros from `src/__support/common.h.def`.
Here and below.

================
Comment at: libc/src/string/aarch64/memcpy.cpp:40-41
+    if (__builtin_expect(count > 32, 0)) {
+      CopyBlock<32>(dst, src);
+      CopyBlock<32>(dst_m - 32, src_m - 32);
+      if (__builtin_expect(count > 64, 0)) {
----------------
Pairs of CopyBlock used to produce an overlapping copy can make use `CopyBlockOverlap` from `src/string/memory_utils/memcpy_utils.h` instead.
Simply replace there two lines with `CopyBlockOverlap<32>(dst, src, count)`

================
Comment at: libc/src/string/aarch64/memcpy.cpp:51
+      if (__builtin_expect((count & 0x8) != 0, 0)) {
+        CopyBlock<8>(dst, src);
+        return CopyBlock<8>(dst_m - 8, src_m - 8);
----------------
ditto `CopyBlockOverlap`

================
Comment at: libc/src/string/aarch64/memcpy.cpp:54
+      } else if (__builtin_expect((count & 0x4) != 0, 0)) {
+        CopyBlock<4>(dst, src);
+        return CopyBlock<4>(dst_m - 4, src_m - 4);
----------------
ditto `CopyBlockOverlap`

================
Comment at: libc/src/string/aarch64/memcpy.cpp:67
+    } else {
+      CopyBlock<16>(dst, src);
+      return CopyBlock<16>(dst_m - 16, src_m - 16);
----------------
ditto `CopyBlockOverlap`

================
Comment at: libc/src/string/aarch64/memcpy.cpp:71-103
+  // Large copy
+  // Copy 16 bytes and then align src to 16-byte alignment.
+  CopyBlock<16>(dst, src);
+
+  // Align to either source or destination depending on target.
+  // Default aligns to source, define 'ALIGN_DST' to align to destination.
+#if ALIGN_DST
----------------
I think the code here could be replaced with a call to `CopyAlignedBlocks<64>` from `src/string/memory_utils/memcpy_utils.h` which essentially aligns the destination pointers, copies blocks of 64 bytes and handles the trailing bytes with an overlapping copy.

If it's important that the alignment is 16B instead of 64B I can change the implementation but since the code is heavily tested I'd be in favor of reusing it as much as possible.

Note: I believe that the compiler will generate the same code whether using two `CopyBlock<32>` or a single `CopyBlock<64>`.

================
Comment at: libc/src/string/aarch64/memcpy.cpp:82
+#endif
+  size_t misalign = ((intptr_t)ALIGN_SRCDST) % 16;
+  dst -= misalign;
----------------
`const`

================
Comment at: libc/src/string/aarch64/memcpy.cpp:94
+  // signed comparison.
+  int64_t count2 = count - (144 - misalign);
+  while (count2 > 0) {
----------------
`const`

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D92236/new/

https://reviews.llvm.org/D92236