[libc-commits] [PATCH] D92236: [LIBC] Add optimized memcpy routine for AArch64
Andre Vieira via Phabricator via libc-commits
libc-commits at lists.llvm.org
Wed Jan 20 02:14:24 PST 2021
avieira updated this revision to Diff 317812.
avieira added a comment.
Hi,
So here is an updated version of the optimized memcpy routine for AArch64. It has the same structure as the default memcpy, but picks a different block size and alignment for copies of 128 bytes or more.
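To illustrate the large-copy strategy, here is a minimal sketch of an aligned block copy with a 64-byte block and 16-byte destination alignment. The names copy_fixed and copy_aligned_blocks are hypothetical; this is not the CopyAlignedBlocks helper from memcpy_utils.h, whose details may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: copy exactly N bytes. The compile-time size lets the
// compiler expand the call into plain loads and stores.
template <std::size_t N>
inline void copy_fixed(char *dst, const char *src) {
  std::memcpy(dst, src, N);
}

// Hypothetical sketch of an aligned block copy. Requires count >= Block and
// non-overlapping buffers, as memcpy does.
template <std::size_t Block, std::size_t Align>
void copy_aligned_blocks(char *dst, const char *src, std::size_t count) {
  static_assert(Align <= Block, "the head copy must fit inside one block");
  assert(count >= Block);
  // Head: one unaligned copy of Align bytes covers everything up to the
  // first Align-aligned destination address after dst.
  copy_fixed<Align>(dst, src);
  // Distance from dst to that aligned address, always in [1, Align].
  const std::size_t misalign = reinterpret_cast<std::uintptr_t>(dst) % Align;
  std::size_t offset = Align - misalign;
  // Body: Block-sized copies whose stores start on an Align boundary, for as
  // long as a full block ends before the end of the buffer.
  while (offset + Block < count) {
    copy_fixed<Block>(dst + offset, src + offset);
    offset += Block;
  }
  // Tail: one copy of the last Block bytes, which may overlap the body.
  copy_fixed<Block>(dst + count - Block, src + count - Block);
}
```

A call such as copy_aligned_blocks<64, 16>(dst, src, count) would then correspond to the configuration described above, under the assumptions stated in the comments.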
I also disabled tail merging, as I found it was leading to worse code. This new memcpy seems to show improvements across the board for both the sweep and distribution benchmarks.
I am continuing to investigate, using the new benchmarks, a better organization of the copies smaller than 128 bytes, as I had before. Using the same code I had before, I am seeing an improvement in Uniform1024 (a new uniform distribution I added for sizes 0-1024). I also see an improvement in Memcpy Distributions A, M, Q and U, but a regression in B, L, S and W. For distribution D, the optimized version beats the older version but shows a regression compared to the version in this patch.
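For context, the small-size path in this patch dispatches on size ranges and copies each range with two possibly overlapping fixed-size blocks. The sketch below shows that idea with hypothetical names (copy_fixed, copy_overlap, copy_small); it is not the CopyBlockOverlap helper from memcpy_utils.h, only an approximation of the technique.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: copy exactly N bytes.
template <std::size_t N>
inline void copy_fixed(char *dst, const char *src) {
  std::memcpy(dst, src, N);
}

// Hypothetical helper: copy `count` bytes, N <= count <= 2 * N, using one
// block at the start and one block at the end. The two blocks overlap
// whenever count < 2 * N, so no per-byte loop is needed.
template <std::size_t N>
inline void copy_overlap(char *dst, const char *src, std::size_t count) {
  assert(count >= N && count <= 2 * N);
  copy_fixed<N>(dst, src);
  copy_fixed<N>(dst + count - N, src + count - N);
}

// Hypothetical dispatcher for sizes below 128 bytes. Smaller sizes are
// tested first so that they pay for the fewest branches.
inline void copy_small(char *dst, const char *src, std::size_t count) {
  assert(count < 128);
  if (count == 0)
    return;
  if (count == 1)
    return copy_fixed<1>(dst, src);
  if (count == 2)
    return copy_fixed<2>(dst, src);
  if (count == 3)
    return copy_fixed<3>(dst, src);
  if (count == 4)
    return copy_fixed<4>(dst, src);
  if (count < 8)
    return copy_overlap<4>(dst, src, count);
  if (count < 16)
    return copy_overlap<8>(dst, src, count);
  if (count < 32)
    return copy_overlap<16>(dst, src, count);
  if (count < 64)
    return copy_overlap<32>(dst, src, count);
  return copy_overlap<64>(dst, src, count);
}
```

Reorganizing this path mostly means changing which size ranges are tested first and which block sizes they map to, which is why different size distributions can move in opposite directions.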
I'll spend a few extra cycles trying to see if I can find a sweet spot, but I might leave it like this.
Is this OK for main?
Also I have two patches downstream for:
1. Uniform1024 distribution, a uniform distribution for sizes 0-1024
2. Options to define a Sweep 'min size' and 'step'.
Let me know if you are interested in either of these.
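As a rough illustration of what these two options amount to, the sketch below generates benchmark sizes both ways. The function names, the seed parameter, and the inclusive upper bound of 1024 are assumptions made for this example and are not taken from the downstream patches or the libc benchmark framework.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical: draw n copy sizes uniformly from [0, 1024]. Whether 1024
// itself is included is an assumption of this sketch.
std::vector<std::size_t> uniform1024_sizes(std::size_t n, unsigned seed) {
  std::mt19937 gen(seed);
  std::uniform_int_distribution<std::size_t> dist(0, 1024);
  std::vector<std::size_t> sizes;
  sizes.reserve(n);
  for (std::size_t i = 0; i < n; ++i)
    sizes.push_back(dist(gen));
  return sizes;
}

// Hypothetical: sizes visited by a sweep that starts at min_size and
// advances by step, without exceeding max_size.
std::vector<std::size_t> sweep_sizes(std::size_t min_size,
                                     std::size_t max_size,
                                     std::size_t step) {
  assert(step > 0);
  std::vector<std::size_t> sizes;
  for (std::size_t s = min_size; s <= max_size; s += step)
    sizes.push_back(s);
  return sizes;
}
```

For example, sweep_sizes(128, 4096, 64) would restrict a sweep to the large-copy path only, skipping the sizes below 128 bytes.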
Kind regards,
Andre
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D92236/new/
https://reviews.llvm.org/D92236
Files:
libc/src/string/CMakeLists.txt
libc/src/string/aarch64/CMakeLists.txt
libc/src/string/aarch64/memcpy.cpp
Index: libc/src/string/aarch64/memcpy.cpp
===================================================================
--- /dev/null
+++ libc/src/string/aarch64/memcpy.cpp
@@ -0,0 +1,67 @@
+//===-- Implementation of memcpy ------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "src/string/memcpy.h"
+#include "src/__support/common.h"
+#include "src/string/memory_utils/memcpy_utils.h"
+
+namespace __llvm_libc {
+
+// Design rationale
+// ================
+//
+// Using a profiler to observe size distributions for calls into libc
+// functions, it was found most operations act on a small number of bytes.
+// This makes it important to favor small sizes.
+//
+// We have used __builtin_expect to tell the compiler to favour lower sizes as
+// that will reduce the branching overhead where that would hurt most
+// proportional to total cost of copying.
+//
+// The function is written in C++ for several reasons:
+// - The compiler can __see__ the code, this is useful when performing Profile
+//   Guided Optimization as the optimized code can take advantage of branching
+//   probabilities.
+// - It also allows for easier customization and favors testing multiple
+//   implementation parameters.
+// - As compilers and processors get better, the generated code is improved
+//   with little change on the code side.
+static void memcpy_aarch64(char *__restrict dst, const char *__restrict src,
+                           size_t count) {
+  if (count == 0)
+    return;
+  if (count == 1)
+    return CopyBlock<1>(dst, src);
+  if (count == 2)
+    return CopyBlock<2>(dst, src);
+  if (count == 3)
+    return CopyBlock<3>(dst, src);
+  if (count == 4)
+    return CopyBlock<4>(dst, src);
+  if (count < 8)
+    return CopyBlockOverlap<4>(dst, src, count);
+  if (count < 16)
+    return CopyBlockOverlap<8>(dst, src, count);
+  if (count < 32)
+    return CopyBlockOverlap<16>(dst, src, count);
+  if (count < 64)
+    return CopyBlockOverlap<32>(dst, src, count);
+  if (count < 128)
+    return CopyBlockOverlap<64>(dst, src, count);
+  return CopyAlignedBlocks<64,16>(dst, src, count);
+}
+
+LLVM_LIBC_FUNCTION(void *, memcpy,
+                   (void *__restrict dst, const void *__restrict src,
+                    size_t size)) {
+  memcpy_aarch64(reinterpret_cast<char *>(dst),
+                 reinterpret_cast<const char *>(src), size);
+  return dst;
+}
+
+} // namespace __llvm_libc
Index: libc/src/string/aarch64/CMakeLists.txt
===================================================================
--- /dev/null
+++ libc/src/string/aarch64/CMakeLists.txt
@@ -0,0 +1 @@
+add_memcpy("memcpy_${LIBC_TARGET_MACHINE}")
Index: libc/src/string/CMakeLists.txt
===================================================================
--- libc/src/string/CMakeLists.txt
+++ libc/src/string/CMakeLists.txt
@@ -215,6 +215,11 @@
 if(${LIBC_TARGET_MACHINE} STREQUAL "x86_64")
   set(LIBC_STRING_TARGET_ARCH "x86")
   set(MEMCPY_SRC ${LIBC_SOURCE_DIR}/src/string/x86/memcpy.cpp)
+elseif(${LIBC_TARGET_MACHINE} STREQUAL "aarch64")
+  set(LIBC_STRING_TARGET_ARCH "aarch64")
+  set(MEMCPY_SRC ${LIBC_SOURCE_DIR}/src/string/aarch64/memcpy.cpp)
+  # Disable tail merging as it leads to lower performance
+  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -mllvm --tail-merge-threshold=0")
 else()
   set(LIBC_STRING_TARGET_ARCH ${LIBC_TARGET_MACHINE})
   set(MEMCPY_SRC ${LIBC_SOURCE_DIR}/src/string/memcpy.cpp)