[libc-commits] [libc] [libc][x86] Add Non-temporal code path for large memcpy (PR #187108)

Alexey Samsonov via libc-commits libc-commits at lists.llvm.org
Tue Mar 17 14:02:13 PDT 2026


================
@@ -143,14 +152,33 @@ inline_memcpy_x86_avx_ge64_sw_prefetching(Ptr __restrict dst,
   // - we prefetched cachelines at 'src + 64', 'src + 128', and 'src + 196'
   // - 'dst' is 32B aligned,
   // - count >= 128.
-  while (offset + K_THREE_CACHELINES + 64 <= count) {
-    // Three cache lines at a time.
-    inline_memcpy_prefetch(dst, src, offset + K_ONE_CACHELINE);
-    inline_memcpy_prefetch(dst, src, offset + K_TWO_CACHELINES);
-    inline_memcpy_prefetch(dst, src, offset + K_THREE_CACHELINES);
-    // Copy one cache line at a time to prevent the use of `rep;movsb`.
-    for (size_t i = 0; i < 3; ++i, offset += K_ONE_CACHELINE)
-      builtin::Memcpy<K_ONE_CACHELINE>::block_offset(dst, src, offset);
+  // If we are using the Non-temporal stores, we don't need prefetching
+  bool need_prefetch_run = true;
+  if constexpr (x86::K_NTA_THRESHOLD != 0 && x86::K_AVX) {
----------------
vonosmas wrote:

We're already `using namespace LIBC_NAMESPACE::x86` here, so the `x86::` qualifier is redundant. Also, this entire function is only reached when `K_AVX` is true, so I think you can drop the `K_AVX` part of this condition.
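
A minimal, self-contained sketch of what the simplified check might look like. Everything here is a stand-in for illustration: the `sketch` namespace, the value of `K_NTA_THRESHOLD`, the helper `need_prefetch_run(count)`, and the assumption that copies at or above the threshold take the non-temporal path are all hypothetical and not taken from the PR.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch {
namespace x86 {
// Hypothetical stand-ins for the constants named in the review.
inline constexpr bool K_AVX = true;
inline constexpr std::size_t K_NTA_THRESHOLD = std::size_t{1} << 20; // assumed value
} // namespace x86

// With the using-directive in effect, the names below need no 'x86::' prefix.
using namespace x86;

// Hypothetical helper: returns true when the software-prefetching loop
// should run, false when the non-temporal path is assumed to take over.
inline bool need_prefetch_run(std::size_t count) {
  // The surrounding function is AVX-only, so the invariant is stated once
  // at compile time instead of being re-tested in the condition.
  static_assert(K_AVX, "this code path is only instantiated for AVX");
  // Precondition carried over from the comment in the diff: count >= 128.
  assert(count >= 128 && "caller guarantees count >= 128");

  if constexpr (K_NTA_THRESHOLD != 0) {
    // Assumption: large copies use non-temporal stores and skip prefetching.
    if (count >= K_NTA_THRESHOLD)
      return false;
  }
  return true;
}
} // namespace sketch
```

The `static_assert` keeps the "AVX only" assumption visible without widening the `if constexpr` condition; whether that is worth having is a style choice for the PR author.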

https://github.com/llvm/llvm-project/pull/187108

