[libc-commits] [libc] [libc][x86] Add Non-temporal code path for large memcpy (PR #187108)

Guillaume Chatelet via libc-commits libc-commits at lists.llvm.org
Thu Mar 19 08:14:26 PDT 2026


================
@@ -143,14 +152,31 @@ inline_memcpy_x86_avx_ge64_sw_prefetching(Ptr __restrict dst,
   // - we prefetched cachelines at 'src + 64', 'src + 128', and 'src + 196'
   // - 'dst' is 32B aligned,
   // - count >= 128.
-  while (offset + K_THREE_CACHELINES + 64 <= count) {
-    // Three cache lines at a time.
-    inline_memcpy_prefetch(dst, src, offset + K_ONE_CACHELINE);
-    inline_memcpy_prefetch(dst, src, offset + K_TWO_CACHELINES);
-    inline_memcpy_prefetch(dst, src, offset + K_THREE_CACHELINES);
-    // Copy one cache line at a time to prevent the use of `rep;movsb`.
-    for (size_t i = 0; i < 3; ++i, offset += K_ONE_CACHELINE)
-      builtin::Memcpy<K_ONE_CACHELINE>::block_offset(dst, src, offset);
+  // If we are using the Non-temporal stores, we don't need prefetching
+  bool need_prefetch_run = true;
+  if constexpr (x86::K_NTA_THRESHOLD != 0) {
+    if (count >= x86::K_NTA_THRESHOLD) {
----------------
gchatelet wrote:

I would say it's unnecessary: we're already doing quite a lot of work here, and I don't think marking the branch unlikely will change anything.
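
For context, this is a minimal sketch of what marking the threshold check as unlikely could look like. It is not the code from the PR: `SKETCH_UNLIKELY`, `CopyPath`, `select_copy_path`, and the 1 MiB value given to `K_NTA_THRESHOLD` are all assumptions made up for illustration, and `__builtin_expect` is a GCC/Clang builtin.

```cpp
#include <cstddef>

// Hypothetical threshold, for illustration only. A value of 0 would compile
// the non-temporal path out entirely via the `if constexpr` below.
inline constexpr std::size_t K_NTA_THRESHOLD = std::size_t{1} << 20;

// Local stand-in for a branch-hint macro (GCC/Clang builtin).
#define SKETCH_UNLIKELY(x) __builtin_expect(!!(x), 0)

// Hypothetical tag naming the path a copy of `count` bytes would take.
enum class CopyPath { Prefetching, NonTemporal };

inline CopyPath select_copy_path(std::size_t count) {
  if constexpr (K_NTA_THRESHOLD != 0) {
    // The hint only affects block layout and static branch prediction;
    // the result of the comparison is unchanged.
    if (SKETCH_UNLIKELY(count >= K_NTA_THRESHOLD))
      return CopyPath::NonTemporal;
  }
  return CopyPath::Prefetching;
}
```

The hint does not alter which path is selected, only how the compiler lays out the two branches. Since the check runs once per call and is followed by a large copy, any gain from it would be small next to the work done on either path.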

https://github.com/llvm/llvm-project/pull/187108

