[libc-commits] [libc] [libc][x86] Add Non-temporal code path for large memcpy (PR #187108)
via libc-commits
libc-commits at lists.llvm.org
Wed Mar 18 11:55:10 PDT 2026
================
@@ -143,14 +152,31 @@ inline_memcpy_x86_avx_ge64_sw_prefetching(Ptr __restrict dst,
// - we prefetched cachelines at 'src + 64', 'src + 128', and 'src + 196'
// - 'dst' is 32B aligned,
// - count >= 128.
- while (offset + K_THREE_CACHELINES + 64 <= count) {
- // Three cache lines at a time.
- inline_memcpy_prefetch(dst, src, offset + K_ONE_CACHELINE);
- inline_memcpy_prefetch(dst, src, offset + K_TWO_CACHELINES);
- inline_memcpy_prefetch(dst, src, offset + K_THREE_CACHELINES);
- // Copy one cache line at a time to prevent the use of `rep;movsb`.
- for (size_t i = 0; i < 3; ++i, offset += K_ONE_CACHELINE)
- builtin::Memcpy<K_ONE_CACHELINE>::block_offset(dst, src, offset);
+ // If we are using the Non-temporal stores, we don't need prefetching
+ bool need_prefetch_run = true;
+ if constexpr (x86::K_NTA_THRESHOLD != 0) {
+ if (count >= x86::K_NTA_THRESHOLD) {
----------------
lntue wrote:
Should we add `LIBC_UNLIKELY` to this check? Will it have any effect on the performance at all?
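
For reference, a minimal standalone sketch of what the hinted check could look like. It is not the code from the PR: `SKETCH_UNLIKELY` is a stand-in for `LIBC_UNLIKELY` (assumed here to behave like `__builtin_expect(cond, 0)` on GCC/Clang), and `K_NTA_THRESHOLD_SKETCH`, `CopyPath`, and `select_copy_path` are hypothetical names with an arbitrary threshold value chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for LIBC_UNLIKELY; assumed to map to __builtin_expect (GCC/Clang).
#define SKETCH_UNLIKELY(x) __builtin_expect(!!(x), 0)

// Hypothetical threshold in bytes; 0 would disable the non-temporal path.
constexpr std::size_t K_NTA_THRESHOLD_SKETCH = std::size_t{1} << 20;

enum class CopyPath { Prefetch, NonTemporal };

// Chooses the copy strategy for a given size. The hint only tells the
// compiler that large copies are the rare case; it does not change results.
constexpr CopyPath select_copy_path(std::size_t count) {
  if constexpr (K_NTA_THRESHOLD_SKETCH != 0) {
    if (SKETCH_UNLIKELY(count >= K_NTA_THRESHOLD_SKETCH))
      return CopyPath::NonTemporal;
  }
  return CopyPath::Prefetch;
}

// Runtime sanity check of the selection logic on both sides of the threshold.
inline void sketch_check() {
  assert(select_copy_path(128) == CopyPath::Prefetch);
  assert(select_copy_path(K_NTA_THRESHOLD_SKETCH) == CopyPath::NonTemporal);
}
```

Since the check runs once per call before the copy loop, any effect would presumably come from code layout (keeping the prefetching path as the fall-through) rather than from the branch itself, so it would need to be measured.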
https://github.com/llvm/llvm-project/pull/187108