[libcxx-commits] [libcxx] [libc++] Optimize the std::mismatch tail (PR #83440)
Louis Dionne via libcxx-commits
libcxx-commits at lists.llvm.org
Thu Mar 28 09:31:06 PDT 2024
================
@@ -84,8 +87,36 @@ __mismatch(_Tp* __first1, _Tp* __last1, _Tp* __first2, _Pred& __pred, _Proj1& __
__first1 += __unroll_count * __vec_size;
__first2 += __unroll_count * __vec_size;
}
+
+ // check the remaining 0-3 vectors
+ while (static_cast<size_t>(__last1 - __first1) >= __vec_size) {
+ if (auto __cmp_res = std::__load_vector<__vec>(__first1) == std::__load_vector<__vec>(__first2);
+ !std::__all_of(__cmp_res)) {
+ auto __offset = std::__find_first_not_set(__cmp_res);
+ return {__first1 + __offset, __first2 + __offset};
+ }
+ __first1 += __vec_size;
+ __first2 += __vec_size;
+ }
+
+ if (__last1 - __first1 == 0)
+ return {__first1, __first2};
+
+ // Check if we can load elements in front of the current pointer. If that's the case, load a vector at
+ // (last - vector_size) to check the remaining elements
+ if (static_cast<size_t>(__first1 - __orig_first1) >= __vec_size) {
+ __first1 = __last1 - __vec_size;
+ __first2 = __last2 - __vec_size;
+ auto __offset =
+ std::__find_first_not_set(std::__load_vector<__vec>(__first1) == std::__load_vector<__vec>(__first2));
+ return {__first1 + __offset, __first2 + __offset};
+ } // else loop over the elements individually
+
+ // TODO: Consider vectorizing the loop tail further with
+ // - smaller vectors
+ // - loading bytes out of range if it's known to be safe
----------------
ldionne wrote:
I would argue we're reaching diminishing returns here. I'm not sure it's worth trying to do more. So maybe remove the comment? Unless you have concrete ideas and think we can provide a substantial improvement.
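
For readers following the hunk above, here is a minimal sketch of the overlapping tail load it performs, written with plain scalar code rather than libc++ internals. The names `kVec`, `first_diff_in_block`, and `mismatch_overlapping_tail` are hypothetical and do not appear in the patch; `first_diff_in_block` only stands in for the vector compare plus `__find_first_not_set`, and the block size of 4 is an arbitrary assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical block width standing in for __vec_size.
constexpr std::size_t kVec = 4;

// Hypothetical stand-in for a SIMD compare followed by a "first lane that
// differs" query: returns the index of the first differing element in a
// block of kVec elements, or kVec if the whole block compares equal.
inline std::size_t first_diff_in_block(const int* a, const int* b) {
  for (std::size_t i = 0; i < kVec; ++i)
    if (a[i] != b[i])
      return i;
  return kVec;
}

// Assumes the second range is at least as long as the first.
inline std::pair<const int*, const int*>
mismatch_overlapping_tail(const int* first1, const int* last1, const int* first2) {
  assert(first1 <= last1);
  const int* const orig_first1 = first1;
  const int* const last2 = first2 + (last1 - first1);

  // Compare whole blocks while at least kVec elements remain.
  while (static_cast<std::size_t>(last1 - first1) >= kVec) {
    const std::size_t off = first_diff_in_block(first1, first2);
    if (off != kVec)
      return {first1 + off, first2 + off};
    first1 += kVec;
    first2 += kVec;
  }

  if (first1 == last1)
    return {first1, first2};

  // Fewer than kVec elements remain. If at least one full block has been
  // consumed, stepping back to (last - kVec) stays inside the range. The
  // re-read elements are already known to be equal, so the first difference
  // found (if any) lies in the unchecked tail.
  if (static_cast<std::size_t>(first1 - orig_first1) >= kVec) {
    first1 = last1 - kVec;
    first2 = last2 - kVec;
    const std::size_t off = first_diff_in_block(first1, first2);
    return {first1 + off, first2 + off};
  }

  // Range shorter than one block: compare element by element.
  for (; first1 != last1; ++first1, ++first2)
    if (*first1 != *first2)
      break;
  return {first1, first2};
}
```

When the overlapping block compares equal, the offset is `kVec`, so the function returns the two end pointers, which matches what `std::mismatch` returns for equal ranges. The scalar fallback is only reached for inputs shorter than one block, which is the case the TODO in the patch is about.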
https://github.com/llvm/llvm-project/pull/83440