[libc-commits] [PATCH] D148717: [libc] Improve memcmp latency and codegen
Nafi Rouf via Phabricator via libc-commits
libc-commits at lists.llvm.org
Wed May 3 10:04:56 PDT 2023
nafi3000 added inline comments.
================
Comment at: libc/src/string/memory_utils/op_generic.h:466-476
+    if constexpr (cmp_is_expensive<T>::value) {
+      if (!eq<T>(p1, p2, 0))
+        return cmp_neq<T>(p1, p2, 0);
+    } else {
+      if (auto value = cmp<T>(p1, p2, 0))
         return value;
+    }
----------------
I wonder whether it would be better to use `cmp<T>` only for the last comparison. The motivation is that for every non-last block we have to test the comparison result anyway (e.g. line 470 above) to decide whether to load and compare the next block in the sequence. Wouldn't it be better to compute that decision (0 or non-0) as early as possible, rather than computing the full cmp result (0, <0 or >0) first?
E.g.
  if constexpr (sizeof...(TS) == 0) {
    if constexpr (cmp_is_expensive<T>::value) {
      if (eq<T>(p1, p2, 0))
        return MemcmpReturnType::ZERO();
      return cmp_neq<T>(p1, p2, 0);
    } else {
      return cmp<T>(p1, p2, 0);
    }
  } else {
    if (!eq<T>(p1, p2, 0))
      return cmp_neq<T>(p1, p2, 0);
    return MemcmpSequence<TS...>::block(p1 + sizeof(T), p2 + sizeof(T));
  }
For the last block, I also wonder whether we could unconditionally call `cmp<T>` instead. Which is better depends on the data: e.g. for `__m512i`, `cmp<T>` is faster if there is at least one mismatching byte in the last 64 bytes.
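To make the shape of this suggestion concrete, here is a minimal standalone sketch of a sequence that uses an equality test on every non-last block and a full three-way compare only on the last one. It is not the libc implementation: `MemcmpSequenceSketch`, `load_raw` and `load_be` are hypothetical names, the `eq`/`cmp_neq`/`cmp` helpers here are simplified two-argument stand-ins that return `int` rather than `MemcmpReturnType`, only unsigned integer block types are assumed, and the `cmp_is_expensive` branch is left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: load sizeof(T) bytes in native byte order.
// Native order is enough for an equality test.
template <typename T> T load_raw(const unsigned char *p) {
  T v;
  std::memcpy(&v, p, sizeof(T));
  return v;
}

// Hypothetical helper: load sizeof(T) bytes as a big-endian value, so that
// integer ordering matches memcmp's lexicographic byte ordering.
template <typename T> T load_be(const unsigned char *p) {
  T v = 0;
  for (std::size_t i = 0; i < sizeof(T); ++i)
    v = static_cast<T>((v << 8) | p[i]);
  return v;
}

// Cheap decision: are the two blocks equal (0 or non-0 only)?
template <typename T> bool eq(const unsigned char *p1, const unsigned char *p2) {
  return load_raw<T>(p1) == load_raw<T>(p2);
}

// Ordering of two blocks that are already known to differ.
template <typename T> int cmp_neq(const unsigned char *p1, const unsigned char *p2) {
  return load_be<T>(p1) < load_be<T>(p2) ? -1 : 1;
}

// Full three-way compare (0, <0 or >0).
template <typename T> int cmp(const unsigned char *p1, const unsigned char *p2) {
  const T a = load_be<T>(p1);
  const T b = load_be<T>(p2);
  return (a > b) - (a < b);
}

template <typename T, typename... TS> struct MemcmpSequenceSketch {
  static int block(const unsigned char *p1, const unsigned char *p2) {
    if constexpr (sizeof...(TS) == 0) {
      // Last block: nothing follows, so compute the full result directly.
      return cmp<T>(p1, p2);
    } else {
      // Non-last block: decide early whether we can stop here.
      if (!eq<T>(p1, p2))
        return cmp_neq<T>(p1, p2);
      return MemcmpSequenceSketch<TS...>::block(p1 + sizeof(T), p2 + sizeof(T));
    }
  }
};
```

For example, `MemcmpSequenceSketch<std::uint32_t, std::uint16_t, std::uint8_t>::block(a, b)` would compare two 7-byte buffers, returning as soon as the 4-byte or 2-byte block differs. Whether this ordering of work actually helps for wide vector types would still need to be measured.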
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D148717/new/
https://reviews.llvm.org/D148717