[libc-commits] [PATCH] D148717: [libc] Improve memcmp latency and codegen

Nafi Rouf via Phabricator via libc-commits libc-commits at lists.llvm.org
Wed May 3 10:04:56 PDT 2023


nafi3000 added inline comments.


================
Comment at: libc/src/string/memory_utils/op_generic.h:466-476
+    if constexpr (cmp_is_expensive<T>::value) {
+      if (!eq<T>(p1, p2, 0))
+        return cmp_neq<T>(p1, p2, 0);
+    } else {
+      if (auto value = cmp<T>(p1, p2, 0))
         return value;
+    }
----------------
I wonder whether it would be better to use `cmp<T>` only for the last comparison. The motivation is that for every block except the last we have to test the comparison result anyway (e.g. line 470 above) to decide whether to load and compare the next block in the sequence. Wouldn't it be better to compute that decision (0 or non-0) as early as possible, instead of first computing the full three-way cmp result (0, <0 or >0)?

E.g.

    if constexpr (sizeof...(TS) == 0) {
      if constexpr (cmp_is_expensive<T>::value) {
        if (eq<T>(p1, p2, 0))
          return MemcmpReturnType::ZERO();
        return cmp_neq<T>(p1, p2, 0);
      } else {
        return cmp<T>(p1, p2, 0);
      }
    } else {
      if (!eq<T>(p1, p2, 0))
        return cmp_neq<T>(p1, p2, 0);
      return MemcmpSequence<TS...>::block(p1 + sizeof(T), p2 + sizeof(T));
    }

And for the last block, I wonder whether we could call `cmp<T>` unconditionally instead, dropping the `cmp_is_expensive<T>` branch there. Which is better would depend on the data. E.g. for `__m512i`, `cmp<T>` is faster if there is at least one mismatching byte in the last 64 bytes.
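
To make the suggestion concrete, here is a minimal standalone sketch of that shape: `eq<T>` gates every non-last block, and the last block calls `cmp<T>` unconditionally. Everything in it is an assumption made for illustration and not the code in op_generic.h: the `sketch` namespace, the `Ptr` and `Seq7` aliases, the `compare7` helper, the plain `int` return type standing in for `MemcmpReturnType`, and the scalar bodies of `load`, `eq`, `cmp` and `cmp_neq`. It needs C++17 for `if constexpr`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

namespace sketch {

using Ptr = const unsigned char *;

// Hypothetical stand-in: unaligned load of one block of type T.
template <typename T> T load(Ptr p, std::size_t offset) {
  T value;
  std::memcpy(&value, p + offset, sizeof(T));
  return value;
}

// Cheap yes/no answer: are the two blocks identical?
template <typename T> bool eq(Ptr p1, Ptr p2, std::size_t offset) {
  return load<T>(p1, offset) == load<T>(p2, offset);
}

// Full three-way result (0, <0 or >0). Done byte by byte here so the
// sketch does not depend on endianness; a real version would be smarter.
template <typename T> int cmp(Ptr p1, Ptr p2, std::size_t offset) {
  for (std::size_t i = 0; i < sizeof(T); ++i) {
    const int diff = int(p1[offset + i]) - int(p2[offset + i]);
    if (diff != 0)
      return diff;
  }
  return 0;
}

// Three-way result for blocks already known to differ.
template <typename T> int cmp_neq(Ptr p1, Ptr p2, std::size_t offset) {
  assert(!eq<T>(p1, p2, offset) && "precondition: blocks must differ");
  return cmp<T>(p1, p2, offset);
}

template <typename T, typename... TS> struct MemcmpSequence {
  static int block(Ptr p1, Ptr p2) {
    if constexpr (sizeof...(TS) == 0) {
      // Last block: nothing follows, so go straight to the full result.
      return cmp<T>(p1, p2, 0);
    } else {
      // Non-last block: decide "continue or stop" first, and pay for the
      // three-way result only when the blocks actually differ.
      if (!eq<T>(p1, p2, 0))
        return cmp_neq<T>(p1, p2, 0);
      return MemcmpSequence<TS...>::block(p1 + sizeof(T), p2 + sizeof(T));
    }
  }
};

// 4 + 2 + 1 = 7 bytes, compared as three blocks.
using Seq7 = MemcmpSequence<std::uint32_t, std::uint16_t, std::uint8_t>;

// Convenience wrapper; both arguments must have at least 7 readable bytes.
inline int compare7(const char *s1, const char *s2) {
  return Seq7::block(reinterpret_cast<Ptr>(s1), reinterpret_cast<Ptr>(s2));
}

} // namespace sketch
```

With this, `sketch::compare7("abcdefg", "abcdefh")` passes the `eq` checks on the first two blocks and only computes a three-way result on the final byte. Whether this ordering wins for wide vector types is the data-dependent question raised above; the sketch only shows the control flow.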


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D148717/new/

https://reviews.llvm.org/D148717


