[libc-commits] [PATCH] D100646: [libc] Add a set of elementary operations

Thu Apr 22 05:08:41 PDT 2021

avieira added inline comments.

================
Comment at: libc/src/string/memory_utils/elements.h:154-166
+inline int Scalar<uint32_t>::ScalarThreeWayCompare(uint32_t a, uint32_t b) {
+  const int64_t la = Endian::ToBigEndian(a);
+  const int64_t lb = Endian::ToBigEndian(b);
+  const int64_t diff = la - lb;
+  return diff ? (diff < 0 ? -1 : 1) : 0;
+}
+template <>
----------------
gchatelet wrote:
> @avieira the generated code is not that bad but maybe you can come up with a better idea?
> 
> The challenge here is to keep the semantics of the [3-way comparison](https://en.wikipedia.org/wiki/Three-way_comparison), whose result must fit in an `int`, while working on types whose difference does not fit in an `int`.
> 
> Generated code:
> intel : https://godbolt.org/z/Khxv6b3r9
> arm : https://godbolt.org/z/YTah338hY
@gchatelet Avoiding the subtraction and comparing the 'la' and 'lb' variables directly leads to better codegen on AArch64.
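
For illustration, the direct comparison could look roughly like the sketch below. It is not the code from the patch: `ByteSwap32` is a hypothetical stand-in for `Endian::ToBigEndian` on a little-endian host, and `ThreeWayCompareU32` is an assumed name. Since no subtraction is involved, the operands can stay 32 bits wide.

```cpp
#include <cstdint>

// Hypothetical stand-in for Endian::ToBigEndian on a little-endian host:
// reverses the byte order so that the first byte in memory becomes the most
// significant one.
static inline uint32_t ByteSwap32(uint32_t v) {
  return (v >> 24) | ((v >> 8) & 0x0000FF00u) | ((v << 8) & 0x00FF0000u) |
         (v << 24);
}

// Three-way compare without widening to int64_t and without a subtraction:
// the two big-endian values are compared directly.
static inline int ThreeWayCompareU32(uint32_t a, uint32_t b) {
  const uint32_t la = ByteSwap32(a);
  const uint32_t lb = ByteSwap32(b);
  if (la == lb)
    return 0;
  return la < lb ? -1 : 1;
}
```

The result stays within {-1, 0, 1}, so it fits in an `int` regardless of how far apart the two values are.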

================
Comment at: libc/src/string/memory_utils/elements.h:189-205
+  static int ThreeWayCompare(const char *a, const char *b) {
+    const auto mask = Base::NotEqualMask(Base::Load(a), Base::Load(b));
+    if (!mask)
+      return 0;
+    return CharDiff(a, b, mask);
+  }
+
----------------
gchatelet wrote:
> @avieira for the vector version of the three-way compare, we compute the byte-wise not-equal mask, convert this mask to a GPR, and look at the number of trailing zeros (x86 is little-endian). This number is the index of the mismatching byte.
> 
> After trying to work on the vector directly to extract this byte, I figured out it's easier to just reload it. It should be fast, and probably faster than a bunch of vector operations.
> 
> I suspect the same approach will work for ARM as well, WDYT?
@gchatelet I did not look at using NEON. The problem with the mask approach is that NEON does not have movemask-like instructions, so finding the first mismatched byte in a vector would require more work than it is worth, I think. SVE2 might help us here in the future though.
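
For reference, the mask approach described in the quoted comment can be sketched in portable scalar C++ as below. This only emulates the idea: on x86 the mask would come from a vector compare followed by a movemask-like instruction, whereas here a plain loop builds it. The 16-byte block size, the loop-based `NotEqualMask` and `CountTrailingZeros` helpers, the body of `CharDiff`, and the name `ThreeWayCompare16` are all assumptions made for illustration, not the patch's implementation.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed block size, matching a 128-bit vector register.
constexpr size_t kBlockSize = 16;

// Scalar emulation of the byte-wise not-equal mask: bit i is set when
// a[i] != b[i]. A vector implementation would produce this with a compare
// plus a movemask-like instruction.
static inline uint32_t NotEqualMask(const char *a, const char *b) {
  uint32_t mask = 0;
  for (size_t i = 0; i < kBlockSize; ++i)
    if (a[i] != b[i])
      mask |= uint32_t{1} << i;
  return mask;
}

// Number of trailing zero bits; the caller guarantees mask != 0.
static inline unsigned CountTrailingZeros(uint32_t mask) {
  unsigned n = 0;
  while ((mask & 1u) == 0) {
    mask >>= 1;
    ++n;
  }
  return n;
}

// Reloads the first mismatching byte from memory and returns the difference
// of the two bytes, compared as unsigned char.
static inline int CharDiff(const char *a, const char *b, uint32_t mask) {
  const unsigned i = CountTrailingZeros(mask);
  const int ca = static_cast<unsigned char>(a[i]);
  const int cb = static_cast<unsigned char>(b[i]);
  return ca - cb;
}

static inline int ThreeWayCompare16(const char *a, const char *b) {
  const uint32_t mask = NotEqualMask(a, b);
  if (!mask)
    return 0;
  return CharDiff(a, b, mask);
}
```

The lowest set bit of the mask corresponds to the lowest address, which is why counting trailing zeros yields the index of the first mismatching byte.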

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100646/new/

https://reviews.llvm.org/D100646