[libc-commits] [PATCH] D100646: [libc] Add a set of elementary operations
Andre Vieira via Phabricator via libc-commits
libc-commits at lists.llvm.org
Thu Apr 22 05:08:41 PDT 2021
avieira added inline comments.
================
Comment at: libc/src/string/memory_utils/elements.h:154-166
+inline int Scalar<uint32_t>::ScalarThreeWayCompare(uint32_t a, uint32_t b) {
+ const int64_t la = Endian::ToBigEndian(a);
+ const int64_t lb = Endian::ToBigEndian(b);
+ const int64_t diff = la - lb;
+ return diff ? (diff < 0 ? -1 : 1) : 0;
+}
+template <>
----------------
gchatelet wrote:
> @avieira the generated code is not that bad but maybe you can come up with a better idea?
>
> The challenge here is to keep the semantic of the [3-way comparison](https://en.wikipedia.org/wiki/Three-way_comparison) that fits in an `int` while working on types which diff is larger than `int`.
>
> Generated code:
> intel : https://godbolt.org/z/Khxv6b3r9
> arm : https://godbolt.org/z/YTah338hY
@gchatelet Avoiding the subtraction and comparing the 'la' and 'lb' variables directly lead to better codegen on AArch64.
================
Comment at: libc/src/string/memory_utils/elements.h:189-205
+ static int ThreeWayCompare(const char *a, const char *b) {
+ const auto mask = Base::NotEqualMask(Base::Load(a), Base::Load(b));
+ if (!mask)
+ return 0;
+ return CharDiff(a, b, mask);
+ }
+
----------------
gchatelet wrote:
> @avieira for the vector version of the three way compare we compute the byte-wise not equal mask, convert this mask to GPR and look at the number of trailing zeros (X86 is LittleEndian), this number is the index of the mismatching byte.
>
> After trying to work on the vector directly to extract this byte I figured out it's easier to just reload it, it should be fast and probably faster than a bunch of vector operations.
>
> i suspect the same approach will work for ARM as well, WDYT?
@gchatelet I did not look at using NEON. The problem with the mask approach is that NEON does not have movemask-like instructions, so finding the first mismatched byte on a vector would require more work than it is worth it I think. SVE2 might help us here in the future though.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D100646/new/
https://reviews.llvm.org/D100646
More information about the libc-commits
mailing list