[libc-commits] [PATCH] D100646: [libc] Add a set of elementary operations
Guillaume Chatelet via Phabricator via libc-commits
libc-commits at lists.llvm.org
Wed Apr 28 07:44:04 PDT 2021
gchatelet marked 3 inline comments as done.
gchatelet added a subscriber: sivachandra.
gchatelet added inline comments.
================
Comment at: libc/src/string/memory_utils/elements.h:154-166
+inline int Scalar<uint32_t>::ScalarThreeWayCompare(uint32_t a, uint32_t b) {
+ const int64_t la = Endian::ToBigEndian(a);
+ const int64_t lb = Endian::ToBigEndian(b);
+ const int64_t diff = la - lb;
+ return diff ? (diff < 0 ? -1 : 1) : 0;
+}
+template <>
----------------
avieira wrote:
> gchatelet wrote:
> > @avieira the generated code is not that bad but maybe you can come up with a better idea?
> >
> > The challenge here is to keep the semantic of the [3-way comparison](https://en.wikipedia.org/wiki/Three-way_comparison) that fits in an `int` while working on types which diff is larger than `int`.
> >
> > Generated code:
> > intel : https://godbolt.org/z/Khxv6b3r9
> > arm : https://godbolt.org/z/YTah338hY
> @gchatelet Avoiding the subtraction and comparing the 'la' and 'lb' variables directly lead to better codegen on AArch64.
Indeed it's better, thank you.
================
Comment at: libc/src/string/memory_utils/elements.h:189-205
+ static int ThreeWayCompare(const char *a, const char *b) {
+ const auto mask = Base::NotEqualMask(Base::Load(a), Base::Load(b));
+ if (!mask)
+ return 0;
+ return CharDiff(a, b, mask);
+ }
+
----------------
avieira wrote:
> gchatelet wrote:
> > @avieira for the vector version of the three way compare we compute the byte-wise not equal mask, convert this mask to GPR and look at the number of trailing zeros (X86 is LittleEndian), this number is the index of the mismatching byte.
> >
> > After trying to work on the vector directly to extract this byte I figured out it's easier to just reload it, it should be fast and probably faster than a bunch of vector operations.
> >
> > i suspect the same approach will work for ARM as well, WDYT?
> @gchatelet I did not look at using NEON. The problem with the mask approach is that NEON does not have movemask-like instructions, so finding the first mismatched byte on a vector would require more work than it is worth it I think. SVE2 might help us here in the future though.
We can add an arm specialization as I did for x86 if needed, let's keep it as-is for now.
================
Comment at: libc/test/src/string/memory_utils/CMakeLists.txt:14
+ COMPILE_OPTIONS
+ -march=native
)
----------------
avieira wrote:
> AArch64 doesn't support -march=native, we use -mcpu=native instead, so it would be good to specify the correct option per target.
Thx I wasn't aware of this. I found this article, it's helpful and confusing at the same time :-D
https://community.arm.com/developer/tools-software/tools/b/tools-software-ides-blog/posts/compiler-flags-across-architectures-march-mtune-and-mcpu
@sivachandra should we define a new CMake macro or do you think this should be part of `add_libc_unittest`?
The aim is to use whatever is available on the host machine when performing the tests.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D100646/new/
https://reviews.llvm.org/D100646
More information about the libc-commits
mailing list