[libc-commits] [libc] [libc] Add support for string/memory_utils functions for AArch64 without HW FP/SIMD (PR #137592)
Guillaume Chatelet via libc-commits
libc-commits at lists.llvm.org
Wed Apr 30 05:42:37 PDT 2025
================
@@ -19,13 +19,35 @@
namespace LIBC_NAMESPACE_DECL {
-[[maybe_unused]] LIBC_INLINE BcmpReturnType inline_bcmp_aarch64(CPtr p1,
- CPtr p2,
- size_t count) {
- if (LIBC_LIKELY(count <= 32)) {
- if (LIBC_UNLIKELY(count >= 16)) {
- return aarch64::Bcmp<16>::head_tail(p1, p2, count);
- }
+[[maybe_unused]] LIBC_INLINE BcmpReturnType
+inline_bcmp_aarch64_no_fp(CPtr p1, CPtr p2, size_t count) {
+ return generic::Bcmp<uint64_t>::loop_and_tail_align_above(256, p1, p2, count);
+}
+
+#ifdef __ARM_NEON
+[[maybe_unused]] LIBC_INLINE BcmpReturnType
+inline_bcmp_aarch64_with_fp(CPtr p1, CPtr p2, size_t count) {
+ if (count <= 32) {
+ return aarch64::Bcmp<16>::head_tail(p1, p2, count);
+ }
+
+ if (count <= 64) {
+ return aarch64::Bcmp<32>::head_tail(p1, p2, count);
+ }
+
+ if (LIBC_UNLIKELY(count > 256)) {
+ if (auto value = aarch64::Bcmp<32>::block(p1, p2))
+ return value;
+ align_to_next_boundary<16, Arg::P1>(p1, p2, count);
+ }
+
+ return aarch64::Bcmp<32>::loop_and_tail(p1, p2, count);
+}
+#endif
+
+[[gnu::flatten]] LIBC_INLINE BcmpReturnType
+inline_bcmp_aarch64_dispatch(CPtr p1, CPtr p2, size_t count) {
+ if (LIBC_LIKELY(count <= 16)) {
----------------
gchatelet wrote:
> Do you want the shared code in a shared (possibly inline?)
I want full code duplication. I'm usually quite against this, but in this particular case duplication makes sense: it allows working on one implementation without having to think about the impact on the others.
> Also what do you mean by `if (LIBC_UNLIKELY(count >= 16))` is gone?
Oops, my bad. What I meant is that before we had:
```c++
if (LIBC_LIKELY(count <= 32)) {
  if (LIBC_UNLIKELY(count >= 16)) {
    return aarch64::Bcmp<16>::head_tail(p1, p2, count);
  }
  switch ....
```
now we have:
```c++
if (LIBC_LIKELY(count <= 16)) {
  switch ...
```
I'd rather have an exact match with the previous implementation for the `with_fp` version if possible. Does that make sense?
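To make the shape of the previous small-size path concrete, here is a minimal, self-contained sketch. It is not the libc code: `sketch_block16`, `sketch_head_tail16`, and `sketch_small_bcmp` are hypothetical names, the block compare uses plain 64-bit loads rather than NEON, and the byte loop is only a stand-in for the elided `switch`. The point is the structure: for `16 <= count <= 32`, two possibly overlapping 16-byte blocks (head and tail) cover the whole range.

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a 16-byte block compare: returns nonzero if any
// of the 16 bytes differ. Uses two 64-bit loads, no SIMD.
static inline uint32_t sketch_block16(const unsigned char *p1,
                                      const unsigned char *p2) {
  uint64_t a[2], b[2];
  std::memcpy(a, p1, sizeof(a));
  std::memcpy(b, p2, sizeof(b));
  return ((a[0] ^ b[0]) | (a[1] ^ b[1])) != 0 ? 1u : 0u;
}

// Head/tail technique: compare the first 16 bytes and the last 16 bytes.
// The two blocks may overlap; together they cover [0, count).
static inline uint32_t sketch_head_tail16(const unsigned char *p1,
                                          const unsigned char *p2,
                                          size_t count) {
  assert(count >= 16 && count <= 32);
  return sketch_block16(p1, p2) |
         sketch_block16(p1 + count - 16, p2 + count - 16);
}

// Shape of the previous small-size path, valid for count <= 32 only.
static inline uint32_t sketch_small_bcmp(const unsigned char *p1,
                                         const unsigned char *p2,
                                         size_t count) {
  assert(count <= 32);
  if (count >= 16)
    return sketch_head_tail16(p1, p2, count);
  // Stand-in for the elided switch over sizes below 16.
  for (size_t i = 0; i < count; ++i)
    if (p1[i] != p2[i])
      return 1;
  return 0;
}
```

With this shape, sizes 16 through 32 never reach the small-size fallback, which is the behavior an exact match with the previous `with_fp` implementation would preserve.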
https://github.com/llvm/llvm-project/pull/137592