[libc-commits] [PATCH] D148717: [libc] Improve memcmp latency and codegen
Nafi Rouf via Phabricator via libc-commits
libc-commits at lists.llvm.org
Fri Jun 9 22:11:25 PDT 2023
nafi3000 accepted this revision.
nafi3000 added inline comments.
================
Comment at: libc/src/string/memory_utils/utils.h:157-159
+ // We perform the difference as an uint64_t.
+ const int64_t diff = static_cast<int64_t>(a) - static_cast<int64_t>(b);
+ // And reduce the uint64_t into an uint32_t.
----------------
nit: s/uint64_t/int64_t/ and s/uint32_t/int32_t/ in the comments.
================
Comment at: libc/src/string/memory_utils/utils.h:160
+ // And reduce the uint64_t into an uint32_t.
+ // TODO: provide a detailed explanation.
+ return static_cast<int32_t>((diff >> 1) | (diff & 0xFFFF));
----------------
For the explanation, please consider whether we can add some version of the following points:
```
For the int64_t to int32_t conversion we want the following properties:
- int32_t[31:31] == 1 iff diff < 0
- int32_t[31:0] == 0 iff diff == 0
We also observe that:
- When diff < 0: diff[63:32] == 0xffffffff and diff[31:0] != 0
- When diff > 0: diff[63:32] == 0 and diff[31:0] != 0
- When diff == 0: diff[63:32] == 0 and diff[31:0] == 0
- https://godbolt.org/z/8W7qWP6e5
- This implies that we only need to look at diff[32:32] to determine the sign bit of the returned int32_t.
So, we do the following:
- int32_t[31:31] = diff[32:32]
- int32_t[30:0] = diff[31:0] == 0 ? 0 : non-0.
We can achieve the above with the expression below. We could also have used (diff >> 1) | (diff & 0x1), but (diff & 0xFFFF) is faster than (diff & 0x1). https://godbolt.org/z/j3b569rW1
```
We can also add all these in a separate diff.
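To make the points above concrete, here is a minimal sketch of the reduction next to a plain truncation. It assumes a and b are uint32_t (the operand types are not visible in the quoted lines) and that >> on a negative int64_t is an arithmetic shift, which is implementation-defined before C++20. The function names are hypothetical and not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the expression under review.
// Assumes a and b are uint32_t, so diff lies in [-(2^32 - 1), 2^32 - 1].
inline int32_t reduce_with_ffff(uint32_t a, uint32_t b) {
  const int64_t diff = static_cast<int64_t>(a) - static_cast<int64_t>(b);
  // diff >> 1 moves diff[32:32] (the sign) into bit 31 and diff[31:1] into
  // bits [30:0]. OR-ing in the low bits restores the "non-zero" information
  // that would be lost when only diff[0:0] is set.
  return static_cast<int32_t>((diff >> 1) | (diff & 0xFFFF));
}

// The alternative mentioned in the comment: only diff[0:0] is OR-ed back in.
inline int32_t reduce_with_one(uint32_t a, uint32_t b) {
  const int64_t diff = static_cast<int64_t>(a) - static_cast<int64_t>(b);
  return static_cast<int32_t>((diff >> 1) | (diff & 0x1));
}

// Plain truncation, for contrast: bit 31 of diff is a magnitude bit, not the
// sign, so large positive differences come out negative.
inline int32_t naive_truncate(uint32_t a, uint32_t b) {
  const int64_t diff = static_cast<int64_t>(a) - static_cast<int64_t>(b);
  return static_cast<int32_t>(diff);
}
```

For a = 0x80000000 and b = 0, naive_truncate returns a negative value even though a > b, while both reductions return a positive one. This is why the sign has to come from diff[32:32] rather than diff[31:31].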
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D148717/new/
https://reviews.llvm.org/D148717