[libc-commits] [PATCH] D148717: [libc] Improve memcmp latency and codegen
Guillaume Chatelet via Phabricator via libc-commits
libc-commits at lists.llvm.org
Mon Jun 12 06:22:38 PDT 2023
gchatelet added inline comments.
================
Comment at: libc/src/string/memory_utils/utils.h:160
+ // And reduce the uint64_t into an uint32_t.
+ // TODO: provide a detailed explanation.
+ return static_cast<int32_t>((diff >> 1) | (diff & 0xFFFF));
----------------
nafi3000 wrote:
> For the explanation, please consider whether we can add some version of the following points:
> ```
> For the int64_t to int32_t conversion we want the following properties:
> - int32_t[31:31] == 1 iff diff < 0
> - int32_t[31:0] == 0 iff diff == 0
>
> We also observe that:
> - When diff < 0: diff[63:32] == 0xffffffff and diff[31:0] != 0
> - When diff > 0: diff[63:32] == 0 and diff[31:0] != 0
> - When diff == 0: diff[63:32] == 0 and diff[31:0] == 0
> - https://godbolt.org/z/8W7qWP6e5
> - This implies that we need only look at diff[32:32] to determine the sign bit of the returned int32_t.
>
> So, we do the following:
> - int32_t[31:31] = diff[32:32]
> - int32_t[30:0] = diff[31:0] == 0 ? 0 : non-0.
>
> We can achieve the above with the expression below. We could also have used (diff64 >> 1) | (diff64 & 0x1), but (diff64 & 0xFFFF) is faster than (diff64 & 0x1). https://godbolt.org/z/j3b569rW1
> ```
>
> We can also add all these in a separate diff.
The explanation is fantastic; I copied it verbatim.
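For anyone who wants to check the argument locally, here is a standalone C++ sketch. It is not the libc code. The names `reduce_diff`, `reduce_diff_mask1`, `sign` and `check_edge_values` are hypothetical. It assumes the two operands are `uint32_t` values whose difference is computed in `int64_t`, as the explanation describes. It also relies on an arithmetic right shift and a modular narrowing cast. C++20 guarantees both, and mainstream two's-complement compilers already provide them in earlier modes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not the libc function): reduce the signed 64-bit
// difference of two uint32_t values to an int32_t with the same sign.
// Assumes arithmetic right shift and modular narrowing (guaranteed in C++20).
int32_t reduce_diff(uint32_t a, uint32_t b) {
  // |a - b| < 2^32, so the difference always fits in int64_t.
  const int64_t diff = static_cast<int64_t>(a) - static_cast<int64_t>(b);
  // Result bit 31 is diff[32] (moved down by the shift; the mask never
  // touches bit 31). Bits 30..0 are non-zero iff diff[31:0] is non-zero:
  // the shift carries diff[31:1] and the mask carries diff[0].
  return static_cast<int32_t>((diff >> 1) | (diff & 0xFFFF));
}

// The alternative mentioned in the review, masking only bit 0.
int32_t reduce_diff_mask1(uint32_t a, uint32_t b) {
  const int64_t diff = static_cast<int64_t>(a) - static_cast<int64_t>(b);
  return static_cast<int32_t>((diff >> 1) | (diff & 0x1));
}

// Returns -1, 0 or 1 according to the sign of v.
int sign(int64_t v) { return (v > 0) - (v < 0); }

// Compares both variants against the exact sign of a - b for every pair of
// boundary values around the 16-, 31- and 32-bit limits (asserts on failure).
void check_edge_values() {
  const uint32_t values[] = {0u,          1u,          2u,
                             0x7FFFu,     0x8000u,     0xFFFFu,
                             0x10000u,    0x7FFFFFFFu, 0x80000000u,
                             0xFFFFFFFEu, 0xFFFFFFFFu};
  for (uint32_t a : values) {
    for (uint32_t b : values) {
      const int expected =
          sign(static_cast<int64_t>(a) - static_cast<int64_t>(b));
      assert(sign(reduce_diff(a, b)) == expected);
      assert(sign(reduce_diff_mask1(a, b)) == expected);
    }
  }
}
```

For example, `reduce_diff(0u, 0xFFFFFFFFu)` computes diff = 0xFFFFFFFF00000001, and the low 32 bits of the combined value are 0x80000001, which is negative as required. `check_edge_values()` is only a spot check over boundary pairs, not a proof. The general argument is the one in the explanation above.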
Repository:
rG LLVM Github Monorepo
https://reviews.llvm.org/D148717