[libc-commits] [PATCH] D148717: [libc] Improve memcmp latency and codegen

Mon Jun 12 06:22:38 PDT 2023

gchatelet added inline comments.

================
Comment at: libc/src/string/memory_utils/utils.h:160
+  // And reduce the uint64_t into an uint32_t.
+  // TODO: provide a detailed explanation.
+  return static_cast<int32_t>((diff >> 1) | (diff & 0xFFFF));
----------------
nafi3000 wrote:
> For the explanation, please consider whether we can add some version of the following points:
> ```
> For the int64_t to int32_t conversion we want the following properties:
> - int32_t[31:31] == 1 iff diff < 0
> - int32_t[31:0] == 0 iff diff == 0
> 
> We also observe that:
> - When diff < 0: diff[63:32] == 0xffffffff and diff[31:0] != 0
> - When diff > 0: diff[63:32] == 0 and diff[31:0] != 0
> - When diff == 0: diff[63:32] == 0 and diff[31:0] == 0
> - https://godbolt.org/z/8W7qWP6e5
> - This implies that we only need to look at diff[32:32] to determine the sign bit of the returned int32_t.
> 
> So, we do the following:
> - int32_t[31:31] = diff[32:32]
> - int32_t[30:0] = diff[31:0] == 0 ? 0 : non-0.
> 
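> We can achieve the above with the expression below. The right shift moves diff[32:32] into bit 31 and drops only diff[0:0], so OR-ing any mask that covers bit 0 back in keeps the result non-zero whenever diff[31:0] is non-zero. We could also have used (diff >> 1) | (diff & 0x1), but (diff & 0xFFFF) is faster than (diff & 0x1). https://godbolt.org/z/j3b569rW1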
> ```
> 
> We can also add all of this in a separate diff.
The explanation is fantastic, I copied it verbatim.
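
For readers following along, here is a minimal standalone sketch of the reduction described above. The helper names `reduce_diff` and `cmp_u32` are hypothetical and not part of the patch; the sketch assumes diff is the difference of two 32-bit unsigned values widened to int64_t, and that `>>` on a negative int64_t is an arithmetic shift (guaranteed since C++20, implementation-defined before that).

```cpp
#include <cstdint>

// Hypothetical helper mirroring the expression under review.
// Assumes diff lies in [-0xFFFFFFFF, 0xFFFFFFFF].
static int32_t reduce_diff(int64_t diff) {
  // (diff >> 1) moves diff[32:32] into bit 31 and diff[31:1] into bits 30:0.
  // (diff & 0xFFFF) restores diff[0:0], the only bit the shift drops.
  return static_cast<int32_t>((diff >> 1) | (diff & 0xFFFF));
}

// Hypothetical caller: three-way comparison of two unsigned 32-bit values.
// The result is negative, zero, or positive, like memcmp.
static int32_t cmp_u32(uint32_t a, uint32_t b) {
  const int64_t diff = static_cast<int64_t>(a) - static_cast<int64_t>(b);
  return reduce_diff(diff);
}
```

Only the sign of the result (or whether it is zero) is meaningful; the magnitude is not the original difference.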

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D148717/new/

https://reviews.llvm.org/D148717