[PATCH] D123631: [InstCombine] Fold memrchr calls with sequences of identical bytes.

Martin Sebor via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri May 6 16:47:11 PDT 2022


msebor added inline comments.


================
Comment at: llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp:962
+  // Truncate the string to search at most EndOff characters.
+  Str = Str.substr(0, EndOff);
+  if (Str.find_first_not_of(Str[0]) != StringRef::npos)
----------------
nikic wrote:
> Use the helper function that avoids the 32-bit truncation issue?
The issue can't happen here because `EndOff` is either restricted to  at most `Str.size()` earlier on, or, for variable offsets, left initialized to `UINT64_MAX` and then implicitly converted to `SIZE_MAX` AKA `std::npos` when passed to `substr`.


================
Comment at: llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp:970
+  // AKA
+  //   *S == C ? S + N - 1 : null.
+  Type *SizeTy = Size->getType();
----------------
nikic wrote:
> I don't get the AKA part of this comment.
When `N` is constant it's not zero at this point (constants zero and one are handled above).  It's might be a vestige of the constant zero handling being done later in one of the early patches.  Let me remove it.


================
Comment at: llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp:977
+  Value *CEqS0 = B.CreateICmpEQ(ConstantInt::get(Int8Ty, Str[0]), CharVal);
+  Value *And = B.CreateAnd(NNeZ, CEqS0);
+  Value *SizeM1 = B.CreateSub(Size, ConstantInt::get(SizeTy, 1));
----------------
nikic wrote:
> I think this should be using CreateLogicalAnd, in case `CharVal` is poison (unless we want to argue that this memrchr argument cannot be poison, even for a zero bound).
I don't have enough experience with `poison` to understand how it makes a difference but sure, `CreateLogicalAnd` sounds fine.

(Since `CharVal` is treated as an `unsigned char` by these functions, any value should be allowed here, including indeterminate/uninitialized.  What exactly that means in practical C terms has been debated but the official albeit sometimes contested position is that each evaluation of an indeterminate `unsigned char` might yield a different result).


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123631/new/

https://reviews.llvm.org/D123631



More information about the llvm-commits mailing list