[PATCH] D123631: [InstCombine] Fold memrchr calls with sequences of identical bytes.

Thu Apr 14 09:20:08 PDT 2022

nikic added inline comments.

================
Comment at: llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp:933
+  // C and N, fold memrchr(S, C, N) to
+  //   N != 0 && N <= sizeof S && *S == C ? S + N - 1 : null
+  // AKA
----------------
msebor wrote:
> nikic wrote:
> > Why do we need the `N <= sizeof S` comparison here? As that would be UB, I think we can omit it here.
> I did it for the same reason as the folding of the excessive bounds to null.  In my view, if it can be done cheaply it's preferable to avoid out of bounds accesses.  GCC takes this approach in some cases (e.g., for `strlen("123" + i)`) but not in others.  But unless these cases are handled consistently in LLVM I don't think it matters much one way or the other either.  I can remove the check if you prefer.  (It would be helpful to have some documented guidance on how these issues are expected to be handled in general,  or to try to come to a consensus if it's not yet clear.)
> 
> My motivation for recognizing this particular pattern is to optimize searches in zero-initialized arrays (including the trailing sequences of nulls following initial strings in bigger arrays, ending at the offset of one of the nulls).  The former are rare on their own but common as members of bigger aggregates.  They're yet to be handled by LLVM (the latter is only handled in `strlen`) but my hope is to add support for both to all the function folders.  I'd also like to extend this pattern to `memchr`, for the same reason.  (The non-null cases aren't as important but they naturally fall out of this.)
> 
> GCC doesn't have a built-in for `memrchr` so it doesn't do anything interesting with calls to it.  Its `memchr folder is very simplistic and doesn't support this pattern (unlike LLVM, it does handle searches in all kinds of constant aggregates).
> I did it for the same reason as the folding of the excessive bounds to null.  In my view, if it can be done cheaply it's preferable to avoid out of bounds accesses.  GCC takes this approach in some cases (e.g., for `strlen("123" + i)`) but not in others.  But unless these cases are handled consistently in LLVM I don't think it matters much one way or the other either.  I can remove the check if you prefer.  (It would be helpful to have some documented guidance on how these issues are expected to be handled in general,  or to try to come to a consensus if it's not yet clear.)

I don't think there is any official guidance for this, but my 2c is that we're free to exploit UB aggressively, but also don't need to go out of the way to do so, if there's no benefit. In D123628 the choice was between retaining the libcall (and thus sanitizer checks) or returning null. In this case, the libcall is going to be removed either way, and we're going to return an "incorrect" result either way, whether that is `s + n` or `null`. I don't think one of those values is preferable over the other. As such, we should make the choice which results in fewer instructions here, which is to omit the additional bounds check.

> My motivation for recognizing this particular pattern is to optimize searches in zero-initialized arrays (including the trailing sequences of nulls following initial strings in bigger arrays, ending at the offset of one of the nulls).  The former are rare on their own but common as members of bigger aggregates.  They're yet to be handled by LLVM (the latter is only handled in `strlen`) but my hope is to add support for both to all the function folders.  I'd also like to extend this pattern to `memchr`, for the same reason.  (The non-null cases aren't as important but they naturally fall out of this.)

Ah, I see, thanks for the context.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123631/new/

https://reviews.llvm.org/D123631