[PATCH] D123819: [InstCombine] Fold strlen and strnlen recursively.

Wed May 4 08:20:05 PDT 2022

nikic added a comment.

In D123819#3488750 <https://reviews.llvm.org/D123819#3488750>, @msebor wrote:

> In D123819#3487880 <https://reviews.llvm.org/D123819#3487880>, @nikic wrote:
>
>> As mentioned on the original review, I think we would be better off not passing the Bound to minStringLength() and instead doing one umin at the end -- this avoids the need to handle this in each branch. Though possibly I forgot some reason why that doesn't work from the previous discussion.
>
> You made this suggestion in the context of the select statement where I implemented it (see line 748).  Removing the `Bound` (and `MaxLen`) from the function completely would prevent handling variable offsets in `strlen` while punting on them in `strnlen` as you requested in D122686 <https://reviews.llvm.org/D122686>.  It would also prevent ultimately handling it in `strnlen` as proposed D123820 <https://reviews.llvm.org/D123820> in cases that the `strlen` folding cannot handle (arrays with embedded nuls where the bound is less than the length of any of the substrings).
>
> Splitting up the variable offset handling between two patches makes the work harder to think about and, it seems, also harder to review, rather than easier as you'd hoped.  Would it help if I merged them into one?

Okay, I think I have made things a bit confusing here: The bit I was originally concerned about is the special handling for the `NullTermIdx == ~((uint64_t)0)` case -- I don't think the implementation from D123820 <https://reviews.llvm.org/D123820> really makes sense. I don't think there's a strong need to split off the handling of variable bounds as long as that special case is not included.

Basically, what I would expect is something along these two patches:
https://gist.github.com/nikic/d847d507ee89f9a2ccb07e39c18ea356 - This extends strnlen support to all the cases strlen supports
https://gist.github.com/nikic/dfa8018bc82ab6347887c8c65249f084 - This adds handling for the special "null not contained in string" case

The latter patch doesn't make a distinction between strlen and strnlen: If there is no null in the string, we just return `-1`, which will result in the bound getting picked as the final result for the strnlen case, which is the only possible non-UB result. The strlen case is always UB, thus returning `-1` is fine there as well. Passing the Bound does allow us to distinguish these and avoid optimizing some UB cases -- but it also does cause additional complexity in this case, because it means the Bound is not handled in a single place.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123819/new/

https://reviews.llvm.org/D123819