[llvm] [Transforms] LoopIdiomRecognize recognize strlen and wcslen (PR #108985)
Henry Jiang via llvm-commits
llvm-commits at lists.llvm.org
Sat Mar 22 15:47:51 PDT 2025
mustartt wrote:
@mstorsjo The hang seems to have come from https://github.com/wine-mirror/wine/blob/0927c5c3da7cda8cf476416260286bd299ad6319/dlls/ntdll/string.c#L423
```c
size_t __cdecl strlen( const char *str )
{
    const char *s = str;
    while (*s) s++;
    return s - str;
}
```
Here the loop is recognized as the strlen idiom and replaced by a call to `strlen`, so the function ends up calling itself. These types of scenarios are problematic in general.
For example, GCC without `-ffreestanding` can do something similar: below, a hand-written `memset` is accidentally compiled into direct recursion.
```
$ gcc --version
gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-22)
Copyright (C) 2018 Free Software Foundation, Inc.
```
```c
void *memset(void *b, int c, size_t len) {
    int i;
    unsigned char *p = b;
    i = 0;
    while (len > 0) {
        *p = c;
        p++;
        len--;
    }
    return b;
}
```
```
0000000000400590 <memset>:
400590: 48 85 d2 test %rdx,%rdx
400593: 74 1b je 4005b0 <memset+0x20>
400595: 48 83 ec 08 sub $0x8,%rsp
400599: 40 0f b6 f6 movzbl %sil,%esi
40059d: e8 ee ff ff ff callq 400590 <memset>
4005a2: 48 83 c4 08 add $0x8,%rsp
4005a6: c3 retq
4005a7: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1)
4005ae: 00 00
4005b0: 48 89 f8 mov %rdi,%rax
4005b3: c3 retq
4005b4: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
4005bb: 00 00 00
4005be: 66 90 xchg %ax,%ax
```
I'm not sure there is a general solution that covers every case of this scenario. The best option is probably to pass `-ffreestanding` or `-fno-builtin-<function>`.
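As a rough sketch of the flag-based option, a translation unit that implements its own string routines might look like the following. The file name `my_string.c` and the function name `my_strlen` are made up for illustration; in a real libc the function would be `strlen` itself, which is exactly the case where the flag matters.

```c
/*
 * Sketch only; my_string.c and my_strlen are hypothetical names.
 * Suggested build line (the flag is accepted by both GCC and Clang):
 *   cc -O2 -fno-builtin-strlen -c my_string.c
 * With the flag, the compiler should not treat strlen as a builtin, so the
 * loop below is not expected to be rewritten into a call to strlen.
 */
#include <assert.h>
#include <stddef.h>

size_t my_strlen(const char *str)
{
    assert(str != NULL);      /* precondition: a valid NUL-terminated string */
    const char *s = str;
    while (*s)                /* the loop shape the strlen idiom matches */
        s++;
    return (size_t)(s - str);
}
```

`-ffreestanding` is the broader switch: it disables builtin treatment for all such library functions rather than just the one named.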
Otherwise, for this patch, what we can do is skip the loop idiom in any function whose name is `strlen` or `wcslen`. But we can still run into similar problems with indirect recursion.
```c
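#include <stddef.h>

/* Declared up front so that strlen can call it. */
__attribute__((noinline)) size_t foo(const char *str);

size_t strlen(const char *str) {
    return foo(str);
}

/* If this loop becomes a call to strlen, we get strlen -> foo -> strlen. */
__attribute__((noinline)) size_t foo(const char *str) {
    const char *s = str;
    while (*s) s++;
    return s - str;
}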
```
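To make the limitation concrete, the name-based check could be modeled roughly like this. This is a plain C illustration, not LLVM code: `skips_strlen_idiom` is a hypothetical helper, and the actual patch would do the equivalent comparison on the enclosing function's name inside the pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the proposed guard: returns true when the idiom
 * should be skipped because the enclosing function is itself one of the
 * library routines the transform would call. */
bool skips_strlen_idiom(const char *function_name)
{
    assert(function_name != NULL);
    return strcmp(function_name, "strlen") == 0 ||
           strcmp(function_name, "wcslen") == 0;
}
```

Under this model `skips_strlen_idiom("strlen")` is true, but `skips_strlen_idiom("foo")` is false, so the loop inside `foo` would still be transformed and the indirect recursion above would not be caught.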
However, I expect these types of scenarios to be exceedingly rare.
Does either of those two options work for you?
https://github.com/llvm/llvm-project/pull/108985