[llvm] [Transforms] LoopIdiomRecognize recognize strlen and wcslen (PR #108985)

Henry Jiang via llvm-commits llvm-commits at lists.llvm.org
Sat Mar 22 15:47:51 PDT 2025


mustartt wrote:

@mstorsjo The hang seems to have come from https://github.com/wine-mirror/wine/blob/0927c5c3da7cda8cf476416260286bd299ad6319/dlls/ntdll/string.c#L423

```c
size_t __cdecl strlen( const char *str )
{
    const char *s = str;
    while (*s) s++;
    return s - str;
}
```

Here the `strlen` idiom is recognized and the loop is replaced by a call to `strlen` itself, so the function ends up calling itself. These types of scenarios are problematic in general.

For example, in GCC without `-ffreestanding` similar things can happen, where a hand-written `memset` is accidentally compiled into direct recursion.

```
$ gcc --version
gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-22)
Copyright (C) 2018 Free Software Foundation, Inc.
```

```c
void *memset(void *b, int c, size_t len) {
  int i;
  unsigned char *p = b;
  i = 0;
  while (len > 0) {
    *p = c;
    p++;
    len--;
  }
  return b;
}
```

```
0000000000400590 <memset>:
  400590:	48 85 d2             	test   %rdx,%rdx
  400593:	74 1b                	je     4005b0 <memset+0x20>
  400595:	48 83 ec 08          	sub    $0x8,%rsp
  400599:	40 0f b6 f6          	movzbl %sil,%esi
  40059d:	e8 ee ff ff ff       	callq  400590 <memset>
  4005a2:	48 83 c4 08          	add    $0x8,%rsp
  4005a6:	c3                   	retq   
  4005a7:	66 0f 1f 84 00 00 00 	nopw   0x0(%rax,%rax,1)
  4005ae:	00 00 
  4005b0:	48 89 f8             	mov    %rdi,%rax
  4005b3:	c3                   	retq   
  4005b4:	66 2e 0f 1f 84 00 00 	nopw   %cs:0x0(%rax,%rax,1)
  4005bb:	00 00 00 
  4005be:	66 90                	xchg   %ax,%ax
```

I'm not sure there is a general solution that covers all cases of this scenario. The best option is probably to pass `-ffreestanding` or `-fno-builtin-<function>`.

Otherwise, for this patch, what we can do is skip the loop idiom in functions named `strlen` or `wcslen`. But we can still run into similar problems with indirect recursion:

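```c
#include <stddef.h>

/* Forward declaration so strlen can call foo before its definition. */
__attribute__((noinline)) size_t foo(const char *str);

size_t strlen(const char *str) {
  return foo(str);
}

/* This loop is the strlen idiom; if it is rewritten into a strlen call,
   the result is strlen -> foo -> strlen. */
__attribute__((noinline)) size_t foo(const char *str) {
  const char *s = str;
  while (*s) s++;
  return s - str;
}
```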

However, I expect these types of scenarios to be exceedingly rare.
Does either of those two options work for you?
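
For concreteness, here is a rough sketch of the name check behind the second option. It is plain C rather than the actual LLVM code, and `should_skip_strlen_idiom` is a hypothetical helper name used only for illustration; the real pass would query the enclosing function's name through LLVM's own data structures.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper, not part of LLVM: returns true when the function
 * being optimized is itself one of the routines the idiom would call,
 * in which case the rewrite would produce direct self-recursion. */
static bool should_skip_strlen_idiom(const char *enclosing_fn_name)
{
    return strcmp(enclosing_fn_name, "strlen") == 0 ||
           strcmp(enclosing_fn_name, "wcslen") == 0;
}
```

Note that a check like this only catches direct recursion. For the `foo` wrapper above the enclosing name is `foo`, so the check does not fire and the loop would still be rewritten.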






https://github.com/llvm/llvm-project/pull/108985

