<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/73456>73456</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[clang] On a fixed-size loop clang generates an individual copy of the body of the loop when specific optimization is enabled
</td>
</tr>
<tr>
<th>Labels</th>
<td>
clang
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
rilysh
</td>
</tr>
</table>
<pre>
Hello,
As the title implies, for loops with certain fixed iteration counts, clang emits a separate copy of the entire loop body for each iteration instead of an actual loop.
For example:
```c
#include <stdio.h>
int main(void)
{
unsigned int i, j;
for (i = 0; i < 29; i++)
fprintf(stdout, "i: %d\n", i);
}
```
With optimizations disabled (`-O0`), clang generates:
```asm
movl $0, -4(%rbp)
movl $0, -8(%rbp)
.LBB0_1:
cmpl $29, -8(%rbp)
jae .LBB0_4
movq stdout@GOTPCREL(%rip), %rax
movq (%rax), %rdi
movl -8(%rbp), %edx
leaq .L.str(%rip), %rsi
movb $0, %al
callq fprintf@PLT
movl -8(%rbp), %eax
addl $1, %eax
movl %eax, -8(%rbp)
jmp .LBB0_1
```
(Ignore the other labels.)
The generated assembly first sets the index to zero and then compares it with 29. While the index is lower than 29, it calls `fprintf()`, increments the index by one (`addl $1`), and jumps back to the comparison; otherwise it leaves the loop.
However, with optimizations that favor performance over size (e.g. `-O2`, `-O3`, `-Ofast`), clang generates:
```asm
movq stdout@GOTPCREL(%rip), %r14
movq (%r14), %rdi
leaq .L.str(%rip), %rbx
movq %rbx, %rsi
xorl %edx, %edx
xorl %eax, %eax
callq fprintf@PLT
movq (%r14), %rdi
movq %rbx, %rsi
movl $1, %edx
xorl %eax, %eax
callq fprintf@PLT
movq (%r14), %rdi
movq %rbx, %rsi
movl $2, %edx
xorl %eax, %eax
[ ... similar copies with different index value ... ]
movq (%r14), %rdi
movq %rbx, %rsi
movl $28, %edx
xorl %eax, %eax
callq fprintf@PLT
xorl %eax, %eax
```
In this assembly output, clang emits the loop body (the `fprintf()` call) once per iteration, each time loading the index as an immediate value (the value it would have after the preceding increments). Note that this behavior only happens when the loop's upper index is `> 0` and `<= 28`. In comparison, GCC generates the following assembly output (with `-O2`, `-Ofast`, etc.):
```asm
xorl %ebx, %ebx
movq stdout(%rip), %rdi
movl %ebx, %edx
movq %rbp, %rsi
xorl %eax, %eax
addl $1, %ebx
call fprintf@PLT
cmpl $29, %ebx
jne .L2
addq $8, %rsp
xorl %eax, %eax
```
This is quite similar to clang's `-O0` output. With `-Os` and `-Ofast`, GCC generates nearly identical assembly, with only a few changes.
I've tested it with Clang 17.0.1 and GCC 13.2. Here's the Godbolt link: https://godbolt.org/z/1xq4qfGzd
I've only tested this on x86-64 and 64-bit RISC-V (although I don't think the platform matters, since it only happens when performance-oriented optimizations are enabled).
</pre>