<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/127365>127365</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
Poor optimization of switch statement in Clang 19.1.0 compared to Clang 18.1.0
</td>
</tr>
<tr>
<th>Labels</th>
<td>
clang
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
inicula
</td>
</tr>
</table>
<pre>
While benchmarking `switch` statements against array look-ups, I found a change in behavior between Clang 18.1.0 and Clang 19.1.0 (also present on current trunk): https://godbolt.org/z/o4TYYdr4e
Clang 18.1.0 lowers that big switch to a constant lookup table:
```asm
lea rcx, [rip + .Lswitch.table.main]
.LBB0_12:
movzx edx, byte ptr [rbx + rax - 3]
xor edx, 128
mov rdx, qword ptr [rcx + 8*rdx]
inc byte ptr [rsp + rdx + 112]
```
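For readers without the Godbolt link open, here is a minimal sketch of the transformation Clang 18 performs. The function names, the table name, and the values for cases 0 and 3 are made up for illustration; only `case 1: return 553` and `case 2: return 541` come from the assembly comments in this report, and the real switch is much larger.
```cpp
// Hypothetical stand-in for the large switch from the report.
// Only cases 1 and 2 use values quoted in the report; the rest are invented.
static int lookup_switch(unsigned char key) {
    switch (key) {
    case 0: return 17;
    case 1: return 553;
    case 2: return 541;
    case 3: return 29;
    default: return 0;
    }
}

// The same mapping written by hand as a constant table. This is roughly
// what a compiler-generated switch lookup table encodes.
static const int kSwitchTable[4] = {17, 553, 541, 29};

static int lookup_table(unsigned char key) {
    // Any bit set above the low two means the key is outside 0..3,
    // which corresponds to the default case of the switch.
    if ((key & ~3u) != 0) {
        return 0;
    }
    // In-range keys become a single indexed load with no per-case branch.
    return kSwitchTable[key];
}
```
Both functions return the same value for every key; the table version replaces the per-case control flow with one range check and one load, which is the shape of the Clang 18 output above.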
Clang 19.1.0, on the other hand, emits a separate block for each switch case, and every block jumps to a shared tail that dispatches to the next case through an indirect jump:
```asm
.LBB0_28:
lea rcx, [rsp + 665] ; case 1: return 553;
jmp .LBB0_283
.LBB0_29:
lea rcx, [rsp + 653] ; case 2: return 541;
jmp .LBB0_283
; ..............................
.LBB0_283:
inc byte ptr [rcx]
inc rbp
cmp rbp, 300000000
je .LBB0_20
movzx ecx, byte ptr [rbx + rbp]
movsxd rdx, dword ptr [rax + 4*rcx]
add rdx, rax
mov rcx, r13
jmp rdx
```
This can tank performance, for example when the branch predictor cannot accurately predict which label the indirect jump will target on the current iteration. My example generates random indexes, and `perf stat` reports almost 300 million branch misses (about one for each `increment()` invocation).
Unless this change is an intentional trade-off made for a benefit in _some other use cases_, it is a regression.
On my machine, the results of running that binary (same source code as the one in Godbolt) compiled with Clang 18 vs Clang 19 are as follows:
```
clang 18 = elapsed: 254ms sum: 28928
clang 19 = elapsed: 2813ms sum: 29184
```
So the binary generated by Clang 19 is about 11 times slower (2813ms vs 254ms).
NOTE: The issue seems to be related to inlining, because if I add `__attribute__((noinline))` to the `increment()` function, then Clang 19 optimizes it with a lookup table, just like Clang 18, and the result is much faster than what I get with inlining allowed:
```
elapsed: 548ms sum: 28928
```
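A minimal sketch of that workaround follows. The signature of `increment()`, the helper `slot_for()`, and the case values are assumptions made for illustration; the real code is in the Godbolt link. `__attribute__((noinline))` is the GNU-style attribute accepted by both Clang and GCC.
```cpp
// Hypothetical helper standing in for the big switch; values are invented.
static int slot_for(unsigned char key) {
    switch (key) {
    case 0: return 2;
    case 1: return 0;
    case 2: return 3;
    case 3: return 1;
    default: return 0;
    }
}

// Assumed shape of the hot function. The attribute keeps the call out of
// the benchmark loop, so the switch is optimized inside this function on
// its own rather than being merged into the caller's loop.
__attribute__((noinline))
void increment(unsigned char *counters, unsigned char key) {
    counters[slot_for(key)]++;
}
```
This only sidesteps the problem for one function and adds call overhead, which may be why the `noinline` timing above is still slower than the Clang 18 result.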
</pre>