<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/127365>127365</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            Poor optimization of switch statement in Clang 19.1.0 compared to Clang 18.1.0
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            clang
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          inicula
      </td>
    </tr>
</table>

<pre>
    While running some tests and benchmarks comparing `switch` statements with array lookups, I found this change in behavior between Clang 18.1.0 and Clang 19.1.0 (and current trunk): https://godbolt.org/z/o4TYYdr4e

Clang 18.1.0 lowers that big switch into a constant lookup table:

```asm
        lea     rcx, [rip + .Lswitch.table.main]
.LBB0_12:
        movzx   edx, byte ptr [rbx + rax - 3]
        xor     edx, 128
        mov     rdx, qword ptr [rcx + 8*rdx]
        inc     byte ptr [rsp + rdx + 112]
```

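On the other hand, Clang 19.1.0 generates a separate label for each switch case. Each one loads the address of its counter and jumps to a common block, which performs the increment and dispatches the next iteration through a jump table (`jmp rdx`):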

```asm
.LBB0_28:
        lea     rcx, [rsp + 665] ; case 1: return 553;
        jmp     .LBB0_283
.LBB0_29:
        lea     rcx, [rsp + 653] ; case 2: return 541;
        jmp     .LBB0_283
; ..............................
.LBB0_283:
        inc     byte ptr [rcx]
        inc     rbp
        cmp     rbp, 300000000
        je      .LBB0_20
        movzx   ecx, byte ptr [rbx + rbp]
        movsxd  rdx, dword ptr [rax + 4*rcx]
        add     rdx, rax
        mov     rcx, r13
        jmp     rdx
```
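At source level, the difference between the two listings can be pictured as a `switch` versus a read from a constant array built from that switch. The sketch below is hypothetical: `switch_offset`, `OffsetTable`, `make_table` and `table_offset` are illustrative names, only the values for cases 1 and 2 (553 and 541) come from the asm comments above, the default case is invented, and the real switch has many more cases. It needs C++14 for the `constexpr` loop.

```cpp
// Hypothetical reduction of the benchmark's switch. Only cases 1 and 2
// (553 and 541) come from the asm comments; the default is invented.
constexpr unsigned switch_offset(unsigned char key) {
    switch (key) {
    case 1: return 553;
    case 2: return 541;
    default: return 0;
    }
}

// One entry per possible key value.
struct OffsetTable {
    unsigned values[256];
};

// Build the table from the switch at compile time so the two forms
// cannot drift apart.
constexpr OffsetTable make_table() {
    OffsetTable t{};
    for (unsigned k = 0; k != 256; ++k) {
        t.values[k] = switch_offset(k);  // k fits in unsigned char
    }
    return t;
}

constexpr OffsetTable kTable = make_table();

// Table form: a single indexed load, no data-dependent branch.
constexpr unsigned table_offset(unsigned char key) {
    return kTable.values[key];
}
```

`table_offset(key)` has the shape of the Clang 18 listing: one load from a constant table. `switch_offset(key)`, when lowered to a jump table, needs an indirect jump whose target depends on `key`, as in the Clang 19 listing.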

This can severely hurt performance whenever the branch predictor cannot predict which label the indirect jump will go to on a given iteration. In my example I generate random indexes, and `perf stat` reports almost 300 million branch misses, roughly one per `increment()` invocation.

Unless this change is an intentional trade-off that benefits _some other use cases_, this is a regression.

On my machine, the results of running that binary (same source code as the one in Godbolt) compiled with Clang 18 vs Clang 19 are as follows:

```
clang 18 = elapsed: 254ms sum: 28928
clang 19 = elapsed: 2813ms sum: 29184
```

So the binary generated by Clang 19 is about 11 times slower.

NOTE: The issue seems to be related to inlining: if I add `__attribute__((noinline))` to the `increment()` function, Clang 19 compiles it to a lookup table, just like Clang 18, and the result is much faster than the Clang 19 build with inlining allowed:

```
elapsed: 548ms sum: 28928
```
</pre>