<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/99835>99835</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [InstCombine] Performance degradation caused by more branches after commit b5a9361 and #66740
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          cyyself
      </td>
    </tr>
</table>

<pre>
## Statement

Commit b5a9361c90ca43c715780ab4f7422fbc9d3a067b and PR #66740 (commit a7f962c00745c8e28991379985fcd6b51ac0d671) cause LLVM to generate more branches that are likely to be taken. This results in a severe 37% performance degradation when running C++ code generated by [Verilator](https://github.com/verilator/verilator) for [XiangShan](https://github.com/OpenXiangShan/XiangShan). The two results can be compared here: https://github.com/cyyself/llvm-project/commit/a2489e3a8d85c95ed0adab605311da1339550c7a#commitcomment-144413753 and https://github.com/cyyself/llvm-project/commit/40529f8114e2139748ce075539573e36c2bb281b#commitcomment-144413665

## Reduced reproducer

Look at the following code:

```cpp
struct a_struct {
    bool b_1, b_2, b_3;
    unsigned int cv_1, cv_2, cv_3;
    unsigned int cav_1, cav_2, cav_3;
    unsigned int d;
};

void some_func(a_struct &a) {
    if (a.b_1 & (8U == (0x1f & a.cv_1))) [[unlikely]] {
        a.d = a.cav_1;
    }
    else if (a.b_2 & (7U == (0x1f & a.cv_2))) [[unlikely]] {
        a.d = a.cav_2;
    }
    else if (a.b_3 & (6U == (0x1f & a.cv_3))) [[unlikely]] {
        a.d = a.cav_3;
    }
}
```

Compile: `clang++ -O3 -c test.cpp -S`

On the x86-64 target, LLVM main (at least at commit da0c8b275564f814a53a5c19497669ae2d99538d) generates this code:

```asm
_Z9some_funcR8a_struct: # @_Z9some_funcR8a_struct
        .cfi_startproc
# %bb.0:
        movl    4(%rdi), %eax
        andl    $31, %eax
        cmpl    $8, %eax
        jne     .LBB0_3
# %bb.1:
        cmpb    $0, (%rdi)
        jne     .LBB0_2
.LBB0_3:
        movl    8(%rdi), %eax
        andl    $31, %eax
        cmpl    $7, %eax
        jne     .LBB0_6
...
```

As we can see, LLVM now emits a separate conditional jump to the next check for each half of the condition (one for the comparison and one for `a.b_1`), even though we gave an unlikely hint. Even worse, these conditional jumps are likely to be taken, so the CPU has to flush the pipeline whenever the branch predictor mispredicts them, for example because the predictor is too small to track all of these branches.

But if we revert commit b5a9361c90ca43c715780ab4f7422fbc9d3a067b and PR #66740 (commit a7f962c00745c8e28991379985fcd6b51ac0d671), or use an LLVM release older than 16, we get code like this:

```asm
_Z9some_funcR8a_struct:                 # @_Z9some_funcR8a_struct
        .cfi_startproc
# %bb.0:
        movl    4(%rdi), %eax
        andl    $31, %eax
        cmpl    $8, %eax
        sete    %al
        testb   %al, (%rdi)
        jne     .LBB0_1
# %bb.2:
        movl    8(%rdi), %eax
        andl    $31, %eax
...
```

This halves the number of branches and is very friendly to CPU branch predictors, since none of the remaining branches is likely to be taken.
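
To make the difference between the two listings concrete, here is a minimal source-level sketch of the two shapes for a single condition. The helper names `test_two_branches`, `test_one_branch` and `shapes_agree` are hypothetical and not part of the reproducer. The optimizer is free to turn either function into either shape, so this only illustrates the intent and shows that both forms compute the same result.

```cpp
// Hypothetical helpers (not from the reproducer). Each one models one codegen
// shape for a single `flag & (tag == (0x1f & value))` test.

// Shape seen on current main: two conditional jumps per condition.
inline bool test_two_branches(bool flag, unsigned value, unsigned tag) {
    if ((value & 0x1fU) != tag) return false;  // first jump: comparison failed
    if (!flag) return false;                   // second jump: flag is clear
    return true;
}

// Shape seen before the two commits: materialize the comparison (sete),
// AND it with the flag (testb), and branch once on the combined result.
inline bool test_one_branch(bool flag, unsigned value, unsigned tag) {
    const bool cmp = (value & 0x1fU) == tag;
    return (flag & cmp) != 0;
}

// Exhaustively compares both shapes for the tags used in the reproducer
// (6, 7, 8), all 6-bit values, and both flag states.
inline bool shapes_agree() {
    const unsigned tags[] = {6U, 7U, 8U};
    for (unsigned tag : tags) {
        for (unsigned value = 0; value != 64; ++value) {
            for (int flag = 0; flag != 2; ++flag) {
                const bool f = (flag != 0);
                if (test_two_branches(f, value, tag) != test_one_branch(f, value, tag)) {
                    return false;
                }
            }
        }
    }
    return true;
}
```

Since both forms are equivalent, the choice between them is purely a codegen decision, which is why the unlikely hint ought to matter here.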
</pre>