<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/60787>60787</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            Optimizer emits different code for same C instructions
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          boazsegev
      </td>
    </tr>
</table>

<pre>
    I wrote the same C operations in 4 semantically equivalent ways. I expected the emitted code to be identical, but it wasn't.

See: https://godbolt.org/z/voK7fdM5z

### Expected behavior:

1. All functions would be reduced to the exact same assembly code and therefore only one of the `static` functions would be emitted (code reduction for static functions).

2. Array math operations would be vectorized by the SLP vectorizer.

### Actual behavior:

1. Each function is reduced to slightly different assembly code and 3 different functions are emitted.

2. SLP vectorization is missing.

I believe this shows both an issue with the instruction tree and an issue with the SLP vectorizer.

### Details

I wrapped the same C code in 4 different ways.

The code performs mathematical operations on a 4x4 `uint32_t` array (matrix), i.e., an array of 4 arrays, each containing 4 `uint32_t` members.

1. Using a `union` with a single `uint32_t[4]` member (`fio_u32x4`).

2. Using a `union` with additional array members (`fio_u128`, which includes `uint8_t[16]; uint16_t[8]; uint32_t[4]; uint64_t[2]`).

3. Using the `fio_u128` union as an explicit array (`fio_u128[4]`).

4. Using the `fio_u128` union array wrapped in a type (`fio_u512`, which includes `fio_u128[4]`).

All variants use inlined functions that perform exactly the same loop (`for(i=0; i<4; ++i)`) and the same mathematical operations in the same order. They also perform `memcpy` using the same loops (though some might use 4 loops of a 16-byte `memcpy` vs a single 64-byte `memcpy` due to the semantic differences).
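
A minimal sketch of the four wrappings described above, for readers who cannot open the Godbolt link. This is not the original reproducer: only the type names (`fio_u32x4`, `fio_u128`, `fio_u512`) and their member layouts come from the report. The member names, every `hyp_`-prefixed function, the arithmetic itself, the choice of `union` for `fio_u512`, and the way variants 2 and 3 differ are assumptions made for illustration.

```c
#include "stdint.h" /* standard headers; the quoted form falls back to the system search path */
#include "string.h"

/* Types named in the report; member names are assumed. */
typedef union { uint32_t u32[4]; } fio_u32x4;
typedef union {
  uint8_t u8[16];
  uint16_t u16[8];
  uint32_t u32[4];
  uint64_t u64[2];
} fio_u128;
typedef union { fio_u128 u128[4]; } fio_u512; /* assumed to be a union */

/* Hypothetical row operation: same loop and same math for both row types. */
static inline void hyp_row_u32x4(fio_u32x4 *x, const fio_u32x4 *y) {
  for (int i = 0; i < 4; ++i) x->u32[i] = (x->u32[i] + y->u32[i]) * 3u;
}
static inline void hyp_row_u128(fio_u128 *x, const fio_u128 *y) {
  for (int i = 0; i < 4; ++i) x->u32[i] = (x->u32[i] + y->u32[i]) * 3u;
}
/* Whole-matrix helpers used by variants 3 and 4. */
static inline void hyp_mat_u128(fio_u128 x[4], const fio_u128 y[4]) {
  for (int r = 0; r < 4; ++r) hyp_row_u128(x + r, y + r);
}
static inline fio_u512 hyp_mat_u512(fio_u512 x, const fio_u512 y) {
  for (int r = 0; r < 4; ++r) hyp_row_u128(x.u128 + r, y.u128 + r);
  return x;
}

/* Variant 1: rows held as fio_u32x4, four 16-byte copies each way. */
void hyp_mix_v1(uint32_t out[16], const uint32_t a[16], const uint32_t b[16]) {
  fio_u32x4 x[4], y[4];
  for (int r = 0; r < 4; ++r) {
    memcpy(x[r].u32, a + 4 * r, 16);
    memcpy(y[r].u32, b + 4 * r, 16);
  }
  for (int r = 0; r < 4; ++r) hyp_row_u32x4(x + r, y + r);
  for (int r = 0; r < 4; ++r) memcpy(out + 4 * r, x[r].u32, 16);
}

/* Variant 2: same shape as variant 1, but rows are the wider fio_u128 union. */
void hyp_mix_v2(uint32_t out[16], const uint32_t a[16], const uint32_t b[16]) {
  fio_u128 x[4], y[4];
  for (int r = 0; r < 4; ++r) {
    memcpy(x[r].u32, a + 4 * r, 16);
    memcpy(y[r].u32, b + 4 * r, 16);
  }
  for (int r = 0; r < 4; ++r) hyp_row_u128(x + r, y + r);
  for (int r = 0; r < 4; ++r) memcpy(out + 4 * r, x[r].u32, 16);
}

/* Variant 3: fio_u128[4] handled as one array, single 64-byte copies. */
void hyp_mix_v3(uint32_t out[16], const uint32_t a[16], const uint32_t b[16]) {
  fio_u128 x[4], y[4];
  memcpy(x, a, 64);
  memcpy(y, b, 64);
  hyp_mat_u128(x, y);
  memcpy(out, x, 64);
}

/* Variant 4: the array wrapped in fio_u512 and passed by value. */
void hyp_mix_v4(uint32_t out[16], const uint32_t a[16], const uint32_t b[16]) {
  fio_u512 x, y;
  memcpy(x.u128, a, 64);
  memcpy(y.u128, b, 64);
  x = hyp_mat_u512(x, y);
  memcpy(out, x.u128, 64);
}
```

All four `hyp_mix_*` functions compute the same 16 output words for the same inputs, so comparing their assembly at `-O2` or `-O3` is one way to check whether the optimizer treats the wrappings identically. Whether this simplified sketch reproduces the reported differences has not been verified.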
</pre>