<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/64081>64081</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            SROA conflicts with the way clang lowers unions
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          jacobly0
      </td>
    </tr>
</table>

<pre>
```c
int puts(const char *);
static void f(int cond) {
    union {
        struct {
            unsigned _BitInt(6) a;
            char b;
        } a;
        char b[2];
    } u;
    char a[2];
    __builtin_memcpy(&u.b, "a", 2);
    if (cond) {
        u.a.a = '!';
        u.a.b = 0;
    }
    __builtin_memcpy(a, &u, 2);
    puts(a);
}
int main() {
    f(0);
}
```
```
$ clang --version
clang version 16.0.6
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/lib/llvm/16/bin
Configuration file: /etc/clang/clang.cfg
$ clang -O0 repro.c; ./a.out
a
$ clang -O1 repro.c; ./a.out
!
```
[godbolt](https://godbolt.org/z/edjhfT1qo)

I initially reduced this to a simpler LLVM IR repro, but it's not clear to me whether that is a bug in SROA or just invalid IR, so I ported it back to C to demonstrate that clang can produce equivalent IR.

The issue appears to be that clang selects the union member with the largest alignment as the representation of the union. In this case the choice is arbitrary since both members have the same alignment, and indeed swapping the order of the union members avoids the bug. Clang then uses that member's type as the type of the `alloca`, which SROA takes to mean that the top two bits of `u.a.a` are not meaningful, even though the source language says those bits are perfectly meaningful when accessed through `u.b[0]`.
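
The arithmetic behind the `-O1` output can be sketched in C. This is only an illustration, assuming the optimized code ends up keeping just the low 6 bits of the byte stored through `u.b[0]`; it has not been checked against the IR that SROA actually emits, and `keep_low_6_bits` is a hypothetical helper, not something from clang or LLVM.

```c
/* Hypothetical helper (not part of the original repro): models treating
 * the top two bits of a byte as padding of a 6-bit integer and dropping
 * them, which is the assumption described above. */
static unsigned char keep_low_6_bits(unsigned char byte) {
    /* 0x3F == 0b00111111: keep bits 0..5, clear bits 6..7. */
    return byte & 0x3F;
}
```

`'a'` is `0x61` (`0b01100001`); clearing the top two bits gives `0x21`, which is `'!'`, matching the `-O1` output above.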

Since this was reduced from another language, I can either change the way that compiler lowers unions (it currently uses the same method as clang, and I'm not sure what a better way would look like), or wait for an LLVM release that no longer miscompiles the current IR.
</pre>