<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href="https://github.com/llvm/llvm-project/issues/107239">107239</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
Clang should lower unions to LLVM as byte array, not structure type
</td>
</tr>
<tr>
<th>Labels</th>
<td>
clang
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
mkuron
</td>
</tr>
</table>
<pre>
Consider a C++ union like this:
```c++
struct S1 {
char s;
int i;
};
struct S2 {
int a;
int b;
};
union U {
S1 s1;
S2 s2;
};
```
Because LLVM has no concept of unions, Clang lowers it to LLVM IR like this:
```llvm
%union.U = type { %struct.S1 }
%struct.S1 = type { i8, i32 }
```
This is dangerous whenever one of the structs contains alignment padding that does not coincide with the padding of the other structs. In the above example, LLVM's view is that the second, third, and fourth bytes of the union are unused padding, but if the union happens to contain an `S2`, they are in fact part of `S2::a`. Whenever something in LLVM iterates over structure members, there is thus a danger of data loss. This has come up before in various contexts (#53710, #64081, #76017, probably others that I haven't seen), and the workaround tends to be to make LLVM locally reinterpret the structure type as an opaque byte array or fix up its size. This does not scale, however, and there are likely more places that need this workaround but don't have it yet.
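To make the overlap concrete, here is a small compile-time sketch. It assumes a typical ABI with a 4-byte `int` aligned to 4 bytes; the names `kPadBegin`, `kPadEnd`, and `paddingOverlapsS2a` are made up for illustration and are not part of Clang or LLVM.
```c++
#include <cstddef>

struct S1 { char s; int i; };
struct S2 { int a; int b; };
union U { S1 s1; S2 s2; };

// The bytes between the end of S1::s and the start of S1::i are padding in S1.
constexpr std::size_t kPadBegin = offsetof(S1, s) + sizeof(char);
constexpr std::size_t kPadEnd = offsetof(S1, i);  // 4 on the assumed ABI

// Returns true if S1's padding range [kPadBegin, kPadEnd) overlaps the
// bytes occupied by S2::a.
constexpr bool paddingOverlapsS2a() {
  return kPadBegin < offsetof(S2, a) + sizeof(int) &&
         offsetof(S2, a) < kPadEnd;
}

static_assert(sizeof(U) == 8, "sketch assumes 4-byte int with 4-byte alignment");
static_assert(paddingOverlapsS2a(), "S1's padding aliases live bytes of S2::a");
```
If both assertions hold on your target, the bytes that `{ i8, i32 }` marks as padding are bytes that carry data whenever the union holds an `S2`.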
@Artem-B pointed out on #53710 that the documentation says
> If it sees an aggregate, it assumes that loading/storing all fields of an aggregate is equivalent of loading/storing complete aggregate.
>
> From the LLVM docs for load https://llvm.org/docs/LangRef.html#id211 :
>
> > If the value being loaded is of aggregate type, the bytes that correspond to padding may be accessed but are ignored, because it is impossible to observe padding from the loaded aggregate value.
>
> For store it says: https://llvm.org/docs/LangRef.html#id216
>
> > If <value> is of aggregate type, padding is filled with undef.
and suggested I open this issue. Clang's lowering of unions as structure types thus seems to violate the assumptions that LLVM makes about structure types. Instead of applying workarounds throughout LLVM, the preferable solution would be to change Clang's union lowering to use byte arrays instead of structure types. The above example would then be represented as
```llvm
%union.U = type { [8 x i8] }
```
which gives LLVM no reason to assume that certain bytes might be padding.
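The difference between the two representations can be simulated in plain C++. This is only a sketch of the failure mode, not of what any particular LLVM pass does: the helpers `copyFieldsOfS1`, `copyAsBytes`, and `roundTripsS2` are hypothetical, and the code assumes a 4-byte `int` aligned to 4 bytes.
```c++
#include <cstddef>
#include <cstring>

struct S1 { char s; int i; };
struct S2 { int a; int b; };

static_assert(sizeof(S1) == 8 && sizeof(S2) == 8 && offsetof(S1, i) == 4,
              "sketch assumes 4-byte int with 4-byte alignment");

// Hypothetical helper: copies only the bytes that belong to S1's fields,
// as something iterating over the members of { i8, i32 } would, and
// leaves the padding bytes of dst untouched.
void copyFieldsOfS1(unsigned char *dst, const unsigned char *src) {
  std::memcpy(dst + offsetof(S1, s), src + offsetof(S1, s), sizeof(char));
  std::memcpy(dst + offsetof(S1, i), src + offsetof(S1, i), sizeof(int));
}

// Hypothetical helper: copies the storage as an opaque [8 x i8].
void copyAsBytes(unsigned char *dst, const unsigned char *src) {
  std::memcpy(dst, src, sizeof(S2));
}

// Returns true if an S2 stored in the union's bytes survives the given
// copy function unchanged.
template <typename CopyFn>
bool roundTripsS2(CopyFn copy) {
  S2 in{0x11223344, 0x55667788};
  unsigned char src[sizeof(S2)], dst[sizeof(S2)] = {};
  std::memcpy(src, &in, sizeof in);
  copy(dst, src);
  S2 out;
  std::memcpy(&out, dst, sizeof out);
  return out.a == in.a && out.b == in.b;
}
```
Under these assumptions, `roundTripsS2(copyFieldsOfS1)` returns `false` because three bytes of `S2::a` sit in `S1`'s padding and are never copied, while `roundTripsS2(copyAsBytes)` returns `true`.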
The code in Clang that is responsible for lowering unions is in `CGRecordLowering::lowerUnion`: https://github.com/llvm/llvm-project/blob/4497ec293a6e745be817dc88027169bd5e4f7246/clang/lib/CodeGen/CGRecordLayoutBuilder.cpp#L313-L378
The code itself even notes that the complicated heuristic for deciding which of the union's members the type should be based on is unnecessary, and it already contains fallback paths that generate an opaque byte array when it can't identify an appropriate member.
Pinging the people involved in #64081 and #76017: @jacobly0, @nikic, @efriedma-quic, @ivafanas.
</pre>