<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/81978>81978</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[AArch64]: 128-bit Sequentially Consistent load allows reordering before prior store when armv8 and armv8.4 implementations are Mixed
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
lukeg101
</td>
</tr>
</table>
<pre>
Consider the following litmus test:
```
C SB
{ int128_t *x = 0; int128_t *y = 0}
P0 (_Atomic __int128 *x, _Atomic __int128 *y) {
atomic_store_explicit(x,1,memory_order_seq_cst);
__int128 r0 = atomic_load_explicit(y,memory_order_seq_cst);
}
P1 (_Atomic __int128 *x, _Atomic __int128 *y) {
atomic_store_explicit(y,1,memory_order_seq_cst);
__int128 r0 = atomic_load_explicit(x,memory_order_seq_cst);
}
exists (P0:r0=0 /\ P1:r0 = 0)
```
where `P0:r0 = 0` means thread `P0`, local variable `r0` has value `0`.
Compiling both `P0` and `P1` for armv8-a, or both for armv8.4-a, behaves as expected. When this test is simulated from its initial state under the C/C++ memory model, the outcome in the `exists` clause is forbidden. The allowed outcomes are:
```
{ P0:r0=0; P1:r0=1; }
{ P0:r0=1; P1:r0=0; }
{ P0:r0=1; P1:r0=1; }
```
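The list above can be reproduced with a small enumeration. The following is only a sketch: it assumes a toy model in which every operation takes effect atomically and in program order (plain sequential consistency), and the names `run_schedule` and `sc_allows` are hypothetical helpers, not part of any tool used for this report.

```c
/* Toy model: enumerate every sequentially consistent interleaving of the
   SB test. Each thread stores 1 to its own variable, then loads the other
   thread's variable. This is NOT the C/C++ memory model itself. */

/* Runs one interleaving. Bit i of `schedule` picks which thread takes
   step i (0 = P0, 1 = P1). Returns -1 if a thread would take more than
   two steps; otherwise returns the outcome as 2 * P0:r0 + P1:r0. */
static int run_schedule(unsigned schedule)
{
    int x = 0, y = 0;      /* shared locations, initially 0 */
    int r0[2] = {0, 0};    /* r0[t] is thread t's local r0 */
    int pc[2] = {0, 0};    /* next operation index per thread */

    for (int step = 0; step != 4; ++step) {
        int t = (schedule >> step) % 2;
        if (pc[t] == 2)
            return -1;     /* not a valid schedule */
        if (pc[t] == 0) {  /* first operation: the store */
            if (t == 0) x = 1; else y = 1;
        } else {           /* second operation: the load */
            r0[t] = (t == 0) ? y : x;
        }
        pc[t]++;
    }
    return 2 * r0[0] + r0[1];
}

/* Returns 1 if some interleaving yields P0:r0 == a and P1:r0 == b. */
int sc_allows(int a, int b)
{
    for (unsigned schedule = 0; schedule != 16; ++schedule)
        if (run_schedule(schedule) == 2 * a + b)
            return 1;
    return 0;
}
```

Under this assumption, `sc_allows(0, 0)` returns 0 while the other three combinations return 1, matching the allowed outcomes listed above.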
However, consider compiling `P0` for armv8.4-a (https://godbolt.org/z/dxTrbGxoG) using clang trunk (`dmb ish; stp; dmb ish; ldp; dmb ish`), the `store` on `P1` for armv8.0-a using clang (`ldaxp;stlxp;cbnz` loop), and the `load` on `P1` for armv8.4-a using clang (`ldp;dmb ish`). The resulting assembly is as follows:
```
P0:
MOV X6, #1
DMB ISH
STP X5, X6, [reg_containing_x]
DMB ISH
LDP X2, X1, [reg_containing_y]
DMB ISH
P1:
loop:
LDAXP X5, X6, [reg_containing_y]
STLXP W4, X5, X6, [reg_containing_y]
CBNZ W4, loop
LDP X2, X1, [reg_containing_x]
DMB ISH
```
The compiled program has the following outcomes when simulated under the AArch64 model (rename P0:X1 to P0:r0 and P1:X1 to P1:r0 to match with source outcomes):
```
{ P0:r0=0; P1:r0=0; } <--- Forbidden by source model, bug!
{ P0:r0=0; P1:r0=1; }
{ P0:r0=1; P1:r0=0; }
{ P0:r0=1; P1:r0=1; }
```
This happens because the effects of `LDP` on `P1` can be reordered before the effects of `STLXP` on `P1`: there is no leading `DMB` barrier to prevent the reordering.
We propose to fix the bug by adding that barrier before `LDP`:
```
DMB ISH; LDP; DMB ISH
```
This prevents the buggy outcome under the AArch64 memory model.
Besides using a `DMB`, it is feasible to use `LDAR`, making the SC `LDP` stronger so it no longer reorders with `STLXP`. Note it is also possible to make `STLXP` stronger by adding a `DMB` after the loop (but that is not what is done for other atomic sizes).
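The effect of the leading `DMB` before `LDP` can be illustrated with a deliberately crude C model. This is a sketch under stated assumptions, not the AArch64 memory model: `P0` always runs in program order (it has a `DMB` between `STP` and `LDP`), while `P1`'s load may take effect before its store unless the hypothetical `leading_dmb` flag is set. The names `run_one` and `toy_allows` are made up for this illustration.

```c
/* Toy reordering model (an assumption for illustration only). */

/* Runs one interleaving; bit i of `schedule` picks the thread for step i
   (0 = P0, 1 = P1). If `p1_load_first` is nonzero, P1's load takes effect
   before its store. Returns -1 for an invalid schedule, otherwise the
   outcome encoded as 2 * P0:r0 + P1:r0. */
static int run_one(unsigned schedule, int p1_load_first)
{
    int x = 0, y = 0;      /* shared locations, initially 0 */
    int r0[2] = {0, 0};    /* r0[t] is thread t's local r0 */
    int pc[2] = {0, 0};    /* operations already performed per thread */

    for (int step = 0; step != 4; ++step) {
        int t = (schedule >> step) % 2;
        if (pc[t] == 2)
            return -1;
        /* In program order, operation 0 is the store and 1 is the load. */
        int is_load = pc[t];
        if (t == 1 && p1_load_first)
            is_load = 1 - pc[t];   /* P1's load hoisted above its store */
        if (is_load)
            r0[t] = (t == 0) ? y : x;
        else if (t == 0)
            x = 1;
        else
            y = 1;
        pc[t]++;
    }
    return 2 * r0[0] + r0[1];
}

/* Returns 1 if the toy model can produce P0:r0 == a and P1:r0 == b.
   With leading_dmb set, only program order is explored for P1. */
int toy_allows(int a, int b, int leading_dmb)
{
    int orders = leading_dmb ? 1 : 2;
    for (int p1_load_first = 0; p1_load_first != orders; ++p1_load_first)
        for (unsigned schedule = 0; schedule != 16; ++schedule)
            if (run_one(schedule, p1_load_first) == 2 * a + b)
                return 1;
    return 0;
}
```

In this toy model, `toy_allows(0, 0, 0)` returns 1 (the buggy outcome is reachable without the barrier) and `toy_allows(0, 0, 1)` returns 0. A real check should still be run against the AArch64 model, as was done for the outcomes reported above.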
I have validated this bug whilst discussing with Wilco from Arm's compiler teams.
This bug would not have been caught in normal execution, but only when multiple implementations are mixed together.
</pre>