<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href="https://github.com/llvm/llvm-project/issues/55199">55199</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
AArch64: 128b LSE atomic instructions not generated
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
sebpop
</td>
</tr>
</table>
<pre>
Compiling the test case https://github.com/llvm/llvm-project/blob/main/llvm/test/CodeGen/AArch64/arm64-atomic-128.ll with `-mattr=+lse` shows that several of its functions are not lowered to LSE instructions.
For `fetch_and_max` https://github.com/llvm/llvm-project/blob/main/llvm/test/CodeGen/AArch64/arm64-atomic-128.ll#L347 LLVM generates the Armv8 load/store exclusives `ldaxp`/`stlxp`, which are an order of magnitude slower than LSE instructions on Graviton processors (Neoverse-N1):
```
ldaxp x9, x8, [x0]
cmp x9, x2
cset w10, hi
cmp x8, x3
cset w11, gt
csel w10, w10, w11, eq
cmp w10, #0
csel x10, x8, x3, ne
csel x11, x9, x2, ne
stlxp w12, x11, x10, [x0]
```
This bug affects the performance of Rust programs for which LLVM generates code without LSE instructions.
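A reduced reproducer may be easier to work with than the full test file. The sketch below is an assumption, not a copy of the test: the function name `fetch_and_max_repro`, the file name `repro.ll`, and the `llc` invocation in the comment are illustrative, and it uses opaque-pointer syntax (`ptr`), so older LLVM releases would need `i128*` instead.
```
; Hypothetical reduced reproducer for the signed 128-bit atomic max case.
; Assumed invocation (not taken from the test's RUN lines):
;   llc -mtriple=aarch64-linux-gnu -mattr=+lse repro.ll -o -
define i128 @fetch_and_max_repro(ptr %p, i128 %bits) {
  ; Atomically store max(*%p, %bits) to %p and return the previous value.
  %old = atomicrmw max ptr %p, i128 %bits seq_cst
  ret i128 %old
}
```
With this input, the report above corresponds to the `ldaxp`/`stlxp` sequence appearing in the output even though `+lse` is enabled.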
</pre>