<html>
<head>
<base href="https://bugs.llvm.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - [AArch64] Under -O0, atomicrmw contains an extra store in the ldaxr/stlxr loop"
href="https://bugs.llvm.org/show_bug.cgi?id=48017">48017</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>[AArch64] Under -O0, atomicrmw contains an extra store in the ldaxr/stlxr loop
</td>
</tr>
<tr>
<th>Product</th>
<td>libraries
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>Linux
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>enhancement
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>Backend: AArch64
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>rofirrim@gmail.com
</td>
</tr>
<tr>
<th>CC</th>
<td>arnaud.degrandmaison@arm.com, llvm-bugs@lists.llvm.org, smithp352@googlemail.com, Ties.Stuij@arm.com
</td>
</tr></table>
<p>
<div>
<pre>Created <span class=""><a href="attachment.cgi?id=24112" name="attach_24112" title="LLVM IR snippet at -O0 (slightly simplified)">attachment 24112</a> <a href="attachment.cgi?id=24112&action=edit" title="LLVM IR snippet at -O0 (slightly simplified)">[details]</a></span>
LLVM IR snippet at -O0 (slightly simplified)
The following C++ snippet compiled under -O0
#include <atomic>
std::atomic<int> _value(0);
void foo() { _value += 1; }
generates the attached IR (slightly simplified). Lowering that IR with llc under
-O0 produces the usual ldaxr/stlxr loop, but with an extra store inside it.
$ llc -O0 -mtriple aarch64 -o - myatomic.ll
...
.LBB0_1:                                // %atomicrmw.start
                                        // =>This Inner Loop Header: Depth=1
        ldr     x10, [sp, #16]          // 8-byte Folded Reload
        ldr     w9, [sp, #24]           // 4-byte Folded Reload
        ldaxr   w8, [x10]
                                        // kill: def $x8 killed $w8
                                        // kill: def $w8 killed $w8 killed $x8
        str     w8, [sp, #12]           // 4-byte Folded Spill (!!!)
        add     w9, w8, w9
        stlxr   w8, w9, [x10]
        cbnz    w8, .LBB0_1
...
When this code runs on a ThunderX machine, the loop hangs.
That extra `str` instruction (which looks like a side effect of the register
allocator) seems to cause the exclusive access to be lost, so the
store-exclusive never succeeds and the code loops forever. This might be
fallout from the recent rewrite of RegAllocFast.
Now, this is odd because:
- That store accesses the stack while x10 holds a global address, so the two
locations are far enough apart that the str shouldn't cause the exclusive
access to be lost.
- This problem doesn't happen on all AArch64 implementations: Raspberry Pi 4
and A64FX are unaffected. We have only been able to reproduce it reliably on a
ThunderX machine.
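
To check a given machine, the snippet above can be wrapped in a small driver.
This is only a sketch: the helper name call_foo and the iteration count are
made up for illustration, and it has to be built with -O0 to get the loop
shown above.

```cpp
#include <atomic>
#include <cassert>

std::atomic<int> _value(0);

// Same operation as in the report: at -O0 this becomes an atomicrmw add,
// which the AArch64 backend expands into an ldaxr/stlxr loop.
void foo() { _value += 1; }

// Hypothetical helper (not part of the original report): reset the counter,
// call foo() n times and return the final value. On an affected machine the
// first call to foo() is expected to spin instead of returning.
int call_foo(int n) {
  assert(n >= 0);
  _value.store(0);
  for (int i = 0; i < n; ++i)
    foo();
  return _value.load();
}
```

Calling call_foo(1000) from main and comparing the result against 1000 is
enough to tell a working machine from one where the loop never exits.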
So to be honest I'm not sure whether:
- This is a bug in that ThunderX.
- This is a bug in LLVM.
In support of the latter, the Armv8-A spec (Issue E.a of the document) says in
§B2.9.5 that:
"LoadExcl / StoreExcl loops are guaranteed to make forward progress only if,
for any LoadExcl / StoreExcl loop within a single thread of execution, the
software meets all of the following conditions:
1. Between the Load-Exclusive and the Store-Exclusive, there are no explicit
memory accesses, preloads, direct or indirect System register writes, address
translation instructions, cache or TLB maintenance instructions, exception
generating instructions, exception returns, or indirect branches"
This suggests that the store should not be inside the loop if we want to
guarantee forward progress on all AArch64 implementations. However, I'm no
expert in this area, and perhaps the loop is OK and we're observing a problem
in our particular AArch64 implementation.
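
To make the concern concrete, here is a toy model of the loop. It assumes the
most pessimistic behaviour the quoted rule would allow, namely that ANY plain
store between the load-exclusive and the store-exclusive clears the monitor,
regardless of address. That is an assumption for illustration, not a claim
about how ThunderX or any other core actually behaves; ToyCore and
atomic_add_model are made-up names.

```cpp
#include <cassert>

// Toy model, NOT a description of any real core: a single exclusive monitor
// that is cleared by every ordinary store, whatever its address.
struct ToyCore {
  bool monitor_armed = false;

  // Models ldaxr: read the value and arm the monitor.
  int load_exclusive(const int &mem) {
    monitor_armed = true;
    return mem;
  }

  // Models a plain str: in this pessimistic model it always clears the monitor.
  void plain_store(int &slot, int v) {
    slot = v;
    monitor_armed = false;
  }

  // Models stlxr: returns 0 on success and 1 on failure, like its status register.
  int store_exclusive(int &mem, int v) {
    if (!monitor_armed)
      return 1;
    mem = v;
    monitor_armed = false;
    return 0;
  }
};

// Runs the add loop for at most max_tries iterations. Returns the number of
// iterations used, or -1 if the store-exclusive never succeeded.
int atomic_add_model(ToyCore &core, int &mem, int inc, bool spill_inside_loop,
                     int max_tries) {
  int spill_slot = 0;
  for (int tries = 1; tries <= max_tries; ++tries) {
    int old = core.load_exclusive(mem);
    if (spill_inside_loop)
      core.plain_store(spill_slot, old); // the extra str from the listing
    if (core.store_exclusive(mem, old + inc) == 0)
      return tries;
  }
  return -1;
}
```

In this model the loop without the spill finishes on the first iteration,
while the loop with the spill never succeeds, however many tries it is given.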
clang/llvm 11.0 is unaffected.</pre>
</div>
</p>
</body>
</html>