[llvm-bugs] [Bug 48017] New: [AArch64] Under -O0, atomicrmw contains an extra store in the ldaxr/stlxr loop
via llvm-bugs
llvm-bugs at lists.llvm.org
Fri Oct 30 02:06:31 PDT 2020
https://bugs.llvm.org/show_bug.cgi?id=48017
Bug ID: 48017
Summary: [AArch64] Under -O0, atomicrmw contains an extra store
in the ldaxr/stlxr loop
Product: libraries
Version: trunk
Hardware: PC
OS: Linux
Status: NEW
Severity: enhancement
Priority: P
Component: Backend: AArch64
Assignee: unassignedbugs at nondot.org
Reporter: rofirrim at gmail.com
CC: arnaud.degrandmaison at arm.com,
llvm-bugs at lists.llvm.org, smithp352 at googlemail.com,
Ties.Stuij at arm.com
Created attachment 24112
--> https://bugs.llvm.org/attachment.cgi?id=24112&action=edit
LLVM IR snippet at -O0 (slightly simplified)
The following C++ snippet compiled under -O0
#include <atomic>
std::atomic<int> _value(0);
void foo() { _value += 1; }
produces the attached IR (slightly simplified). Running llc under -O0 on that IR
emits the usual ldaxr/stlxr loop:
$ llc -O0 -mtriple aarch64 -o - myatomic.ll
...
.LBB0_1: // %atomicrmw.start
// =>This Inner Loop Header: Depth=1
ldr x10, [sp, #16] // 8-byte Folded Reload
ldr w9, [sp, #24] // 4-byte Folded Reload
ldaxr w8, [x10]
// kill: def $x8 killed $w8
// kill: def $w8 killed $w8 killed $x8
str w8, [sp, #12] // 4-byte Folded Spill (!!!)
add w9, w8, w9
stlxr w8, w9, [x10]
cbnz w8, .LBB0_1
...
When running this code on a ThunderX machine, this loop hangs.
That extra `str` instruction (which looks like a side effect of the register
allocator) seems to cause the exclusive access to be lost, so the code loops
forever. This might be fallout from the recent rewrite of RegAllocFast.
Now, this is odd because:
- That store accesses the stack while x10 holds a global address, so the two are
far enough apart that the str shouldn't cause the exclusive access to be lost.
- The problem doesn't happen on all AArch64 implementations: Raspberry Pi 4
and A64FX are unaffected. We have only been able to reproduce it reliably on a
ThunderX machine.
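To check whether a given machine and compiler combination is affected, a small single-threaded driver along these lines may help. It is only a sketch: `g_value` and `run_increments` are hypothetical names that are not part of the attached IR, and it assumes the hang does not depend on contention from other threads, which this report does not establish.

```cpp
#include <atomic>
#include <cassert>

// Global, as in the original snippet, so the atomic lives far from the stack.
std::atomic<int> g_value(0);

// Hypothetical helper: performs `iters` atomic increments and returns the
// final value. On an affected machine (built with -O0) the first increment
// would be expected to spin forever instead of returning.
int run_increments(int iters) {
  assert(iters >= 0);
  g_value.store(0);
  for (int i = 0; i < iters; ++i) {
    g_value += 1;  // same read-modify-write as in foo()
  }
  return g_value.load();
}
```

If `run_increments(1000)` returns at all, it should return 1000; on the ThunderX described above, the expectation is that a build with the affected compiler under -O0 never returns.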
So, to be honest, I'm not sure whether:
- This is a bug in that ThunderX.
- This is a bug in LLVM.
Regarding the latter, the Armv8-A spec (Issue E.a of the document) says in
§B2.9.5 that:
"LoadExcl / StoreExcl loops are guaranteed to make forward progress only if,
for any LoadExcl / StoreExcl loop within a single thread of execution, the
software meets all of the following conditions:
1. Between the Load-Exclusive and the Store-Exclusive, there are no explicit
memory accesses, preloads, direct or indirect System register writes, address
translation instructions, cache or TLB maintenance instructions, exception
generating instructions, exception returns, or indirect branches"
This suggests that the store had better not be inside the loop if we want to
guarantee progress on all AArch64 implementations. However, I'm no expert in
this area, and perhaps the loop is fine and we're observing a problem specific
to our particular AArch64 implementation.
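For comparison, a loop that satisfies condition 1 keeps every operand in a register between the load-exclusive and the store-exclusive. The sketch below writes such a loop by hand with GCC/Clang extended inline assembly; `add_no_spill` is a hypothetical name, the non-AArch64 branch is only a fallback so the example builds elsewhere, and none of this is meant as the fix for LLVM itself.

```cpp
// Hypothetical helper: atomically adds `inc` to *addr and returns the old
// value. Assumes `addr` is naturally aligned, as the exclusives require.
inline int add_no_spill(int *addr, int inc) {
#if defined(__aarch64__)
  int old_val;      // value read by ldaxr
  int new_val;      // value written by stlxr
  unsigned status;  // 0 if stlxr succeeded, nonzero otherwise
  asm volatile(
      "1: ldaxr %w0, [%3]\n"       // load-exclusive with acquire
      "   add   %w1, %w0, %w4\n"   // register-only arithmetic, no memory access
      "   stlxr %w2, %w1, [%3]\n"  // store-exclusive with release
      "   cbnz  %w2, 1b\n"         // retry if the exclusive store failed
      : "=&r"(old_val), "=&r"(new_val), "=&r"(status)
      : "r"(addr), "r"(inc)
      : "memory");
  return old_val;
#else
  // Fallback for other targets, using the GCC/Clang atomic builtin.
  return __atomic_fetch_add(addr, inc, __ATOMIC_SEQ_CST);
#endif
}
```

Because the compiler must materialize all asm operands in registers before entering the block, no spill or reload can land between the ldaxr and the stlxr, even under -O0.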
clang/llvm 11.0 is unaffected.