[llvm-bugs] [Bug 46719] New: C++11 atomic exchange operation compiles into AArch64 store instruction

Tue Jul 14 22:58:44 PDT 2020

https://bugs.llvm.org/show_bug.cgi?id=46719

            Bug ID: 46719
           Summary: C++11 atomic exchange operation compiles into AArch64
                    store instruction
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Scalar Optimizations
          Assignee: unassignedbugs at nondot.org
          Reporter: sunghwan.lee at sf.snu.ac.kr
                CC: llvm-bugs at lists.llvm.org

clang++ compiles a C++11 atomic exchange operation into a single AArch64 store
instruction whenever the value read by the exchange is never used.
This is a miscompilation: an acquire fence may induce synchronization when it
follows a relaxed exchange operation (because the exchange reads a value), but
not when it follows a plain store.
The following target code was obtained with "clang++ -std=c++11 -O1":
source (C++):
================================
#include <atomic>
#include <cstdint>
using namespace std;
void foo(atomic<uint64_t> &X) {
  X.exchange(42, memory_order_relaxed);
}
================================
target (IR):
================================
define dso_local void @_Z3fooRSt6atomicImE(%"struct.std::atomic"* nocapture nonnull align 8 dereferenceable(8) %X) local_unnamed_addr #0 {
entry:
  %_M_i.i = getelementptr inbounds %"struct.std::atomic", %"struct.std::atomic"* %X, i64 0, i32 0, i32 0
  store atomic i64 42, i64* %_M_i.i monotonic, align 8
  ret void
}
================================
target (assembly):
================================
_Z3fooRSt6atomicImE:                    // @_Z3fooRSt6atomicImE
// %bb.0:                               // %entry
        mov     w8, #42
        str     x8, [x0]
        ret
================================
The following program demonstrates an outcome that this miscompilation newly
allows.
================================
uint64_t foo(atomic<uint64_t> &X, atomic<uint64_t> &Y) {
  X.exchange(42, memory_order_relaxed);
  atomic_thread_fence(memory_order_acquire);
  return Y.load(memory_order_relaxed);
}
uint64_t bar(atomic<uint64_t> &X, atomic<uint64_t> &Y) {
  Y.store(1, memory_order_relaxed);
  return X.fetch_add(1, memory_order_release);
}
================================
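To try the program above on real hardware, the following is a minimal litmus
harness sketch, assuming a C++11 compiler with std::thread support (older
toolchains may need -pthread at link time). The helpers run_once and
count_forbidden are hypothetical names, not part of the report; "foo" and "bar"
are repeated verbatim so the block is self-contained.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <utility>

using namespace std;

// "foo" and "bar" exactly as in the report.
uint64_t foo(atomic<uint64_t> &X, atomic<uint64_t> &Y) {
  X.exchange(42, memory_order_relaxed);
  atomic_thread_fence(memory_order_acquire);
  return Y.load(memory_order_relaxed);
}

uint64_t bar(atomic<uint64_t> &X, atomic<uint64_t> &Y) {
  Y.store(1, memory_order_relaxed);
  return X.fetch_add(1, memory_order_release);
}

// Hypothetical helper: runs foo and bar concurrently once on fresh,
// zero-initialized X and Y; returns {foo's result, bar's result}.
pair<uint64_t, uint64_t> run_once() {
  atomic<uint64_t> X(0), Y(0);
  uint64_t r_foo = 0, r_bar = 0;
  thread t_foo([&] { r_foo = foo(X, Y); });
  thread t_bar([&] { r_bar = bar(X, Y); });
  t_foo.join();  // join() makes r_foo/r_bar safe to read here
  t_bar.join();
  return make_pair(r_foo, r_bar);
}

// Hypothetical helper: counts runs where both functions return 0,
// the outcome C++11 forbids.
int count_forbidden(int runs) {
  int forbidden = 0;
  for (int i = 0; i < runs; ++i) {
    pair<uint64_t, uint64_t> r = run_once();
    // foo can only read 0 or 1 from Y; bar can only read 0 or 42 from X.
    assert(r.first <= 1);
    assert(r.second == 0 || r.second == 42);
    if (r.first == 0 && r.second == 0) ++forbidden;
  }
  return forbidden;
}
```

A nonzero count_forbidden result would indicate the forbidden behavior; a zero
count proves nothing, since the outcome depends on timing, and starting a new
thread pair per run leaves only a narrow window for the race.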
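Suppose "foo" and "bar" run in parallel, with X and Y both initialized to 0.
C++11 does not allow "foo" and "bar" to both return 0 in this case.
If "bar" returns 0, its "X.fetch_add" read 0 from X and updated X to 1, so the
"fetch_add" precedes "X.exchange" by "foo" in the modification order of X.
Because the exchange is itself an atomic read-modify-write, it is then forced
to read 1 (the value written by the "fetch_add") and update X to 42.
Since the exchange reads the value written by a release operation and is
followed by an acquire fence, the "fetch_add" synchronizes with that fence.
This induces a happens-before relation between "Y.store" by "bar" and
"Y.load" by "foo", so the load must return 1.
However, AArch64 allows both functions to return 0 once the exchange in "foo"
is compiled into a plain store instruction: the acquire fence does not order
that earlier store, so the load of Y may be satisfied before the store to X.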
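The atomicity step of this argument can be checked with a toy enumeration of
X's modification order. This is only a sketch under simplifying assumptions,
not a memory-model checker: it assumes the exchange and the fetch_add are the
only writes to X besides its initialization, and it relies solely on the rule
that a read-modify-write reads the last value written before it in the
modification order. The names Outcome, enumerate_mo and
fetch_add_zero_implies_exchange_reads_one are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <vector>

// One possible modification order of X and what each RMW reads in it.
struct Outcome {
  bool fetch_add_first;          // bar's fetch_add precedes foo's exchange
  std::uint64_t exchange_reads;  // value read by X.exchange(42)
  std::uint64_t fetch_add_reads; // value read by X.fetch_add(1) (bar's result)
};

// Enumerates both orders of the two RMWs after X's initial value 0.
// Each RMW reads the value immediately preceding it, then writes its own.
std::vector<Outcome> enumerate_mo() {
  std::vector<Outcome> result;
  for (bool fa_first : {true, false}) {
    std::uint64_t x = 0;  // initial value of X
    std::uint64_t xchg_read = 0, fa_read = 0;
    if (fa_first) {
      fa_read = x;   x = x + 1;  // fetch_add(1)
      xchg_read = x; x = 42;     // exchange(42)
    } else {
      xchg_read = x; x = 42;     // exchange(42)
      fa_read = x;   x = x + 1;  // fetch_add(1)
    }
    result.push_back({fa_first, xchg_read, fa_read});
  }
  return result;
}

// Whenever the fetch_add reads 0, the exchange must read 1,
// i.e. the value written by that (release) fetch_add.
bool fetch_add_zero_implies_exchange_reads_one() {
  for (const Outcome &o : enumerate_mo())
    if (o.fetch_add_reads == 0 && o.exchange_reads != 1) return false;
  return true;
}
```

enumerate_mo() yields exactly two orders: fetch_add first (fetch_add reads 0,
exchange reads 1) and exchange first (exchange reads 0, fetch_add reads 42).
"bar" returning 0 therefore leaves only the first, where the exchange reads
from the release fetch_add.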
