[llvm-dev] [CodeGen][AtomicExpandPass] Initial load not atomic?

Mon Jan 25 02:05:51 PST 2021

Hi,

I believe that this is still correct for your example.  The compare and 
exchange is sequentially consistent and so establishes ordering and and 
happens-before relationships.  The load is unordered but if it contains 
any value other than the load that happens in the compare-and-exchange 
then the compare fails and the loop retries.  On the second iteration, 
we have the value from the sequentially consistent atomic.  I believe 
this holds for other memory orderings as well.

This would be an invalid operation if it were done as a source-to-source 
transform in C, because C states that mixing atomic and non-atomic 
accesses to the same memory location is UB and so the transformed 
version would suffer from UB but the source version would not.  I can't 
find any equivalent text in the LLVM LangRef (we should probably 
explicitly clarify this) and my understanding is that LLVM semantics 
permit tearing and do not guarantee any ordering, but give, at worse, an 
unspecified value.  Even if we statically propagate that, I would hope 
that the worst thing that could happen is that the load is elided 
entirely and we start the loop with a cmpxchg on an arbitrary expected 
value which may succeed (if you are very lucky) but not in a way that 
the guess is observable outside or, more likely, the first cmpxchg 
around the loop fails and we are now in a completely valid and slightly 
more strongly ordered than necessary code path.

That said, I can't find anything in a quick re-skim of the atomics part 
of the spec to tell me if this is guaranteed.  This code probably 
originated on x86, where all variants of load (atomic or otherwise) are 
lowered to the same instruction.  I'm not sure what other architectures 
use cmpxchg lowering, rather than an LL/SC primitive, and aren't TSO.

David

On 24/01/2021 16:42, Itay Bookstein via llvm-dev wrote:
> Hey all,
> 
> While reading through AtomicExpand::insertRMWCmpXchgLoop I noticed
> that the initial load is documented to be atomic but the code below
> generates a non-atomic load. I'm not sure whether it's an actual
> problem or not - could that be mis-compiled or mis-optimized?
> 
> Inline documentation comment here
> https://github.com/llvm/llvm-project/blob/8b9df70bf7e7b812715a3dc9772719188e0df06c/llvm/lib/CodeGen/AtomicExpandPass.cpp#L1394
> 
> Example:
> 
> define dso_local zeroext i16 @foo(i16* %0, i16 zeroext %1, i16 zeroext
> %2) local_unnamed_addr #0 {
> %4 = atomicrmw volatile nand i16* %0, i16 %1 seq_cst
> ret i16 %4
> }
> 
> Running opt -atomic-expand gives:
> 
> define dso_local zeroext i16 @foo(i16* %0, i16 zeroext %1, i16 zeroext
> %2) local_unnamed_addr #0 {
> %4 = load i16, i16* %0, align 2
> br label %atomicrmw.start
> 
> atomicrmw.start: ; preds = %atomicrmw.start, %3
> %loaded = phi i16 [ %4, %3 ], [ %newloaded, %atomicrmw.start ]
> %5 = and i16 %loaded, %1
> %new = xor i16 %5, -1
> %6 = cmpxchg i16* %0, i16 %loaded, i16 %new seq_cst seq_cst
> %success = extractvalue { i16, i1 } %6, 1
> %newloaded = extractvalue { i16, i1 } %6, 0
> br i1 %success, label %atomicrmw.end, label %atomicrmw.start
> 
> atomicrmw.end: ; preds = %atomicrmw.start
> ret i16 %newloaded
> }
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>