[LLVMdev] Proposal: "load linked" and "store conditional" atomic instructions

Thu May 29 11:31:21 PDT 2014

On 05/29/2014 10:21 AM, Tim Northover wrote:
> Hi Philip,
>
> On 29 May 2014 17:03, Philip Reames <listmail at philipreames.com> wrote:
>> I have some reservations about this proposal.  I don't have anything
>> particularly concrete, but the idea of supporting both LL/SC and atomicrmw
>> in the IR concerns me from a complexity perspective.
> Well, I'll start by saying my particular optimisation use case looks
> like it's not enough to justify the addition. I've got something
> basically working for my efficiency worries, with less effort than I
> thought. So I'm a lot less keen on it myself than I was a few hours
> ago.
Good to know.
>
> But I'm still worried about how closely LLVM IR is tied to both C and
> X86 in this matter. A weak cmpxchg would go a long way to resolving
> this, but it's still difficult to see a path from an IR-level "cmpxchg
> weak" to optimal "atomicrmw lambda" support in LL/SC backends.
I share your concerns actually.  It doesn't affect my current usage so
it's not a high priority for me, but from an idealist perspective, it is
worrying.  On the other hand, overly generic IR is an evil in and of 
itself.  So it's a delicate balance.
>
> Given C like
>
>      void atomic_foo(int *addr) {
>        int oldval = *addr;
>        int newval;
>        do {
>          newval = foo(oldval);
>        } while (!__c11_compare_exchange_weak(addr, &oldval, newval));
>      }
>
> The cmpxchg representation would be something like:
>
>      define void @atomic_foo(i32* %addr) {
>      entry:
>          %firstval = load i32* %addr
>          br label %loop
>      loop:
>          %oldval = phi i32 [%firstval, %entry], [%wrongval, %loop]
>          %newval = call i32 @foo(i32 %oldval)
>          %res = cmpxchg weak i32* %addr, i32 %oldval, i32 %newval
>          %wrongval = extractvalue { i32, i1 } %res, 0
>          %success = extractvalue { i32, i1 } %res, 1
>          br i1 %success, label %end, label %loop
>      end:
>          ret void
>      }
>
> But the optimal LL/SC form would be more like:
>
>      define void @atomic_foo(i32* %addr) {
>      entry:
>          br label %loop
>      loop:
>          %oldval = load linked i32* %addr
>          %newval = call i32 @foo(i32 %oldval)
>          %success = store conditional i32 %newval, i32* %addr
>          br i1 %success, label %end, label %loop
>      end:
>          ret void
>      }
>
> That kind of analysis is a very big burden to put on any pass. On the
> other hand, mapping the other way doesn't seem much simpler either.
>
> I feel like there ought to be a good way to combine this with
> Haswell's xbegin/xend functionality in an even more generic IR
> construct too, but I can't quite come up with a representation. More
> thought needed.
I agree with both points, but particularly the more thought needed one.  :)
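
For anyone less familiar with the source-level idiom, here is a minimal
sketch of the weak compare-exchange loop quoted above, written against
standard C11 <stdatomic.h> rather than the Clang builtin.  The body of
foo and the null check are assumptions added only to make the example
self-contained; any pure function of the old value would do.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Placeholder for the arbitrary update function in the example
 * (an assumption; the original leaves foo unspecified). */
static int foo(int oldval) { return oldval * 2 + 1; }

/* Atomically replace *addr with foo(*addr) using a weak
 * compare-exchange loop. */
void atomic_foo(_Atomic int *addr) {
    assert(addr != NULL);
    int oldval = atomic_load(addr);
    int newval;
    do {
        newval = foo(oldval);
        /* On failure, including a spurious one, oldval is refreshed
         * with the value currently stored at addr, so the loop
         * recomputes newval from it. */
    } while (!atomic_compare_exchange_weak(addr, &oldval, newval));
}
```

Note that foo may run more than once per call, which is exactly why the
LL/SC form wants it between the load linked and the store conditional.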

While it's tempting to introduce scoped atomic constructs which map 
nicely to all three, this loses much of the power of the xbegin/xend
scheme.  Being able to spread transactions across function boundaries is 
essential.

I suspect we'll have to end up modelling the transaction boundaries as 
some form of memory fence.  This doesn't get all of their semantics, but 
it does prevent a number of illegal transforms.
xbegin -> loadstore, storestore fence after (i.e. stores can't float out 
of the atomic region!)
xend -> storestore, storeload fence before (nor this way)
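
To make the ordering constraint concrete, here is a rough source-level
sketch in C11.  The region_begin/region_end helpers are hypothetical
stand-ins for xbegin/xend: they start no hardware transaction and only
impose ordering.  I've assumed a seq_cst fence as a conservative
approximation, since it is stronger than the specific fences listed
above; it is not a claim about how the IR should actually model this.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical marker for the start of the region.  The full fence is
 * stronger than the loadstore + storestore ordering suggested for
 * "after xbegin". */
static inline void region_begin(void) {
    atomic_thread_fence(memory_order_seq_cst);
}

/* Hypothetical marker for the end of the region.  The full fence is
 * stronger than the storestore + storeload ordering suggested for
 * "before xend". */
static inline void region_end(void) {
    atomic_thread_fence(memory_order_seq_cst);
}

/* Moves amount between two accounts and returns the combined balance.
 * The two stores are meant to stay between the markers. */
int transfer(int *from, int *to, int amount) {
    assert(from != to);
    region_begin();
    *from -= amount;
    *to += amount;
    region_end();
    return *from + *to;
}
```

The fences only keep the stores inside the region; they give none of
the atomicity or abort behaviour of a real transaction.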

You probably do want to allow load reordering into a transaction past an 
xend.  Doing so past an xbegin is legal (I think?), but likely not
profitable.  It can turn a potentially succeeding transaction into an
always failing one.  (Or an always succeeding one.)  There are a lot of
cases to be explored here both w.r.t. legality and profitability.

It would also be good to get input from folks who've built previous 
compilers with T.M. constructs.  I just don't know enough about prior 
art to propose a good design.

>
>> Tim, for those of us not directly involved, could you share a selection of
>> bugs or other background?  I'd like to read through and try to get a better
>> sense for the problem you're trying to solve.
> My immediate concern was the C++11 compare_exchange which inserts an
> automatic and mostly redundant icmp after the result. Variants of the
> IR in my original message are representative, though possibly not
> exhaustive, of what might be seen.
>
> Of course, it's all much more speculative now. Except, possibly, "how
> should we handle compare_exchange_weak".
>
> Cheers.
>
> Tim.