[LLVMdev] Proposal: "load linked" and "store conditional" atomic instructions

Thu May 29 10:21:40 PDT 2014

Hi Philip,

On 29 May 2014 17:03, Philip Reames <listmail at philipreames.com> wrote:
> I have some reservations about this proposal.  I don't have anything
> particularly concrete, but the idea of supporting both LL/SC and atomicrmw
> in the IR concerns me from a complexity perspective.

Well, I'll start by saying my particular optimisation use case looks
like it's not enough to justify the addition. I've got something
basically working for my efficiency worries, with less effort than I
thought. So I'm a lot less keen on it myself than I was a few hours
ago.

But I'm still worried about how closely LLVM IR is tied to both C and
X86 in this matter. A weak cmpxchg would go a long way to resolving
this, but it's still difficult to see a path from an IR-level "cmpxchg
weak" to optimal "atomicrmw lambda" support in LL/SC backends.

Given C like

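    void atomic_foo(_Atomic(int) *addr) {
      int oldval = __c11_atomic_load(addr, __ATOMIC_SEQ_CST);
      int newval;
      do {
        newval = foo(oldval);
      } while (!__c11_atomic_compare_exchange_weak(addr, &oldval, newval,
                                                   __ATOMIC_SEQ_CST,
                                                   __ATOMIC_SEQ_CST));
    }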

The cmpxchg representation would be something like:

    define void @atomic_foo(i32* %addr) {
    entry:
        %firstval = load i32* %addr
        br label %loop
    loop:
        %oldval = phi i32 [%firstval, %entry], [%wrongval, %loop]
        %newval = call i32 @foo(i32 %oldval)
        %res = cmpxchg weak i32* %addr, i32 %oldval, i32 %newval seq_cst seq_cst
        %wrongval = extractvalue { i32, i1 } %res, 0
        %success = extractvalue { i32, i1 } %res, 1
        br i1 %success, label %end, label %loop
    end:
        ret void
    }

But the optimal LL/SC form would be more like:

    define void @atomic_foo(i32* %addr) {
    entry:
        br label %loop
    loop:
        %oldval = load linked i32* %addr
        %newval = call i32 @foo(i32 %oldval)
        %success = store conditional i32 %newval, i32* %addr
        br i1 %success, label %end, label %loop
    end:
        ret void
    }

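Recognising that the first form can be rewritten as the second (the
initial load, the phi and the comparison all have to be folded away)
is a very big burden to put on any pass. On the other hand, mapping
the other way doesn't seem much simpler either.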
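
For reference, the source-level shape of "atomicrmw lambda" can be
sketched in portable C11 roughly as below. The names atomic_apply and
double_plus_one are made up for illustration and are not part of Clang
or LLVM; the sketch assumes sequentially consistent ordering throughout.

```c
#include <stdatomic.h>

/* Hypothetical helper: atomically replace *addr with fn(*addr) and
 * return the value that was there before the update. */
int atomic_apply(_Atomic int *addr, int (*fn)(int)) {
    int oldval = atomic_load(addr);
    int newval;
    do {
        newval = fn(oldval);
        /* On failure (including a spurious one) oldval is refreshed
         * with the current contents of *addr and we try again. */
    } while (!atomic_compare_exchange_weak(addr, &oldval, newval));
    return oldval;
}

/* Example "lambda", also hypothetical. */
static int double_plus_one(int v) {
    return 2 * v + 1;
}
```

A caller would write atomic_apply(&counter, double_plus_one). On an
LL/SC target the hope is that each trip round this loop becomes one
load-linked, the call, and one store-conditional, with no separate
compare.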

I feel like there ought to be a good way to combine this with
Haswell's xbegin/xend functionality in an even more generic IR
construct too, but I can't quite come up with a representation. More
thought needed.

> Tim, for those of us not directly involved, could you share a selection of
> bugs or other background?  I'd like to read through and try to get a better
> sense for the problem you're trying to solve.

My immediate concern was the C++11 compare_exchange which inserts an
automatic and mostly redundant icmp after the result. Variants of the
IR in my original message are representative, though possibly not
exhaustive, of what might be seen.
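
To make the redundancy concrete: a compare-exchange hands back both a
success flag and the value it observed, and for a strong exchange on a
plain int the flag is just "observed == expected". A minimal C11
sketch, where cas_strong is a hypothetical wrapper and not a Clang or
LLVM API:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical wrapper: returns whether the exchange happened and
 * stores the value seen in memory into *observed. */
bool cas_strong(_Atomic int *addr, int expected, int desired,
                int *observed) {
    int old = expected;
    bool ok = atomic_compare_exchange_strong(addr, &old, desired);
    /* On success old still equals expected; on failure it has been
     * overwritten with the current contents of *addr. */
    *observed = old;
    return ok;
}
```

If the IR-level cmpxchg yields only the old value, the frontend has to
recompute ok as an icmp of that value against expected, even though
the hardware has typically produced the flag already.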

Of course, it's all much more speculative now. Except, possibly, "how
should we handle compare_exchange_weak".

Cheers.

Tim.