[LLVMdev] Proposal: "load linked" and "store conditional" atomic instructions
Philip Reames
listmail at philipreames.com
Thu May 29 11:31:21 PDT 2014
On 05/29/2014 10:21 AM, Tim Northover wrote:
> Hi Philip,
>
> On 29 May 2014 17:03, Philip Reames <listmail at philipreames.com> wrote:
>> I have some reservations about this proposal. I don't have anything
>> particularly concrete, but the idea of supporting both LL/SC and atomicrwm
>> in the IR concerns me from a complexity perspective.
> Well, I'll start by saying my particular optimisation use case looks
> like it's not enough to justify the addition. I've got something
> basically working for my efficiency worries, with less effort than I
> thought. So I'm a lot less keen on it myself than I was a few hours
> ago.
Good to know.
>
> But I'm still worried about how closely LLVM IR is tied to both C and
> X86 in this matter. A weak cmpxchg would go a long way to resolving
> this, but it's still difficult to see a path from an IR-level "cmpxchg
> weak" to optimal "atomicrmw lambda" support in LL/SC backends.
I share your concerns actually. It doesn't effect my current usage so
it's not a high priority for me, but from a idealist perspective, it is
worrying. On the other hand, overly generic IR is an evil in and of
itself. So it's a delicate balance.
>
> Given C like
>
> void atomic_foo(int *addr) {
> int oldval = *addr;
> do {
> newval = foo(oldval);
> } while (__c11_compare_exchange_weak(addr, &oldval, newval));
>
> The cmpxchg representation would be something like:
>
> define void @atomic_foo(int *addr) {
> entry:
> %firstval = load i32* %addr
> br label %loop
> loop:
> %oldval = phi i32 [%firstval, %entry], [%wrongval, %loop]
> %newval = call i32 @foo(i32 %oldval)
> %res = cmpxchg weak i32* %addr, i32 %oldval, i32 %newval
> %wrongval = extractvalue { i32, i1 } %res, 0
> %success = extractvalue { i32, i1 } %res, 1
> br i1 %success, label %end, label %loop
> end:
> ret void
> }
>
> But the optimal LL/SC form would be more like:
>
> define void @atomic_foo(int *addr) {
> entry:
> br label %loop
> loop:
> %oldval = load linked i32* %addr
> %newval = call i32 @foo(i32 %oldval)
> %success = store conditional i32 %newval, i32* %addr
> br i1 %success, label %end, label %loop
> end:
> ret void
> }
>
> That kind of analysis is a very big burden to put on any pass. On the
> other hand, mapping the other way doesn't seem much simpler either.
>
> I feel like there ought to be a good way to combine this with
> Haswell's xbegin/xend functionality in an even more generic IR
> construct too, but I can't quite come up with a representation. More
> thought needed.
I agree with both points, but particularly the more thought needed one. :)
While it's tempting to introduce scoped atomic constructs which map
nicely to all three, this looses much of the power of the xbegin/xend
scheme. Being able to spread transactions across function boundaries is
essential.
I suspect we'll have to end up modelling the transaction boundaries as
some form of memory fence. This doesn't get all of their semantics, but
it does prevent a number of illegal transforms.
xbegin -> loadstore, storestore fence after (i.e. stores can't float out
of the atomic region!)
xend -> storestore, storeload fence before (nor this way)
You probably do want to allow load reordering into a transaction past an
xend. Doing so past a xbegin is legal (I think?), but likely not
profitable. It can turn a potentially succeeding transaction into an
always failing one. (Or an always succeeding one.) There's a lot of
cases to be explored here both w.r.t. legality and profitability.
It would also be good to get input from folks who've built previous
compilers with T.M. constructs. I just don't know enough about prior
art to propose a good design.
>
>> Tim, for those of us not directly involved, could you share a selection of
>> bugs or other background? I'd like to read through and try to get a better
>> sense for the problem you're trying to solve.
> My immediate concern was the C++11 compare_exchange which inserts an
> automatic and mostly redundant icmp after the result. Variants of the
> IR in my original message are representative, though possibly not
> exhaustive, of what might be seen.
>
> Of course, it's all much more speculative now. Except, possibly, "how
> should we handle compare_exchange_weak".
>
> Cheers.
>
> Tim.
More information about the llvm-dev
mailing list