[LLVMdev] Proposal: "load linked" and "store conditional" atomic instructions
Philip Reames
listmail at philipreames.com
Thu May 29 09:03:02 PDT 2014
I have some reservations about this proposal. I don't have anything
particularly concrete, but the idea of supporting both LL/SC and
atomicrmw in the IR concerns me from a complexity perspective. If we
find that there is significant profit in supporting both, we can, but I
would like to see documentation of significant benefit before changing
the IR.
Tim, for those of us not directly involved, could you share a selection
of bugs or other background? I'd like to read through and try to get a
better sense for the problem you're trying to solve.
Philip
On 05/29/2014 04:55 AM, Tim Northover wrote:
> Hi,
>
> I've been looking at improving atomicrmw & cmpxchg code more,
> particularly on architectures using the load-linked/store-conditional
> model.
>
> The summary is that current expansion for cmpxchg seems to happen too
> late for LLVM to make meaningful use of the opportunities it provides.
> I'd like to move it earlier and express it in terms of a first-class
> pair of "load linked" and "store conditional" instructions.
>
> More details are attached and inlined below. Currently the only code I
> have is for one or two AtomicExpand peepholes, so any suggestions or
> alternatives would be very welcome. It's possible this is entirely the
> wrong approach.
>
> What do people think?
>
> Cheers.
>
> Tim.
>
> The Problem
> -----------
>
> We now expand atomic instructions to load-linked & store-conditional intrinsics
> just before ISel, but this expansion opens up more optimisation opportunities
> that LLVM *would* be able to exploit if it saw the control flow earlier.
>
> For example the return value of the C++11 and C11 compare_exchange operations is
> actually whether the exchange succeeded, which leads to some common idioms in
> Clang-produced IR.
>
> From "if(__c11_compare_exchange_strong(...))":
>     %loaded = cmpxchg i32* %addr, i32 %oldval, i32 %newval seq_cst seq_cst
>     %success = icmp eq i32 %loaded, %oldval
>     br i1 %success, label %true, label %false
> the control-flow here should be something like:
> loop:
>     %loaded = load linked i32* %addr seq_cst
>     %trystore = icmp eq i32 %loaded, %oldval
>     br i1 %trystore, label %store.cond, label %false
> store.cond:
>     %success = store conditional i32 %newval, i32* %addr seq_cst
>     br i1 %success, label %true, label %loop
>
> From "return __c11_compare_exchange_strong(...);":
>     %loaded = cmpxchg i32* %addr, i32 %oldval, i32 %newval seq_cst seq_cst
>     %success = icmp eq i32 %loaded, %oldval
>     %res = zext i1 %success to i32
>     ret i32 %res
> here, we'd want something like:
> loop:
>     %loaded = load linked i32* %addr seq_cst
>     %trystore = icmp eq i32 %loaded, %oldval
>     br i1 %trystore, label %store.cond, label %false
> store.cond:
>     %success = store conditional i32 %newval, i32* %addr seq_cst
>     br i1 %success, label %true, label %loop
> true:
>     ret i32 1
> false:
>     ret i32 0
>
> In both cases the compare is largely redundant on LL/SC architectures: the
> control-flow graph already tells you what its result will be, and if LLVM sees
> this early enough it can produce something close to optimal code.
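[For reference, the C11 source that gives rise to the cmpxchg idioms quoted above looks roughly like the sketch below. Function and variable names are illustrative, not taken from the thread; Clang lowers the `atomic_compare_exchange_strong` call to the cmpxchg-plus-icmp pattern shown.]

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative C11 source for the idioms above. The boolean result of
   atomic_compare_exchange_strong is what Clang reconstructs with the
   "icmp eq %loaded, %oldval" in the quoted IR. */
bool try_update(_Atomic int *addr, int oldval, int newval) {
    /* Note: oldval is passed by pointer and is overwritten with the
       observed value on failure, per the C11 API. */
    return atomic_compare_exchange_strong(addr, &oldval, newval);
}
```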
>
> Unfortunately, most of LLVM has finished by the time this expansion occurs, so
> we end up with redundant compares.
>
> Now, it's certainly possible to put various peephole optimisations into the
> expansion pass to cover these cases (at least), but to me that feels like
> fragile duplication of effort we've already made in the mid-end.
>
> My Proposal
> -----------
>
> I think the best way to solve this longer term is to add explicit "load linked"
> and "store conditional" instructions to the LLVM IR, with equal status to
> "cmpxchg" and "atomicrmw".
>
> We could then simplify the AtomicExpandLoadLinked pass to use these, change it
> to use something like a TargetTransformInfo hook instead of TargetLowering and
> schedule it much earlier in the mid-end. X86 & SystemZ would simply request no
> expansion happen.
>
> This means Clang could still emit cmpxchg and atomicrmw, which closely match C &
> C++ semantics, but most of LLVM would see the control-flow and operations in a
> form much closer to how the targets will actually emit them.
>
> In addition, other languages would be able to provide more general atomic
> operations, if they want (slightly more complex bit twiddling seems like it
> could be particularly useful in some algorithms). Current atomic instructions
> cover (most of) C and C++, but nothing else.
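[As an illustration of the "slightly more complex bit twiddling" mentioned above, consider a hypothetical atomic "clear lowest set bit" operation. No atomicrmw subopcode covers it, so today a front-end must emit a compare-exchange loop like the C11 sketch below; with first-class LL/SC instructions it could be expressed directly as a load-linked, a bit operation, and a store-conditional. The function name is invented for illustration.]

```c
#include <stdatomic.h>

/* Hypothetical generalized RMW: atomically clear the lowest set bit of
   *addr and return the previous value. Portable fallback via CAS loop. */
int atomic_clear_lowest_set_bit(_Atomic int *addr) {
    int old = atomic_load_explicit(addr, memory_order_relaxed);
    /* On failure, compare_exchange_weak reloads the observed value into
       old, so the new value (old & (old - 1)) is recomputed each trip. */
    while (!atomic_compare_exchange_weak_explicit(
               addr, &old, old & (old - 1),
               memory_order_seq_cst, memory_order_relaxed))
        ;
    return old;
}
```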
>
> Potential issues
> ----------------
>
> 1. Can invalid transformations be done on them? Separating an LL from its
> corresponding SC by too much almost guarantees failure (an OS task switch
> happens, clearing the monitor).
> 2. We'd need to document their interaction with LLVM's memory model. I'm not
> sure I've seen a good happens-before model that explicitly uses LL/SC. The
> common ones people came up with in the Java & C++11 standardisation process
> seem to assume atomic RMW operations instead.
> 3. People might be assuming StoreInst produces no value and not calling
> ReplaceAllUses. A separate StoreConditionalInst is one option, but that risks
> the reverse, people not considering it a store when it is.
> 4. Not all targets can support them. There are very few LLVM instructions like
> that, which is a little concerning. But not all targets can support current
> atomics either.
>
> If we make vulnerability to ABA problems target-specific in our documentation
> then X86 & SystemZ could use the same trick as for min/max to implement them
> generically: a load/cmpxchg loop.
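[In C-level terms, the min/max trick referred to above is an ordinary load followed by a compare-exchange loop. A rough sketch of an atomic fetch-max built that way is below; the real expansion happens in LLVM IR rather than C, and the function name is illustrative.]

```c
#include <stdatomic.h>

/* Sketch of the load/cmpxchg-loop expansion: targets without LL/SC
   (X86, SystemZ) implement a generalized RMW by retrying a
   compare-exchange until no other thread has intervened. */
int atomic_fetch_max(_Atomic int *addr, int val) {
    int old = atomic_load_explicit(addr, memory_order_relaxed);
    while (!atomic_compare_exchange_weak(addr, &old,
                                         old > val ? old : val))
        ; /* old is refreshed with the observed value on each failure */
    return old;
}
```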
>
> Alternatives
> ------------
>
> 1. Enhancing MachineCSE and friends so they can optimise this.
> - Works for "%old = atomicrmw OP %addr, %inc; OP %old, %inc" already.
> - Cmpxchg is more difficult. It's the control-flow optimisations that are
> particularly powerful in these cases.
> - It's more duplicated effort between different parts of LLVM.
>
> 2. Peephole optimisations in the AtomicExpandLoadLinked pass.
> - Feels like a hack, spotting particular patterns that the mid-end can
> already cope with.
> - Deals with toy functions well, but it's not quite clear that those patterns
> are the only ones that appear in more general code (after inlining, say).
> - Potentially safest option: atomicrmw & cmpxchg preserved until
> very late. Nothing's going to come in and convert valid code to invalid.
>
> 3. Moving expansion further forward but still using intrinsics.
> - The intrinsics need to have fully general side effects to be correct:
> effectively an "asm volatile" for optimisation purposes, which is very
> heavy-handed for LLVM's other optimisations.
> - Still need target hooks to create the calls, because intrinsics don't get
> type lowered and so you can't write a fully generic one (e.g. an i64 ldrex
> on ARM needs to return { i32, i32 }).
>
> 4. Change the cmpxchg operation to return (say) { iN, i1 } where the second
> value indicates success.
> - Probably good idea anyway as part of support for weak compare-exchange
> operations.
> - Doesn't actually help this issue much: it's mostly control-flow
> optimisations that seem to be helpful, and this would leave us with phi
> nodes in need of materialisation when lowered.
> - Might make the peephole optimisations in 2 slightly simpler.
>
> 5. Some kind of generalisation that can cover both LL/SC and transactional
> memory to avoid doing this again if and when that works out?
> - I can't think of one that's both neat and practically implementable for
> both.
> - Suggestions welcome.
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev