<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <div class="moz-cite-prefix">I have some reservations about this

      proposal.  I don't have anything particularly concrete, but the

      idea of supporting both LL/SC and atomicrwm in the IR concerns me

      from a complexity perspective.  If we find that there is

      significant profit in supporting both, we can, but I would like to

      see documentation of significant benefit before changing the IR. 

      <br>

      <br>

      Tim, for those of us not directly involved, could you share a

      selection of bugs or other background?  I'd like to read through

      and try to get a better sense for the problem you're trying to

      solve.  <br>

      <br>

      Philip<br>

      <br>

      On 05/29/2014 04:55 AM, Tim Northover wrote:<br>

    </div>

    <blockquote

cite="mid:CAFHTzf+=Y5m7SH5sVxq8fQVfg0MKkJWz8T6VPO2gNT4Yy-CGhQ@mail.gmail.com"

      type="cite">

      <pre wrap="">Hi,

I've been looking at improving atomicrmw & cmpxchg code more,

particularly on architectures using the load-linked/store-conditional

model.

The summary is that current expansion for cmpxchg seems to happen too

late for LLVM to make meaningful use of the opportunities it provides.

I'd like to move it earlier and express it in terms of a first-class

pair of "load linked" and "store conditional" instructions.

More details are attached and inlined below. Currently the only code I

have is for one or two AtomicExpand peepholes, so any suggestions or

alternatives would be very welcome. It's possible this is entirely the

wrong approach.

What do people think?

Cheers.

Tim.

The Problem

-----------

We now expand atomic instructions to load-linked & store-conditional intrinsics

just before ISel, but this expansion opens up more optimisation opportunities

that LLVM *would* be able to exploit if it saw the control flow earlier.

For example the return value of the C++11 and C11 compare_exchange operations is

actually whether the exchange succeeded, which leads to some common idioms in

Clang-produced IR.

>From "if(__c11_compare_exchange_strong(...))":

    %loaded = cmpxchg i32* %addr, i32 %oldval, i32 %newval seq_cst seq_cst

    %success = icmp eq i32 %loaded, %oldval

    br i1 %success, label %true, label %false

the control-flow here should be something like:

    loop:

      %loaded = load linked i32* %addr seq_cst

      %trystore = icmp eq %loaded, %oldval

      br i1 %trystore, label %store.cond, label %false

    store.cond:

      %success = store conditional i32 %newval, i32* %addr seq_cst

      br i1 %success, label %true, label %loop

>From "return __c11_compare_exchange_strong(...);":

    %loaded = cmpxchg i32* %addr, i32 %oldval, i32 %newval seq_cst seq_cst

    %success = icmp eq i32 %loaded, %oldval

    %res = zext i1 %success to i32

    ret i32 %res

here, we'd want something like:

    loop:

      %loaded = load linked i32* %addr seq_cst

      %trystore = icmp eq %loaded, %oldval

      br i1 %trystore, label %store.cond, label %false

    store.cond:

      %success = store conditional i32 %newval, i32* %addr seq_cst

      br i1 %success, label %true, label %loop

    true:

      ret i32 1

    false:

      ret i32 0

In both cases the compare is largely redundant in LL/SC architectures. You know

by the control flow graph what its result will be, and if LLVM sees this early

enough it can produce something close to optimal code.

Unfortunately, most of LLVM has finished by the time this expansion occurs, so

we end up with redundant compares.

Now, it's certainly possible to put various peephole optimisations into the

expansion pass to cover these cases (at least), but to me that feels like

fragile duplication of effort we've already made in the mid-end.

My Proposal

-----------

I think the best way to solve this longer term is to add explicit "load linked"

and "store conditional" instructions to the LLVM IR, with equal status to

"cmpxchg" and "atomicrmw".

We could then simplify the AtomicExpandLoadLinked pass to use these, change it

to use something like a TargetTransformInfo hook instead of TargetLowering and

schedule it much earlier in the mid-end. X86 & SystemZ would simply request no

expansion happen.

This means Clang could still emit cmpxchg and atomicrmw, which closely match C &

C++ semantics, but most of LLVM would see the control-flow and operations in a

form much closer to how the targers will actually emit them.

In addition, other languages would be able to provide more general atomic

operations, if they want (slightly more complex bit twiddling seems like it

could be particularly useful in some algorithms). Current atomic instructions

cover (most of) C and C++, but nothing else.

Potential issues

----------------

1. Can invalid transformations be done on them? Separating an LL from its

   corresponding SC by too much almost guarantees failure (an OS task switch

   happens, clearing the monitor).

2. We'd need to document their interaction with LLVM's memory model. I'm not

   sure I've seen a good happens-before model that explicitly uses LL/SC. The

   common ones people came up with in the Java & C++11 standardisation process

   seem to assume atomic RMW operations instead.

3. People might be assuming StoreInst produces no value and not calling

   ReplaceAllUses. A separate StoreConditionalInst is one option, but that risks

   the reverse, people not considering it a store when it is.

4. Not all targets can support them. There are very few LLVM instructions like

   that, which is a little concerning. But not all targets can support current

   atomics either.

   If we make vulnerability to ABA problems target-specific in our documentation

   then X86 & SystemZ could use the same trick as for min/max to implement them

   generically: a load/cmpxchg loop.

Alternatives

------------

1. Enhancing MachineCSE and friends so they can optimise this.

    - Works for "%old = atomicrmw OP %addr, %inc; OP %old, %inc" already.

    - Cmpxchg is more difficult. It's the control-flow optimisations that are

      particularly powerful in these cases.

    - It's more duplicated effort between different parts of LLVM.

2. Peephole optimisations in the AtomicExpandLoadLinked pass.

    - Feels like a hack, spotting particular patterns that the mid-end can

      already cope with.

    - Deals with toy functions well, but it's not quite clear that those are the

      only options in more general code (after inlining, say).

    - Potentially safest option: atomicrmw & cmpxchg preserved until

      very late. Nothing's going to come in and convert valid code to invalid.

3. Moving expansion further forward but still using intrinsics.

    - The intrinsics need to have fully general side effects to be correct:

      effectively an "asm volatile" for optimisation purposes, which is very

      heavy-handed for LLVM's other optimisations.

    - Still need target hooks to create the calls, because intrinsics don't get

      type lowered and so you can't write a fully generic one (e.g. an i64 ldrex

      on ARM needs to return { i32, i32 }).

4. Change the cmpxchg operation to return (say) { iN, i1 } where the second

   value indicates success.

    - Probably good idea anyway as part of support for weak compare-exchange

      operations.

    - Doesn't actually help this issue much: it's mostly control-flow

      optimisations that seem to be helpful, and this would leave us with phi

      nodes in need of materialisation when lowered.

    - Might make the peephole optimisations in 2 slightly simpler.

5. Some kind of generalisation that can cover both LL/SC and transactional

   memory to avoid doing this again if and when that works out?

    - I can't think of one that's both neat and practically implementable for

      both.

    - Suggestions welcome.

</pre>

      <br>

      <fieldset class="mimeAttachmentHeader"></fieldset>

      <br>

      <pre wrap="">_______________________________________________

LLVM Developers mailing list

<a class="moz-txt-link-abbreviated" href="mailto:LLVMdev@cs.uiuc.edu">LLVMdev@cs.uiuc.edu</a>         <a class="moz-txt-link-freetext" href="http://llvm.cs.uiuc.edu">http://llvm.cs.uiuc.edu</a>

<a class="moz-txt-link-freetext" href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a>

</pre>

    </blockquote>

    <br>

  </body>

</html>