[LLVMdev] Plan to optimize atomics in LLVM

Fri Aug 8 10:53:48 PDT 2014

I am planning on doing it in IR, but with target-specific passes (such as
X86ExpandAtomicPass) that share some of the code (possibly by having
each of the target-specific passes inherit from and override a
target-independent pass).

The reasons for doing it in IR are the following:
- easier sharing of target-independent code
- easier handling of control flow (especially useful for advanced fence
elimination)
But it must be target-dependent: for example, on Power a seq_cst store has
a fence before it, while on ARM it has a fence both before and after it
(per http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html).
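
To make the "inherit and override" idea concrete, here is a minimal sketch
in plain C++. It is not LLVM's pass interface: the class names, the method
names, and the use of strings as pseudo-instructions are all assumptions
made for illustration. The fence choices follow the mapping table linked
above (dmb on both sides on ARM, hwsync before the store on Power).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical names throughout; this is not LLVM's actual pass interface.
// Target-independent skeleton: expands a seq_cst store into a flat list of
// pseudo-instructions, asking the target which fences to place around it.
class ExpandAtomicSketch {
public:
  virtual ~ExpandAtomicSketch() = default;

  std::vector<std::string> expandSeqCstStore() const {
    std::vector<std::string> out;
    for (const std::string &f : fencesBeforeSeqCstStore())
      out.push_back(f);
    out.push_back("store");
    for (const std::string &f : fencesAfterSeqCstStore())
      out.push_back(f);
    return out;
  }

protected:
  // Target hooks: the only part that differs between targets.
  virtual std::vector<std::string> fencesBeforeSeqCstStore() const = 0;
  virtual std::vector<std::string> fencesAfterSeqCstStore() const = 0;
};

// ARM: a fence both before and after the store.
class ARMExpandAtomicSketch : public ExpandAtomicSketch {
protected:
  std::vector<std::string> fencesBeforeSeqCstStore() const override {
    return {"dmb"};
  }
  std::vector<std::string> fencesAfterSeqCstStore() const override {
    return {"dmb"};
  }
};

// Power: a fence before the store only.
class PowerExpandAtomicSketch : public ExpandAtomicSketch {
protected:
  std::vector<std::string> fencesBeforeSeqCstStore() const override {
    return {"hwsync"};
  }
  std::vector<std::string> fencesAfterSeqCstStore() const override {
    return {};
  }
};
```

The shared expansion logic lives in the base class, and each target only
supplies its fence choices, which is the split proposed above.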

For this exact reason, I am planning on splitting
AtomicExpandLoadLinkedPass into a target-independent and a target-dependent
file: the current pass claims to be target-independent but is actually
designed for ARM. For example, it puts a release fence before a seq_cst CAS,
which would be unsound on Power if the backend were more aggressive and used
lwsync for release fences. Since these fences are not chosen based on the
LLVM fence semantics, but on the hardware memory model, I was thinking of
inserting target-specific intrinsics (dmb/isb on ARM, hwsync/lwsync/isync
on Power), to make it clearer that these passes are target-specific and
unsound outside of their target.

Another thing I would have to move to this IR pass is the insertion of
fences around atomic stores/loads when insertFencesForAtomic==true. It is
currently happening in SelectionDAGBuilder, which makes it impossible to do
fence elimination at the IR level.
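
As a toy illustration of what fence elimination could look like once the
fences are visible before instruction selection, the sketch below merges
back-to-back identical fences in a straight-line block. It is only a model:
the block is a list of opcode strings rather than LLVM IR, and isFence and
mergeAdjacentFences are hypothetical helpers, not existing LLVM functions.
A real pass would also have to reason about control flow and about which
fence kinds subsume which.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model, not LLVM IR: a basic block is a list of opcode names, and
// isFence is a hypothetical predicate for one target's full barrier.
static bool isFence(const std::string &op) { return op == "dmb"; }

// Drops a fence when the previously kept instruction is an identical fence,
// since there is nothing in between for the second fence to order.
std::vector<std::string>
mergeAdjacentFences(const std::vector<std::string> &block) {
  std::vector<std::string> out;
  for (const std::string &op : block) {
    if (isFence(op) && !out.empty() && out.back() == op)
      continue; // redundant: immediately follows the same fence
    out.push_back(op);
  }
  return out;
}
```

For two consecutive seq_cst stores expanded the ARM way (dmb; store; dmb;
dmb; store; dmb), this removes one of the two adjacent dmb in the middle.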

Is this reasonable, or is there some rule against using hardware-specific
intrinsics at the IR level (or some other problem with this approach)?

Cheers,
Robin Morisset

On Thu, Aug 7, 2014 at 11:34 PM, Tim Northover <t.p.northover at gmail.com>
wrote:

> > Longer term, I hope to improve the fence elimination of the ARM backend
> > with a kind of PRE algorithm. Both of these improvements to the ARM
> > backend should be fairly straightforward to port to the POWER
> > architecture later, and I hope to also do that.
> >
> > Any reason these couldn't be done at the IR level?
>
> I definitely agree here. At the time, it was a plausible idea (the
> barriers didn't even exist in IR most of the time). But the
> implementation was always going to be much more complicated and less
> portable than in IR, and what we actually have is very flawed in its
> own right (only applies to ARM mode, unmaintained,
>
> Actually, I think we decided to remove it a while back, but I haven't
> gotten around to it yet.
>
> Cheers.
>
> Tim.
>