[LLVMdev] Plan to optimize atomics in LLVM

Robin Morisset morisset at google.com
Fri Aug 8 14:10:06 PDT 2014


On Fri, Aug 8, 2014 at 1:49 PM, Philip Reames <listmail at philipreames.com>
wrote:

>
> On 08/08/2014 11:42 AM, Tim Northover wrote:
>
>>> I am planning on doing it in IR, but with target-specific passes (such as
>>> X86ExpandAtomicPass)
>>> that just share some of the code
>>>
>> This would more normally be done via target hooks in LLVM, though the
>> principle is sound.
>>
>>> But it must be target-dependent as for example on Power a
>>> seq_cst store has a fence before it, while on ARM it has a fence
>>> both before and after it (per
>>> http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html)
>>>
>> That certainly seems to suggest some kind of parametrisation.
>>
> An alternate way of saying this might be that both ARM and Power require
> the store to be fenced before and after.  On Power the fence after is
> implicit, where on ARM it is not.  (Is this actually correct?  I don't know
> either of these models well.)
>
> Could you use that framing to factor the arch specific and general parts?
>  I'd really love to have a generic barrier combine pass which can work on
> the IR semantics independent of the architecture barrier semantics.


More precisely, both ARM and Power require a barrier between every seq_cst
store and every later seq_cst load (among lots of other requirements).
On Power the mapping achieves this with a full fence before every seq_cst
load, whereas ARM uses a full fence after every seq_cst store.
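
To make the difference concrete, here is a rough sketch of the two mappings
as plain data, with a check that a full fence always ends up between a
seq_cst store and a later seq_cst load. The instruction sequences are the
ones listed in the mapping table linked earlier in this thread; the names
lower, store_load_is_fenced and check_mappings are made up for illustration
and are not LLVM APIs.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Arch { ARM, Power };
enum class Op { LoadSeqCst, StoreSeqCst };

// Hypothetical helper: instruction sequence for a seq_cst access,
// following the C/C++11 mapping table cited in the thread.
std::vector<std::string> lower(Arch arch, Op op) {
    if (arch == Arch::ARM) {
        if (op == Op::LoadSeqCst) return {"ldr", "dmb"};
        return {"dmb", "str", "dmb"};
    }
    if (op == Op::LoadSeqCst) return {"hwsync", "ld", "cmp", "bc", "isync"};
    return {"hwsync", "st"};
}

bool is_full_fence(const std::string& insn) {
    return insn == "dmb" || insn == "hwsync";
}
bool is_store(const std::string& insn) { return insn == "str" || insn == "st"; }
bool is_load(const std::string& insn) { return insn == "ldr" || insn == "ld"; }

// True if, for a seq_cst store followed by a seq_cst load, a full fence
// sits between the store instruction and the load instruction.
bool store_load_is_fenced(Arch arch) {
    std::vector<std::string> seq = lower(arch, Op::StoreSeqCst);
    const std::vector<std::string> load = lower(arch, Op::LoadSeqCst);
    seq.insert(seq.end(), load.begin(), load.end());

    bool seen_store = false;
    bool seen_fence = false;
    for (const std::string& insn : seq) {
        if (is_store(insn)) {
            seen_store = true;
        } else if (seen_store && is_full_fence(insn)) {
            seen_fence = true;
        } else if (seen_store && is_load(insn)) {
            return seen_fence;
        }
    }
    return false;
}

// Both targets satisfy the requirement, but the fence comes from a
// different side: the trailing dmb of the store on ARM, the leading
// hwsync of the load on Power.
void check_mappings() {
    assert(store_load_is_fenced(Arch::ARM));
    assert(store_load_is_fenced(Arch::Power));
}
```

This only models the one store-to-load requirement mentioned above, not the
rest of either memory model.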

I would also love to have a generic barrier combine pass, but I strongly
doubt it is at all possible.


>
>>> Is it reasonable, or is there some rule against using hardware-specific
>>> intrinsics at the hardware level (or some other problem with this
>>> approach)?
>>>
>> Lots of the changes sound like they're going in the right direction.
>> I'd be particularly pleased to see other architectures using (via
>> whatever adaptations are necessary) the atomic expansion pass; I think
>> that could significantly simplify other backends.
>>
>> I'm a little concerned about changing the "fence XYZ" conversion into
>> target intrinsics, but it looks likely it'd be necessary for
>> performance even if the current scheme does turn out to be correct so
>> I say go for it!
>>
> I would say there's a burden of justification that the target intrinsic
> approach is substantially better performance wise.  This doesn't have to be
> extensive, but something should be presented. (If the generic approach is
> actually possible.)


For one simple example: acquire loads on ARM that are followed by a
dependent branch can be implemented by putting an isb fence at each target
of the branch (I can look up the reference for this if you want), which is
supposedly cheaper (I am hoping to get some benchmarks on this and similar
things soon). But all the C11 fences, including the acquire fence, require a
full dmb barrier on ARM. So it is impossible to express this optimized
mapping of acquire loads in a target-independent way.
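
At the source level the two shapes in question look roughly like the sketch
below. The names (ready, payload, publish and the two readers) are
hypothetical and only there to illustrate the pattern. The first reader is
the case where a backend could in principle pick the branch + isb scheme;
the second goes through a standalone C11 acquire fence, which is the case
that needs the full dmb.

```cpp
#include <atomic>

std::atomic<int> ready{0};  // hypothetical flag guarding payload
int payload = 0;            // plain data published via the flag

// Acquire load followed by a branch that depends on the loaded value.
// This is the shape where the branch + isb mapping would apply on ARM.
int read_with_acquire_load() {
    if (ready.load(std::memory_order_acquire) != 0) {
        return payload;
    }
    return -1;
}

// Same ordering guarantee, but expressed as a relaxed load plus a
// standalone acquire fence. The fence cannot use the branch trick,
// because it is not tied to any particular load or branch.
int read_with_acquire_fence() {
    int r = ready.load(std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_acquire);
    if (r != 0) {
        return payload;
    }
    return -1;
}

// Writer side: publish the payload, then set the flag with release.
void publish(int value) {
    payload = value;
    ready.store(1, std::memory_order_release);
}
```

Both readers are equivalent as far as the C11 model is concerned; the point
is only that turning the first into the second in a target-independent pass
throws away the information needed for the cheaper mapping.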