[llvm-dev] [RFC] Injecting new element atomic memory intrinsics into MemIntrinsic class hierarchy

Thu Aug 17 10:56:28 PDT 2017

I'll comment that Daniel and I have discussed this pretty extensively 
offline.  I am now convinced that option 1 is our best choice out of a 
set of not entirely great choices.  If no one has any comments, that's 
the approach we'll end up pursuing, but we wanted to give the community 
a chance to chime in here.  Maybe someone can come up with a better API 
design than we've managed. :)

Philip

On 08/17/2017 08:32 AM, Daniel Neilson via llvm-dev wrote:
>
> Hi all,
>  We somewhat recently created/updated element-wise unordered-atomic 
> versions of the memcpy, memmove, and memset memory intrinsics:
> Memcpy: https://reviews.llvm.org/rL305558
> Memmove: https://reviews.llvm.org/rL307796
> Memset: https://reviews.llvm.org/rL307854
>
>  These intrinsics are semantically similar to the regular versions. 
> The main difference is that the underlying operation is performed 
> entirely with unordered atomic loads & stores of a given size.
>
>  Our ultimate purpose is to enable exploitation of the memory 
> intrinsic idioms and optimizations in languages like Java that make 
> extensive use of unordered-atomic loads & stores. To this end, we will 
> eventually have to add these new intrinsics to the instruction class 
> hierarchy, and teach passes how to deal with them; talking about how 
> to do this is the purpose of this RFC.
>
>  We have started adding canary tests to some passes, and will continue 
> to do so in preparation for adding the element atomic intrinsics to 
> the instruction class hierarchy. These canary tests are simply 
> placeholders that show that the pass in question currently does 
> nothing with the new element atomic memory instrinics, and 
> could/should generally start failing once the unordered-atomic memory 
> intrinsics are added to the instruction class hierarchy — telling us 
> which passes need to be updated. For example: 
> https://reviews.llvm.org/rL308247
>
>  For adding the new intrinsics to the instruction class hierarchy, the 
> plan will be to add them one at a time — add the intrinsic to the 
> hierarchy, and flush out all problems before adding the next. We’re 
> thinking that it would be best to start with memset, then memcpy, then 
> memmove; memset is kind of it’s own thing, and there are some passes 
> that change memcpy/memmove into a memset so it would be beneficial to 
> have that in place before memcpy/memmove, and there are also passes 
> that change memmove into memcpy but I haven’t found any that go the 
> other way (I can’t imagine why you’d want to) so it would be good to 
> have memcpy squared away before doing memmove.
>
>  There are a few options that we have thought of for adding the new 
> memory intrinsics to the instruction class hierarchy. We have a 
> preference for the first option, but would appreciate 
> thoughts/opinions/discussion on the best way to insert the element 
> atomic intrinsics into the hierarchy. For background, the relevant 
> portion of the current hierarchy looks like:
>
> MemIntrinsic
> * MemSetInst
> * MemTransferInst
>  ** MemCpyInst
>  ** MemMoveInst
>
> Option 1)
> -------------
>  Do not add any new classes to the hierarchy. Instead, teach each 
> class to recognize the unordered-atomic variant as belonging to the 
> class (ex: element unordered-atomic memcpy is recognized as a 
> MemCpyInst, MemTransferInst, and a MemIntrinsic), and add a query 
> method like ‘isUnorderedAtomic’ that will tell you whether the 
> underlying intrinsic is one of the element atomic variants. 
> Alternatively, the method could be a more generic ‘getOrdering()’ that 
> would return an AtomicOrdering; though a memory intrinsic that is 
> required to be implemented with ordered-atomic instructions strikes me 
> as unlikely to ever be desired.
>
>  There is precedent for encoding the intrinsic property as a query 
> method both within the current memory intrinsic hierarchy itself (the 
> isVolatile() method) and the load/store instruction classes.
>
>  The way forward with this option would be to:
>   (i) add the method to MemInstrinsic (returning false); then
>   (ii) update all passes, one at a time, to bail-out if 
> isUnorderedAtomic() is true; then
>   (iii) add an intrinsic to the class hierarchy; then
>   (iv) update passes to exploit isUnorderedAtomic() one at a time
>   (v) repeat (iii) & (iv) for each unordered-atomic intrinsic
>
> Pros:
>  Significant reduction in copy/paste code over other options; passes 
> that use visitors would require some refactoring or some copy/paste to 
> mirror behaviour if the unordered-atomic intrinsics are separate classes.
>  Lends itself to a clear & simple incremental path-forward; each pass 
> can be updated independently of the others which will make reviews 
> more achievable due to not requiring people with working knowledge 
> much beyond the single pass.
>  Easy to leverage new work on memory intrinsic optimization for the 
> unordered-atomic variants; even near-zero cost for passes that won’t 
> be affected by unordered-atomicity.
> Cons:
>  One more attribute to check when implementing a pass that handles a 
> memory intrinsic; could lead to some unexpected bugs where we change 
> an unordered-atomic intrinsic into a regular intrinsic.
>
>  The con here is pretty concerning. We risk making it easy for a 
> developer to forget about the element-atomic variants when introducing 
> an optimization that converts a memmove/memcpy/memset. Such an 
> optimization could, conceivably, result in an unordered-atomic 
> intrinsic being erroneously converted to drop its atomicity; such a 
> bug would only exhibit as a race condition in the resulting program, 
> and could be an absolute beast to debug. The only way to prevent such 
> bugs would be very careful code reviews.
>
> Option 2)
> -------------
>  Add separate classes to the hierarchy for each intrinsic. There are a 
> few different ways that this could be organized (note: UA* = 
> unordered-atomic version of intrinsic, to save typing):
>
> a) Completely disjoint hierarchy
> UAMemIntrinsic
> * UAMemSetInst
> * UAMemTransferInst
>  ** UAMemCpyInst
>  ** UAMemMoveInst
> MemIntrinsic
> * MemSetInst
> * MemTransferInst
>  ** MemCpyInst
>  ** MemMoveInst
>
>  This organization has the benefit of not interacting/interfering with 
> the regular memory intrinsics in any way. That’s also a big downside; 
> most/many passes are going to be able to function on unordered-atomic 
> memory intrinsics the same as they do for the regular memory 
> intrinsics. In some cases this will just mean adding another few isa<> 
> checks at strategic positions, but in many cases this will mean code 
> clones and/or clever refactoring (ex: for passes that incorporate 
> visitors).
>  Being completely disjoint also makes this option super easy to add in 
> incrementally; no pass will know what to do with an unordered-atomic 
> intrinsic until explicitly taught. Of course that’s also a downside as 
> it means that optimizing the unordered-atomic memory intrinsics always 
> has to be explicitly added — nothing comes for free from the regular 
> memory intrinsic work.
>
> b) Disjoint sub-hierarchy of MemIntrinsic
> MemIntrinsic
> * MemSetInst
> * UAMemSetInst
> * MemTransferInst
>  ** MemCpyInst
>  ** MemMoveInst
> * UAMemTransferInst
>  ** UAMemCpyInst
>  ** UAMemMoveInst
>
>  Similar to the (2a) option in a lot of ways, but with some added 
> complications for even the regular memory intrinsics. The hierarchy 
> immediately below MemIntrinsic becomes much wider, so it is no longer 
> sufficient to check isa<MemTransferInst> to narrow down memcpy/memmove 
> vs memset. Also, similar to (2a), there’s the potential for a lot of 
> code clones in passes that incorporate visitors.
>
>  The way forward with this option would be similar to 2a with the 
> exception that we’d have to go through the whole codebase to make sure 
> that "if (! isa<MemTransferInst>) then Inst is memset” and equivalent 
> checks resolve correctly.
>
> c) Child classes of the regular intrinsics
> MemIntrinsic
> * MemSetInst
>  ** UAMemSetInst
> * MemTransferInst
>  ** MemCpyInst
>   *** UAMemCpyInst
>  ** MemMoveInst
>    *** UAMemMoveInst
>
>  There’s not a tonne of difference between this option and option (1), 
> above. All of the existing visitors would recognize unordered-atomic 
> and regular memory intrinsics, so there wouldn’t need to be any code 
> clones to support the new intrinsics.
>  However, passes that care about the atomicity of the intrinsic would 
> have to add isa<> checks to ensure that they’re not acting on the 
> unordered-atomic memory intrinsic when they don’t want to be. It would 
> also be trickier than option (1) to check, at the level of 
> MemIntrinsic/MemTransferInst, whether the intrinsic we have is 
> unordered-atomic; we’d have to explicitly check all of the isa<>’s 
> rather than querying the isUnorderedAtomic() property. We could add 
> the isUnorderedAtomic() query method to this option, but then it’s not 
> much different from option (1).
>
>  The way forward for this option would be very similar to that for 
> option 1.
>
> d) Direct parent classes of the regular intrinsics
> MemIntrinsic
> * UAMemSetInst
>  ** MemSetInst
> * MemTransferInst
>  ** UAMemCpyInst
>   *** MemCpyInst
>  ** UAMemMoveInst
>   *** MemMoveInst
>
>   This is an idea that came up in our discussions. A benefit is that 
> it leaves the existing regular memory intrinsics as leaf classes in 
> the hierarchy; so, if we currently check for, say, isa<MemCpyInst> 
> then we don’t have to change that code because it can’t be 
> unordered-atomic and have that test be true. The benefit is that it 
> would make it harder for a pass to accidentally convert an 
> unordered-atomic intrinsic into a non-atomic intrinsic. It does 
> complicate code that checks whether an intrinsic is a memset by 
> checking for not isa<MemTransferInst>, and other code like that.
>
> e) Multiple inheritance
>
>  This is a variant on options 2b to 2d. To make it possible to easily 
> use isa<> to query for whether or not a memory intrinsic is 
> unordered-atomic or not we could add an interface-like class that all 
> unordered-atomic memory intrinsic classes would multiply inherit from. 
> I’m not sure if there is precedence for this sort of thing in the 
> instruction class hierarchy, though; I haven’t reviewed the entire 
> hierarchy, but all of the parts that I have looked at appear to be 
> single inheritance.
>
> ===
>
> *Note about the align parameter attribute.*
>
>  When creating the unordered-atomic memory intrinsics we decided to 
> use the align attribute on pointer parameters, rather than supplying a 
> single alignment parameter. This adheres to the way that LLVM wants to 
> move going forward, but is different from the way that the memcpy, 
> memmove, and memset intrinsics are currently implemented. This could, 
> potentially, be a complication when integrating the unordered-atomic 
> memory intrinsics into the class hierarchy.  Ideally, it would be good 
> to resurrect the 2.5 yr old work to remove the alignment parameter 
> from memcpy, memmove, and memset:
> commit 8b170f7f290843dc3849eaa75b6f74a87a7a2de6
> Author: Pete Cooper <peter_cooper at apple.com 
> <mailto:peter_cooper at apple.com>>
> Date:   Wed Nov 18 22:17:24 2015 +0000
>
> That was reverted due to bot failures, but the history on exactly what 
> those failures are appears, to me, to be lost:
>
> commit 6d024c616a2a4959f8dfe5c64d27f89b394cf042
> Author: Pete Cooper <peter_cooper at apple.com 
> <mailto:peter_cooper at apple.com>>
> Date:   Thu Nov 19 05:56:52 2015 +0000
>
>  Baring that, there is still a way forward for integrating the 
> unordered-atomic memory intrinsics into the class hierarchy.  The 
> getAlignment() method of MemIntrinsic would have to return the minimum 
> of the pointer-arg alignment attributes for unordered-atomic 
> intrinsics, and the setAlignment() method would have to set both 
> parameter attributes to the given value.
>
> ===
>
>  Thoughts? Opinions?
>
> Thanks,
>  Daniel
>
> ---
> Daniel Neilson, Ph.D.
> Azul Systems
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170817/304e6a02/attachment.html>