[llvm-dev] RFC: Adding a !thread.private metadata

JF Bastien via llvm-dev llvm-dev at lists.llvm.org
Fri Sep 14 16:44:26 PDT 2018



> On Sep 14, 2018, at 4:33 PM, Philip Reames <listmail at philipreames.com> wrote:
> 
> 
> 
> On 09/14/2018 04:22 PM, JF Bastien wrote:
>> That sounds fine to me. I agree it seems interesting, and kinda low-gain.
>> 
>> It’s an attribute and we’d be able to drop it without losing correctness. We can drop this functionality if it doesn’t pan out, which would be harder if we went with a new memory order.
> Just to check, you meant to say "metadata" not "attribute" right?

Yes.


>> I think this should be exposable as a clang attribute in C++ as well. I’m not saying it’s a good idea, but if you do implement the optimization I’d like to see what it looks like for users to opt-in to this.
> I'll leave that part to you.  :)
>> Won’t you be putting this on most allocas because most don’t escape?
> I wasn't planning on adding the metadata based on analysis.  I was thinking more a utility function along the lines of the following:
> bool isKnownThreadPrivateAccess(Instruction *I, ..analysis info...)
> 
> Where the implementation would end up using capture tracking for things like allocas, but have a fast path return if the instruction itself had the metadata.
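
A minimal sketch of the shape such a utility might take, written against a toy model rather than the real IR classes. MemAccess, BaseKind, and their fields are hypothetical stand-ins invented for illustration; baseMayBeCaptured stands for whatever capture tracking would report. Only the function name comes from the message above.

```cpp
// Hypothetical stand-ins for IR concepts; none of these are LLVM types.
enum class BaseKind { Alloca, ThreadLocal, Global, Unknown };

struct MemAccess {
  bool hasThreadPrivateMD;  // models !thread.private on the instruction
  BaseKind base;            // what the accessed pointer is derived from
  bool baseMayBeCaptured;   // models the answer from capture tracking
};

bool isKnownThreadPrivateAccess(const MemAccess &access) {
  // Fast path: the metadata already asserts the fact.
  if (access.hasThreadPrivateMD)
    return true;

  // Slow path: storage that starts out private to one thread stays
  // private as long as its address is never captured.
  switch (access.base) {
  case BaseKind::Alloca:
  case BaseKind::ThreadLocal:
    return !access.baseMayBeCaptured;
  case BaseKind::Global:
  case BaseKind::Unknown:
    return false;
  }
  return false;
}
```

In this model the metadata never has to be inferred and attached; the analysis path answers the query for allocas directly, and the metadata only matters for locations the analysis cannot see through.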
> 
>> 
>> Is there a problem with link-once ODR functions using this info differently?
> Probably.  Derefinement is a real pain, but also an entirely separate issue.  :)
>> 
>> One downside with an attribute: can we annotate “epochs” where a value sometimes is single-thread, and other times is shared? I don’t think so, but it might be fine.
> I can't come up with a good model for this attribute/metadata-wise. At least, not one which gives me anything useful from an optimization standpoint.
> 
> I imagine we would end up with an isKnownThreadPrivateBefore(Instruction *I, ...analysis...) variant though.  (Similar to what we have for pointer capturing.)
> 
> However, relying on such a result is generally really dangerous because we don't have a good way to model a publication fence at the moment.  That's definitely a separate issue, so let's separate that if you don't mind.
>> 
>> 
>>> On Sep 14, 2018, at 4:13 PM, Philip Reames <listmail at philipreames.com> wrote:
>>> 
>>> Problem
>>> 
>>> LLVM's memory model for NotAtomic accesses is generally fairly weak, but explicitly disallows inserting stores that didn't occur in the original program.  This is required for any potentially shared location, but is overkill for any memory location which is provably only accessed by a single thread.
>>> 
>>> My particular motivating example is a single thread private field in our implementation, but there are numerous languages which provide thread private storage options and right now, LLVM has no good way to represent them.
>>> 
>>> (Just to set expectations appropriately: the example which made me write this up is purely a "hey, that's interesting" case at the moment.  It's not a major blocking item or anything.  As such, I'm mostly throwing this out for discussion because it's interesting.)
>>> 
>>> Proposed Solution
>>> 
>>> Add a new metadata type which applies to memory accessing instructions (store, load, atomicrmw, etc...) and indicates that the memory location accessed is known to be accessed only by a single thread everywhere it is dereferenceable.
>>> 
>>> The framing is very similar to the one we use for !invariant.load and for much the same reasons.  If we can prove a location is dereferenceable, we want to be able to insert a store along any dereferenceable path through the function without worrying whether the original access was known to execute or not.  At the moment, the main transform to leverage this would be load/store promotion in LICM, which would be taught that inserting a store at a loop exit is legal, even if the store didn't execute within the dynamic execution of the loop, if the metadata is present.
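
To make the LICM case concrete, here is a source-level sketch of what the promotion amounts to. The function names and the counting example are made up for illustration; the real transform works on IR, not on C++.

```cpp
#include <vector>

// Original loop: the store to *counter happens only on iterations where
// the condition holds, so it may never execute at all.
void countNegativesOriginal(const std::vector<int> &values, int *counter) {
  for (int v : values)
    if (v < 0)
      *counter += 1;
}

// Hand-promoted form: *counter lives in a local for the duration of the
// loop and is written back unconditionally at the loop exit. That
// write-back is a store the original program might never have performed.
void countNegativesPromoted(const std::vector<int> &values, int *counter) {
  int promoted = *counter;
  for (int v : values)
    if (v < 0)
      promoted += 1;
  *counter = promoted;  // inserted store
}
```

With a single thread the two forms are indistinguishable. If another thread could write *counter while the loop runs and the input has no negative values, the promoted form overwrites that thread's update with a stale value, which is why the inserted store is only acceptable when the location is known to be thread private.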
>>> 
>>> Alternatives and Discussion
>>> 
>>> LLVM IR has existing support for thread local storage, but this doesn't solve our problem.  There's nothing that prevents one thread from capturing the address of its thread local copy and publishing that address in a location visible to other threads. Given a thread local variable and a nocapture result, we can conclude the location is thread private.  (Same for allocas, mallocs, etc...)
>>> 
>>> As just noted, there are places where we can infer that an access is thread private.  I think it makes sense to expose this as an analysis utility or pass.  We have bits of this already existing in LICM which could be pulled out, renamed, and reused.  There are various other transforms we could implement for thread private locations (e.g. replace an atomicrmw on a thread private with a load, op, store sequence), but I'm not sure these are actually worth implementing at the moment.
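
As a rough source-level analogy for the atomicrmw replacement mentioned above, assuming a counter that only one thread ever touches. The function names are invented, and std::atomic with relaxed ordering merely stands in for the IR-level operations.

```cpp
#include <atomic>

// Atomic read-modify-write: one indivisible update that returns the old
// value, analogous to an atomicrmw add.
int incrementAtomic(std::atomic<int> &counter) {
  return counter.fetch_add(1, std::memory_order_relaxed);
}

// Split form: a separate load, add, and store. If another thread updated
// the counter between the load and the store, that update would be lost,
// so this is only equivalent when no other thread accesses the counter.
int incrementSplit(std::atomic<int> &counter) {
  int old = counter.load(std::memory_order_relaxed);
  counter.store(old + 1, std::memory_order_relaxed);
  return old;
}
```

Both functions return the old value and leave the counter incremented by one when run from a single thread; they differ only in what a concurrent writer could observe or lose.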
>>> 
>>> We could extend the memory model with a weaker access type.  I think our current NotAtomic is a good default, but we could consider adding a ThreadPrivate specifier which is weaker than the existing NotAtomic in exactly the same way that the metadata implies.  This is a reasonable implementation strategy, but might be a bit more work than I can practically commit to at the moment.
>>> 
>>> Hal recently brought up the idea of a nosync function attribute. If I understand the intended semantics properly, such functions aren't guaranteed to access strictly thread private locations. They're simply required not to synchronize; that is, they are allowed to access shared variables in a racy manner.
>>> 
>>> Philip
>>> 
>>> 
>>> 
> 


