[llvm-dev] RFC: Adding a !thread.private metadata

Fri Sep 14 16:13:22 PDT 2018

Problem

LLVM's memory model for NotAtomic accesses is generally fairly weak, but 
explicitly disallows inserting stores that didn't occur in the original 
program.  This is required for any potentially shared location, but is 
overkill for any memory location which is provably only accessed by a 
single thread.

My particular motivating example is a single thread private field in our 
implementation, but there are numerous languages which provide thread 
private storage options and right now, LLVM has no good way to represent 
them.

(Just to set expectations appropriately: the example which made me write 
this up is purely a "hey, that's interesting" case at the moment.  It's 
not a major blocking item or anything.  As such, I'm mostly throwing 
this out for discussion because it's interesting.)

Proposed Solution

Add a new metadata type which applies to memory accessing instructions 
(store, load, atomicrmw, etc...) and indicates that the memory location 
accessed is known to be accessed only by a single thread everywhere it 
is dereferenceable.

The framing is very similar to the one we use for !invariant.load and 
for much the same reasons.  If we can prove a location is 
dereferenceable, we want to be able to insert a store along any 
dereferenceable path through the function without worrying whether the 
original store was known to execute or not.  At the moment, the main 
transform to leverage this would be load/store promotion in LICM, which 
would be taught that inserting a store on a loop exit is legal, even if 
the store didn't execute within the dynamic execution of the loop, 
provided the metadata is present.
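
To make the LICM case concrete, here's a rough source-level sketch in 
C++ of what promotion would do.  The function names and the `counter` 
parameter are made up for illustration, and the second function is just 
a hand-written picture of the transformed loop, not actual LICM output.  
The store on exit runs even when no iteration stored, which is only 
okay if `*counter` is thread private.

```cpp
#include <cassert>
#include <cstddef>

// Original loop: the store to *counter only executes if some element
// is negative.  (All names here are hypothetical.)
void count_negatives(const int *data, std::size_t n, int *counter) {
    for (std::size_t i = 0; i < n; ++i) {
        if (data[i] < 0)
            *counter += 1;      // conditional load + store
    }
}

// Promoted form: the location is kept in a local across the loop and
// written back once on exit.  The final store executes even if the
// loop body never stored, so it is a store the original program might
// not have performed.
void count_negatives_promoted(const int *data, std::size_t n, int *counter) {
    int tmp = *counter;         // load hoisted in front of the loop
    for (std::size_t i = 0; i < n; ++i) {
        if (data[i] < 0)
            tmp += 1;
    }
    *counter = tmp;             // store inserted on the loop exit
}
```

With a shared `*counter`, that unconditional write back could clobber a 
concurrent update from another thread, which is why the current model 
forbids it.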

Alternatives and Discussion

LLVM IR has existing support for thread local storage, but this doesn't 
solve our problem.  There's nothing that prevents one thread from 
capturing the address of its thread local copy and publishing that 
address in a location visible to other threads.  That said, given a 
thread local variable and a nocapture result, we can conclude the 
location is thread private.  (Same for allocas, mallocs, etc...)
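
As a small illustration of the escape problem, here's a C++ sketch 
(all names are hypothetical) where a thread publishes the address of 
its own thread local copy and a second thread writes through it.  It's 
only meant to show that "thread local" by itself doesn't imply "thread 
private".

```cpp
#include <atomic>
#include <cassert>
#include <thread>

thread_local int tls_value = 0;          // one copy per thread
std::atomic<int *> published{nullptr};   // visible to every thread

// Returns what the owning thread sees in its own thread local copy
// after a second thread has written through the published pointer.
int publish_and_observe() {
    tls_value = 1;
    published.store(&tls_value);         // the address escapes here
    std::thread other([] {
        *published.load() = 42;          // writes the first thread's copy
    });
    other.join();                        // join orders the write before the read
    return tls_value;
}
```

Once the address has escaped like this, the location has to be treated 
as potentially shared again.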

As just noted, there are places where we can infer that an access is 
thread private.  I think it makes sense to expose this as an analysis 
utility or pass.  We have bits of this already existing in LICM which 
could be pulled out, renamed, and reused.  There are various other 
transforms we could implement for thread private locations (e.g. replace 
an atomicrmw on a thread private location with a load, op, store 
sequence), but I'm not sure these are actually worth implementing at the 
moment.
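
For the atomicrmw example, a rough C++ analogue would look something 
like the following.  Both function names are invented for illustration, 
and `std::atomic` is only standing in for the IR level atomicrmw here.

```cpp
#include <atomic>
#include <cassert>

// What the program asked for: one atomic read-modify-write.
int fetch_add_atomic(std::atomic<int> &loc, int delta) {
    return loc.fetch_add(delta);
}

// What it could become if the location is known to be thread private:
// an ordinary load, the op, and an ordinary store.  No other thread
// can observe or interfere with the intermediate state.
int fetch_add_split(int &loc, int delta) {
    int old = loc;          // load
    loc = old + delta;      // op + store
    return old;
}
```

Both return the old value and leave the same new value behind; the 
split version is only correct if no other thread touches the location.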

We could extend the memory model with a weaker access type.  I think our 
current NotAtomic is a good default, but we could consider adding a 
ThreadPrivate specifier which is weaker than the existing NotAtomic in 
exactly the same way that the metadata implies.  This is a reasonable 
implementation strategy, but might be a bit more work than I can 
practically commit to at the moment.

Hal recently brought up the idea of a nosync function attribute. If I 
understand the intended semantics properly, such functions aren't 
guaranteed to access strictly thread private locations. They're simply 
required not to synchronize; that is, they are allowed to access shared 
variables in a racy manner.  So as far as I can tell, nosync wouldn't 
give us the thread private guarantee we need here.

Philip