[PATCH] D101701: [nofree] Refine concurrency requirements

Sun Jul 25 14:13:03 PDT 2021

jdoerfert added a comment.

In D101701#2902884 <https://reviews.llvm.org/D101701#2902884>, @nhaehnle wrote:

> You really also need `nocapture` on the argument of `argmemonly_nofree_but_not_nosync` for this to work, right?

Yes, probably.

In D101701#2903310 <https://reviews.llvm.org/D101701#2903310>, @nhaehnle wrote:

>> Finally, we started to track threads explicitly already, partially using domain knowledge, which allows us to reason about the interaction between threads 
>> (https://reviews.llvm.org/D106397#C2702144NL1110). So even in the presence of synchronizations (atomics, barriers, etc), we can use other attributes
>> (argmemonly, nofree, ...) and such thread tracking to make useful deductions. This is not possible if we interleave the `argmemonly/nofree/...` semantics
>> with `nosync`. The above optimization is a real thing for a very common scenario on GPUs and also CPUs:
>>
>>   run_in_parallel {
>>     if (threadid == 0)
>>       effect();
>>     barrier();
>>   
>>     ... parallel stuff
>>   
>>     if (threadid == 0)
>>       effect();
>>     barrier();
>>    ...
>>   }
>
> I've been staring at this for quite some time now and I don't understand how it relates to this discussion. Can you be more explicit about which functions here have e.g. argmemonly but not nosync, and how that is used in an optimization?

If we merge the semantics of `nosync` into `nofree`, `argmemonly`, etc. to make them "global" instead of local, anything that contains a barrier/atomic/volatile/convergent operation will lose those attributes. Do you agree?
Now, if we start to look at barrier/atomic/convergent operations in more detail, e.g., by tracking the main thread on a GPU device to ensure there is no concurrent access, we would lose out on the `nofree`, `argmemonly`, etc. attributes of the functions the main thread calls in "critical regions".
In my example, let's assume `effect` is called only by the main thread in the two critical regions and that it contains an atomic update of a global. Baking `nosync` into `nofree`, ... will prevent us from annotating `effect` with such attributes, as it is not locally decidable whether it is always called from critical regions. We can, however, determine that it doesn't itself call free. Later, when we determine that the call sites are in critical regions, we have the `nofree`, ... attributes available and can act on them.
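
To make the two-step reasoning concrete, here is a toy C++ sketch. It is not LLVM code: `FunctionSummary`, `CallSiteContext`, and all function names are made up for illustration, and the rules are a deliberately simplified model of the attribute semantics discussed here.

```cpp
#include <cassert>

// Hypothetical, simplified summary of a function body (not an LLVM type).
struct FunctionSummary {
  bool CallsFree;      // the body (transitively) calls a deallocation function
  bool MaySynchronize; // the body contains an atomic/barrier/volatile/convergent op
};

// Hypothetical call-site fact, e.g. derived from explicit thread tracking.
struct CallSiteContext {
  bool InCriticalRegion; // only one thread can be executing at this call site
};

// "Local" semantics: nofree only states that the function itself does not free.
bool deduceNoFreeLocal(const FunctionSummary &F) { return !F.CallsFree; }

// "Merged" semantics: nofree also implies nosync, so any synchronization
// inside the function drops the attribute.
bool deduceNoFreeMerged(const FunctionSummary &F) {
  return !F.CallsFree && !F.MaySynchronize;
}

// Second step, done at the call site: memory is assumed to stay allocated
// across the call if the callee is nofree and either it cannot synchronize
// with a thread that frees, or the call site is known to be in a critical region.
bool staysAllocatedAcrossCall(bool CalleeNoFree, const FunctionSummary &Callee,
                              const CallSiteContext &CS) {
  return CalleeNoFree && (!Callee.MaySynchronize || CS.InCriticalRegion);
}

// Walks through the `effect` example: no free, but one atomic update.
void checkEffectExample() {
  FunctionSummary Effect{/*CallsFree=*/false, /*MaySynchronize=*/true};
  CallSiteContext CS{/*InCriticalRegion=*/true};
  // Local semantics keep the attribute, so the call-site fact can be used.
  assert(staysAllocatedAcrossCall(deduceNoFreeLocal(Effect), Effect, CS));
  // Merged semantics already dropped it, so the call-site fact cannot help.
  assert(!staysAllocatedAcrossCall(deduceNoFreeMerged(Effect), Effect, CS));
}
```

With the local semantics, `effect` keeps `nofree` and the critical-region fact at the call site completes the deduction. With the merged semantics, the attribute is gone before the call site is ever looked at.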

In D101701#2902902 <https://reviews.llvm.org/D101701#2902902>, @nhaehnle wrote:

> But maybe that's just a variation on what you had in mind. In any case, this kind of side channel is clearly ridiculous and should just be closed. I doubt many people would oppose that :)

I would not oppose it; I want it to be nosync, though I gave up on my patch. I fear that opens up security problems because real malloc/free implementations reuse memory while the abstract machine does not need to. Either way, this is something to keep in mind.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D101701/new/

https://reviews.llvm.org/D101701