[PATCH] D101701: [nofree] Refine concurrency requirements

Tue Jul 27 14:34:38 PDT 2021

nhaehnle added a comment.

In D101701#2903331 <https://reviews.llvm.org/D101701#2903331>, @jdoerfert wrote:

> In D101701#2903310 <https://reviews.llvm.org/D101701#2903310>, @nhaehnle wrote:
>
>>> Finally, we started to track threads explicitly already, partially using domain knowledge, which allows us to reason about the interaction between threads 
>>> (https://reviews.llvm.org/D106397#C2702144NL1110). So even in the presence of synchronizations (atomics, barriers, etc), we can use other attributes
>>> (argmemonly, nofree, ...) and such thread tracking to make useful deductions. This is not possible if we interleave the `argmemonly/nofree/...` semantics
>>> with `nosync`. The above optimization is a real thing for a very common scenario on GPUs and also CPUs:
>>>
>>>   run_in_parallel {
>>>     if (threadid == 0)
>>>       effect();
>>>     barrier();
>>>   
>>>     ... parallel stuff
>>>   
>>>     if (threadid == 0)
>>>       effect();
>>>     barrier();
>>>    ...
>>>   }
>>
>> I've been staring at this for quite some time now and I don't understand how it relates to this discussion. Can you be more explicit about which functions here have e.g. argmemonly but not nosync, and how that is used in an optimization?
>
> If we merge the semantics of `nosync` into `nofree`, `argmemonly`, etc. to make them "global" instead of local, anything that contains a barrier/atomic/volatile/convergent operation will lose those attributes. Do you agree?

Yes (convergent operations can be nosync, but the overall point remains).

> Now, if we start to look at barrier/atomic/convergent in more detail, e.g., by tracking the main thread on a GPU device to basically ensure there is no concurrent access to things, we would lose out on the `nofree`, `argmemonly`, etc. attributes of the functions the main thread calls in "critical regions".
> In my example, let's assume `effect` is called only by the main thread in the two critical regions and it contains an atomic update of a global. Baking `nosync` into `nofree`, ... will prevent us from annotating `effect` with such attributes, as it is not locally decidable whether it is always called from critical regions. We can, however, determine that it doesn't itself call free. Later, when we determine that the call sites are in critical regions, we have the `nofree`, ... attributes available and can act on them.
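
The scenario above can be sketched with ordinary C++ threads standing in for GPU threads. This is a minimal illustration under assumptions, not code from the patch: `run_in_parallel`, `effect`, `g_counter` and `SimpleBarrier` are hypothetical names, and the barrier is hand-rolled so the sketch needs only C++11. `effect` does an atomic update of a global, so it is not `nosync`, yet it never frees memory; the barriers keep the other threads waiting while thread 0 runs it, which is the "critical region" property the discussion relies on.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical reusable barrier; std::barrier would serve in C++20.
class SimpleBarrier {
 public:
  explicit SimpleBarrier(std::size_t count)
      : count_(count), waiting_(0), generation_(0) {}

  void arrive_and_wait() {
    std::unique_lock<std::mutex> lock(mutex_);
    std::size_t gen = generation_;
    if (++waiting_ == count_) {
      // Last thread to arrive releases everyone and starts a new phase.
      waiting_ = 0;
      ++generation_;
      cv_.notify_all();
    } else {
      cv_.wait(lock, [&] { return gen != generation_; });
    }
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  std::size_t count_;
  std::size_t waiting_;
  std::size_t generation_;
};

// Global touched by `effect`. The atomic update means `effect` is not
// nosync, but it never frees memory, so a thread-local nofree still holds.
std::atomic<int> g_counter(0);

void effect() { g_counter.fetch_add(1, std::memory_order_relaxed); }

// Hypothetical stand-in for the `run_in_parallel` region in the quote.
void run_in_parallel(int num_threads) {
  SimpleBarrier barrier(num_threads);
  std::vector<std::thread> threads;
  for (int tid = 0; tid < num_threads; ++tid) {
    threads.emplace_back([&barrier, tid] {
      if (tid == 0) effect();  // critical region: only thread 0 runs it
      barrier.arrive_and_wait();
      // ... parallel stuff ...
      if (tid == 0) effect();
      barrier.arrive_and_wait();
    });
  }
  for (auto& t : threads) t.join();
}
```

Calling `run_in_parallel(4)` leaves `g_counter` at 2, since only thread 0 ever calls `effect`.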

Okay, I see now where you're coming from. From the caller's perspective, `effect` isn't nosync, so it may synchronize with some other thread. There's in general no reason why that thread has to be part of the same workgroup or wave. However:

1. One could envision a future refinement of nosync with an attribute that labels `effect` as not communicating outside e.g. a workgroup.

2. There are other caller-based ways in which the possible reach of synchronization could be limited. For example, if there were a way to indicate that all synchronization is tied to memory locations, and `effect` is argmemonly and called with an uncaptured pointer, then it can't synchronize with any thread outside of the parallel region either, and the argmemonly attribute is still useful.
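
Point 2 can be illustrated with a hypothetical variant of `effect` that touches memory only through its pointer argument. This is a sketch under stated assumptions: `effect_argmem` and `caller` are illustrative names, C++ has no direct spelling for `argmemonly` (the comments only describe what a compiler could infer), and "synchronization is tied to memory locations" is the hypothetical rule from point 2, not existing LangRef semantics. Because the atomic lives in a local whose address never escapes `caller`, no other thread can reach it, so the atomic in `effect_argmem` has nothing to synchronize with.

```cpp
#include <atomic>

// Touches memory only through its pointer argument (argmemonly-like) and
// never frees it (nofree-like), but is not nosync: it performs an atomic RMW.
int effect_argmem(std::atomic<int>* p) {
  return p->fetch_add(1, std::memory_order_acq_rel) + 1;
}

// The atomic is a local whose address is never captured, so no other thread
// can observe it; under the location-based view of point 2, these calls
// cannot synchronize with any thread outside the caller.
int caller() {
  std::atomic<int> local(0);
  effect_argmem(&local);
  return effect_argmem(&local);
}
```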

So I think this is a good argument in favor of having attributes like nofree and argmemonly talk only about what happens in the calling thread.

Repository:
  rG LLVM Github Monorepo


https://reviews.llvm.org/D101701