[PATCH] D101976: [OpenMP] Unified entry point for SPMD & generic kernels in the device RTL

Thu May 6 13:41:26 PDT 2021

jdoerfert added a comment.

In D101976#2742788 <https://reviews.llvm.org/D101976#2742788>, @JonChesterfield wrote:

> In D101976#2742188 <https://reviews.llvm.org/D101976#2742188>, @jdoerfert wrote:
>
>> In D101976#2742166 <https://reviews.llvm.org/D101976#2742166>, @JonChesterfield wrote:
>>
>>> What are the required semantics of the barrier operations? Amdgcn builds them on shared memory, so probably needs a change to the corresponding target_impl to match
>>
>> I have *not* tested AMDGCN but I was not expecting a problem. The semantics I need here is: 
>>  warp N, thread     0 hits a barrier instruction I0
>>  warp N, threads 1-31 hit  a barrier instruction I1
>>  the entire warp synchronizes and moves on.
>
> One hazard is the amdgpu devicertl only has one barrier. D102016 <https://reviews.llvm.org/D102016> makes it simpler to add a second. I'd guess we want named_sync to call one barrier and syncthreads to call a different one, so we should probably rename those functions. The LDS barrier implementation needs to know how many threads to wait for, we may be OK passing 'all the threads' down from the __syncthreads entry point.
>
> The other is the single instruction pointer per wavefront, like pre-volta nvidia cards (which I believe we also expect to work). I'm not sure whether totally independent barriers will work, or whether we'll need to arrange for thread 0 and thread 1-31 to call the two different barriers at the same point in control flow.

So what do you want me to change for this patch now?
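
To make the requirement quoted above concrete, here is a minimal host-side sketch of the semantics: thread 0 reaches the barrier through one call site (I0), threads 1-31 reach it through another (I1), and the whole warp synchronizes and moves on. This is only a model built on standard C++ threads, not device code and not anything from the patch. `WarpBarrierModel` and `runWarpModel` are hypothetical names introduced for illustration, and the model says nothing about whether a single-instruction-pointer wavefront can actually execute the two call sites this way.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical host-side stand-in for a warp-wide barrier (illustration only;
// this is not a device RTL or target_impl API).
class WarpBarrierModel {
public:
  explicit WarpBarrierModel(int NumThreads) : Expected(NumThreads) {}

  // Block until all `Expected` threads have arrived, from any call site.
  void arriveAndWait() {
    std::unique_lock<std::mutex> Lock(M);
    int MyGeneration = Generation;
    if (++Arrived == Expected) {
      // Last arriver releases everyone and resets the barrier for reuse.
      Arrived = 0;
      ++Generation;
      CV.notify_all();
      return;
    }
    CV.wait(Lock, [&] { return Generation != MyGeneration; });
  }

private:
  std::mutex M;
  std::condition_variable CV;
  const int Expected;
  int Arrived = 0;
  int Generation = 0;
};

// Runs one modeled warp and returns how many threads observed, after the
// barrier, the value that thread 0 wrote before its barrier call.
int runWarpModel(int WarpSize) {
  assert(WarpSize > 0);
  WarpBarrierModel Barrier(WarpSize);
  int Published = 0;                  // written by thread 0 before the barrier
  std::vector<int> Seen(WarpSize, 0); // one slot per modeled thread
  std::vector<std::thread> Threads;

  for (int Tid = 0; Tid < WarpSize; ++Tid) {
    Threads.emplace_back([&, Tid] {
      if (Tid == 0) {
        Published = 42;
        Barrier.arriveAndWait(); // call site I0 (thread 0 only)
      } else {
        Barrier.arriveAndWait(); // call site I1 (threads 1..WarpSize-1)
      }
      // Past the barrier, every thread should see thread 0's write.
      Seen[Tid] = (Published == 42) ? 1 : 0;
    });
  }
  for (auto &T : Threads)
    T.join();

  int Count = 0;
  for (int S : Seen)
    Count += S;
  return Count;
}
```

Calling `runWarpModel(32)` models one 32-thread warp; in this model the two call sites share a single barrier object, which corresponds to the case where both barrier instructions map to the same underlying barrier.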

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D101976/new/

https://reviews.llvm.org/D101976