[llvm-dev] [CUDA/NVPTX] is inlining __syncthreads allowed?

Fri Aug 21 13:56:32 PDT 2015

That’s an interesting case.  AFAIK, inlining should be restricted here for the reason you mention.  Inlining is only valid if it doesn’t duplicate the barrier.  Which version of nvcc shows this behavior?

Adding Vinod and Yuan for comment.

> On Aug 21, 2015, at 4:24 PM, Jingyue Wu <jingyue at google.com> wrote:
> 
> Hi Justin, 
> 
> Is a compiler allowed to inline a function that calls __syncthreads? I saw that nvcc does that, but I'm not sure it's valid. For example, 
> 
> void foo() {
>   __syncthreads();
> }
> 
> if (threadIdx.x % 2 == 0) {
>   ...
>   foo();
> } else {
>   ...
>   foo();
> }
> 
> Before inlining, all threads meet at one __syncthreads(). After inlining:
> 
> if (threadIdx.x % 2 == 0) {
>   ...
>   __syncthreads();
> } else {
>   ...
>   __syncthreads();
> }
> 
> The __syncthreads call is duplicated, and it's no longer guaranteed that all threads meet at the same __syncthreads(). 
> 
> Any thoughts? 
> 
> Jingyue
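
One way to sidestep the question at the source level is to hoist the barrier out of the divergent branch, so that there is only one __syncthreads() call site and inlining has nothing to duplicate. A minimal sketch, assuming a single block of 64 threads; the kernel name `hoisted`, the shared buffer `buf`, and the values written are hypothetical and not part of the example above:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each branch does its own work, and the barrier
// sits after the if/else so every thread reaches the same call site.
__global__ void hoisted(int *out) {
  __shared__ int buf[64];  // assumes blockDim.x == 64
  int t = threadIdx.x;
  if (t % 2 == 0) {
    buf[t] = t;    // stand-in for the "..." in the even branch
  } else {
    buf[t] = -t;   // stand-in for the "..." in the odd branch
  }
  __syncthreads();  // single barrier, outside the divergent region
  // Read a neighbour's slot, which is only safe after the barrier.
  out[t] = buf[(t + 1) % blockDim.x];
}

int main() {
  const int n = 64;
  int h_out[n];
  int *d_out = nullptr;
  if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) return 1;
  hoisted<<<1, n>>>(d_out);
  if (cudaMemcpy(h_out, d_out, n * sizeof(int),
                 cudaMemcpyDeviceToHost) != cudaSuccess) return 1;
  cudaFree(d_out);
  printf("%d %d\n", h_out[0], h_out[1]);
  return 0;
}
```

If the barrier has to stay inside foo(), CUDA's `__noinline__` qualifier on foo() asks the compiler not to inline it; whether that is required for correctness is the open question in this thread.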
