[llvm-dev] [CUDA/NVPTX] is inlining __syncthreads allowed?

Bjarke Roune via llvm-dev llvm-dev at lists.llvm.org
Wed Sep 9 14:46:46 PDT 2015


Hi Justin, Yuan and Vinod,

It seems that what __syncthreads() requires in CUDA C++ (as opposed to PTX)
is to be executed uniformly across all threads in the block, not just the
warp. If so, it would be helpful to have a precise definition of when a
statement is considered to be executed uniformly in CUDA C++. Is there a
precise statement somewhere from NVIDIA about this? I haven't found one so
far.

In particular, it's not clear to me at what point diverging threads are
considered to have joined up again in CUDA C++. My best guess is that this
is at the immediate post-dominator of the statement that starts the
divergence, with the caveat that there is an implicit shared CFG node
following each return statement in a function.
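
To make that concrete, here is a small sketch (my own illustration, not
taken from any NVIDIA document) of what that guess would imply:

__global__ void kernel(int *out) {
  if (threadIdx.x % 2 == 0) {   // threads diverge at this branch
    out[threadIdx.x] = 0;
  } else {
    out[threadIdx.x] = 1;
  }
  // This point is the immediate post-dominator of the "if" above, so
  // under the guess the threads are considered re-converged here, and a
  // __syncthreads() placed here would be executed uniformly.
  __syncthreads();
}

Under that reading, a __syncthreads() inside either arm of the "if" would
not be uniform, but one at the join point would be.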

Bjarke

On Fri, Aug 21, 2015 at 4:51 PM, Jingyue Wu <jingyue at google.com> wrote:

> Looking at this section
> <http://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions-bar> in
> the PTX ISA, there's a sentence saying:
>
> > In conditionally executed code, a bar instruction should only be used if
> > it is known that all threads evaluate the
> > condition identically (the warp does not diverge).
>
> Does that mean __syncthreads should only be called uniformly, i.e., when
> no threads in the warp diverge? If so, my sync.cu example is undefined.
> The reason is that, although every thread reaches a __syncthreads(), they
> reach it divergently:
> 1. threads diverge at the "if" statement
> 2. the warp runs __syncthreads() with half of the threads enabled
> 3. the warp jumps back to the "else" branch
> 4. the warp runs __syncthreads() with the other half of the threads enabled
>
> If my understanding is correct (__syncthreads() can only be called when
> the warp doesn't diverge), unrolling a loop that contains a __syncthreads()
> and inlining a function that may call __syncthreads() are fine. Am I right?
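>
> If that is the rule, the safe pattern would be to hoist the barrier to
> the point where the branches join, e.g. (just a sketch of the intended
> contrast, not taken from sync.cu):
>
> if (threadIdx.x % 2 == 0) {
>   ...                // no barrier while the warp is diverged
> } else {
>   ...
> }
> __syncthreads();     // every thread reaches this one barrier together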
>
> Jingyue
>
>
>
> On Fri, Aug 21, 2015 at 3:11 PM, Jingyue Wu <jingyue at google.com> wrote:
>
>> I'm using 7.0. I am attaching the reduced example.
>>
>> nvcc sync.cu -arch=sm_35 -ptx
>>
>> gives
>>
>>         // .globl       _Z3foov
>> .visible .entry _Z3foov(
>>
>> )
>> {
>>         .reg .pred      %p<2>;
>>         .reg .s32       %r<3>;
>>
>>
>>         mov.u32         %r1, %tid.x;
>>         and.b32         %r2, %r1, 1;
>>         setp.eq.b32     %p1, %r2, 1;
>>         @!%p1 bra       BB7_2;
>>         bra.uni         BB7_1;
>>
>> BB7_1:
>>         bar.sync        0;
>>         bra.uni         BB7_3;
>>
>> BB7_2:
>>         bar.sync        0;
>>
>> BB7_3:
>>         ret;
>> }
>>
>> As you can see, bar.sync is duplicated.
>>
>> On Fri, Aug 21, 2015 at 1:56 PM, Justin Holewinski <
>> jholewinski at nvidia.com> wrote:
>>
>>> That’s an interesting case.  AFAIK, inlining should be restricted here
>>> for the reason you mention.  Inlining should only be valid if it doesn’t
>>> duplicate the barrier.  Which nvcc version shows this behavior?
>>>
>>> Adding Vinod and Yuan for comment.
>>>
>>> > On Aug 21, 2015, at 4:24 PM, Jingyue Wu <jingyue at google.com> wrote:
>>> >
>>> > Hi Justin,
>>> >
>>> > Is a compiler allowed to inline a function that calls __syncthreads? I
>>> > saw that nvcc does this, but I'm not sure it's valid. For example,
>>> >
>>> > __device__ void foo() {
>>> >   __syncthreads();
>>> > }
>>> >
>>> > if (threadIdx.x % 2 == 0) {
>>> >   ...
>>> >   foo();
>>> > } else {
>>> >   ...
>>> >   foo();
>>> > }
>>> >
>>> > Before inlining, all threads meet at one __syncthreads(). After
>>> > inlining
>>> >
>>> > if (threadIdx.x % 2 == 0) {
>>> >   ...
>>> >   __syncthreads();
>>> > } else {
>>> >   ...
>>> >   __syncthreads();
>>> > }
>>> >
>>> > The __syncthreads call is duplicated, and it's no longer guaranteed
>>> > that all threads meet at the same __syncthreads().
>>> >
>>> > Any thoughts?
>>> >
>>> > Jingyue
>>>
>>
>>
>