[Openmp-dev] nested parallelism in libomptarget-nvptx

Mon Sep 10 07:28:49 PDT 2018

Hi Doru,

On 2018-09-10 15:32, Gheorghe-Teod Bercea wrote:
> Hi Jonas,
> 
> The experiments in the paper that are under the nested parallelism
> section really do use the nested parallelism scheme. "teams
> distribute" activated all the threads in the team.

I disagree: Only the team master executes the loop body of a "teams 
distribute" region. CUDA activates all (CUDA) threads at kernel launch, 
but that's really not the point.
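
To make that checkable, here is a minimal sketch. It is not taken from the 
paper or from libomptarget; the name count_executions and the loop bounds 
are made up for illustration. It counts how often each loop body runs: if 
every thread of a team executed the "teams distribute" body, the outer 
count would exceed the number of outer iterations. I'm assuming a compiler 
with OpenMP 4.5 target support; without an offloading device the region 
should fall back to the host.

```c
#define N_OUTER 4   /* iterations of the "teams distribute" loop (arbitrary) */
#define N_INNER 8   /* iterations of the inner "parallel for" loop (arbitrary) */

/* Hypothetical helper: counts how many times each loop body is executed. */
static void count_executions(int *outer_runs, int *inner_runs)
{
    int outer = 0, inner = 0;

    #pragma omp target teams distribute map(tofrom: outer, inner) \
            reduction(+: outer, inner)
    for (int i = 0; i < N_OUTER; i++) {
        /* Executed once per iteration, by the team master only. */
        outer += 1;

        #pragma omp parallel for reduction(+: inner)
        for (int j = 0; j < N_INNER; j++) {
            /* Iterations are shared among the threads of the team. */
            inner += 1;
        }
    }

    *outer_runs = outer;
    *inner_runs = inner;
}
```

Calling count_executions(&o, &i) should leave o equal to N_OUTER and i 
equal to N_OUTER * N_INNER, independent of the number of threads per team.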

> Nested parallelism is activated every time you have an outer region
> with all threads active, calling an inner region that needs to have
> all threads active. No matter which directives you assign the second
> level parallelism to, the scheme for it will use the warp-wise
> execution.
> 
> If you have:
> 
> #target teams distribute
> {
>     // all threads active

This looks like an error: it is the same directive as in the example 
below, yet it is described as behaving differently (all threads active 
here, one thread per team there)?

>     # parallel for
>     {
>         // all threads active - this uses nested parallelism since it
>         // was called from a region where all threads were active
>     }
> }
> 
> # target teams distribute
> {
>      // one thread per team active
>      # parallel for
>      {
>         // all threads active
>         # parallel for
>         {
>             // all threads active - this uses nested parallelism since
>             // it was called from a region where all threads are active
>         }
>      }
> }

That's the pattern I'm looking for. Can you link me to a benchmark that 
uses this scheme?
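
For reference, this is the shape I mean, written as compilable C. It is 
only a sketch with made-up names and sizes (count_innermost, N_TEAMS_ITERS, 
N_PAR1, N_PAR2), not an existing benchmark. It assumes OpenMP 4.5 target 
support; depending on the runtime, the inner parallel region may be 
serialized, which does not change the count.

```c
#define N_TEAMS_ITERS 2   /* "teams distribute" iterations (arbitrary) */
#define N_PAR1        3   /* first-level "parallel for" iterations (arbitrary) */
#define N_PAR2        5   /* second-level "parallel for" iterations (arbitrary) */

/* Hypothetical helper: returns how often the innermost body is executed. */
static long count_innermost(void)
{
    long total = 0;

    #pragma omp target teams distribute map(tofrom: total) reduction(+: total)
    for (int i = 0; i < N_TEAMS_ITERS; i++) {
        /* One thread per team active here. */
        #pragma omp parallel for reduction(+: total)
        for (int j = 0; j < N_PAR1; j++) {
            /* First level of parallelism: all threads of the team. */
            #pragma omp parallel for reduction(+: total)
            for (int k = 0; k < N_PAR2; k++) {
                /* Second level: a parallel region nested in a parallel region. */
                total += 1;
            }
        }
    }

    return total;
}
```

The result should be N_TEAMS_ITERS * N_PAR1 * N_PAR2 whether or not the 
second level actually runs in parallel.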

Jonas