[Openmp-dev] nested parallelism in libomptarget-nvptx
Jonas Hahnfeld via Openmp-dev
openmp-dev at lists.llvm.org
Mon Sep 10 07:28:49 PDT 2018
Hi Doru,
On 2018-09-10 15:32, Gheorghe-Teod Bercea wrote:
> Hi Jonas,
>
> The experiments in the paper that are under the nested parallelism
> section really do use the nested parallelism scheme. "teams
> distribute" activated all the threads in the team.
I disagree: Only the team master executes the loop body of a "teams
distribute" region. CUDA activates all (CUDA) threads at kernel launch,
but that's really not the point.
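To make that concrete, here is a minimal sketch in C (my own illustration, not code from the paper; the name max_threads_in_distribute is hypothetical). It records the largest value omp_get_num_threads() reports inside the loop body of a "target teams distribute" region. As far as I read the specification, that value should be 1, because the body runs on each team's initial thread outside of any parallel region. When compiled without OpenMP support the pragma is ignored and the helper trivially returns 1.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: returns the largest thread count observed inside the
 * loop body of a "teams distribute" region over n iterations. */
int max_threads_in_distribute(int n)
{
    assert(n >= 0);
    int max_threads = 1;

    /* The loop iterations are distributed across the teams; no "parallel"
     * construct has been encountered yet at this point. */
    #pragma omp target teams distribute reduction(max: max_threads) map(tofrom: max_threads)
    for (int i = 0; i < n; ++i) {
#ifdef _OPENMP
        int t = omp_get_num_threads();  /* team size seen by the executing thread */
#else
        int t = 1;                      /* serial build: exactly one thread */
#endif
        if (t > max_threads)
            max_threads = t;
    }
    return max_threads;
}
```

How many CUDA threads the runtime has launched underneath is a separate question that this helper does not measure.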
> Nested parallelism is activated every time you have an outer region
> with all threads active, calling an inner region that needs to have
> all threads active. No matter which directives you assign the second
> level parallelism to, the scheme for it will use the warp-wise
> execution.
>
> If you have:
>
> # target teams distribute
> {
> // all threads active
This looks like an error? It's the same directive as in the second
example below, yet the comment there says only one thread per team is
active.
> # parallel for
> {
> // all threads active - this uses nested parallelism since it
> was called from a region where all threads were active
> }
> }
>
> # target teams distribute
> {
> // one thread per team active
> # parallel for
> {
> // all threads active
> # parallel for
> {
> // all threads active - this uses nested parallelism since
> it was called from a region where all threads are active
> }
> }
> }
That's the pattern I'm looking for. Can you link me to a benchmark that
uses this scheme?
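For reference, the pattern I mean looks roughly like this in C. This is only a sketch: row_sums, its parameters, and the data layout are made up for illustration and are not taken from any existing benchmark. The input is assumed to be a row-major array of blocks x rows x cols integers, and the function writes one sum per row.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: out[b * rows + r] receives the sum of the cols
 * entries of row r in block b. */
void row_sums(const int *in, long *out, int blocks, int rows, int cols)
{
    assert(blocks >= 0 && rows >= 0 && cols >= 0);
    size_t n_out = (size_t)blocks * (size_t)rows;
    size_t n_in = n_out * (size_t)cols;

    #pragma omp target teams distribute map(to: in[0:n_in]) map(from: out[0:n_out])
    for (int b = 0; b < blocks; ++b) {
        /* one thread per team active */
        #pragma omp parallel for
        for (int r = 0; r < rows; ++r) {
            /* all threads of the team active */
            long sum = 0;
            /* second-level parallel region, encountered while all threads
             * of the team are already active */
            #pragma omp parallel for reduction(+: sum)
            for (int c = 0; c < cols; ++c)
                sum += in[((size_t)b * rows + r) * cols + c];
            out[(size_t)b * rows + r] = sum;
        }
    }
}
```

Called with 2 blocks of 2 rows and 3 columns holding the values 1 to 12, this fills out with 6, 15, 24, 33. Whether the inner region actually runs in parallel depends on the runtime; the result is the same either way.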
Jonas