[llvm-bugs] [Bug 48330] New: [OpenMP] Teams distribute parallel for nested inside parallel for

via llvm-bugs llvm-bugs at lists.llvm.org
Sun Nov 29 09:23:05 PST 2020


https://bugs.llvm.org/show_bug.cgi?id=48330

            Bug ID: 48330
           Summary: [OpenMP] Teams distribute parallel for nested inside
                    parallel for
           Product: OpenMP
           Version: unspecified
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Runtime Library
          Assignee: unassignedbugs at nondot.org
          Reporter: rofirrim at gmail.com
                CC: llvm-bugs at lists.llvm.org

Created attachment 24219
  --> https://bugs.llvm.org/attachment.cgi?id=24219&action=edit
Testcase

Hi all,

the testcase (based on offloading/parallel_offloading_map.cpp) below does not
seem to distribute all the iterations of the innermost loop. I would expect all
the elements of array `tmp` to be updated; however, we seem to execute only a
single thread's chunk of the iterations, because the runtime believes it has
more threads than it actually has.

I can reproduce this with libomptarget.rtl.x86_64.so (running on the host).

Running the testcase with OMP_NUM_THREADS=1 works correctly.

Running with OMP_NUM_THREADS=2, I obtain this (for each iteration of the
outer loop):

[TARGET][0] || tmp[0] <- 1
[TARGET][0] || tmp[1] <- 1
[TARGET][0] || tmp[2] <- 1
[TARGET][0] || tmp[3] <- 1
Error at tmp[4]
Error at tmp[5]
Error at tmp[6]
Error at tmp[7]

Running with OMP_NUM_THREADS=4, I obtain this:

[TARGET][0] || tmp[0] <- 1
[TARGET][0] || tmp[1] <- 1
Error at tmp[2]
Error at tmp[3]
Error at tmp[4]
Error at tmp[5]
Error at tmp[6]
Error at tmp[7]

And so on.

My expectation is that all M iterations of the inner loop should be executed.

I tried to debug this a bit, and I'm not sure I understand all of it. So far I
see that __kmpc_fork_teams invokes __kmp_fork_call, which decides that
nthreads is going to be 1:

if (parent_team->t.t_active_level >=
    master_th->th.th_current_task->td_icvs.max_active_levels) {
  nthreads = 1;
} else {
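
As a side note (my own addition, not part of the attached testcase): the two
ICVs consulted in this check can be observed from user code via the standard
omp_get_max_active_levels() and omp_get_active_level() routines. A minimal
host-only sketch, assuming plain libomp:

// -- probe.cpp (sketch; queries the ICVs used by the check above)
#include <omp.h>
#include <cstdio>

int main() {
  // max-active-levels ICV: a fork at or beyond this active level is
  // serialized (nthreads = 1), matching the check in __kmp_fork_call.
  printf("max_active_levels = %d\n", omp_get_max_active_levels());
#pragma omp parallel num_threads(2)
  {
#pragma omp single
    printf("active_level = %d\n", omp_get_active_level());
  }
  return 0;
}
// -- end of probe.cpp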

Then __kmp_invoke_teams_master → __kmp_teams_master → __kmp_fork_call, which
again sets nthreads to 1 (for the same reason). Now we go through the
serialized-parallel code path of __kmp_fork_call, and this time we eventually
invoke the microtask. The microtask eventually invokes __kmpc_for_static_init_4
with `*plower == 0` and `*pupper == 7`, which seems correct. However, when
computing the chunk, we are confused by the fact that team->t.t_nproc is not 1.

We seem to be looking at the parent team because this is a distribute schedule:

if (schedtype > kmp_ord_upper) {
  // we are in DISTRIBUTE construct
  schedtype += kmp_sch_static -
               kmp_distribute_static; // AC: convert to usual schedule type
  tid = th->th.th_team->t.t_master_tid;
  team = th->th.th_team->t.t_parent;  // this team was the one available
} else {

So we compute a smaller chunk even though, apparently, we will execute with a
single thread. I am not sure at what point the number of threads went wrong.
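
For reference, here is my mental model of the static split (a simplified
sketch of what __kmpc_for_static_init_4 does for a static schedule; the real
runtime balances the remainder slightly differently). With trip count 8 and
nproc wrongly taken as 4 from the parent team, thread 0 keeps only iterations
0..1, which matches the OMP_NUM_THREADS=4 output above (and nproc = 2 gives
0..3, matching the OMP_NUM_THREADS=2 run):

// -- static_chunk.cpp (simplified sketch, not the runtime's code)
#include <cstdio>

// Shrink [*plower, *pupper] to the contiguous chunk owned by `tid`
// out of `nproc` threads (ceiling-divided chunks).
static void static_init(int tid, int nproc, int *plower, int *pupper) {
  int trip = *pupper - *plower + 1;
  int chunk = (trip + nproc - 1) / nproc; // ceiling division
  int lo = *plower + tid * chunk;
  int hi = lo + chunk - 1;
  if (hi > *pupper)
    hi = *pupper;
  *plower = lo;
  *pupper = hi;
}

int main() {
  int lo = 0, hi = 7;
  // nproc comes from the parent team even though only one thread
  // actually executes the microtask.
  static_init(/*tid=*/0, /*nproc=*/4, &lo, &hi);
  printf("thread 0 executes [%d, %d]\n", lo, hi); // prints [0, 1]
  return 0;
}
// -- end of static_chunk.cpp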

I'm using the following command line against a standalone build of openmp
(based on the lit test mentioned above):

clang++ -O0 -g -fno-experimental-isel -fopenmp -pthread  \
   -I <top-llvm-srcdir>/openmp/libomptarget/test \
   -I <openmp-builddir>/libomptarget/../runtime/src \
   -L <openmp-builddir>/libomptarget \
   -L <openmp-builddir>/libomptarget/../runtime/src \
   -fopenmp-targets=x86_64-pc-linux-gnu t.cpp -o t \
   -Wl,-rpath,<openmp-builddir>/libomptarget/../runtime/src 

OMP_NUM_THREADS=2 ./t

Kind regards,

// -- t.cpp
#include "omp.h"
#include <cassert>
#include <cstdio>

int main(int argc, char *argv[]) {
  constexpr const int N = 4, M = 8;

  bool error = false;

#pragma omp parallel for
  for (int i = 0; i < N; ++i) { // outer-loop
    int tmp[M] = {0};
    // This optional critical helps debugging; you can remove it.
    #pragma omp critical
    {
#pragma omp target teams distribute parallel for map(tofrom : tmp)
      for (int j = 0; j < M; ++j) {
        printf("[TARGET][%d] || tmp[%d] <- 1\n", omp_get_thread_num(), j);
        tmp[j] += 1;
      }

      // Check
      for (int j = 0; j < M; ++j) {
        if (tmp[j] != 1) {
          printf("Error at tmp[%d]\n", j);
          error = true;
        }
      }
    } // critical
  }

  printf("%s\n", error ? "ERROR" : "PASS");

  return 0;
}
// -- end of t.cpp
