[Openmp-commits] [PATCH] D77609: [OpenMP] Added the support for hidden helper task in RTL

Jon Chesterfield via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Fri Jan 22 08:33:24 PST 2021


JonChesterfield added a comment.

Here is what I know about the possible race.
When this patch is applied to the ROCm stack (by git's automerge, I think), the test at
https://github.com/ROCm-Developer-Tools/aomp/blob/master/test/smoke/devices/devices.c
fails unpredictably.

I've reproduced the test here as it's fairly short, but it uses some functions on the device that aren't implemented in trunk.

  #include <stdio.h>
  #include <omp.h>
  
  int main() {
    int num_devs = omp_get_num_devices();
    for (int device_num = 0; device_num < num_devs ; device_num++) {
  #pragma omp target device(device_num) nowait
  #pragma omp teams num_teams(2) thread_limit(4)
  #pragma omp parallel num_threads(2)
      {
        // need to pass the total device number to all devices, per module load
        int num_threads = omp_get_num_threads();
        int num_teams   = omp_get_num_teams();
        int num_devices = omp_get_num_devices(); // not legal in 4.5
  
        // need to pass the device id to the device starting the kernel
        int thread_id   = omp_get_thread_num();
        int team_id     = omp_get_team_num();
        int device_id   = omp_get_device_num();  // no API in omp 4.5
  
        // assume we have homogeneous devices
        int total_threads = num_devices * num_teams * num_threads;
        int gthread_id    = (device_id * num_teams * num_threads) + (team_id * num_threads) + thread_id;
  
        // print out id
        printf("Hello OpenMP 5 from \n");
        printf(" Device num  %d of %d devices\n", device_id, num_devices);
        printf(" Team num    %d of %d teams  \n", team_id,   num_teams);
        printf(" Thread num  %d of %d threads\n", thread_id, num_threads);
        printf(" Global thread %d of %d total threads\n", gthread_id, total_threads);
      };
    };
  #pragma omp taskwait
    printf("The host device num is %d\n", omp_get_device_num());
    printf("The initial device num is %d\n", omp_get_initial_device());
    printf("The number of devices are %d\n", num_devs);
  }

My guess is that the failure comes from an interaction between the existing threading model, the above patch, and ROCm's implementation of printf.
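
One way to probe that guess would be a variant of the test that keeps the same nowait target/teams/parallel nest but takes printf out of the device region. The following is only a sketch under assumptions, not part of the patch or of the aomp test suite: the helper name run_without_device_printf and the MAX_DEVICES cap are made up for illustration, and it deliberately avoids omp_get_device_num() since that is not in trunk. Each device thread bumps a mapped counter, and the host prints after the taskwait.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Fallback so the sketch still builds without -fopenmp: report no devices. */
static int omp_get_num_devices(void) { return 0; }
#endif

#define MAX_DEVICES 16 /* assumption: arbitrary cap chosen for this sketch */

/* Hypothetical helper: launches the same nowait target/teams/parallel nest as
   the test, but each device thread increments a mapped counter instead of
   calling printf. Returns the number of devices that reported no threads. */
int run_without_device_printf(void) {
  int counts[MAX_DEVICES] = {0};
  int num_devs = omp_get_num_devices();
  if (num_devs > MAX_DEVICES)
    num_devs = MAX_DEVICES;

  for (int d = 0; d < num_devs; d++) {
#pragma omp target device(d) nowait map(tofrom: counts[d:1])
#pragma omp teams num_teams(2) thread_limit(4)
#pragma omp parallel num_threads(2)
    {
      /* Atomic so concurrent threads and teams do not race on the counter. */
#pragma omp atomic update
      counts[d] += 1;
    }
  }
  /* Wait for all deferred target tasks before reading the counters. */
#pragma omp taskwait

  int silent = 0;
  for (int d = 0; d < num_devs; d++) {
    printf("device %d: %d threads reported\n", d, counts[d]);
    if (counts[d] == 0)
      silent++;
  }
  return silent;
}
```

If a variant like this runs reliably where the original does not, that would point towards the device printf path. If it fails in the same way, the nowait target tasks themselves look like the more likely culprit. Neither outcome is conclusive on its own.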

I'm going to see if I can run it on nvptx, but no promises: my CUDA machine is not in a good way.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77609/new/

https://reviews.llvm.org/D77609


