[Openmp-commits] [PATCH] D77609: [OpenMP] Added the support for hidden helper task in RTL
Jon Chesterfield via Phabricator via Openmp-commits
openmp-commits at lists.llvm.org
Fri Jan 22 08:33:24 PST 2021
JonChesterfield added a comment.
The information I've got on the possible race is:
When this patch is applied (by git's automerge, I think) to the rocm stack, a test located at:
https://github.com/ROCm-Developer-Tools/aomp/blob/master/test/smoke/devices/devices.c
fails in unpredictable fashion.
I've reproduced the test here as it's fairly short, but it uses some functions on the device that aren't implemented in trunk.
#include <stdio.h>
#include <omp.h>

int main() {
  int num_devs = omp_get_num_devices();
  for (int device_num = 0; device_num < num_devs ; device_num++) {
#pragma omp target device(device_num) nowait
#pragma omp teams num_teams(2) thread_limit(4)
#pragma omp parallel num_threads(2)
    {
      // need to pass the total device number to all devices, per module load
      int num_threads = omp_get_num_threads();
      int num_teams = omp_get_num_teams();
      int num_devices = omp_get_num_devices(); // not legal in 4.5

      // need to pass the device id to the device starting the kernel
      int thread_id = omp_get_thread_num();
      int team_id = omp_get_team_num();
      int device_id = omp_get_device_num(); // no API in omp 4.5

      // assume we have homogeneous devices
      int total_threads = num_devices * num_teams * num_threads;
      int gthread_id = (device_id * num_teams * num_threads) + (team_id * num_threads) + thread_id;

      // print out id
      printf("Hello OpenMP 5 from \n");
      printf(" Device num %d of %d devices\n", device_id, num_devices);
      printf(" Team num %d of %d teams \n", team_id, num_teams);
      printf(" Thread num %d of %d threads\n", thread_id, num_threads);
      printf(" Global thread %d of %d total threads\n", gthread_id, total_threads);
    };
  };
#pragma omp taskwait
  printf("The host device num is %d\n", omp_get_device_num());
  printf("The initial device num is %d\n", omp_get_initial_device());
  printf("The number of devices are %d\n", num_devs);
}
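For reference, the gthread_id arithmetic in the test can be checked on the host alone, with no offloading involved. The sketch below is not part of the aomp test: the helper names global_thread_id and ids_are_unique are hypothetical, and it keeps the test's own assumption that every device runs the same number of teams and threads.

```c
#include <assert.h>

/* Flatten (device, team, thread) into one index, using the same
   expression as gthread_id in the test. Hypothetical helper. */
static int global_thread_id(int device_id, int team_id, int thread_id,
                            int num_teams, int num_threads) {
  assert(team_id >= 0 && team_id < num_teams);
  assert(thread_id >= 0 && thread_id < num_threads);
  return (device_id * num_teams * num_threads) + (team_id * num_threads) +
         thread_id;
}

/* Returns 1 if every (device, team, thread) triple maps to a distinct
   index in [0, total_threads), and 0 otherwise. The total is capped so
   the bookkeeping array can live on the stack. Hypothetical helper. */
static int ids_are_unique(int num_devices, int num_teams, int num_threads) {
  enum { MAX_TOTAL = 256 };
  int seen[MAX_TOTAL] = {0};
  int total = num_devices * num_teams * num_threads;
  if (total <= 0 || total > MAX_TOTAL)
    return 0;
  for (int d = 0; d < num_devices; d++) {
    for (int t = 0; t < num_teams; t++) {
      for (int th = 0; th < num_threads; th++) {
        int id = global_thread_id(d, t, th, num_teams, num_threads);
        if (id < 0 || id >= total || seen[id])
          return 0; /* out of range or duplicate index */
        seen[id] = 1;
      }
    }
  }
  return 1;
}
```

With 2 devices, 2 teams and 2 threads per team, the last thread of the last team on device 1 gets index 7 out of 8 total threads, so the "Global thread" line of a correct run should never report an index outside that range.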
I would guess that there's an interaction between the existing threading model, the above patch, and rocm's implementation of printf which yields disaster.
I'm going to see if I can run it on nvptx, but no promises - my cuda machine is not in a good way.
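One way to probe the guess about printf would be a variant of the reproducer that keeps the same nowait target / teams / parallel shape but records ids into a mapped array instead of printing from the device. The following is only a sketch under that assumption and has not been run on the rocm stack; the function name run_without_device_printf and the slot layout are hypothetical, and the serial stand-ins exist only so it still builds when OpenMP is disabled.

```c
#include <assert.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins so the sketch still builds without -fopenmp. */
static int omp_get_team_num(void) { return 0; }
static int omp_get_thread_num(void) { return 0; }
#endif

/* Same launch shape as the test: 2 teams, thread_limit 4. */
enum { TEAMS = 2, THREAD_LIMIT = 4, SLOTS = TEAMS * THREAD_LIMIT };

/* Launch on one device, but write ids into a mapped array rather than
   calling printf on the device. Returns how many slots were written.
   Hypothetical helper, not part of the aomp test. */
static int run_without_device_printf(int device_num) {
  assert(device_num >= 0);

  int ids[SLOTS];
  for (int i = 0; i < SLOTS; i++)
    ids[i] = -1; /* -1 marks a slot no thread touched */

#pragma omp target teams device(device_num) num_teams(TEAMS) \
    thread_limit(THREAD_LIMIT) map(tofrom: ids[0:SLOTS]) nowait
#pragma omp parallel num_threads(2)
  {
    /* A fixed stride per team keeps every thread on its own slot. */
    int slot = omp_get_team_num() * THREAD_LIMIT + omp_get_thread_num();
    if (slot >= 0 && slot < SLOTS)
      ids[slot] = slot;
  }
  /* ids lives on this stack frame, so wait for the deferred target task
     before reading it or returning. */
#pragma omp taskwait

  int written = 0;
  for (int i = 0; i < SLOTS; i++)
    if (ids[i] == i)
      written++;
  return written;
}
```

Calling this once per device from the host and printing the returned count there keeps all printf traffic on the host. If the unpredictable failure disappears with this variant, that would point towards the device-side printf; if it persists, the threading interaction alone is the more likely culprit.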
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D77609/new/
https://reviews.llvm.org/D77609