[Openmp-dev] GPU Target Offloading - Cannot Find GPU

Qiongsi Wu via Openmp-dev openmp-dev at lists.llvm.org
Wed Jan 30 06:21:38 PST 2019


Thanks Jonas!

That helped a lot, and the memory consumption reported by nvidia-smi is now much closer to what I expected!

The blog post is great! Maybe it should be a part of the official clang OpenMP documentation?

Sincerely,
Qiongsi



On Jan 30, 2019, at 2:18 AM, Jonas Hahnfeld <hahnjo at hahnjo.de> wrote:

Hi,

On 2019-01-29 17:51, Qiongsi Wu via Openmp-dev wrote:
Hi everyone!
I am following Jonas Hahnfeld’s blog post
(https://www.hahnjo.de/blog/2018/10/08/clang-7.0-openmp-offloading-nvidia.html)
to build clang to test some simple target offloading workloads.
The complete code can be found at the end.
I tried clang 7.0.1 and LLVM trunk (9.0.0). omp_get_num_devices()
returned 0 for both compilers, although the server I am running this on
has a Quadro P4000 in it. In an attempt to fix the issue, I added
compute capability 61 and set it as the default. This did not help.
Additionally, the binary built by trunk cannot even offload to CPU
targets, reporting the error
"Libomptarget fatal error 1: default offloading policy must switched
to mandatory or disabled"
May I get some help from the community on what to look into to get
this microbenchmark to run on the GPU?
Thanks!
######################################################################################################
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <omp.h>
#define VALUE_TYPE double
#define ARR_SIZE  128 * 1024 * 1024
struct timespec timespec_diff(struct timespec t1, struct timespec t2)
{
   struct timespec diff;
   diff.tv_nsec = t1.tv_nsec - t2.tv_nsec;
   diff.tv_sec = t1.tv_sec - t2.tv_sec;
   if (diff.tv_nsec < 0) {
       diff.tv_nsec += 1000000000;
       diff.tv_sec -= 1;
   }
   return diff;
}
double timespec_to_sec(struct timespec t) {
   double sec = (double)t.tv_sec;
   double nsec = (double)t.tv_nsec;
   return sec + nsec * 1E-9;
}
void gen_rand_number(VALUE_TYPE *arr, int size) {
   for (int i = 0; i < size; i++) {
       arr[i] = (VALUE_TYPE)rand();
   }
}
void cpu_add(VALUE_TYPE *out, VALUE_TYPE *in_1, VALUE_TYPE *in_2, int size) {
   for (int i = 0; i < size; i++) {
       out[i] = in_1[i] + in_2[i];
   }
}
void omp_cpu_add(VALUE_TYPE *out, VALUE_TYPE *in_1, VALUE_TYPE *in_2, int size) {
#pragma omp parallel for
   for (int i = 0; i < size; i++) {
       out[i] = in_1[i] + in_2[i];
   }
}
void omp_gpu_add(VALUE_TYPE *out, VALUE_TYPE *in_1, VALUE_TYPE *in_2, int size) {
#pragma omp target map(to:in_1, in_2) map(from:out)

I think the compiler will have trouble mapping and transferring the arrays correctly here: in_1, in_2 and out are plain pointers, so these map clauses carry no length information. Can you try specifying array sections?
Should be as easy as "map(to:in_1[0:size], in_2[0:size]) map(from:out[0:size])"
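
A minimal sketch of that change applied to omp_gpu_add from the original post, assuming VALUE_TYPE is double as in the posted code. The function name omp_gpu_add_sections and the const qualifiers are illustrative additions, not part of the original program. If this is built without OpenMP support, the pragmas are ignored and the loop simply runs on the host.

```c
#include <stddef.h>

/* Element-wise vector add intended for target offloading.
 * The array sections in_1[0:size], in_2[0:size] and out[0:size] tell the
 * runtime how many elements to copy to and from the device. Mapping the
 * bare pointers gives it no length to work with.
 * The name omp_gpu_add_sections is hypothetical. */
void omp_gpu_add_sections(double *out, const double *in_1,
                          const double *in_2, int size) {
#pragma omp target map(to: in_1[0:size], in_2[0:size]) map(from: out[0:size])
#pragma omp teams
#pragma omp distribute parallel for
    for (int i = 0; i < size; i++) {
        out[i] = in_1[i] + in_2[i];
    }
}
```

In the posted main, this would be called the same way as before, e.g. omp_gpu_add_sections(c_3, a, b, ARR_SIZE).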

Hope this helps and fixes the problem :)
Jonas

#pragma omp teams
#pragma omp distribute parallel for
   for (int i = 0; i < size; i++) {
       out[i] = in_1[i] + in_2[i];
   }
}
int main() {
   VALUE_TYPE *a = malloc(sizeof(VALUE_TYPE) * ARR_SIZE);
   VALUE_TYPE *b = malloc(sizeof(VALUE_TYPE) * ARR_SIZE);
   VALUE_TYPE *c_1 = malloc(sizeof(VALUE_TYPE) * ARR_SIZE);
   VALUE_TYPE *c_2 = malloc(sizeof(VALUE_TYPE) * ARR_SIZE);
   VALUE_TYPE *c_3 = malloc(sizeof(VALUE_TYPE) * ARR_SIZE);
   struct timespec start, end, diff;
   gen_rand_number(a, ARR_SIZE);
   gen_rand_number(b, ARR_SIZE);
   printf("number of omp devices %d\n", omp_get_num_devices());
   printf("number of omp teams %d\n", omp_get_num_teams());
   clock_gettime(CLOCK_MONOTONIC, &start);
   cpu_add(c_1, a, b, ARR_SIZE);
   clock_gettime(CLOCK_MONOTONIC, &end);
   printf("CPU took %f seconds. \n", timespec_to_sec(timespec_diff(end, start)));
   clock_gettime(CLOCK_MONOTONIC, &start);
   omp_cpu_add(c_2, a, b, ARR_SIZE);
   clock_gettime(CLOCK_MONOTONIC, &end);
   printf("CPU omp took %f seconds. \n", timespec_to_sec(timespec_diff(end, start)));
   clock_gettime(CLOCK_MONOTONIC, &start);
   for (int i = 0; i < 100; i++) {
       omp_gpu_add(c_3, a, b, ARR_SIZE);
   }
   clock_gettime(CLOCK_MONOTONIC, &end);
   printf("GPU omp offload took %f seconds. \n", timespec_to_sec(timespec_diff(end, start)) / 100.0);
}
######################################################################################################
Sincerely,
Qiongsi
