[Openmp-dev] GPU Target Offloading - Cannot Find GPU

Tue Jan 29 08:51:33 PST 2019

Hi everyone!

I am following Jonas Hahnfeld’s blog post (https://www.hahnjo.de/blog/2018/10/08/clang-7.0-openmp-offloading-nvidia.html) to build Clang and test some simple target offloading workloads.

The complete code can be found at the end.

I tried clang 7.0.1 and LLVM trunk (9.0.0). With both compilers, omp_get_num_devices() returns 0, although the server I am running this on has a Quadro P4000. To try to fix this, I added compute capability 61 (sm_61) to the build and made it the default architecture. This did not help.

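A quick way to tell "no device detected" apart from "the kernel silently ran on the host" is to check both the device count and where a trivial target region actually executes. The sketch below uses only the standard OpenMP routines omp_get_num_devices() and omp_is_initial_device(). The helper names are hypothetical. The #ifdef _OPENMP guards exist only so the sketch also compiles without -fopenmp, in which case it reports 0 devices and host execution.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: number of non-host devices visible to the
   OpenMP runtime (0 when built without OpenMP support). */
int offload_device_count(void) {
#ifdef _OPENMP
    return omp_get_num_devices();
#else
    return 0;
#endif
}

/* Hypothetical helper: runs a trivial target region and returns 1 if it
   executed on the host (e.g. a silent fallback), 0 if it ran on a device. */
int target_region_ran_on_host(void) {
    int on_host = 1;
#ifdef _OPENMP
    /* Scalars are firstprivate by default in a target region, so map it
       tofrom explicitly to get the device's answer back. */
#pragma omp target map(tofrom: on_host)
    {
        on_host = omp_is_initial_device();
    }
#endif
    return on_host;
}

/* Prints both facts; intended to be called before timing any kernel. */
void report_offload_status(void) {
    printf("devices: %d, target region ran on %s\n",
           offload_device_count(),
           target_region_ran_on_host() ? "host" : "device");
}
```

Calling report_offload_status() at the top of main would show, for example, whether a nonzero device count still ends up executing on the host.
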
Additionally, the binary built with trunk cannot even offload to CPU targets. It fails with the error:

"Libomptarget fatal error 1: default offloading policy must switched to mandatory or disabled"

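For context, the policy named in this message corresponds to the one OpenMP 5.0 exposes through the OMP_TARGET_OFFLOAD environment variable. Its values are MANDATORY, DISABLED and DEFAULT, and they are case-insensitive. Whether the trunk libomptarget used here reads that variable is an assumption, not something verified here. The sketch below prints the policy a run was launched with, so runs such as OMP_TARGET_OFFLOAD=DISABLED ./a.out and OMP_TARGET_OFFLOAD=MANDATORY ./a.out are easy to compare. The enum and function names are hypothetical.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical names for the three policy values defined by OpenMP 5.0. */
typedef enum { POLICY_DEFAULT, POLICY_MANDATORY, POLICY_DISABLED } offload_policy;

/* Case-insensitive string equality; OpenMP environment variable values
   are case-insensitive. */
static int equals_ignore_case(const char *s, const char *t) {
    while (*s && *t) {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*t))
            return 0;
        s++;
        t++;
    }
    return *s == *t; /* equal only if both strings ended together */
}

/* Maps a raw OMP_TARGET_OFFLOAD value to a policy. This sketch treats an
   unset or unrecognized value as DEFAULT (an assumption, not spec text). */
offload_policy parse_offload_policy(const char *value) {
    if (value == NULL) return POLICY_DEFAULT;
    if (equals_ignore_case(value, "MANDATORY")) return POLICY_MANDATORY;
    if (equals_ignore_case(value, "DISABLED")) return POLICY_DISABLED;
    return POLICY_DEFAULT;
}

/* Prints the policy requested for this process via the environment. */
void report_offload_policy(void) {
    static const char *names[] = {"DEFAULT", "MANDATORY", "DISABLED"};
    const char *raw = getenv("OMP_TARGET_OFFLOAD");
    printf("OMP_TARGET_OFFLOAD=%s -> %s\n", raw ? raw : "(unset)",
           names[parse_offload_policy(raw)]);
}
```

This only reports what was requested; what the runtime then does with a DEFAULT policy is up to the libomptarget build.
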
Could anyone suggest what I should look into to get this microbenchmark running on the GPU?

Thanks!

######################################################################################################
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <omp.h>

#define VALUE_TYPE double
#define ARR_SIZE  (128 * 1024 * 1024)  /* 128 Mi doubles = 1 GiB per array */

struct timespec timespec_diff(struct timespec t1, struct timespec t2) {
    struct timespec diff;

    diff.tv_nsec = t1.tv_nsec - t2.tv_nsec;
    diff.tv_sec = t1.tv_sec - t2.tv_sec;
    if (diff.tv_nsec < 0) {
        diff.tv_nsec += 1000000000;
        diff.tv_sec -= 1;
    }

    return diff;
}

double timespec_to_sec(struct timespec t) {
    double sec = (double)t.tv_sec;
    double nsec = (double)t.tv_nsec;
    return sec + nsec * 1E-9;
}

void gen_rand_number(VALUE_TYPE *arr, int size) {
    for (int i = 0; i < size; i++) {
        arr[i] = (VALUE_TYPE)rand();
    }
}

void cpu_add(VALUE_TYPE *out, VALUE_TYPE *in_1, VALUE_TYPE *in_2, int size) {
    for (int i = 0; i < size; i++) {
        out[i] = in_1[i] + in_2[i];
    }
}

void omp_cpu_add(VALUE_TYPE *out, VALUE_TYPE *in_1, VALUE_TYPE *in_2, int size) {
#pragma omp parallel for
    for (int i = 0; i < size; i++) {
        out[i] = in_1[i] + in_2[i];
    }
}

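void omp_gpu_add(VALUE_TYPE *out, VALUE_TYPE *in_1, VALUE_TYPE *in_2, int size) {
    /* Map the pointed-to arrays as array sections. Mapping only the pointers
       (map(to:in_1, in_2)) does not copy the array contents to the device. */
#pragma omp target map(to: in_1[0:size], in_2[0:size]) map(from: out[0:size])
#pragma omp teams
#pragma omp distribute parallel for
    for (int i = 0; i < size; i++) {
        out[i] = in_1[i] + in_2[i];
    }
}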

int main() {
    VALUE_TYPE *a = malloc(sizeof(VALUE_TYPE) * ARR_SIZE);
    VALUE_TYPE *b = malloc(sizeof(VALUE_TYPE) * ARR_SIZE);
    VALUE_TYPE *c_1 = malloc(sizeof(VALUE_TYPE) * ARR_SIZE);
    VALUE_TYPE *c_2 = malloc(sizeof(VALUE_TYPE) * ARR_SIZE);
    VALUE_TYPE *c_3 = malloc(sizeof(VALUE_TYPE) * ARR_SIZE);
    if (!a || !b || !c_1 || !c_2 || !c_3) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }

    struct timespec start, end;

    gen_rand_number(a, ARR_SIZE);
    gen_rand_number(b, ARR_SIZE);

    printf("number of omp devices %d\n", omp_get_num_devices());
    /* Called outside a teams region, omp_get_num_teams() always returns 1. */
    printf("number of omp teams %d\n", omp_get_num_teams());

    clock_gettime(CLOCK_MONOTONIC, &start);
    cpu_add(c_1, a, b, ARR_SIZE);
    clock_gettime(CLOCK_MONOTONIC, &end);
    printf("CPU took %f seconds. \n", timespec_to_sec(timespec_diff(end, start)));

    clock_gettime(CLOCK_MONOTONIC, &start);
    omp_cpu_add(c_2, a, b, ARR_SIZE);
    clock_gettime(CLOCK_MONOTONIC, &end);
    printf("CPU omp took %f seconds. \n", timespec_to_sec(timespec_diff(end, start)));

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < 100; i++) {
        omp_gpu_add(c_3, a, b, ARR_SIZE);
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
    printf("GPU omp offload took %f seconds. \n", timespec_to_sec(timespec_diff(end, start)) / 100.0);

    free(a);
    free(b);
    free(c_1);
    free(c_2);
    free(c_3);
    return 0;
}

######################################################################################################
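
Separately from the device-detection problem, the arrays can be mapped with array sections (e.g. in_1[0:size]). In that case each call to a function like omp_gpu_add copies about 3 GiB per call at this ARR_SIZE: two 1 GiB inputs to the device and one 1 GiB output back. The per-iteration GPU time would then mostly measure host-device transfers. The sketch below uses a standard target data region to keep the arrays resident across all repetitions. The function name and the reps parameter are hypothetical, and it uses plain double instead of VALUE_TYPE so it stands alone. Without -fopenmp the pragmas are ignored and it runs serially on the host.

```c
/* Hypothetical variant of omp_gpu_add: maps the arrays once for all
   repetitions instead of once per call. */
void omp_gpu_add_resident(double *out, const double *in_1,
                          const double *in_2, int size, int reps) {
    /* Data region: in_1/in_2 are copied to the device on entry and
       out is copied back to the host once, on exit. */
#pragma omp target data map(to: in_1[0:size], in_2[0:size]) map(from: out[0:size])
    {
        for (int r = 0; r < reps; r++) {
            /* The arrays are already present, so these maps only bump
               reference counts; no copies happen inside the loop. */
#pragma omp target teams distribute parallel for \
        map(to: in_1[0:size], in_2[0:size]) map(from: out[0:size])
            for (int i = 0; i < size; i++) {
                out[i] = in_1[i] + in_2[i];
            }
        }
    }
}
```

In main, the timed loop could be replaced by a single omp_gpu_add_resident(c_3, a, b, ARR_SIZE, 100) call, still dividing by 100. The one-time transfer is then amortized rather than repeated, and comparing c_3 against c_1 afterwards checks the offloaded result.
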

Sincerely,
Qiongsi
