[Openmp-dev] OpenMP GPU Target Offload in Clang

Tue Aug 21 08:28:31 PDT 2018

Hi, OpenMP dev community!

Recently I tried setting up the OpenMP benchmarks from SPEC ACCEL and testing them with clang, but I ran into several difficulties.

The core of the issue is that I was not able to get the workload onto the GPUs. I wrote the following small test:

//////////////////////////////////////////////////////////////////////////////////////////////////////////////

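#include <stddef.h>  /* size_t */

#define DATATYPE unsigned long long
#define IDENTITY 0   /* additive identity for the + reduction (assumed; definition not shown here) */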

/*gpu offload openmp*/
DATATYPE reduce_gpu_omp(DATATYPE *arr, size_t size) {
    DATATYPE result = IDENTITY;
#pragma omp target data map(tofrom:arr[:size]) map(tofrom:result)
    {
#pragma omp target teams distribute parallel for reduction(+:result) schedule(static, 1)
        for (size_t i = 0; i < size; i++) {
            result += arr[i];
        }
    }
    return result;
}

//////////////////////////////////////////////////////////////////////////////////////////////////////////////

And compiled that with clang trunk with the following commands:

clang -O3 -fopenmp -omptargets=nvptx64sm_35-nvidia-linux -Wall -o reduce reduce.c
clang -O3 -fopenmp -omptargets=nvptx64sm_35-nvidia-linux-cuda -Wall -o reduce reduce.c
clang -O3 -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -Wall -o reduce reduce.c

Offloading to the GPU was unsuccessful with all three commands. The CPU load did go up when the kernel above ran, so the target region was executed, but the computation ran on the CPU (the host), not on the GPU.
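
A small runtime check along the following lines could confirm where a target region actually runs, instead of inferring it from CPU load. This is only a minimal sketch and not part of the test above: the helper names target_ran_on_host and report_offload_status are hypothetical, while omp_is_initial_device() and omp_get_num_devices() are standard OpenMP runtime routines. The stand-ins in the #else branch are an assumption added so the sketch still compiles without -fopenmp.

```c
#include <stdio.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Built without -fopenmp: stand-ins so the sketch still compiles.
   In that case everything runs on the host and no devices are visible. */
static int omp_is_initial_device(void) { return 1; }
static int omp_get_num_devices(void) { return 0; }
#endif

/* Hypothetical helper: returns 1 if a target region executed on the host
   (fallback), 0 if it executed on an offload device. */
int target_ran_on_host(void) {
    int on_host = 1;
#pragma omp target map(from:on_host)
    {
        /* Nonzero on the host device, zero on an offload device. */
        on_host = omp_is_initial_device() ? 1 : 0;
    }
    return on_host;
}

/* Hypothetical helper: prints what the OpenMP runtime reports. */
void report_offload_status(void) {
    printf("devices visible to the OpenMP runtime: %d\n",
           omp_get_num_devices());
    printf("target region ran on: %s\n",
           target_ran_on_host() ? "host (fallback)" : "device");
}
```

Calling report_offload_status() once at program start, with each of the three command lines above, would show whether any device is visible and whether the fallback to the host is taking place.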

My guess is that I missed some steps when setting up the compiler or its libraries, so offloading was not configured correctly. Alternatively, the cause could be that reductions across teams are not supported (as stated at https://clang.llvm.org/docs/OpenMPSupport.html).
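
One way to test the second hypothesis would be a variant of the kernel that uses a reduction only inside each team and combines the per-team partial sums on the host. The following is a sketch under stated assumptions, not a known fix: the function name reduce_gpu_partials and the constant NUM_PARTIALS are hypothetical, and whether this sidesteps the limitation in a given compiler build has not been verified.

```c
#include <stddef.h>

#define DATATYPE unsigned long long
#define NUM_PARTIALS 64 /* hypothetical: number of per-team partial sums */

/* Sums arr[0..size) without a reduction across teams: each iteration of
   the distribute loop produces one partial sum, and the host adds the
   partial sums together afterwards. */
DATATYPE reduce_gpu_partials(DATATYPE *arr, size_t size) {
    DATATYPE partial[NUM_PARTIALS] = {0};

#pragma omp target teams distribute map(to:arr[:size]) map(tofrom:partial[:NUM_PARTIALS])
    for (int t = 0; t < NUM_PARTIALS; t++) {
        DATATYPE sum = 0;
        /* Reduction among the threads of a single team only. */
#pragma omp parallel for reduction(+:sum)
        for (size_t i = (size_t)t; i < size; i += NUM_PARTIALS) {
            sum += arr[i];
        }
        partial[t] = sum;
    }

    /* Final combination on the host. */
    DATATYPE result = 0;
    for (int t = 0; t < NUM_PARTIALS; t++) {
        result += partial[t];
    }
    return result;
}
```

If this variant runs on the GPU while the original kernel does not, that would point at the cross-team reduction; if both fall back to the host, the setup of the compiler and libraries is the more likely cause.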

In the end, I would like to ask two questions:

  1.  Which LLVM-based compiler is a good candidate for testing OpenMP GPU offloading?  Should clang-ykt be used instead of clang trunk?
  2.  What are the recommended compiler and linker flags for building programs with GPU offloading? Maybe I am not searching correctly, but I was not able to find any documentation on how that is supposed to be done. Additionally, does the compiler emit a warning if offloading to the GPU is unsuccessful?

Thanks for your help!

Sincerely,
Qiongsi
