[Openmp-dev] OpenMP GPU Target Offload in Clang

Qiongsi Wu via Openmp-dev openmp-dev at lists.llvm.org
Tue Aug 21 10:29:24 PDT 2018

Hi Alexey!

Thanks for the clarification! I am moving on to testing stencil computations instead.



From: Alexey Bataev <a.bataev at outlook.com>
Sent: Tuesday, August 21, 2018 12:36:32 PM
To: Qiongsi Wu; openmp-dev at lists.llvm.org
Subject: Re: [Openmp-dev] OpenMP GPU Target Offload in Clang

Hi Qiongsi,

1. The correct form of the compiler invocation is the third one (the one using -fopenmp-targets=nvptx64-nvidia-cuda).

2. You're trying to use a feature that is not yet supported by the compiler: reductions across teams. Generally speaking, teams reductions currently cause a crash at runtime, after which execution falls back to the host.
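
Until reductions across teams are supported, one possible way to sidestep them is to reduce only within each team and combine the per-chunk results on the host. The following is a minimal sketch under stated assumptions, not something proposed in this thread: the function name reduce_gpu_partial and the constant NUM_CHUNKS are hypothetical, and whether the nested parallel reduction offloads correctly with a given clang build has not been verified here.

```c
#include <stddef.h> /* size_t */

#define DATATYPE unsigned long long
#define NUM_CHUNKS 64 /* hypothetical: number of chunks handed to the teams */

/* Sum arr[0..size) without a reduction clause on the teams construct. */
DATATYPE reduce_gpu_partial(DATATYPE *arr, size_t size) {
    DATATYPE partial[NUM_CHUNKS];
    size_t chunk = (size + NUM_CHUNKS - 1) / NUM_CHUNKS; /* ceiling division */

    /* Chunks are distributed across teams; the reduction below is only
       across the threads inside one team. */
#pragma omp target teams distribute map(to:arr[:size]) map(from:partial[:NUM_CHUNKS])
    for (size_t c = 0; c < NUM_CHUNKS; c++) {
        size_t lo = c * chunk < size ? c * chunk : size;  /* clamp to size */
        size_t hi = lo + chunk < size ? lo + chunk : size;
        DATATYPE sum = 0;
#pragma omp parallel for reduction(+:sum)
        for (size_t i = lo; i < hi; i++) {
            sum += arr[i];
        }
        partial[c] = sum;
    }

    /* Combine the per-chunk sums on the host. */
    DATATYPE result = 0;
    for (size_t c = 0; c < NUM_CHUNKS; c++) {
        result += partial[c];
    }
    return result;
}
```

Calling reduce_gpu_partial(arr, size) should give the same sum as the original reduce_gpu_omp, at the cost of one extra pass over NUM_CHUNKS values on the host.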

Best regards,
Alexey Bataev

21.08.2018 11:28, Qiongsi Wu via Openmp-dev wrote:

Hi, OpenMP dev community!

Recently I tried setting up the OpenMP benchmarks for SPEC ACCEL and testing them with clang, but I ran into several difficulties.

The core of the issue is that I was not able to get the workload onto the GPUs. I wrote the following small test:


#include <stddef.h> /* size_t */

#define DATATYPE unsigned long long

/* GPU offload with OpenMP */
DATATYPE reduce_gpu_omp(DATATYPE *arr, size_t size) {
    DATATYPE result = 0; /* reduction accumulator (declaration missing from the posted snippet) */
#pragma omp target data map(tofrom:arr[:size]) map(tofrom:result)
#pragma omp target teams distribute parallel for reduction(+:result) schedule(static, 1)
    for (size_t i = 0; i < size; i++) {
        result += arr[i];
    }
    return result;
}


And compiled that with clang trunk with the following commands:

clang -O3 -fopenmp -omptargets=nvptx64sm_35-nvidia-linux -Wall -o reduce reduce.c
clang -O3 -fopenmp -omptargets=nvptx64sm_35-nvidia-linux-cuda -Wall -o reduce reduce.c
clang -O3 -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -Wall -o reduce reduce.c

The offloading to the GPU was unsuccessful with all of these commands. That said, the CPU load did go up when the kernel above ran, so the target region did execute, but on the CPU rather than the GPU.

My speculation is that I missed some steps when setting up the compiler/libraries, so the offloading did not happen correctly. Or it could be that reductions across teams are not supported (as stated here https://clang.llvm.org/docs/OpenMPSupport.html).

In the end, I would like to ask two questions:

  1.  Which LLVM-based compiler is a good candidate for testing OpenMP GPU offloading? Should clang-ykt be used instead of clang trunk?
  2.  What are the recommended compiler and linker flags for building programs with GPU offloading? Maybe I am not searching correctly, but I was not able to find any documentation on how that is supposed to be done. Additionally, will the compiler show a warning if offloading to the GPU is unsuccessful?
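
On the last point, a program can check at run time whether a target region actually ran on a device. The following is a minimal sketch, assuming an OpenMP 4.0 or later runtime that provides omp_is_initial_device(); the helper name target_ran_on_host is hypothetical, and the _OPENMP guards are only there so the file also builds without -fopenmp.

```c
#ifdef _OPENMP
#include <omp.h> /* omp_is_initial_device */
#endif

/* Returns 1 if a target region executed on the host (the initial device),
   0 if it executed on an offload device such as a GPU. */
int target_ran_on_host(void) {
    int on_host = 1; /* without OpenMP, everything runs on the host */
#pragma omp target map(from:on_host)
    {
#ifdef _OPENMP
        on_host = omp_is_initial_device();
#else
        on_host = 1;
#endif
    }
    return on_host;
}
```

Calling target_ran_on_host() once at program start and printing the result gives a quick indication of whether the build and runtime setup fall back to the host.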

Thanks for your help!


