[Openmp-dev] when is offloading to NVIDIA targets available?

Jonas Hahnfeld via Openmp-dev openmp-dev at lists.llvm.org
Wed Nov 1 11:39:01 PDT 2017


Hi,

Some more comments inline, in addition to what George replied. In 
general, the exact same discussion took place a few weeks ago: 
http://lists.llvm.org/pipermail/openmp-dev/2017-October/001850.html

On 2017-11-01 13:20, Siegmar Gross via Openmp-dev wrote:
> [...]
> 
> Unfortunately I still have the same problems which I reported in
> Bug 34104 nearly two months ago, if I try to offload to an NVIDIA
> target.

First, please use the "OpenMP" product and its component "Clang Compiler 
Support" for this kind of bug report. I think that's the place that 
most people monitor...

> I know that OPENMP_ENABLE_LIBOMPTARGET isn't enabled by default at
> the moment. Nevertheless, I would be grateful if somebody can tell
> me when offloading to NVIDIA targets will be available. Does somebody
> know why I get a wrong value for the number of devices if I use the
> CPU version?

That depends on what "wrong" means to you. From the point of view of the 
compiler and runtime library, the reported values are "right" ;-)

> 
> loki introduction 107 clang -fopenmp
> -fopenmp-targets=x86_64-pc-linux-gnu dot_prod_accelerator_OpenMP.c

This tells the compiler to generate code to offload to the x86 host. 
Hence, the compiled binary can only deal with that particular target, 
which is the base assumption of the runtime library...

> -lomptarget
> loki introduction 108 a.out
> Number of processors:     24
> Number of devices:        4

... and that's why it reports 4 "artificial" devices here. This value is 
hard-coded and meant for debugging, since you obviously don't gain 
much from offloading to the host...
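
For reference, output like the above can be produced with the standard
OpenMP runtime queries. The following is only a minimal sketch, not the
code of dot_prod_accelerator_OpenMP.c (which isn't shown in this
thread); the helper names are made up for illustration, and the
fallback values used when compiling without -fopenmp are an assumption
of this sketch.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Number of offload devices the OpenMP runtime reports.
 * Hypothetical helper; returns 0 when compiled without OpenMP. */
int reported_devices(void)
{
#ifdef _OPENMP
    return omp_get_num_devices();
#else
    return 0;
#endif
}

/* Default device number; 0 when compiled without OpenMP. */
int reported_default_device(void)
{
#ifdef _OPENMP
    return omp_get_default_device();
#else
    return 0;
#endif
}

/* Number of processors available to the program;
 * 1 when compiled without OpenMP. */
int reported_processors(void)
{
#ifdef _OPENMP
    return omp_get_num_procs();
#else
    return 1;
#endif
}

/* Print the three values and return the device count. */
int print_offload_report(void)
{
    int devices = reported_devices();

    printf("Number of processors:     %d\n", reported_processors());
    printf("Number of devices:        %d\n", devices);
    printf("Default device:           %d\n", reported_default_device());
    return devices;
}
```

Calling print_offload_report() from main() gives the same three lines
as in the quoted output. The device count is simply whatever the
runtime returns from omp_get_num_devices() for the targets the binary
was compiled for, so it says nothing about the physical hardware in
the machine.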

> Default device:           0
> sum = 6.000000e+08
> 
> The output is wrong, because I have two six-core processors (24
> hwthreads) and one NVIDIA GPU. gcc-7.1.0 reports correct values.
> 
> loki introduction 109 gcc -fopenmp dot_prod_accelerator_OpenMP.c
> loki introduction 110 a.out
> Number of processors:     24
> Number of devices:        1

Based on your general question, I assume that you built GCC to compile 
for NVIDIA GPUs by default? There is a difference here: With Clang, you 
specify the target when compiling your application (via 
-fopenmp-targets, as in your command above), while with GCC you enable 
a target when you build the compiler.
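
Independent of which compiler you use, you can ask the runtime where a
target region actually executed. A minimal sketch, assuming an
OpenMP 4.0 runtime; the function name is made up for illustration, and
the fallback of 1 when compiling without -fopenmp is an assumption of
this sketch:

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: returns 1 if the target region below was
 * executed by the initial (host) device, 0 if it ran on an offload
 * device. Without OpenMP the pragma is skipped and 1 is returned. */
int target_ran_on_host(void)
{
    int on_host = 1;

#ifdef _OPENMP
    /* on_host is written inside the region and copied back. */
    #pragma omp target map(from: on_host)
    {
        on_host = (omp_is_initial_device() != 0);
    }
#endif
    return on_host;
}
```

Printing the result of target_ran_on_host() next to the device count
shows whether the region was offloaded at all. What it reports for the
x86 "host as device" case is not something this thread settles, so
treat that value with care.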

I hope this answers most of your questions.
Jonas

