<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">

<style type="text/css" style="display:none;"><!-- P {margin-top:0;margin-bottom:0;} --></style>

</head>

<body dir="ltr">

<div id="divtagdefaultwrapper" style="font-size:12pt;color:#000000;font-family:Calibri,Helvetica,sans-serif;" dir="ltr">

<p style="margin-top:0;margin-bottom:0">Hi Joachim! <br>

</p>

<p style="margin-top:0;margin-bottom:0"><br>

</p>

<p style="margin-top:0;margin-bottom:0">Thanks for your help! I had missed the CMake flags for the OpenMP offload targets when building clang, and I also found that my libelf was not installed properly. After rebuilding clang and removing the reduction from the test code, the offload to the GPU succeeded. <br>

</p>

<p style="margin-top:0;margin-bottom:0"><br>

</p>

<p style="margin-top:0;margin-bottom:0">The debugging code you suggested works like a charm!

<br>

</p>

<p style="margin-top:0;margin-bottom:0"><br>

</p>

<p style="margin-top:0;margin-bottom:0">Sincerely,</p>

<p style="margin-top:0;margin-bottom:0">Qiongsi<br>

</p>

</div>

<hr style="display:inline-block;width:98%" tabindex="-1">

<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Joachim Protze <protze.joachim@gmail.com><br>

<b>Sent:</b> Tuesday, August 21, 2018 12:37:02 PM<br>

<b>To:</b> Qiongsi Wu; OpenMP-dev<br>

<b>Subject:</b> Re: [Openmp-dev] OpenMP GPU Target Offload in Clang</font>

<div> </div>

</div>

<div class="BodyFragment"><font size="2"><span style="font-size:11pt;">

<div class="PlainText">Hi Qiongsi,<br>

<br>

On 08/21/2018 05:28 PM, Qiongsi Wu via Openmp-dev wrote:<br>

> Hi, OpenMP dev community!<br>

> <br>

> <br>

> Recently I tried setting up the OpenMP benchmarks for SPEC ACCEL and <br>

> testing them with clang, but I ran into several difficulties.<br>

> <br>

> <br>

> The core of the issue is that I was not able to get the workload onto <br>

> the GPUs.  I wrote the following small test:<br>

> <br>

> <br>

> //////////////////////////////////////////////////////////////////////////////////////////////////////////////<br>

> <br>

> #define DATATYPE unsigned long long<br>

> <br>

> /*gpu offload openmp*/<br>

> DATATYPE reduce_gpu_omp(DATATYPE *arr, size_t size) {<br>

>      DATATYPE result = IDENTITY;<br>

> #pragma omp target data map(tofrom:arr[:size]) map(tofrom:result)<br>

>      {<br>

> #pragma omp target teams distribute parallel for reduction(+:result) schedule(static, 1)<br>

>          for (size_t i = 0; i < size; i++) {<br>

>              result += arr[i];<br>

>          }<br>

>      }<br>

>      return result;<br>

> }<br>

> <br>

> //////////////////////////////////////////////////////////////////////////////////////////////////////////////<br>

> <br>

<br>

When I compile your code without the reduction clause, it executes on <br>

a GPU. With the reduction, the code seems to hang for me. <br>

(Next time, please post a complete, compilable example!)<br>

<br>
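If the sum itself is still needed, one possible way to avoid the reduction <br>
clause is to let each iteration of the offloaded loop sum one chunk into its <br>
own slot and to add the slots on the host. This is only a sketch: <br>
NUM_CHUNKS and reduce_gpu_chunked are names made up for illustration, and <br>
it has not been checked here whether this avoids the hang on a GPU.<br>
<br>

```c
#include <stddef.h>

#define DATATYPE unsigned long long
#define NUM_CHUNKS 64 /* hypothetical chunk count, not from the original code */

/* Sum arr[0..size) without a reduction clause: each iteration of the
   offloaded loop sums one contiguous chunk into its own slot. */
DATATYPE reduce_gpu_chunked(const DATATYPE *arr, size_t size) {
    DATATYPE partial[NUM_CHUNKS];
    size_t chunk = (size + NUM_CHUNKS - 1) / NUM_CHUNKS; /* ceil(size / NUM_CHUNKS) */

#pragma omp target teams distribute parallel for \
    map(to: arr[:size]) map(from: partial[:NUM_CHUNKS])
    for (size_t c = 0; c < NUM_CHUNKS; c++) {
        size_t begin = c * chunk;
        size_t end = begin + chunk < size ? begin + chunk : size;
        DATATYPE sum = 0;
        for (size_t i = begin; i < end; i++) {
            sum += arr[i];
        }
        partial[c] = sum; /* no two iterations write the same slot */
    }

    /* Final combine on the host. */
    DATATYPE result = 0;
    for (size_t c = 0; c < NUM_CHUNKS; c++) {
        result += partial[c];
    }
    return result;
}
```

<br>
Built without -fopenmp, the pragma is ignored and the function computes the <br>
same sum serially on the host.<br>
<br>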

This is how I compiled:<br>

<br>

clang -O3 -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -o reduce reduce.c<br>

<br>

To see whether the code is actually executed on the device, you can add <br>

this to the loop for debugging:<br>

<br>

if (i==0) printf("omp_is_initial_device=%i\n", omp_is_initial_device());<br>

<br>
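As a self-contained variant of the same check, the sketch below stores the <br>
value of omp_is_initial_device() in a mapped flag instead of printing it from <br>
inside the loop. The function name scale_and_check and the loop body are <br>
assumptions made for illustration; only omp_is_initial_device() comes from <br>
the OpenMP runtime.<br>
<br>

```c
#include <stddef.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Fallback so the file also builds without -fopenmp:
   in that case everything runs on the host. */
static int omp_is_initial_device(void) { return 1; }
#endif

/* Doubles every element of arr inside a target region.
   Returns 1 if the region ran on the host, 0 if it ran on an offload device. */
int scale_and_check(double *arr, size_t size) {
    int on_host = 1;
#pragma omp target teams distribute parallel for \
    map(tofrom: arr[:size]) map(tofrom: on_host)
    for (size_t i = 0; i < size; i++) {
        if (i == 0) {
            /* Only one iteration writes the flag, so there is no race. */
            on_host = omp_is_initial_device();
        }
        arr[i] *= 2.0;
    }
    return on_host;
}
```

<br>
Calling scale_and_check on a small array and printing the return value gives <br>
0 when the loop ran on the GPU and 1 when it fell back to the host.<br>
<br>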

> And compiled that with clang trunk with the following commands:<br>

> <br>

> clang -O3 -fopenmp -omptargets=nvptx64sm_35-nvidia-linux -Wall -o reduce reduce.c<br>

> clang -O3 -fopenmp -omptargets=nvptx64sm_35-nvidia-linux-cuda -Wall -o reduce reduce.c<br>

> clang -O3 -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -Wall -o reduce reduce.c<br>

> <br>

> Offloading to the GPU was unsuccessful with all of these commands. The <br>

> CPU load did go up when the kernel above was run, so the target region <br>

> did execute, but the computation ran on the CPU rather than on <br>

> the GPU.<br>

> <br>

> My speculation is that I missed some steps setting up the <br>

> compiler/libraries and the offloading did not happen correctly. Or it <br>

> could be the fact that reductions were not supported across teams (as <br>

> stated here <a href="https://clang.llvm.org/docs/OpenMPSupport.html">https://clang.llvm.org/docs/OpenMPSupport.html</a>).<br>

> <br>

> In the end, I would like to ask two questions:<br>

> <br>

>  1. Which LLVM-based compiler is a good candidate for testing OpenMP GPU<br>

>     offloading?  Should clang-ykt be used instead of clang trunk?<br>

<br>

I used clang trunk, configured like this:<br>

<br>

cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=$INSTALL  \<br>

       -DCLANG_OPENMP_NVPTX_DEFAULT_ARCH=sm_60 \<br>

       -DLIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES=35,60 \<br>

       $SRC<br>

<br>

<br>

>  2. What are the recommended compiler and linker flags for building<br>

>     programs with GPU offloading? Maybe I am not searching correctly,<br>

>     but I was not able to find any documentation on how that is<br>

>     supposed to be done. Additionally, will the compiler show a<br>

>     warning if offloading to the GPU is unsuccessful?<br>

<br>

Whether offloading succeeds is decided at runtime, not at compile time. <br>

Once this patch is committed, a failed offload will give you an error and <br>

execution will abort:<br>

<br>

<a href="https://reviews.llvm.org/D50522">https://reviews.llvm.org/D50522</a><br>

<br>
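In the meantime, a program can ask the runtime up front how many offload <br>
devices it sees. The sketch below uses the standard omp_get_num_devices() <br>
routine; report_offload_devices is a name made up for illustration. A <br>
non-zero count does not guarantee that a given target region will actually <br>
run on the device, so this complements the omp_is_initial_device() check <br>
rather than replacing it.<br>
<br>

```c
#include <stdio.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Fallback so the file also builds without -fopenmp: no devices available. */
static int omp_get_num_devices(void) { return 0; }
#endif

/* Returns the number of offload devices the OpenMP runtime reports and
   prints a warning to stderr when there are none. */
int report_offload_devices(void) {
    int n = omp_get_num_devices();
    if (n <= 0) {
        fprintf(stderr, "warning: no OpenMP offload device found, "
                        "target regions will run on the host\n");
    }
    return n;
}
```

<br>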

Best<br>

Joachim<br>

<br>

> <br>

> <br>

> Thanks for your help!<br>

> <br>

> Sincerely,<br>

> Qiongsi<br>

> <br>

> <br>

> <br>


<br>

</div>

</span></font></div>

</body>

</html>