<div dir="ltr">Hi,<div><br></div><div><div>I'm working on a project that requires OpenMP offloading to Nvidia GPUs using Clang.</div><div><br></div><div>System specification</div><div><br></div><div>OS - Ubuntu 16.04 LTS</div><div>Clang version - 4.00</div><div>Processor - Intel(R) Core(TM) i7-4700MQ CPU</div><div>CUDA version - 9.0</div><div>Nvidia GPU - GeForce 740M (sm_capability - 35)</div><div><br></div><div>The problem is that when I execute a sample program to test OpenMP offloading to Nvidia GPUs, part of the target region appears to run on the GPU, and then the same target region starts executing on the host.</div><div><br></div><div>Please find the sample program attached. It is a small C program that multiplies two matrices.</div><div>I believe the target region is being executed on both the host and the target device because of the abnormal output from the print call inside the target region. (My processor has 4 cores, each capable of handling 2 hardware threads.)</div><div><br></div><div>An image of the command-line output is also attached.</div><div><br></div><div>The program was compiled with:</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda<br></blockquote><div><br></div><div><br></div><div>I cannot figure out whether the runtime believes that the GPU execution did not complete successfully and therefore executes the target region on the host again.</div><div><br></div><div>Thank you!</div>-- <br><div class="gmail_signature"><div dir="ltr"><div dir="ltr"><div style="font-size:12.8px"><b><font color="#000000" face="georgia, serif">Piyumi Rameshka</font></b></div></div></div></div>

</div></div>