<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

</head>

<body bgcolor="#FFFFFF" text="#000000">

<p><br>

</p>

<div class="moz-cite-prefix">On 10/31/19 10:54 AM, Luo, Ye wrote:<br>

</div>

<blockquote type="cite" cite="mid:DM6PR09MB3548BF5277EBEC67300B8F36A3630@DM6PR09MB3548.namprd09.prod.outlook.com">

<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif;

        font-size: 12pt; color: rgb(0, 0, 0);">

Hi Hal,</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif;

        font-size: 12pt; color: rgb(0, 0, 0);">

My experience with llvm/clang so far shows:</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif;

        font-size: 12pt; color: rgb(0, 0, 0);">

1. All target offload is blocking and synchronous, using the default stream; nowait is not supported.</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif;

        font-size: 12pt; color: rgb(0, 0, 0);">

2. All memory transfers invoke cudaMemcpy; there are no asynchronous calls.</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif;

        font-size: 12pt; color: rgb(0, 0, 0);">

3. In the past I experimented with turning on <samp>CUDA_API_PER_THREAD_DEFAULT_STREAM</samp> in libomptarget.</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif;

        font-size: 12pt; color: rgb(0, 0, 0);">

I then used multiple host threads, each doing its own blocking synchronous offload. I got it sort of running and saw multiple streams, but the code crashed from memory corruption, probably caused by a data race in libomptarget.</div>

</blockquote>
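<p>For reference, here is a minimal sketch of the two usage patterns discussed above: deferred target regions with nowait (what Alessandro is doing), and one blocking target region per host thread (what Ye tried). The function names scale_nowait and scale_host_threads are hypothetical, chosen only for illustration, and this is not code from libomptarget. Whether the regions actually overlap on the device depends on the runtime; given the points above, clang 9 would be expected to serialize the first variant.</p>

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: doubles two arrays using two deferred target tasks.
// A runtime that maps deferred target tasks to separate streams may overlap
// the two regions; otherwise they simply execute one after the other.
void scale_nowait(std::vector<double>& a, std::vector<double>& b) {
  double* pa = a.data();
  double* pb = b.data();
  const std::size_t na = a.size();
  const std::size_t nb = b.size();

  #pragma omp parallel
  #pragma omp single
  {
    #pragma omp target teams distribute parallel for nowait map(tofrom: pa[0:na])
    for (std::size_t i = 0; i < na; ++i) pa[i] *= 2.0;

    #pragma omp target teams distribute parallel for nowait map(tofrom: pb[0:nb])
    for (std::size_t i = 0; i < nb; ++i) pb[i] *= 2.0;

    // Wait for both deferred target tasks before leaving the single region.
    #pragma omp taskwait
  }
}

// Hypothetical helper: same work, but each host thread issues its own
// blocking (synchronous) target region, one array per thread.
void scale_host_threads(std::vector<double>& a, std::vector<double>& b) {
  double* ptrs[2] = {a.data(), b.data()};
  const std::size_t sizes[2] = {a.size(), b.size()};

  #pragma omp parallel for num_threads(2)
  for (int k = 0; k < 2; ++k) {
    double* p = ptrs[k];
    const std::size_t n = sizes[k];
    #pragma omp target teams distribute parallel for map(tofrom: p[0:n])
    for (std::size_t i = 0; i < n; ++i) p[i] *= 2.0;
  }
}
```

<p>Built with the options from Alessandro's message (-fopenmp -fopenmp-targets=nvptx64-nvidia-cuda), both functions offload to the GPU; without OpenMP enabled the pragmas are ignored and the loops run on the host, which makes it easy to check results independently of the stream behavior.</p>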

<p><br>

</p>

<p>Thanks, Ye. That's consistent with Alexey's comments.</p>

<p><br>

</p>

<p>Is there already a bug open on this? If not, we should open one.</p>

<p><br>

</p>

<p>Alexey, how much memory/overhead do the buffer-reuse optimizations in Clang that you mentioned save? Is it worth keeping them in some mode?</p>

<p><br>

</p>

<p> -Hal<br>

</p>

<p><br>

</p>

<blockquote type="cite" cite="mid:DM6PR09MB3548BF5277EBEC67300B8F36A3630@DM6PR09MB3548.namprd09.prod.outlook.com">

<div style="font-family: Calibri, Arial, Helvetica, sans-serif;

        font-size: 12pt; color: rgb(0, 0, 0);">

Best,</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif;

        font-size: 12pt; color: rgb(0, 0, 0);">

Ye<br>

</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif;

        font-size: 12pt; color: rgb(0, 0, 0);">

<br>

</div>

<hr style="display:inline-block;width:98%" tabindex="-1">

<div id="divRplyFwdMsg" dir="ltr"><font style="font-size:11pt" face="Calibri, sans-serif" color="#000000"><b>From:</b> Finkel, Hal J.

<a class="moz-txt-link-rfc2396E" href="mailto:hfinkel@anl.gov">&lt;hfinkel@anl.gov&gt;</a><br>

<b>Sent:</b> Wednesday, October 30, 2019 1:40 PM<br>

<b>To:</b> Alessandro Gabbana <a class="moz-txt-link-rfc2396E" href="mailto:gbblsn@unife.it">

&lt;gbblsn@unife.it&gt;</a>; <a class="moz-txt-link-abbreviated" href="mailto:cfe-dev@lists.llvm.org">

cfe-dev@lists.llvm.org</a> <a class="moz-txt-link-rfc2396E" href="mailto:cfe-dev@lists.llvm.org">

&lt;cfe-dev@lists.llvm.org&gt;</a>; Luo, Ye <a class="moz-txt-link-rfc2396E" href="mailto:yeluo@anl.gov">

&lt;yeluo@anl.gov&gt;</a>; Doerfert, Johannes <a class="moz-txt-link-rfc2396E" href="mailto:jdoerfert@anl.gov">

&lt;jdoerfert@anl.gov&gt;</a><br>

<b>Subject:</b> Re: [cfe-dev] openmp 4.5 and cuda streams</font>

<div> </div>

</div>

<div class="BodyFragment"><font size="2"><span style="font-size:11pt;">

<div class="PlainText">[+Ye, Johannes]<br>

<br>

I recall that we've also observed this behavior. Ye, Johannes, we had a <br>

work-around and a patch, correct?<br>

<br>

  -Hal<br>

<br>

On 10/30/19 12:28 PM, Alessandro Gabbana via cfe-dev wrote:<br>

> Dear All,<br>

><br>

> I'm using clang 9.0.0 to compile a code which offloads sections of a <br>

> code on a GPU using the openmp target construct.<br>

> I also use the nowait clause to overlap the execution of certain <br>

> kernels and/or host&lt;-&gt;device memory transfers.<br>

> However, using the nvidia profiler I've noticed that when I compile <br>

> the code with clang only one cuda stream is active,<br>

> and therefore the execution gets serialized. On the other hand, when <br>

> compiling with XLC I see that kernels are executed<br>

> on different streams. I could not understand if this is the expected <br>

> behavior (e.g. the nowait clause is currently not supported),<br>

> or if I'm missing something. I'm using a NVIDIA Tesla P100 GPU and <br>

> compiling with the following options:<br>

><br>

> -target x86_64-pc-linux-gnu -fopenmp <br>

> -fopenmp-targets=nvptx64-nvidia-cuda <br>

> -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_60<br>

><br>

> best wishes<br>

><br>

> Alessandro<br>

><br>

> _______________________________________________<br>

> cfe-dev mailing list<br>

> <a class="moz-txt-link-abbreviated" href="mailto:cfe-dev@lists.llvm.org">cfe-dev@lists.llvm.org</a><br>

> <a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev" moz-do-not-send="true">

https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev</a><br>

<br>

-- <br>

Hal Finkel<br>

Lead, Compiler Technology and Programming Languages<br>

Leadership Computing Facility<br>

Argonne National Laboratory<br>

<br>

</div>

</span></font></div>

</blockquote>

<pre class="moz-signature" cols="72">-- 

Hal Finkel

Lead, Compiler Technology and Programming Languages

Leadership Computing Facility

Argonne National Laboratory</pre>

</body>

</html>