<html>
    <head>
      <base href="https://bugs.llvm.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - Variable not globalized on the device when nested inside parallel region."
   href="https://bugs.llvm.org/show_bug.cgi?id=51095">51095</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>Variable not globalized on the device when nested inside parallel region.
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>clang
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>Linux
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>enhancement
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>OpenMP
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>a.bataev@hotmail.com
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>huberjn@ornl.gov
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>jdoerfert@anl.gov, llvm-bugs@lists.llvm.org
          </td>
        </tr></table>
      <p>
        <div>
        <pre>In this example (<a href="https://godbolt.org/z/W4WMPr5xq">https://godbolt.org/z/W4WMPr5xq</a>) the variable `x` is shared
between all the threads by writing its pointer to a global value that is read
by all the threads. This should be legal according to OpenMP, but when the
variable is placed directly inside the parallel region, rather than inside of a
function that's called in parallel, it will not be globalized. When I compile
and run the first version on my nvptx64 machine, I get the following:

$ clang++ version1.cpp -fopenmp-targets=nvptx64 -fopenmp
$ ./a.out

Thread 0: 1
Thread 1: 1
Thread 2: 1
...
Thread 125: 1
Thread 126: 1
Thread 127: 1

The second version where `x` is directly in the parallel region gives me this:

$ clang++ version2.cpp -fopenmp-targets=nvptx64 -fopenmp
$ ./a.out

Thread 0: 0
Thread 1: 1
Thread 2: 2
...
Thread 125: 125
Thread 126: 126
Thread 127: 127

A call to `__kmpc_alloc_shared` is not inserted for the variable `x` in the
second version, so its value cannot be shared between the threads.</pre>
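        <pre>For readers without access to the godbolt link, the two variants can be
sketched roughly as follows. This is an illustrative reconstruction, not the
code behind the link: the names `published`, `share_from_function`,
`run_version1`, `run_version2` and the `publisher` parameter are hypothetical,
and the serial fallback for `omp_get_thread_num` is only there so the sketch
also builds without -fopenmp.

```cpp
// Quoted includes fall back to the system header search path.
#include "cstdio"
#ifdef _OPENMP
#include "omp.h"
#else
// Assumption: serial fallback so the sketch builds without -fopenmp.
static int omp_get_thread_num() { return 0; }
#endif

#pragma omp declare target
// Hypothetical global that carries the address of one thread's `x`.
int *published = nullptr;

// Variant 1: `x` is local to a function that is called from the parallel
// region. Returns 1 if this thread read the publisher's value, else 0.
int share_from_function(int publisher) {
  int x = omp_get_thread_num();
  if (x == publisher)
    published = &x; // one thread publishes the address of its `x`
#pragma omp barrier
  int seen = *published; // every thread reads through the shared pointer
  std::printf("Thread %d: %d\n", omp_get_thread_num(), seen);
#pragma omp barrier // keep the publisher's `x` alive until all reads are done
  return seen == publisher ? 1 : 0;
}
#pragma omp end declare target

// Variant 1 driver: the variable lives inside the called function.
int run_version1(int publisher) {
  int hits = 0;
#pragma omp target parallel map(tofrom : hits) reduction(+ : hits)
  hits += share_from_function(publisher);
  return hits;
}

// Variant 2 driver: the same logic, but `x` is declared directly in the
// parallel region. Per the report, this is the case that is not globalized.
int run_version2(int publisher) {
  int hits = 0;
#pragma omp target parallel map(tofrom : hits) reduction(+ : hits)
  {
    int x = omp_get_thread_num();
    if (x == publisher)
      published = &x;
#pragma omp barrier
    int seen = *published;
    std::printf("Thread %d: %d\n", omp_get_thread_num(), seen);
#pragma omp barrier
    hits += seen == publisher ? 1 : 0;
  }
  return hits;
}
```

Under these assumptions, `run_version1(1)` and `run_version2(1)` correspond to
the first and second outputs above. Each returns the number of threads that
read the publisher's value, so when `x` is shared correctly the result equals
the number of threads in the region.</pre>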
        </div>
      </p>


    </body>
</html>