<html>
    <head>
      <base href="https://bugs.llvm.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - PTX code generation with cuda10: shfl without .sync is not supported"
   href="https://bugs.llvm.org/show_bug.cgi?id=43505">43505</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>PTX code generation with cuda10: shfl without .sync is not supported
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>clang
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>9.0
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>Other
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>Linux
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>normal
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>CUDA
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedclangbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>smithc11@rpi.edu
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvm-bugs@lists.llvm.org
          </td>
        </tr></table>
      <p>
        <div>
        <pre>Created <span class=""><a href="attachment.cgi?id=22605" name="attach_22605" title="tarball with source code, compilation instructions, and temporary files">attachment 22605</a></span>
tarball with source code, compilation instructions, and temporary files

Overview:

  Compiling Thrust source code to PTX with Clang 9.0.0 fails in ptxas with the
following error when using CUDA 10.1.243:

ptxas /var/tmp/testcase-min-af0932.s, line 4521; error   : Instruction 'shfl'
without '.sync' is not supported on .target sm_70 and higher from PTX ISA
version 6.4

Steps to Reproduce: 

   Compile the attached 'testcase-min.cu' with the following command:

   $ clang++ -O0 --cuda-gpu-arch=sm_70 testcase-min.cu

Actual Results: 

   The following error is output during the build:

ptxas /var/tmp/testcase-min-af0932.s, line 4521; error   : Instruction 'shfl'
without '.sync' is not supported on .target sm_70 and higher from PTX ISA
version 6.4
...
ptxas fatal   : Ptx assembly aborted due to errors
clang-9: error: ptxas command failed with exit code 255 (use -v to see
invocation)
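
The ptxas message above reports only one line of the generated PTX. To list
every 'shfl' that lacks '.sync' in a PTX file kept with '-save-temps', a scan
along the following lines may help. This is only a sketch: the function name
find_unsynced_shfl is hypothetical, and it assumes the legacy instruction is
spelled 'shfl.' followed directly by its mode, while the supported form is
spelled 'shfl.sync.'.

```shell
# Hypothetical helper (not part of clang or CUDA): print the numbered lines of
# a PTX file that contain a 'shfl.' instruction without the '.sync' qualifier.
# Usage: find_unsynced_shfl FILE
find_unsynced_shfl() {
    # First grep: keep every line mentioning 'shfl.', prefixed with its line number.
    # Second grep: drop the lines that already use the 'shfl.sync' form.
    grep -n 'shfl\.' "$1" | grep -v 'shfl\.sync'
}
```

Called with the path of a saved PTX file, it prints one 'LINE:instruction'
entry per match and returns a non-zero status when nothing is found.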

Expected Results: 

    A working binary.

Build Date & Hardware: 

    System: LLNL Lassen system; 2x IBM Power9 host processors + 4x Nvidia V100
per node

$ clang++ --version
clang version 9.0.0 (/builddir/build/BUILD/ibm-llvm/tools/clang
63a7d47678dad8b206a08bdfa9380ebdb147e888) (/builddir/build/BUILD/ibm-llvm
d99a7ea8cd2b634d0dcb13c44d06c4bdd4436c4e)
Target: powerpc64le-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/tce/packages/clang/clang-upstream-2019.08.15/release/bin

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:52_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

$ echo $CUDA_HOME 
/usr/tce/packages/cuda/cuda-10.1.243

Additional Builds and Platforms: 

    I have not tried compiling the problematic code with Clang on any other
Volta-equipped system.

Additional Information: 

    Temporary files generated with '-save-temps' are included in the attached
tarball.</pre>
        </div>
      </p>


    </body>
</html>