<html>
    <head>
      <base href="https://bugs.llvm.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - Poor performance of Clang-7.0.0rc1 OpenMP target regions compared to Clang-ykt"
   href="https://bugs.llvm.org/show_bug.cgi?id=38565">38565</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>Poor performance of Clang-7.0.0rc1 OpenMP target regions compared to Clang-ykt
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>OpenMP
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>unspecified
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>Other
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>Linux
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>enhancement
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Clang Compiler Support
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedclangbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>csdaley@lbl.gov
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvm-bugs@lists.llvm.org
          </td>
        </tr></table>
      <p>
        <div>
        <pre>Created <span class=""><a href="attachment.cgi?id=20715" name="attach_20715" title="Source code, IR files, compiler commands and performance results">attachment 20715</a> <a href="attachment.cgi?id=20715&action=edit" title="Source code, IR files, compiler commands and performance results">[details]</a></span>
Source code, IR files, compiler commands and performance results

Hello all,

I have been testing an OpenMP target offload version of the STREAM
microbenchmark on an Nvidia V100. I achieve a memory bandwidth of only 450 GB/s
using Clang-7.0.0rc1, compared to 750 GB/s using Clang-ykt. I compiled STREAM
with -O2 for both compilers. I have attached a tarball containing the optimized
IR files for both Clang-ykt and Clang-7.0.0rc1. One thing that is apparent is
that the Clang-7.0.0rc1 IR file has roughly seven times as many lines:

$ wc -l ykt/stream-openmp-nvptx64-nvidia-cuda.ll 7.0.0rc1/stream-openmp-nvptx64-nvidia-cuda.ll
     308 ykt/stream-openmp-nvptx64-nvidia-cuda.ll
    2226 7.0.0rc1/stream-openmp-nvptx64-nvidia-cuda.ll

I have also included output files showing the exact compiler commands used and
the performance results from the Nvidia profiler. The profiler shows that the
offloaded compute kernels use 16-18 registers with Clang-ykt and 26-31
registers with Clang-7.0.0rc1.

I have observed the same poor performance on two platforms: (a) Haswell CPUs
with Nvidia V100s and (b) POWER9 CPUs with Nvidia V100s. The files in the
tarball were obtained on the platform with Haswell CPUs and Nvidia V100s.

Any help understanding this poor performance is appreciated.
Thanks,
Chris</pre>
        </div>
      </p>


    </body>
</html>