[llvm-bugs] [Bug 38565] New: Poor performance of Clang-7.0.0rc1 OpenMP target regions compared to Clang-ykt

via llvm-bugs llvm-bugs at lists.llvm.org
Tue Aug 14 10:16:43 PDT 2018


https://bugs.llvm.org/show_bug.cgi?id=38565

            Bug ID: 38565
           Summary: Poor performance of Clang-7.0.0rc1 OpenMP target
                    regions compared to Clang-ykt
           Product: OpenMP
           Version: unspecified
          Hardware: Other
                OS: Linux
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Clang Compiler Support
          Assignee: unassignedclangbugs at nondot.org
          Reporter: csdaley at lbl.gov
                CC: llvm-bugs at lists.llvm.org

Created attachment 20715
  --> https://bugs.llvm.org/attachment.cgi?id=20715&action=edit
Source code, IR files, compiler commands and performance results

Hello all,

I have been testing an OpenMP target offload version of the STREAM
microbenchmark on an Nvidia V100. I achieve a memory bandwidth of only 450 GB/s
with Clang-7.0.0rc1, compared to 750 GB/s with Clang-ykt. I compiled STREAM
with -O2 for both compilers. I have attached a tarball containing the optimized
IR files for both Clang-ykt and Clang-7.0.0rc1. One thing that is apparent is
that the Clang-7.0.0rc1 IR file contains roughly seven times as many lines as
the Clang-ykt one:

$ wc -l ykt/stream-openmp-nvptx64-nvidia-cuda.ll \
        7.0.0rc1/stream-openmp-nvptx64-nvidia-cuda.ll
     308 ykt/stream-openmp-nvptx64-nvidia-cuda.ll
    2226 7.0.0rc1/stream-openmp-nvptx64-nvidia-cuda.ll

I have also included output files showing the exact compiler commands used and
the performance results from the Nvidia profiler. The profiler shows that the
offloaded compute kernels use 16-18 registers with Clang-ykt and 26-31
registers with Clang-7.0.0rc1.
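
For readers without the attachment, the kind of loop being measured can be
sketched as below. This is an illustrative sketch only and is not copied from
the attached tarball: the function names triad and triad_bandwidth_gbs, the
choice of combined directive, and the map clauses are all assumptions.

```c
#include <stddef.h>

/* Hypothetical STREAM "triad" kernel: a[i] = b[i] + scalar * c[i].
 * Assumption: the loop is offloaded with a single combined target directive.
 * A real benchmark would normally keep the arrays resident on the device
 * (for example with "target data") so that host-device transfers are not
 * timed; the map clauses here only keep the sketch self-contained. */
void triad(double *a, const double *b, const double *c,
           double scalar, size_t n)
{
    #pragma omp target teams distribute parallel for \
        map(from: a[0:n]) map(to: b[0:n], c[0:n])
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + scalar * c[i];
}

/* Bandwidth in GB/s for one triad pass: each element involves two loads
 * (b and c) and one store (a), so 3 * sizeof(double) bytes are moved. */
double triad_bandwidth_gbs(size_t n, double seconds)
{
    return 3.0 * sizeof(double) * (double)n / seconds / 1.0e9;
}
```

Built without OpenMP support the pragma is ignored and the loop runs on the
host, so the sketch can be checked for correctness before any offload
performance is compared.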

I have observed the same poor performance on two platforms: (a) Haswell CPUs
with Nvidia V100s and (b) Power 9 CPUs with Nvidia V100s. The files in the
tarball were obtained on the platform with Haswell CPUs and Nvidia V100s.

Any help understanding this poor performance is appreciated.
Thanks,
Chris
