[llvm-bugs] [Bug 28955] New: Performance issues in RAJA LULESH

via llvm-bugs llvm-bugs at lists.llvm.org
Fri Aug 12 13:44:42 PDT 2016


https://llvm.org/bugs/show_bug.cgi?id=28955

            Bug ID: 28955
           Summary: Performance issues in RAJA LULESH
           Product: clang
           Version: trunk
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P
         Component: CUDA
          Assignee: unassignedclangbugs at nondot.org
          Reporter: poliakoff1 at llnl.gov
                CC: llvm-bugs at lists.llvm.org
    Classification: Unclassified

I hope you'll excuse a somewhat vague bug report. I'm trying to push Clang
forward at the lab and have been quite impressed with its usability.

Small issue: in NVCC, threadIdx is a dim3; in Clang it is apparently a uint3.
It's a tiny difference, but it did show up in our build process. For now I'll
probably just change the variable that was declared as a dim3 to have type
decltype(threadIdx) and call it a day, but any of your users who assume a dim3
might be inconvenienced.
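A minimal sketch of one way to sidestep the type difference, assuming the code only needs the three index values: build the dim3 from threadIdx's components explicitly, since threadIdx.x, .y and .z are unsigned integers under both compilers and dim3 has a (x, y, z) constructor. This is not the decltype(threadIdx) workaround described above, and the kernel and helper names (store_linear_index, collect_linear_indices) are hypothetical.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread builds a dim3 from threadIdx's individual
// components (instead of converting threadIdx itself) and writes its
// linearized index into out[].
__global__ void store_linear_index(unsigned *out) {
  dim3 tid(threadIdx.x, threadIdx.y, threadIdx.z);
  unsigned linear =
      tid.x + tid.y * blockDim.x + tid.z * blockDim.x * blockDim.y;
  out[linear] = linear;
}

// Hypothetical host helper: launches one block of shape `block` and copies
// the per-thread results back. Returns an empty vector on any CUDA error.
std::vector<unsigned> collect_linear_indices(dim3 block) {
  const size_t n = static_cast<size_t>(block.x) * block.y * block.z;
  unsigned *d_out = nullptr;
  if (cudaMalloc(&d_out, n * sizeof(unsigned)) != cudaSuccess) return {};
  store_linear_index<<<1, block>>>(d_out);
  std::vector<unsigned> h_out(n);
  // cudaMemcpy on the default stream waits for the kernel to finish and
  // also reports any launch error.
  cudaError_t err = cudaMemcpy(h_out.data(), d_out, n * sizeof(unsigned),
                               cudaMemcpyDeviceToHost);
  cudaFree(d_out);
  if (err != cudaSuccess) return {};
  return h_out;
}
```

On a machine with a CUDA-capable GPU, built with either nvcc or clang (`clang++ -x cuda`), calling `collect_linear_indices(dim3(4, 2, 1))` should return the values 0 through 7 in order.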

Larger issue: we're seeing very poor performance and wanted to discuss it with
you. We ran a standard LULESH problem through it and saw a 20x performance
degradation. Will Killian did some analysis for us and found poor inlining and
extremely high register usage (as high as 385 registers in one of our kernels).
I haven't hardened our build process enough to give you a "just make it go"
reproducer script, but I wanted to make you aware of the issue. We're also
running into Christian Trott's problem with the differing placement of the
__device__ attribute, which we *can* hide with a lambda, but which is likely to
be a source of friction for people looking to "port" code to compile with
Clang. We're also going to start pushing the NVIDIA folks.

Anyway, vague as this report may be, I wanted to let you know. I imagine the
register pressure issue is already well known, but I wanted to highlight it. I
think if we can bump up the performance, the rather incredible usability
improvements will drive a lot of people to try it out.

Here's a link to the function that has such bad register pressure:
(https://github.com/LLNL/RAJA/blob/coverity-scan/test/LULESH-v2.0/LULESH-v2.0_RAJA-variants/LULESH-v2.0_RAJA-IndexSet/lulesh.cc#L297).
I'll see if I can put up a branch you can build, as well as a version of
LULESH you can experiment with, but I wanted to post this now to open a
discussion of the performance problems.
