[PATCH] Add support for CUDA unroll pragma
meheff at google.com
Thu Jun 26 09:49:48 PDT 2014
On Thu, Jun 26, 2014 at 7:47 AM, Aaron Ballman <aaron.ballman at gmail.com> wrote:
> On Wed, Jun 25, 2014 at 4:28 PM, Mark Heffernan <meheff at google.com> wrote:
> > This pragma is only supported when compiling in CUDA mode (-x cuda).
> > The "#pragma unroll" and "#pragma unroll N" have identical semantics to
> > "#pragma clang loop unroll(enable)" and "#pragma clang loop
> > unroll_count(N)" respectively.
> I am really uncomfortable with the idea of having two pragmas with
> identical semantics but differing syntax. I would prefer that we have
> a single syntax since it is *identical* functionality. I suspect this
> pragma is driven by the CUDA standard? If so, has the idea been
> explored of simply supporting CUDA's syntax outside of CUDA mode, and
> dropping #pragma loop unroll?
CUDA defines the unroll pragma this way. For GPU code, supporting it is
essential. For about half of our internal benchmarks, performance is more
than 10x slower if the pragma is not supported, and this is probably not atypical
for tuned code. Also, making the pragma syntax identical to CUDA is
clearly important for compatibility with nvcc and existing code. For
non-CUDA code the performance impact is likely to be much smaller, but
still nice to have. Given that and the constraint of not wanting redundant
pragma syntax, supporting the CUDA-style syntax ("#pragma unroll" and
"#pragma unroll N") universally may be the way to go. At the very least
the CUDA syntax should be supported in CUDA mode.
From a previous review, Richard Smith (cc'd) had a strong opinion about
placing the loop pragmas inside of the clang namespace ("#pragma clang
loop..."). Clearly this is incompatible with "#pragma unroll ...".
Richard, any thoughts on this?
For comparison, Intel and IBM compilers have the following syntax:
GCC and MSVC don't provide an unroll pragma, though unroll optimization
parameters could be adjusted at a function-level granularity with "#pragma
GCC optimize ..." or "#pragma optimize ..." respectively.
> I don't like how much duplication is happening in this code given that
> there's already existing machinery in place to do all of this.
Once we determine what syntax we want to support I can work on reducing
the redundancy in the code. Thanks for your comments.
> > Mark
> > http://reviews.llvm.org/D4297
> > Files:
> > docs/ReleaseNotes.rst
> > include/clang/Basic/Attr.td
> > include/clang/Basic/AttrDocs.td
> > include/clang/Basic/DiagnosticSemaKinds.td
> > include/clang/Basic/TokenKinds.def
> > include/clang/Parse/Parser.h
> > include/clang/Sema/CudaUnrollHint.h
> > lib/CodeGen/CGStmt.cpp
> > lib/CodeGen/CodeGenFunction.h
> > lib/Parse/ParsePragma.cpp
> > lib/Parse/ParseStmt.cpp
> > lib/Sema/SemaStmtAttr.cpp
> > test/CodeGen/cuda-pragma-unroll.cu
> > test/Misc/ast-print-cuda-pragmas.cu
> > test/PCH/cuda-pragma-unroll.cu
> > test/Parser/cuda-pragma-unroll.cu