[PATCH] Add support for CUDA unroll pragma

Thu Jun 26 09:56:03 PDT 2014

On Thu, Jun 26, 2014 at 12:49 PM, Mark Heffernan <meheff at google.com> wrote:
> On Thu, Jun 26, 2014 at 7:47 AM, Aaron Ballman <aaron.ballman at gmail.com>
> wrote:
>>
>> On Wed, Jun 25, 2014 at 4:28 PM, Mark Heffernan <meheff at google.com> wrote:
>> > This pragma is only supported when compiling in CUDA mode (-x cuda).
>> > The "#pragma unroll" and "#pragma unroll N" have identical semantics to
>> > "#pragma clang loop unroll(enable)" and "#pragma clang loop unroll_count(N)"
>> > respectively.
>>
>> I am really uncomfortable with the idea of having two pragmas with
>> identical semantics but differing syntax. I would prefer that we have
>> a single syntax since it is *identical* functionality. I suspect this
>> pragma is driven by the CUDA standard? If so, has the idea been
>> explored of simply supporting CUDA's syntax outside of CUDA mode, and
>> dropping #pragma loop unroll?
>
>
> CUDA defines the unroll pragma this way.  For GPU code, supporting it is
> essential.  For about half of our internal benchmarks, performance is more
> than 10x slower if the pragma is not supported, and this is probably not atypical for
> tuned code.  Also, making the pragma syntax identical to CUDA is clearly
> important for compatibility with nvcc and existing code.  For non-CUDA code
> the performance impact is likely to be much smaller, but still nice to have.
> Given that and the constraint of not wanting redundant pragma syntax,
> supporting the CUDA-style syntax ("#pragma unroll" and "#pragma unroll N")
> universally may be the way to go.  At the very least the CUDA syntax should
> be supported in CUDA mode.

That's basically what I was leaning towards as well. If they have
identical semantics, then let's go with the CUDA syntax because that's
specified syntax instead of invented syntax. Given that there's
nothing CUDA-specific about it, I think it's acceptable to use this
syntax outside of CUDA as well.
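
To make the comparison concrete, here is a rough sketch of the two
spellings side by side in plain C++. The function names are made up for
illustration and are not part of the patch, and the sketch assumes the
CUDA-style pragma is accepted outside of CUDA mode as proposed above. A
compiler that does not recognize one of the pragmas would normally just
ignore it (possibly with a warning), so the result is unaffected either way.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers; only the pragma spellings come from this thread.

// CUDA-style spelling, as specified for nvcc.
int sum_cuda_style(const int *data, std::size_t n) {
  int total = 0;
#pragma unroll 4
  for (std::size_t i = 0; i < n; ++i)
    total += data[i];
  return total;
}

// Clang-namespaced spelling requesting the same unroll count.
int sum_clang_style(const int *data, std::size_t n) {
  int total = 0;
#pragma clang loop unroll_count(4)
  for (std::size_t i = 0; i < n; ++i)
    total += data[i];
  return total;
}

// The pragmas are optimization hints only, so both loops must compute
// the same sum regardless of whether any unrolling actually happens.
void self_check() {
  const int data[] = {1, 2, 3, 4, 5, 6, 7};
  assert(sum_cuda_style(data, 7) == 28);
  assert(sum_clang_style(data, 7) == 28);
}
```

If the semantics really are identical, the only difference between these
two functions is which compilers understand the hint.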

> From a previous review Richard Smith (cc'd) had a strong opinion about
> placing the loop pragmas inside of the clang namespace ("#pragma clang
> loop...").  Clearly this is incompatible with "#pragma unroll ...".
> Richard, any thoughts on this?

I agree with Richard's opinion for the invented syntax, but this is a
bit different since this syntax is specified by NVIDIA and supported
on compilers other than clang. I am okay with this pragma not being in
the clang namespace since that makes the code compatible with other
compilers, presuming that clang's behavior matches that of other
compilers implementing the same syntax. But I'm also curious as to
Richard's take.

~Aaron