[PATCH] D40275: [CUDA] Report "unsupported VLA" errors only on device side.

Artem Belevich via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Nov 28 10:27:35 PST 2017


tra added inline comments.


================
Comment at: clang/lib/Sema/SemaType.cpp:2188
+               !Context.getTargetInfo().isVLASupported() &&
+               shouldDiagnoseTargetSupportFromOpenMP()) {
+      // Some targets don't support VLAs.
----------------
rjmccall wrote:
> tra wrote:
> > tra wrote:
> > > rjmccall wrote:
> > > > Please write this check so that it trips in an "ordinary" build on a target that just happens to not support VLAs, something like:
> > > > 
> > > >   else if (!Context.getTargetInfo().isVLASupported() && shouldDiagnoseTargetSupportFromOpenMP())
> > > > 
> > > > If you want to include the explicit OpenMP check there, it would need to be:
> > > > 
> > > >   else if (!Context.getTargetInfo().isVLASupported() && (!getLangOpts().OpenMP || shouldDiagnoseTargetSupportFromOpenMP()))
> > > > 
> > > > but I think the first looks better.
> > > > 
> > > > The CUDA and OpenMP paths here seem to be trying to achieve analogous things; it's unfortunate that we can't find a way to unify their approaches, even if we'd eventually want to use different diagnostic text.  I imagine that the target-environment language restrictions are basically the same, since they arise for the same fundamental reasons, so all the places using CUDADiagIfDeviceCode are likely to have a check for shouldDiagnoseTargetSupportFromOpenMP() and vice-versa.
> > > The problem is that in CUDA we can't just do 
> > > ```
> > > if (!Context.getTargetInfo().isVLASupported() && shouldDiagnoseTargetSupportFromOpenMP())
> > >    Diag(Loc, diag::err_vla_unsupported);
> > > ```
> > > 
> > > In some situations, diagnostic messages will only be emitted if we attempt to generate unsupported code on the device side.
> > > Consider 
> > > ```
> > > __host__ __device__ void foo(int n) {
> > >   int vla[n];
> > > }
> > > ```
> > > 
> > > When Sema sees this code during compilation, it cannot tell whether there is an error: calling `foo` from host code is perfectly valid; calling it from device code is not. `CUDADiagIfDeviceCode` creates 'postponed' diagnostics which only get emitted if we ever need to generate code for the function on the device.
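> > > 
> > > For example, with hypothetical callers (names are mine, just to illustrate when the postponed diagnostic fires):
> > > ```
> > > void host_user(int n) { foo(n); }          // OK: foo is only emitted for the host.
> > > 
> > > __global__ void kernel(int n) { foo(n); }  // foo must now be emitted for the
> > >                                            // device, so the postponed VLA
> > >                                            // diagnostic is emitted, pointing
> > >                                            // at the VLA inside foo.
> > > ```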
> > > 
> > > So, while CUDA and OpenMP do similar things, they are not quite the same. If your goal is to generalize the CUDA and OpenMP handling, it would have to be folded into the diagnostic emission itself, and we'd need an analog of `CUDADiagIfDeviceCode` which can handle both OpenMP and CUDA.
> > > E.g. something like this:
> > > ```
> > > ??? DiagIfDeviceCode(???) {
> > >    if (OpenCL || (OpenMP && shouldDiagnoseTargetSupportFromOpenMP()))
> > >        Diag(...);
> > >    else if (CUDA)
> > >        CUDADiagIfDeviceCode()
> > > } 
> > > 
> > > ...
> > > 
> > > if (!Context.getTargetInfo().isVLASupported()) 
> > >    DiagIfDeviceCode();
> > > 
> > > ```
> > > 
> > > Would that work for you?
> > There's another issue with this approach -- the diagnostics themselves. Each dialect has its own: CUDA diagnostics carry details that are relevant only to CUDA (e.g. which combination of `__host__`/`__device__` attributes the offending function has), and I suspect OpenMP has something specific as well. If we insist on emitting only one kind of error for a particular case across all dialects, we'll have to stick to a bare-bones "feature X is not supported" message, which will not have enough detail to explain why the error was triggered in CUDA.
> > 
> > IMO dialect-specific handling of cuda errors in this case is the lesser evil. 
> > 
> > I'll update the patch to handle non-cuda cases the way  you suggested.
> If there really is interesting language-specific information to provide in a diagnostic, I agree that it's hard to avoid having different code for different targets.  On the other hand, the CUDA-specific information in this specific diagnostic seems unnecessary — does the user really care whether the function was '__device__' vs. '__host__ __device__'? In fact, isn't the latter a bit misleading? — and I feel like a generic "cannot use variable-length arrays when compiling for device 'nvptx64'" would be perfectly satisfactory in both CUDA and OpenMP, and that's probably true for almost all of these diagnostics.
> 
> On a completely different note, I do want to point out that the only thing you actually *need* to ban here is declaring a local variable of VLA type.  There's no reason at all to ban VLA types in general; they just compile to extra arithmetic.
Agreed. While host/device attributes are part of a function's signature in CUDA, in this case only the 'device' part makes the difference, so common-style reporting along the lines you suggest would be sufficient to indicate the error in device-side code.
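
Concretely, a sketch of how the two paths could sit side by side (reusing the condition you suggested earlier; the exact diagnostic IDs and wiring in the patch may differ):
```
if (!Context.getTargetInfo().isVLASupported()) {
  if (getLangOpts().CUDA) {
    // CUDA: postponed diagnostic, only emitted if this function is
    // actually codegen'ed for the device.
    CUDADiagIfDeviceCode(Loc, diag::err_cuda_vla) << CurrentCUDATarget();
  } else if (!getLangOpts().OpenMP || shouldDiagnoseTargetSupportFromOpenMP()) {
    // Everyone else: the target simply doesn't support VLAs.
    Diag(Loc, diag::err_vla_unsupported);
  }
}
```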

As for VLA types in general, Clang can currently compile code with VLA arguments (see the sketch below): https://godbolt.org/g/43hVu9
Clang explicitly does not support GCC's VLA-in-structure extension.
Are there any other use cases you can think of?
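
For reference, an example of the sort of code that link demonstrates (my own reconstruction, not the exact snippet): when the VLA type appears only in a parameter, there is no dynamic stack allocation, and indexing lowers to plain pointer arithmetic.
```
// a[i][j] lowers to *(pointer + i * n + j); no alloca is needed,
// so this is fine to emit for the device.
__device__ int elem(int n, int a[][n], int i, int j) {
  return a[i][j];
}
```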



https://reviews.llvm.org/D40275




