[PATCH] D39505: [OpenMP] Show error if VLAs are not supported

Mon Nov 13 23:24:37 PST 2017

rjmccall added inline comments.

================
Comment at: include/clang/Basic/TargetInfo.h:944
+  /// \brief Whether target supports variable-length arrays.
+  bool isVLASupported() const { return VLASupported; }
+
----------------
Hahnfeld wrote:
> rjmccall wrote:
> > ABataev wrote:
> > > ABataev wrote:
> > > > Hahnfeld wrote:
> > > > > rjmccall wrote:
> > > > > > Hahnfeld wrote:
> > > > > > > rjmccall wrote:
> > > > > > > > Hahnfeld wrote:
> > > > > > > > > rjmccall wrote:
> > > > > > > > > > The way you've written this makes it sound like "does the target support VLAs?", but the actual semantic checks treat it as "do OpenMP devices on this target support VLAs?"  Maybe there should be a more specific way to query things about OpenMP devices instead of setting a global flag for the target?
> > > > > > > > > Actually, the NVPTX and SPIR targets never support VLAs. So I felt like it would be more correct to make this a global property of the target.
> > > > > > > > > 
> > > > > > > > > The difference is that the other programming models (OpenCL and CUDA) error out immediatelyand regardless of the target because this limitation is reflected in the standards that disallow VLAs (see SemaType.cpp). For OpenMP we might have target devices that support VLA so we shouldn't error out for those.
> > > > > > > > If you want to make it a global property of the target, that's fine, but then I don't understand why your diagnostic only fires when (S.isInOpenMPDeclareTargetContext() || S.isInOpenMPTargetExecutionDirective()).
> > > > > > > That is because of how OpenMP offloading works and how it is implemented in Clang. Consider the following snippet from the added test case:
> > > > > > > ```lang=c
> > > > > > > int vla[arg];
> > > > > > > 
> > > > > > > #pragma omp target map(vla[0:arg])
> > > > > > > {
> > > > > > >    // more code here...
> > > > > > > }
> > > > > > > ```
> > > > > > > 
> > > > > > > Clang will take the following steps to compile this into a working binary for a GPU:
> > > > > > > 1. Parse and (semantically) analyze the code as-is for the host and produce LLVM Bitcode.
> > > > > > > 2. Parse and analyze again the code as-is and generate code for the offloading target, the GPU in this case.
> > > > > > > 3. Take LLVM Bitcode from 1., generate host binary and embed target binary from 3.
> > > > > > > 
> > > > > > > `OpenMPIsDevice` will be true for 2., but the complete source code is analyzed. So to not throw errors for the host code, we have to make sure that we are actually generating code for the target device. This is either in a `target` directive or in a `declare target` region.
> > > > > > > Note that this is quite similar to what CUDA does, only they have `CUDADiagIfDeviceCode` for this logic. If you want me to add something of that kind for OpenMP target devices, I'm fine with that. However for the given case, it's a bit different because this error should only be thrown for target devices that don't support VLAs...
> > > > > > I see.  So the entire translation unit is re-parsed and re-Sema'ed from scratch for the target?  Which means you need to avoid generating errors about things in the outer translation unit that aren't part of the target directive that you actually want to compile.  I would've expected there to be some existing mechanism for that, to be honest, as opposed to explicitly trying to suppress target-specific diagnostics one by one.
> > > > > Yes, that is my understanding. For errors, we don't need to take anything special as the first `cc1` invocation will exit with a non-zero status so that the driver stops the compilation. For warnings, there seems to be no mechanism in place as I see them duplicated, even in code that is not generate for the target device (verified with an unused variable).
> > > > > 
> > > > > @ABataev @gtbercea Do I miss something here?
> > > > I'm not aware of any.
> > > John, target-specific checks require some special flags (like LangOpts.Cuda) that are not set when we re-compile the code for OpenMP devices. That's why errors are not emitted for the non-target code. But also because of that, we need some special OpenMP checks for target-specific code inside the target regions. For example, code in lib/Sema/SemaType.cpp, lines 2184, 2185 (see this file in this patch) checks for Cuda compilation and prohibits using of VLAs in Cuda mode. We also should prohibit using of VLAs in target code for NVPTX devices or other devices that do not support VLAs in OpenMP mode.
> > I think it would be cleaner here, and better for our OpenMP support overall, if we found a more general way to suppress unwanted diagnostics in the second invocation for code outside of the target directive.  This check (and several others) would then just implement a more general target feature disabling VLA support instead of being awkwardly OpenMP-specific.
> I think to get this we would need to make `Diag` a no-op `if (Context.getLangOpts().OpenMPIsDevice && !(isInOpenMPDeclareTargetContext() || isInOpenMPTargetExecutionDirective()))`. This would ignore all diagnostics outside of the code is really generated in the end...
I mean, the danger of this approach is that you don't really want to suppress diagnostics for top-level declarations: it can leave you with an invalid AST, and it is not valid to generate IR from an invalid AST.

Sorry for the ignorant questions that follow, but I assume the OpenMP spec must bless this double-translation somehow, and I'd like to understand more about that in order to advise how to proceed.  How does OpenMP handle the possibility that the code will be processed substantially differently for different targets?  Is there some rule in the spec saying that the code has to expand "the same" in both targets?  How does that work when e.g. size_t might have a different size or use a completely different type?  More generally, how do expect that this feature will work in the more complicated language modes, like OpenMP + C++?

https://reviews.llvm.org/D39505