[PATCH] D94337: Add cuda header type for cuh files

Wed Jan 13 10:57:16 PST 2021

tra added a comment.

In D94337#2492108 <https://reviews.llvm.org/D94337#2492108>, @rgreenblatt wrote:

> In D94337#2491825 <https://reviews.llvm.org/D94337#2491825>, @tra wrote:
>
>> 'Works' is not exactly the same as 'works correctly'. This example makes `a()` look like a regular host function, instead of the kernel, and that affects how the rest of the TU get parsed.
>> I.e. you'll have further errors if somewhere down below the file has `a<<<1,1>>>()`. Similar story with ignoring `__host__` and `__device__` attributes -- that may lead to various overload resolution errors, or reporting conflicting redeclarations/redefinitions for the perfectly valid host/device function overloads. The list goes on.
>
> No, this part is definitely working, the full set of SDK headers is included. I have confirmed this by compiling a .cu and a .cuh file with `-E` and checking that the output is identical. Further, I have confirmed that `global<<<_, _>>>()` builds, that `__host__` only functions can't be used on the device, and that `__device__` functions can't be used on the host.

We were talking about the case where clang can't expand CUDA-specific macros. 
It works in your specific case because you do have CUDA installed and compiler did find it. It's not necessarily the case for everyone everywhere. We want clang to work for tooling even when CUDA SDK is not installed on the machine at all. My argument was that without a sufficient set of CUDA macros clang can't parse CUDA sources correctly. CUDA SDK is currently the only source of those macros. We need clang to provide its own for the cases when CUDA SDK is not available.

Try compiling that source with `-nocudainc` which will illustrate what happens when CUDA SDK is not found. E.g. https://godbolt.org/z/MoKvfq

> To fix this, I added a new type `TY_CUDAHeader_DEVICE`. This type precompiles a header for a specific device architecture. 
> This type is used in place of `TY_CUDA_DEVICE` in the appropriate case. 
> I think having a header device type is the correct approach if a separate header type is used.

I think this kind of compilation pipeline restructuring is a bit premature. We need to address being able to parse CUDA headers reliably first.

There's also a question of whether it's necessary. It's not very useful for the Driver functionality itself and can be implemented in the tooling by explicitly telling the driver what we want. Your patch appears to implement it the other way around -- the tooling relies on the driver to implement the CUDA header handling magic. I think the right approach is to teach tooling how to handle CUDA and keep the clang driver changes to a minimum. Based on our previous conversation, se seem to need only `-x cuda-header`. Everything else can be controlled by the tooling with `--cuda-host-only`/`--cuda-device-only --cuda-gpu-arch=XXX`. Considering that tooling can't deal with more than one sub-compilation, using these flags is going to be necessary anyways.

> @tra I am guessing you have already thought about this, but
> one thing which is worth noting is that language servers 
> only maintain a single AST per file (and this probably won't change).

Using AST from the host-side compilation was an easy-to-do trade-off. It works reasonably well most of the time, at least for the sources we have exact compilation flags for. Re-engineering tooling to deal with multiple ASTs per TU just for CUDA may not be worth it.

> This is the host AST of course.
> For example, in the above program language servers
> will only have a diagnostic for `#pragma message "host"`.
> So, the values of `--cuda-gpu-arch` aren't relevant for language servers
> beyond determining what the preprocessor includes.
> However, the fact that the preprocessor includes depend on arch
> means that the __clang_cuda_standalone_defs.h approach won't always be perfectly
> correct.

Yes, that's exactly why I mentioned before that ideally we may need multiple ASTs as each device compilation will have a slightly different one.  The goal of `__clang_cuda_standalone_defs.h` is to make it possible to parse CUDA sources at all w/o having to rely on CUDA SDK. It is not intended to solve the multiple-AST issue.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D94337/new/

https://reviews.llvm.org/D94337