[PATCH] D94337: Add cuda header type for cuh files

Ryan Greenblatt via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Jan 11 23:04:14 PST 2021


rgreenblatt added a comment.

In D94337#2491825 <https://reviews.llvm.org/D94337#2491825>, @tra wrote:

> 'Works' is not exactly the same as 'works correctly'. This example makes `a()` look like a regular host function, instead of the kernel, and that affects how the rest of the TU get parsed.
> I.e. you'll have further errors if somewhere down below the file has `a<<<1,1>>>()`. Similar story with ignoring `__host__` and `__device__` attributes -- that may lead to various overload resolution errors, or reporting conflicting redeclarations/redefinitions for the perfectly valid host/device function overloads. The list goes on.

No, this part is definitely working: the full set of SDK headers is included. I have confirmed this by compiling a .cu file and a .cuh file with `-E` and checking that the output is identical. Further, I have confirmed that `global<<<_, _>>>()` builds, that `__host__`-only functions can't be used on the device, and that `__device__` functions can't be used on the host.
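For reference, the `-E` comparison described above can be sketched roughly as follows (hypothetical file names; assumes `test.cu` and `test.cuh` have identical contents and a CUDA SDK is installed):

```
clang++ -x cuda        -E test.cu  -o cu.ii
clang++ -x cuda-header -E test.cuh -o cuh.ii
diff cu.ii cuh.ii   # no output if the preprocessed results match
```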

I decided to go ahead and work out exactly what happens when building with `-x cuda-header`.
Prior to my last update, compiling a cuda-header roughly proceeded as follows:

- Preprocess the header in host mode with type `TY_CUDAHeader`
- For each CUDA arch, build with type `TY_CUDA_DEVICE`. This builds the header as though it were a .cu source file, and this is where the `#pragma once` warnings were occurring.
- If `-fsyntax-only` was used, exit here.
- Try to construct a fat binary and violently explode in a crash, because the host side was compiled with type `TY_CUDAHeader` instead of `TY_CUDA`: `TY_CUDAHeader` attempts to generate a precompiled header rather than a binary.

To fix this, I added a new type, `TY_CUDAHeader_DEVICE`, which precompiles a header for a specific device architecture.
This type is used in place of `TY_CUDA_DEVICE` in the appropriate case.
I think having a header device type is the correct approach if a separate header type is used.
Now compilation looks like this:

- Preprocess the header in host mode with type `TY_CUDAHeader`
- For each CUDA arch, build with type `TY_CUDAHeader_DEVICE`. This doesn't issue a warning for `#pragma once`; it is correctly (I think) treating the file as a header.
- If `-fsyntax-only` was used, exit here.
- Output precompiled headers for the host and each cuda arch. For example: file.cuh-cuda-nvptx64-nvidia-cuda-sm_60.gch, file.cuh-cuda-nvptx64-nvidia-cuda-sm_75.gch, file.cuh.gch
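A hypothetical invocation illustrating the outputs listed above (the file names are the ones produced in my testing; the exact command-line flags are an assumption on my part):

```
clang++ -x cuda-header --cuda-gpu-arch=sm_60 --cuda-gpu-arch=sm_75 file.cuh
# Expected outputs, one PCH per sub-compilation:
#   file.cuh.gch                                  (host)
#   file.cuh-cuda-nvptx64-nvidia-cuda-sm_60.gch   (device, sm_60)
#   file.cuh-cuda-nvptx64-nvidia-cuda-sm_75.gch   (device, sm_75)
```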

As far as I can tell, this process now works as expected.
Of course, there is no way to consume these precompiled headers right now, so I have no idea whether they are at all valid.
Also, I haven't run this with assertions enabled yet (waiting on a build), so it might trip something.

I have tested that everything works as expected using the following header file:

  #pragma once
  
  __device__ int device_only() {
    __syncthreads();
    return 0;
  }
  
  __host__ int host_only() { return 1; }
  
  __host__ __device__ void check_all_archs() {
  #ifdef __CUDA_ARCH__
  #if __CUDA_ARCH__ == 750
  #pragma message "sm_75"
    // host_only();
  #elif __CUDA_ARCH__ == 600
  #pragma message "sm_60"
    // host_only();
  #else
  #pragma message "other sm"
    // host_only();
  #endif
  #else
  #pragma message "host"
    // device_only();
  #endif
  }
  
  __global__ void global() {
    int out_device = device_only();
    // int host_error = host_only();
  }
  
  void f() {
    check_all_archs();
    global<<<1, 1>>>();
  }

@tra I am guessing you have already thought about this, but
one thing worth noting is that language servers
only maintain a single AST per file (and this probably won't change).
That AST is, of course, the host AST.
For example, for the program above a language server
will only report a diagnostic for `#pragma message "host"`.
So the values of `--cuda-gpu-arch` aren't relevant to language servers
beyond determining what the preprocessor includes.
However, the fact that what the preprocessor includes depends on the arch
means that the `__clang_cuda_standalone_defs.h` approach won't always be perfectly
correct.
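To illustrate the arch-dependent-includes point, here is a minimal hypothetical header (the included file names are invented for illustration) where the set of headers pulled in differs per architecture, so no single host-side AST can represent every variant:

```cuda
// Hypothetical: which implementation header is included depends on
// __CUDA_ARCH__, which differs between the host pass and each device pass.
#ifdef __CUDA_ARCH__
#if __CUDA_ARCH__ >= 700
#include "impl_sm70.cuh"     // invented name: arch-specific implementation
#else
#include "impl_generic.cuh"  // invented name: fallback implementation
#endif
#else
#include "impl_host.h"       // invented name: host-only implementation
#endif
```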


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D94337/new/

https://reviews.llvm.org/D94337


