[PATCH] D94337: Add cuda header type for cuh files
Ryan Greenblatt via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Mon Jan 11 23:04:14 PST 2021
rgreenblatt added a comment.
In D94337#2491825 <https://reviews.llvm.org/D94337#2491825>, @tra wrote:
> 'Works' is not exactly the same as 'works correctly'. This example makes `a()` look like a regular host function, instead of the kernel, and that affects how the rest of the TU get parsed.
> I.e. you'll have further errors if somewhere down below the file has `a<<<1,1>>>()`. Similar story with ignoring `__host__` and `__device__` attributes -- that may lead to various overload resolution errors, or reporting conflicting redeclarations/redefinitions for the perfectly valid host/device function overloads. The list goes on.
No, this part is definitely working, the full set of SDK headers is included. I have confirmed this by compiling a .cu and a .cuh file with `-E` and checking that the output is identical. Further, I have confirmed that `global<<<_, _>>>()` builds, that `__host__` only functions can't be used on the device, and that `__device__` functions can't be used on the host.
I decided to go ahead and understand exactly what is going on when building with -x cuda-header
Prior to the last update I made, when compling a cuda-header this was roughly what was happening:
- Preprocess the header in host mode with type `TY_CUDAHeader`
- For each cuda arch, build with type `TY_CUDA_DEVICE`. This builds the header as though it was a .cu src file and this is where the `#pragma once` warnings were occuring.
- if -f-syntax only was used, exit here
- try to construct a fat binary and violently explode in a crash because the host mode was compiled with type `TY_CUDAHeader` instead of `TY_CUDA`. `TY_CUDAHeader` attempts to generate a precompiled header rather than a binary.
To fix this, I added a new type `TY_CUDAHeader_DEVICE`. This type precompiles a header for a specific device architecture.
This type is used in place of TY_CUDA_DEVICE in the appropriate case.
I think having a header devicetype is the correct approach if a separate header type is used.
Now compilation looks like this:
- Preprocess the header in host mode with type `TY_CUDAHeader`
- For each cuda arch, build with type `TY_CUDAHeader_DEVICE`. This doesn't issue a warning for #pragma once - it is correctly (I think) considering the file as a header.
- if -f-syntax only was used, exit here
- Output precompiled headers for the host and each cuda arch. For example: file.cuh-cuda-nvptx64-nvidia-cuda-sm_60.gch, file.cuh-cuda-nvptx64-nvidia-cuda-sm_75.gch, file.cuh.gch
As far as I can tell this process is now working as expected.
Of course, there is no way to use these precompiled headers right now, so I have no idea if they are at all valid.
Also, I haven't run this with assertions enabled yet (waiting a build), so it might trip something.
I have tested that everything works as expected using the following header file:
#pragma once
__device__ int device_only() {
__syncthreads();
return 0;
}
__host__ int host_only() { return 1; }
__host__ __device__ void check_all_archs() {
#ifdef __CUDA_ARCH__
#if __CUDA_ARCH__ == 750
#pragma message "sm_75"
// host_only();
#elif __CUDA_ARCH__ == 600
#pragma message "sm_60"
// host_only();
#else
#pragma message "other sm"
// host_only();
#endif
#else
#pragma message "host"
// device_only();
#endif
}
__global__ void global() {
int out_device = device_only();
// int host_error = host_only();
}
void f() {
check_all_archs();
global<<<1, 1>>>();
}
@tra I am guessing you have already thought about this, but
one thing which is worth noting is that language servers
only maintain a single AST per file (and this probably won't change).
This is the host AST of course.
For example, in the above program language servers
will only have a diagnostic for `#pragma message "host"`.
So, the values of `--cuda-gpu-arch` aren't relevant for language servers
beyond determining what the preprocessor includes.
However, the fact that the preprocessor includes depend on arch
means that the __clang_cuda_standalone_defs.h approach won't always be perfectly
correct.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D94337/new/
https://reviews.llvm.org/D94337
More information about the cfe-commits
mailing list