[libc-commits] [libc] 1fce1d3 - [libc] Use `nvptx_kernel` attribute in NVPTX startup code

Fri Mar 24 12:47:09 PDT 2023

Author: Joseph Huber
Date: 2023-03-24T14:46:26-05:00
New Revision: 1fce1d341b17762bb45bdc89520b00820fd63337

URL: https://github.com/llvm/llvm-project/commit/1fce1d341b17762bb45bdc89520b00820fd63337
DIFF: https://github.com/llvm/llvm-project/commit/1fce1d341b17762bb45bdc89520b00820fd63337.diff

LOG: [libc] Use `nvptx_kernel` attribute in NVPTX startup code

Summary:
A recent patch made it possible to emit a callable kernel directly from
freestanding NVPTX code, so the startup code no longer needs to be compiled
as CUDA. This has several advantages; in particular, it works around an
entire assortment of errors I was seeing while implementing RPC for Nvidia.
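For illustration only, here is a minimal sketch of what "a callable kernel from freestanding NVPTX code" can look like in plain C++. This is not the libc startup code itself. It assumes Clang, which predefines `__NVPTX__` when targeting NVPTX. The `KERNEL` macro and the `fake_main`, `entry`, and `run_threads` names are hypothetical. The entry point folds each thread's exit status into a shared `int` with `__atomic_fetch_or`, in the same way `_start` folds in `main`'s result. On a host build the attribute expands to nothing and `std::thread`s stand in for GPU threads so the folding can be observed.

```cpp
// Hypothetical macro: when Clang targets NVPTX it predefines __NVPTX__, and the
// `nvptx_kernel` attribute marks a function as a kernel entry point.
// On any other target the macro expands to nothing.
#if defined(__NVPTX__)
#define KERNEL __attribute__((nvptx_kernel))
#else
#define KERNEL
#endif

// Hypothetical stand-in for the program's `main`: "fails" when argc == 3.
static int fake_main(int argc) { return argc == 3 ? 1 : 0; }

// Same shape as `_start`: every thread runs the program and ORs its exit
// status into the shared result, so any nonzero status is preserved.
extern "C" KERNEL void entry(int argc, int *ret) {
  __atomic_fetch_or(ret, fake_main(argc), __ATOMIC_RELAXED);
}

#if !defined(__NVPTX__)
#include <cassert>
#include <thread>
#include <vector>

// Host-only harness: one std::thread per simulated GPU thread.
int run_threads(const std::vector<int> &argcs) {
  assert(!argcs.empty() && "need at least one simulated thread");
  int ret = 0;
  std::vector<std::thread> threads;
  for (int argc : argcs)
    threads.emplace_back(entry, argc, &ret);
  for (std::thread &t : threads)
    t.join(); // join() makes the threads' atomic updates visible here
  return ret;
}
#endif
```

With this harness, `run_threads({1, 2, 4})` returns 0 and `run_threads({1, 3, 4})` returns 1. Because the statuses are combined with a bitwise OR, the final value only reliably tells the host whether some thread returned nonzero; distinct nonzero codes from different threads merge their bits. A device build would use flags along the lines of the CMake change below, for example `--target=nvptx64-nvidia-cuda -march=sm_70 -ffreestanding -nogpulib`. The triple and architecture there are example values only; the libc build takes them from `LIBC_GPU_TARGET_TRIPLE` and `LIBC_GPU_TARGET_ARCHITECTURE`.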

Added: 
    

Modified: 
    libc/startup/gpu/nvptx/CMakeLists.txt
    libc/startup/gpu/nvptx/start.cpp

Removed: 
    


################################################################################
diff  --git a/libc/startup/gpu/nvptx/CMakeLists.txt b/libc/startup/gpu/nvptx/CMakeLists.txt
index 96ab7540cedb1..1ee2108b0ef29 100644
--- a/libc/startup/gpu/nvptx/CMakeLists.txt
+++ b/libc/startup/gpu/nvptx/CMakeLists.txt
@@ -6,11 +6,8 @@ add_startup_object(
     -ffreestanding # To avoid compiler warnings about calling the main function.
     -fno-builtin
     -nogpulib # Do not include any GPU vendor libraries.
-    -nostdinc
-    -x cuda # Use the CUDA toolchain to emit the `_start` kernel.
-    -fgpu-rdc # Emit relocatable device code from CUDA.
-    --offload-device-only
-    --offload-arch=${LIBC_GPU_TARGET_ARCHITECTURE}
+    -march=${LIBC_GPU_TARGET_ARCHITECTURE}
+    --target=${LIBC_GPU_TARGET_TRIPLE}
   NO_GPU_BUNDLE # Compile this file directly without special GPU handling.
 )
 get_fq_target_name(crt1 fq_name)

diff  --git a/libc/startup/gpu/nvptx/start.cpp b/libc/startup/gpu/nvptx/start.cpp
index cf4077c3d9edd..1e7f4ca7668c0 100644
--- a/libc/startup/gpu/nvptx/start.cpp
+++ b/libc/startup/gpu/nvptx/start.cpp
@@ -6,10 +6,9 @@
 //
 //===----------------------------------------------------------------------===//
 
-extern "C" __attribute__((device)) int main(int argc, char **argv, char **envp);
+extern "C" int main(int argc, char **argv, char **envp);
 
-// TODO: We shouldn't need to use the CUDA language to emit a kernel for NVPTX.
-extern "C" [[gnu::visibility("protected")]] __attribute__((global)) void
+extern "C" [[gnu::visibility("protected")]] __attribute__((nvptx_kernel)) void
 _start(int argc, char **argv, char **envp, int *ret, void *in, void *out,
        void *buffer) {
   __atomic_fetch_or(ret, main(argc, argv, envp), __ATOMIC_RELAXED);