[clang] [llvm] [NVPTX][Draft] Make `__nvvm_nanosleep` a no-op if unsupported (PR #81033)

Wed Feb 7 12:12:35 PST 2024

https://github.com/jhuber6 created https://github.com/llvm/llvm-project/pull/81033

Summary:
The LLVM C library currently uses `nanosleep` in the RPC interface and
to implement the C library `nanosleep` function. We currently build the
LLVM C library separately for every NVPTX architecture, which is not
ideal. The goal is to make the LLVM-IR target-independent; unfortunately,
the one snag is the `nanosleep` function, which will crash if used on a
GPU older than sm_70. There are three possible solutions to this.

1. Use `__nvvm_reflect(__CUDA_ARCH__)` like the libdevice functions.
   This only works when optimizations are enabled, which is not ideal.
2. Remove the use of `nanosleep` from `libc`. This isn't ideal, as
   sleeping during busy-wait loops helps with thread scheduling, and
   it would prevent us from providing `nanosleep` as a C library
   function.
3. This patch, which makes the intrinsic legal on all architectures
   but turns it into a no-op on targets older than sm_70.
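Option 1 follows the libdevice approach of branching on the reflected architecture. A minimal sketch of a backoff helper for a busy-wait loop written in that style is below. `sleep_briefly`, `wait_for`, the 64 ns interval, and the host fallback are illustrative assumptions, not code taken from `libc`, and the reflect query is written as `__nvvm_reflect("__CUDA_ARCH")`, assumed to return 700 on sm_70. The NVPTX branch only disappears on older GPUs if the optimizer folds the query to a constant and deletes the dead branch, which is why this option depends on optimizations.

```cpp
#include <atomic>
#include <cstdint>
#if !defined(__NVPTX__)
#include <thread>
#endif

// Hypothetical backoff helper in the style of option 1 (not libc's code).
inline void sleep_briefly() {
#if defined(__NVPTX__)
  // With optimizations, the query is folded to a constant (assumed to be
  // 700 for sm_70) and the unused branch is removed. Without them, the
  // `nanosleep` instruction can still reach ptxas for older GPUs.
  if (__nvvm_reflect("__CUDA_ARCH") >= 700)
    asm volatile("nanosleep.u32 64;" ::: "memory");
#else
  // Host fallback so the sketch also compiles and runs off the GPU.
  std::this_thread::yield();
#endif
}

// Spin until `flag` becomes true, backing off between polls.
// Returns how many polls observed `false`.
inline std::uint64_t wait_for(const std::atomic<bool> &flag) {
  std::uint64_t polls = 0;
  while (!flag.load(std::memory_order_acquire)) {
    ++polls;
    sleep_briefly();
  }
  return polls;
}
```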

This is a draft to ask whether this is an acceptable hack, since an
intrinsic that silently does nothing is not always a good idea. A new
intrinsic could be added instead, but there is also a desire to have
intrinsics map 1-to-1 to hardware instructions.


From 10447352c68c666c51cfba7d84a06cb23327bc8a Mon Sep 17 00:00:00 2001
From: Joseph Huber <huberjn at outlook.com>
Date: Wed, 7 Feb 2024 14:03:00 -0600
Subject: [PATCH] [NVPTX][Draft] Make `__nvvm_nanosleep` a no-op if unsupported

---
 clang/include/clang/Basic/BuiltinsNVPTX.def | 2 +-
 llvm/lib/Target/NVPTX/NVPTXIntrinsics.td    | 9 +++++++++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/clang/include/clang/Basic/BuiltinsNVPTX.def b/clang/include/clang/Basic/BuiltinsNVPTX.def
index 7819e71d7fe2aa..5fd17a1f5b8552 100644
--- a/clang/include/clang/Basic/BuiltinsNVPTX.def
+++ b/clang/include/clang/Basic/BuiltinsNVPTX.def
@@ -159,7 +159,7 @@ BUILTIN(__nvvm_read_ptx_sreg_pm3, "i", "n")
 
 BUILTIN(__nvvm_prmt, "UiUiUiUi", "")
 BUILTIN(__nvvm_exit, "v", "r")
-TARGET_BUILTIN(__nvvm_nanosleep, "vUi", "n", AND(SM_70, PTX63))
+TARGET_BUILTIN(__nvvm_nanosleep, "vUi", "n", PTX63)
 
 // Min Max
 
diff --git a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
index 2330d7213c26dc..fd786a12c78eba 100644
--- a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
+++ b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
@@ -646,6 +646,15 @@ def INT_NVVM_NANOSLEEP_I : NVPTXInst<(outs), (ins i32imm:$i), "nanosleep.u32 \t$
 def INT_NVVM_NANOSLEEP_R : NVPTXInst<(outs), (ins Int32Regs:$i), "nanosleep.u32 \t$i;",
                              [(int_nvvm_nanosleep Int32Regs:$i)]>,
         Requires<[hasPTX<63>, hasSM<70>]>;
+
+// Make 'nanosleep' a no-op on older architectures.
+def INT_NVVM_NANOSLEEP_I_NOOP : NVPTXInst<(outs), (ins i32imm:$i), "/* no-op */",
+                             [(int_nvvm_nanosleep imm:$i)]>,
+        Requires<[hasPTX<63>]>;
+def INT_NVVM_NANOSLEEP_R_NOOP : NVPTXInst<(outs), (ins Int32Regs:$i), "/* no-op */",
+                             [(int_nvvm_nanosleep Int32Regs:$i)]>,
+        Requires<[hasPTX<63>]>;
+
 //
 // Min Max
 //
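The intended effect of the two hunks can be summarized with a toy model. This is a hypothetical sketch, not how TableGen instruction selection actually works: `Target`, `lower_nanosleep`, and the assumption that the `hasSM<70>` definitions take priority over the new `*_NOOP` definitions whenever both apply are all illustrative.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical compilation target, e.g. {70, 63} for sm_70 with PTX ISA 6.3.
struct Target {
  int sm;   // SM version (70 means sm_70)
  int ptx;  // PTX ISA version (63 means 6.3)
};

// Toy model of the lowering this draft aims for. It assumes the hasSM<70>
// definitions win whenever they and the *_NOOP definitions both apply.
std::optional<std::string> lower_nanosleep(Target t, std::uint32_t ns) {
  if (t.ptx < 63)
    return std::nullopt;  // still unselectable: every definition needs PTX 6.3
  if (t.sm >= 70)
    return "nanosleep.u32 \t" + std::to_string(ns) + ";";  // real instruction
  return "/* no-op */";  // pre-sm_70 targets get the no-op definition
}
```

Under this model the only remaining requirement for the intrinsic is PTX ISA 6.3, which matches the relaxed `TARGET_BUILTIN` feature string in `BuiltinsNVPTX.def`.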