[clang] [Clang] Do not emit intrinsic math functions on GPU targets (PR #98209)

Tue Jul 9 12:51:19 PDT 2024

https://github.com/jhuber6 created https://github.com/llvm/llvm-project/pull/98209

Summary:
Currently, GPU targets get their math functions from wrapper headers that
eagerly replace libcalls with calls to the vendor's math library, e.g.
```
// __clang_cuda_math.h
[[gnu::always_inline]] double sin(double __x) { return __nv_sin(__x); }
```

However, we want to be able to move away from including these headers.
When they are not included, the lack of `errno` on the GPU target allows
the libcalls to be transformed into intrinsic calls. The backend may not
support these intrinsics, see https://godbolt.org/z/oKvTevaE1.

Even when the backend does support these intrinsics, we still want to
emit regular libcalls for now so that LTO linking can replace the calls
before they reach the backend.

This patch changes the logic so that intrinsics are not emitted for the
standard math library functions on NVPTX and AMDGPU targets. This means
that `sin` will not become an intrinsic, but `__builtin_sin` will. A
better long-term solution would be a pass that performs custom lowering
of all of these before LTO linking, if possible.
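
As a rough illustration of the rule described above, the decision can be
modeled as a small standalone function. This is only a sketch and not the
Clang code: `Target`, `isPredefinedLibFunction`, and
`shouldGenerateIntrinsic` are hypothetical stand-ins for the triple checks
and the `BuiltinInfo.isPredefinedLibFunction` query used in the patch, and
the set of library names is limited to the three functions covered by the
new test.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical stand-in for the target triple checks (isNVPTX / isAMDGPU).
enum class Target { Host, NVPTX, AMDGPU };

// Hypothetical stand-in for BuiltinInfo.isPredefinedLibFunction: true for
// plain library names such as "sin", false for spellings like "__builtin_sin".
bool isPredefinedLibFunction(const std::string &name) {
  static const std::unordered_set<std::string> libFunctions = {"sin", "cos",
                                                               "sqrt"};
  return libFunctions.count(name) != 0;
}

// Models the shape of the added check: on GPU targets the earlier
// errno-based decision is overridden, so library names stay libcalls and
// explicit builtins become intrinsics. Other targets keep the old decision.
bool shouldGenerateIntrinsic(Target target, const std::string &name,
                             bool errnoBasedDecision) {
  if (target == Target::NVPTX || target == Target::AMDGPU)
    return !isPredefinedLibFunction(name);
  return errnoBasedDecision;
}
```

Under this model, `shouldGenerateIntrinsic(Target::NVPTX, "sin", true)` is
false while `shouldGenerateIntrinsic(Target::AMDGPU, "__builtin_sin", false)`
is true, which matches the `@sin` versus `@llvm.sin.f64` split checked in
the test file.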


From d6927d897b990c018bd5bac868de5aa406d878ab Mon Sep 17 00:00:00 2001
From: Joseph Huber <huberjn at outlook.com>
Date: Tue, 9 Jul 2024 14:43:55 -0500
Subject: [PATCH] [Clang] Do not emit intrinsic math functions on GPU targets

---
 clang/lib/CodeGen/CGBuiltin.cpp        |  6 +++
 clang/test/CodeGen/gpu-math-libcalls.c | 51 ++++++++++++++++++++++++++
 2 files changed, 57 insertions(+)
 create mode 100644 clang/test/CodeGen/gpu-math-libcalls.c

diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 6cc0d9485720c..89c27147a2bd9 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -2637,6 +2637,12 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
       GenerateIntrinsics =
           ConstWithoutErrnoOrExceptions && ErrnoOverridenToFalseWithOpt;
   }
+  // The GPU targets do not want math intrinsics to reach the backend.
+  // TODO: We should add a custom pass to lower these early enough for LTO.
+  if (getTarget().getTriple().isNVPTX() || getTarget().getTriple().isAMDGPU())
+    GenerateIntrinsics = !getContext().BuiltinInfo.isPredefinedLibFunction(
+        BuiltinIDIfNoAsmLabel);
+
   if (GenerateIntrinsics) {
     switch (BuiltinIDIfNoAsmLabel) {
     case Builtin::BIceil:
diff --git a/clang/test/CodeGen/gpu-math-libcalls.c b/clang/test/CodeGen/gpu-math-libcalls.c
new file mode 100644
index 0000000000000..436ad0384ee2d
--- /dev/null
+++ b/clang/test/CodeGen/gpu-math-libcalls.c
@@ -0,0 +1,51 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 5
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa %s -emit-llvm -o - | FileCheck %s --check-prefix AMDGPU
+// RUN: %clang_cc1 -triple nvptx64-nvidia-cuda %s -emit-llvm -o - | FileCheck %s --check-prefix NVPTX
+
+double sin(double);
+double cos(double);
+double sqrt(double);
+
+// AMDGPU-LABEL: define dso_local void @libcalls(
+// AMDGPU-SAME: ) #[[ATTR0:[0-9]+]] {
+// AMDGPU-NEXT:  [[ENTRY:.*:]]
+// AMDGPU-NEXT:    [[CALL:%.*]] = call double @sin(double noundef 0.000000e+00) #[[ATTR3:[0-9]+]]
+// AMDGPU-NEXT:    [[CALL1:%.*]] = call double @cos(double noundef 0.000000e+00) #[[ATTR3]]
+// AMDGPU-NEXT:    [[CALL2:%.*]] = call double @sqrt(double noundef 0.000000e+00) #[[ATTR3]]
+// AMDGPU-NEXT:    ret void
+//
+// NVPTX-LABEL: define dso_local void @libcalls(
+// NVPTX-SAME: ) #[[ATTR0:[0-9]+]] {
+// NVPTX-NEXT:  [[ENTRY:.*:]]
+// NVPTX-NEXT:    [[CALL:%.*]] = call double @sin(double noundef 0.000000e+00) #[[ATTR3:[0-9]+]]
+// NVPTX-NEXT:    [[CALL1:%.*]] = call double @cos(double noundef 0.000000e+00) #[[ATTR3]]
+// NVPTX-NEXT:    [[CALL2:%.*]] = call double @sqrt(double noundef 0.000000e+00) #[[ATTR3]]
+// NVPTX-NEXT:    ret void
+//
+void libcalls() {
+  (void)sin(0.);
+  (void)cos(0.);
+  (void)sqrt(0.);
+}
+
+// AMDGPU-LABEL: define dso_local void @builtins(
+// AMDGPU-SAME: ) #[[ATTR0]] {
+// AMDGPU-NEXT:  [[ENTRY:.*:]]
+// AMDGPU-NEXT:    [[TMP0:%.*]] = call double @llvm.sin.f64(double 0.000000e+00)
+// AMDGPU-NEXT:    [[TMP1:%.*]] = call double @llvm.cos.f64(double 0.000000e+00)
+// AMDGPU-NEXT:    [[TMP2:%.*]] = call double @llvm.sqrt.f64(double 0.000000e+00)
+// AMDGPU-NEXT:    ret void
+//
+// NVPTX-LABEL: define dso_local void @builtins(
+// NVPTX-SAME: ) #[[ATTR0]] {
+// NVPTX-NEXT:  [[ENTRY:.*:]]
+// NVPTX-NEXT:    [[TMP0:%.*]] = call double @llvm.sin.f64(double 0.000000e+00)
+// NVPTX-NEXT:    [[TMP1:%.*]] = call double @llvm.cos.f64(double 0.000000e+00)
+// NVPTX-NEXT:    [[TMP2:%.*]] = call double @llvm.sqrt.f64(double 0.000000e+00)
+// NVPTX-NEXT:    ret void
+//
+void builtins() {
+  (void)__builtin_sin(0.);
+  (void)__builtin_cos(0.);
+  (void)__builtin_sqrt(0.);
+}