[clang] [HIP] fix host-used external kernel (PR #83870)
Yaxun Liu via cfe-commits
cfe-commits at lists.llvm.org
Mon Mar 4 15:22:05 PST 2024
https://github.com/yxsamliu updated https://github.com/llvm/llvm-project/pull/83870
>From dc94bb78adb323a539d195b791e50cf69c774246 Mon Sep 17 00:00:00 2001
From: "Yaxun (Sam) Liu" <yaxun.liu at amd.com>
Date: Mon, 4 Mar 2024 11:38:06 -0500
Subject: [PATCH] [HIP] fix host-used external kernel
In -fgpu-rdc mode, when an external kernel is used by a host function
with weak_odr linkage (e.g. explicitly instantiated template function),
the kernel should not be marked as host-used external kernel, since
the host function may be dropped by the linker. Mark the external
kernel as host-used external kernel will force a reference to
the external kernel, which the user may not define in other TU.
Fixes: https://github.com/llvm/llvm-project/issues/83771
---
clang/lib/Sema/SemaCUDA.cpp | 4 +++-
clang/lib/Sema/SemaExpr.cpp | 5 ++++-
clang/test/CodeGenCUDA/host-used-extern.cu | 13 +++++++++++++
3 files changed, 20 insertions(+), 2 deletions(-)
diff --git a/clang/lib/Sema/SemaCUDA.cpp b/clang/lib/Sema/SemaCUDA.cpp
index 6a66ecf6f94c17..0acfb0d564694f 100644
--- a/clang/lib/Sema/SemaCUDA.cpp
+++ b/clang/lib/Sema/SemaCUDA.cpp
@@ -895,7 +895,9 @@ bool Sema::CheckCUDACall(SourceLocation Loc, FunctionDecl *Callee) {
if (DiagKind == SemaDiagnosticBuilder::K_Nop) {
// For -fgpu-rdc, keep track of external kernels used by host functions.
if (LangOpts.CUDAIsDevice && LangOpts.GPURelocatableDeviceCode &&
- Callee->hasAttr<CUDAGlobalAttr>() && !Callee->isDefined())
+ Callee->hasAttr<CUDAGlobalAttr>() && !Callee->isDefined() &&
+ !Caller->getDescribedFunctionTemplate() &&
+ getASTContext().GetGVALinkageForFunction(Caller) == GVA_StrongExternal)
getASTContext().CUDAExternalDeviceDeclODRUsedByHost.insert(Callee);
return true;
}
diff --git a/clang/lib/Sema/SemaExpr.cpp b/clang/lib/Sema/SemaExpr.cpp
index 0a449fc1082bd4..299f066a084c05 100644
--- a/clang/lib/Sema/SemaExpr.cpp
+++ b/clang/lib/Sema/SemaExpr.cpp
@@ -19211,7 +19211,10 @@ MarkVarDeclODRUsed(ValueDecl *V, SourceLocation Loc, Sema &SemaRef,
} else if (VarTarget == Sema::CVT_Device &&
!Var->hasAttr<CUDASharedAttr>() &&
(UserTarget == Sema::CFT_Host ||
- UserTarget == Sema::CFT_HostDevice)) {
+ UserTarget == Sema::CFT_HostDevice) &&
+ !FD->getDescribedFunctionTemplate() &&
+ SemaRef.getASTContext().GetGVALinkageForFunction(FD) ==
+ GVA_StrongExternal) {
// Record a CUDA/HIP device side variable if it is ODR-used
// by host code. This is done conservatively, when the variable is
// referenced in any of the following contexts:
diff --git a/clang/test/CodeGenCUDA/host-used-extern.cu b/clang/test/CodeGenCUDA/host-used-extern.cu
index e8f8e12aad47d1..b82b2d65aa741f 100644
--- a/clang/test/CodeGenCUDA/host-used-extern.cu
+++ b/clang/test/CodeGenCUDA/host-used-extern.cu
@@ -24,6 +24,7 @@
// NEG-NOT: @__clang_gpu_used_external = {{.*}} @_Z7kernel2v
// NEG-NOT: @__clang_gpu_used_external = {{.*}} @_Z7kernel3v
+// NEG-NOT: @__clang_gpu_used_external = {{.*}} @_Z7kernel5v
// NEG-NOT: @__clang_gpu_used_external = {{.*}} @var2
// NEG-NOT: @__clang_gpu_used_external = {{.*}} @var3
// NEG-NOT: @__clang_gpu_used_external = {{.*}} @ext_shvar
@@ -44,6 +45,10 @@ __global__ void kernel3();
// kernel4 is marked as used even though it is not called.
__global__ void kernel4();
+// kernel5 is not marked as used since it is called by host function
+// with linkonce_odr linkage, which may be dropped by linker.
+__global__ void kernel5();
+
extern __device__ int var1;
__device__ int var2;
@@ -67,3 +72,11 @@ __global__ void test_lambda_using_extern_shared() {
};
lambda();
}
+
+template<class T>
+void template_caller() {
+ kernel5<<<1, 1>>>();
+ var1 = 1;
+}
+
+template void template_caller<int>();
More information about the cfe-commits
mailing list