[clang] [clang][CUDA] Avoid ambiguity in host/device template specializations (PR #201049)
Steffen Larsen via cfe-commits
cfe-commits at lists.llvm.org
Tue Jun 2 05:47:59 PDT 2026
https://github.com/steffenlarsen updated https://github.com/llvm/llvm-project/pull/201049
>From b88265cd48b8e363d539bb5c1862634c7fb3e2af Mon Sep 17 00:00:00 2001
From: Steffen Holst Larsen <sholstla at amd.com>
Date: Tue, 2 Jun 2026 02:39:11 -0500
Subject: [PATCH 1/5] [clang][CUDA] Avoid ambiguity in host/device template
specializations
This commit changes SemaOverload to resolve an otherwise diagnosed
ambiguity between addresses of template specializations of functions
that are overloaded for both device and host. Similar to how it works
for non-templated function overloads, this change prioritizes the
specialization that corresponds to the target of the owning function,
i.e. if compiling for host, the address of the host specialization takes
precedence over the device specialization and vice versa.
Fixes https://github.com/llvm/llvm-project/issues/199299
Signed-off-by: Steffen Holst Larsen <sholstla at amd.com>
---
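Note (not part of the commit): the ordering change described above can be
illustrated with a small standalone C++ model. This is only a sketch under
stated assumptions. The types and functions below (Target, Candidate,
eliminateSuboptimalTargetMatches, pickMostSpecialized, resolveOldOrder,
resolveNewOrder) are hypothetical stand-ins and are not Clang's classes or
the actual SemaOverload logic. The model assumes that the template step
cannot rank two equally specialized candidates and therefore reports an
ambiguity whenever more than one candidate is still present.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Toy model only: these types are hypothetical and are not Clang's.
enum class Target { Host, Device };

struct Candidate {
  std::string Name;
  Target Tgt;
};

// Drops candidates for the other target when at least one candidate matches
// the target being compiled for. Loosely mirrors the intent described in the
// commit message, not the real EliminateSuboptimalCudaMatches.
void eliminateSuboptimalTargetMatches(std::vector<Candidate> &Matches,
                                      Target CompilingFor) {
  auto IsPreferred = [&](const Candidate &C) { return C.Tgt == CompilingFor; };
  if (std::none_of(Matches.begin(), Matches.end(), IsPreferred))
    return; // Nothing is preferred, so keep every candidate.
  Matches.erase(
      std::remove_if(Matches.begin(), Matches.end(),
                     [&](const Candidate &C) { return !IsPreferred(C); }),
      Matches.end());
}

// Stand-in for the template step. Assumption: equally specialized candidates
// cannot be ranked, so more than one survivor is an ambiguity, which is
// signalled here by an empty string.
std::string pickMostSpecialized(const std::vector<Candidate> &Matches) {
  assert(!Matches.empty() && "expected at least one match");
  if (Matches.size() > 1)
    return "";
  return Matches.front().Name;
}

// Order before the patch: the template step runs first, so the target filter
// comes too late to prevent the ambiguity.
std::string resolveOldOrder(std::vector<Candidate> Matches,
                            Target CompilingFor) {
  if (Matches.size() > 1 && pickMostSpecialized(Matches).empty())
    return ""; // Ambiguity has already been reported.
  eliminateSuboptimalTargetMatches(Matches, CompilingFor);
  return pickMostSpecialized(Matches);
}

// Order after the patch: the target filter runs first, so only candidates
// for the current target reach the template step.
std::string resolveNewOrder(std::vector<Candidate> Matches,
                            Target CompilingFor) {
  if (Matches.size() > 1)
    eliminateSuboptimalTargetMatches(Matches, CompilingFor);
  return pickMostSpecialized(Matches);
}
```

In this model, a host and a device candidate for overload<int> make
resolveOldOrder report an ambiguity for either target, while resolveNewOrder
returns the host candidate when compiling for host and the device candidate
when compiling for device.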
clang/lib/Sema/SemaOverload.cpp | 6 ++--
clang/test/SemaCUDA/addr-of-overloaded-fn.cu | 2 ++
.../addr-of-overloaded-template-fn.cu | 28 +++++++++++++++++++
3 files changed, 33 insertions(+), 3 deletions(-)
create mode 100644 clang/test/SemaCUDA/addr-of-overloaded-template-fn.cu
diff --git a/clang/lib/Sema/SemaOverload.cpp b/clang/lib/Sema/SemaOverload.cpp
index e11bbd7085798..ecda430c8424a 100644
--- a/clang/lib/Sema/SemaOverload.cpp
+++ b/clang/lib/Sema/SemaOverload.cpp
@@ -13742,6 +13742,9 @@ class AddressOfFunctionResolver {
OvlExpr->copyTemplateArgumentsInto(OvlExplicitTemplateArgs);
if (FindAllFunctionsThatMatchTargetTypeExactly()) {
+ if (Matches.size() > 1 && S.getLangOpts().CUDA)
+ EliminateSuboptimalCudaMatches();
+
// C++ [over.over]p4:
// If more than one function is selected, [...]
if (Matches.size() > 1 && !eliminiateSuboptimalOverloadCandidates()) {
@@ -13752,9 +13755,6 @@ class AddressOfFunctionResolver {
EliminateAllExceptMostSpecializedTemplate();
}
}
-
- if (S.getLangOpts().CUDA && Matches.size() > 1)
- EliminateSuboptimalCudaMatches();
}
bool hasComplained() const { return HasComplained; }
diff --git a/clang/test/SemaCUDA/addr-of-overloaded-fn.cu b/clang/test/SemaCUDA/addr-of-overloaded-fn.cu
index 03c7f7c3bd5b7..d91ee8d80d006 100644
--- a/clang/test/SemaCUDA/addr-of-overloaded-fn.cu
+++ b/clang/test/SemaCUDA/addr-of-overloaded-fn.cu
@@ -2,6 +2,8 @@
// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -fsyntax-only -verify %s
// RUN: %clang_cc1 -triple nvptx64-nvidia-cuda -fsyntax-only -fcuda-is-device -verify %s
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -fsyntax-only -fcuda-is-device -verify %s
+// RUN: %clang_cc1 -triple spirv64-amd-amdhsa -fsyntax-only -fcuda-is-device -verify %s
#include "Inputs/cuda.h"
diff --git a/clang/test/SemaCUDA/addr-of-overloaded-template-fn.cu b/clang/test/SemaCUDA/addr-of-overloaded-template-fn.cu
new file mode 100644
index 0000000000000..16df30fc0f375
--- /dev/null
+++ b/clang/test/SemaCUDA/addr-of-overloaded-template-fn.cu
@@ -0,0 +1,28 @@
+// expected-no-diagnostics
+
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -fsyntax-only -verify %s
+// RUN: %clang_cc1 -triple nvptx64-nvidia-cuda -fsyntax-only -fcuda-is-device -verify %s
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -fsyntax-only -fcuda-is-device -verify %s
+// RUN: %clang_cc1 -triple spirv64-amd-amdhsa -fsyntax-only -fcuda-is-device -verify %s
+
+// Tests that no ambiguities are diagnosed when resolving addresses of
+// specialized template functions with the same overloads on host and device.
+
+#include "Inputs/cuda.h"
+
+template <typename T> __host__ void overload(T) {}
+template <typename T> __device__ void overload(T) {}
+
+__host__ __device__ void test_hd() {
+ void (*x)(int) = overload<int>;
+ void (*y)(float) = overload<float>;
+}
+
+__host__ void test_host() {
+ void (*x)(int) = overload<int>;
+ void (*y)(float) = overload<float>;
+}
+__device__ void test_device() {
+ void (*x)(int) = overload<int>;
+ void (*y)(float) = overload<float>;
+}
>From 8231ee84cc252664953352a43deb130ff9c51c76 Mon Sep 17 00:00:00 2001
From: Steffen Holst Larsen <sholstla at amd.com>
Date: Tue, 2 Jun 2026 04:53:07 -0500
Subject: [PATCH 2/5] Add release note
Signed-off-by: Steffen Holst Larsen <sholstla at amd.com>
---
clang/docs/ReleaseNotes.rst | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index fc0a1d2d4c926..445211745eb14 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -598,6 +598,10 @@ Improvements to Clang's diagnostics
- Clang now rejects inline asm constraints and clobbers that contain an
embedded null character, instead of silently truncating them. (#GH173900)
+- Fixed false positive for host-device ambiguities when retrieving the address
+ of specializations of templated functions that have overloads for both host
+ and device.
+
Improvements to Clang's time-trace
----------------------------------
>From a220fe5f8a313914d899b3779a7083e885f10233 Mon Sep 17 00:00:00 2001
From: Steffen Holst Larsen <sholstla at amd.com>
Date: Tue, 2 Jun 2026 04:54:08 -0500
Subject: [PATCH 3/5] Mention issue in release note
Signed-off-by: Steffen Holst Larsen <sholstla at amd.com>
---
clang/docs/ReleaseNotes.rst | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 445211745eb14..01074c6485ad3 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -600,7 +600,7 @@ Improvements to Clang's diagnostics
- Fixed false positive for host-device ambiguities when retrieving the address
of specializations of templated functions that have overloads for both host
- and device.
+ and device. (#GH199299)
Improvements to Clang's time-trace
----------------------------------
>From 20bdc987bbfcbf78e1c98edcea0ad2e7fc2d2714 Mon Sep 17 00:00:00 2001
From: Steffen Holst Larsen <sholstla at amd.com>
Date: Tue, 2 Jun 2026 04:56:03 -0500
Subject: [PATCH 4/5] Mention CUDA and HIP in the note
Signed-off-by: Steffen Holst Larsen <sholstla at amd.com>
---
clang/docs/ReleaseNotes.rst | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 01074c6485ad3..537e27d55e409 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -598,9 +598,9 @@ Improvements to Clang's diagnostics
- Clang now rejects inline asm constraints and clobbers that contain an
embedded null character, instead of silently truncating them. (#GH173900)
-- Fixed false positive for host-device ambiguities when retrieving the address
- of specializations of templated functions that have overloads for both host
- and device. (#GH199299)
+- Fixed false positive for host-device ambiguities in CUDA/HIP when retrieving
+ the address of specializations of templated functions that have overloads for
+ both host and device. (#GH199299)
Improvements to Clang's time-trace
----------------------------------
>From 5b24bbe564b0ac978cc913e568d65bd1ce5a912a Mon Sep 17 00:00:00 2001
From: Steffen Larsen <sholstla at amd.com>
Date: Tue, 2 Jun 2026 14:47:47 +0200
Subject: [PATCH 5/5] Move release note to CUDA Support section
---
clang/docs/ReleaseNotes.rst | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 537e27d55e409..2fb117ea36043 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -598,10 +598,6 @@ Improvements to Clang's diagnostics
- Clang now rejects inline asm constraints and clobbers that contain an
embedded null character, instead of silently truncating them. (#GH173900)
-- Fixed false positive for host-device ambiguities in CUDA/HIP when retrieving
- the address of specializations of templated functions that have overloads for
- both host and device. (#GH199299)
-
Improvements to Clang's time-trace
----------------------------------
@@ -821,6 +817,10 @@ CUDA/HIP Language Changes
CUDA Support
^^^^^^^^^^^^
+- Fixed a bug where host-device ambiguities were falsely diagnosed in CUDA/HIP
+ when retrieving the address of specializations of templated functions that
+ have overloads for both host and device. (#GH199299)
+
AIX Support
^^^^^^^^^^^