[clang] e719e93 - [CUDA][HIP] Fix CTAD for host/device constructors (#168711)
via cfe-commits
cfe-commits at lists.llvm.org
Tue Dec 2 07:34:53 PST 2025
Author: Yaxun (Sam) Liu
Date: 2025-12-02T10:34:48-05:00
New Revision: e719e93d4157edfad17e9bf40670decc158470c4
URL: https://github.com/llvm/llvm-project/commit/e719e93d4157edfad17e9bf40670decc158470c4
DIFF: https://github.com/llvm/llvm-project/commit/e719e93d4157edfad17e9bf40670decc158470c4.diff
LOG: [CUDA][HIP] Fix CTAD for host/device constructors (#168711)
Clang currently does not allow using CTAD in CUDA/HIP device functions
since deduction guides are treated as host-only. This patch fixes that
by treating deduction guides as host+device. The rationale is that
deduction guides do not actually generate code in IR, and there is an
existing check for device/host correctness for constructors.
The patch also suppresses duplicate implicit deduction guides from
host/device constructors with identical signatures and constraints
to prevent ambiguity.
For CUDA/HIP, deduction guides are now always implicitly enabled for
both host and device, which matches nvcc's effective behavior. Unlike
nvcc, which silently ignores explicit CUDA/HIP target attributes on
deduction guides, Clang diagnoses such attributes as errors to keep
the syntax clean and avoid confusion.
This ensures CTAD works correctly in CUDA/HIP for constructors with
different target attributes and provides clearer diagnostics when users
attempt to annotate deduction guides with CUDA/HIP target attributes.
Example:
```
#include <tuple>
__host__ __device__ void func()
{
std::tuple<int, int> t = std::tuple(1, 1);
}
```
This compiles with nvcc but fails with clang for CUDA/HIP without this
fix.
Reference: https://godbolt.org/z/WhT1GrhWE
Fixes: https://github.com/ROCm/ROCm/issues/5646
Fixes: https://github.com/llvm/llvm-project/issues/146646
Added:
clang/test/SemaCUDA/deduction-guide-attrs.cu
clang/test/SemaCUDA/deduction-guide-overload.cu
clang/test/SemaCUDA/deduction-guide.cu
Modified:
clang/docs/HIPSupport.rst
clang/docs/ReleaseNotes.rst
clang/include/clang/Basic/DiagnosticSemaKinds.td
clang/lib/Sema/SemaCUDA.cpp
clang/lib/Sema/SemaDeclAttr.cpp
clang/lib/Sema/SemaTemplateDeductionGuide.cpp
Removed:
################################################################################
diff --git a/clang/docs/HIPSupport.rst b/clang/docs/HIPSupport.rst
index 6415bc8f248b2..bf0688636640d 100644
--- a/clang/docs/HIPSupport.rst
+++ b/clang/docs/HIPSupport.rst
@@ -412,6 +412,54 @@ Example Usage
__host__ __device__ int Four(void) __attribute__((weak, alias("_Z6__Fourv")));
__host__ __device__ float Four(float f) __attribute__((weak, alias("_Z6__Fourf")));
+C++17 Class Template Argument Deduction (CTAD) Support
+======================================================
+
+Clang supports C++17 Class Template Argument Deduction (CTAD) in both host and
+device code for HIP. This allows you to omit template arguments when creating
+class template instances, letting the compiler deduce them from constructor
+arguments.
+
+.. code-block:: c++
+
+ #include <tuple>
+
+ __host__ __device__ void func() {
+ std::tuple<int, int> t = std::tuple(1, 1);
+ }
+
+In the above example, ``std::tuple(1, 1)`` automatically deduces the type to be
+``std::tuple<int, int>``.
+
+Deduction Guides
+----------------
+
+User-defined deduction guides are also supported. Since deduction guides are not
+executable code and only participate in type deduction, they semantically behave
+as ``__host__ __device__``. This ensures they are available for deduction in both
+host and device contexts, and CTAD continues to respect any constraints on the
+corresponding constructors in the usual C++ way.
+
+.. code-block:: c++
+
+ template <typename T>
+ struct MyType {
+ T value;
+ __device__ MyType(T v) : value(v) {}
+ };
+
+ MyType(float) -> MyType<double>;
+
+ __device__ void deviceFunc() {
+ MyType m(1.0f); // Deduces MyType<double>
+ }
+
+.. note::
+
+ Explicit HIP target attributes such as ``__host__`` or ``__device__``
+ are not allowed on deduction guides. Clang treats all deduction guides
+ as if they were ``__host__ __device__`` and diagnoses any explicit
+ target attributes on them as errors.
Host and Device Attributes of Default Destructors
===================================================
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 3526ffb40f350..8d71280481b9a 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -648,6 +648,20 @@ RISC-V Support
CUDA/HIP Language Changes
^^^^^^^^^^^^^^^^^^^^^^^^^
+- Clang now supports C++17 Class Template Argument Deduction (CTAD) in CUDA/HIP
+ device code by treating deduction guides as if they were ``__host__ __device__``.
+
+- Clang avoids ambiguous CTAD in CUDA/HIP by not synthesizing duplicate implicit
+ deduction guides when ``__host__`` and ``__device__`` constructors
diff er only
+ in CUDA target attributes (same signature and constraints).
+
+- Clang diagnoses CUDA/HIP target attributes written on deduction guides as errors,
+ since deduction guides do not participate in code generation.
+
+- Clang preserves distinct implicit deduction guides for constructors that
diff er
+ by constraints, so constraint-based CTAD works in CUDA/HIP device code as in
+ standard C++.
+
CUDA Support
^^^^^^^^^^^^
diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index 69ed958a2a2aa..cd0b9d09ec58f 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -2769,6 +2769,9 @@ def err_deduction_guide_name_not_class_template : Error<
"cannot specify deduction guide for "
"%select{<error>|function template|variable template|alias template|"
"template template parameter|concept|dependent template name}0 %1">;
+def err_deduction_guide_target_attr : Error<
+ "in CUDA/HIP, target attributes are not allowed on deduction guides; "
+ "deduction guides are implicitly enabled for both host and device">;
def err_deduction_guide_wrong_scope : Error<
"deduction guide must be declared in the same scope as template %q0">;
def err_deduction_guide_defines_function : Error<
diff --git a/clang/lib/Sema/SemaCUDA.cpp b/clang/lib/Sema/SemaCUDA.cpp
index dd9bcab56b083..5df1c3b33a311 100644
--- a/clang/lib/Sema/SemaCUDA.cpp
+++ b/clang/lib/Sema/SemaCUDA.cpp
@@ -215,6 +215,12 @@ CUDAFunctionTarget SemaCUDA::IdentifyTarget(const FunctionDecl *D,
if (D == nullptr)
return CurCUDATargetCtx.Target;
+ // C++ deduction guides are never codegen'ed and only participate in template
+ // argument deduction. Treat them as if they were always host+device so that
+ // CUDA/HIP target checking never rejects their use based solely on target.
+ if (isa<CXXDeductionGuideDecl>(D))
+ return CUDAFunctionTarget::HostDevice;
+
if (D->hasAttr<CUDAInvalidTargetAttr>())
return CUDAFunctionTarget::InvalidTarget;
@@ -986,6 +992,12 @@ bool SemaCUDA::CheckCall(SourceLocation Loc, FunctionDecl *Callee) {
if (ExprEvalCtx.isUnevaluated() || ExprEvalCtx.isConstantEvaluated())
return true;
+ // C++ deduction guides participate in overload resolution but are not
+ // callable functions and are never codegen'ed. Treat them as always
+ // allowed for CUDA/HIP compatibility checking.
+ if (isa<CXXDeductionGuideDecl>(Callee))
+ return true;
+
// FIXME: Is bailing out early correct here? Should we instead assume that
// the caller is a global initializer?
FunctionDecl *Caller = SemaRef.getCurFunctionDecl(/*AllowLambda=*/true);
diff --git a/clang/lib/Sema/SemaDeclAttr.cpp b/clang/lib/Sema/SemaDeclAttr.cpp
index c9d1ee76a2e52..8e7a5f8f07fa5 100644
--- a/clang/lib/Sema/SemaDeclAttr.cpp
+++ b/clang/lib/Sema/SemaDeclAttr.cpp
@@ -7987,6 +7987,19 @@ void Sema::ProcessDeclAttributeList(
}
}
+ // CUDA/HIP: disallow explicit CUDA target attributes on deduction guides.
+ // Deduction guides are not callable functions and never participate in
+ // codegen; they are always treated as host+device for CUDA/HIP semantic
+ // checks, so explicit target attributes on them would be misleading.
+ if (getLangOpts().CUDA)
+ if (auto *Guide = dyn_cast<CXXDeductionGuideDecl>(D);
+ Guide &&
+ (Guide->hasAttr<CUDAHostAttr>() || Guide->hasAttr<CUDADeviceAttr>() ||
+ Guide->hasAttr<CUDAGlobalAttr>())) {
+ Diag(Guide->getLocation(), diag::err_deduction_guide_target_attr);
+ Guide->setInvalidDecl();
+ }
+
// Do not permit 'constructor' or 'destructor' attributes on __device__ code.
if (getLangOpts().CUDAIsDevice && D->hasAttr<CUDADeviceAttr>() &&
(D->hasAttr<ConstructorAttr>() || D->hasAttr<DestructorAttr>()) &&
diff --git a/clang/lib/Sema/SemaTemplateDeductionGuide.cpp b/clang/lib/Sema/SemaTemplateDeductionGuide.cpp
index bfb10665c25b1..ccac3d9ba0a72 100644
--- a/clang/lib/Sema/SemaTemplateDeductionGuide.cpp
+++ b/clang/lib/Sema/SemaTemplateDeductionGuide.cpp
@@ -54,6 +54,26 @@ using namespace clang;
using namespace sema;
namespace {
+
+/// Return true if two associated-constraint sets are semantically equal.
+static bool HaveSameAssociatedConstraints(
+ Sema &SemaRef, const NamedDecl *Old, ArrayRef<AssociatedConstraint> OldACs,
+ const NamedDecl *New, ArrayRef<AssociatedConstraint> NewACs) {
+ if (OldACs.size() != NewACs.size())
+ return false;
+ if (OldACs.empty())
+ return true;
+
+ // General case: pairwise compare each associated constraint expression.
+ Sema::TemplateCompareNewDeclInfo NewInfo(New);
+ for (size_t I = 0, E = OldACs.size(); I != E; ++I)
+ if (!SemaRef.AreConstraintExpressionsEqual(
+ Old, OldACs[I].ConstraintExpr, NewInfo, NewACs[I].ConstraintExpr))
+ return false;
+
+ return true;
+}
+
/// Tree transform to "extract" a transformed type from a class template's
/// constructor to a deduction guide.
class ExtractTypeForDeductionGuide
@@ -218,9 +238,51 @@ buildDeductionGuide(Sema &SemaRef, TemplateDecl *OriginalTemplate,
TInfo->getTypeLoc().castAs<FunctionProtoTypeLoc>().getParams();
// Build the implicit deduction guide template.
+ QualType GuideType = TInfo->getType();
+
+ // In CUDA/HIP mode, avoid duplicate implicit guides that
diff er only in CUDA
+ // target attributes (same constructor signature and constraints).
+ if (IsImplicit && Ctor && SemaRef.getLangOpts().CUDA) {
+ SmallVector<AssociatedConstraint, 4> NewACs;
+ Ctor->getAssociatedConstraints(NewACs);
+
+ for (NamedDecl *Existing : DC->lookup(DeductionGuideName)) {
+ auto *ExistingFT = dyn_cast<FunctionTemplateDecl>(Existing);
+ auto *ExistingGuide =
+ ExistingFT
+ ? dyn_cast<CXXDeductionGuideDecl>(ExistingFT->getTemplatedDecl())
+ : dyn_cast<CXXDeductionGuideDecl>(Existing);
+ if (!ExistingGuide)
+ continue;
+
+ // Only consider guides that were also synthesized from a constructor.
+ auto *ExistingCtor = ExistingGuide->getCorrespondingConstructor();
+ if (!ExistingCtor)
+ continue;
+
+ // If the underlying constructors are overloads (
diff erent signatures once
+ // CUDA attributes are ignored), they should each get their own guides.
+ if (SemaRef.IsOverload(Ctor, ExistingCtor,
+ /*UseMemberUsingDeclRules=*/false,
+ /*ConsiderCudaAttrs=*/false))
+ continue;
+
+ // At this point, the constructors have the same signature ignoring CUDA
+ // attributes. Decide whether their associated constraints are also the
+ // same; only in that case do we treat one guide as a duplicate of the
+ // other.
+ SmallVector<AssociatedConstraint, 4> ExistingACs;
+ ExistingCtor->getAssociatedConstraints(ExistingACs);
+
+ if (HaveSameAssociatedConstraints(SemaRef, ExistingCtor, ExistingACs,
+ Ctor, NewACs))
+ return Existing;
+ }
+ }
+
auto *Guide = CXXDeductionGuideDecl::Create(
- SemaRef.Context, DC, LocStart, ES, Name, TInfo->getType(), TInfo, LocEnd,
- Ctor, DeductionCandidate::Normal, FunctionTrailingRC);
+ SemaRef.Context, DC, LocStart, ES, Name, GuideType, TInfo, LocEnd, Ctor,
+ DeductionCandidate::Normal, FunctionTrailingRC);
Guide->setImplicit(IsImplicit);
Guide->setParams(Params);
diff --git a/clang/test/SemaCUDA/deduction-guide-attrs.cu b/clang/test/SemaCUDA/deduction-guide-attrs.cu
new file mode 100644
index 0000000000000..c706a013a5eb8
--- /dev/null
+++ b/clang/test/SemaCUDA/deduction-guide-attrs.cu
@@ -0,0 +1,24 @@
+// RUN: %clang_cc1 -std=c++17 -triple nvptx64-nvidia-cuda -fsyntax-only \
+// RUN: -fcuda-is-device -verify %s
+// RUN: %clang_cc1 -std=c++17 -triple nvptx64-nvidia-cuda -fsyntax-only \
+// RUN: -verify %s
+
+#include "Inputs/cuda.h"
+
+template <typename T>
+struct S {
+ __host__ __device__ S(T);
+};
+
+template <typename T>
+S(T) -> S<T>;
+
+// CUDA/HIP target attributes on deduction guides are rejected.
+template <typename U>
+__host__ S(U) -> S<U>; // expected-error {{in CUDA/HIP, target attributes are not allowed on deduction guides; deduction guides are implicitly enabled for both host and device}}
+
+template <typename V>
+__device__ S(V) -> S<V>; // expected-error {{in CUDA/HIP, target attributes are not allowed on deduction guides; deduction guides are implicitly enabled for both host and device}}
+
+template <typename W>
+__global__ S(W) -> S<W>; // expected-error {{in CUDA/HIP, target attributes are not allowed on deduction guides; deduction guides are implicitly enabled for both host and device}}
diff --git a/clang/test/SemaCUDA/deduction-guide-overload.cu b/clang/test/SemaCUDA/deduction-guide-overload.cu
new file mode 100644
index 0000000000000..935f6395692a1
--- /dev/null
+++ b/clang/test/SemaCUDA/deduction-guide-overload.cu
@@ -0,0 +1,111 @@
+// RUN: %clang_cc1 -std=c++20 -triple nvptx64-nvidia-cuda -fsyntax-only \
+// RUN: -fcuda-is-device -verify %s
+// RUN: %clang_cc1 -std=c++20 -triple nvptx64-nvidia-cuda -fsyntax-only \
+// RUN: -verify %s
+// expected-no-diagnostics
+
+#include "Inputs/cuda.h"
+
+// This test exercises class template argument deduction (CTAD) when there are
+// multiple constructors that
diff er only by constraints. In CUDA/HIP mode, the
+// implementation must *not* collapse implicit deduction guides that have the
+// same function type but
diff erent constraints; otherwise, CTAD can lose viable
+// candidates.
+
+template <typename T>
+concept Signed = __is_signed(T);
+
+template <typename T>
+concept NotSigned = !Signed<T>;
+
+// 1) Constrained ctors with
diff erent constraints: ensure we keep
+// deduction guides that
diff er only by constraints.
+
+template <typename T>
+struct OverloadCTAD {
+ __host__ __device__ OverloadCTAD(T) requires Signed<T>;
+ __host__ __device__ OverloadCTAD(T) requires NotSigned<T>;
+};
+
+__host__ __device__ void use_overload_ctad_hd() {
+ OverloadCTAD a(1); // T = int, uses Signed-constrained guide
+ OverloadCTAD b(1u); // T = unsigned int, uses NotSigned-constrained guide
+}
+
+__device__ void use_overload_ctad_dev() {
+ OverloadCTAD c(1);
+ OverloadCTAD d(1u);
+}
+
+__global__ void use_overload_ctad_global() {
+ OverloadCTAD e(1);
+ OverloadCTAD f(1u);
+}
+
+// 2) Add a pair of constructors that have the same signature and the same
+// constraint but
diff er only by CUDA target attributes. This exercises the
+// case where two implicit deduction guides would be identical except for
+// their originating constructor's CUDA target.
+
+template <typename T>
+struct OverloadCTADTargets {
+ __host__ OverloadCTADTargets(T) requires Signed<T>;
+ __device__ OverloadCTADTargets(T) requires Signed<T>;
+};
+
+__host__ void use_overload_ctad_targets_host() {
+ OverloadCTADTargets g(1);
+}
+
+__device__ void use_overload_ctad_targets_device() {
+ OverloadCTADTargets h(1);
+}
+
+// 3) Unconstrained host/device duplicates: identical signatures and no
+// constraints,
diff ering only by CUDA target attributes.
+
+template <typename T>
+struct UnconstrainedHD {
+ __host__ UnconstrainedHD(T);
+ __device__ UnconstrainedHD(T);
+};
+
+__host__ __device__ void use_unconstrained_hd_hd() {
+ UnconstrainedHD u1(1);
+}
+
+__device__ void use_unconstrained_hd_dev() {
+ UnconstrainedHD u2(1);
+}
+
+__global__ void use_unconstrained_hd_global() {
+ UnconstrainedHD u3(1);
+}
+
+// 4) Constrained vs unconstrained ctors with the same signature: guides
+// must not be collapsed away when constraints
diff er.
+
+template <typename T>
+concept IsInt = __is_same(T, int);
+
+template <typename T>
+struct ConstrainedVsUnconstrained {
+ __host__ __device__ ConstrainedVsUnconstrained(T);
+ __host__ __device__ ConstrainedVsUnconstrained(T) requires IsInt<T>;
+};
+
+__host__ __device__ void use_constrained_vs_unconstrained_hd() {
+ ConstrainedVsUnconstrained a(1); // T = int, constrained guide viable
+ ConstrainedVsUnconstrained b(1u); // T = unsigned, only unconstrained guide
+}
+
+__device__ void use_constrained_vs_unconstrained_dev() {
+ ConstrainedVsUnconstrained c(1);
+ ConstrainedVsUnconstrained d(1u);
+}
+
+__global__ void use_constrained_vs_unconstrained_global() {
+ ConstrainedVsUnconstrained e(1);
+ ConstrainedVsUnconstrained f(1u);
+}
+
diff --git a/clang/test/SemaCUDA/deduction-guide.cu b/clang/test/SemaCUDA/deduction-guide.cu
new file mode 100644
index 0000000000000..30e02f7518053
--- /dev/null
+++ b/clang/test/SemaCUDA/deduction-guide.cu
@@ -0,0 +1,47 @@
+// RUN: %clang_cc1 -std=c++17 -triple nvptx64-nvidia-cuda -fsyntax-only \
+// RUN: -fcuda-is-device -verify=expected,dev %s
+// RUN: %clang_cc1 -std=c++17 -triple nvptx64-nvidia-cuda -fsyntax-only \
+// RUN: -verify %s
+
+#include "Inputs/cuda.h"
+
+template <class T>
+struct CTADType { // expected-note 2{{candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 3 were provided}}
+ // expected-note at -1 2{{candidate constructor (the implicit move constructor) not viable: requires 1 argument, but 3 were provided}}
+ T first;
+ T second;
+
+ CTADType(T x) : first(x), second(x) {} // expected-note 2{{candidate constructor not viable: requires single argument 'x', but 3 arguments were provided}}
+ __device__ CTADType(T x) : first(x), second(x) {} // expected-note 2{{candidate constructor not viable: requires single argument 'x', but 3 arguments were provided}}
+ __host__ __device__ CTADType(T x, T y) : first(x), second(y) {} // expected-note 2{{candidate constructor not viable: requires 2 arguments, but 3 were provided}}
+ CTADType(T x, T y, T z) : first(x), second(z) {} // dev-note {{'CTADType' declared here}}
+ // expected-note at -1 {{candidate constructor not viable: call to __host__ function from __device__ function}}
+ // expected-note at -2 {{candidate constructor not viable: call to __host__ function from __global__ function}}
+};
+
+template <class T>
+CTADType(T, T) -> CTADType<T>;
+
+__host__ __device__ void use_ctad_host_device() {
+ CTADType ctad_from_two_args(1, 1);
+ CTADType ctad_from_one_arg(1);
+ CTADType ctad_from_three_args(1, 2, 3); // dev-error {{reference to __host__ function 'CTADType' in __host__ __device__ function}}
+}
+
+__host__ void use_ctad_host() {
+ CTADType ctad_from_two_args(1, 1);
+ CTADType ctad_from_one_arg(1);
+ CTADType ctad_from_three_args(1, 2, 3);
+}
+
+__device__ void use_ctad_device() {
+ CTADType ctad_from_two_args(1, 1);
+ CTADType ctad_from_one_arg(1);
+ CTADType<int> ctad_from_three_args(1, 2, 3); // expected-error {{no matching constructor for initialization of 'CTADType<int>'}}
+}
+
+__global__ void use_ctad_global() {
+ CTADType ctad_from_two_args(1, 1);
+ CTADType ctad_from_one_arg(1);
+ CTADType<int> ctad_from_three_args(1, 2, 3); // expected-error {{no matching constructor for initialization of 'CTADType<int>'}}
+}
More information about the cfe-commits
mailing list