[clang] [CUDA][HIP] Fix CTAD for host/device constructors (PR #168711)

Tue Nov 25 19:51:03 PST 2025

https://github.com/yxsamliu updated https://github.com/llvm/llvm-project/pull/168711

>From eaf4aec69be8bebcb5205e6c44ea5c5db6217067 Mon Sep 17 00:00:00 2001
From: "Yaxun (Sam) Liu" <yaxun.liu at amd.com>
Date: Wed, 19 Nov 2025 00:16:46 -0500
Subject: [PATCH] [CUDA][HIP] Fix CTAD for host/device constructors

Clang currently does not allow using CTAD in CUDA/HIP device functions
since deduction guides are treated as host-only. This patch fixes that
by treating deduction guides as host+device. The rationale is that
deduction guides do not actually generate code in IR, and there is an
existing check for device/host correctness for constructors.

The patch also suppresses duplicate implicit deduction guides from
host/device constructors with identical signatures to prevent
ambiguity.

For CUDA/HIP, deduction guides are now always implicitly enabled for
both host and device, which matches nvcc's effective behavior. Unlike
nvcc, which silently ignores explicit CUDA/HIP target attributes on
deduction guides, Clang diagnoses such attributes as errors to keep
the syntax clean and avoid confusion.

This ensures CTAD works correctly in CUDA/HIP for constructors with
different target attributes and provides clearer diagnostics when users
attempt to annotate deduction guides with CUDA/HIP target attributes.

Example:

```
  #include <tuple>

  __host__ __device__ void func()
  {
    std::tuple<int, int> t = std::tuple(1, 1);
  }
```

This compiles with nvcc but fails with clang for CUDA/HIP without this
fix.

Reference: https://godbolt.org/z/WhT1GrhWE

Fixes: https://github.com/ROCm/ROCm/issues/5646

Fixes: https://github.com/llvm/llvm-project/issues/146646
---
 clang/docs/HIPSupport.rst                     | 45 ++++++++++++++++++
 .../clang/Basic/DiagnosticSemaKinds.td        |  3 ++
 clang/lib/Sema/SemaCUDA.cpp                   | 12 +++++
 clang/lib/Sema/SemaDeclAttr.cpp               | 12 +++++
 clang/lib/Sema/SemaTemplateDeductionGuide.cpp | 26 +++++++++-
 clang/test/SemaCUDA/deduction-guide-attrs.cu  | 24 ++++++++++
 clang/test/SemaCUDA/deduction-guide.cu        | 47 +++++++++++++++++++
 7 files changed, 167 insertions(+), 2 deletions(-)
 create mode 100644 clang/test/SemaCUDA/deduction-guide-attrs.cu
 create mode 100644 clang/test/SemaCUDA/deduction-guide.cu

diff --git a/clang/docs/HIPSupport.rst b/clang/docs/HIPSupport.rst
index 92ea07974373e..4c477cc1e1634 100644
--- a/clang/docs/HIPSupport.rst
+++ b/clang/docs/HIPSupport.rst
@@ -376,6 +376,51 @@ Example Usage
       basePtr->virtualFunction(); // Allowed since obj is constructed in device code
    }
 
+C++17 Class Template Argument Deduction (CTAD) Support
+======================================================
+
+Clang supports C++17 Class Template Argument Deduction (CTAD) in both host and device code for HIP.
+This allows you to omit template arguments when creating class template instances, letting the compiler
+deduce them from constructor arguments.
+
+.. code-block:: c++
+
+   #include <tuple>
+
+   __host__ __device__ void func() {
+     std::tuple<int, int> t = std::tuple(1, 1);
+   }
+
+In the above example, ``std::tuple(1, 1)`` automatically deduces the type to be ``std::tuple<int, int>``.
+
+Deduction Guides
+----------------
+
+User-defined deduction guides are also supported. Since deduction guides are not executable code and only
+participate in type deduction, they semantically behave as ``__host__ __device__``. This ensures they are
+available for deduction in both host and device contexts.
+
+.. code-block:: c++
+
+   template <typename T>
+   struct MyType {
+     T value;
+     __device__ MyType(T v) : value(v) {}
+   };
+
+   MyType(float) -> MyType<double>;
+
+   __device__ void deviceFunc() {
+     MyType m(1.0f); // Deduces MyType<double>
+   }
+
+.. note::
+
+   Explicit HIP target attributes such as ``__host__`` or ``__device__``
+   are not allowed on deduction guides. Clang treats all deduction guides
+   as if they were ``__host__ __device__`` and diagnoses any explicit
+   target attributes on them as errors.
+
 Host and Device Attributes of Default Destructors
 ===================================================
 
diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index 53aa86a7dabde..8af54c288b46f 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -2769,6 +2769,9 @@ def err_deduction_guide_name_not_class_template : Error<
   "cannot specify deduction guide for "
   "%select{<error>|function template|variable template|alias template|"
   "template template parameter|concept|dependent template name}0 %1">;
+def err_deduction_guide_target_attr : Error<
+  "deduction guides are implicitly enabled for both host and device in "
+  "CUDA/HIP; explicit CUDA/HIP target attributes are not allowed">;
 def err_deduction_guide_wrong_scope : Error<
   "deduction guide must be declared in the same scope as template %q0">;
 def err_deduction_guide_defines_function : Error<
diff --git a/clang/lib/Sema/SemaCUDA.cpp b/clang/lib/Sema/SemaCUDA.cpp
index 31735a0f5feb3..8d1e03c8bc571 100644
--- a/clang/lib/Sema/SemaCUDA.cpp
+++ b/clang/lib/Sema/SemaCUDA.cpp
@@ -137,6 +137,12 @@ CUDAFunctionTarget SemaCUDA::IdentifyTarget(const FunctionDecl *D,
   if (D == nullptr)
     return CurCUDATargetCtx.Target;
 
+  // C++ deduction guides are never codegen'ed and only participate in template
+  // argument deduction.  Treat them as if they were always host+device so that
+  // CUDA/HIP target checking never rejects their use based solely on target.
+  if (isa<CXXDeductionGuideDecl>(D))
+    return CUDAFunctionTarget::HostDevice;
+
   if (D->hasAttr<CUDAInvalidTargetAttr>())
     return CUDAFunctionTarget::InvalidTarget;
 
@@ -907,6 +913,12 @@ bool SemaCUDA::CheckCall(SourceLocation Loc, FunctionDecl *Callee) {
   if (ExprEvalCtx.isUnevaluated() || ExprEvalCtx.isConstantEvaluated())
     return true;
 
+  // C++ deduction guides participate in overload resolution but are not
+  // callable functions and are never codegen'ed.  Treat them as always
+  // allowed for CUDA/HIP compatibility checking.
+  if (isa<CXXDeductionGuideDecl>(Callee))
+    return true;
+
   // FIXME: Is bailing out early correct here?  Should we instead assume that
   // the caller is a global initializer?
   FunctionDecl *Caller = SemaRef.getCurFunctionDecl(/*AllowLambda=*/true);
diff --git a/clang/lib/Sema/SemaDeclAttr.cpp b/clang/lib/Sema/SemaDeclAttr.cpp
index e3af5023c74d0..90c4581cec5a7 100644
--- a/clang/lib/Sema/SemaDeclAttr.cpp
+++ b/clang/lib/Sema/SemaDeclAttr.cpp
@@ -7984,6 +7984,18 @@ void Sema::ProcessDeclAttributeList(
     }
   }
 
+  // CUDA/HIP: disallow explicit CUDA target attributes on deduction guides.
+  // Deduction guides are not callable functions and never participate in
+  // codegen; they are always treated as host+device for CUDA/HIP semantic
+  // checks, so explicit target attributes on them would be misleading noise.
+  if (getLangOpts().CUDA)
+    if (auto *Guide = dyn_cast<CXXDeductionGuideDecl>(D))
+      if (Guide->hasAttr<CUDAHostAttr>() || Guide->hasAttr<CUDADeviceAttr>() ||
+          Guide->hasAttr<CUDAGlobalAttr>()) {
+        Diag(Guide->getLocation(), diag::err_deduction_guide_target_attr);
+        Guide->setInvalidDecl();
+      }
+
   // Do not permit 'constructor' or 'destructor' attributes on __device__ code.
   if (getLangOpts().CUDAIsDevice && D->hasAttr<CUDADeviceAttr>() &&
       (D->hasAttr<ConstructorAttr>() || D->hasAttr<DestructorAttr>()) &&
diff --git a/clang/lib/Sema/SemaTemplateDeductionGuide.cpp b/clang/lib/Sema/SemaTemplateDeductionGuide.cpp
index bfb10665c25b1..6949cec0dc141 100644
--- a/clang/lib/Sema/SemaTemplateDeductionGuide.cpp
+++ b/clang/lib/Sema/SemaTemplateDeductionGuide.cpp
@@ -218,9 +218,31 @@ buildDeductionGuide(Sema &SemaRef, TemplateDecl *OriginalTemplate,
       TInfo->getTypeLoc().castAs<FunctionProtoTypeLoc>().getParams();
 
   // Build the implicit deduction guide template.
+  QualType GuideType = TInfo->getType();
+
+  // In CUDA/HIP mode, avoid creating duplicate implicit deduction guides with
+  // identical function types. This can happen when there are separate
+  // __host__ and __device__ constructors with the same signature; each would
+  // otherwise synthesize its own implicit deduction guide, leading to
+  // ambiguous CTAD purely due to target attributes. For such cases we keep the
+  // first guide we created and skip building another one.
+  if (IsImplicit && Ctor && SemaRef.getLangOpts().CUDA)
+    for (NamedDecl *Existing : DC->lookup(DeductionGuideName)) {
+      auto *ExistingFT = dyn_cast<FunctionTemplateDecl>(Existing);
+      auto *ExistingGuide =
+          ExistingFT
+              ? dyn_cast<CXXDeductionGuideDecl>(ExistingFT->getTemplatedDecl())
+              : dyn_cast<CXXDeductionGuideDecl>(Existing);
+      if (!ExistingGuide)
+        continue;
+
+      if (SemaRef.Context.hasSameType(ExistingGuide->getType(), GuideType))
+        return Existing;
+    }
+
   auto *Guide = CXXDeductionGuideDecl::Create(
-      SemaRef.Context, DC, LocStart, ES, Name, TInfo->getType(), TInfo, LocEnd,
-      Ctor, DeductionCandidate::Normal, FunctionTrailingRC);
+      SemaRef.Context, DC, LocStart, ES, Name, GuideType, TInfo, LocEnd, Ctor,
+      DeductionCandidate::Normal, FunctionTrailingRC);
   Guide->setImplicit(IsImplicit);
   Guide->setParams(Params);
 
diff --git a/clang/test/SemaCUDA/deduction-guide-attrs.cu b/clang/test/SemaCUDA/deduction-guide-attrs.cu
new file mode 100644
index 0000000000000..6af975874fa99
--- /dev/null
+++ b/clang/test/SemaCUDA/deduction-guide-attrs.cu
@@ -0,0 +1,24 @@
+// RUN: %clang_cc1 -std=c++17 -triple nvptx64-nvidia-cuda -fsyntax-only \
+// RUN:            -fcuda-is-device -verify %s
+// RUN: %clang_cc1 -std=c++17 -triple nvptx64-nvidia-cuda -fsyntax-only \
+// RUN:            -verify %s
+
+#include "Inputs/cuda.h"
+
+template <typename T>
+struct S {
+  __host__ __device__ S(T);
+};
+
+template <typename T>
+S(T) -> S<T>;
+
+// CUDA/HIP target attributes on deduction guides are rejected.
+template <typename U>
+__host__ S(U) -> S<U>;   // expected-error {{deduction guides are implicitly enabled for both host and device in CUDA/HIP; explicit CUDA/HIP target attributes are not allowed}}
+
+template <typename V>
+__device__ S(V) -> S<V>; // expected-error {{deduction guides are implicitly enabled for both host and device in CUDA/HIP; explicit CUDA/HIP target attributes are not allowed}}
+
+template <typename W>
+__global__ S(W) -> S<W>; // expected-error {{deduction guides are implicitly enabled for both host and device in CUDA/HIP; explicit CUDA/HIP target attributes are not allowed}}
diff --git a/clang/test/SemaCUDA/deduction-guide.cu b/clang/test/SemaCUDA/deduction-guide.cu
new file mode 100644
index 0000000000000..30e02f7518053
--- /dev/null
+++ b/clang/test/SemaCUDA/deduction-guide.cu
@@ -0,0 +1,47 @@
+// RUN: %clang_cc1 -std=c++17 -triple nvptx64-nvidia-cuda -fsyntax-only \
+// RUN:            -fcuda-is-device -verify=expected,dev %s
+// RUN: %clang_cc1 -std=c++17 -triple nvptx64-nvidia-cuda -fsyntax-only \
+// RUN:            -verify %s
+
+#include "Inputs/cuda.h"
+
+template <class T>
+struct CTADType { // expected-note 2{{candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 3 were provided}}
+                  // expected-note at -1 2{{candidate constructor (the implicit move constructor) not viable: requires 1 argument, but 3 were provided}}
+  T first;
+  T second;
+
+  CTADType(T x) : first(x), second(x) {} // expected-note 2{{candidate constructor not viable: requires single argument 'x', but 3 arguments were provided}}
+  __device__ CTADType(T x) : first(x), second(x) {} // expected-note 2{{candidate constructor not viable: requires single argument 'x', but 3 arguments were provided}}
+  __host__ __device__ CTADType(T x, T y) : first(x), second(y) {} // expected-note 2{{candidate constructor not viable: requires 2 arguments, but 3 were provided}}
+  CTADType(T x, T y, T z) : first(x), second(z) {} // dev-note {{'CTADType' declared here}}
+                                                   // expected-note at -1 {{candidate constructor not viable: call to __host__ function from __device__ function}}
+                                                   // expected-note at -2 {{candidate constructor not viable: call to __host__ function from __global__ function}}
+};
+
+template <class T>
+CTADType(T, T) -> CTADType<T>;
+
+__host__ __device__ void use_ctad_host_device() {
+  CTADType ctad_from_two_args(1, 1);
+  CTADType ctad_from_one_arg(1);
+  CTADType ctad_from_three_args(1, 2, 3); // dev-error {{reference to __host__ function 'CTADType' in __host__ __device__ function}}
+}
+
+__host__ void use_ctad_host() {
+  CTADType ctad_from_two_args(1, 1);
+  CTADType ctad_from_one_arg(1);
+  CTADType ctad_from_three_args(1, 2, 3);
+}
+
+__device__ void use_ctad_device() {
+  CTADType ctad_from_two_args(1, 1);
+  CTADType ctad_from_one_arg(1);
+  CTADType<int> ctad_from_three_args(1, 2, 3); // expected-error {{no matching constructor for initialization of 'CTADType<int>'}}
+}
+
+__global__ void use_ctad_global() {
+  CTADType ctad_from_two_args(1, 1);
+  CTADType ctad_from_one_arg(1);
+  CTADType<int> ctad_from_three_args(1, 2, 3); // expected-error {{no matching constructor for initialization of 'CTADType<int>'}}
+}