[libcxx-commits] [clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

Mon Aug 18 23:19:10 PDT 2025

https://github.com/snarkmaster updated https://github.com/llvm/llvm-project/pull/152623

From 9fc3169ea5f1aea2a88b2616b7c9c4f2949139be Mon Sep 17 00:00:00 2001
From: Alexey <snarkmaster at gmail.com>
Date: Thu, 7 Aug 2025 12:10:07 -0700
Subject: [PATCH 01/15] Elide suspension points via
 [[clang::coro_await_suspend_destroy]]

Start by reading the detailed user-facing docs in `AttrDocs.td`.

My immediate motivation was noticing that short-circuiting coroutines fail to
optimize well.  You can interact with the demo program here:
https://godbolt.org/z/E3YK5c45a

If Clang on Compiler Explorer supported [[clang::coro_await_suspend_destroy]],
the assembly for `simple_coro` would be drastically shorter, and would not
contain a call to `operator new`.

Here are a few high-level thoughts that don't belong in the docs:

  - This has `lit` tests, but what gives me real confidence in its correctness
    is the integration test in `coro_await_suspend_destroy_test.cpp`.  This
    caught all the interesting bugs that I had in earlier revs, and covers
    equivalence to the standard code path in far more scenarios.

  - I considered a variety of other designs. Here are some key design points:

    * I considered optimizing unmodified `await_suspend()` methods, as long as
      they unconditionally end with an `h.destroy()` call on the current
      handle, or an exception.  However, this would (a) force dynamic dispatch
      for `destroy` -- bloating IR & reducing optimization opportunities, (b)
      require far more complex, delicate, and fragile analysis, (c) retain more
      of the frame setup, so that e.g. `h.done()` works properly.  The current
      solution shortcuts all these concerns.

    * I chose to pass `Promise&`, rather than `std::coroutine_handle`, to
      `await_suspend_destroy` -- this is safer, simpler, and more efficient.
      Short-circuiting coroutines should not touch the handle.  This decision
      forces the attribute to go on the class.  Resolving a method attribute
      would have required looking up overloads for both argument types and
      choosing one, which is costly and a bad UX to boot.

    * `AttrDocs.td` tells portable code to provide a stub `await_suspend()`.
      This portability / compatibility solution avoids dire issues that would
      arise if users relied on `__has_cpp_attribute` and the declaration and
      definition happened to use different toolchains.  In particular, it will
      even be safe for a future compiler release to killswitch this attribute
      by removing its implementation and setting its version to 0.

```
let Spellings = [Clang<"coro_await_suspend_destroy", /*allowInC*/ 0,
                 /*Version*/ 0>];
```

  - In the docs, I mention the `HasCoroSuspend` path in `CoroEarly.cpp` as
    a further optimization opportunity.  But I'm sure there are
    higher-leverage ways of making these non-suspending coros compile better;
    I just don't know the coro optimization pipeline well enough to flag them.

  - IIUC the only interaction of this with `coro_only_destroy_when_complete`
    would be that the compiler expends fewer cycles.

  - I ran some benchmarks on [folly::result](
    https://github.com/facebook/folly/blob/main/folly/result/docs/result.md).
    Heap allocs are definitely elided, and the compiled code looks like a
    function rather than a coroutine, but there's still an optimization gap.
    On the plus side, this results in a 4x speedup (!) in optimized ASAN builds
    (numbers not shown for brevity).

```cpp
// Simple result coroutine that adds 1 to the input
result<int> result_coro(result<int>&& r) {
  co_return co_await std::move(r) + 1;
}

// Non-coroutine equivalent using value_or_throw()
result<int> catching_result_func(result<int>&& r) {
  return result_catch_all([&]() -> result<int> {
    if (r.has_value()) {
      return r.value_or_throw() + 1;
    }
    return std::move(r).non_value();
  });
}

// Not QUITE equivalent to the coro -- lacks the exception boundary
result<int> non_catching_result_func(result<int>&& r) {
  if (r.has_value()) {
    return r.value_or_throw() + 1;
  }
  return std::move(r).non_value();
}
```

```
============================================================================
[...]lly/result/test/result_coro_bench.cpp     relative  time/iter   iters/s
============================================================================
result_coro_success                                        13.61ns    73.49M
non_catching_result_func_success                            3.39ns   295.00M
catching_result_func_success                                4.41ns   226.88M
result_coro_error                                          19.55ns    51.16M
non_catching_result_func_error                              9.15ns   109.26M
catching_result_func_error                                 10.19ns    98.10M

============================================================================
[...]lly/result/test/result_coro_bench.cpp     relative  time/iter   iters/s
============================================================================
result_coro_success                                        10.59ns    94.39M
non_catching_result_func_success                            3.39ns   295.00M
catching_result_func_success                                4.07ns   245.81M
result_coro_error                                          13.66ns    73.18M
non_catching_result_func_error                              9.00ns   111.11M
catching_result_func_error                                 10.04ns    99.63M
```

Demo program from the Compiler Explorer link above:

```cpp
#include <coroutine>
#include <optional>

// Read this LATER -- this implementation detail isn't required to understand
// the value of [[clang::coro_await_suspend_destroy]].
//
// `optional_wrapper` exists since `get_return_object()` can't return
// `std::optional` directly. C++ coroutines have a fundamental timing mismatch
// between when the return object is created and when the value is available:
//
// 1) Early (coroutine startup): `get_return_object()` is called and must return
//    something immediately.
// 2) Later (when `co_return` executes): `return_value(T)` is called with the
//    actual value.
// 3) Issue: If `get_return_object()` returns the storage, it's empty when
//    returned, and writing to it later cannot affect the already-returned copy.
template <typename T>
struct optional_wrapper {
  std::optional<T> storage_;
  std::optional<T>*& pointer_;
  optional_wrapper(std::optional<T>*& p) : pointer_(p) {
    pointer_ = &storage_;
  }
  operator std::optional<T>() { return std::move(storage_); }
  ~optional_wrapper() {}
};

// Make `std::optional` a coroutine
template <typename T, typename... Args>
struct std::coroutine_traits<std::optional<T>, Args...> {
  struct promise_type {
    std::optional<T>* storagePtr_ = nullptr;
    promise_type() = default;
    ::optional_wrapper<T> get_return_object() {
      return ::optional_wrapper<T>(storagePtr_);
    }
    std::suspend_never initial_suspend() const noexcept { return {}; }
    std::suspend_never final_suspend() const noexcept { return {}; }
    void return_value(T&& value) { *storagePtr_ = std::move(value); }
    void unhandled_exception() {
      // Leave storage_ empty to represent error
    }
  };
};

template <typename T>
struct [[clang::coro_await_suspend_destroy]] optional_awaitable {
  std::optional<T> opt_;
  bool await_ready() const noexcept { return opt_.has_value(); }
  T await_resume() { return std::move(opt_).value(); }
  // Adding `noexcept` here makes the early IR much smaller, but the
  // optimizer is able to discard the cruft for simpler cases.
  void await_suspend_destroy(auto& promise) noexcept {
    // Assume the return object defaults to "empty"
  }
  void await_suspend(auto handle) {
    await_suspend_destroy(handle.promise());
    handle.destroy();
  }
};

template <typename T>
optional_awaitable<T> operator co_await(std::optional<T> opt) {
  return {std::move(opt)};
}

// Non-coroutine baseline -- matches the logic of `simple_coro`.
std::optional<int> simple_func(const std::optional<int>& r) {
  try {
    if (r.has_value()) {
        return r.value() + 1;
    }
  } catch (...) {}
  return std::nullopt; // return empty on empty input or error
}

// Without `coro_await_suspend_destroy`, allocates its frame on-heap.
std::optional<int> simple_coro(const std::optional<int>& r) {
  co_return co_await std::move(r) + 4;
}

// Without `co_await`, this optimizes much like `simple_func`.
// Bugs:
//  - Doesn't short-circuit when `r` is empty, but throws
//  - Lacks an exception boundary
std::optional<int> wrong_simple_coro(const std::optional<int>& r) {
  co_return r.value() + 2;
}

int main() {
  return
      simple_func(std::optional<int>{32}).value() +
      simple_coro(std::optional<int>{8}).value() +
      wrong_simple_coro(std::optional<int>{16}).value();
}
```

Test Plan:

For the all-important E2E test, I used this terrible cargo-culted script to run
the new end-to-end test with the new compiler.  (Yes, I realize I should only
need 10% of those `-D` settings for a successful build.)

To make sure the test covered what I intended:
  - I added an `#error` in the "no attribute" branch to confirm that the
    compiler indeed supports the attribute.
  - I ran it with a compiler that does not support the attribute, and it also
    passed.
  - I tried `return 1;` from `main()` and saw the logs of the 7 successful
    tests running.

```sh
#!/bin/bash -uex
set -o pipefail
LLVMBASE=/path/to/source/of/llvm-project
SYSCLANG=/path/to/original/bin/clang

# NB Can add `--debug-output` to debug cmake...

# Bootstrap clang -- Use `RelWithDebInfo` or the next phase is too slow!
mkdir -p bootstrap
cd bootstrap
cmake "$LLVMBASE/llvm" \
    -G Ninja \
    -DBUILD_SHARED_LIBS=true \
    -DCMAKE_ASM_COMPILER="$SYSCLANG" \
    -DCMAKE_ASM_COMPILER_ID=Clang \
    -DCMAKE_BUILD_TYPE=RelWithDebInfo \
    -DCMAKE_CXX_COMPILER="$SYSCLANG"++ \
    -DCMAKE_C_COMPILER="$SYSCLANG" \
    -DLLVM_DEFAULT_TARGET_TRIPLE=x86_64-redhat-linux-gnu \
    -DLLVM_HOST_TRIPLE=x86_64-redhat-linux-gnu \
    -DLLVM_ENABLE_ASSERTIONS=ON \
    -DLLVM_ENABLE_BINDINGS=OFF \
    -DLLVM_ENABLE_LLD=ON \
    -DLLVM_ENABLE_PROJECTS="clang;lld" \
    -DLLVM_OPTIMIZED_TABLEGEN=true \
    -DLLVM_FORCE_ENABLE_STATS=ON \
    -DLLVM_ENABLE_DUMP=ON \
    -DCLANG_DEFAULT_PIE_ON_LINUX=OFF
ninja clang lld
ninja check-clang-codegencoroutines # Includes the new IR regression tests
cd ..

NEWCLANG="$PWD"/bootstrap/bin/clang
NEWLLD="$PWD"/bootstrap/bin/lld
# LIBCXX_INCLUDE_BENCHMARKS=OFF because google-benchmark bugs out
cmake "$LLVMBASE/runtimes" \
    -G Ninja \
    -DLLVM_DEFAULT_TARGET_TRIPLE=x86_64-redhat-linux-gnu \
    -DLLVM_HOST_TRIPLE=x86_64-redhat-linux-gnu \
    -DBUILD_SHARED_LIBS=true \
    -DCMAKE_ASM_COMPILER="$NEWCLANG" \
    -DCMAKE_ASM_COMPILER_ID=Clang \
    -DCMAKE_C_COMPILER="$NEWCLANG" \
    -DCMAKE_CXX_COMPILER="$NEWCLANG"++ \
    -DLLVM_FORCE_ENABLE_STATS=ON \
    -DLLVM_ENABLE_ASSERTIONS=ON \
    -DLLVM_ENABLE_LLD=ON \
    -DLIBCXX_INCLUDE_TESTS=ON \
    -DLIBCXX_INCLUDE_BENCHMARKS=OFF \
    -DLLVM_INCLUDE_TESTS=ON \
    -DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi;libunwind" \
    -DCMAKE_BUILD_TYPE=RelWithDebInfo \
    -DCMAKE_EXPORT_COMPILE_COMMANDS=ON

ninja cxx-test-depends

LIBCXXBUILD=$PWD
cd "$LLVMBASE"

libcxx/utils/libcxx-lit "$LIBCXXBUILD" -v \
    libcxx/test/std/language.support/support.coroutines/end.to.end/coro_await_suspend_destroy.pass.cpp
```
---
 clang/docs/ReleaseNotes.rst                   |   6 +
 clang/include/clang/Basic/Attr.td             |   8 +
 clang/include/clang/Basic/AttrDocs.td         |  87 ++++
 .../clang/Basic/DiagnosticSemaKinds.td        |   3 +
 clang/lib/CodeGen/CGCoroutine.cpp             | 232 +++++++---
 clang/lib/Sema/SemaCoroutine.cpp              | 102 ++++-
 .../coro-await-suspend-destroy-errors.cpp     |  55 +++
 .../coro-await-suspend-destroy.cpp            | 129 ++++++
 ...a-attribute-supported-attributes-list.test |   1 +
 .../coro_await_suspend_destroy.pass.cpp       | 409 ++++++++++++++++++
 10 files changed, 942 insertions(+), 90 deletions(-)
 create mode 100644 clang/test/CodeGenCoroutines/coro-await-suspend-destroy-errors.cpp
 create mode 100644 clang/test/CodeGenCoroutines/coro-await-suspend-destroy.cpp
 create mode 100644 libcxx/test/std/language.support/support.coroutines/end.to.end/coro_await_suspend_destroy.pass.cpp

diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 0e9fcaa5fac6a..41c412730b033 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -136,6 +136,12 @@ Removed Compiler Flags
 Attribute Changes in Clang
 --------------------------
 
+- Introduced a new attribute ``[[clang::coro_await_suspend_destroy]]``.  When
+  applied to a coroutine awaiter class, it makes suspension points that await
+  it call a new ``await_suspend_destroy(Promise&)`` method instead of the
+  standard ``await_suspend(std::coroutine_handle<...>)``, and then destroys the
+  coroutine.  This improves code speed & size for "short-circuiting" coroutines.
+
 Improvements to Clang's diagnostics
 -----------------------------------
 - Added a separate diagnostic group ``-Wfunction-effect-redeclarations``, for the more pedantic
diff --git a/clang/include/clang/Basic/Attr.td b/clang/include/clang/Basic/Attr.td
index 30efb9f39e4f4..341848be00e7d 100644
--- a/clang/include/clang/Basic/Attr.td
+++ b/clang/include/clang/Basic/Attr.td
@@ -1352,6 +1352,14 @@ def CoroAwaitElidableArgument : InheritableAttr {
   let SimpleHandler = 1;
 }
 
+def CoroAwaitSuspendDestroy: InheritableAttr {
+  let Spellings = [Clang<"coro_await_suspend_destroy">];
+  let Subjects = SubjectList<[CXXRecord]>;
+  let LangOpts = [CPlusPlus];
+  let Documentation = [CoroAwaitSuspendDestroyDoc];
+  let SimpleHandler = 1;
+}
+
 // OSObject-based attributes.
 def OSConsumed : InheritableParamAttr {
   let Spellings = [Clang<"os_consumed">];
diff --git a/clang/include/clang/Basic/AttrDocs.td b/clang/include/clang/Basic/AttrDocs.td
index 2b095ab975202..d2224d86b3900 100644
--- a/clang/include/clang/Basic/AttrDocs.td
+++ b/clang/include/clang/Basic/AttrDocs.td
@@ -9270,6 +9270,93 @@ Example:
 }];
 }
 
+def CoroAwaitSuspendDestroyDoc : Documentation {
+  let Category = DocCatDecl;
+  let Content = [{
+
+The ``[[clang::coro_await_suspend_destroy]]`` attribute may be applied to a C++
+coroutine awaiter type.  When this attribute is present, the awaiter must
+implement ``void await_suspend_destroy(Promise&)``.  If ``await_ready()``
+returns ``false`` at a suspension point, ``await_suspend_destroy`` will be
+called directly, bypassing the ``await_suspend(std::coroutine_handle<...>)``
+method.  The coroutine being suspended will then be immediately destroyed.
+
+Logically, the new behavior is equivalent to this standard code:
+
+.. code-block:: c++
+
+  void await_suspend_destroy(YourPromise&) { ... }
+  void await_suspend(auto handle) {
+    await_suspend_destroy(handle.promise());
+    handle.destroy();
+  }
+
+This enables ``await_suspend_destroy()`` usage in portable awaiters: just add a
+stub ``await_suspend()`` as above.  Without ``coro_await_suspend_destroy``
+support, the awaiter will behave nearly identically, with the only difference
+being heap allocation instead of stack allocation for the coroutine frame.
+
+This attribute exists to optimize short-circuiting coroutines—coroutines whose
+suspend points are either (i) trivial (like ``std::suspend_never``), or (ii)
+short-circuiting (like a ``co_await`` that can be expressed in regular control
+flow as):
+
+.. code-block:: c++
+
+  T val;
+  if (awaiter.await_ready()) {
+    val = awaiter.await_resume();
+  } else {
+    awaiter.await_suspend();
+    return /* value representing the "execution short-circuited" outcome */;
+  }
+
+The benefits of this attribute are:
+  - **Avoid heap allocations for coro frames**: Allocating short-circuiting
+    coros on the stack makes code more predictable under memory pressure.
+    Without this attribute, LLVM cannot elide heap allocation even when all
+    awaiters are short-circuiting.
+  - **Performance**: Significantly faster execution and smaller code size.
+  - **Build time**: Faster compilation due to less IR being generated.
+
+Marking your ``await_suspend_destroy`` method as ``noexcept`` can sometimes
+further improve optimization.
+
+Here is a toy example of a portable short-circuiting awaiter:
+
+.. code-block:: c++
+
+  template <typename T>
+  struct [[clang::coro_await_suspend_destroy]] optional_awaitable {
+    std::optional<T> opt_;
+    bool await_ready() const noexcept { return opt_.has_value(); }
+    T await_resume() { return std::move(opt_).value(); }
+    void await_suspend_destroy(auto& promise) {
+      // Assume the return object of the outer coro defaults to "empty".
+    }
+    // Fallback for when `coro_await_suspend_destroy` is unavailable.
+    void await_suspend(auto handle) {
+      await_suspend_destroy(handle.promise());
+      handle.destroy();
+    }
+  };
+
+If all suspension points use (i) trivial or (ii) short-circuiting awaiters,
+then the coroutine optimizes more like a plain function, with 2 caveats:
+  - **Behavior:** The coroutine promise provides an implicit exception boundary
+    (as if wrapping the function in ``try { ... } catch (...) { unhandled_exception(); }``).
+    This exception handling behavior is usually desirable in robust,
+    return-value-oriented programs that need short-circuiting coroutines.
+    Otherwise, the promise can always re-throw.
+  - **Speed:** As of 2025, there is still an optimization gap between a
+    realistic short-circuiting coro, and the equivalent (but much more verbose)
+    function.  For a guesstimate, expect 4-5ns per call on x86.  One idea for
+    improvement is to also elide trivial suspends like ``std::suspend_never``, in
+    order to hit the ``HasCoroSuspend`` path in ``CoroEarly.cpp``.
+
+}];
+}
+
 def CountedByDocs : Documentation {
   let Category = DocCatField;
   let Content = [{
diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index 116341f4b66d5..58e7dd7db86d1 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -12504,6 +12504,9 @@ def note_coroutine_promise_call_implicitly_required : Note<
 def err_await_suspend_invalid_return_type : Error<
   "return type of 'await_suspend' is required to be 'void' or 'bool' (have %0)"
 >;
+def err_await_suspend_destroy_invalid_return_type : Error<
+  "return type of 'await_suspend_destroy' is required to be 'void' (have %0)"
+>;
 def note_await_ready_no_bool_conversion : Note<
   "return type of 'await_ready' is required to be contextually convertible to 'bool'"
 >;
diff --git a/clang/lib/CodeGen/CGCoroutine.cpp b/clang/lib/CodeGen/CGCoroutine.cpp
index 827385f9c1a1f..d74bef592aa9c 100644
--- a/clang/lib/CodeGen/CGCoroutine.cpp
+++ b/clang/lib/CodeGen/CGCoroutine.cpp
@@ -174,6 +174,66 @@ static bool StmtCanThrow(const Stmt *S) {
   return false;
 }
 
+// Check if this suspend should be calling `await_suspend_destroy`
+static bool useCoroAwaitSuspendDestroy(const CoroutineSuspendExpr &S) {
+  // This can only be an `await_suspend_destroy` suspend expression if it
+  // returns void -- `buildCoawaitCalls` in `SemaCoroutine.cpp` asserts this.
+  // Moreover, when `await_suspend` returns a handle, the outermost method call
+  // is `.address()` -- making it harder to get the actual class or method.
+  if (S.getSuspendReturnType() !=
+      CoroutineSuspendExpr::SuspendReturnType::SuspendVoid) {
+    return false;
+  }
+
+  // `CGCoroutine.cpp` & `SemaCoroutine.cpp` must agree on whether this suspend
+  // expression uses `[[clang::coro_await_suspend_destroy]]`.
+  //
+  // Any mismatch is a serious bug -- we would either double-free, or fail to
+  // destroy the promise type. For this reason, we make our decision based on
+  // the method name, and fatal outside of the happy path -- including on
+  // failure to find a method name.
+  //
+  // As a debug-only check we also try to detect the `AwaiterClass`. This is
+  // secondary, because  detection of the awaiter type can be silently broken by
+  // small `buildCoawaitCalls` AST changes.
+  StringRef SuspendMethodName;           // Primary
+  CXXRecordDecl *AwaiterClass = nullptr; // Debug-only, best-effort
+  if (auto *SuspendCall =
+          dyn_cast<CallExpr>(S.getSuspendExpr()->IgnoreImplicit())) {
+    if (auto *SuspendMember = dyn_cast<MemberExpr>(SuspendCall->getCallee())) {
+      if (auto *BaseExpr = SuspendMember->getBase()) {
+        // `IgnoreImplicitAsWritten` is critical since `await_suspend...` can be
+        // invoked on the base of the actual awaiter, and the base need not have
+        // the attribute. In such cases, the AST will show the true awaiter
+        // being upcast to the base.
+        AwaiterClass = BaseExpr->IgnoreImplicitAsWritten()
+                           ->getType()
+                           ->getAsCXXRecordDecl();
+      }
+      if (auto *SuspendMethod =
+              dyn_cast<CXXMethodDecl>(SuspendMember->getMemberDecl())) {
+        SuspendMethodName = SuspendMethod->getName();
+      }
+    }
+  }
+  if (SuspendMethodName == "await_suspend_destroy") {
+    assert(!AwaiterClass ||
+           AwaiterClass->hasAttr<CoroAwaitSuspendDestroyAttr>());
+    return true;
+  } else if (SuspendMethodName == "await_suspend") {
+    assert(!AwaiterClass ||
+           !AwaiterClass->hasAttr<CoroAwaitSuspendDestroyAttr>());
+    return false;
+  } else {
+    llvm::report_fatal_error(
+        "Wrong method in [[clang::coro_await_suspend_destroy]] check: "
+        "expected 'await_suspend' or 'await_suspend_destroy', but got '" +
+        SuspendMethodName + "'");
+  }
+
+  return false;
+}
+
 // Emit suspend expression which roughly looks like:
 //
 //   auto && x = CommonExpr();
@@ -220,6 +280,25 @@ namespace {
     RValue RV;
   };
 }
+
+// The simplified `await_suspend_destroy` path avoids suspend intrinsics.
+static void emitAwaitSuspendDestroy(CodeGenFunction &CGF, CGCoroData &Coro,
+                                    llvm::Function *SuspendWrapper,
+                                    llvm::Value *Awaiter, llvm::Value *Frame,
+                                    bool AwaitSuspendCanThrow) {
+  SmallVector<llvm::Value *, 2> DirectCallArgs;
+  DirectCallArgs.push_back(Awaiter);
+  DirectCallArgs.push_back(Frame);
+
+  if (AwaitSuspendCanThrow) {
+    CGF.EmitCallOrInvoke(SuspendWrapper, DirectCallArgs);
+  } else {
+    CGF.EmitNounwindRuntimeCall(SuspendWrapper, DirectCallArgs);
+  }
+
+  CGF.EmitBranchThroughCleanup(Coro.CleanupJD);
+}
+
 static LValueOrRValue emitSuspendExpression(CodeGenFunction &CGF, CGCoroData &Coro,
                                     CoroutineSuspendExpr const &S,
                                     AwaitKind Kind, AggValueSlot aggSlot,
@@ -234,7 +313,6 @@ static LValueOrRValue emitSuspendExpression(CodeGenFunction &CGF, CGCoroData &Co
   auto Prefix = buildSuspendPrefixStr(Coro, Kind);
   BasicBlock *ReadyBlock = CGF.createBasicBlock(Prefix + Twine(".ready"));
   BasicBlock *SuspendBlock = CGF.createBasicBlock(Prefix + Twine(".suspend"));
-  BasicBlock *CleanupBlock = CGF.createBasicBlock(Prefix + Twine(".cleanup"));
 
   // If expression is ready, no need to suspend.
   CGF.EmitBranchOnBoolExpr(S.getReadyExpr(), ReadyBlock, SuspendBlock, 0);
@@ -243,95 +321,105 @@ static LValueOrRValue emitSuspendExpression(CodeGenFunction &CGF, CGCoroData &Co
   CGF.EmitBlock(SuspendBlock);
 
   auto &Builder = CGF.Builder;
-  llvm::Function *CoroSave = CGF.CGM.getIntrinsic(llvm::Intrinsic::coro_save);
-  auto *NullPtr = llvm::ConstantPointerNull::get(CGF.CGM.Int8PtrTy);
-  auto *SaveCall = Builder.CreateCall(CoroSave, {NullPtr});
 
   auto SuspendWrapper = CodeGenFunction(CGF.CGM).generateAwaitSuspendWrapper(
       CGF.CurFn->getName(), Prefix, S);
 
-  CGF.CurCoro.InSuspendBlock = true;
-
   assert(CGF.CurCoro.Data && CGF.CurCoro.Data->CoroBegin &&
          "expected to be called in coroutine context");
 
-  SmallVector<llvm::Value *, 3> SuspendIntrinsicCallArgs;
-  SuspendIntrinsicCallArgs.push_back(
-      CGF.getOrCreateOpaqueLValueMapping(S.getOpaqueValue()).getPointer(CGF));
-
-  SuspendIntrinsicCallArgs.push_back(CGF.CurCoro.Data->CoroBegin);
-  SuspendIntrinsicCallArgs.push_back(SuspendWrapper);
-
-  const auto SuspendReturnType = S.getSuspendReturnType();
-  llvm::Intrinsic::ID AwaitSuspendIID;
-
-  switch (SuspendReturnType) {
-  case CoroutineSuspendExpr::SuspendReturnType::SuspendVoid:
-    AwaitSuspendIID = llvm::Intrinsic::coro_await_suspend_void;
-    break;
-  case CoroutineSuspendExpr::SuspendReturnType::SuspendBool:
-    AwaitSuspendIID = llvm::Intrinsic::coro_await_suspend_bool;
-    break;
-  case CoroutineSuspendExpr::SuspendReturnType::SuspendHandle:
-    AwaitSuspendIID = llvm::Intrinsic::coro_await_suspend_handle;
-    break;
-  }
-
-  llvm::Function *AwaitSuspendIntrinsic = CGF.CGM.getIntrinsic(AwaitSuspendIID);
-
   // SuspendHandle might throw since it also resumes the returned handle.
+  const auto SuspendReturnType = S.getSuspendReturnType();
   const bool AwaitSuspendCanThrow =
       SuspendReturnType ==
           CoroutineSuspendExpr::SuspendReturnType::SuspendHandle ||
       StmtCanThrow(S.getSuspendExpr());
 
-  llvm::CallBase *SuspendRet = nullptr;
-  // FIXME: add call attributes?
-  if (AwaitSuspendCanThrow)
-    SuspendRet =
-        CGF.EmitCallOrInvoke(AwaitSuspendIntrinsic, SuspendIntrinsicCallArgs);
-  else
-    SuspendRet = CGF.EmitNounwindRuntimeCall(AwaitSuspendIntrinsic,
-                                             SuspendIntrinsicCallArgs);
+  llvm::Value *Awaiter =
+      CGF.getOrCreateOpaqueLValueMapping(S.getOpaqueValue()).getPointer(CGF);
+  llvm::Value *Frame = CGF.CurCoro.Data->CoroBegin;
 
-  assert(SuspendRet);
-  CGF.CurCoro.InSuspendBlock = false;
+  if (useCoroAwaitSuspendDestroy(S)) { // Call `await_suspend_destroy` & cleanup
+    emitAwaitSuspendDestroy(CGF, Coro, SuspendWrapper, Awaiter, Frame,
+                            AwaitSuspendCanThrow);
+  } else { // Normal suspend path -- can actually suspend, uses intrinsics
+    CGF.CurCoro.InSuspendBlock = true;
 
-  switch (SuspendReturnType) {
-  case CoroutineSuspendExpr::SuspendReturnType::SuspendVoid:
-    assert(SuspendRet->getType()->isVoidTy());
-    break;
-  case CoroutineSuspendExpr::SuspendReturnType::SuspendBool: {
-    assert(SuspendRet->getType()->isIntegerTy());
-
-    // Veto suspension if requested by bool returning await_suspend.
-    BasicBlock *RealSuspendBlock =
-        CGF.createBasicBlock(Prefix + Twine(".suspend.bool"));
-    CGF.Builder.CreateCondBr(SuspendRet, RealSuspendBlock, ReadyBlock);
-    CGF.EmitBlock(RealSuspendBlock);
-    break;
-  }
-  case CoroutineSuspendExpr::SuspendReturnType::SuspendHandle: {
-    assert(SuspendRet->getType()->isVoidTy());
-    break;
-  }
-  }
+    SmallVector<llvm::Value *, 3> SuspendIntrinsicCallArgs;
+    SuspendIntrinsicCallArgs.push_back(Awaiter);
+    SuspendIntrinsicCallArgs.push_back(Frame);
+    SuspendIntrinsicCallArgs.push_back(SuspendWrapper);
+    BasicBlock *CleanupBlock = CGF.createBasicBlock(Prefix + Twine(".cleanup"));
 
-  // Emit the suspend point.
-  const bool IsFinalSuspend = (Kind == AwaitKind::Final);
-  llvm::Function *CoroSuspend =
-      CGF.CGM.getIntrinsic(llvm::Intrinsic::coro_suspend);
-  auto *SuspendResult = Builder.CreateCall(
-      CoroSuspend, {SaveCall, Builder.getInt1(IsFinalSuspend)});
+    llvm::Function *CoroSave = CGF.CGM.getIntrinsic(llvm::Intrinsic::coro_save);
+    auto *NullPtr = llvm::ConstantPointerNull::get(CGF.CGM.Int8PtrTy);
+    auto *SaveCall = Builder.CreateCall(CoroSave, {NullPtr});
 
-  // Create a switch capturing three possible continuations.
-  auto *Switch = Builder.CreateSwitch(SuspendResult, Coro.SuspendBB, 2);
-  Switch->addCase(Builder.getInt8(0), ReadyBlock);
-  Switch->addCase(Builder.getInt8(1), CleanupBlock);
+    llvm::Intrinsic::ID AwaitSuspendIID;
 
-  // Emit cleanup for this suspend point.
-  CGF.EmitBlock(CleanupBlock);
-  CGF.EmitBranchThroughCleanup(Coro.CleanupJD);
+    switch (SuspendReturnType) {
+    case CoroutineSuspendExpr::SuspendReturnType::SuspendVoid:
+      AwaitSuspendIID = llvm::Intrinsic::coro_await_suspend_void;
+      break;
+    case CoroutineSuspendExpr::SuspendReturnType::SuspendBool:
+      AwaitSuspendIID = llvm::Intrinsic::coro_await_suspend_bool;
+      break;
+    case CoroutineSuspendExpr::SuspendReturnType::SuspendHandle:
+      AwaitSuspendIID = llvm::Intrinsic::coro_await_suspend_handle;
+      break;
+    }
+
+    llvm::Function *AwaitSuspendIntrinsic =
+        CGF.CGM.getIntrinsic(AwaitSuspendIID);
+
+    llvm::CallBase *SuspendRet = nullptr;
+    // FIXME: add call attributes?
+    if (AwaitSuspendCanThrow)
+      SuspendRet =
+          CGF.EmitCallOrInvoke(AwaitSuspendIntrinsic, SuspendIntrinsicCallArgs);
+    else
+      SuspendRet = CGF.EmitNounwindRuntimeCall(AwaitSuspendIntrinsic,
+                                               SuspendIntrinsicCallArgs);
+
+    assert(SuspendRet);
+    CGF.CurCoro.InSuspendBlock = false;
+
+    switch (SuspendReturnType) {
+    case CoroutineSuspendExpr::SuspendReturnType::SuspendVoid:
+      assert(SuspendRet->getType()->isVoidTy());
+      break;
+    case CoroutineSuspendExpr::SuspendReturnType::SuspendBool: {
+      assert(SuspendRet->getType()->isIntegerTy());
+
+      // Veto suspension if requested by bool returning await_suspend.
+      BasicBlock *RealSuspendBlock =
+          CGF.createBasicBlock(Prefix + Twine(".suspend.bool"));
+      CGF.Builder.CreateCondBr(SuspendRet, RealSuspendBlock, ReadyBlock);
+      CGF.EmitBlock(RealSuspendBlock);
+      break;
+    }
+    case CoroutineSuspendExpr::SuspendReturnType::SuspendHandle: {
+      assert(SuspendRet->getType()->isVoidTy());
+      break;
+    }
+    }
+
+    // Emit the suspend point.
+    const bool IsFinalSuspend = (Kind == AwaitKind::Final);
+    llvm::Function *CoroSuspend =
+        CGF.CGM.getIntrinsic(llvm::Intrinsic::coro_suspend);
+    auto *SuspendResult = Builder.CreateCall(
+        CoroSuspend, {SaveCall, Builder.getInt1(IsFinalSuspend)});
+
+    // Create a switch capturing three possible continuations.
+    auto *Switch = Builder.CreateSwitch(SuspendResult, Coro.SuspendBB, 2);
+    Switch->addCase(Builder.getInt8(0), ReadyBlock);
+    Switch->addCase(Builder.getInt8(1), CleanupBlock);
+
+    // Emit cleanup for this suspend point.
+    CGF.EmitBlock(CleanupBlock);
+    CGF.EmitBranchThroughCleanup(Coro.CleanupJD);
+  }
 
   // Emit await_resume expression.
   CGF.EmitBlock(ReadyBlock);
diff --git a/clang/lib/Sema/SemaCoroutine.cpp b/clang/lib/Sema/SemaCoroutine.cpp
index d193a33f22393..83fe7219c9997 100644
--- a/clang/lib/Sema/SemaCoroutine.cpp
+++ b/clang/lib/Sema/SemaCoroutine.cpp
@@ -289,6 +289,45 @@ static ExprResult buildCoroutineHandle(Sema &S, QualType PromiseType,
   return S.BuildCallExpr(nullptr, FromAddr.get(), Loc, FramePtr, Loc);
 }
 
+// To support [[clang::coro_await_suspend_destroy]], this builds
+//   *static_cast<Promise*>(
+//       __builtin_coro_promise(handle, alignof(Promise), false))
+static ExprResult buildPromiseRef(Sema &S, QualType PromiseType,
+                                  SourceLocation Loc) {
+  uint64_t Align =
+      S.Context.getTypeAlign(PromiseType) / S.Context.getCharWidth();
+
+  // Build the call to __builtin_coro_promise()
+  SmallVector<Expr *, 3> Args = {
+      S.BuildBuiltinCallExpr(Loc, Builtin::BI__builtin_coro_frame, {}),
+      S.ActOnIntegerConstant(Loc, Align).get(),         // alignof(Promise)
+      S.ActOnCXXBoolLiteral(Loc, tok::kw_false).get()}; // false
+  ExprResult CoroPromiseCall =
+      S.BuildBuiltinCallExpr(Loc, Builtin::BI__builtin_coro_promise, Args);
+
+  if (CoroPromiseCall.isInvalid())
+    return ExprError();
+
+  // Cast to Promise*
+  ExprResult CastExpr = S.ImpCastExprToType(
+      CoroPromiseCall.get(), S.Context.getPointerType(PromiseType), CK_BitCast);
+  if (CastExpr.isInvalid())
+    return ExprError();
+
+  // Dereference to get Promise&
+  return S.CreateBuiltinUnaryOp(Loc, UO_Deref, CastExpr.get());
+}
+
+static bool hasCoroAwaitSuspendDestroyAttr(Expr *Awaiter) {
+  QualType AwaiterType = Awaiter->getType();
+  if (auto *RD = AwaiterType->getAsCXXRecordDecl()) {
+    if (RD->hasAttr<CoroAwaitSuspendDestroyAttr>()) {
+      return true;
+    }
+  }
+  return false;
+}
+
 struct ReadySuspendResumeResult {
   enum AwaitCallType { ACT_Ready, ACT_Suspend, ACT_Resume };
   Expr *Results[3];
@@ -399,15 +438,30 @@ static ReadySuspendResumeResult buildCoawaitCalls(Sema &S, VarDecl *CoroPromise,
       Calls.Results[ACT::ACT_Ready] = S.MaybeCreateExprWithCleanups(Conv.get());
   }
 
-  ExprResult CoroHandleRes =
-      buildCoroutineHandle(S, CoroPromise->getType(), Loc);
-  if (CoroHandleRes.isInvalid()) {
-    Calls.IsInvalid = true;
-    return Calls;
+  // For awaiters with `[[clang::coro_await_suspend_destroy]]`, we call
+  // `void await_suspend_destroy(Promise&)`, then promptly destroy the coro.
+  CallExpr *AwaitSuspend = nullptr;
+  bool UseAwaitSuspendDestroy = hasCoroAwaitSuspendDestroyAttr(Operand);
+  if (UseAwaitSuspendDestroy) {
+    ExprResult PromiseRefRes = buildPromiseRef(S, CoroPromise->getType(), Loc);
+    if (PromiseRefRes.isInvalid()) {
+      Calls.IsInvalid = true;
+      return Calls;
+    }
+    Expr *PromiseRef = PromiseRefRes.get();
+    AwaitSuspend = cast_or_null<CallExpr>(
+        BuildSubExpr(ACT::ACT_Suspend, "await_suspend_destroy", PromiseRef));
+  } else { // The standard `await_suspend(std::coroutine_handle<...>)`
+    ExprResult CoroHandleRes =
+        buildCoroutineHandle(S, CoroPromise->getType(), Loc);
+    if (CoroHandleRes.isInvalid()) {
+      Calls.IsInvalid = true;
+      return Calls;
+    }
+    Expr *CoroHandle = CoroHandleRes.get();
+    AwaitSuspend = cast_or_null<CallExpr>(
+        BuildSubExpr(ACT::ACT_Suspend, "await_suspend", CoroHandle));
   }
-  Expr *CoroHandle = CoroHandleRes.get();
-  CallExpr *AwaitSuspend = cast_or_null<CallExpr>(
-      BuildSubExpr(ACT::ACT_Suspend, "await_suspend", CoroHandle));
   if (!AwaitSuspend)
     return Calls;
   if (!AwaitSuspend->getType()->isDependentType()) {
@@ -417,25 +471,37 @@ static ReadySuspendResumeResult buildCoawaitCalls(Sema &S, VarDecl *CoroPromise,
     //     type Z.
     QualType RetType = AwaitSuspend->getCallReturnType(S.Context);
 
-    // Support for coroutine_handle returning await_suspend.
-    if (Expr *TailCallSuspend =
-            maybeTailCall(S, RetType, AwaitSuspend, Loc))
+    auto EmitAwaitSuspendDiag = [&](unsigned int DiagCode) {
+      S.Diag(AwaitSuspend->getCalleeDecl()->getLocation(), DiagCode) << RetType;
+      S.Diag(Loc, diag::note_coroutine_promise_call_implicitly_required)
+          << AwaitSuspend->getDirectCallee();
+      Calls.IsInvalid = true;
+    };
+
+    // `await_suspend_destroy` must return `void` -- and `CGCoroutine.cpp`
+    // critically depends on this in `hasCoroAwaitSuspendDestroyAttr`.
+    if (UseAwaitSuspendDestroy) {
+      if (RetType->isVoidType()) {
+        Calls.Results[ACT::ACT_Suspend] =
+            S.MaybeCreateExprWithCleanups(AwaitSuspend);
+      } else {
+        EmitAwaitSuspendDiag(
+            diag::err_await_suspend_destroy_invalid_return_type);
+      }
+      // Support for coroutine_handle returning await_suspend.
+    } else if (Expr *TailCallSuspend =
+                   maybeTailCall(S, RetType, AwaitSuspend, Loc)) {
       // Note that we don't wrap the expression with ExprWithCleanups here
       // because that might interfere with tailcall contract (e.g. inserting
       // clean up instructions in-between tailcall and return). Instead
       // ExprWithCleanups is wrapped within maybeTailCall() prior to the resume
       // call.
       Calls.Results[ACT::ACT_Suspend] = TailCallSuspend;
-    else {
+    } else {
       // non-class prvalues always have cv-unqualified types
       if (RetType->isReferenceType() ||
           (!RetType->isBooleanType() && !RetType->isVoidType())) {
-        S.Diag(AwaitSuspend->getCalleeDecl()->getLocation(),
-               diag::err_await_suspend_invalid_return_type)
-            << RetType;
-        S.Diag(Loc, diag::note_coroutine_promise_call_implicitly_required)
-            << AwaitSuspend->getDirectCallee();
-        Calls.IsInvalid = true;
+        EmitAwaitSuspendDiag(diag::err_await_suspend_invalid_return_type);
       } else
         Calls.Results[ACT::ACT_Suspend] =
             S.MaybeCreateExprWithCleanups(AwaitSuspend);
diff --git a/clang/test/CodeGenCoroutines/coro-await-suspend-destroy-errors.cpp b/clang/test/CodeGenCoroutines/coro-await-suspend-destroy-errors.cpp
new file mode 100644
index 0000000000000..6a082c15f2581
--- /dev/null
+++ b/clang/test/CodeGenCoroutines/coro-await-suspend-destroy-errors.cpp
@@ -0,0 +1,55 @@
+// RUN: %clang_cc1 -std=c++20 -verify %s
+
+#include "Inputs/coroutine.h"
+
+// Coroutine type with `std::suspend_never` for initial/final suspend
+struct Task {
+  struct promise_type {
+    Task get_return_object() { return {}; }
+    std::suspend_never initial_suspend() { return {}; }
+    std::suspend_never final_suspend() noexcept { return {}; }
+    void return_void() {}
+    void unhandled_exception() {}
+  };
+};
+
+struct [[clang::coro_await_suspend_destroy]] WrongReturnTypeAwaitable {
+  bool await_ready() { return false; }
+  bool await_suspend_destroy(auto& promise) { return true; } // expected-error {{return type of 'await_suspend_destroy' is required to be 'void' (have 'bool')}}
+  void await_suspend(auto handle) {
+    await_suspend_destroy(handle.promise());
+    handle.destroy();
+  }
+  void await_resume() {}
+};
+
+Task test_invalid_destroying_await() {
+  co_await WrongReturnTypeAwaitable{}; // expected-note {{call to 'await_suspend_destroy<Task::promise_type>' implicitly required by coroutine function here}}
+}
+
+struct [[clang::coro_await_suspend_destroy]] MissingMethodAwaitable {
+  bool await_ready() { return false; }
+  // Missing await_suspend_destroy method
+  void await_suspend(auto handle) {
+    handle.destroy();
+  }
+  void await_resume() {}
+};
+
+Task test_missing_method() {
+  co_await MissingMethodAwaitable{}; // expected-error {{no member named 'await_suspend_destroy' in 'MissingMethodAwaitable'}}
+}
+
+struct [[clang::coro_await_suspend_destroy]] WrongParameterTypeAwaitable {
+  bool await_ready() { return false; }
+  void await_suspend_destroy(int x) {} // expected-note {{passing argument to parameter 'x' here}}
+  void await_suspend(auto handle) {
+    await_suspend_destroy(handle.promise());
+    handle.destroy();
+  }
+  void await_resume() {}
+};
+
+Task test_wrong_parameter_type() {
+  co_await WrongParameterTypeAwaitable{}; // expected-error {{no viable conversion from 'std::coroutine_traits<Task>::promise_type' (aka 'Task::promise_type') to 'int'}}
+}
diff --git a/clang/test/CodeGenCoroutines/coro-await-suspend-destroy.cpp b/clang/test/CodeGenCoroutines/coro-await-suspend-destroy.cpp
new file mode 100644
index 0000000000000..fa1dbf475e56c
--- /dev/null
+++ b/clang/test/CodeGenCoroutines/coro-await-suspend-destroy.cpp
@@ -0,0 +1,129 @@
+// RUN: %clang_cc1 -std=c++20 -triple x86_64-unknown-linux-gnu -emit-llvm -o - %s \
+// RUN:   -disable-llvm-passes | FileCheck %s --check-prefix=CHECK-INITIAL
+// RUN: %clang_cc1 -std=c++20 -triple x86_64-unknown-linux-gnu -emit-llvm -o - %s \
+// RUN:   -O2 | FileCheck %s --check-prefix=CHECK-OPTIMIZED
+
+#include "Inputs/coroutine.h"
+
+// Awaitable with `coro_await_suspend_destroy` attribute
+struct [[clang::coro_await_suspend_destroy]] DestroyingAwaitable {
+  bool await_ready() { return false; }
+  void await_suspend_destroy(auto& promise) {}
+  void await_suspend(auto handle) {
+    await_suspend_destroy(handle.promise());
+    handle.destroy();
+  }
+  void await_resume() {}
+};
+
+// Awaitable without `coro_await_suspend_destroy` (normal behavior)
+struct NormalAwaitable {
+  bool await_ready() { return false; }
+  void await_suspend(std::coroutine_handle<> h) {}
+  void await_resume() {}
+};
+
+// Coroutine type with `std::suspend_never` for initial/final suspend
+struct Task {
+  struct promise_type {
+    Task get_return_object() { return {}; }
+    std::suspend_never initial_suspend() { return {}; }
+    std::suspend_never final_suspend() noexcept { return {}; }
+    void return_void() {}
+    void unhandled_exception() {}
+  };
+};
+
+// Single co_await with coro_await_suspend_destroy.
+// Should result in no allocation after optimization.
+Task test_single_destroying_await() {
+  co_await DestroyingAwaitable{};
+}
+
+// CHECK-INITIAL-LABEL: define{{.*}} void @_Z28test_single_destroying_awaitv
+// CHECK-INITIAL: call{{.*}} @llvm.coro.alloc
+// CHECK-INITIAL: call{{.*}} @llvm.coro.begin
+
+// CHECK-OPTIMIZED-LABEL: define{{.*}} void @_Z28test_single_destroying_awaitv
+// CHECK-OPTIMIZED-NOT: call{{.*}} @llvm.coro.alloc
+// CHECK-OPTIMIZED-NOT: call{{.*}} malloc
+// CHECK-OPTIMIZED-NOT: call{{.*}} @_Znwm
+
+// Test multiple `co_await`s, all with `coro_await_suspend_destroy`.
+// This should also result in no allocation after optimization.
+Task test_multiple_destroying_awaits(bool condition) {
+  co_await DestroyingAwaitable{};
+  co_await DestroyingAwaitable{};
+  if (condition) {
+    co_await DestroyingAwaitable{};
+  }
+}
+
+// CHECK-INITIAL-LABEL: define{{.*}} void @_Z31test_multiple_destroying_awaitsb
+// CHECK-INITIAL: call{{.*}} @llvm.coro.alloc
+// CHECK-INITIAL: call{{.*}} @llvm.coro.begin
+
+// CHECK-OPTIMIZED-LABEL: define{{.*}} void @_Z31test_multiple_destroying_awaitsb
+// CHECK-OPTIMIZED-NOT: call{{.*}} @llvm.coro.alloc
+// CHECK-OPTIMIZED-NOT: call{{.*}} malloc
+// CHECK-OPTIMIZED-NOT: call{{.*}} @_Znwm
+
+// Mixed awaits - some with `coro_await_suspend_destroy`, some without.
+// We should still see allocation because not all awaits destroy the coroutine.
+Task test_mixed_awaits() {
+  co_await NormalAwaitable{}; // Must precede "destroy" to be reachable
+  co_await DestroyingAwaitable{};
+}
+
+// CHECK-INITIAL-LABEL: define{{.*}} void @_Z17test_mixed_awaitsv
+// CHECK-INITIAL: call{{.*}} @llvm.coro.alloc
+// CHECK-INITIAL: call{{.*}} @llvm.coro.begin
+
+// CHECK-OPTIMIZED-LABEL: define{{.*}} void @_Z17test_mixed_awaitsv
+// CHECK-OPTIMIZED: call{{.*}} @_Znwm
+
+
+// Check that attribute detection affects control flow.
+Task test_attribute_detection() {
+  co_await DestroyingAwaitable{};
+  // Unreachable in OPTIMIZED, so those builds don't see an allocation.
+  co_await NormalAwaitable{};
+}
+
+// Check that we skip the normal suspend intrinsic and go directly to cleanup.
+//
+// CHECK-INITIAL-LABEL: define{{.*}} void @_Z24test_attribute_detectionv
+// CHECK-INITIAL: call{{.*}} @_Z24test_attribute_detectionv.__await_suspend_wrapper__await
+// CHECK-INITIAL-NEXT: br label %cleanup5
+// CHECK-INITIAL-NOT: call{{.*}} @llvm.coro.suspend
+// CHECK-INITIAL: call{{.*}} @_Z24test_attribute_detectionv.__await_suspend_wrapper__await
+// CHECK-INITIAL: call{{.*}} @llvm.coro.suspend
+// CHECK-INITIAL: call{{.*}} @_Z24test_attribute_detectionv.__await_suspend_wrapper__final
+
+// Since `co_await DestroyingAwaitable{}` gets converted into an unconditional
+// branch, the `co_await NormalAwaitable{}` is unreachable in optimized builds.
+//
+// CHECK-OPTIMIZED-NOT: call{{.*}} @llvm.coro.alloc
+// CHECK-OPTIMIZED-NOT: call{{.*}} malloc
+// CHECK-OPTIMIZED-NOT: call{{.*}} @_Znwm
+
+// Template awaitable with `coro_await_suspend_destroy` attribute
+template<typename T>
+struct [[clang::coro_await_suspend_destroy]] TemplateDestroyingAwaitable {
+  bool await_ready() { return false; }
+  void await_suspend_destroy(auto& promise) {}
+  void await_suspend(auto handle) {
+    await_suspend_destroy(handle.promise());
+    handle.destroy();
+  }
+  void await_resume() {}
+};
+
+Task test_template_destroying_await() {
+  co_await TemplateDestroyingAwaitable<int>{};
+}
+
+// CHECK-OPTIMIZED-LABEL: define{{.*}} void @_Z30test_template_destroying_awaitv
+// CHECK-OPTIMIZED-NOT: call{{.*}} @llvm.coro.alloc
+// CHECK-OPTIMIZED-NOT: call{{.*}} malloc
+// CHECK-OPTIMIZED-NOT: call{{.*}} @_Znwm
diff --git a/clang/test/Misc/pragma-attribute-supported-attributes-list.test b/clang/test/Misc/pragma-attribute-supported-attributes-list.test
index 05693538252aa..43327744ffc8a 100644
--- a/clang/test/Misc/pragma-attribute-supported-attributes-list.test
+++ b/clang/test/Misc/pragma-attribute-supported-attributes-list.test
@@ -62,6 +62,7 @@
 // CHECK-NEXT: Convergent (SubjectMatchRule_function)
 // CHECK-NEXT: CoroAwaitElidable (SubjectMatchRule_record)
 // CHECK-NEXT: CoroAwaitElidableArgument (SubjectMatchRule_variable_is_parameter)
+// CHECK-NEXT: CoroAwaitSuspendDestroy (SubjectMatchRule_record)
 // CHECK-NEXT: CoroDisableLifetimeBound (SubjectMatchRule_function)
 // CHECK-NEXT: CoroLifetimeBound (SubjectMatchRule_record)
 // CHECK-NEXT: CoroOnlyDestroyWhenComplete (SubjectMatchRule_record)
diff --git a/libcxx/test/std/language.support/support.coroutines/end.to.end/coro_await_suspend_destroy.pass.cpp b/libcxx/test/std/language.support/support.coroutines/end.to.end/coro_await_suspend_destroy.pass.cpp
new file mode 100644
index 0000000000000..1b48b1523bf12
--- /dev/null
+++ b/libcxx/test/std/language.support/support.coroutines/end.to.end/coro_await_suspend_destroy.pass.cpp
@@ -0,0 +1,409 @@
+//===-- Integration test for `clang::coro_await_suspend_destroy` ----------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+// Test for the `coro_await_suspend_destroy` attribute and
+// `await_suspend_destroy` method.
+//
+// Per `AttrDocs.td`, using `coro_await_suspend_destroy` with
+// `await_suspend_destroy` should be equivalent to providing a stub
+// `await_suspend` that calls `await_suspend_destroy` and then destroys the
+// coroutine handle.
+//
+// This test logs control flow in a variety of scenarios (controlled by
+// `test_toggles`), and checks that the execution traces are identical for
+// awaiters with/without the attribute. We currently test all combinations of
+// error injection points to ensure behavioral equivalence.
+//
+// In contrast to Clang `lit` tests, this makes it easy to verify non-divergence
+// of functional behavior of the entire coroutine across many scenarios,
+// including exception handling, early returns, and mixed usage with legacy
+// awaitables.
+//
+//===----------------------------------------------------------------------===//
+
+// UNSUPPORTED: c++03, c++11, c++14, c++17
+
+#if __has_cpp_attribute(clang::coro_await_suspend_destroy)
+#  define ATTR_CORO_AWAIT_SUSPEND_DESTROY [[clang::coro_await_suspend_destroy]]
+#else
+#  define ATTR_CORO_AWAIT_SUSPEND_DESTROY
+#endif
+
+#include <cassert>
+#include <coroutine>
+#include <exception>
+#include <iostream>
+#include <memory>
+#include <optional>
+#include <string>
+
+struct my_err : std::exception {};
+
+enum test_toggles {
+  throw_in_convert_optional_wrapper = 0,
+  throw_in_return_value,
+  throw_in_await_resume,
+  throw_in_await_suspend_destroy,
+  dynamic_short_circuit,          // Does not apply to `..._shortcircuits_to_empty` tests
+  largest = dynamic_short_circuit // for array in `test_driver`
+};
+
+enum test_event {
+  unset = 0,
+  // Besides events, we also log various integers between 1 and 9999 that
+  // disambiguate different awaiters, or represent different return values.
+  convert_optional_wrapper = 10000,
+  destroy_return_object,
+  destroy_promise,
+  get_return_object,
+  initial_suspend,
+  final_suspend,
+  return_value,
+  throw_return_value,
+  unhandled_exception,
+  await_ready,
+  await_resume,
+  destroy_optional_awaitable,
+  throw_await_resume,
+  await_suspend_destroy,
+  throw_await_suspend_destroy,
+  await_suspend,
+  coro_catch,
+  throw_convert_optional_wrapper,
+};
+
+struct test_driver {
+  static constexpr int max_events = 1000;
+
+  bool toggles_[test_toggles::largest + 1] = {};
+  int events_[max_events]                  = {};
+  int cur_event_                           = 0;
+
+  bool toggles(test_toggles toggle) const { return toggles_[toggle]; }
+  void log(auto&&... events) {
+    for (auto event : {static_cast<int>(events)...}) {
+      assert(cur_event_ < max_events);
+      events_[cur_event_++] = event;
+    }
+  }
+};
+
+// `optional_wrapper` exists since `get_return_object()` can't return
+// `std::optional` directly. C++ coroutines have a fundamental timing mismatch
+// between when the return object is created and when the value is available:
+//
+// 1) Early (coroutine startup): `get_return_object()` is called and must return
+//    something immediately.
+// 2) Later (when `co_return` executes): `return_value(T)` is called with the
+//    actual value.
+// 3) Issue: If `get_return_object()` returns the storage, it's empty when
+//    returned, and writing to it later cannot affect the already-returned copy.
+template <typename T>
+struct optional_wrapper {
+  test_driver& driver_;
+  std::optional<T> storage_;
+  std::optional<T>*& pointer_;
+  optional_wrapper(test_driver& driver, std::optional<T>*& p) : driver_(driver), pointer_(p) { pointer_ = &storage_; }
+  operator std::optional<T>() {
+    if (driver_.toggles(test_toggles::throw_in_convert_optional_wrapper)) {
+      driver_.log(test_event::throw_convert_optional_wrapper);
+      throw my_err();
+    }
+    driver_.log(test_event::convert_optional_wrapper);
+    return std::move(storage_);
+  }
+  ~optional_wrapper() { driver_.log(test_event::destroy_return_object); }
+};
+
+// Make `std::optional` a coroutine
+template <typename T, typename... Args>
+struct std::coroutine_traits<std::optional<T>, test_driver&, Args...> {
+  struct promise_type {
+    std::optional<T>* storagePtr_ = nullptr;
+    test_driver& driver_;
+
+    promise_type(test_driver& driver, auto&&...) : driver_(driver) {}
+    ~promise_type() { driver_.log(test_event::destroy_promise); }
+    optional_wrapper<T> get_return_object() {
+      driver_.log(test_event::get_return_object);
+      return optional_wrapper<T>(driver_, storagePtr_);
+    }
+    std::suspend_never initial_suspend() const noexcept {
+      driver_.log(test_event::initial_suspend);
+      return {};
+    }
+    std::suspend_never final_suspend() const noexcept {
+      driver_.log(test_event::final_suspend);
+      return {};
+    }
+    void return_value(T value) {
+      driver_.log(test_event::return_value, value);
+      if (driver_.toggles(test_toggles::throw_in_return_value)) {
+        driver_.log(test_event::throw_return_value);
+        throw my_err();
+      }
+      *storagePtr_ = std::move(value);
+    }
+    void unhandled_exception() {
+      // Leave `*storagePtr_` empty to represent error
+      driver_.log(test_event::unhandled_exception);
+    }
+  };
+};
+
+template <typename T, bool HasAttr>
+struct base_optional_awaitable {
+  test_driver& driver_;
+  int id_;
+  std::optional<T> opt_;
+
+  ~base_optional_awaitable() { driver_.log(test_event::destroy_optional_awaitable, id_); }
+
+  bool await_ready() const noexcept {
+    driver_.log(test_event::await_ready, id_);
+    return opt_.has_value();
+  }
+  T await_resume() {
+    if (driver_.toggles(test_toggles::throw_in_await_resume)) {
+      driver_.log(test_event::throw_await_resume, id_);
+      throw my_err();
+    }
+    driver_.log(test_event::await_resume, id_);
+    return std::move(opt_).value();
+  }
+  void await_suspend_destroy(auto& promise) {
+#if __has_cpp_attribute(clang::coro_await_suspend_destroy)
+    if constexpr (HasAttr) {
+      // This is just here so that old & new events compare exactly equal.
+      driver_.log(test_event::await_suspend);
+    }
+#endif
+    assert(promise.storagePtr_);
+    if (driver_.toggles(test_toggles::throw_in_await_suspend_destroy)) {
+      driver_.log(test_event::throw_await_suspend_destroy, id_);
+      throw my_err();
+    }
+    driver_.log(test_event::await_suspend_destroy, id_);
+  }
+  void await_suspend(auto handle) {
+    driver_.log(test_event::await_suspend);
+    await_suspend_destroy(handle.promise());
+    handle.destroy();
+  }
+};
+
+template <typename T>
+struct old_optional_awaitable : base_optional_awaitable<T, false> {};
+
+template <typename T>
+struct ATTR_CORO_AWAIT_SUSPEND_DESTROY new_optional_awaitable : base_optional_awaitable<T, true> {};
+
+void enumerate_toggles(auto lambda) {
+  // Generate all combinations of toggle values
+  for (int mask = 0; mask <= (1 << (test_toggles::largest + 1)) - 1; ++mask) {
+    test_driver driver;
+    for (int i = 0; i <= test_toggles::largest; ++i) {
+      driver.toggles_[i] = (mask & (1 << i)) != 0;
+    }
+    lambda(driver);
+  }
+}
+
+template <typename T>
+void check_coro_with_driver_for(auto coro_fn) {
+  enumerate_toggles([&](const test_driver& driver) {
+    auto old_driver = driver;
+    std::optional<T> old_res;
+    bool old_threw = false;
+    try {
+      old_res = coro_fn.template operator()<old_optional_awaitable<T>, T>(old_driver);
+    } catch (const my_err&) {
+      old_threw = true;
+    }
+    auto new_driver = driver;
+    std::optional<T> new_res;
+    bool new_threw = false;
+    try {
+      new_res = coro_fn.template operator()<new_optional_awaitable<T>, T>(new_driver);
+    } catch (const my_err&) {
+      new_threw = true;
+    }
+
+    // Print toggle values for debugging
+    std::string toggle_info = "Toggles: ";
+    for (int i = 0; i <= test_toggles::largest; ++i) {
+      if (driver.toggles_[i]) {
+        toggle_info += std::to_string(i) + " ";
+      }
+    }
+    toggle_info += "\n";
+    std::cerr << toggle_info.c_str() << std::endl;
+
+    assert(old_threw == new_threw);
+    assert(old_res == new_res);
+
+    // Compare events arrays directly using cur_event_ and indices
+    assert(old_driver.cur_event_ == new_driver.cur_event_);
+    for (int i = 0; i < old_driver.cur_event_; ++i) {
+      assert(old_driver.events_[i] == new_driver.events_[i]);
+    }
+  });
+}
+
+// Move-only, non-nullable type that quacks like int but stores a
+// heap-allocated int. Used to exercise the machinery with a nontrivial type.
+class heap_int {
+private:
+  std::unique_ptr<int> ptr_;
+
+public:
+  explicit heap_int(int value) : ptr_(std::make_unique<int>(value)) {}
+
+  heap_int operator+(const heap_int& other) const { return heap_int(*ptr_ + *other.ptr_); }
+
+  bool operator==(const heap_int& other) const { return *ptr_ == *other.ptr_; }
+
+  /*implicit*/ operator int() const { return *ptr_; }
+};
+
+void check_coro_with_driver(auto coro_fn) {
+  check_coro_with_driver_for<int>(coro_fn);
+  check_coro_with_driver_for<heap_int>(coro_fn);
+}
+
+template <typename Awaitable, typename T>
+std::optional<T> coro_shortcircuits_to_empty(test_driver& driver) {
+  T n = co_await Awaitable{driver, 1, std::optional<T>{11}};
+  co_await Awaitable{driver, 2, std::optional<T>{}}; // return early!
+  co_return n + co_await Awaitable{driver, 3, std::optional<T>{22}};
+}
+
+void test_coro_shortcircuits_to_empty() {
+  std::cerr << "test_coro_shortcircuits_to_empty" << std::endl;
+  check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) {
+    return coro_shortcircuits_to_empty<Awaitable, T>(driver);
+  });
+}
+
+template <typename Awaitable, typename T>
+std::optional<T> coro_simple_await(test_driver& driver) {
+  co_return co_await Awaitable{driver, 1, std::optional<T>{11}} +
+      co_await Awaitable{driver, 2, driver.toggles(dynamic_short_circuit) ? std::optional<T>{} : std::optional<T>{22}};
+}
+
+void test_coro_simple_await() {
+  std::cerr << "test_coro_simple_await" << std::endl;
+  check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) {
+    return coro_simple_await<Awaitable, T>(driver);
+  });
+}
+
+// The next pair of tests checks that adding a `try-catch` in the coroutine
+// doesn't affect control flow when `await_suspend_destroy` awaiters are in use.
+
+template <typename Awaitable, typename T>
+std::optional<T> coro_catching_shortcircuits_to_empty(test_driver& driver) {
+  try {
+    T n = co_await Awaitable{driver, 1, std::optional<T>{11}};
+    co_await Awaitable{driver, 2, std::optional<T>{}}; // return early!
+    co_return n + co_await Awaitable{driver, 3, std::optional<T>{22}};
+  } catch (...) {
+    driver.log(test_event::coro_catch);
+    throw;
+  }
+}
+
+void test_coro_catching_shortcircuits_to_empty() {
+  std::cerr << "test_coro_catching_shortcircuits_to_empty" << std::endl;
+  check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) {
+    return coro_catching_shortcircuits_to_empty<Awaitable, T>(driver);
+  });
+}
+
+template <typename Awaitable, typename T>
+std::optional<T> coro_catching_simple_await(test_driver& driver) {
+  try {
+    co_return co_await Awaitable{driver, 1, std::optional<T>{11}} +
+        co_await Awaitable{
+            driver, 2, driver.toggles(dynamic_short_circuit) ? std::optional<T>{} : std::optional<T>{22}};
+  } catch (...) {
+    driver.log(test_event::coro_catch);
+    throw;
+  }
+}
+
+void test_coro_catching_simple_await() {
+  std::cerr << "test_coro_catching_simple_await" << std::endl;
+  check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) {
+    return coro_catching_simple_await<Awaitable, T>(driver);
+  });
+}
+
+// The next pair of tests shows that the `await_suspend_destroy` code path works
+// correctly, even if it's mixed in a coroutine with legacy awaitables.
+
+template <typename Awaitable, typename T>
+std::optional<T> noneliding_coro_shortcircuits_to_empty(test_driver& driver) {
+  T n  = co_await Awaitable{driver, 1, std::optional<T>{11}};
+  T n2 = co_await old_optional_awaitable<T>{driver, 2, std::optional<T>{22}};
+  co_await Awaitable{driver, 3, std::optional<T>{}}; // return early!
+  co_return n + n2 + co_await Awaitable{driver, 4, std::optional<T>{44}};
+}
+
+void test_noneliding_coro_shortcircuits_to_empty() {
+  std::cerr << "test_noneliding_coro_shortcircuits_to_empty" << std::endl;
+  check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) {
+    return noneliding_coro_shortcircuits_to_empty<Awaitable, T>(driver);
+  });
+}
+
+template <typename Awaitable, typename T>
+std::optional<T> noneliding_coro_simple_await(test_driver& driver) {
+  co_return co_await Awaitable{driver, 1, std::optional<T>{11}} +
+      co_await Awaitable{driver, 2, driver.toggles(dynamic_short_circuit) ? std::optional<T>{} : std::optional<T>{22}} +
+      co_await old_optional_awaitable<T>{driver, 3, std::optional<T>{33}};
+}
+
+void test_noneliding_coro_simple_await() {
+  std::cerr << "test_noneliding_coro_simple_await" << std::endl;
+  check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) {
+    return noneliding_coro_simple_await<Awaitable, T>(driver);
+  });
+}
+
+// Test nested coroutines (coroutines that await other coroutines)
+
+template <typename Awaitable, typename T>
+std::optional<T> inner_coro(test_driver& driver, int base_id) {
+  co_return co_await Awaitable{driver, base_id, std::optional<T>{100}} +
+      co_await Awaitable{
+          driver, base_id + 1, driver.toggles(dynamic_short_circuit) ? std::optional<T>{} : std::optional<T>{200}};
+}
+
+template <typename Awaitable, typename T>
+std::optional<T> outer_coro(test_driver& driver) {
+  T result1 = co_await Awaitable{driver, 1, inner_coro<Awaitable, T>(driver, 10)};
+  T result2 = co_await Awaitable{driver, 2, inner_coro<Awaitable, T>(driver, 20)};
+  co_return result1 + result2;
+}
+
+void test_nested_coroutines() {
+  std::cerr << "test_nested_coroutines" << std::endl;
+  check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) {
+    return outer_coro<Awaitable, T>(driver);
+  });
+}
+
+int main(int, char**) {
+  test_coro_shortcircuits_to_empty();
+  test_coro_simple_await();
+  test_coro_catching_shortcircuits_to_empty();
+  test_coro_catching_simple_await();
+  test_noneliding_coro_shortcircuits_to_empty();
+  test_noneliding_coro_simple_await();
+  test_nested_coroutines();
+  return 0;
+}

>From eb5557ab0eb43ff216441603d1c47615869d0bbe Mon Sep 17 00:00:00 2001
From: lesha <lesha at meta.com>
Date: Thu, 7 Aug 2025 23:38:21 -0700
Subject: [PATCH 02/15] Fix CI

---
 clang/include/clang/Basic/AttrDocs.td         | 32 ++++++-------
 .../coro_await_suspend_destroy.pass.cpp       | 48 +++++++++++++++++--
 2 files changed, 60 insertions(+), 20 deletions(-)

diff --git a/clang/include/clang/Basic/AttrDocs.td b/clang/include/clang/Basic/AttrDocs.td
index d2224d86b3900..e45f692740193 100644
--- a/clang/include/clang/Basic/AttrDocs.td
+++ b/clang/include/clang/Basic/AttrDocs.td
@@ -9312,12 +9312,12 @@ flow as):
   }
 
 The benefits of this attribute are:
-  - **Avoid heap allocations for coro frames**: Allocating short-circuiting
-    coros on the stack makes code more predictable under memory pressure.
-    Without this attribute, LLVM cannot elide heap allocation even when all
-    awaiters are short-circuiting.
-  - **Performance**: Significantly faster execution and smaller code size.
-  - **Build time**: Faster compilation due to less IR being generated.
+- **Avoid heap allocations for coro frames**: Allocating short-circuiting
+  coros on the stack makes code more predictable under memory pressure.
+  Without this attribute, LLVM cannot elide heap allocation even when all
+  awaiters are short-circuiting.
+- **Performance**: Significantly faster execution and smaller code size.
+- **Build time**: Faster compilation due to less IR being generated.
 
 Marking your ``await_suspend_destroy`` method as ``noexcept`` can sometimes
 further improve optimization.
@@ -9343,16 +9343,16 @@ Here is a toy example of a portable short-circuiting awaiter:
 
 If all suspension points use (i) trivial or (ii) short-circuiting awaiters,
 then the coroutine optimizes more like a plain function, with 2 caveats:
-  - **Behavior:** The coroutine promise provides an implicit exception boundary
-    (as if wrapping the function in ``try {} catch { unhandled_exception(); }``).
-    This exception handling behavior is usually desirable in robust,
-    return-value-oriented programs that need short-circuiting coroutines.
-    Otherwise, the promise can always re-throw.
-  - **Speed:** As of 2025, there is still an optimization gap between a
-    realistic short-circuiting coro, and the equivalent (but much more verbose)
-    function.  For a guesstimate, expect 4-5ns per call on x86.  One idea for
-    improvement is to also elide trivial suspends like `std::suspend_never`, in
-    order to hit the `HasCoroSuspend` path in `CoroEarly.cpp`.
+- **Behavior:** The coroutine promise provides an implicit exception boundary
+  (as if wrapping the body in ``try { ... } catch (...) { unhandled_exception(); }``).
+  This exception handling behavior is usually desirable in robust,
+  return-value-oriented programs that need short-circuiting coroutines.
+  Otherwise, the promise's ``unhandled_exception()`` can always re-throw.
+- **Speed:** As of 2025, there is still an optimization gap between a
+  realistic short-circuiting coro and the equivalent (but much more verbose)
+  function: as a rough estimate, expect 4-5ns of overhead per call on x86.
+  One idea for improvement is to also elide trivial suspends like
+  ``std::suspend_never``, to hit the ``HasCoroSuspend`` path in ``CoroEarly.cpp``.
 
 }];
 }
diff --git a/libcxx/test/std/language.support/support.coroutines/end.to.end/coro_await_suspend_destroy.pass.cpp b/libcxx/test/std/language.support/support.coroutines/end.to.end/coro_await_suspend_destroy.pass.cpp
index 1b48b1523bf12..9da8ba530edf3 100644
--- a/libcxx/test/std/language.support/support.coroutines/end.to.end/coro_await_suspend_destroy.pass.cpp
+++ b/libcxx/test/std/language.support/support.coroutines/end.to.end/coro_await_suspend_destroy.pass.cpp
@@ -40,6 +40,14 @@
 #include <optional>
 #include <string>
 
+#define DEBUG_LOG 0 // Logs break no-localization CI, set to 1 if needed
+
+#ifndef TEST_HAS_NO_EXCEPTIONS
+#  define THROW(_ex) throw _ex;
+#else
+#  define THROW(_ex)
+#endif
+
 struct my_err : std::exception {};
 
 enum test_toggles {
@@ -110,7 +118,7 @@ struct optional_wrapper {
   operator std::optional<T>() {
     if (driver_.toggles(test_toggles::throw_in_convert_optional_wrapper)) {
       driver_.log(test_event::throw_convert_optional_wrapper);
-      throw my_err();
+      THROW(my_err());
     }
     driver_.log(test_event::convert_optional_wrapper);
     return std::move(storage_);
@@ -143,7 +151,7 @@ struct std::coroutine_traits<std::optional<T>, test_driver&, Args...> {
       driver_.log(test_event::return_value, value);
       if (driver_.toggles(test_toggles::throw_in_return_value)) {
         driver_.log(test_event::throw_return_value);
-        throw my_err();
+        THROW(my_err());
       }
       *storagePtr_ = std::move(value);
     }
@@ -169,7 +177,7 @@ struct base_optional_awaitable {
   T await_resume() {
     if (driver_.toggles(test_toggles::throw_in_await_resume)) {
       driver_.log(test_event::throw_await_resume, id_);
-      throw my_err();
+      THROW(my_err());
     }
     driver_.log(test_event::await_resume, id_);
     return std::move(opt_).value();
@@ -184,7 +192,7 @@ struct base_optional_awaitable {
     assert(promise.storagePtr_);
     if (driver_.toggles(test_toggles::throw_in_await_suspend_destroy)) {
       driver_.log(test_event::throw_await_suspend_destroy, id_);
-      throw my_err();
+      THROW(my_err());
     }
     driver_.log(test_event::await_suspend_destroy, id_);
   }
@@ -218,20 +226,29 @@ void check_coro_with_driver_for(auto coro_fn) {
     auto old_driver = driver;
     std::optional<T> old_res;
     bool old_threw = false;
+#ifndef TEST_HAS_NO_EXCEPTIONS
     try {
+#endif
       old_res = coro_fn.template operator()<old_optional_awaitable<T>, T>(old_driver);
+#ifndef TEST_HAS_NO_EXCEPTIONS
     } catch (const my_err&) {
       old_threw = true;
     }
+#endif
     auto new_driver = driver;
     std::optional<T> new_res;
     bool new_threw = false;
+#ifndef TEST_HAS_NO_EXCEPTIONS
     try {
+#endif
       new_res = coro_fn.template operator()<new_optional_awaitable<T>, T>(new_driver);
+#ifndef TEST_HAS_NO_EXCEPTIONS
     } catch (const my_err&) {
       new_threw = true;
     }
+#endif
 
+#if DEBUG_LOG
     // Print toggle values for debugging
     std::string toggle_info = "Toggles: ";
     for (int i = 0; i <= test_toggles::largest; ++i) {
@@ -241,6 +258,7 @@ void check_coro_with_driver_for(auto coro_fn) {
     }
     toggle_info += "\n";
     std::cerr << toggle_info.c_str() << std::endl;
+#endif
 
     assert(old_threw == new_threw);
     assert(old_res == new_res);
@@ -282,7 +300,9 @@ std::optional<T> coro_shortcircuits_to_empty(test_driver& driver) {
 }
 
 void test_coro_shortcircuits_to_empty() {
+#if DEBUG_LOG
   std::cerr << "test_coro_shortcircuits_to_empty" << std::endl;
+#endif
   check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) {
     return coro_shortcircuits_to_empty<Awaitable, T>(driver);
   });
@@ -295,7 +315,9 @@ std::optional<T> coro_simple_await(test_driver& driver) {
 }
 
 void test_coro_simple_await() {
+#if DEBUG_LOG
   std::cerr << "test_coro_simple_await" << std::endl;
+#endif
   check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) {
     return coro_simple_await<Awaitable, T>(driver);
   });
@@ -306,18 +328,24 @@ void test_coro_simple_await() {
 
 template <typename Awaitable, typename T>
 std::optional<T> coro_catching_shortcircuits_to_empty(test_driver& driver) {
+#ifndef TEST_HAS_NO_EXCEPTIONS
   try {
+#endif
     T n = co_await Awaitable{driver, 1, std::optional<T>{11}};
     co_await Awaitable{driver, 2, std::optional<T>{}}; // return early!
     co_return n + co_await Awaitable{driver, 3, std::optional<T>{22}};
+#ifndef TEST_HAS_NO_EXCEPTIONS
   } catch (...) {
     driver.log(test_event::coro_catch);
     throw;
   }
+#endif
 }
 
 void test_coro_catching_shortcircuits_to_empty() {
+#if DEBUG_LOG
   std::cerr << "test_coro_catching_shortcircuits_to_empty" << std::endl;
+#endif
   check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) {
     return coro_catching_shortcircuits_to_empty<Awaitable, T>(driver);
   });
@@ -325,18 +353,24 @@ void test_coro_catching_shortcircuits_to_empty() {
 
 template <typename Awaitable, typename T>
 std::optional<T> coro_catching_simple_await(test_driver& driver) {
+#ifndef TEST_HAS_NO_EXCEPTIONS
   try {
+#endif
     co_return co_await Awaitable{driver, 1, std::optional<T>{11}} +
         co_await Awaitable{
             driver, 2, driver.toggles(dynamic_short_circuit) ? std::optional<T>{} : std::optional<T>{22}};
+#ifndef TEST_HAS_NO_EXCEPTIONS
   } catch (...) {
     driver.log(test_event::coro_catch);
     throw;
   }
+#endif
 }
 
 void test_coro_catching_simple_await() {
+#if DEBUG_LOG
   std::cerr << "test_coro_catching_simple_await" << std::endl;
+#endif
   check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) {
     return coro_catching_simple_await<Awaitable, T>(driver);
   });
@@ -354,7 +388,9 @@ std::optional<T> noneliding_coro_shortcircuits_to_empty(test_driver& driver) {
 }
 
 void test_noneliding_coro_shortcircuits_to_empty() {
+#if DEBUG_LOG
   std::cerr << "test_noneliding_coro_shortcircuits_to_empty" << std::endl;
+#endif
   check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) {
     return noneliding_coro_shortcircuits_to_empty<Awaitable, T>(driver);
   });
@@ -368,7 +404,9 @@ std::optional<T> noneliding_coro_simple_await(test_driver& driver) {
 }
 
 void test_noneliding_coro_simple_await() {
+#if DEBUG_LOG
   std::cerr << "test_noneliding_coro_simple_await" << std::endl;
+#endif
   check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) {
     return noneliding_coro_simple_await<Awaitable, T>(driver);
   });
@@ -391,7 +429,9 @@ std::optional<T> outer_coro(test_driver& driver) {
 }
 
 void test_nested_coroutines() {
+#if DEBUG_LOG
   std::cerr << "test_nested_coroutines" << std::endl;
+#endif
   check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) {
     return outer_coro<Awaitable, T>(driver);
   });

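The paired `#ifndef TEST_HAS_NO_EXCEPTIONS` guards around each `try`/`catch` in `check_coro_with_driver_for` could, as one possible alternative, be folded into a small helper so that the guard appears only once. Below is a minimal sketch, not part of this PR. `threw_my_err` is a hypothetical name, and `my_err` is redeclared only so the snippet compiles on its own.

```cpp
#include <exception>

// Mirrors the test's exception type so this snippet is self-contained.
struct my_err : std::exception {};

// Hypothetical helper: runs `fn` and reports whether it threw `my_err`.
// When exceptions are disabled (TEST_HAS_NO_EXCEPTIONS defined), the
// try/catch is compiled out, `fn` simply runs, and the result is `false`.
template <typename Fn>
bool threw_my_err(Fn&& fn) {
#ifndef TEST_HAS_NO_EXCEPTIONS
  try {
    fn();
  } catch (const my_err&) {
    return true;
  }
#else
  fn();
#endif
  return false;
}
```

Each guarded block would then shrink to something like `bool old_threw = threw_my_err([&] { old_res = coro_fn.template operator()<old_optional_awaitable<T>, T>(old_driver); });`.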
>From 5d6a06d27ba913bc49286a8bdea97446ba37fae2 Mon Sep 17 00:00:00 2001
From: lesha <lesha at meta.com>
Date: Fri, 8 Aug 2025 00:12:00 -0700
Subject: [PATCH 03/15] Improve doc formatting

---
 clang/include/clang/Basic/AttrDocs.td | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/clang/include/clang/Basic/AttrDocs.td b/clang/include/clang/Basic/AttrDocs.td
index e45f692740193..a80b8e97efee2 100644
--- a/clang/include/clang/Basic/AttrDocs.td
+++ b/clang/include/clang/Basic/AttrDocs.td
@@ -9312,11 +9312,14 @@ flow as):
   }
 
 The benefits of this attribute are:
+
 - **Avoid heap allocations for coro frames**: Allocating short-circuiting
   coros on the stack makes code more predictable under memory pressure.
   Without this attribute, LLVM cannot elide heap allocation even when all
   awaiters are short-circuiting.
+
 - **Performance**: Significantly faster execution and smaller code size.
+
 - **Build time**: Faster compilation due to less IR being generated.
 
 Marking your ``await_suspend_destroy`` method as ``noexcept`` can sometimes
@@ -9343,11 +9346,13 @@ Here is a toy example of a portable short-circuiting awaiter:
 
 If all suspension points use (i) trivial or (ii) short-circuiting awaiters,
 then the coroutine optimizes more like a plain function, with 2 caveats:
+
 - **Behavior:** The coroutine promise provides an implicit exception boundary
   (as if wrapping the function in ``try {} catch { unhandled_exception(); }``).
   This exception handling behavior is usually desirable in robust,
   return-value-oriented programs that need short-circuiting coroutines.
   Otherwise, the promise can always re-throw.
+
 - **Speed:** As of 2025, there is still an optimization gap between a
   realistic short-circuiting coro, and the equivalent (but much more verbose)
   function.  For a guesstimate, expect 4-5ns per call on x86.  One idea for

>From 1dabfe69a6b5d86470a988b3ee4624a411f0fc80 Mon Sep 17 00:00:00 2001
From: Alexey <snarkmaster at gmail.com>
Date: Fri, 8 Aug 2025 22:02:07 -0700
Subject: [PATCH 04/15] Rework the AttrDocs.td addition based on feedback

---
 clang/include/clang/Basic/AttrDocs.td | 60 ++++++++++++++++-----------
 1 file changed, 36 insertions(+), 24 deletions(-)

diff --git a/clang/include/clang/Basic/AttrDocs.td b/clang/include/clang/Basic/AttrDocs.td
index a80b8e97efee2..0313d9b2cede4 100644
--- a/clang/include/clang/Basic/AttrDocs.td
+++ b/clang/include/clang/Basic/AttrDocs.td
@@ -9278,10 +9278,10 @@ The ``[[clang::coro_await_suspend_destroy]]`` attribute may be applied to a C++
 coroutine awaiter type.  When this attribute is present, the awaiter must
 implement ``void await_suspend_destroy(Promise&)``.  If ``await_ready()``
 returns ``false`` at a suspension point, ``await_suspend_destroy`` will be
-called directly, bypassing the ``await_suspend(std::coroutine_handle<...>)``
-method.  The coroutine being suspended will then be immediately destroyed.
+called directly.  The coroutine being suspended will then be immediately
+destroyed.
 
-Logically, the new behavior is equivalent to this standard code:
+The new behavior is equivalent to this standard code:
 
 .. code-block:: c++
 
@@ -9296,10 +9296,24 @@ stub ``await_suspend()`` as above.  Without ``coro_await_suspend_destroy``
 support, the awaiter will behave nearly identically, with the only difference
 being heap allocation instead of stack allocation for the coroutine frame.
 
-This attribute exists to optimize short-circuiting coroutines—coroutines whose
-suspend points are either (i) trivial (like ``std::suspend_never``), or (ii)
-short-circuiting (like a ``co_await`` that can be expressed in regular control
-flow as):
+This attribute helps optimize short-circuiting coroutines.
+
+A short-circuiting coroutine is one where every ``co_await`` or ``co_yield``
+either immediately produces a value, or exits the coroutine.  In other words,
+they use coroutine syntax to concisely branch out of a synchronous function. 
+Here are close analogs in other languages:
+
+- Rust has ``Result<T>`` and a ``?`` operator to unpack it, while
+  ``folly::result<T>`` is a C++ short-circuiting coroutine, with ``co_await``
+  acting just like ``?``.
+
+- Haskell has ``Maybe`` & ``Error`` monads.  A short-circuiting ``co_await``
+  loosely corresponds to the monadic ``>>=``, whereas a short-circuiting
+  ``std::optional`` coro would be an exact analog of ``Maybe``.
+
+The C++ implementation relies on short-circuiting awaiters.  These either
+resume synchronously, or immediately destroy the awaiting coroutine and return
+control to the parent:
 
 .. code-block:: c++
 
@@ -9311,7 +9325,20 @@ flow as):
     return /* value representing the "execution short-circuited" outcome */;
   }
 
-The benefits of this attribute are:
+Then, a short-circuiting coroutine is one where all the suspend points are
+either (i) trivial (like ``std::suspend_never``), or (ii) short-circuiting.
+
+Although the coroutine machinery makes them harder to optimize, logically,
+short-circuiting coroutines are like syntax sugar for regular functions where:
+
+- ``co_await`` allows expressions to return early.
+
+- ``unhandled_exception()`` lets the coroutine promise type wrap the function
+  body in an implicit try-catch.  This mandatory exception boundary behavior
+  can be desirable in robust, return-value-oriented programs that benefit from
+  short-circuiting coroutines.  If not, the promise can always re-throw.
+
+This attribute improves short-circuiting coroutines in a few ways:
 
 - **Avoid heap allocations for coro frames**: Allocating short-circuiting
   coros on the stack makes code more predictable under memory pressure.
@@ -9330,7 +9357,7 @@ Here is a toy example of a portable short-circuiting awaiter:
 .. code-block:: c++
 
   template <typename T>
-  struct [[clang::coro_await_suspend_destroy]] optional_awaitable {
+  struct [[clang::coro_await_suspend_destroy]] optional_awaiter {
     std::optional<T> opt_;
     bool await_ready() const noexcept { return opt_.has_value(); }
     T await_resume() { return std::move(opt_).value(); }
@@ -9344,21 +9371,6 @@ Here is a toy example of a portable short-circuiting awaiter:
     }
   };
 
-If all suspension points use (i) trivial or (ii) short-circuiting awaiters,
-then the coroutine optimizes more like a plain function, with 2 caveats:
-
-- **Behavior:** The coroutine promise provides an implicit exception boundary
-  (as if wrapping the function in ``try {} catch { unhandled_exception(); }``).
-  This exception handling behavior is usually desirable in robust,
-  return-value-oriented programs that need short-circuiting coroutines.
-  Otherwise, the promise can always re-throw.
-
-- **Speed:** As of 2025, there is still an optimization gap between a
-  realistic short-circuiting coro, and the equivalent (but much more verbose)
-  function.  For a guesstimate, expect 4-5ns per call on x86.  One idea for
-  improvement is to also elide trivial suspends like `std::suspend_never`, in
-  order to hit the `HasCoroSuspend` path in `CoroEarly.cpp`.
-
 }];
 }
 

>From 62789efb010ae6629afa9d46597aa33c3652c2cb Mon Sep 17 00:00:00 2001
From: Alexey <snarkmaster at gmail.com>
Date: Sat, 9 Aug 2025 00:23:37 -0700
Subject: [PATCH 05/15] Split out the `libcxx/test` change into PR #152820

---
 .../coro_await_suspend_destroy.pass.cpp       | 449 ------------------
 1 file changed, 449 deletions(-)
 delete mode 100644 libcxx/test/std/language.support/support.coroutines/end.to.end/coro_await_suspend_destroy.pass.cpp

diff --git a/libcxx/test/std/language.support/support.coroutines/end.to.end/coro_await_suspend_destroy.pass.cpp b/libcxx/test/std/language.support/support.coroutines/end.to.end/coro_await_suspend_destroy.pass.cpp
deleted file mode 100644
index 9da8ba530edf3..0000000000000
--- a/libcxx/test/std/language.support/support.coroutines/end.to.end/coro_await_suspend_destroy.pass.cpp
+++ /dev/null
@@ -1,449 +0,0 @@
-//===-- Integration test for `clang::co_await_suspend_destroy` ------------===//
-//
-// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-//
-// Test for the `coro_await_suspend_destroy` attribute and
-// `await_suspend_destroy` method.
-//
-// Per `AttrDocs.td`, using `coro_await_suspend_destroy` with
-// `await_suspend_destroy` should be equivalent to providing a stub
-// `await_suspend` that calls `await_suspend_destroy` and then destroys the
-// coroutine handle.
-//
-// This test logs control flow in a variety of scenarios (controlled by
-// `test_toggles`), and checks that the execution traces are identical for
-// awaiters with/without the attribute. We currently test all combinations of
-// error injection points to ensure behavioral equivalence.
-//
-// In contrast to Clang `lit` tests, this makes it easy to verify non-divergence
-// of functional behavior of the entire coroutine across many scenarios,
-// including exception handling, early returns, and mixed usage with legacy
-// awaitables.
-//
-//===----------------------------------------------------------------------===//
-
-// UNSUPPORTED: c++03, c++11, c++14, c++17
-
-#if __has_cpp_attribute(clang::coro_await_suspend_destroy)
-#  define ATTR_CORO_AWAIT_SUSPEND_DESTROY [[clang::coro_await_suspend_destroy]]
-#else
-#  define ATTR_CORO_AWAIT_SUSPEND_DESTROY
-#endif
-
-#include <cassert>
-#include <coroutine>
-#include <exception>
-#include <iostream>
-#include <memory>
-#include <optional>
-#include <string>
-
-#define DEBUG_LOG 0 // Logs break no-localization CI, set to 1 if needed
-
-#ifndef TEST_HAS_NO_EXCEPTIONS
-#  define THROW(_ex) throw _ex;
-#else
-#  define THROW(_ex)
-#endif
-
-struct my_err : std::exception {};
-
-enum test_toggles {
-  throw_in_convert_optional_wrapper = 0,
-  throw_in_return_value,
-  throw_in_await_resume,
-  throw_in_await_suspend_destroy,
-  dynamic_short_circuit,          // Does not apply to `..._shortcircuits_to_empty` tests
-  largest = dynamic_short_circuit // for array in `test_driver`
-};
-
-enum test_event {
-  unset = 0,
-  // Besides events, we also log various integers between 1 and 9999 that
-  // disambiguate different awaiters, or represent different return values.
-  convert_optional_wrapper = 10000,
-  destroy_return_object,
-  destroy_promise,
-  get_return_object,
-  initial_suspend,
-  final_suspend,
-  return_value,
-  throw_return_value,
-  unhandled_exception,
-  await_ready,
-  await_resume,
-  destroy_optional_awaitable,
-  throw_await_resume,
-  await_suspend_destroy,
-  throw_await_suspend_destroy,
-  await_suspend,
-  coro_catch,
-  throw_convert_optional_wrapper,
-};
-
-struct test_driver {
-  static constexpr int max_events = 1000;
-
-  bool toggles_[test_toggles::largest + 1] = {};
-  int events_[max_events]                  = {};
-  int cur_event_                           = 0;
-
-  bool toggles(test_toggles toggle) const { return toggles_[toggle]; }
-  void log(auto&&... events) {
-    for (auto event : {static_cast<int>(events)...}) {
-      assert(cur_event_ < max_events);
-      events_[cur_event_++] = event;
-    }
-  }
-};
-
-// `optional_wrapper` exists since `get_return_object()` can't return
-// `std::optional` directly. C++ coroutines have a fundamental timing mismatch
-// between when the return object is created and when the value is available:
-//
-// 1) Early (coroutine startup): `get_return_object()` is called and must return
-//    something immediately.
-// 2) Later (when `co_return` executes): `return_value(T)` is called with the
-//    actual value.
-// 3) Issue: If `get_return_object()` returns the storage, it's empty when
-//    returned, and writing to it later cannot affect the already-returned copy.
-template <typename T>
-struct optional_wrapper {
-  test_driver& driver_;
-  std::optional<T> storage_;
-  std::optional<T>*& pointer_;
-  optional_wrapper(test_driver& driver, std::optional<T>*& p) : driver_(driver), pointer_(p) { pointer_ = &storage_; }
-  operator std::optional<T>() {
-    if (driver_.toggles(test_toggles::throw_in_convert_optional_wrapper)) {
-      driver_.log(test_event::throw_convert_optional_wrapper);
-      THROW(my_err());
-    }
-    driver_.log(test_event::convert_optional_wrapper);
-    return std::move(storage_);
-  }
-  ~optional_wrapper() { driver_.log(test_event::destroy_return_object); }
-};
-
-// Make `std::optional` a coroutine
-template <typename T, typename... Args>
-struct std::coroutine_traits<std::optional<T>, test_driver&, Args...> {
-  struct promise_type {
-    std::optional<T>* storagePtr_ = nullptr;
-    test_driver& driver_;
-
-    promise_type(test_driver& driver, auto&&...) : driver_(driver) {}
-    ~promise_type() { driver_.log(test_event::destroy_promise); }
-    optional_wrapper<T> get_return_object() {
-      driver_.log(test_event::get_return_object);
-      return optional_wrapper<T>(driver_, storagePtr_);
-    }
-    std::suspend_never initial_suspend() const noexcept {
-      driver_.log(test_event::initial_suspend);
-      return {};
-    }
-    std::suspend_never final_suspend() const noexcept {
-      driver_.log(test_event::final_suspend);
-      return {};
-    }
-    void return_value(T value) {
-      driver_.log(test_event::return_value, value);
-      if (driver_.toggles(test_toggles::throw_in_return_value)) {
-        driver_.log(test_event::throw_return_value);
-        THROW(my_err());
-      }
-      *storagePtr_ = std::move(value);
-    }
-    void unhandled_exception() {
-      // Leave `*storagePtr_` empty to represent error
-      driver_.log(test_event::unhandled_exception);
-    }
-  };
-};
-
-template <typename T, bool HasAttr>
-struct base_optional_awaitable {
-  test_driver& driver_;
-  int id_;
-  std::optional<T> opt_;
-
-  ~base_optional_awaitable() { driver_.log(test_event::destroy_optional_awaitable, id_); }
-
-  bool await_ready() const noexcept {
-    driver_.log(test_event::await_ready, id_);
-    return opt_.has_value();
-  }
-  T await_resume() {
-    if (driver_.toggles(test_toggles::throw_in_await_resume)) {
-      driver_.log(test_event::throw_await_resume, id_);
-      THROW(my_err());
-    }
-    driver_.log(test_event::await_resume, id_);
-    return std::move(opt_).value();
-  }
-  void await_suspend_destroy(auto& promise) {
-#if __has_cpp_attribute(clang::coro_await_suspend_destroy)
-    if constexpr (HasAttr) {
-      // This is just here so that old & new events compare exactly equal.
-      driver_.log(test_event::await_suspend);
-    }
-#endif
-    assert(promise.storagePtr_);
-    if (driver_.toggles(test_toggles::throw_in_await_suspend_destroy)) {
-      driver_.log(test_event::throw_await_suspend_destroy, id_);
-      THROW(my_err());
-    }
-    driver_.log(test_event::await_suspend_destroy, id_);
-  }
-  void await_suspend(auto handle) {
-    driver_.log(test_event::await_suspend);
-    await_suspend_destroy(handle.promise());
-    handle.destroy();
-  }
-};
-
-template <typename T>
-struct old_optional_awaitable : base_optional_awaitable<T, false> {};
-
-template <typename T>
-struct ATTR_CORO_AWAIT_SUSPEND_DESTROY new_optional_awaitable : base_optional_awaitable<T, true> {};
-
-void enumerate_toggles(auto lambda) {
-  // Generate all combinations of toggle values
-  for (int mask = 0; mask <= (1 << (test_toggles::largest + 1)) - 1; ++mask) {
-    test_driver driver;
-    for (int i = 0; i <= test_toggles::largest; ++i) {
-      driver.toggles_[i] = (mask & (1 << i)) != 0;
-    }
-    lambda(driver);
-  }
-}
-
-template <typename T>
-void check_coro_with_driver_for(auto coro_fn) {
-  enumerate_toggles([&](const test_driver& driver) {
-    auto old_driver = driver;
-    std::optional<T> old_res;
-    bool old_threw = false;
-#ifndef TEST_HAS_NO_EXCEPTIONS
-    try {
-#endif
-      old_res = coro_fn.template operator()<old_optional_awaitable<T>, T>(old_driver);
-#ifndef TEST_HAS_NO_EXCEPTIONS
-    } catch (const my_err&) {
-      old_threw = true;
-    }
-#endif
-    auto new_driver = driver;
-    std::optional<T> new_res;
-    bool new_threw = false;
-#ifndef TEST_HAS_NO_EXCEPTIONS
-    try {
-#endif
-      new_res = coro_fn.template operator()<new_optional_awaitable<T>, T>(new_driver);
-#ifndef TEST_HAS_NO_EXCEPTIONS
-    } catch (const my_err&) {
-      new_threw = true;
-    }
-#endif
-
-#if DEBUG_LOG
-    // Print toggle values for debugging
-    std::string toggle_info = "Toggles: ";
-    for (int i = 0; i <= test_toggles::largest; ++i) {
-      if (driver.toggles_[i]) {
-        toggle_info += std::to_string(i) + " ";
-      }
-    }
-    toggle_info += "\n";
-    std::cerr << toggle_info.c_str() << std::endl;
-#endif
-
-    assert(old_threw == new_threw);
-    assert(old_res == new_res);
-
-    // Compare events arrays directly using cur_event_ and indices
-    assert(old_driver.cur_event_ == new_driver.cur_event_);
-    for (int i = 0; i < old_driver.cur_event_; ++i) {
-      assert(old_driver.events_[i] == new_driver.events_[i]);
-    }
-  });
-}
-
-// Move-only, non-nullable type that quacks like int but stores a
-// heap-allocated int. Used to exercise the machinery with a nontrivial type.
-class heap_int {
-private:
-  std::unique_ptr<int> ptr_;
-
-public:
-  explicit heap_int(int value) : ptr_(std::make_unique<int>(value)) {}
-
-  heap_int operator+(const heap_int& other) const { return heap_int(*ptr_ + *other.ptr_); }
-
-  bool operator==(const heap_int& other) const { return *ptr_ == *other.ptr_; }
-
-  /*implicit*/ operator int() const { return *ptr_; }
-};
-
-void check_coro_with_driver(auto coro_fn) {
-  check_coro_with_driver_for<int>(coro_fn);
-  check_coro_with_driver_for<heap_int>(coro_fn);
-}
-
-template <typename Awaitable, typename T>
-std::optional<T> coro_shortcircuits_to_empty(test_driver& driver) {
-  T n = co_await Awaitable{driver, 1, std::optional<T>{11}};
-  co_await Awaitable{driver, 2, std::optional<T>{}}; // return early!
-  co_return n + co_await Awaitable{driver, 3, std::optional<T>{22}};
-}
-
-void test_coro_shortcircuits_to_empty() {
-#if DEBUG_LOG
-  std::cerr << "test_coro_shortcircuits_to_empty" << std::endl;
-#endif
-  check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) {
-    return coro_shortcircuits_to_empty<Awaitable, T>(driver);
-  });
-}
-
-template <typename Awaitable, typename T>
-std::optional<T> coro_simple_await(test_driver& driver) {
-  co_return co_await Awaitable{driver, 1, std::optional<T>{11}} +
-      co_await Awaitable{driver, 2, driver.toggles(dynamic_short_circuit) ? std::optional<T>{} : std::optional<T>{22}};
-}
-
-void test_coro_simple_await() {
-#if DEBUG_LOG
-  std::cerr << "test_coro_simple_await" << std::endl;
-#endif
-  check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) {
-    return coro_simple_await<Awaitable, T>(driver);
-  });
-}
-
-// The next pair of tests checks that adding a `try-catch` in the coroutine
-// doesn't affect control flow when `await_suspend_destroy` awaiters are in use.
-
-template <typename Awaitable, typename T>
-std::optional<T> coro_catching_shortcircuits_to_empty(test_driver& driver) {
-#ifndef TEST_HAS_NO_EXCEPTIONS
-  try {
-#endif
-    T n = co_await Awaitable{driver, 1, std::optional<T>{11}};
-    co_await Awaitable{driver, 2, std::optional<T>{}}; // return early!
-    co_return n + co_await Awaitable{driver, 3, std::optional<T>{22}};
-#ifndef TEST_HAS_NO_EXCEPTIONS
-  } catch (...) {
-    driver.log(test_event::coro_catch);
-    throw;
-  }
-#endif
-}
-
-void test_coro_catching_shortcircuits_to_empty() {
-#if DEBUG_LOG
-  std::cerr << "test_coro_catching_shortcircuits_to_empty" << std::endl;
-#endif
-  check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) {
-    return coro_catching_shortcircuits_to_empty<Awaitable, T>(driver);
-  });
-}
-
-template <typename Awaitable, typename T>
-std::optional<T> coro_catching_simple_await(test_driver& driver) {
-#ifndef TEST_HAS_NO_EXCEPTIONS
-  try {
-#endif
-    co_return co_await Awaitable{driver, 1, std::optional<T>{11}} +
-        co_await Awaitable{
-            driver, 2, driver.toggles(dynamic_short_circuit) ? std::optional<T>{} : std::optional<T>{22}};
-#ifndef TEST_HAS_NO_EXCEPTIONS
-  } catch (...) {
-    driver.log(test_event::coro_catch);
-    throw;
-  }
-#endif
-}
-
-void test_coro_catching_simple_await() {
-#if DEBUG_LOG
-  std::cerr << "test_coro_catching_simple_await" << std::endl;
-#endif
-  check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) {
-    return coro_catching_simple_await<Awaitable, T>(driver);
-  });
-}
-
-// The next pair of tests shows that the `await_suspend_destroy` code path works
-// correctly, even if it's mixed in a coroutine with legacy awaitables.
-
-template <typename Awaitable, typename T>
-std::optional<T> noneliding_coro_shortcircuits_to_empty(test_driver& driver) {
-  T n  = co_await Awaitable{driver, 1, std::optional<T>{11}};
-  T n2 = co_await old_optional_awaitable<T>{driver, 2, std::optional<T>{22}};
-  co_await Awaitable{driver, 3, std::optional<T>{}}; // return early!
-  co_return n + n2 + co_await Awaitable{driver, 4, std::optional<T>{44}};
-}
-
-void test_noneliding_coro_shortcircuits_to_empty() {
-#if DEBUG_LOG
-  std::cerr << "test_noneliding_coro_shortcircuits_to_empty" << std::endl;
-#endif
-  check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) {
-    return noneliding_coro_shortcircuits_to_empty<Awaitable, T>(driver);
-  });
-}
-
-template <typename Awaitable, typename T>
-std::optional<T> noneliding_coro_simple_await(test_driver& driver) {
-  co_return co_await Awaitable{driver, 1, std::optional<T>{11}} +
-      co_await Awaitable{driver, 2, driver.toggles(dynamic_short_circuit) ? std::optional<T>{} : std::optional<T>{22}} +
-      co_await old_optional_awaitable<T>{driver, 3, std::optional<T>{33}};
-}
-
-void test_noneliding_coro_simple_await() {
-#if DEBUG_LOG
-  std::cerr << "test_noneliding_coro_simple_await" << std::endl;
-#endif
-  check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) {
-    return noneliding_coro_simple_await<Awaitable, T>(driver);
-  });
-}
-
-// Test nested coroutines (coroutines that await other coroutines)
-
-template <typename Awaitable, typename T>
-std::optional<T> inner_coro(test_driver& driver, int base_id) {
-  co_return co_await Awaitable{driver, base_id, std::optional<T>{100}} +
-      co_await Awaitable{
-          driver, base_id + 1, driver.toggles(dynamic_short_circuit) ? std::optional<T>{} : std::optional<T>{200}};
-}
-
-template <typename Awaitable, typename T>
-std::optional<T> outer_coro(test_driver& driver) {
-  T result1 = co_await Awaitable{driver, 1, inner_coro<Awaitable, T>(driver, 10)};
-  T result2 = co_await Awaitable{driver, 2, inner_coro<Awaitable, T>(driver, 20)};
-  co_return result1 + result2;
-}
-
-void test_nested_coroutines() {
-#if DEBUG_LOG
-  std::cerr << "test_nested_coroutines" << std::endl;
-#endif
-  check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) {
-    return outer_coro<Awaitable, T>(driver);
-  });
-}
-
-int main(int, char**) {
-  test_coro_shortcircuits_to_empty();
-  test_coro_simple_await();
-  test_coro_catching_shortcircuits_to_empty();
-  test_coro_catching_simple_await();
-  test_noneliding_coro_shortcircuits_to_empty();
-  test_noneliding_coro_simple_await();
-  test_nested_coroutines();
-  return 0;
-}

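The header comment of the deleted test summarizes its approach: run every combination of the `test_toggles` error-injection switches through both awaiter flavors and require identical event traces. The core of that differential-testing loop can be distilled into a generic sketch. `for_each_toggle_combination`, `trace_fn`, and `traces_match_for_all` are hypothetical helpers (not in the PR), and `std::vector<int>` traces stand in for `test_driver::events_`.

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <vector>

// Calls `fn` once for each of the 2^N combinations of N boolean toggles,
// mirroring the bitmask loop in `enumerate_toggles`.
template <std::size_t N, typename Fn>
void for_each_toggle_combination(Fn&& fn) {
  static_assert(N < 32, "mask must fit in an unsigned int");
  for (unsigned mask = 0; mask < (1u << N); ++mask) {
    std::array<bool, N> toggles{};
    for (std::size_t i = 0; i < N; ++i)
      toggles[i] = ((mask >> i) & 1u) != 0;
    fn(toggles);
  }
}

// An implementation under test: maps toggles to a trace of logged events.
template <std::size_t N>
using trace_fn = std::function<std::vector<int>(const std::array<bool, N>&)>;

// Differential check: both implementations must produce the same event
// trace for every toggle combination.
template <std::size_t N>
bool traces_match_for_all(const trace_fn<N>& old_impl,
                          const trace_fn<N>& new_impl) {
  bool all_equal = true;
  for_each_toggle_combination<N>([&](const std::array<bool, N>& toggles) {
    if (old_impl(toggles) != new_impl(toggles))
      all_equal = false;
  });
  return all_equal;
}
```

Exhaustive enumeration stays cheap here because the test has only five toggles (32 runs per scenario and value type); with many more toggles, a sampled or pairwise scheme would be needed instead.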
>From 4835f37b66a70c22d59d151a4ef7aab6de1b4524 Mon Sep 17 00:00:00 2001
From: Alexey <snarkmaster at gmail.com>
Date: Fri, 8 Aug 2025 19:52:17 -0700
Subject: [PATCH 06/15] Lift standard suspend flow to emitStandardAwaitSuspend;
 tweak comment.

---
 clang/lib/CodeGen/CGCoroutine.cpp | 174 ++++++++++++++++--------------
 1 file changed, 96 insertions(+), 78 deletions(-)

diff --git a/clang/lib/CodeGen/CGCoroutine.cpp b/clang/lib/CodeGen/CGCoroutine.cpp
index d74bef592aa9c..883a45d2acfff 100644
--- a/clang/lib/CodeGen/CGCoroutine.cpp
+++ b/clang/lib/CodeGen/CGCoroutine.cpp
@@ -282,6 +282,15 @@ namespace {
 }
 
 // The simplified `await_suspend_destroy` path avoids suspend intrinsics.
+//
+// If a coro has only `await_suspend_destroy` and trivial (`suspend_never`)
+// awaiters, then subsequent passes are able to allocate its frame on-stack.
+//
+// As of 2025, there is still an optimization gap between a realistic
+// short-circuiting coro, and the equivalent plain function.  For a
+// guesstimate, expect 4-5ns per call on x86.  One idea for improvement is to
+// also elide trivial suspends like `std::suspend_never`, in order to hit the
+// `HasCoroSuspend` path in `CoroEarly.cpp`.
 static void emitAwaitSuspendDestroy(CodeGenFunction &CGF, CGCoroData &Coro,
                                     llvm::Function *SuspendWrapper,
                                     llvm::Value *Awaiter, llvm::Value *Frame,
@@ -299,6 +308,89 @@ static void emitAwaitSuspendDestroy(CodeGenFunction &CGF, CGCoroData &Coro,
   CGF.EmitBranchThroughCleanup(Coro.CleanupJD);
 }
 
+static void emitStandardAwaitSuspend(
+    CodeGenFunction &CGF, CGCoroData &Coro, CoroutineSuspendExpr const &S,
+    llvm::Function *SuspendWrapper, llvm::Value *Awaiter, llvm::Value *Frame,
+    bool AwaitSuspendCanThrow, SmallString<32> Prefix, BasicBlock *ReadyBlock,
+    AwaitKind Kind, CoroutineSuspendExpr::SuspendReturnType SuspendReturnType) {
+  auto &Builder = CGF.Builder;
+
+  CGF.CurCoro.InSuspendBlock = true;
+
+  SmallVector<llvm::Value *, 3> SuspendIntrinsicCallArgs;
+  SuspendIntrinsicCallArgs.push_back(Awaiter);
+  SuspendIntrinsicCallArgs.push_back(Frame);
+  SuspendIntrinsicCallArgs.push_back(SuspendWrapper);
+  BasicBlock *CleanupBlock = CGF.createBasicBlock(Prefix + Twine(".cleanup"));
+
+  llvm::Function *CoroSave = CGF.CGM.getIntrinsic(llvm::Intrinsic::coro_save);
+  auto *NullPtr = llvm::ConstantPointerNull::get(CGF.CGM.Int8PtrTy);
+  auto *SaveCall = Builder.CreateCall(CoroSave, {NullPtr});
+
+  llvm::Intrinsic::ID AwaitSuspendIID;
+  switch (SuspendReturnType) {
+  case CoroutineSuspendExpr::SuspendReturnType::SuspendVoid:
+    AwaitSuspendIID = llvm::Intrinsic::coro_await_suspend_void;
+    break;
+  case CoroutineSuspendExpr::SuspendReturnType::SuspendBool:
+    AwaitSuspendIID = llvm::Intrinsic::coro_await_suspend_bool;
+    break;
+  case CoroutineSuspendExpr::SuspendReturnType::SuspendHandle:
+    AwaitSuspendIID = llvm::Intrinsic::coro_await_suspend_handle;
+    break;
+  }
+
+  llvm::Function *AwaitSuspendIntrinsic = CGF.CGM.getIntrinsic(AwaitSuspendIID);
+
+  llvm::CallBase *SuspendRet = nullptr;
+  // FIXME: add call attributes?
+  if (AwaitSuspendCanThrow)
+    SuspendRet =
+        CGF.EmitCallOrInvoke(AwaitSuspendIntrinsic, SuspendIntrinsicCallArgs);
+  else
+    SuspendRet = CGF.EmitNounwindRuntimeCall(AwaitSuspendIntrinsic,
+                                             SuspendIntrinsicCallArgs);
+
+  assert(SuspendRet);
+  CGF.CurCoro.InSuspendBlock = false;
+
+  switch (SuspendReturnType) {
+  case CoroutineSuspendExpr::SuspendReturnType::SuspendVoid:
+    assert(SuspendRet->getType()->isVoidTy());
+    break;
+  case CoroutineSuspendExpr::SuspendReturnType::SuspendBool: {
+    assert(SuspendRet->getType()->isIntegerTy());
+
+    // Veto suspension if requested by bool returning await_suspend.
+    BasicBlock *RealSuspendBlock =
+        CGF.createBasicBlock(Prefix + Twine(".suspend.bool"));
+    CGF.Builder.CreateCondBr(SuspendRet, RealSuspendBlock, ReadyBlock);
+    CGF.EmitBlock(RealSuspendBlock);
+    break;
+  }
+  case CoroutineSuspendExpr::SuspendReturnType::SuspendHandle: {
+    assert(SuspendRet->getType()->isVoidTy());
+    break;
+  }
+  }
+
+  // Emit the suspend point.
+  const bool IsFinalSuspend = (Kind == AwaitKind::Final);
+  llvm::Function *CoroSuspend =
+      CGF.CGM.getIntrinsic(llvm::Intrinsic::coro_suspend);
+  auto *SuspendResult = Builder.CreateCall(
+      CoroSuspend, {SaveCall, Builder.getInt1(IsFinalSuspend)});
+
+  // Create a switch capturing three possible continuations.
+  auto *Switch = Builder.CreateSwitch(SuspendResult, Coro.SuspendBB, 2);
+  Switch->addCase(Builder.getInt8(0), ReadyBlock);
+  Switch->addCase(Builder.getInt8(1), CleanupBlock);
+
+  // Emit cleanup for this suspend point.
+  CGF.EmitBlock(CleanupBlock);
+  CGF.EmitBranchThroughCleanup(Coro.CleanupJD);
+}
+
 static LValueOrRValue emitSuspendExpression(CodeGenFunction &CGF, CGCoroData &Coro,
                                     CoroutineSuspendExpr const &S,
                                     AwaitKind Kind, AggValueSlot aggSlot,
@@ -320,8 +412,6 @@ static LValueOrRValue emitSuspendExpression(CodeGenFunction &CGF, CGCoroData &Co
   // Otherwise, emit suspend logic.
   CGF.EmitBlock(SuspendBlock);
 
-  auto &Builder = CGF.Builder;
-
   auto SuspendWrapper = CodeGenFunction(CGF.CGM).generateAwaitSuspendWrapper(
       CGF.CurFn->getName(), Prefix, S);
 
@@ -343,82 +433,9 @@ static LValueOrRValue emitSuspendExpression(CodeGenFunction &CGF, CGCoroData &Co
     emitAwaitSuspendDestroy(CGF, Coro, SuspendWrapper, Awaiter, Frame,
                             AwaitSuspendCanThrow);
   } else { // Normal suspend path -- can actually suspend, uses intrinsics
-    CGF.CurCoro.InSuspendBlock = true;
-
-    SmallVector<llvm::Value *, 3> SuspendIntrinsicCallArgs;
-    SuspendIntrinsicCallArgs.push_back(Awaiter);
-    SuspendIntrinsicCallArgs.push_back(Frame);
-    SuspendIntrinsicCallArgs.push_back(SuspendWrapper);
-    BasicBlock *CleanupBlock = CGF.createBasicBlock(Prefix + Twine(".cleanup"));
-
-    llvm::Function *CoroSave = CGF.CGM.getIntrinsic(llvm::Intrinsic::coro_save);
-    auto *NullPtr = llvm::ConstantPointerNull::get(CGF.CGM.Int8PtrTy);
-    auto *SaveCall = Builder.CreateCall(CoroSave, {NullPtr});
-
-    llvm::Intrinsic::ID AwaitSuspendIID;
-
-    switch (SuspendReturnType) {
-    case CoroutineSuspendExpr::SuspendReturnType::SuspendVoid:
-      AwaitSuspendIID = llvm::Intrinsic::coro_await_suspend_void;
-      break;
-    case CoroutineSuspendExpr::SuspendReturnType::SuspendBool:
-      AwaitSuspendIID = llvm::Intrinsic::coro_await_suspend_bool;
-      break;
-    case CoroutineSuspendExpr::SuspendReturnType::SuspendHandle:
-      AwaitSuspendIID = llvm::Intrinsic::coro_await_suspend_handle;
-      break;
-    }
-
-    llvm::Function *AwaitSuspendIntrinsic =
-        CGF.CGM.getIntrinsic(AwaitSuspendIID);
-
-    llvm::CallBase *SuspendRet = nullptr;
-    // FIXME: add call attributes?
-    if (AwaitSuspendCanThrow)
-      SuspendRet =
-          CGF.EmitCallOrInvoke(AwaitSuspendIntrinsic, SuspendIntrinsicCallArgs);
-    else
-      SuspendRet = CGF.EmitNounwindRuntimeCall(AwaitSuspendIntrinsic,
-                                               SuspendIntrinsicCallArgs);
-
-    assert(SuspendRet);
-    CGF.CurCoro.InSuspendBlock = false;
-
-    switch (SuspendReturnType) {
-    case CoroutineSuspendExpr::SuspendReturnType::SuspendVoid:
-      assert(SuspendRet->getType()->isVoidTy());
-      break;
-    case CoroutineSuspendExpr::SuspendReturnType::SuspendBool: {
-      assert(SuspendRet->getType()->isIntegerTy());
-
-      // Veto suspension if requested by bool returning await_suspend.
-      BasicBlock *RealSuspendBlock =
-          CGF.createBasicBlock(Prefix + Twine(".suspend.bool"));
-      CGF.Builder.CreateCondBr(SuspendRet, RealSuspendBlock, ReadyBlock);
-      CGF.EmitBlock(RealSuspendBlock);
-      break;
-    }
-    case CoroutineSuspendExpr::SuspendReturnType::SuspendHandle: {
-      assert(SuspendRet->getType()->isVoidTy());
-      break;
-    }
-    }
-
-    // Emit the suspend point.
-    const bool IsFinalSuspend = (Kind == AwaitKind::Final);
-    llvm::Function *CoroSuspend =
-        CGF.CGM.getIntrinsic(llvm::Intrinsic::coro_suspend);
-    auto *SuspendResult = Builder.CreateCall(
-        CoroSuspend, {SaveCall, Builder.getInt1(IsFinalSuspend)});
-
-    // Create a switch capturing three possible continuations.
-    auto *Switch = Builder.CreateSwitch(SuspendResult, Coro.SuspendBB, 2);
-    Switch->addCase(Builder.getInt8(0), ReadyBlock);
-    Switch->addCase(Builder.getInt8(1), CleanupBlock);
-
-    // Emit cleanup for this suspend point.
-    CGF.EmitBlock(CleanupBlock);
-    CGF.EmitBranchThroughCleanup(Coro.CleanupJD);
+    emitStandardAwaitSuspend(CGF, Coro, S, SuspendWrapper, Awaiter, Frame,
+                             AwaitSuspendCanThrow, Prefix, ReadyBlock, Kind,
+                             SuspendReturnType);
   }
 
   // Emit await_resume expression.
@@ -429,6 +446,7 @@ static LValueOrRValue emitSuspendExpression(CodeGenFunction &CGF, CGCoroData &Co
   CXXTryStmt *TryStmt = nullptr;
   if (Coro.ExceptionHandler && Kind == AwaitKind::Init &&
       StmtCanThrow(S.getResumeExpr())) {
+    auto &Builder = CGF.Builder;
     Coro.ResumeEHVar =
         CGF.CreateTempAlloca(Builder.getInt1Ty(), Prefix + Twine("resume.eh"));
     Builder.CreateFlagStore(true, Coro.ResumeEHVar);

>From 99703af1f7dc906fc376d7aae01c025f19a88df5 Mon Sep 17 00:00:00 2001
From: Alexey <snarkmaster at gmail.com>
Date: Thu, 7 Aug 2025 12:10:07 -0700
Subject: [PATCH 07/15] Elide suspension points via
 [[clang::coro_await_suspend_destroy]]

Start by reading the detailed user-facing docs in `AttrDocs.td`.

My immediate motivation was that I noticed that short-circuiting coroutines
failed to optimize well.  Interact with the demo program here:
https://godbolt.org/z/E3YK5c45a

If Clang on Compiler Explorer supported [[clang::coro_await_suspend_destroy]],
the assembly for `simple_coro` would be drastically shorter, and would not
contain a call to `operator new`.

Here are a few high-level thoughts that don't belong in the docs:

  - This has `lit` tests, but what gives me real confidence in its correctness
    is the integration test in `coro_await_suspend_destroy_test.cpp`.  This
    caught all the interesting bugs that I had in earlier revs, and covers
    equivalence to the standard code path in far more scenarios.

  - I considered a variety of other designs. Here are some key design points:

    * I considered optimizing unmodified `await_suspend()` methods, as long as
      they unconditionally end with an `h.destroy()` call on the current
      handle, or an exception.  However, this would (a) force dynamic dispatch
      for `destroy` -- bloating IR & reducing optimization opportunities, (b)
      require far more complex, delicate, and fragile analysis, (c) retain more
      of the frame setup, so that e.g. `h.done()` works properly.  The current
      solution shortcuts all these concerns.

    * I chose to pass `Promise&`, rather than `std::coroutine_handle`, to
      `await_suspend_destroy` -- this is safer, simpler, and more efficient.
      Short-circuiting coroutines should not touch the handle.  This decision
      forces the attribute to go on the class.  Resolving a method attribute
      would have required looking up overloads for both types, and choosing
      one, which is costly and a bad UX to boot.

    * `AttrDocs.td` tells portable code to provide a stub `await_suspend()`.
      This portability / compatibility solution avoids dire issues that would
      arise if users relied on `__has_cpp_attribute` and the declaration and
      definition happened to use different toolchains.  In particular, it will
      even be safe for a future compiler release to killswitch this attribute
      by removing its implementation and setting its version to 0.

```
let Spellings = [Clang<"coro_await_suspend_destroy", /*allowInC*/ 0,
                 /*Version*/ 0>];
```

  - In the docs, I mention the `HasCoroSuspend` path in `CoroEarly.cpp` as
    a further optimization opportunity.  But I'm sure there are
    higher-leverage ways of making these non-suspending coros compile better;
    I just don't know the coro optimization pipeline well enough to flag them.

  - IIUC the only interaction of this with `coro_only_destroy_when_complete`
    would be that the compiler expends fewer cycles.

  - I ran some benchmarks on [folly::result](
    https://github.com/facebook/folly/blob/main/folly/result/docs/result.md).
    Heap allocs are definitely elided, and the compiled code looks like a
    function rather than a coroutine, but there's still an optimization gap.
    On the plus side, this results in a 4x speedup (!) in optimized ASAN
    builds (numbers not shown for brevity).

```
// Simple result coroutine that adds 1 to the input
result<int> result_coro(result<int>&& r) {
  co_return co_await std::move(r) + 1;
}

// Non-coroutine equivalent using value_or_throw()
result<int> catching_result_func(result<int>&& r) {
  return result_catch_all([&]() -> result<int> {
    if (r.has_value()) {
      return r.value_or_throw() + 1;
    }
    return std::move(r).non_value();
  });
}

// Not QUITE equivalent to the coro -- lacks the exception boundary
result<int> non_catching_result_func(result<int>&& r) {
  if (r.has_value()) {
    return r.value_or_throw() + 1;
  }
  return std::move(r).non_value();
}

============================================================================
[...]lly/result/test/result_coro_bench.cpp     relative  time/iter   iters/s
============================================================================
result_coro_success                                        13.61ns    73.49M
non_catching_result_func_success                            3.39ns   295.00M
catching_result_func_success                                4.41ns   226.88M
result_coro_error                                          19.55ns    51.16M
non_catching_result_func_error                              9.15ns   109.26M
catching_result_func_error                                 10.19ns    98.10M

============================================================================
[...]lly/result/test/result_coro_bench.cpp     relative  time/iter   iters/s
============================================================================
result_coro_success                                        10.59ns    94.39M
non_catching_result_func_success                            3.39ns   295.00M
catching_result_func_success                                4.07ns   245.81M
result_coro_error                                          13.66ns    73.18M
non_catching_result_func_error                              9.00ns   111.11M
catching_result_func_error                                 10.04ns    99.63M
```

Demo program from the Compiler Explorer link above:

```cpp
#include <coroutine>
#include <optional>

// Read this LATER -- this implementation detail isn't required to understand
// the value of [[clang::coro_await_suspend_destroy]].
//
// `optional_wrapper` exists since `get_return_object()` can't return
// `std::optional` directly. C++ coroutines have a fundamental timing mismatch
// between when the return object is created and when the value is available:
//
// 1) Early (coroutine startup): `get_return_object()` is called and must return
//    something immediately.
// 2) Later (when `co_return` executes): `return_value(T)` is called with the
//    actual value.
// 3) Issue: If `get_return_object()` returns the storage, it's empty when
//    returned, and writing to it later cannot affect the already-returned copy.
template <typename T>
struct optional_wrapper {
  std::optional<T> storage_;
  std::optional<T>*& pointer_;
  optional_wrapper(std::optional<T>*& p) : pointer_(p) {
    pointer_ = &storage_;
  }
  operator std::optional<T>() { return std::move(storage_); }
  ~optional_wrapper() {}
};

// Make `std::optional` a coroutine
template <typename T, typename... Args>
struct std::coroutine_traits<std::optional<T>, Args...> {
  struct promise_type {
    std::optional<T>* storagePtr_ = nullptr;
    promise_type() = default;
    ::optional_wrapper<T> get_return_object() {
      return ::optional_wrapper<T>(storagePtr_);
    }
    std::suspend_never initial_suspend() const noexcept { return {}; }
    std::suspend_never final_suspend() const noexcept { return {}; }
    void return_value(T&& value) { *storagePtr_ = std::move(value); }
    void unhandled_exception() {
      // Leave storage_ empty to represent error
    }
  };
};

template <typename T>
struct [[clang::coro_await_suspend_destroy]] optional_awaitable {
  std::optional<T> opt_;
  bool await_ready() const noexcept { return opt_.has_value(); }
  T await_resume() { return std::move(opt_).value(); }
  // Adding `noexcept` here makes the early IR much smaller, but the
  // optimizer is able to discard the cruft for simpler cases.
  void await_suspend_destroy(auto& promise) noexcept {
    // Assume the return object defaults to "empty"
  }
  void await_suspend(auto handle) {
    await_suspend_destroy(handle.promise());
    handle.destroy();
  }
};

template <typename T>
optional_awaitable<T> operator co_await(std::optional<T> opt) {
  return {std::move(opt)};
}

// Non-coroutine baseline -- matches the logic of `simple_coro`.
std::optional<int> simple_func(const std::optional<int>& r) {
  try {
    if (r.has_value()) {
        return r.value() + 1;
    }
  } catch (...) {}
  return std::nullopt; // return empty on empty input or error
}

// Without `coro_await_suspend_destroy`, allocates its frame on-heap.
std::optional<int> simple_coro(const std::optional<int>& r) {
  co_return co_await std::move(r) + 4;
}

// Without `co_await`, this optimizes much like `simple_func`.
// Bugs:
//  - Doesn't short-circuit when `r` is empty, but throws
//  - Lacks an exception boundary
std::optional<int> wrong_simple_coro(const std::optional<int>& r) {
  co_return r.value() + 2;
}

int main() {
  return
      simple_func(std::optional<int>{32}).value() +
      simple_coro(std::optional<int>{8}).value() +
      wrong_simple_coro(std::optional<int>{16}).value();
}
```

Test Plan:

For the all-important E2E test, I used this terrible cargo-culted script to run
the new end-to-end test with the new compiler.  (Yes, I realize I should only
need 10% of those `-D` settings for a successful build.)

To make sure the test covered what I meant it to do:
  - I also added an `#error` in the "no attribute" branch to make sure the
    compiler indeed supports the attribute.
  - I ran it with a compiler not supporting the attribute, and that also
    passed.
  - I also tried `return 1;` from `main()` and saw the logs of the 7 successful
    tests running.

```sh
#!/bin/bash -uex
set -o pipefail
LLVMBASE=/path/to/source/of/llvm-project
SYSCLANG=/path/to/original/bin/clang

# NB Can add `--debug-output` to debug cmake...

# Bootstrap clang -- Use `RelWithDebInfo` or the next phase is too slow!
mkdir -p bootstrap
cd bootstrap
cmake "$LLVMBASE/llvm" \
    -G Ninja \
    -DBUILD_SHARED_LIBS=true \
    -DCMAKE_ASM_COMPILER="$SYSCLANG" \
    -DCMAKE_ASM_COMPILER_ID=Clang \
    -DCMAKE_BUILD_TYPE=RelWithDebInfo \
    -DCMAKE_CXX_COMPILER="$SYSCLANG"++ \
    -DCMAKE_C_COMPILER="$SYSCLANG" \
    -DLLVM_DEFAULT_TARGET_TRIPLE=x86_64-redhat-linux-gnu \
    -DLLVM_HOST_TRIPLE=x86_64-redhat-linux-gnu \
    -DLLVM_ENABLE_ASSERTIONS=ON \
    -DLLVM_ENABLE_BINDINGS=OFF \
    -DLLVM_ENABLE_LLD=ON \
    -DLLVM_ENABLE_PROJECTS="clang;lld" \
    -DLLVM_OPTIMIZED_TABLEGEN=true \
    -DLLVM_FORCE_ENABLE_STATS=ON \
    -DLLVM_ENABLE_DUMP=ON \
    -DCLANG_DEFAULT_PIE_ON_LINUX=OFF
ninja clang lld
ninja check-clang-codegencoroutines # Includes the new IR regression tests
cd ..

NEWCLANG="$PWD"/bootstrap/bin/clang
NEWLLD="$PWD"/bootstrap/bin/lld
# LIBCXX_INCLUDE_BENCHMARKS=OFF because google-benchmark bugs out
cmake "$LLVMBASE/runtimes" \
    -G Ninja \
    -DLLVM_DEFAULT_TARGET_TRIPLE=x86_64-redhat-linux-gnu \
    -DLLVM_HOST_TRIPLE=x86_64-redhat-linux-gnu \
    -DBUILD_SHARED_LIBS=true \
    -DCMAKE_ASM_COMPILER="$NEWCLANG" \
    -DCMAKE_ASM_COMPILER_ID=Clang \
    -DCMAKE_C_COMPILER="$NEWCLANG" \
    -DCMAKE_CXX_COMPILER="$NEWCLANG"++ \
    -DLLVM_FORCE_ENABLE_STATS=ON \
    -DLLVM_ENABLE_ASSERTIONS=ON \
    -DLLVM_ENABLE_LLD=ON \
    -DLIBCXX_INCLUDE_TESTS=ON \
    -DLIBCXX_INCLUDE_BENCHMARKS=OFF \
    -DLLVM_INCLUDE_TESTS=ON \
    -DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi;libunwind" \
    -DCMAKE_BUILD_TYPE=RelWithDebInfo \
    -DCMAKE_EXPORT_COMPILE_COMMANDS=ON

ninja cxx-test-depends

LIBCXXBUILD=$PWD
cd "$LLVMBASE"

libcxx/utils/libcxx-lit "$LIBCXXBUILD" -v \
    libcxx/test/std/language.support/support.coroutines/end.to.end/coro_await_suspend_destroy.pass.cpp
```
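The test plan mentions adding an `#error` in the "no attribute" branch to confirm that the new compiler really supports the attribute. One possible shape for such a guard is sketched below; the macro names `HAS_CORO_AWAIT_SUSPEND_DESTROY` and `REQUIRE_CORO_AWAIT_SUSPEND_DESTROY` are hypothetical, and the real test file may be structured differently. Consistent with the portability advice above, the macro only gates a diagnostic, never the awaiter's API: the stub `await_suspend()` stays defined either way. On a Clang built with this patch, the `__has_cpp_attribute` check should evaluate to nonzero.

```cpp
// Detect the attribute; only used to gate a diagnostic, not behavior.
#if defined(__has_cpp_attribute)
#if __has_cpp_attribute(clang::coro_await_suspend_destroy)
#define HAS_CORO_AWAIT_SUSPEND_DESTROY 1
#endif
#endif
#ifndef HAS_CORO_AWAIT_SUSPEND_DESTROY
#define HAS_CORO_AWAIT_SUSPEND_DESTROY 0
#endif

// Temporary sanity check, as in the test plan: build with
// -DREQUIRE_CORO_AWAIT_SUSPEND_DESTROY to fail loudly on the fallback path.
#if !HAS_CORO_AWAIT_SUSPEND_DESTROY && \
    defined(REQUIRE_CORO_AWAIT_SUSPEND_DESTROY)
#error "compiler lacks [[clang::coro_await_suspend_destroy]]"
#endif
```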
---
 clang/docs/ReleaseNotes.rst                   |   6 +
 clang/include/clang/Basic/Attr.td             |   8 +
 clang/include/clang/Basic/AttrDocs.td         |  87 ++++
 .../clang/Basic/DiagnosticSemaKinds.td        |   3 +
 clang/lib/CodeGen/CGCoroutine.cpp             | 232 +++++++---
 clang/lib/Sema/SemaCoroutine.cpp              | 102 ++++-
 .../coro-await-suspend-destroy-errors.cpp     |  55 +++
 .../coro-await-suspend-destroy.cpp            | 129 ++++++
 ...a-attribute-supported-attributes-list.test |   1 +
 .../coro_await_suspend_destroy.pass.cpp       | 409 ++++++++++++++++++
 10 files changed, 942 insertions(+), 90 deletions(-)
 create mode 100644 clang/test/CodeGenCoroutines/coro-await-suspend-destroy-errors.cpp
 create mode 100644 clang/test/CodeGenCoroutines/coro-await-suspend-destroy.cpp
 create mode 100644 libcxx/test/std/language.support/support.coroutines/end.to.end/coro_await_suspend_destroy.pass.cpp

diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index ac697e39dc184..0de047c10fdb1 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -153,6 +153,12 @@ Removed Compiler Flags
 Attribute Changes in Clang
 --------------------------
 
+- Introduced a new attribute ``[[clang::coro_await_suspend_destroy]]``.  When
+  applied to a coroutine awaiter class, it causes suspensions into this awaiter
+  to use a new ``await_suspend_destroy(Promise&)`` method instead of the standard
+  ``await_suspend(std::coroutine_handle<...>)``.  The coroutine is then destroyed.
+  This improves code speed & size for "short-circuiting" coroutines.
+
 Improvements to Clang's diagnostics
 -----------------------------------
 - Added a separate diagnostic group ``-Wfunction-effect-redeclarations``, for the more pedantic
diff --git a/clang/include/clang/Basic/Attr.td b/clang/include/clang/Basic/Attr.td
index 8c8e0b3bca46c..646a101459f86 100644
--- a/clang/include/clang/Basic/Attr.td
+++ b/clang/include/clang/Basic/Attr.td
@@ -1352,6 +1352,14 @@ def CoroAwaitElidableArgument : InheritableAttr {
   let SimpleHandler = 1;
 }
 
+def CoroAwaitSuspendDestroy: InheritableAttr {
+  let Spellings = [Clang<"coro_await_suspend_destroy">];
+  let Subjects = SubjectList<[CXXRecord]>;
+  let LangOpts = [CPlusPlus];
+  let Documentation = [CoroAwaitSuspendDestroyDoc];
+  let SimpleHandler = 1;
+}
+
 // OSObject-based attributes.
 def OSConsumed : InheritableParamAttr {
   let Spellings = [Clang<"os_consumed">];
diff --git a/clang/include/clang/Basic/AttrDocs.td b/clang/include/clang/Basic/AttrDocs.td
index 00e8fc0787884..e2360bd48b0f7 100644
--- a/clang/include/clang/Basic/AttrDocs.td
+++ b/clang/include/clang/Basic/AttrDocs.td
@@ -9363,6 +9363,93 @@ Example:
 }];
 }
 
+def CoroAwaitSuspendDestroyDoc : Documentation {
+  let Category = DocCatDecl;
+  let Content = [{
+
+The ``[[clang::coro_await_suspend_destroy]]`` attribute may be applied to a C++
+coroutine awaiter type.  When this attribute is present, the awaiter must
+implement ``void await_suspend_destroy(Promise&)``.  If ``await_ready()``
+returns ``false`` at a suspension point, ``await_suspend_destroy`` will be
+called directly, bypassing the ``await_suspend(std::coroutine_handle<...>)``
+method.  The coroutine being suspended will then be immediately destroyed.
+
+Logically, the new behavior is equivalent to this standard code:
+
+.. code-block:: c++
+
+  void await_suspend_destroy(YourPromise&) { ... }
+  void await_suspend(auto handle) {
+    await_suspend_destroy(handle.promise());
+    handle.destroy();
+  }
+
+This enables ``await_suspend_destroy()`` usage in portable awaiters: just add a
+stub ``await_suspend()`` as above.  Without ``coro_await_suspend_destroy``
+support, the awaiter will behave nearly identically, with the only difference
+being heap allocation instead of stack allocation for the coroutine frame.
+
+This attribute exists to optimize short-circuiting coroutines—coroutines whose
+suspend points are either (i) trivial (like ``std::suspend_never``), or (ii)
+short-circuiting (like a ``co_await`` that can be expressed in regular control
+flow as):
+
+.. code-block:: c++
+
+  T val;
+  if (awaiter.await_ready()) {
+    val = awaiter.await_resume();
+  } else {
+    awaiter.await_suspend_destroy(promise);
+    return /* value representing the "execution short-circuited" outcome */;
+  }
+
+The benefits of this attribute are:
+  - **Avoid heap allocations for coro frames**: Allocating short-circuiting
+    coros on the stack makes code more predictable under memory pressure.
+    Without this attribute, LLVM cannot elide heap allocation even when all
+    awaiters are short-circuiting.
+  - **Performance**: Significantly faster execution and smaller code size.
+  - **Build time**: Faster compilation due to less IR being generated.
+
+Marking your ``await_suspend_destroy`` method as ``noexcept`` can sometimes
+further improve optimization.
+
+Here is a toy example of a portable short-circuiting awaiter:
+
+.. code-block:: c++
+
+  template <typename T>
+  struct [[clang::coro_await_suspend_destroy]] optional_awaitable {
+    std::optional<T> opt_;
+    bool await_ready() const noexcept { return opt_.has_value(); }
+    T await_resume() { return std::move(opt_).value(); }
+    void await_suspend_destroy(auto& promise) {
+      // Assume the return object of the outer coro defaults to "empty".
+    }
+    // Fallback for when `coro_await_suspend_destroy` is unavailable.
+    void await_suspend(auto handle) {
+      await_suspend_destroy(handle.promise());
+      handle.destroy();
+    }
+  };
+
+If all suspension points use (i) trivial or (ii) short-circuiting awaiters,
+then the coroutine optimizes more like a plain function, with 2 caveats:
+  - **Behavior:** The coroutine promise provides an implicit exception boundary
+    (as if wrapping the body in ``try { ... } catch (...) { promise.unhandled_exception(); }``).
+    This exception handling behavior is usually desirable in robust,
+    return-value-oriented programs that need short-circuiting coroutines.
+    Otherwise, the promise can always re-throw.
+  - **Speed:** As of 2025, there is still an optimization gap between a
+    realistic short-circuiting coro and the equivalent (but much more verbose)
+    function.  For a guesstimate, expect a gap of 4-5ns per call on x86.  One
+    idea for improvement is to also elide trivial suspends like
+    ``std::suspend_never``, to hit the ``HasCoroSuspend`` path in ``CoroEarly.cpp``.
+
+}];
+}
+
 def CountedByDocs : Documentation {
   let Category = DocCatField;
   let Content = [{
diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index a7f3d37823075..6479b2c732917 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -12507,6 +12507,9 @@ def note_coroutine_promise_call_implicitly_required : Note<
 def err_await_suspend_invalid_return_type : Error<
   "return type of 'await_suspend' is required to be 'void' or 'bool' (have %0)"
 >;
+def err_await_suspend_destroy_invalid_return_type : Error<
+  "return type of 'await_suspend_destroy' is required to be 'void' (have %0)"
+>;
 def note_await_ready_no_bool_conversion : Note<
   "return type of 'await_ready' is required to be contextually convertible to 'bool'"
 >;
diff --git a/clang/lib/CodeGen/CGCoroutine.cpp b/clang/lib/CodeGen/CGCoroutine.cpp
index 827385f9c1a1f..d74bef592aa9c 100644
--- a/clang/lib/CodeGen/CGCoroutine.cpp
+++ b/clang/lib/CodeGen/CGCoroutine.cpp
@@ -174,6 +174,66 @@ static bool StmtCanThrow(const Stmt *S) {
   return false;
 }
 
+// Check if this suspend should be calling `await_suspend_destroy`
+static bool useCoroAwaitSuspendDestroy(const CoroutineSuspendExpr &S) {
+  // This can only be an `await_suspend_destroy` suspend expression if it
+  // returns void -- `buildCoawaitCalls` in `SemaCoroutine.cpp` asserts this.
+  // Moreover, when `await_suspend` returns a handle, the outermost method call
+  // is `.address()` -- making it harder to get the actual class or method.
+  if (S.getSuspendReturnType() !=
+      CoroutineSuspendExpr::SuspendReturnType::SuspendVoid) {
+    return false;
+  }
+
+  // `CGCoroutine.cpp` & `SemaCoroutine.cpp` must agree on whether this suspend
+  // expression uses `[[clang::coro_await_suspend_destroy]]`.
+  //
+  // Any mismatch is a serious bug -- we would either double-free, or fail to
+  // destroy the promise type. For this reason, we make our decision based on
+  // the method name, and fatal outside of the happy path -- including on
+  // failure to find a method name.
+  //
+  // As a debug-only check we also try to detect the `AwaiterClass`. This is
+  // secondary, because detection of the awaiter type can be silently broken by
+  // small `buildCoawaitCalls` AST changes.
+  StringRef SuspendMethodName;           // Primary
+  CXXRecordDecl *AwaiterClass = nullptr; // Debug-only, best-effort
+  if (auto *SuspendCall =
+          dyn_cast<CallExpr>(S.getSuspendExpr()->IgnoreImplicit())) {
+    if (auto *SuspendMember = dyn_cast<MemberExpr>(SuspendCall->getCallee())) {
+      if (auto *BaseExpr = SuspendMember->getBase()) {
+        // `IgnoreImplicitAsWritten` is critical since `await_suspend...` can be
+        // invoked on the base of the actual awaiter, and the base need not have
+        // the attribute. In such cases, the AST will show the true awaiter
+        // being upcast to the base.
+        AwaiterClass = BaseExpr->IgnoreImplicitAsWritten()
+                           ->getType()
+                           ->getAsCXXRecordDecl();
+      }
+      if (auto *SuspendMethod =
+              dyn_cast<CXXMethodDecl>(SuspendMember->getMemberDecl())) {
+        SuspendMethodName = SuspendMethod->getName();
+      }
+    }
+  }
+  if (SuspendMethodName == "await_suspend_destroy") {
+    assert(!AwaiterClass ||
+           AwaiterClass->hasAttr<CoroAwaitSuspendDestroyAttr>());
+    return true;
+  } else if (SuspendMethodName == "await_suspend") {
+    assert(!AwaiterClass ||
+           !AwaiterClass->hasAttr<CoroAwaitSuspendDestroyAttr>());
+    return false;
+  } else {
+    llvm::report_fatal_error(
+        "Wrong method in [[clang::coro_await_suspend_destroy]] check: "
+        "expected 'await_suspend' or 'await_suspend_destroy', but got '" +
+        SuspendMethodName + "'");
+  }
+
+  return false;
+}
+
 // Emit suspend expression which roughly looks like:
 //
 //   auto && x = CommonExpr();
@@ -220,6 +280,25 @@ namespace {
     RValue RV;
   };
 }
+
+// The simplified `await_suspend_destroy` path avoids suspend intrinsics.
+static void emitAwaitSuspendDestroy(CodeGenFunction &CGF, CGCoroData &Coro,
+                                    llvm::Function *SuspendWrapper,
+                                    llvm::Value *Awaiter, llvm::Value *Frame,
+                                    bool AwaitSuspendCanThrow) {
+  SmallVector<llvm::Value *, 2> DirectCallArgs;
+  DirectCallArgs.push_back(Awaiter);
+  DirectCallArgs.push_back(Frame);
+
+  if (AwaitSuspendCanThrow) {
+    CGF.EmitCallOrInvoke(SuspendWrapper, DirectCallArgs);
+  } else {
+    CGF.EmitNounwindRuntimeCall(SuspendWrapper, DirectCallArgs);
+  }
+
+  CGF.EmitBranchThroughCleanup(Coro.CleanupJD);
+}
+
 static LValueOrRValue emitSuspendExpression(CodeGenFunction &CGF, CGCoroData &Coro,
                                     CoroutineSuspendExpr const &S,
                                     AwaitKind Kind, AggValueSlot aggSlot,
@@ -234,7 +313,6 @@ static LValueOrRValue emitSuspendExpression(CodeGenFunction &CGF, CGCoroData &Co
   auto Prefix = buildSuspendPrefixStr(Coro, Kind);
   BasicBlock *ReadyBlock = CGF.createBasicBlock(Prefix + Twine(".ready"));
   BasicBlock *SuspendBlock = CGF.createBasicBlock(Prefix + Twine(".suspend"));
-  BasicBlock *CleanupBlock = CGF.createBasicBlock(Prefix + Twine(".cleanup"));
 
   // If expression is ready, no need to suspend.
   CGF.EmitBranchOnBoolExpr(S.getReadyExpr(), ReadyBlock, SuspendBlock, 0);
@@ -243,95 +321,105 @@ static LValueOrRValue emitSuspendExpression(CodeGenFunction &CGF, CGCoroData &Co
   CGF.EmitBlock(SuspendBlock);
 
   auto &Builder = CGF.Builder;
-  llvm::Function *CoroSave = CGF.CGM.getIntrinsic(llvm::Intrinsic::coro_save);
-  auto *NullPtr = llvm::ConstantPointerNull::get(CGF.CGM.Int8PtrTy);
-  auto *SaveCall = Builder.CreateCall(CoroSave, {NullPtr});
 
   auto SuspendWrapper = CodeGenFunction(CGF.CGM).generateAwaitSuspendWrapper(
       CGF.CurFn->getName(), Prefix, S);
 
-  CGF.CurCoro.InSuspendBlock = true;
-
   assert(CGF.CurCoro.Data && CGF.CurCoro.Data->CoroBegin &&
          "expected to be called in coroutine context");
 
-  SmallVector<llvm::Value *, 3> SuspendIntrinsicCallArgs;
-  SuspendIntrinsicCallArgs.push_back(
-      CGF.getOrCreateOpaqueLValueMapping(S.getOpaqueValue()).getPointer(CGF));
-
-  SuspendIntrinsicCallArgs.push_back(CGF.CurCoro.Data->CoroBegin);
-  SuspendIntrinsicCallArgs.push_back(SuspendWrapper);
-
-  const auto SuspendReturnType = S.getSuspendReturnType();
-  llvm::Intrinsic::ID AwaitSuspendIID;
-
-  switch (SuspendReturnType) {
-  case CoroutineSuspendExpr::SuspendReturnType::SuspendVoid:
-    AwaitSuspendIID = llvm::Intrinsic::coro_await_suspend_void;
-    break;
-  case CoroutineSuspendExpr::SuspendReturnType::SuspendBool:
-    AwaitSuspendIID = llvm::Intrinsic::coro_await_suspend_bool;
-    break;
-  case CoroutineSuspendExpr::SuspendReturnType::SuspendHandle:
-    AwaitSuspendIID = llvm::Intrinsic::coro_await_suspend_handle;
-    break;
-  }
-
-  llvm::Function *AwaitSuspendIntrinsic = CGF.CGM.getIntrinsic(AwaitSuspendIID);
-
   // SuspendHandle might throw since it also resumes the returned handle.
+  const auto SuspendReturnType = S.getSuspendReturnType();
   const bool AwaitSuspendCanThrow =
       SuspendReturnType ==
           CoroutineSuspendExpr::SuspendReturnType::SuspendHandle ||
       StmtCanThrow(S.getSuspendExpr());
 
-  llvm::CallBase *SuspendRet = nullptr;
-  // FIXME: add call attributes?
-  if (AwaitSuspendCanThrow)
-    SuspendRet =
-        CGF.EmitCallOrInvoke(AwaitSuspendIntrinsic, SuspendIntrinsicCallArgs);
-  else
-    SuspendRet = CGF.EmitNounwindRuntimeCall(AwaitSuspendIntrinsic,
-                                             SuspendIntrinsicCallArgs);
+  llvm::Value *Awaiter =
+      CGF.getOrCreateOpaqueLValueMapping(S.getOpaqueValue()).getPointer(CGF);
+  llvm::Value *Frame = CGF.CurCoro.Data->CoroBegin;
 
-  assert(SuspendRet);
-  CGF.CurCoro.InSuspendBlock = false;
+  if (useCoroAwaitSuspendDestroy(S)) { // Call `await_suspend_destroy` & cleanup
+    emitAwaitSuspendDestroy(CGF, Coro, SuspendWrapper, Awaiter, Frame,
+                            AwaitSuspendCanThrow);
+  } else { // Normal suspend path -- can actually suspend, uses intrinsics
+    CGF.CurCoro.InSuspendBlock = true;
 
-  switch (SuspendReturnType) {
-  case CoroutineSuspendExpr::SuspendReturnType::SuspendVoid:
-    assert(SuspendRet->getType()->isVoidTy());
-    break;
-  case CoroutineSuspendExpr::SuspendReturnType::SuspendBool: {
-    assert(SuspendRet->getType()->isIntegerTy());
-
-    // Veto suspension if requested by bool returning await_suspend.
-    BasicBlock *RealSuspendBlock =
-        CGF.createBasicBlock(Prefix + Twine(".suspend.bool"));
-    CGF.Builder.CreateCondBr(SuspendRet, RealSuspendBlock, ReadyBlock);
-    CGF.EmitBlock(RealSuspendBlock);
-    break;
-  }
-  case CoroutineSuspendExpr::SuspendReturnType::SuspendHandle: {
-    assert(SuspendRet->getType()->isVoidTy());
-    break;
-  }
-  }
+    SmallVector<llvm::Value *, 3> SuspendIntrinsicCallArgs;
+    SuspendIntrinsicCallArgs.push_back(Awaiter);
+    SuspendIntrinsicCallArgs.push_back(Frame);
+    SuspendIntrinsicCallArgs.push_back(SuspendWrapper);
+    BasicBlock *CleanupBlock = CGF.createBasicBlock(Prefix + Twine(".cleanup"));
 
-  // Emit the suspend point.
-  const bool IsFinalSuspend = (Kind == AwaitKind::Final);
-  llvm::Function *CoroSuspend =
-      CGF.CGM.getIntrinsic(llvm::Intrinsic::coro_suspend);
-  auto *SuspendResult = Builder.CreateCall(
-      CoroSuspend, {SaveCall, Builder.getInt1(IsFinalSuspend)});
+    llvm::Function *CoroSave = CGF.CGM.getIntrinsic(llvm::Intrinsic::coro_save);
+    auto *NullPtr = llvm::ConstantPointerNull::get(CGF.CGM.Int8PtrTy);
+    auto *SaveCall = Builder.CreateCall(CoroSave, {NullPtr});
 
-  // Create a switch capturing three possible continuations.
-  auto *Switch = Builder.CreateSwitch(SuspendResult, Coro.SuspendBB, 2);
-  Switch->addCase(Builder.getInt8(0), ReadyBlock);
-  Switch->addCase(Builder.getInt8(1), CleanupBlock);
+    llvm::Intrinsic::ID AwaitSuspendIID;
 
-  // Emit cleanup for this suspend point.
-  CGF.EmitBlock(CleanupBlock);
-  CGF.EmitBranchThroughCleanup(Coro.CleanupJD);
+    switch (SuspendReturnType) {
+    case CoroutineSuspendExpr::SuspendReturnType::SuspendVoid:
+      AwaitSuspendIID = llvm::Intrinsic::coro_await_suspend_void;
+      break;
+    case CoroutineSuspendExpr::SuspendReturnType::SuspendBool:
+      AwaitSuspendIID = llvm::Intrinsic::coro_await_suspend_bool;
+      break;
+    case CoroutineSuspendExpr::SuspendReturnType::SuspendHandle:
+      AwaitSuspendIID = llvm::Intrinsic::coro_await_suspend_handle;
+      break;
+    }
+
+    llvm::Function *AwaitSuspendIntrinsic =
+        CGF.CGM.getIntrinsic(AwaitSuspendIID);
+
+    llvm::CallBase *SuspendRet = nullptr;
+    // FIXME: add call attributes?
+    if (AwaitSuspendCanThrow)
+      SuspendRet =
+          CGF.EmitCallOrInvoke(AwaitSuspendIntrinsic, SuspendIntrinsicCallArgs);
+    else
+      SuspendRet = CGF.EmitNounwindRuntimeCall(AwaitSuspendIntrinsic,
+                                               SuspendIntrinsicCallArgs);
+
+    assert(SuspendRet);
+    CGF.CurCoro.InSuspendBlock = false;
+
+    switch (SuspendReturnType) {
+    case CoroutineSuspendExpr::SuspendReturnType::SuspendVoid:
+      assert(SuspendRet->getType()->isVoidTy());
+      break;
+    case CoroutineSuspendExpr::SuspendReturnType::SuspendBool: {
+      assert(SuspendRet->getType()->isIntegerTy());
+
+      // Veto suspension if requested by bool returning await_suspend.
+      BasicBlock *RealSuspendBlock =
+          CGF.createBasicBlock(Prefix + Twine(".suspend.bool"));
+      CGF.Builder.CreateCondBr(SuspendRet, RealSuspendBlock, ReadyBlock);
+      CGF.EmitBlock(RealSuspendBlock);
+      break;
+    }
+    case CoroutineSuspendExpr::SuspendReturnType::SuspendHandle: {
+      assert(SuspendRet->getType()->isVoidTy());
+      break;
+    }
+    }
+
+    // Emit the suspend point.
+    const bool IsFinalSuspend = (Kind == AwaitKind::Final);
+    llvm::Function *CoroSuspend =
+        CGF.CGM.getIntrinsic(llvm::Intrinsic::coro_suspend);
+    auto *SuspendResult = Builder.CreateCall(
+        CoroSuspend, {SaveCall, Builder.getInt1(IsFinalSuspend)});
+
+    // Create a switch capturing three possible continuations.
+    auto *Switch = Builder.CreateSwitch(SuspendResult, Coro.SuspendBB, 2);
+    Switch->addCase(Builder.getInt8(0), ReadyBlock);
+    Switch->addCase(Builder.getInt8(1), CleanupBlock);
+
+    // Emit cleanup for this suspend point.
+    CGF.EmitBlock(CleanupBlock);
+    CGF.EmitBranchThroughCleanup(Coro.CleanupJD);
+  }
 
   // Emit await_resume expression.
   CGF.EmitBlock(ReadyBlock);
diff --git a/clang/lib/Sema/SemaCoroutine.cpp b/clang/lib/Sema/SemaCoroutine.cpp
index cc03616e0dfe1..0f335f2b35279 100644
--- a/clang/lib/Sema/SemaCoroutine.cpp
+++ b/clang/lib/Sema/SemaCoroutine.cpp
@@ -284,6 +284,45 @@ static ExprResult buildCoroutineHandle(Sema &S, QualType PromiseType,
   return S.BuildCallExpr(nullptr, FromAddr.get(), Loc, FramePtr, Loc);
 }
 
+// To support [[clang::coro_await_suspend_destroy]], this builds
+//   *static_cast<Promise*>(__builtin_coro_promise(
+//       __builtin_coro_frame(), alignof(Promise), false))
+static ExprResult buildPromiseRef(Sema &S, QualType PromiseType,
+                                  SourceLocation Loc) {
+  uint64_t Align =
+      S.Context.getTypeAlign(PromiseType) / S.Context.getCharWidth();
+
+  // Build the call to __builtin_coro_promise()
+  SmallVector<Expr *, 3> Args = {
+      S.BuildBuiltinCallExpr(Loc, Builtin::BI__builtin_coro_frame, {}),
+      S.ActOnIntegerConstant(Loc, Align).get(),         // alignof(Promise)
+      S.ActOnCXXBoolLiteral(Loc, tok::kw_false).get()}; // false
+  ExprResult CoroPromiseCall =
+      S.BuildBuiltinCallExpr(Loc, Builtin::BI__builtin_coro_promise, Args);
+
+  if (CoroPromiseCall.isInvalid())
+    return ExprError();
+
+  // Cast to Promise*
+  ExprResult CastExpr = S.ImpCastExprToType(
+      CoroPromiseCall.get(), S.Context.getPointerType(PromiseType), CK_BitCast);
+  if (CastExpr.isInvalid())
+    return ExprError();
+
+  // Dereference to get Promise&
+  return S.CreateBuiltinUnaryOp(Loc, UO_Deref, CastExpr.get());
+}
+
+static bool hasCoroAwaitSuspendDestroyAttr(Expr *Awaiter) {
+  QualType AwaiterType = Awaiter->getType();
+  if (auto *RD = AwaiterType->getAsCXXRecordDecl()) {
+    if (RD->hasAttr<CoroAwaitSuspendDestroyAttr>()) {
+      return true;
+    }
+  }
+  return false;
+}
+
 struct ReadySuspendResumeResult {
   enum AwaitCallType { ACT_Ready, ACT_Suspend, ACT_Resume };
   Expr *Results[3];
@@ -394,15 +433,30 @@ static ReadySuspendResumeResult buildCoawaitCalls(Sema &S, VarDecl *CoroPromise,
       Calls.Results[ACT::ACT_Ready] = S.MaybeCreateExprWithCleanups(Conv.get());
   }
 
-  ExprResult CoroHandleRes =
-      buildCoroutineHandle(S, CoroPromise->getType(), Loc);
-  if (CoroHandleRes.isInvalid()) {
-    Calls.IsInvalid = true;
-    return Calls;
+  // For awaiters with `[[clang::coro_await_suspend_destroy]]`, we call
+  // `void await_suspend_destroy(Promise&)` and then destroy the coroutine.
+  CallExpr *AwaitSuspend = nullptr;
+  bool UseAwaitSuspendDestroy = hasCoroAwaitSuspendDestroyAttr(Operand);
+  if (UseAwaitSuspendDestroy) {
+    ExprResult PromiseRefRes = buildPromiseRef(S, CoroPromise->getType(), Loc);
+    if (PromiseRefRes.isInvalid()) {
+      Calls.IsInvalid = true;
+      return Calls;
+    }
+    Expr *PromiseRef = PromiseRefRes.get();
+    AwaitSuspend = cast_or_null<CallExpr>(
+        BuildSubExpr(ACT::ACT_Suspend, "await_suspend_destroy", PromiseRef));
+  } else { // The standard `await_suspend(std::coroutine_handle<...>)`
+    ExprResult CoroHandleRes =
+        buildCoroutineHandle(S, CoroPromise->getType(), Loc);
+    if (CoroHandleRes.isInvalid()) {
+      Calls.IsInvalid = true;
+      return Calls;
+    }
+    Expr *CoroHandle = CoroHandleRes.get();
+    AwaitSuspend = cast_or_null<CallExpr>(
+        BuildSubExpr(ACT::ACT_Suspend, "await_suspend", CoroHandle));
   }
-  Expr *CoroHandle = CoroHandleRes.get();
-  CallExpr *AwaitSuspend = cast_or_null<CallExpr>(
-      BuildSubExpr(ACT::ACT_Suspend, "await_suspend", CoroHandle));
   if (!AwaitSuspend)
     return Calls;
   if (!AwaitSuspend->getType()->isDependentType()) {
@@ -412,25 +466,37 @@ static ReadySuspendResumeResult buildCoawaitCalls(Sema &S, VarDecl *CoroPromise,
     //     type Z.
     QualType RetType = AwaitSuspend->getCallReturnType(S.Context);
 
-    // Support for coroutine_handle returning await_suspend.
-    if (Expr *TailCallSuspend =
-            maybeTailCall(S, RetType, AwaitSuspend, Loc))
+    auto EmitAwaitSuspendDiag = [&](unsigned int DiagCode) {
+      S.Diag(AwaitSuspend->getCalleeDecl()->getLocation(), DiagCode) << RetType;
+      S.Diag(Loc, diag::note_coroutine_promise_call_implicitly_required)
+          << AwaitSuspend->getDirectCallee();
+      Calls.IsInvalid = true;
+    };
+
+    // `await_suspend_destroy` must return `void`, and `CGCoroutine.cpp`
+    // critically depends on this (see `useCoroAwaitSuspendDestroy`).
+    if (UseAwaitSuspendDestroy) {
+      if (RetType->isVoidType()) {
+        Calls.Results[ACT::ACT_Suspend] =
+            S.MaybeCreateExprWithCleanups(AwaitSuspend);
+      } else {
+        EmitAwaitSuspendDiag(
+            diag::err_await_suspend_destroy_invalid_return_type);
+      }
+      // Support for coroutine_handle returning await_suspend.
+    } else if (Expr *TailCallSuspend =
+                   maybeTailCall(S, RetType, AwaitSuspend, Loc)) {
       // Note that we don't wrap the expression with ExprWithCleanups here
       // because that might interfere with tailcall contract (e.g. inserting
       // clean up instructions in-between tailcall and return). Instead
       // ExprWithCleanups is wrapped within maybeTailCall() prior to the resume
       // call.
       Calls.Results[ACT::ACT_Suspend] = TailCallSuspend;
-    else {
+    } else {
       // non-class prvalues always have cv-unqualified types
       if (RetType->isReferenceType() ||
           (!RetType->isBooleanType() && !RetType->isVoidType())) {
-        S.Diag(AwaitSuspend->getCalleeDecl()->getLocation(),
-               diag::err_await_suspend_invalid_return_type)
-            << RetType;
-        S.Diag(Loc, diag::note_coroutine_promise_call_implicitly_required)
-            << AwaitSuspend->getDirectCallee();
-        Calls.IsInvalid = true;
+        EmitAwaitSuspendDiag(diag::err_await_suspend_invalid_return_type);
       } else
         Calls.Results[ACT::ACT_Suspend] =
             S.MaybeCreateExprWithCleanups(AwaitSuspend);
diff --git a/clang/test/CodeGenCoroutines/coro-await-suspend-destroy-errors.cpp b/clang/test/CodeGenCoroutines/coro-await-suspend-destroy-errors.cpp
new file mode 100644
index 0000000000000..6a082c15f2581
--- /dev/null
+++ b/clang/test/CodeGenCoroutines/coro-await-suspend-destroy-errors.cpp
@@ -0,0 +1,55 @@
+// RUN: %clang_cc1 -std=c++20 -verify %s
+
+#include "Inputs/coroutine.h"
+
+// Coroutine type with `std::suspend_never` for initial/final suspend
+struct Task {
+  struct promise_type {
+    Task get_return_object() { return {}; }
+    std::suspend_never initial_suspend() { return {}; }
+    std::suspend_never final_suspend() noexcept { return {}; }
+    void return_void() {}
+    void unhandled_exception() {}
+  };
+};
+
+struct [[clang::coro_await_suspend_destroy]] WrongReturnTypeAwaitable {
+  bool await_ready() { return false; }
+  bool await_suspend_destroy(auto& promise) { return true; } // expected-error {{return type of 'await_suspend_destroy' is required to be 'void' (have 'bool')}}
+  void await_suspend(auto handle) {
+    await_suspend_destroy(handle.promise());
+    handle.destroy();
+  }
+  void await_resume() {}
+};
+
+Task test_invalid_destroying_await() {
+  co_await WrongReturnTypeAwaitable{}; // expected-note {{call to 'await_suspend_destroy<Task::promise_type>' implicitly required by coroutine function here}}
+}
+
+struct [[clang::coro_await_suspend_destroy]] MissingMethodAwaitable {
+  bool await_ready() { return false; }
+  // Missing await_suspend_destroy method
+  void await_suspend(auto handle) {
+    handle.destroy();
+  }
+  void await_resume() {}
+};
+
+Task test_missing_method() {
+  co_await MissingMethodAwaitable{}; // expected-error {{no member named 'await_suspend_destroy' in 'MissingMethodAwaitable'}}
+}
+
+struct [[clang::coro_await_suspend_destroy]] WrongParameterTypeAwaitable {
+  bool await_ready() { return false; }
+  void await_suspend_destroy(int x) {} // expected-note {{passing argument to parameter 'x' here}}
+  void await_suspend(auto handle) {
+    await_suspend_destroy(handle.promise());
+    handle.destroy();
+  }
+  void await_resume() {}
+};
+
+Task test_wrong_parameter_type() {
+  co_await WrongParameterTypeAwaitable{}; // expected-error {{no viable conversion from 'std::coroutine_traits<Task>::promise_type' (aka 'Task::promise_type') to 'int'}}
+}
diff --git a/clang/test/CodeGenCoroutines/coro-await-suspend-destroy.cpp b/clang/test/CodeGenCoroutines/coro-await-suspend-destroy.cpp
new file mode 100644
index 0000000000000..fa1dbf475e56c
--- /dev/null
+++ b/clang/test/CodeGenCoroutines/coro-await-suspend-destroy.cpp
@@ -0,0 +1,129 @@
+// RUN: %clang_cc1 -std=c++20 -triple x86_64-unknown-linux-gnu -emit-llvm -o - %s \
+// RUN:   -disable-llvm-passes | FileCheck %s --check-prefix=CHECK-INITIAL
+// RUN: %clang_cc1 -std=c++20 -triple x86_64-unknown-linux-gnu -emit-llvm -o - %s \
+// RUN:   -O2 | FileCheck %s --check-prefix=CHECK-OPTIMIZED
+
+#include "Inputs/coroutine.h"
+
+// Awaitable with `coro_await_suspend_destroy` attribute
+struct [[clang::coro_await_suspend_destroy]] DestroyingAwaitable {
+  bool await_ready() { return false; }
+  void await_suspend_destroy(auto& promise) {}
+  void await_suspend(auto handle) {
+    await_suspend_destroy(handle.promise());
+    handle.destroy();
+  }
+  void await_resume() {}
+};
+
+// Awaitable without `coro_await_suspend_destroy` (normal behavior)
+struct NormalAwaitable {
+  bool await_ready() { return false; }
+  void await_suspend(std::coroutine_handle<> h) {}
+  void await_resume() {}
+};
+
+// Coroutine type with `std::suspend_never` for initial/final suspend
+struct Task {
+  struct promise_type {
+    Task get_return_object() { return {}; }
+    std::suspend_never initial_suspend() { return {}; }
+    std::suspend_never final_suspend() noexcept { return {}; }
+    void return_void() {}
+    void unhandled_exception() {}
+  };
+};
+
+// Single co_await with coro_await_suspend_destroy.
+// Should result in no allocation after optimization.
+Task test_single_destroying_await() {
+  co_await DestroyingAwaitable{};
+}
+
+// CHECK-INITIAL-LABEL: define{{.*}} void @_Z28test_single_destroying_awaitv
+// CHECK-INITIAL: call{{.*}} @llvm.coro.alloc
+// CHECK-INITIAL: call{{.*}} @llvm.coro.begin
+
+// CHECK-OPTIMIZED-LABEL: define{{.*}} void @_Z28test_single_destroying_awaitv
+// CHECK-OPTIMIZED-NOT: call{{.*}} @llvm.coro.alloc
+// CHECK-OPTIMIZED-NOT: call{{.*}} malloc
+// CHECK-OPTIMIZED-NOT: call{{.*}} @_Znwm
+
+// Test multiple `co_await`s, all with `coro_await_suspend_destroy`.
+// This should also result in no allocation after optimization.
+Task test_multiple_destroying_awaits(bool condition) {
+  co_await DestroyingAwaitable{};
+  co_await DestroyingAwaitable{};
+  if (condition) {
+    co_await DestroyingAwaitable{};
+  }
+}
+
+// CHECK-INITIAL-LABEL: define{{.*}} void @_Z31test_multiple_destroying_awaitsb
+// CHECK-INITIAL: call{{.*}} @llvm.coro.alloc
+// CHECK-INITIAL: call{{.*}} @llvm.coro.begin
+
+// CHECK-OPTIMIZED-LABEL: define{{.*}} void @_Z31test_multiple_destroying_awaitsb
+// CHECK-OPTIMIZED-NOT: call{{.*}} @llvm.coro.alloc
+// CHECK-OPTIMIZED-NOT: call{{.*}} malloc
+// CHECK-OPTIMIZED-NOT: call{{.*}} @_Znwm
+
+// Mixed awaits - some with `coro_await_suspend_destroy`, some without.
+// We should still see allocation because not all awaits destroy the coroutine.
+Task test_mixed_awaits() {
+  co_await NormalAwaitable{}; // Must precede "destroy" to be reachable
+  co_await DestroyingAwaitable{};
+}
+
+// CHECK-INITIAL-LABEL: define{{.*}} void @_Z17test_mixed_awaitsv
+// CHECK-INITIAL: call{{.*}} @llvm.coro.alloc
+// CHECK-INITIAL: call{{.*}} @llvm.coro.begin
+
+// CHECK-OPTIMIZED-LABEL: define{{.*}} void @_Z17test_mixed_awaitsv
+// CHECK-OPTIMIZED: call{{.*}} @_Znwm
+
+
+// Check that attribute detection affects control flow.
+Task test_attribute_detection() {
+  co_await DestroyingAwaitable{};
+  // Unreachable in OPTIMIZED, so those builds don't see an allocation.
+  co_await NormalAwaitable{};
+}
+
+// Check that we skip the normal suspend intrinsic and go directly to cleanup.
+//
+// CHECK-INITIAL-LABEL: define{{.*}} void @_Z24test_attribute_detectionv
+// CHECK-INITIAL: call{{.*}} @_Z24test_attribute_detectionv.__await_suspend_wrapper__await
+// CHECK-INITIAL-NEXT: br label %cleanup5
+// CHECK-INITIAL-NOT: call{{.*}} @llvm.coro.suspend
+// CHECK-INITIAL: call{{.*}} @_Z24test_attribute_detectionv.__await_suspend_wrapper__await
+// CHECK-INITIAL: call{{.*}} @llvm.coro.suspend
+// CHECK-INITIAL: call{{.*}} @_Z24test_attribute_detectionv.__await_suspend_wrapper__final
+
+// Since `co_await DestroyingAwaitable{}` gets converted into an unconditional
+// branch, the `co_await NormalAwaitable{}` is unreachable in optimized builds.
+//
+// CHECK-OPTIMIZED-NOT: call{{.*}} @llvm.coro.alloc
+// CHECK-OPTIMIZED-NOT: call{{.*}} malloc
+// CHECK-OPTIMIZED-NOT: call{{.*}} @_Znwm
+
+// Template awaitable with `coro_await_suspend_destroy` attribute
+template<typename T>
+struct [[clang::coro_await_suspend_destroy]] TemplateDestroyingAwaitable {
+  bool await_ready() { return false; }
+  void await_suspend_destroy(auto& promise) {}
+  void await_suspend(auto handle) {
+    await_suspend_destroy(handle.promise());
+    handle.destroy();
+  }
+  void await_resume() {}
+};
+
+Task test_template_destroying_await() {
+  co_await TemplateDestroyingAwaitable<int>{};
+}
+
+// CHECK-OPTIMIZED-LABEL: define{{.*}} void @_Z30test_template_destroying_awaitv
+// CHECK-OPTIMIZED-NOT: call{{.*}} @llvm.coro.alloc
+// CHECK-OPTIMIZED-NOT: call{{.*}} malloc
+// CHECK-OPTIMIZED-NOT: call{{.*}} @_Znwm
diff --git a/clang/test/Misc/pragma-attribute-supported-attributes-list.test b/clang/test/Misc/pragma-attribute-supported-attributes-list.test
index b9cf7cf9462fe..4c1f3d3a1fc66 100644
--- a/clang/test/Misc/pragma-attribute-supported-attributes-list.test
+++ b/clang/test/Misc/pragma-attribute-supported-attributes-list.test
@@ -63,6 +63,7 @@
 // CHECK-NEXT: Convergent (SubjectMatchRule_function)
 // CHECK-NEXT: CoroAwaitElidable (SubjectMatchRule_record)
 // CHECK-NEXT: CoroAwaitElidableArgument (SubjectMatchRule_variable_is_parameter)
+// CHECK-NEXT: CoroAwaitSuspendDestroy (SubjectMatchRule_record)
 // CHECK-NEXT: CoroDisableLifetimeBound (SubjectMatchRule_function)
 // CHECK-NEXT: CoroLifetimeBound (SubjectMatchRule_record)
 // CHECK-NEXT: CoroOnlyDestroyWhenComplete (SubjectMatchRule_record)
diff --git a/libcxx/test/std/language.support/support.coroutines/end.to.end/coro_await_suspend_destroy.pass.cpp b/libcxx/test/std/language.support/support.coroutines/end.to.end/coro_await_suspend_destroy.pass.cpp
new file mode 100644
index 0000000000000..1b48b1523bf12
--- /dev/null
+++ b/libcxx/test/std/language.support/support.coroutines/end.to.end/coro_await_suspend_destroy.pass.cpp
@@ -0,0 +1,409 @@
+//===-- Integration test for `clang::coro_await_suspend_destroy` ----------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+// Test for the `coro_await_suspend_destroy` attribute and
+// `await_suspend_destroy` method.
+//
+// Per `AttrDocs.td`, using `coro_await_suspend_destroy` with
+// `await_suspend_destroy` should be equivalent to providing a stub
+// `await_suspend` that calls `await_suspend_destroy` and then destroys the
+// coroutine handle.
+//
+// This test logs control flow in a variety of scenarios (controlled by
+// `test_toggles`), and checks that the execution traces are identical for
+// awaiters with/without the attribute. We currently test all combinations of
+// error injection points to ensure behavioral equivalence.
+//
+// In contrast to Clang `lit` tests, this makes it easy to verify non-divergence
+// of functional behavior of the entire coroutine across many scenarios,
+// including exception handling, early returns, and mixed usage with legacy
+// awaitables.
+//
+//===----------------------------------------------------------------------===//
+
+// UNSUPPORTED: c++03, c++11, c++14, c++17
+
+#if __has_cpp_attribute(clang::coro_await_suspend_destroy)
+#  define ATTR_CORO_AWAIT_SUSPEND_DESTROY [[clang::coro_await_suspend_destroy]]
+#else
+#  define ATTR_CORO_AWAIT_SUSPEND_DESTROY
+#endif
+
+#include <cassert>
+#include <coroutine>
+#include <exception>
+#include <iostream>
+#include <memory>
+#include <optional>
+#include <string>
+
+struct my_err : std::exception {};
+
+enum test_toggles {
+  throw_in_convert_optional_wrapper = 0,
+  throw_in_return_value,
+  throw_in_await_resume,
+  throw_in_await_suspend_destroy,
+  dynamic_short_circuit,          // Does not apply to `..._shortcircuits_to_empty` tests
+  largest = dynamic_short_circuit // for array in `test_driver`
+};
+
+enum test_event {
+  unset = 0,
+  // Besides events, we also log various integers between 1 and 9999 that
+  // disambiguate different awaiters, or represent different return values.
+  convert_optional_wrapper = 10000,
+  destroy_return_object,
+  destroy_promise,
+  get_return_object,
+  initial_suspend,
+  final_suspend,
+  return_value,
+  throw_return_value,
+  unhandled_exception,
+  await_ready,
+  await_resume,
+  destroy_optional_awaitable,
+  throw_await_resume,
+  await_suspend_destroy,
+  throw_await_suspend_destroy,
+  await_suspend,
+  coro_catch,
+  throw_convert_optional_wrapper,
+};
+
+struct test_driver {
+  static constexpr int max_events = 1000;
+
+  bool toggles_[test_toggles::largest + 1] = {};
+  int events_[max_events]                  = {};
+  int cur_event_                           = 0;
+
+  bool toggles(test_toggles toggle) const { return toggles_[toggle]; }
+  void log(auto&&... events) {
+    for (auto event : {static_cast<int>(events)...}) {
+      assert(cur_event_ < max_events);
+      events_[cur_event_++] = event;
+    }
+  }
+};
+
+// `optional_wrapper` exists since `get_return_object()` can't return
+// `std::optional` directly. C++ coroutines have a fundamental timing mismatch
+// between when the return object is created and when the value is available:
+//
+// 1) Early (coroutine startup): `get_return_object()` is called and must return
+//    something immediately.
+// 2) Later (when `co_return` executes): `return_value(T)` is called with the
+//    actual value.
+// 3) Issue: If `get_return_object()` returns the storage, it's empty when
+//    returned, and writing to it later cannot affect the already-returned copy.
+template <typename T>
+struct optional_wrapper {
+  test_driver& driver_;
+  std::optional<T> storage_;
+  std::optional<T>*& pointer_;
+  optional_wrapper(test_driver& driver, std::optional<T>*& p) : driver_(driver), pointer_(p) { pointer_ = &storage_; }
+  operator std::optional<T>() {
+    if (driver_.toggles(test_toggles::throw_in_convert_optional_wrapper)) {
+      driver_.log(test_event::throw_convert_optional_wrapper);
+      throw my_err();
+    }
+    driver_.log(test_event::convert_optional_wrapper);
+    return std::move(storage_);
+  }
+  ~optional_wrapper() { driver_.log(test_event::destroy_return_object); }
+};
+
+// Make `std::optional` a coroutine
+template <typename T, typename... Args>
+struct std::coroutine_traits<std::optional<T>, test_driver&, Args...> {
+  struct promise_type {
+    std::optional<T>* storagePtr_ = nullptr;
+    test_driver& driver_;
+
+    promise_type(test_driver& driver, auto&&...) : driver_(driver) {}
+    ~promise_type() { driver_.log(test_event::destroy_promise); }
+    optional_wrapper<T> get_return_object() {
+      driver_.log(test_event::get_return_object);
+      return optional_wrapper<T>(driver_, storagePtr_);
+    }
+    std::suspend_never initial_suspend() const noexcept {
+      driver_.log(test_event::initial_suspend);
+      return {};
+    }
+    std::suspend_never final_suspend() const noexcept {
+      driver_.log(test_event::final_suspend);
+      return {};
+    }
+    void return_value(T value) {
+      driver_.log(test_event::return_value, value);
+      if (driver_.toggles(test_toggles::throw_in_return_value)) {
+        driver_.log(test_event::throw_return_value);
+        throw my_err();
+      }
+      *storagePtr_ = std::move(value);
+    }
+    void unhandled_exception() {
+      // Leave `*storagePtr_` empty to represent error
+      driver_.log(test_event::unhandled_exception);
+    }
+  };
+};
+
+template <typename T, bool HasAttr>
+struct base_optional_awaitable {
+  test_driver& driver_;
+  int id_;
+  std::optional<T> opt_;
+
+  ~base_optional_awaitable() { driver_.log(test_event::destroy_optional_awaitable, id_); }
+
+  bool await_ready() const noexcept {
+    driver_.log(test_event::await_ready, id_);
+    return opt_.has_value();
+  }
+  T await_resume() {
+    if (driver_.toggles(test_toggles::throw_in_await_resume)) {
+      driver_.log(test_event::throw_await_resume, id_);
+      throw my_err();
+    }
+    driver_.log(test_event::await_resume, id_);
+    return std::move(opt_).value();
+  }
+  void await_suspend_destroy(auto& promise) {
+#if __has_cpp_attribute(clang::coro_await_suspend_destroy)
+    if constexpr (HasAttr) {
+      // This is just here so that old & new events compare exactly equal.
+      driver_.log(test_event::await_suspend);
+    }
+#endif
+    assert(promise.storagePtr_);
+    if (driver_.toggles(test_toggles::throw_in_await_suspend_destroy)) {
+      driver_.log(test_event::throw_await_suspend_destroy, id_);
+      throw my_err();
+    }
+    driver_.log(test_event::await_suspend_destroy, id_);
+  }
+  void await_suspend(auto handle) {
+    driver_.log(test_event::await_suspend);
+    await_suspend_destroy(handle.promise());
+    handle.destroy();
+  }
+};
+
+template <typename T>
+struct old_optional_awaitable : base_optional_awaitable<T, false> {};
+
+template <typename T>
+struct ATTR_CORO_AWAIT_SUSPEND_DESTROY new_optional_awaitable : base_optional_awaitable<T, true> {};
+
+void enumerate_toggles(auto lambda) {
+  // Generate all combinations of toggle values
+  for (int mask = 0; mask <= (1 << (test_toggles::largest + 1)) - 1; ++mask) {
+    test_driver driver;
+    for (int i = 0; i <= test_toggles::largest; ++i) {
+      driver.toggles_[i] = (mask & (1 << i)) != 0;
+    }
+    lambda(driver);
+  }
+}
+
+template <typename T>
+void check_coro_with_driver_for(auto coro_fn) {
+  enumerate_toggles([&](const test_driver& driver) {
+    auto old_driver = driver;
+    std::optional<T> old_res;
+    bool old_threw = false;
+    try {
+      old_res = coro_fn.template operator()<old_optional_awaitable<T>, T>(old_driver);
+    } catch (const my_err&) {
+      old_threw = true;
+    }
+    auto new_driver = driver;
+    std::optional<T> new_res;
+    bool new_threw = false;
+    try {
+      new_res = coro_fn.template operator()<new_optional_awaitable<T>, T>(new_driver);
+    } catch (const my_err&) {
+      new_threw = true;
+    }
+
+    // Print toggle values for debugging
+    std::string toggle_info = "Toggles: ";
+    for (int i = 0; i <= test_toggles::largest; ++i) {
+      if (driver.toggles_[i]) {
+        toggle_info += std::to_string(i) + " ";
+      }
+    }
+    toggle_info += "\n";
+    std::cerr << toggle_info.c_str() << std::endl;
+
+    assert(old_threw == new_threw);
+    assert(old_res == new_res);
+
+    // Compare events arrays directly using cur_event_ and indices
+    assert(old_driver.cur_event_ == new_driver.cur_event_);
+    for (int i = 0; i < old_driver.cur_event_; ++i) {
+      assert(old_driver.events_[i] == new_driver.events_[i]);
+    }
+  });
+}
+
+// Move-only, non-nullable type that quacks like int but stores a
+// heap-allocated int. Used to exercise the machinery with a nontrivial type.
+class heap_int {
+private:
+  std::unique_ptr<int> ptr_;
+
+public:
+  explicit heap_int(int value) : ptr_(std::make_unique<int>(value)) {}
+
+  heap_int operator+(const heap_int& other) const { return heap_int(*ptr_ + *other.ptr_); }
+
+  bool operator==(const heap_int& other) const { return *ptr_ == *other.ptr_; }
+
+  /*implicit*/ operator int() const { return *ptr_; }
+};
+
+void check_coro_with_driver(auto coro_fn) {
+  check_coro_with_driver_for<int>(coro_fn);
+  check_coro_with_driver_for<heap_int>(coro_fn);
+}
+
+template <typename Awaitable, typename T>
+std::optional<T> coro_shortcircuits_to_empty(test_driver& driver) {
+  T n = co_await Awaitable{driver, 1, std::optional<T>{11}};
+  co_await Awaitable{driver, 2, std::optional<T>{}}; // return early!
+  co_return n + co_await Awaitable{driver, 3, std::optional<T>{22}};
+}
+
+void test_coro_shortcircuits_to_empty() {
+  std::cerr << "test_coro_shortcircuits_to_empty" << std::endl;
+  check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) {
+    return coro_shortcircuits_to_empty<Awaitable, T>(driver);
+  });
+}
+
+template <typename Awaitable, typename T>
+std::optional<T> coro_simple_await(test_driver& driver) {
+  co_return co_await Awaitable{driver, 1, std::optional<T>{11}} +
+      co_await Awaitable{driver, 2, driver.toggles(dynamic_short_circuit) ? std::optional<T>{} : std::optional<T>{22}};
+}
+
+void test_coro_simple_await() {
+  std::cerr << "test_coro_simple_await" << std::endl;
+  check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) {
+    return coro_simple_await<Awaitable, T>(driver);
+  });
+}
+
+// The next pair of tests checks that adding a `try-catch` in the coroutine
+// doesn't affect control flow when `await_suspend_destroy` awaiters are in use.
+
+template <typename Awaitable, typename T>
+std::optional<T> coro_catching_shortcircuits_to_empty(test_driver& driver) {
+  try {
+    T n = co_await Awaitable{driver, 1, std::optional<T>{11}};
+    co_await Awaitable{driver, 2, std::optional<T>{}}; // return early!
+    co_return n + co_await Awaitable{driver, 3, std::optional<T>{22}};
+  } catch (...) {
+    driver.log(test_event::coro_catch);
+    throw;
+  }
+}
+
+void test_coro_catching_shortcircuits_to_empty() {
+  std::cerr << "test_coro_catching_shortcircuits_to_empty" << std::endl;
+  check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) {
+    return coro_catching_shortcircuits_to_empty<Awaitable, T>(driver);
+  });
+}
+
+template <typename Awaitable, typename T>
+std::optional<T> coro_catching_simple_await(test_driver& driver) {
+  try {
+    co_return co_await Awaitable{driver, 1, std::optional<T>{11}} +
+        co_await Awaitable{
+            driver, 2, driver.toggles(dynamic_short_circuit) ? std::optional<T>{} : std::optional<T>{22}};
+  } catch (...) {
+    driver.log(test_event::coro_catch);
+    throw;
+  }
+}
+
+void test_coro_catching_simple_await() {
+  std::cerr << "test_coro_catching_simple_await" << std::endl;
+  check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) {
+    return coro_catching_simple_await<Awaitable, T>(driver);
+  });
+}
+
+// The next pair of tests shows that the `await_suspend_destroy` code path works
+// correctly, even if it's mixed in a coroutine with legacy awaitables.
+
+template <typename Awaitable, typename T>
+std::optional<T> noneliding_coro_shortcircuits_to_empty(test_driver& driver) {
+  T n  = co_await Awaitable{driver, 1, std::optional<T>{11}};
+  T n2 = co_await old_optional_awaitable<T>{driver, 2, std::optional<T>{22}};
+  co_await Awaitable{driver, 3, std::optional<T>{}}; // return early!
+  co_return n + n2 + co_await Awaitable{driver, 4, std::optional<T>{44}};
+}
+
+void test_noneliding_coro_shortcircuits_to_empty() {
+  std::cerr << "test_noneliding_coro_shortcircuits_to_empty" << std::endl;
+  check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) {
+    return noneliding_coro_shortcircuits_to_empty<Awaitable, T>(driver);
+  });
+}
+
+template <typename Awaitable, typename T>
+std::optional<T> noneliding_coro_simple_await(test_driver& driver) {
+  co_return co_await Awaitable{driver, 1, std::optional<T>{11}} +
+      co_await Awaitable{driver, 2, driver.toggles(dynamic_short_circuit) ? std::optional<T>{} : std::optional<T>{22}} +
+      co_await old_optional_awaitable<T>{driver, 3, std::optional<T>{33}};
+}
+
+void test_noneliding_coro_simple_await() {
+  std::cerr << "test_noneliding_coro_simple_await" << std::endl;
+  check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) {
+    return noneliding_coro_simple_await<Awaitable, T>(driver);
+  });
+}
+
+// Test nested coroutines (coroutines that await other coroutines)
+
+template <typename Awaitable, typename T>
+std::optional<T> inner_coro(test_driver& driver, int base_id) {
+  co_return co_await Awaitable{driver, base_id, std::optional<T>{100}} +
+      co_await Awaitable{
+          driver, base_id + 1, driver.toggles(dynamic_short_circuit) ? std::optional<T>{} : std::optional<T>{200}};
+}
+
+template <typename Awaitable, typename T>
+std::optional<T> outer_coro(test_driver& driver) {
+  T result1 = co_await Awaitable{driver, 1, inner_coro<Awaitable, T>(driver, 10)};
+  T result2 = co_await Awaitable{driver, 2, inner_coro<Awaitable, T>(driver, 20)};
+  co_return result1 + result2;
+}
+
+void test_nested_coroutines() {
+  std::cerr << "test_nested_coroutines" << std::endl;
+  check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) {
+    return outer_coro<Awaitable, T>(driver);
+  });
+}
+
+int main(int, char**) {
+  test_coro_shortcircuits_to_empty();
+  test_coro_simple_await();
+  test_coro_catching_shortcircuits_to_empty();
+  test_coro_catching_simple_await();
+  test_noneliding_coro_shortcircuits_to_empty();
+  test_noneliding_coro_simple_await();
+  test_nested_coroutines();
+  return 0;
+}

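For intuition, here is a hand-desugared sketch of what `coro_shortcircuits_to_empty` computes for `T = int`. It is not part of the test. It omits the `test_driver` logging and the promise's implicit `unhandled_exception()` boundary, and `plain_shortcircuits_to_empty` is a hypothetical name. Each `co_await` on an empty optional becomes an early `return std::nullopt`, which is why the test expects an empty result.

```cpp
#include <cassert>
#include <optional>
#include <utility>

// Hypothetical hand-desugared equivalent of `coro_shortcircuits_to_empty`:
// every `co_await` on an empty optional becomes `return std::nullopt`.
// Driver logging and `unhandled_exception()` handling are omitted.
template <typename T>
std::optional<T> plain_shortcircuits_to_empty() {
  std::optional<T> a1{11};
  if (!a1) {
    return std::nullopt;
  }
  T n = std::move(*a1);

  std::optional<T> a2{}; // empty: the coroutine version exits here
  if (!a2) {
    return std::nullopt;
  }

  std::optional<T> a3{22}; // never reached
  if (!a3) {
    return std::nullopt;
  }
  return n + std::move(*a3);
}
```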
>From 8bf453efbeef444a12fee6829430b9ec26134b27 Mon Sep 17 00:00:00 2001
From: lesha <lesha at meta.com>
Date: Thu, 7 Aug 2025 23:38:21 -0700
Subject: [PATCH 08/15] Fix CI

---
 clang/include/clang/Basic/AttrDocs.td         | 32 ++++++-------
 .../coro_await_suspend_destroy.pass.cpp       | 48 +++++++++++++++++--
 2 files changed, 60 insertions(+), 20 deletions(-)

diff --git a/clang/include/clang/Basic/AttrDocs.td b/clang/include/clang/Basic/AttrDocs.td
index e2360bd48b0f7..6215419901b52 100644
--- a/clang/include/clang/Basic/AttrDocs.td
+++ b/clang/include/clang/Basic/AttrDocs.td
@@ -9405,12 +9405,12 @@ flow as):
   }
 
 The benefits of this attribute are:
-  - **Avoid heap allocations for coro frames**: Allocating short-circuiting
-    coros on the stack makes code more predictable under memory pressure.
-    Without this attribute, LLVM cannot elide heap allocation even when all
-    awaiters are short-circuiting.
-  - **Performance**: Significantly faster execution and smaller code size.
-  - **Build time**: Faster compilation due to less IR being generated.
+- **Avoid heap allocations for coro frames**: Allocating short-circuiting
+  coros on the stack makes code more predictable under memory pressure.
+  Without this attribute, LLVM cannot elide heap allocation even when all
+  awaiters are short-circuiting.
+- **Performance**: Significantly faster execution and smaller code size.
+- **Build time**: Faster compilation due to less IR being generated.
 
 Marking your ``await_suspend_destroy`` method as ``noexcept`` can sometimes
 further improve optimization.
@@ -9436,16 +9436,16 @@ Here is a toy example of a portable short-circuiting awaiter:
 
 If all suspension points use (i) trivial or (ii) short-circuiting awaiters,
 then the coroutine optimizes more like a plain function, with 2 caveats:
-  - **Behavior:** The coroutine promise provides an implicit exception boundary
-    (as if wrapping the function in ``try {} catch { unhandled_exception(); }``).
-    This exception handling behavior is usually desirable in robust,
-    return-value-oriented programs that need short-circuiting coroutines.
-    Otherwise, the promise can always re-throw.
-  - **Speed:** As of 2025, there is still an optimization gap between a
-    realistic short-circuiting coro, and the equivalent (but much more verbose)
-    function.  For a guesstimate, expect 4-5ns per call on x86.  One idea for
-    improvement is to also elide trivial suspends like `std::suspend_never`, in
-    order to hit the `HasCoroSuspend` path in `CoroEarly.cpp`.
+- **Behavior:** The coroutine promise provides an implicit exception boundary
+  (as if wrapping the function in ``try {} catch { unhandled_exception(); }``).
+  This exception handling behavior is usually desirable in robust,
+  return-value-oriented programs that need short-circuiting coroutines.
+  Otherwise, the promise can always re-throw.
+- **Speed:** As of 2025, there is still an optimization gap between a
+  realistic short-circuiting coro, and the equivalent (but much more verbose)
+  function.  For a guesstimate, expect 4-5ns per call on x86.  One idea for
+  improvement is to also elide trivial suspends like `std::suspend_never`, in
+  order to hit the `HasCoroSuspend` path in `CoroEarly.cpp`.
 
 }];
 }
diff --git a/libcxx/test/std/language.support/support.coroutines/end.to.end/coro_await_suspend_destroy.pass.cpp b/libcxx/test/std/language.support/support.coroutines/end.to.end/coro_await_suspend_destroy.pass.cpp
index 1b48b1523bf12..9da8ba530edf3 100644
--- a/libcxx/test/std/language.support/support.coroutines/end.to.end/coro_await_suspend_destroy.pass.cpp
+++ b/libcxx/test/std/language.support/support.coroutines/end.to.end/coro_await_suspend_destroy.pass.cpp
@@ -40,6 +40,14 @@
 #include <optional>
 #include <string>
 
+#define DEBUG_LOG 0 // Logs break no-localization CI, set to 1 if needed
+
+#ifndef TEST_HAS_NO_EXCEPTIONS
+#  define THROW(_ex) throw _ex;
+#else
+#  define THROW(_ex)
+#endif
+
 struct my_err : std::exception {};
 
 enum test_toggles {
@@ -110,7 +118,7 @@ struct optional_wrapper {
   operator std::optional<T>() {
     if (driver_.toggles(test_toggles::throw_in_convert_optional_wrapper)) {
       driver_.log(test_event::throw_convert_optional_wrapper);
-      throw my_err();
+      THROW(my_err());
     }
     driver_.log(test_event::convert_optional_wrapper);
     return std::move(storage_);
@@ -143,7 +151,7 @@ struct std::coroutine_traits<std::optional<T>, test_driver&, Args...> {
       driver_.log(test_event::return_value, value);
       if (driver_.toggles(test_toggles::throw_in_return_value)) {
         driver_.log(test_event::throw_return_value);
-        throw my_err();
+        THROW(my_err());
       }
       *storagePtr_ = std::move(value);
     }
@@ -169,7 +177,7 @@ struct base_optional_awaitable {
   T await_resume() {
     if (driver_.toggles(test_toggles::throw_in_await_resume)) {
       driver_.log(test_event::throw_await_resume, id_);
-      throw my_err();
+      THROW(my_err());
     }
     driver_.log(test_event::await_resume, id_);
     return std::move(opt_).value();
@@ -184,7 +192,7 @@ struct base_optional_awaitable {
     assert(promise.storagePtr_);
     if (driver_.toggles(test_toggles::throw_in_await_suspend_destroy)) {
       driver_.log(test_event::throw_await_suspend_destroy, id_);
-      throw my_err();
+      THROW(my_err());
     }
     driver_.log(test_event::await_suspend_destroy, id_);
   }
@@ -218,20 +226,29 @@ void check_coro_with_driver_for(auto coro_fn) {
     auto old_driver = driver;
     std::optional<T> old_res;
     bool old_threw = false;
+#ifndef TEST_HAS_NO_EXCEPTIONS
     try {
+#endif
       old_res = coro_fn.template operator()<old_optional_awaitable<T>, T>(old_driver);
+#ifndef TEST_HAS_NO_EXCEPTIONS
     } catch (const my_err&) {
       old_threw = true;
     }
+#endif
     auto new_driver = driver;
     std::optional<T> new_res;
     bool new_threw = false;
+#ifndef TEST_HAS_NO_EXCEPTIONS
     try {
+#endif
       new_res = coro_fn.template operator()<new_optional_awaitable<T>, T>(new_driver);
+#ifndef TEST_HAS_NO_EXCEPTIONS
     } catch (const my_err&) {
       new_threw = true;
     }
+#endif
 
+#if DEBUG_LOG
     // Print toggle values for debugging
     std::string toggle_info = "Toggles: ";
     for (int i = 0; i <= test_toggles::largest; ++i) {
@@ -241,6 +258,7 @@ void check_coro_with_driver_for(auto coro_fn) {
     }
     toggle_info += "\n";
     std::cerr << toggle_info.c_str() << std::endl;
+#endif
 
     assert(old_threw == new_threw);
     assert(old_res == new_res);
@@ -282,7 +300,9 @@ std::optional<T> coro_shortcircuits_to_empty(test_driver& driver) {
 }
 
 void test_coro_shortcircuits_to_empty() {
+#if DEBUG_LOG
   std::cerr << "test_coro_shortcircuits_to_empty" << std::endl;
+#endif
   check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) {
     return coro_shortcircuits_to_empty<Awaitable, T>(driver);
   });
@@ -295,7 +315,9 @@ std::optional<T> coro_simple_await(test_driver& driver) {
 }
 
 void test_coro_simple_await() {
+#if DEBUG_LOG
   std::cerr << "test_coro_simple_await" << std::endl;
+#endif
   check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) {
     return coro_simple_await<Awaitable, T>(driver);
   });
@@ -306,18 +328,24 @@ void test_coro_simple_await() {
 
 template <typename Awaitable, typename T>
 std::optional<T> coro_catching_shortcircuits_to_empty(test_driver& driver) {
+#ifndef TEST_HAS_NO_EXCEPTIONS
   try {
+#endif
     T n = co_await Awaitable{driver, 1, std::optional<T>{11}};
     co_await Awaitable{driver, 2, std::optional<T>{}}; // return early!
     co_return n + co_await Awaitable{driver, 3, std::optional<T>{22}};
+#ifndef TEST_HAS_NO_EXCEPTIONS
   } catch (...) {
     driver.log(test_event::coro_catch);
     throw;
   }
+#endif
 }
 
 void test_coro_catching_shortcircuits_to_empty() {
+#if DEBUG_LOG
   std::cerr << "test_coro_catching_shortcircuits_to_empty" << std::endl;
+#endif
   check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) {
     return coro_catching_shortcircuits_to_empty<Awaitable, T>(driver);
   });
@@ -325,18 +353,24 @@ void test_coro_catching_shortcircuits_to_empty() {
 
 template <typename Awaitable, typename T>
 std::optional<T> coro_catching_simple_await(test_driver& driver) {
+#ifndef TEST_HAS_NO_EXCEPTIONS
   try {
+#endif
     co_return co_await Awaitable{driver, 1, std::optional<T>{11}} +
         co_await Awaitable{
             driver, 2, driver.toggles(dynamic_short_circuit) ? std::optional<T>{} : std::optional<T>{22}};
+#ifndef TEST_HAS_NO_EXCEPTIONS
   } catch (...) {
     driver.log(test_event::coro_catch);
     throw;
   }
+#endif
 }
 
 void test_coro_catching_simple_await() {
+#if DEBUG_LOG
   std::cerr << "test_coro_catching_simple_await" << std::endl;
+#endif
   check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) {
     return coro_catching_simple_await<Awaitable, T>(driver);
   });
@@ -354,7 +388,9 @@ std::optional<T> noneliding_coro_shortcircuits_to_empty(test_driver& driver) {
 }
 
 void test_noneliding_coro_shortcircuits_to_empty() {
+#if DEBUG_LOG
   std::cerr << "test_noneliding_coro_shortcircuits_to_empty" << std::endl;
+#endif
   check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) {
     return noneliding_coro_shortcircuits_to_empty<Awaitable, T>(driver);
   });
@@ -368,7 +404,9 @@ std::optional<T> noneliding_coro_simple_await(test_driver& driver) {
 }
 
 void test_noneliding_coro_simple_await() {
+#if DEBUG_LOG
   std::cerr << "test_noneliding_coro_simple_await" << std::endl;
+#endif
   check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) {
     return noneliding_coro_simple_await<Awaitable, T>(driver);
   });
@@ -391,7 +429,9 @@ std::optional<T> outer_coro(test_driver& driver) {
 }
 
 void test_nested_coroutines() {
+#if DEBUG_LOG
   std::cerr << "test_nested_coroutines" << std::endl;
+#endif
   check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) {
     return outer_coro<Awaitable, T>(driver);
   });

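`check_coro_with_driver_for` is a differential test. It runs the same coroutine once with `old_optional_awaitable` and once with `new_optional_awaitable` under every toggle combination produced by `enumerate_toggles`. It then requires identical results and event traces. A generic sketch of that harness shape follows. `toggle_set`, `impl_fn`, and `traces_match_for_all_toggles` are hypothetical names, and the two "implementations" are plain callables that return an event trace rather than coroutines.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical: one on/off flag per fault-injection point.
template <std::size_t N>
using toggle_set = std::array<bool, N>;

// Hypothetical: runs one scenario and returns its event trace.
template <std::size_t N>
using impl_fn = std::function<std::vector<int>(const toggle_set<N>&)>;

// Runs both implementations under all 2^N toggle combinations (the same
// bitmask walk as `enumerate_toggles`) and reports whether traces always match.
template <std::size_t N>
bool traces_match_for_all_toggles(const impl_fn<N>& old_impl, const impl_fn<N>& new_impl) {
  static_assert(N < 32, "toggle mask must fit in an unsigned int");
  for (unsigned mask = 0; mask < (1u << N); ++mask) {
    toggle_set<N> toggles{};
    for (std::size_t i = 0; i < N; ++i) {
      toggles[i] = ((mask >> i) & 1u) != 0;
    }
    if (old_impl(toggles) != new_impl(toggles)) {
      return false; // found a divergent combination
    }
  }
  return true;
}
```

Comparing whole traces, not only return values, is what lets the real test catch ordering differences, such as an awaiter destroyed before the promise instead of after it.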
>From 811501d741c096fd5ffdc730cac2dddfceb89786 Mon Sep 17 00:00:00 2001
From: lesha <lesha at meta.com>
Date: Fri, 8 Aug 2025 00:12:00 -0700
Subject: [PATCH 09/15] Improve doc formatting

---
 clang/include/clang/Basic/AttrDocs.td | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/clang/include/clang/Basic/AttrDocs.td b/clang/include/clang/Basic/AttrDocs.td
index 6215419901b52..faf0081555cfb 100644
--- a/clang/include/clang/Basic/AttrDocs.td
+++ b/clang/include/clang/Basic/AttrDocs.td
@@ -9405,11 +9405,14 @@ flow as):
   }
 
 The benefits of this attribute are:
+
 - **Avoid heap allocations for coro frames**: Allocating short-circuiting
   coros on the stack makes code more predictable under memory pressure.
   Without this attribute, LLVM cannot elide heap allocation even when all
   awaiters are short-circuiting.
+
 - **Performance**: Significantly faster execution and smaller code size.
+
 - **Build time**: Faster compilation due to less IR being generated.
 
 Marking your ``await_suspend_destroy`` method as ``noexcept`` can sometimes
@@ -9436,11 +9439,13 @@ Here is a toy example of a portable short-circuiting awaiter:
 
 If all suspension points use (i) trivial or (ii) short-circuiting awaiters,
 then the coroutine optimizes more like a plain function, with 2 caveats:
+
 - **Behavior:** The coroutine promise provides an implicit exception boundary
   (as if wrapping the function in ``try {} catch { unhandled_exception(); }``).
   This exception handling behavior is usually desirable in robust,
   return-value-oriented programs that need short-circuiting coroutines.
   Otherwise, the promise can always re-throw.
+
 - **Speed:** As of 2025, there is still an optimization gap between a
   realistic short-circuiting coro, and the equivalent (but much more verbose)
   function.  For a guesstimate, expect 4-5ns per call on x86.  One idea for

>From 63cf306908ba4ff04d413a3a9353b1f58182cb90 Mon Sep 17 00:00:00 2001
From: Alexey <snarkmaster at gmail.com>
Date: Fri, 8 Aug 2025 22:02:07 -0700
Subject: [PATCH 10/15] Rework the AttrDocs.td addition based on feedback

---
 clang/include/clang/Basic/AttrDocs.td | 60 ++++++++++++++++-----------
 1 file changed, 36 insertions(+), 24 deletions(-)

diff --git a/clang/include/clang/Basic/AttrDocs.td b/clang/include/clang/Basic/AttrDocs.td
index faf0081555cfb..4c00bc31e9efd 100644
--- a/clang/include/clang/Basic/AttrDocs.td
+++ b/clang/include/clang/Basic/AttrDocs.td
@@ -9371,10 +9371,10 @@ The ``[[clang::coro_await_suspend_destroy]]`` attribute may be applied to a C++
 coroutine awaiter type.  When this attribute is present, the awaiter must
 implement ``void await_suspend_destroy(Promise&)``.  If ``await_ready()``
 returns ``false`` at a suspension point, ``await_suspend_destroy`` will be
-called directly, bypassing the ``await_suspend(std::coroutine_handle<...>)``
-method.  The coroutine being suspended will then be immediately destroyed.
+called directly.  The coroutine being suspended will then be immediately
+destroyed.
 
-Logically, the new behavior is equivalent to this standard code:
+The new behavior is equivalent to this standard code:
 
 .. code-block:: c++
 
@@ -9389,10 +9389,24 @@ stub ``await_suspend()`` as above.  Without ``coro_await_suspend_destroy``
 support, the awaiter will behave nearly identically, with the only difference
 being heap allocation instead of stack allocation for the coroutine frame.
 
-This attribute exists to optimize short-circuiting coroutines—coroutines whose
-suspend points are either (i) trivial (like ``std::suspend_never``), or (ii)
-short-circuiting (like a ``co_await`` that can be expressed in regular control
-flow as):
+This attribute helps optimize short-circuiting coroutines.
+
+A short-circuiting coroutine is one where every ``co_await`` or ``co_yield``
+either immediately produces a value, or exits the coroutine.  In other words,
+such coroutines use coroutine syntax to concisely branch out of a synchronous function.
+Here are close analogs in other languages:
+
+- Rust has ``Result<T, E>`` and a ``?`` operator to unpack it, while
+  ``folly::result<T>`` is the return type of a C++ short-circuiting coroutine,
+  with ``co_await`` acting just like ``?``.
+
+- Haskell has ``Maybe`` & ``Error`` monads.  A short-circuiting ``co_await``
+  loosely corresponds to the monadic ``>>=``, whereas a short-circuiting
+  ``std::optional`` coro would be an exact analog of ``Maybe``.
+
+The C++ implementation relies on short-circuiting awaiters.  These either
+resume synchronously, or immediately destroy the awaiting coroutine and return
+control to the parent:
 
 .. code-block:: c++
 
@@ -9404,7 +9418,20 @@ flow as):
     return /* value representing the "execution short-circuited" outcome */;
   }
 
-The benefits of this attribute are:
+Then, a short-circuiting coroutine is one where all the suspend points are
+either (i) trivial (like ``std::suspend_never``), or (ii) short-circuiting.
+
+Although the coroutine machinery makes them harder to optimize, logically,
+short-circuiting coroutines are like syntax sugar for regular functions where:
+
+- ``co_await`` allows expressions to return early.
+
+- ``unhandled_exception()`` lets the coroutine promise type wrap the function
+  body in an implicit try-catch.  This mandatory exception boundary behavior
+  can be desirable in robust, return-value-oriented programs that benefit from
+  short-circuiting coroutines.  If not, the promise can always re-throw.
+
+This attribute improves short-circuiting coroutines in a few ways:
 
 - **Avoid heap allocations for coro frames**: Allocating short-circuiting
   coros on the stack makes code more predictable under memory pressure.
@@ -9423,7 +9450,7 @@ Here is a toy example of a portable short-circuiting awaiter:
 .. code-block:: c++
 
   template <typename T>
-  struct [[clang::coro_await_suspend_destroy]] optional_awaitable {
+  struct [[clang::coro_await_suspend_destroy]] optional_awaiter {
     std::optional<T> opt_;
     bool await_ready() const noexcept { return opt_.has_value(); }
     T await_resume() { return std::move(opt_).value(); }
@@ -9437,21 +9464,6 @@ Here is a toy example of a portable short-circuiting awaiter:
     }
   };
 
-If all suspension points use (i) trivial or (ii) short-circuiting awaiters,
-then the coroutine optimizes more like a plain function, with 2 caveats:
-
-- **Behavior:** The coroutine promise provides an implicit exception boundary
-  (as if wrapping the function in ``try {} catch { unhandled_exception(); }``).
-  This exception handling behavior is usually desirable in robust,
-  return-value-oriented programs that need short-circuiting coroutines.
-  Otherwise, the promise can always re-throw.
-
-- **Speed:** As of 2025, there is still an optimization gap between a
-  realistic short-circuiting coro, and the equivalent (but much more verbose)
-  function.  For a guesstimate, expect 4-5ns per call on x86.  One idea for
-  improvement is to also elide trivial suspends like `std::suspend_never`, in
-  order to hit the `HasCoroSuspend` path in `CoroEarly.cpp`.
-
 }];
 }
 

>From 2b2748c98c24790237c75805a7defa98960181f3 Mon Sep 17 00:00:00 2001
From: Alexey <snarkmaster at gmail.com>
Date: Sat, 9 Aug 2025 00:23:37 -0700
Subject: [PATCH 11/15] Split out the `libcxx/test` change into PR #152820

---
 .../coro_await_suspend_destroy.pass.cpp       | 449 ------------------
 1 file changed, 449 deletions(-)
 delete mode 100644 libcxx/test/std/language.support/support.coroutines/end.to.end/coro_await_suspend_destroy.pass.cpp

diff --git a/libcxx/test/std/language.support/support.coroutines/end.to.end/coro_await_suspend_destroy.pass.cpp b/libcxx/test/std/language.support/support.coroutines/end.to.end/coro_await_suspend_destroy.pass.cpp
deleted file mode 100644
index 9da8ba530edf3..0000000000000
--- a/libcxx/test/std/language.support/support.coroutines/end.to.end/coro_await_suspend_destroy.pass.cpp
+++ /dev/null
@@ -1,449 +0,0 @@
-//===-- Integration test for `clang::co_await_suspend_destroy` ------------===//
-//
-// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-//
-// Test for the `coro_await_suspend_destroy` attribute and
-// `await_suspend_destroy` method.
-//
-// Per `AttrDocs.td`, using `coro_await_suspend_destroy` with
-// `await_suspend_destroy` should be equivalent to providing a stub
-// `await_suspend` that calls `await_suspend_destroy` and then destroys the
-// coroutine handle.
-//
-// This test logs control flow in a variety of scenarios (controlled by
-// `test_toggles`), and checks that the execution traces are identical for
-// awaiters with/without the attribute. We currently test all combinations of
-// error injection points to ensure behavioral equivalence.
-//
-// In contrast to Clang `lit` tests, this makes it easy to verify non-divergence
-// of functional behavior of the entire coroutine across many scenarios,
-// including exception handling, early returns, and mixed usage with legacy
-// awaitables.
-//
-//===----------------------------------------------------------------------===//
-
-// UNSUPPORTED: c++03, c++11, c++14, c++17
-
-#if __has_cpp_attribute(clang::coro_await_suspend_destroy)
-#  define ATTR_CORO_AWAIT_SUSPEND_DESTROY [[clang::coro_await_suspend_destroy]]
-#else
-#  define ATTR_CORO_AWAIT_SUSPEND_DESTROY
-#endif
-
-#include <cassert>
-#include <coroutine>
-#include <exception>
-#include <iostream>
-#include <memory>
-#include <optional>
-#include <string>
-
-#define DEBUG_LOG 0 // Logs break no-localization CI, set to 1 if needed
-
-#ifndef TEST_HAS_NO_EXCEPTIONS
-#  define THROW(_ex) throw _ex;
-#else
-#  define THROW(_ex)
-#endif
-
-struct my_err : std::exception {};
-
-enum test_toggles {
-  throw_in_convert_optional_wrapper = 0,
-  throw_in_return_value,
-  throw_in_await_resume,
-  throw_in_await_suspend_destroy,
-  dynamic_short_circuit,          // Does not apply to `..._shortcircuits_to_empty` tests
-  largest = dynamic_short_circuit // for array in `test_driver`
-};
-
-enum test_event {
-  unset = 0,
-  // Besides events, we also log various integers between 1 and 9999 that
-  // disambiguate different awaiters, or represent different return values.
-  convert_optional_wrapper = 10000,
-  destroy_return_object,
-  destroy_promise,
-  get_return_object,
-  initial_suspend,
-  final_suspend,
-  return_value,
-  throw_return_value,
-  unhandled_exception,
-  await_ready,
-  await_resume,
-  destroy_optional_awaitable,
-  throw_await_resume,
-  await_suspend_destroy,
-  throw_await_suspend_destroy,
-  await_suspend,
-  coro_catch,
-  throw_convert_optional_wrapper,
-};
-
-struct test_driver {
-  static constexpr int max_events = 1000;
-
-  bool toggles_[test_toggles::largest + 1] = {};
-  int events_[max_events]                  = {};
-  int cur_event_                           = 0;
-
-  bool toggles(test_toggles toggle) const { return toggles_[toggle]; }
-  void log(auto&&... events) {
-    for (auto event : {static_cast<int>(events)...}) {
-      assert(cur_event_ < max_events);
-      events_[cur_event_++] = event;
-    }
-  }
-};
-
-// `optional_wrapper` exists since `get_return_object()` can't return
-// `std::optional` directly. C++ coroutines have a fundamental timing mismatch
-// between when the return object is created and when the value is available:
-//
-// 1) Early (coroutine startup): `get_return_object()` is called and must return
-//    something immediately.
-// 2) Later (when `co_return` executes): `return_value(T)` is called with the
-//    actual value.
-// 3) Issue: If `get_return_object()` returns the storage, it's empty when
-//    returned, and writing to it later cannot affect the already-returned copy.
-template <typename T>
-struct optional_wrapper {
-  test_driver& driver_;
-  std::optional<T> storage_;
-  std::optional<T>*& pointer_;
-  optional_wrapper(test_driver& driver, std::optional<T>*& p) : driver_(driver), pointer_(p) { pointer_ = &storage_; }
-  operator std::optional<T>() {
-    if (driver_.toggles(test_toggles::throw_in_convert_optional_wrapper)) {
-      driver_.log(test_event::throw_convert_optional_wrapper);
-      THROW(my_err());
-    }
-    driver_.log(test_event::convert_optional_wrapper);
-    return std::move(storage_);
-  }
-  ~optional_wrapper() { driver_.log(test_event::destroy_return_object); }
-};
-
-// Make `std::optional` a coroutine
-template <typename T, typename... Args>
-struct std::coroutine_traits<std::optional<T>, test_driver&, Args...> {
-  struct promise_type {
-    std::optional<T>* storagePtr_ = nullptr;
-    test_driver& driver_;
-
-    promise_type(test_driver& driver, auto&&...) : driver_(driver) {}
-    ~promise_type() { driver_.log(test_event::destroy_promise); }
-    optional_wrapper<T> get_return_object() {
-      driver_.log(test_event::get_return_object);
-      return optional_wrapper<T>(driver_, storagePtr_);
-    }
-    std::suspend_never initial_suspend() const noexcept {
-      driver_.log(test_event::initial_suspend);
-      return {};
-    }
-    std::suspend_never final_suspend() const noexcept {
-      driver_.log(test_event::final_suspend);
-      return {};
-    }
-    void return_value(T value) {
-      driver_.log(test_event::return_value, value);
-      if (driver_.toggles(test_toggles::throw_in_return_value)) {
-        driver_.log(test_event::throw_return_value);
-        THROW(my_err());
-      }
-      *storagePtr_ = std::move(value);
-    }
-    void unhandled_exception() {
-      // Leave `*storagePtr_` empty to represent error
-      driver_.log(test_event::unhandled_exception);
-    }
-  };
-};
-
-template <typename T, bool HasAttr>
-struct base_optional_awaitable {
-  test_driver& driver_;
-  int id_;
-  std::optional<T> opt_;
-
-  ~base_optional_awaitable() { driver_.log(test_event::destroy_optional_awaitable, id_); }
-
-  bool await_ready() const noexcept {
-    driver_.log(test_event::await_ready, id_);
-    return opt_.has_value();
-  }
-  T await_resume() {
-    if (driver_.toggles(test_toggles::throw_in_await_resume)) {
-      driver_.log(test_event::throw_await_resume, id_);
-      THROW(my_err());
-    }
-    driver_.log(test_event::await_resume, id_);
-    return std::move(opt_).value();
-  }
-  void await_suspend_destroy(auto& promise) {
-#if __has_cpp_attribute(clang::coro_await_suspend_destroy)
-    if constexpr (HasAttr) {
-      // This is just here so that old & new events compare exactly equal.
-      driver_.log(test_event::await_suspend);
-    }
-#endif
-    assert(promise.storagePtr_);
-    if (driver_.toggles(test_toggles::throw_in_await_suspend_destroy)) {
-      driver_.log(test_event::throw_await_suspend_destroy, id_);
-      THROW(my_err());
-    }
-    driver_.log(test_event::await_suspend_destroy, id_);
-  }
-  void await_suspend(auto handle) {
-    driver_.log(test_event::await_suspend);
-    await_suspend_destroy(handle.promise());
-    handle.destroy();
-  }
-};
-
-template <typename T>
-struct old_optional_awaitable : base_optional_awaitable<T, false> {};
-
-template <typename T>
-struct ATTR_CORO_AWAIT_SUSPEND_DESTROY new_optional_awaitable : base_optional_awaitable<T, true> {};
-
-void enumerate_toggles(auto lambda) {
-  // Generate all combinations of toggle values
-  for (int mask = 0; mask <= (1 << (test_toggles::largest + 1)) - 1; ++mask) {
-    test_driver driver;
-    for (int i = 0; i <= test_toggles::largest; ++i) {
-      driver.toggles_[i] = (mask & (1 << i)) != 0;
-    }
-    lambda(driver);
-  }
-}
-
-template <typename T>
-void check_coro_with_driver_for(auto coro_fn) {
-  enumerate_toggles([&](const test_driver& driver) {
-    auto old_driver = driver;
-    std::optional<T> old_res;
-    bool old_threw = false;
-#ifndef TEST_HAS_NO_EXCEPTIONS
-    try {
-#endif
-      old_res = coro_fn.template operator()<old_optional_awaitable<T>, T>(old_driver);
-#ifndef TEST_HAS_NO_EXCEPTIONS
-    } catch (const my_err&) {
-      old_threw = true;
-    }
-#endif
-    auto new_driver = driver;
-    std::optional<T> new_res;
-    bool new_threw = false;
-#ifndef TEST_HAS_NO_EXCEPTIONS
-    try {
-#endif
-      new_res = coro_fn.template operator()<new_optional_awaitable<T>, T>(new_driver);
-#ifndef TEST_HAS_NO_EXCEPTIONS
-    } catch (const my_err&) {
-      new_threw = true;
-    }
-#endif
-
-#if DEBUG_LOG
-    // Print toggle values for debugging
-    std::string toggle_info = "Toggles: ";
-    for (int i = 0; i <= test_toggles::largest; ++i) {
-      if (driver.toggles_[i]) {
-        toggle_info += std::to_string(i) + " ";
-      }
-    }
-    toggle_info += "\n";
-    std::cerr << toggle_info.c_str() << std::endl;
-#endif
-
-    assert(old_threw == new_threw);
-    assert(old_res == new_res);
-
-    // Compare events arrays directly using cur_event_ and indices
-    assert(old_driver.cur_event_ == new_driver.cur_event_);
-    for (int i = 0; i < old_driver.cur_event_; ++i) {
-      assert(old_driver.events_[i] == new_driver.events_[i]);
-    }
-  });
-}
-
-// Move-only, non-nullable type that quacks like int but stores a
-// heap-allocated int. Used to exercise the machinery with a nontrivial type.
-class heap_int {
-private:
-  std::unique_ptr<int> ptr_;
-
-public:
-  explicit heap_int(int value) : ptr_(std::make_unique<int>(value)) {}
-
-  heap_int operator+(const heap_int& other) const { return heap_int(*ptr_ + *other.ptr_); }
-
-  bool operator==(const heap_int& other) const { return *ptr_ == *other.ptr_; }
-
-  /*implicit*/ operator int() const { return *ptr_; }
-};
-
-void check_coro_with_driver(auto coro_fn) {
-  check_coro_with_driver_for<int>(coro_fn);
-  check_coro_with_driver_for<heap_int>(coro_fn);
-}
-
-template <typename Awaitable, typename T>
-std::optional<T> coro_shortcircuits_to_empty(test_driver& driver) {
-  T n = co_await Awaitable{driver, 1, std::optional<T>{11}};
-  co_await Awaitable{driver, 2, std::optional<T>{}}; // return early!
-  co_return n + co_await Awaitable{driver, 3, std::optional<T>{22}};
-}
-
-void test_coro_shortcircuits_to_empty() {
-#if DEBUG_LOG
-  std::cerr << "test_coro_shortcircuits_to_empty" << std::endl;
-#endif
-  check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) {
-    return coro_shortcircuits_to_empty<Awaitable, T>(driver);
-  });
-}
-
-template <typename Awaitable, typename T>
-std::optional<T> coro_simple_await(test_driver& driver) {
-  co_return co_await Awaitable{driver, 1, std::optional<T>{11}} +
-      co_await Awaitable{driver, 2, driver.toggles(dynamic_short_circuit) ? std::optional<T>{} : std::optional<T>{22}};
-}
-
-void test_coro_simple_await() {
-#if DEBUG_LOG
-  std::cerr << "test_coro_simple_await" << std::endl;
-#endif
-  check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) {
-    return coro_simple_await<Awaitable, T>(driver);
-  });
-}
-
-// The next pair of tests checks that adding a `try-catch` in the coroutine
-// doesn't affect control flow when `await_suspend_destroy` awaiters are in use.
-
-template <typename Awaitable, typename T>
-std::optional<T> coro_catching_shortcircuits_to_empty(test_driver& driver) {
-#ifndef TEST_HAS_NO_EXCEPTIONS
-  try {
-#endif
-    T n = co_await Awaitable{driver, 1, std::optional<T>{11}};
-    co_await Awaitable{driver, 2, std::optional<T>{}}; // return early!
-    co_return n + co_await Awaitable{driver, 3, std::optional<T>{22}};
-#ifndef TEST_HAS_NO_EXCEPTIONS
-  } catch (...) {
-    driver.log(test_event::coro_catch);
-    throw;
-  }
-#endif
-}
-
-void test_coro_catching_shortcircuits_to_empty() {
-#if DEBUG_LOG
-  std::cerr << "test_coro_catching_shortcircuits_to_empty" << std::endl;
-#endif
-  check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) {
-    return coro_catching_shortcircuits_to_empty<Awaitable, T>(driver);
-  });
-}
-
-template <typename Awaitable, typename T>
-std::optional<T> coro_catching_simple_await(test_driver& driver) {
-#ifndef TEST_HAS_NO_EXCEPTIONS
-  try {
-#endif
-    co_return co_await Awaitable{driver, 1, std::optional<T>{11}} +
-        co_await Awaitable{
-            driver, 2, driver.toggles(dynamic_short_circuit) ? std::optional<T>{} : std::optional<T>{22}};
-#ifndef TEST_HAS_NO_EXCEPTIONS
-  } catch (...) {
-    driver.log(test_event::coro_catch);
-    throw;
-  }
-#endif
-}
-
-void test_coro_catching_simple_await() {
-#if DEBUG_LOG
-  std::cerr << "test_coro_catching_simple_await" << std::endl;
-#endif
-  check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) {
-    return coro_catching_simple_await<Awaitable, T>(driver);
-  });
-}
-
-// The next pair of tests shows that the `await_suspend_destroy` code path works
-// correctly, even if it's mixed in a coroutine with legacy awaitables.
-
-template <typename Awaitable, typename T>
-std::optional<T> noneliding_coro_shortcircuits_to_empty(test_driver& driver) {
-  T n  = co_await Awaitable{driver, 1, std::optional<T>{11}};
-  T n2 = co_await old_optional_awaitable<T>{driver, 2, std::optional<T>{22}};
-  co_await Awaitable{driver, 3, std::optional<T>{}}; // return early!
-  co_return n + n2 + co_await Awaitable{driver, 4, std::optional<T>{44}};
-}
-
-void test_noneliding_coro_shortcircuits_to_empty() {
-#if DEBUG_LOG
-  std::cerr << "test_noneliding_coro_shortcircuits_to_empty" << std::endl;
-#endif
-  check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) {
-    return noneliding_coro_shortcircuits_to_empty<Awaitable, T>(driver);
-  });
-}
-
-template <typename Awaitable, typename T>
-std::optional<T> noneliding_coro_simple_await(test_driver& driver) {
-  co_return co_await Awaitable{driver, 1, std::optional<T>{11}} +
-      co_await Awaitable{driver, 2, driver.toggles(dynamic_short_circuit) ? std::optional<T>{} : std::optional<T>{22}} +
-      co_await old_optional_awaitable<T>{driver, 3, std::optional<T>{33}};
-}
-
-void test_noneliding_coro_simple_await() {
-#if DEBUG_LOG
-  std::cerr << "test_noneliding_coro_simple_await" << std::endl;
-#endif
-  check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) {
-    return noneliding_coro_simple_await<Awaitable, T>(driver);
-  });
-}
-
-// Test nested coroutines (coroutines that await other coroutines)
-
-template <typename Awaitable, typename T>
-std::optional<T> inner_coro(test_driver& driver, int base_id) {
-  co_return co_await Awaitable{driver, base_id, std::optional<T>{100}} +
-      co_await Awaitable{
-          driver, base_id + 1, driver.toggles(dynamic_short_circuit) ? std::optional<T>{} : std::optional<T>{200}};
-}
-
-template <typename Awaitable, typename T>
-std::optional<T> outer_coro(test_driver& driver) {
-  T result1 = co_await Awaitable{driver, 1, inner_coro<Awaitable, T>(driver, 10)};
-  T result2 = co_await Awaitable{driver, 2, inner_coro<Awaitable, T>(driver, 20)};
-  co_return result1 + result2;
-}
-
-void test_nested_coroutines() {
-#if DEBUG_LOG
-  std::cerr << "test_nested_coroutines" << std::endl;
-#endif
-  check_coro_with_driver([]<typename Awaitable, typename T>(test_driver& driver) {
-    return outer_coro<Awaitable, T>(driver);
-  });
-}
-
-int main(int, char**) {
-  test_coro_shortcircuits_to_empty();
-  test_coro_simple_await();
-  test_coro_catching_shortcircuits_to_empty();
-  test_coro_catching_simple_await();
-  test_noneliding_coro_shortcircuits_to_empty();
-  test_noneliding_coro_simple_await();
-  test_nested_coroutines();
-  return 0;
-}

>From cf26f6b88e0d28862c7ce2ddf65954ef03f64532 Mon Sep 17 00:00:00 2001
From: Alexey <snarkmaster at gmail.com>
Date: Fri, 8 Aug 2025 19:52:17 -0700
Subject: [PATCH 12/15] Lift standard suspend flow to emitStandardAwaitSuspend;
 tweak comment.

---
 clang/lib/CodeGen/CGCoroutine.cpp | 174 ++++++++++++++++--------------
 1 file changed, 96 insertions(+), 78 deletions(-)

diff --git a/clang/lib/CodeGen/CGCoroutine.cpp b/clang/lib/CodeGen/CGCoroutine.cpp
index d74bef592aa9c..883a45d2acfff 100644
--- a/clang/lib/CodeGen/CGCoroutine.cpp
+++ b/clang/lib/CodeGen/CGCoroutine.cpp
@@ -282,6 +282,15 @@ namespace {
 }
 
 // The simplified `await_suspend_destroy` path avoids suspend intrinsics.
+//
+// If a coro has only `await_suspend_destroy` and trivial (`suspend_never`)
+// awaiters, then subsequent passes are able to allocate its frame on-stack.
+//
+// As of 2025, there is still an optimization gap between a realistic
+// short-circuiting coro, and the equivalent plain function.  For a
+// guesstimate, expect 4-5ns per call on x86.  One idea for improvement is to
+// also elide trivial suspends like `std::suspend_never`, in order to hit the
+// `HasCoroSuspend` path in `CoroEarly.cpp`.
 static void emitAwaitSuspendDestroy(CodeGenFunction &CGF, CGCoroData &Coro,
                                     llvm::Function *SuspendWrapper,
                                     llvm::Value *Awaiter, llvm::Value *Frame,
@@ -299,6 +308,89 @@ static void emitAwaitSuspendDestroy(CodeGenFunction &CGF, CGCoroData &Coro,
   CGF.EmitBranchThroughCleanup(Coro.CleanupJD);
 }
 
+static void emitStandardAwaitSuspend(
+    CodeGenFunction &CGF, CGCoroData &Coro, CoroutineSuspendExpr const &S,
+    llvm::Function *SuspendWrapper, llvm::Value *Awaiter, llvm::Value *Frame,
+    bool AwaitSuspendCanThrow, SmallString<32> Prefix, BasicBlock *ReadyBlock,
+    AwaitKind Kind, CoroutineSuspendExpr::SuspendReturnType SuspendReturnType) {
+  auto &Builder = CGF.Builder;
+
+  CGF.CurCoro.InSuspendBlock = true;
+
+  SmallVector<llvm::Value *, 3> SuspendIntrinsicCallArgs;
+  SuspendIntrinsicCallArgs.push_back(Awaiter);
+  SuspendIntrinsicCallArgs.push_back(Frame);
+  SuspendIntrinsicCallArgs.push_back(SuspendWrapper);
+  BasicBlock *CleanupBlock = CGF.createBasicBlock(Prefix + Twine(".cleanup"));
+
+  llvm::Function *CoroSave = CGF.CGM.getIntrinsic(llvm::Intrinsic::coro_save);
+  auto *NullPtr = llvm::ConstantPointerNull::get(CGF.CGM.Int8PtrTy);
+  auto *SaveCall = Builder.CreateCall(CoroSave, {NullPtr});
+
+  llvm::Intrinsic::ID AwaitSuspendIID;
+  switch (SuspendReturnType) {
+  case CoroutineSuspendExpr::SuspendReturnType::SuspendVoid:
+    AwaitSuspendIID = llvm::Intrinsic::coro_await_suspend_void;
+    break;
+  case CoroutineSuspendExpr::SuspendReturnType::SuspendBool:
+    AwaitSuspendIID = llvm::Intrinsic::coro_await_suspend_bool;
+    break;
+  case CoroutineSuspendExpr::SuspendReturnType::SuspendHandle:
+    AwaitSuspendIID = llvm::Intrinsic::coro_await_suspend_handle;
+    break;
+  }
+
+  llvm::Function *AwaitSuspendIntrinsic = CGF.CGM.getIntrinsic(AwaitSuspendIID);
+
+  llvm::CallBase *SuspendRet = nullptr;
+  // FIXME: add call attributes?
+  if (AwaitSuspendCanThrow)
+    SuspendRet =
+        CGF.EmitCallOrInvoke(AwaitSuspendIntrinsic, SuspendIntrinsicCallArgs);
+  else
+    SuspendRet = CGF.EmitNounwindRuntimeCall(AwaitSuspendIntrinsic,
+                                             SuspendIntrinsicCallArgs);
+
+  assert(SuspendRet);
+  CGF.CurCoro.InSuspendBlock = false;
+
+  switch (SuspendReturnType) {
+  case CoroutineSuspendExpr::SuspendReturnType::SuspendVoid:
+    assert(SuspendRet->getType()->isVoidTy());
+    break;
+  case CoroutineSuspendExpr::SuspendReturnType::SuspendBool: {
+    assert(SuspendRet->getType()->isIntegerTy());
+
+    // Veto suspension if requested by bool returning await_suspend.
+    BasicBlock *RealSuspendBlock =
+        CGF.createBasicBlock(Prefix + Twine(".suspend.bool"));
+    CGF.Builder.CreateCondBr(SuspendRet, RealSuspendBlock, ReadyBlock);
+    CGF.EmitBlock(RealSuspendBlock);
+    break;
+  }
+  case CoroutineSuspendExpr::SuspendReturnType::SuspendHandle: {
+    assert(SuspendRet->getType()->isVoidTy());
+    break;
+  }
+  }
+
+  // Emit the suspend point.
+  const bool IsFinalSuspend = (Kind == AwaitKind::Final);
+  llvm::Function *CoroSuspend =
+      CGF.CGM.getIntrinsic(llvm::Intrinsic::coro_suspend);
+  auto *SuspendResult = Builder.CreateCall(
+      CoroSuspend, {SaveCall, Builder.getInt1(IsFinalSuspend)});
+
+  // Create a switch capturing three possible continuations.
+  auto *Switch = Builder.CreateSwitch(SuspendResult, Coro.SuspendBB, 2);
+  Switch->addCase(Builder.getInt8(0), ReadyBlock);
+  Switch->addCase(Builder.getInt8(1), CleanupBlock);
+
+  // Emit cleanup for this suspend point.
+  CGF.EmitBlock(CleanupBlock);
+  CGF.EmitBranchThroughCleanup(Coro.CleanupJD);
+}
+
 static LValueOrRValue emitSuspendExpression(CodeGenFunction &CGF, CGCoroData &Coro,
                                     CoroutineSuspendExpr const &S,
                                     AwaitKind Kind, AggValueSlot aggSlot,
@@ -320,8 +412,6 @@ static LValueOrRValue emitSuspendExpression(CodeGenFunction &CGF, CGCoroData &Co
   // Otherwise, emit suspend logic.
   CGF.EmitBlock(SuspendBlock);
 
-  auto &Builder = CGF.Builder;
-
   auto SuspendWrapper = CodeGenFunction(CGF.CGM).generateAwaitSuspendWrapper(
       CGF.CurFn->getName(), Prefix, S);
 
@@ -343,82 +433,9 @@ static LValueOrRValue emitSuspendExpression(CodeGenFunction &CGF, CGCoroData &Co
     emitAwaitSuspendDestroy(CGF, Coro, SuspendWrapper, Awaiter, Frame,
                             AwaitSuspendCanThrow);
   } else { // Normal suspend path -- can actually suspend, uses intrinsics
-    CGF.CurCoro.InSuspendBlock = true;
-
-    SmallVector<llvm::Value *, 3> SuspendIntrinsicCallArgs;
-    SuspendIntrinsicCallArgs.push_back(Awaiter);
-    SuspendIntrinsicCallArgs.push_back(Frame);
-    SuspendIntrinsicCallArgs.push_back(SuspendWrapper);
-    BasicBlock *CleanupBlock = CGF.createBasicBlock(Prefix + Twine(".cleanup"));
-
-    llvm::Function *CoroSave = CGF.CGM.getIntrinsic(llvm::Intrinsic::coro_save);
-    auto *NullPtr = llvm::ConstantPointerNull::get(CGF.CGM.Int8PtrTy);
-    auto *SaveCall = Builder.CreateCall(CoroSave, {NullPtr});
-
-    llvm::Intrinsic::ID AwaitSuspendIID;
-
-    switch (SuspendReturnType) {
-    case CoroutineSuspendExpr::SuspendReturnType::SuspendVoid:
-      AwaitSuspendIID = llvm::Intrinsic::coro_await_suspend_void;
-      break;
-    case CoroutineSuspendExpr::SuspendReturnType::SuspendBool:
-      AwaitSuspendIID = llvm::Intrinsic::coro_await_suspend_bool;
-      break;
-    case CoroutineSuspendExpr::SuspendReturnType::SuspendHandle:
-      AwaitSuspendIID = llvm::Intrinsic::coro_await_suspend_handle;
-      break;
-    }
-
-    llvm::Function *AwaitSuspendIntrinsic =
-        CGF.CGM.getIntrinsic(AwaitSuspendIID);
-
-    llvm::CallBase *SuspendRet = nullptr;
-    // FIXME: add call attributes?
-    if (AwaitSuspendCanThrow)
-      SuspendRet =
-          CGF.EmitCallOrInvoke(AwaitSuspendIntrinsic, SuspendIntrinsicCallArgs);
-    else
-      SuspendRet = CGF.EmitNounwindRuntimeCall(AwaitSuspendIntrinsic,
-                                               SuspendIntrinsicCallArgs);
-
-    assert(SuspendRet);
-    CGF.CurCoro.InSuspendBlock = false;
-
-    switch (SuspendReturnType) {
-    case CoroutineSuspendExpr::SuspendReturnType::SuspendVoid:
-      assert(SuspendRet->getType()->isVoidTy());
-      break;
-    case CoroutineSuspendExpr::SuspendReturnType::SuspendBool: {
-      assert(SuspendRet->getType()->isIntegerTy());
-
-      // Veto suspension if requested by bool returning await_suspend.
-      BasicBlock *RealSuspendBlock =
-          CGF.createBasicBlock(Prefix + Twine(".suspend.bool"));
-      CGF.Builder.CreateCondBr(SuspendRet, RealSuspendBlock, ReadyBlock);
-      CGF.EmitBlock(RealSuspendBlock);
-      break;
-    }
-    case CoroutineSuspendExpr::SuspendReturnType::SuspendHandle: {
-      assert(SuspendRet->getType()->isVoidTy());
-      break;
-    }
-    }
-
-    // Emit the suspend point.
-    const bool IsFinalSuspend = (Kind == AwaitKind::Final);
-    llvm::Function *CoroSuspend =
-        CGF.CGM.getIntrinsic(llvm::Intrinsic::coro_suspend);
-    auto *SuspendResult = Builder.CreateCall(
-        CoroSuspend, {SaveCall, Builder.getInt1(IsFinalSuspend)});
-
-    // Create a switch capturing three possible continuations.
-    auto *Switch = Builder.CreateSwitch(SuspendResult, Coro.SuspendBB, 2);
-    Switch->addCase(Builder.getInt8(0), ReadyBlock);
-    Switch->addCase(Builder.getInt8(1), CleanupBlock);
-
-    // Emit cleanup for this suspend point.
-    CGF.EmitBlock(CleanupBlock);
-    CGF.EmitBranchThroughCleanup(Coro.CleanupJD);
+    emitStandardAwaitSuspend(CGF, Coro, S, SuspendWrapper, Awaiter, Frame,
+                             AwaitSuspendCanThrow, Prefix, ReadyBlock, Kind,
+                             SuspendReturnType);
   }
 
   // Emit await_resume expression.
@@ -429,6 +446,7 @@ static LValueOrRValue emitSuspendExpression(CodeGenFunction &CGF, CGCoroData &Co
   CXXTryStmt *TryStmt = nullptr;
   if (Coro.ExceptionHandler && Kind == AwaitKind::Init &&
       StmtCanThrow(S.getResumeExpr())) {
+    auto &Builder = CGF.Builder;
     Coro.ResumeEHVar =
         CGF.CreateTempAlloca(Builder.getInt1Ty(), Prefix + Twine("resume.eh"));
     Builder.CreateFlagStore(true, Coro.ResumeEHVar);

>From f1e885c45837b0e1bd3925fb69e16c6189ebbb80 Mon Sep 17 00:00:00 2001
From: Alexey <snarkmaster at gmail.com>
Date: Sun, 17 Aug 2025 11:41:17 -0700
Subject: [PATCH 13/15] Improvements in response to comments

- Switch back to a member function attribute so that each overload can independently decide whether it uses `await_suspend_destroy()`
- Emit a diagnostic when `await_suspend` and `await_suspend_destroy` have mismatched types
- Add UseAwaitSuspendDestroy bit to CoroutineSuspendExpr
- Set UseAwaitSuspendDestroy bit from SemaCoroutine
- Use UseAwaitSuspendDestroy in CGCoroutine
- Move `lit` tests for diagnostics for the new attribute into `test/SemaCXX/`
- Add new IR test for issue #148380
- Improve main IR test per feedback.
- Improve main IR test to cover "multiple overloads in one awaiter"
- Improve AttrDocs per feedback
- clang-format
---
 clang/docs/ReleaseNotes.rst                   |  12 +-
 clang/include/clang/AST/ExprCXX.h             |  11 +
 clang/include/clang/AST/Stmt.h                |  16 +-
 clang/include/clang/Basic/Attr.td             |   2 +-
 clang/include/clang/Basic/AttrDocs.td         | 124 +++---
 .../clang/Basic/DiagnosticSemaKinds.td        |   3 +
 clang/lib/CodeGen/CGCoroutine.cpp             |  70 +---
 clang/lib/Sema/SemaCoroutine.cpp              |  98 +++--
 clang/lib/Serialization/ASTReaderStmt.cpp     |   2 +
 clang/lib/Serialization/ASTWriterStmt.cpp     |   1 +
 .../coro-await-suspend-destroy-errors.cpp     |  55 ---
 .../coro-await-suspend-destroy.cpp            | 354 +++++++++++++-----
 clang/test/CodeGenCoroutines/issue148380.cpp  |  42 +++
 ...a-attribute-supported-attributes-list.test |   2 +-
 .../coro-await-suspend-destroy-errors.cpp     |  61 +++
 15 files changed, 535 insertions(+), 318 deletions(-)
 delete mode 100644 clang/test/CodeGenCoroutines/coro-await-suspend-destroy-errors.cpp
 create mode 100644 clang/test/CodeGenCoroutines/issue148380.cpp
 create mode 100644 clang/test/SemaCXX/coro-await-suspend-destroy-errors.cpp

diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 0de047c10fdb1..2da187c6be627 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -154,10 +154,12 @@ Attribute Changes in Clang
 --------------------------
 
 - Introduced a new attribute ``[[clang::coro_await_suspend_destroy]]``.  When
-  applied to a coroutine awaiter class, it causes suspensions into this awaiter
-  to use a new `await_suspend_destroy(Promise&)` method instead of the standard
-  `await_suspend(std::coroutine_handle<...>)`.  The coroutine is then destroyed.
-  This improves code speed & size for "short-circuiting" coroutines.
+  applied to an ``await_suspend(std::coroutine_handle<Promise>)`` member of a
+  coroutine awaiter, it causes suspensions into this awaiter to use a new
+  ``await_suspend_destroy(Promise&)`` method.  The coroutine is then immediately
+  destroyed.  This flow bypasses the original ``await_suspend()`` (though it
+  must contain a compatibility stub), and omits suspend intrinsics.  The net
+  effect is improved code speed & size for "short-circuiting" coroutines.
 
 Improvements to Clang's diagnostics
 -----------------------------------
@@ -179,7 +181,7 @@ Improvements to Clang's diagnostics
   "format specifies type 'unsigned int' but the argument has type 'int', which differs in signedness [-Wformat-signedness]"
   "signedness of format specifier 'u' is incompatible with 'c' [-Wformat-signedness]"
   and the API-visible diagnostic id will be appropriate.
-  
+
 - Fixed false positives in ``-Waddress-of-packed-member`` diagnostics when
   potential misaligned members get processed before they can get discarded.
   (#GH144729)
diff --git a/clang/include/clang/AST/ExprCXX.h b/clang/include/clang/AST/ExprCXX.h
index 9fedb230ce397..de17f6f8c1f56 100644
--- a/clang/include/clang/AST/ExprCXX.h
+++ b/clang/include/clang/AST/ExprCXX.h
@@ -5266,6 +5266,7 @@ class CoroutineSuspendExpr : public Expr {
       : Expr(SC, Resume->getType(), Resume->getValueKind(),
              Resume->getObjectKind()),
         KeywordLoc(KeywordLoc), OpaqueValue(OpaqueValue) {
+    CoroutineSuspendExprBits.UseAwaitSuspendDestroy = false;
     SubExprs[SubExpr::Operand] = Operand;
     SubExprs[SubExpr::Common] = Common;
     SubExprs[SubExpr::Ready] = Ready;
@@ -5279,6 +5280,7 @@ class CoroutineSuspendExpr : public Expr {
       : Expr(SC, Ty, VK_PRValue, OK_Ordinary), KeywordLoc(KeywordLoc) {
     assert(Common->isTypeDependent() && Ty->isDependentType() &&
            "wrong constructor for non-dependent co_await/co_yield expression");
+    CoroutineSuspendExprBits.UseAwaitSuspendDestroy = false;
     SubExprs[SubExpr::Operand] = Operand;
     SubExprs[SubExpr::Common] = Common;
     SubExprs[SubExpr::Ready] = nullptr;
@@ -5288,6 +5290,7 @@ class CoroutineSuspendExpr : public Expr {
   }
 
   CoroutineSuspendExpr(StmtClass SC, EmptyShell Empty) : Expr(SC, Empty) {
+    CoroutineSuspendExprBits.UseAwaitSuspendDestroy = false;
     SubExprs[SubExpr::Operand] = nullptr;
     SubExprs[SubExpr::Common] = nullptr;
     SubExprs[SubExpr::Ready] = nullptr;
@@ -5295,6 +5298,14 @@ class CoroutineSuspendExpr : public Expr {
     SubExprs[SubExpr::Resume] = nullptr;
   }
 
+  bool useAwaitSuspendDestroy() const {
+    return CoroutineSuspendExprBits.UseAwaitSuspendDestroy;
+  }
+
+  void setUseAwaitSuspendDestroy(bool Use) {
+    CoroutineSuspendExprBits.UseAwaitSuspendDestroy = Use;
+  }
+
   Expr *getCommonExpr() const {
     return static_cast<Expr*>(SubExprs[SubExpr::Common]);
   }
diff --git a/clang/include/clang/AST/Stmt.h b/clang/include/clang/AST/Stmt.h
index a5b0d5053003f..27cbfbfa3d319 100644
--- a/clang/include/clang/AST/Stmt.h
+++ b/clang/include/clang/AST/Stmt.h
@@ -1258,12 +1258,23 @@ class alignas(void *) Stmt {
 
   //===--- C++ Coroutines bitfields classes ---===//
 
-  class CoawaitExprBitfields {
-    friend class CoawaitExpr;
+  class CoroutineSuspendExprBitfields {
+    friend class CoroutineSuspendExpr;
 
     LLVM_PREFERRED_TYPE(ExprBitfields)
     unsigned : NumExprBits;
 
+    LLVM_PREFERRED_TYPE(bool)
+    unsigned UseAwaitSuspendDestroy : 1;
+  };
+  enum { NumCoroutineSuspendExprBits = NumExprBits + 1 };
+
+  class CoawaitExprBitfields {
+    friend class CoawaitExpr;
+
+    LLVM_PREFERRED_TYPE(CoroutineSuspendExprBitfields)
+    unsigned : NumCoroutineSuspendExprBits;
+
     LLVM_PREFERRED_TYPE(bool)
     unsigned IsImplicit : 1;
   };
@@ -1388,6 +1399,7 @@ class alignas(void *) Stmt {
     PackIndexingExprBitfields PackIndexingExprBits;
 
     // C++ Coroutines expressions
+    CoroutineSuspendExprBitfields CoroutineSuspendExprBits;
     CoawaitExprBitfields CoawaitBits;
 
     // Obj-C Expressions
diff --git a/clang/include/clang/Basic/Attr.td b/clang/include/clang/Basic/Attr.td
index 646a101459f86..e4cedd5e55784 100644
--- a/clang/include/clang/Basic/Attr.td
+++ b/clang/include/clang/Basic/Attr.td
@@ -1354,7 +1354,7 @@ def CoroAwaitElidableArgument : InheritableAttr {
 
 def CoroAwaitSuspendDestroy: InheritableAttr {
   let Spellings = [Clang<"coro_await_suspend_destroy">];
-  let Subjects = SubjectList<[CXXRecord]>;
+  let Subjects = SubjectList<[CXXMethod]>;
   let LangOpts = [CPlusPlus];
   let Documentation = [CoroAwaitSuspendDestroyDoc];
   let SimpleHandler = 1;
diff --git a/clang/include/clang/Basic/AttrDocs.td b/clang/include/clang/Basic/AttrDocs.td
index 4c00bc31e9efd..19ad3e452fed4 100644
--- a/clang/include/clang/Basic/AttrDocs.td
+++ b/clang/include/clang/Basic/AttrDocs.td
@@ -9364,101 +9364,109 @@ Example:
 }
 
 def CoroAwaitSuspendDestroyDoc : Documentation {
-  let Category = DocCatDecl;
+  let Category = DocCatFunction;
   let Content = [{
 
-The ``[[clang::coro_await_suspend_destroy]]`` attribute may be applied to a C++
-coroutine awaiter type.  When this attribute is present, the awaiter must
-implement ``void await_suspend_destroy(Promise&)``.  If ``await_ready()``
-returns ``false`` at a suspension point, ``await_suspend_destroy`` will be
-called directly.  The coroutine being suspended will then be immediately
-destroyed.
+The ``[[clang::coro_await_suspend_destroy]]`` attribute applies to an
+``await_suspend(std::coroutine_handle<Promise>)`` member function of a
+coroutine awaiter.  When applied, suspensions into the awaiter use an optimized
+call path that bypasses standard suspend intrinsics, and immediately destroys
+the suspending coro.
 
-The new behavior is equivalent to this standard code:
+Each annotated ``await_suspend`` member must contain a compatibility stub:
 
 .. code-block:: c++
 
-  void await_suspend_destroy(YourPromise&) { ... }
-  void await_suspend(auto handle) {
+  [[clang::coro_await_suspend_destroy]]
+  void await_suspend(std::coroutine_handle<Promise> handle) {
     await_suspend_destroy(handle.promise());
     handle.destroy();
   }
 
-This enables `await_suspend_destroy()` usage in portable awaiters — just add a
-stub ``await_suspend()`` as above.  Without ``coro_await_suspend_destroy``
-support, the awaiter will behave nearly identically, with the only difference
-being heap allocation instead of stack allocation for the coroutine frame.
+An awaiter type may provide both annotated and non-annotated overloads of
+``await_suspend()``, as long as each invocation of an annotated overload has a
+corresponding ``await_suspend_destroy(Promise&)`` overload.
+
+Instead of calling the annotated ``await_suspend()``, the coroutine calls
+``await_suspend_destroy(Promise&)`` and then immediately destroys itself.
+``await_suspend_destroy()`` must return ``void``.  (Note: if desired, it
+would be straightforward to also support the "symmetric transfer"
+``std::coroutine_handle`` return type.)
+
+This optimization improves code speed and size for "short-circuiting"
+coroutines — those that use coroutine syntax **exclusively** for early returns
+and control flow rather than true asynchronous operations.
+
+Specifically, a short-circuiting awaiter is one that either proceeds
+immediately (``await_ready()`` returns ``true``, skipping to
+``await_resume()``) or terminates the coroutine execution.
 
-This attribute helps optimize short-circuiting coroutines.
+Then, a short-circuiting coroutine is one where **all** the awaiters (including
+``co_await``, ``co_yield``, initial, and final suspend) are short-circuiting.
 
-A short-circuiting coroutine is one where every ``co_await`` or ``co_yield``
-either immediately produces a value, or exits the coroutine.  In other words,
-they use coroutine syntax to concisely branch out of a synchronous function. 
-Here are close analogs in other languages:
+The short-circuiting coroutine concept introduced above has close analogs in
+other languages:
 
 - Rust has ``Result<T>`` and a ``?`` operator to unpack it, while
-  ``folly::result<T>`` is a C++ short-circuiting coroutine, with ``co_await``
-  acting just like ``?``.
+  ``folly::result<T>`` is a C++ short-circuiting coroutine, within which
+  ``co_await or_unwind(someResult())`` acts just like ``someResult()?``.
 
 - Haskell has ``Maybe`` & ``Error`` monads.  A short-circuiting ``co_await``
   loosely corresponds to the monadic ``>>=``, whereas a short-circuiting
   ``std::optional`` coro would be an exact analog of ``Maybe``.
 
-The C++ implementation relies on short-circuiting awaiters.  These either
-resume synchronously, or immediately destroy the awaiting coroutine and return
-control to the parent:
+Returning to C++, even non-short-circuiting coroutines, including asynchronous
+ones that suspend, may contain short-circuiting awaiters, and those might still
+see some performance benefit if annotated.
 
-.. code-block:: c++
+Marking your ``await_suspend_destroy`` as ``noexcept`` can sometimes further
+improve optimization.
 
-  T val;
-  if (awaiter.await_ready()) {
-    val = awaiter.await_resume();
-  } else {
-    awaiter.await_suspend();
-    return /* value representing the "execution short-circuited" outcome */;
-  }
+However, if **all** awaiters within a coroutine are short-circuiting, then the
+coro frame **can reliably be allocated on-stack**, making short-circuiting
+coros behave qualitatively more like plain functions -- with better
+optimization & more predictable behavior under memory pressure.
 
-Then, a short-ciruiting coroutine is one where all the suspend points are
-either (i) trivial (like ``std::suspend_never``), or (ii) short-circuiting.
+Technical aside: Heap elision becomes reliable because LLVM is allowed to elide
+heap allocations whenever it can prove that the handle doesn't "escape" from
+the coroutine.  User code can only access the handle via suspend intrinsics,
+and annotated short-circuiting awaiters simply don't use any.
 
-Although the coroutine machinery makes them harder to optimize, logically,
-short-circuiting coroutines are like syntax sugar for regular functions where:
+Note that a short-circuiting coroutine differs in one important way from a
+function that replaces each ``co_await awaiter`` with explicit control flow:
 
-- `co_await` allows expressions to return early.
-
-- `unhandled_exception()` lets the coroutine promise type wrap the function
-  body in an implicit try-catch.  This mandatory exception boundary behavior
-  can be desirable in robust, return-value-oriented programs that benefit from
-  short-circuiting coroutines.  If not, the promise can always re-throw.
-
-This attribute improves short-circuiting coroutines in a few ways:
-
-- **Avoid heap allocations for coro frames**: Allocating short-circuiting
-  coros on the stack makes code more predictable under memory pressure.
-  Without this attribute, LLVM cannot elide heap allocation even when all
-  awaiters are short-circuiting.
-
-- **Performance**: Significantly faster execution and smaller code size.
+.. code-block:: c++
 
-- **Build time**: Faster compilation due to less IR being generated.
+  T value;
+  if (awaiter.await_ready()) {
+    value = awaiter.await_resume();
+  } else {
+    // ... content of `await_suspend_destroy` ...
+    return /* early-termination return object */;
+  }
 
-Marking your ``await_suspend_destroy`` method as ``noexcept`` can sometimes
-further improve optimization.
+The key difference is that ``unhandled_exception()`` lets the promise type wrap
+the function body in an implicit try-catch.  This automatic exception boundary
+behavior can be desirable in robust, return-value-oriented programs that
+benefit from short-circuiting coroutines.  If not, the promise can re-throw.
 
-Here is a toy example of a portable short-circuiting awaiter:
+Here is an example of a short-circuiting awaiter for a hypothetical
+``std::optional`` coroutine:
 
 .. code-block:: c++
 
   template <typename T>
-  struct [[clang::coro_await_suspend_destroy]] optional_awaiter {
+  struct optional_awaiter {
     std::optional<T> opt_;
     bool await_ready() const noexcept { return opt_.has_value(); }
     T await_resume() { return std::move(opt_).value(); }
     void await_suspend_destroy(auto& promise) {
-      // Assume the return object of the outer coro defaults to "empty".
+      // The return object of `promise`'s coro should default to "empty".
+      assert(!promise.returned_optional_ptr_->has_value());
     }
-    // Fallback for when `coro_await_suspend_destroy` is unavailable.
+    [[clang::coro_await_suspend_destroy]]
     void await_suspend(auto handle) {
+      // Fallback for when `coro_await_suspend_destroy` is unavailable.
       await_suspend_destroy(handle.promise());
       handle.destroy();
     }
diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index 6479b2c732917..8b2c187313765 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -12510,6 +12510,9 @@ def err_await_suspend_invalid_return_type : Error<
 def err_await_suspend_destroy_invalid_return_type : Error<
   "return type of 'await_suspend_destroy' is required to be 'void' (have %0)"
 >;
+def err_await_suspend_suspend_destroy_return_type_mismatch : Error<
+  "return type of 'await_suspend' (%1) must match return type of 'await_suspend_destroy' (%0)"
+>;
 def note_await_ready_no_bool_conversion : Note<
   "return type of 'await_ready' is required to be contextually convertible to 'bool'"
 >;
diff --git a/clang/lib/CodeGen/CGCoroutine.cpp b/clang/lib/CodeGen/CGCoroutine.cpp
index 883a45d2acfff..ab972dda1c7c4 100644
--- a/clang/lib/CodeGen/CGCoroutine.cpp
+++ b/clang/lib/CodeGen/CGCoroutine.cpp
@@ -174,66 +174,6 @@ static bool StmtCanThrow(const Stmt *S) {
   return false;
 }
 
-// Check if this suspend should be calling `await_suspend_destroy`
-static bool useCoroAwaitSuspendDestroy(const CoroutineSuspendExpr &S) {
-  // This can only be an `await_suspend_destroy` suspend expression if it
-  // returns void -- `buildCoawaitCalls` in `SemaCoroutine.cpp` asserts this.
-  // Moreover, when `await_suspend` returns a handle, the outermost method call
-  // is `.address()` -- making it harder to get the actual class or method.
-  if (S.getSuspendReturnType() !=
-      CoroutineSuspendExpr::SuspendReturnType::SuspendVoid) {
-    return false;
-  }
-
-  // `CGCoroutine.cpp` & `SemaCoroutine.cpp` must agree on whether this suspend
-  // expression uses `[[clang::coro_await_suspend_destroy]]`.
-  //
-  // Any mismatch is a serious bug -- we would either double-free, or fail to
-  // destroy the promise type. For this reason, we make our decision based on
-  // the method name, and fatal outside of the happy path -- including on
-  // failure to find a method name.
-  //
-  // As a debug-only check we also try to detect the `AwaiterClass`. This is
-  // secondary, because  detection of the awaiter type can be silently broken by
-  // small `buildCoawaitCalls` AST changes.
-  StringRef SuspendMethodName;           // Primary
-  CXXRecordDecl *AwaiterClass = nullptr; // Debug-only, best-effort
-  if (auto *SuspendCall =
-          dyn_cast<CallExpr>(S.getSuspendExpr()->IgnoreImplicit())) {
-    if (auto *SuspendMember = dyn_cast<MemberExpr>(SuspendCall->getCallee())) {
-      if (auto *BaseExpr = SuspendMember->getBase()) {
-        // `IgnoreImplicitAsWritten` is critical since `await_suspend...` can be
-        // invoked on the base of the actual awaiter, and the base need not have
-        // the attribute. In such cases, the AST will show the true awaiter
-        // being upcast to the base.
-        AwaiterClass = BaseExpr->IgnoreImplicitAsWritten()
-                           ->getType()
-                           ->getAsCXXRecordDecl();
-      }
-      if (auto *SuspendMethod =
-              dyn_cast<CXXMethodDecl>(SuspendMember->getMemberDecl())) {
-        SuspendMethodName = SuspendMethod->getName();
-      }
-    }
-  }
-  if (SuspendMethodName == "await_suspend_destroy") {
-    assert(!AwaiterClass ||
-           AwaiterClass->hasAttr<CoroAwaitSuspendDestroyAttr>());
-    return true;
-  } else if (SuspendMethodName == "await_suspend") {
-    assert(!AwaiterClass ||
-           !AwaiterClass->hasAttr<CoroAwaitSuspendDestroyAttr>());
-    return false;
-  } else {
-    llvm::report_fatal_error(
-        "Wrong method in [[clang::coro_await_suspend_destroy]] check: "
-        "expected 'await_suspend' or 'await_suspend_destroy', but got '" +
-        SuspendMethodName + "'");
-  }
-
-  return false;
-}
-
 // Emit suspend expression which roughly looks like:
 //
 //   auto && x = CommonExpr();
@@ -391,10 +331,10 @@ static void emitStandardAwaitSuspend(
   CGF.EmitBranchThroughCleanup(Coro.CleanupJD);
 }
 
-static LValueOrRValue emitSuspendExpression(CodeGenFunction &CGF, CGCoroData &Coro,
-                                    CoroutineSuspendExpr const &S,
-                                    AwaitKind Kind, AggValueSlot aggSlot,
-                                    bool ignoreResult, bool forLValue) {
+static LValueOrRValue
+emitSuspendExpression(CodeGenFunction &CGF, CGCoroData &Coro,
+                      CoroutineSuspendExpr const &S, AwaitKind Kind,
+                      AggValueSlot aggSlot, bool ignoreResult, bool forLValue) {
   auto *E = S.getCommonExpr();
 
   auto CommonBinder =
@@ -429,7 +369,7 @@ static LValueOrRValue emitSuspendExpression(CodeGenFunction &CGF, CGCoroData &Co
       CGF.getOrCreateOpaqueLValueMapping(S.getOpaqueValue()).getPointer(CGF);
   llvm::Value *Frame = CGF.CurCoro.Data->CoroBegin;
 
-  if (useCoroAwaitSuspendDestroy(S)) { // Call `await_suspend_destroy` & cleanup
+  if (S.useAwaitSuspendDestroy()) { // Call `await_suspend_destroy` & cleanup
     emitAwaitSuspendDestroy(CGF, Coro, SuspendWrapper, Awaiter, Frame,
                             AwaitSuspendCanThrow);
   } else { // Normal suspend path -- can actually suspend, uses intrinsics
diff --git a/clang/lib/Sema/SemaCoroutine.cpp b/clang/lib/Sema/SemaCoroutine.cpp
index 0f335f2b35279..21dcadcee6f06 100644
--- a/clang/lib/Sema/SemaCoroutine.cpp
+++ b/clang/lib/Sema/SemaCoroutine.cpp
@@ -313,21 +313,12 @@ static ExprResult buildPromiseRef(Sema &S, QualType PromiseType,
   return S.CreateBuiltinUnaryOp(Loc, UO_Deref, CastExpr.get());
 }
 
-static bool hasCoroAwaitSuspendDestroyAttr(Expr *Awaiter) {
-  QualType AwaiterType = Awaiter->getType();
-  if (auto *RD = AwaiterType->getAsCXXRecordDecl()) {
-    if (RD->hasAttr<CoroAwaitSuspendDestroyAttr>()) {
-      return true;
-    }
-  }
-  return false;
-}
-
 struct ReadySuspendResumeResult {
   enum AwaitCallType { ACT_Ready, ACT_Suspend, ACT_Resume };
   Expr *Results[3];
   OpaqueValueExpr *OpaqueValue;
   bool IsInvalid;
+  bool UseAwaitSuspendDestroy;
 };
 
 static ExprResult buildMemberCall(Sema &S, Expr *Base, SourceLocation Loc,
@@ -399,7 +390,8 @@ static ReadySuspendResumeResult buildCoawaitCalls(Sema &S, VarDecl *CoroPromise,
 
   // Assume valid until we see otherwise.
   // Further operations are responsible for setting IsInalid to true.
-  ReadySuspendResumeResult Calls = {{}, Operand, /*IsInvalid=*/false};
+  ReadySuspendResumeResult Calls = {
+      {}, Operand, /*IsInvalid=*/false, /*UseAwaitSuspendDestroy=*/false};
 
   using ACT = ReadySuspendResumeResult::AwaitCallType;
 
@@ -433,32 +425,46 @@ static ReadySuspendResumeResult buildCoawaitCalls(Sema &S, VarDecl *CoroPromise,
       Calls.Results[ACT::ACT_Ready] = S.MaybeCreateExprWithCleanups(Conv.get());
   }
 
-  // For awaiters with `[[clang::coro_await_suspend_destroy]]`, we call
-  // `void await_suspend_destroy(Promise&)` & promptly destroy the coro.
-  CallExpr *AwaitSuspend = nullptr;
-  bool UseAwaitSuspendDestroy = hasCoroAwaitSuspendDestroyAttr(Operand);
-  if (UseAwaitSuspendDestroy) {
-    ExprResult PromiseRefRes = buildPromiseRef(S, CoroPromise->getType(), Loc);
-    if (PromiseRefRes.isInvalid()) {
-      Calls.IsInvalid = true;
-      return Calls;
-    }
-    Expr *PromiseRef = PromiseRefRes.get();
-    AwaitSuspend = cast_or_null<CallExpr>(
-        BuildSubExpr(ACT::ACT_Suspend, "await_suspend_destroy", PromiseRef));
-  } else { // The standard `await_suspend(std::coroutine_handle<...>)`
-    ExprResult CoroHandleRes =
-        buildCoroutineHandle(S, CoroPromise->getType(), Loc);
-    if (CoroHandleRes.isInvalid()) {
-      Calls.IsInvalid = true;
-      return Calls;
-    }
-    Expr *CoroHandle = CoroHandleRes.get();
-    AwaitSuspend = cast_or_null<CallExpr>(
-        BuildSubExpr(ACT::ACT_Suspend, "await_suspend", CoroHandle));
+  ExprResult CoroHandleRes =
+      buildCoroutineHandle(S, CoroPromise->getType(), Loc);
+  if (CoroHandleRes.isInvalid()) {
+    Calls.IsInvalid = true;
+    return Calls;
   }
+  Expr *CoroHandle = CoroHandleRes.get();
+  Calls.UseAwaitSuspendDestroy = false;
+  CallExpr *AwaitSuspend = cast_or_null<CallExpr>(
+      BuildSubExpr(ACT::ACT_Suspend, "await_suspend", CoroHandle));
   if (!AwaitSuspend)
     return Calls;
+
+  // When this `await_suspend()` overload is annotated with
+  // `[[clang::coro_await_suspend_destroy]]`, do NOT call `await_suspend()` --
+  // instead call `await_suspend_destroy(Promise&)`.  This assumes that the
+  // `await_suspend()` is just a compatibility stub consisting of:
+  //     await_suspend_destroy(handle.promise());
+  //     handle.destroy();
+  // Users of the attribute must follow this contract.  Building both calls
+  // surfaces diagnostics from `await_suspend` and `await_suspend_destroy`.
+  CallExpr *PlainAwaitSuspend = nullptr;
+  if (FunctionDecl *AwaitSuspendCallee = AwaitSuspend->getDirectCallee()) {
+    if (AwaitSuspendCallee->hasAttr<CoroAwaitSuspendDestroyAttr>()) {
+      Calls.UseAwaitSuspendDestroy = true;
+      ExprResult PromiseRefRes =
+          buildPromiseRef(S, CoroPromise->getType(), Loc);
+      if (PromiseRefRes.isInvalid()) {
+        Calls.IsInvalid = true;
+        return Calls;
+      }
+      Expr *PromiseRef = PromiseRefRes.get();
+      PlainAwaitSuspend = AwaitSuspend;
+      AwaitSuspend = cast_or_null<CallExpr>(
+          BuildSubExpr(ACT::ACT_Suspend, "await_suspend_destroy", PromiseRef));
+      if (!AwaitSuspend)
+        return Calls;
+    }
+  }
+
   if (!AwaitSuspend->getType()->isDependentType()) {
     // [expr.await]p3 [...]
     //   - await-suspend is the expression e.await_suspend(h), which shall be
@@ -466,17 +472,25 @@ static ReadySuspendResumeResult buildCoawaitCalls(Sema &S, VarDecl *CoroPromise,
     //     type Z.
     QualType RetType = AwaitSuspend->getCallReturnType(S.Context);
 
-    auto EmitAwaitSuspendDiag = [&](unsigned int DiagCode) {
-      S.Diag(AwaitSuspend->getCalleeDecl()->getLocation(), DiagCode) << RetType;
+    auto EmitAwaitSuspendDiag = [&](unsigned int DiagCode, auto... args) {
+      ((S.Diag(AwaitSuspend->getCalleeDecl()->getLocation(), DiagCode)
+        << RetType)
+       << ... << args);
       S.Diag(Loc, diag::note_coroutine_promise_call_implicitly_required)
           << AwaitSuspend->getDirectCallee();
       Calls.IsInvalid = true;
     };
 
-    // `await_suspend_destroy` must return `void` -- and `CGCoroutine.cpp`
-    // critically depends on this in `hasCoroAwaitSuspendDestroyAttr`.
-    if (UseAwaitSuspendDestroy) {
-      if (RetType->isVoidType()) {
+    if (Calls.UseAwaitSuspendDestroy) {
+      // The return types of `await_suspend` and `await_suspend_destroy` must
+      // match. For now, the latter must return `void` -- though this could be
+      // extended to support returning handles.
+      QualType PlainRetType = PlainAwaitSuspend->getCallReturnType(S.Context);
+      if (!S.Context.hasSameType(PlainRetType, RetType)) {
+        EmitAwaitSuspendDiag(
+            diag::err_await_suspend_suspend_destroy_return_type_mismatch,
+            PlainRetType);
+      } else if (RetType->isVoidType()) {
         Calls.Results[ACT::ACT_Suspend] =
             S.MaybeCreateExprWithCleanups(AwaitSuspend);
       } else {
@@ -1015,6 +1029,8 @@ ExprResult Sema::BuildResolvedCoawaitExpr(SourceLocation Loc, Expr *Operand,
   Expr *Res = new (Context)
       CoawaitExpr(Loc, Operand, Awaiter, RSS.Results[0], RSS.Results[1],
                   RSS.Results[2], RSS.OpaqueValue, IsImplicit);
+  static_cast<CoroutineSuspendExpr *>(Res)->setUseAwaitSuspendDestroy(
+      RSS.UseAwaitSuspendDestroy);
 
   return Res;
 }
@@ -1072,6 +1088,8 @@ ExprResult Sema::BuildCoyieldExpr(SourceLocation Loc, Expr *E) {
   Expr *Res =
       new (Context) CoyieldExpr(Loc, Operand, E, RSS.Results[0], RSS.Results[1],
                                 RSS.Results[2], RSS.OpaqueValue);
+  static_cast<CoroutineSuspendExpr *>(Res)->setUseAwaitSuspendDestroy(
+      RSS.UseAwaitSuspendDestroy);
 
   return Res;
 }
diff --git a/clang/lib/Serialization/ASTReaderStmt.cpp b/clang/lib/Serialization/ASTReaderStmt.cpp
index 3f37dfbc3dea9..c83a2601d19e4 100644
--- a/clang/lib/Serialization/ASTReaderStmt.cpp
+++ b/clang/lib/Serialization/ASTReaderStmt.cpp
@@ -480,6 +480,7 @@ void ASTStmtReader::VisitCoawaitExpr(CoawaitExpr *E) {
   for (auto &SubExpr: E->SubExprs)
     SubExpr = Record.readSubStmt();
   E->OpaqueValue = cast_or_null<OpaqueValueExpr>(Record.readSubStmt());
+  E->setUseAwaitSuspendDestroy(Record.readInt() != 0);
   E->setIsImplicit(Record.readInt() != 0);
 }
 
@@ -489,6 +490,7 @@ void ASTStmtReader::VisitCoyieldExpr(CoyieldExpr *E) {
   for (auto &SubExpr: E->SubExprs)
     SubExpr = Record.readSubStmt();
   E->OpaqueValue = cast_or_null<OpaqueValueExpr>(Record.readSubStmt());
+  E->setUseAwaitSuspendDestroy(Record.readInt() != 0);
 }
 
 void ASTStmtReader::VisitDependentCoawaitExpr(DependentCoawaitExpr *E) {
diff --git a/clang/lib/Serialization/ASTWriterStmt.cpp b/clang/lib/Serialization/ASTWriterStmt.cpp
index be9bad9e96cc1..25c7ab165edf0 100644
--- a/clang/lib/Serialization/ASTWriterStmt.cpp
+++ b/clang/lib/Serialization/ASTWriterStmt.cpp
@@ -445,6 +445,7 @@ void ASTStmtWriter::VisitCoroutineSuspendExpr(CoroutineSuspendExpr *E) {
   for (Stmt *S : E->children())
     Record.AddStmt(S);
   Record.AddStmt(E->getOpaqueValue());
+  Record.push_back(E->useAwaitSuspendDestroy());
 }
 
 void ASTStmtWriter::VisitCoawaitExpr(CoawaitExpr *E) {
diff --git a/clang/test/CodeGenCoroutines/coro-await-suspend-destroy-errors.cpp b/clang/test/CodeGenCoroutines/coro-await-suspend-destroy-errors.cpp
deleted file mode 100644
index 6a082c15f2581..0000000000000
--- a/clang/test/CodeGenCoroutines/coro-await-suspend-destroy-errors.cpp
+++ /dev/null
@@ -1,55 +0,0 @@
-// RUN: %clang_cc1 -std=c++20 -verify %s 
-
-#include "Inputs/coroutine.h"
-
-// Coroutine type with `std::suspend_never` for initial/final suspend
-struct Task {
-  struct promise_type {
-    Task get_return_object() { return {}; }
-    std::suspend_never initial_suspend() { return {}; }
-    std::suspend_never final_suspend() noexcept { return {}; }
-    void return_void() {}
-    void unhandled_exception() {}
-  };
-};
-
-struct [[clang::coro_await_suspend_destroy]] WrongReturnTypeAwaitable {
-  bool await_ready() { return false; }
-  bool await_suspend_destroy(auto& promise) { return true; } // expected-error {{return type of 'await_suspend_destroy' is required to be 'void' (have 'bool')}}
-  void await_suspend(auto handle) {
-    await_suspend_destroy(handle.promise());
-    handle.destroy();
-  }
-  void await_resume() {}
-};
-
-Task test_invalid_destroying_await() {
-  co_await WrongReturnTypeAwaitable{}; // expected-note {{call to 'await_suspend_destroy<Task::promise_type>' implicitly required by coroutine function here}}
-}
-
-struct [[clang::coro_await_suspend_destroy]] MissingMethodAwaitable {
-  bool await_ready() { return false; }
-  // Missing await_suspend_destroy method
-  void await_suspend(auto handle) {
-    handle.destroy();
-  }
-  void await_resume() {}
-};
-
-Task test_missing_method() {
-  co_await MissingMethodAwaitable{}; // expected-error {{no member named 'await_suspend_destroy' in 'MissingMethodAwaitable'}}
-}
-
-struct [[clang::coro_await_suspend_destroy]] WrongParameterTypeAwaitable {
-  bool await_ready() { return false; }
-  void await_suspend_destroy(int x) {} // expected-note {{passing argument to parameter 'x' here}}
-  void await_suspend(auto handle) {
-    await_suspend_destroy(handle.promise());
-    handle.destroy();
-  }
-  void await_resume() {}
-};
-
-Task test_wrong_parameter_type() {
-  co_await WrongParameterTypeAwaitable{}; // expected-error {{no viable conversion from 'std::coroutine_traits<Task>::promise_type' (aka 'Task::promise_type') to 'int'}}
-}
diff --git a/clang/test/CodeGenCoroutines/coro-await-suspend-destroy.cpp b/clang/test/CodeGenCoroutines/coro-await-suspend-destroy.cpp
index fa1dbf475e56c..0169778993ae4 100644
--- a/clang/test/CodeGenCoroutines/coro-await-suspend-destroy.cpp
+++ b/clang/test/CodeGenCoroutines/coro-await-suspend-destroy.cpp
@@ -1,129 +1,301 @@
 // RUN: %clang_cc1 -std=c++20 -triple x86_64-unknown-linux-gnu -emit-llvm -o - %s \
-// RUN:   -disable-llvm-passes | FileCheck %s --check-prefix=CHECK-INITIAL
+// RUN:   -disable-llvm-passes | FileCheck %s --check-prefix=CHECK
 // RUN: %clang_cc1 -std=c++20 -triple x86_64-unknown-linux-gnu -emit-llvm -o - %s \
-// RUN:   -O2 | FileCheck %s --check-prefix=CHECK-OPTIMIZED
+// RUN:   -O2 | FileCheck %s --check-prefix=CHECK-OPT
+
+// See `SemaCXX/coro-await-suspend-destroy-errors.cpp` for error checks.
 
 #include "Inputs/coroutine.h"
 
-// Awaitable with `coro_await_suspend_destroy` attribute
-struct [[clang::coro_await_suspend_destroy]] DestroyingAwaitable {
-  bool await_ready() { return false; }
-  void await_suspend_destroy(auto& promise) {}
-  void await_suspend(auto handle) {
-    await_suspend_destroy(handle.promise());
-    handle.destroy();
-  }
+// This is used to implement a few `await_suspend()`s annotated with the
+// [[clang::coro_await_suspend_destroy]] attribute. As a consequence, it is
+// type-checked but never called, so it should never be emitted.
+//
+// The `operator new()` is meant to fail subsequent "no allocation" checks if
+// this does get emitted.
+//
+// It is followed by the recommended `await_suspend` stub, to check it compiles.
+#define STUB_AWAIT_SUSPEND(handle) \
+    operator new(1); \
+    await_suspend_destroy(handle.promise()); \
+    handle.destroy()
+
+// Use a dynamic `await_ready()` to ensure the suspend branch cannot be
+// optimized away. Implements everything but `await_suspend()`.
+struct BaseAwaiter {
+  bool ready_;
+  bool await_ready() { return ready_; }
   void await_resume() {}
+  BaseAwaiter(bool ready) : ready_{ready} {}
 };
 
-// Awaitable without `coro_await_suspend_destroy` (normal behavior)
-struct NormalAwaitable {
-  bool await_ready() { return false; }
-  void await_suspend(std::coroutine_handle<> h) {}
-  void await_resume() {}
+// A short-circuiting coroutine function needs a coroutine type with
+// `std::suspend_never` for initial/final suspend.
+template <typename TaskT>
+struct BasePromiseType {
+  TaskT get_return_object() { return {}; }
+  std::suspend_never initial_suspend() { return {}; }
+  std::suspend_never final_suspend() noexcept { return {}; }
+  void return_void() {}
+  void unhandled_exception() {}
 };
 
-// Coroutine type with `std::suspend_never` for initial/final suspend
-struct Task {
-  struct promise_type {
-    Task get_return_object() { return {}; }
-    std::suspend_never initial_suspend() { return {}; }
-    std::suspend_never final_suspend() noexcept { return {}; }
-    void return_void() {}
-    void unhandled_exception() {}
-  };
+// The coros look the same, but `MaybeSuspendingAwaiter` handles them differently.
+struct NonSuspendingTask {
+  struct promise_type : BasePromiseType<NonSuspendingTask> {};
+};
+struct MaybeSuspendingTask {
+  struct promise_type : BasePromiseType<MaybeSuspendingTask> {};
+};
+
+// When a coro only uses short-circuiting awaiters, it should elide allocations.
+//   - `DestroyingAwaiter` is always short-circuiting
+//   - `MaybeSuspendingAwaiter` short-circuits only in `NonSuspendingTask`
+
+struct DestroyingAwaiter : BaseAwaiter {
+  void await_suspend_destroy(auto& promise) {}
+  [[clang::coro_await_suspend_destroy]]
+  void await_suspend(auto handle) { STUB_AWAIT_SUSPEND(handle); }
+};
+
+struct MaybeSuspendingAwaiter : BaseAwaiter {
+  // Without the attribute, the coro will use `await.suspend` intrinsics, which
+  // currently trigger heap allocations for coro frames. Since the body isn't
+  // visible, escape analysis should prevent heap elision.
+  void await_suspend(std::coroutine_handle<MaybeSuspendingTask::promise_type>);
+
+  void await_suspend_destroy(NonSuspendingTask::promise_type&) {}
+  [[clang::coro_await_suspend_destroy]]
+  void await_suspend(std::coroutine_handle<NonSuspendingTask::promise_type> h) {
+    STUB_AWAIT_SUSPEND(h);
+  }
 };
 
-// Single co_await with coro_await_suspend_destroy.
 // Should result in no allocation after optimization.
-Task test_single_destroying_await() {
-  co_await DestroyingAwaitable{};
+NonSuspendingTask test_single_destroying_await(bool ready) {
+  co_await DestroyingAwaiter{ready};
+}
+
+// The reason this first `CHECK` test is so long is that it shows most of the
+// unoptimized IR before coroutine lowering. The granular detail is provided per
+// PR152623 code review, with the aim of helping future authors understand the
+// intended control flow.
+//
+// This mostly shows the standard coroutine flow. Find **ATTRIBUTE-SPECIFIC** in
+// the comments below to understand where the behavior diverges.
+
+// Basic coro setup
+
+// CHECK-LABEL: define{{.*}} void @_Z28test_single_destroying_awaitb
+// CHECK: entry:
+// CHECK: %__promise = alloca %"struct.NonSuspendingTask::promise_type", align 1
+// CHECK: %[[PROMISE:.+]] = bitcast ptr %__promise to ptr
+// CHECK-NEXT: %[[CORO_ID:.+]] = call token @llvm.coro.id(i32 {{[0-9]+}}, ptr %[[PROMISE]],
+// CHECK-NEXT: %[[USE_DYNAMIC_ALLOC:.+]] = call i1 @llvm.coro.alloc(token %[[CORO_ID]])
+// CHECK-NEXT: br i1 %[[USE_DYNAMIC_ALLOC]], label %coro.alloc, label %coro.init
+
+// Conditional heap alloc -- must be elided after lowering
+
+// CHECK: coro.alloc: ; preds = %entry
+// CHECK: call{{.*}} @_Znwm
+
+// Init coro frame & handle initial suspend
+
+// CHECK: coro.init: ; preds = %coro.alloc, %entry
+// CHECK: %[[FRAME:.+]] = call ptr @llvm.coro.begin(token %[[CORO_ID]]
+//
+// CHECK: call{{.*}} @_ZN15BasePromiseTypeI17NonSuspendingTaskE15initial_suspendEv
+// CHECK-NEXT: %[[INIT_SUSPEND_READY:.+]] = call{{.*}} i1 @_ZNSt13suspend_never11await_readyEv
+// CHECK-NEXT: br i1 %[[INIT_SUSPEND_READY]], label %init.ready, label %init.suspend
+//
+// CHECK: init.suspend: ; preds = %coro.init
+// ... implementation omitted, not reached ...
+//
+// CHECK: init.ready: ; preds = %init.suspend, %coro.init
+
+// Handle the user-visible `co_await` suspend point:
+
+// CHECK: %[[CO_AWAIT_READY:.+]] = call{{.*}} i1 @_ZN11BaseAwaiter11await_readyEv(
+// CHECK-NEXT: br i1 %[[CO_AWAIT_READY]], label %await.ready, label %await.suspend
+
+// **ATTRIBUTE-SPECIFIC**
+//
+// This `co_await`'s suspend is trivial & lacks suspend intrinsics. For cleanup
+// we branch to the same location as `await_resume`, but diverge later.
+
+// CHECK: await.suspend:
+// CHECK-NEXT: call void @_Z28test_single_destroying_awaitb.__await_suspend_wrapper__await(ptr %{{.+}}, ptr %[[FRAME]])
+// CHECK-NEXT: br label %[[CO_AWAIT_CLEANUP:.+]]
+
+// When ready, call `await_resume`:
+
+// CHECK: await.ready:
+// CHECK-NEXT: call{{.*}} @_ZN11BaseAwaiter12await_resumeEv(ptr{{.*}} %{{.+}})
+// CHECK-NEXT: br label %[[CO_AWAIT_CLEANUP]]
+
+// Further cleanup is conditional on whether we did "ready" or "suspend":
+
+// CHECK: [[CO_AWAIT_CLEANUP]]: ; preds = %await.ready, %await.suspend
+// CHECK-NEXT: %[[CLEANUP_PHI:.+]] = phi i32 [ 0, %await.ready ], [ 2, %await.suspend ]
+// CHECK: switch i32 %[[CLEANUP_PHI]], label %[[ON_AWAIT_SUSPEND:.+]] [
+// CHECK: i32 0, label %[[ON_AWAIT_READY:.+]]
+// CHECK: ]
+
+// On "ready", we `co_return` and do final suspend (not shown).
+
+// CHECK: [[ON_AWAIT_READY]]: ; preds = %[[CO_AWAIT_CLEANUP]]
+// CHECK-NEXT: call void @_ZN15BasePromiseTypeI17NonSuspendingTaskE11return_voidEv(
+// CHECK-NEXT: br label %coro.final
+//
+// CHECK: coro.final: ; preds = %[[ON_AWAIT_READY]]
+//
+// ... here, we handle final suspend, and eventually ...
+//
+// CHECK: br label %[[ON_AWAIT_SUSPEND]]
+
+// This [[ON_AWAIT_SUSPEND]] is actually the "destroy scope" code path,
+// including conditional `operator delete`, which will be elided.
+
+// CHECK: [[ON_AWAIT_SUSPEND]]:
+// CHECK: %[[HEAP_OR_NULL:.+]] = call ptr @llvm.coro.free(token %[[CORO_ID]], ptr %[[FRAME]])
+// CHECK-NEXT: %[[NON_NULL:.+]] = icmp ne ptr %[[HEAP_OR_NULL]], null
+// CHECK-NEXT: br i1 %[[NON_NULL]], label %coro.free, label %after.coro.free
+
+// The `operator delete()` call will be removed by optimizations.
+
+// CHECK: coro.free:
+// CHECK-NEXT: %[[CORO_SIZE:.+]] = call i64 @llvm.coro.size.i64()
+// CHECK-NEXT: call void @_ZdlPvm(ptr noundef %[[HEAP_OR_NULL]], i64 noundef %[[CORO_SIZE]])
+// CHECK-NEXT: br label %after.coro.free
+
+// CHECK: after.coro.free:
+//
+// ... Not shown: coro teardown finishes, and we branch on normal return vs.
+// exception.
+
+// Don't let the matchers skip past the end of `test_single_destroying_await()`
+
+// CHECK: }
+
+// The optimized IR is thankfully brief.
+
+// CHECK-OPT: define{{.*}} void @_Z28test_single_destroying_awaitb({{.*}} {
+// CHECK-OPT-NEXT: entry:
+// CHECK-OPT-NEXT: ret void
+// CHECK-OPT-NEXT: }
+
+///////////////////////////////////////////////////////////////////////////////
+// The subsequent tests are variations on the above theme. For brevity, they
+// do not repeat the above coroutine skeleton; they only check for heap allocs.
+///////////////////////////////////////////////////////////////////////////////
+
+// Multiple `co_await`s, all with `coro_await_suspend_destroy`.
+NonSuspendingTask test_multiple_destroying_awaits(bool ready, bool condition) {
+  co_await DestroyingAwaiter{ready};
+  co_await MaybeSuspendingAwaiter{ready}; // Destroys `NonSuspendingTask`
+  if (condition) {
+    co_await DestroyingAwaiter{ready};
+  }
 }
 
-// CHECK-INITIAL-LABEL: define{{.*}} void @_Z28test_single_destroying_awaitv
-// CHECK-INITIAL: call{{.*}} @llvm.coro.alloc
-// CHECK-INITIAL: call{{.*}} @llvm.coro.begin
+// The unlowered IR has heap allocs, but the optimized IR does not.
 
-// CHECK-OPTIMIZED-LABEL: define{{.*}} void @_Z28test_single_destroying_awaitv
-// CHECK-OPTIMIZED-NOT: call{{.*}} @llvm.coro.alloc
-// CHECK-OPTIMIZED-NOT: call{{.*}} malloc
-// CHECK-OPTIMIZED-NOT: call{{.*}} @_Znwm
+// CHECK-LABEL: define{{.*}} void @_Z31test_multiple_destroying_awaitsb
+// CHECK: call{{.*}} @_Znwm
+// CHECK: call{{.*}} @_ZdlPvm
+// CHECK: }
 
-// Test multiple `co_await`s, all with `coro_await_suspend_destroy`.
-// This should also result in no allocation after optimization.
-Task test_multiple_destroying_awaits(bool condition) {
-  co_await DestroyingAwaitable{};
-  co_await DestroyingAwaitable{};
+// CHECK-OPT-LABEL: define{{.*}} void @_Z31test_multiple_destroying_awaitsb
+// CHECK-OPT-NOT: call{{.*}} @llvm.coro.alloc
+// CHECK-OPT-NOT: call{{.*}} malloc
+// CHECK-OPT-NOT: call{{.*}} @_Znwm
+// CHECK-OPT: }
+
+// Same behavior as `test_multiple_destroying_awaits`, but with a
+// `MaybeSuspendingTask`, and without a `MaybeSuspendingAwaiter`.
+MaybeSuspendingTask test_multiple_destroying_awaits_too(bool ready, bool condition) {
+  co_await DestroyingAwaiter{ready};
+  co_await DestroyingAwaiter{ready};
   if (condition) {
-    co_await DestroyingAwaitable{};
+    co_await DestroyingAwaiter{ready};
   }
 }
 
-// CHECK-INITIAL-LABEL: define{{.*}} void @_Z31test_multiple_destroying_awaitsb
-// CHECK-INITIAL: call{{.*}} @llvm.coro.alloc
-// CHECK-INITIAL: call{{.*}} @llvm.coro.begin
+// The unlowered IR has heap allocs, but the optimized IR does not.
+
+// CHECK-LABEL: define{{.*}} void @_Z35test_multiple_destroying_awaits_toob
+// CHECK: call{{.*}} @_Znwm
+// CHECK: call{{.*}} @_ZdlPvm
+// CHECK: }
 
-// CHECK-OPTIMIZED-LABEL: define{{.*}} void @_Z31test_multiple_destroying_awaitsb
-// CHECK-OPTIMIZED-NOT: call{{.*}} @llvm.coro.alloc
-// CHECK-OPTIMIZED-NOT: call{{.*}} malloc
-// CHECK-OPTIMIZED-NOT: call{{.*}} @_Znwm
+// CHECK-OPT-LABEL: define{{.*}} void @_Z35test_multiple_destroying_awaits_toob
+// CHECK-OPT-NOT: call{{.*}} @llvm.coro.alloc
+// CHECK-OPT-NOT: call{{.*}} malloc
+// CHECK-OPT-NOT: call{{.*}} @_Znwm
+// CHECK-OPT: }
 
 // Mixed awaits - some with `coro_await_suspend_destroy`, some without.
-// We should still see allocation because not all awaits destroy the coroutine.
-Task test_mixed_awaits() {
-  co_await NormalAwaitable{}; // Must precede "destroy" to be reachable
-  co_await DestroyingAwaitable{};
+MaybeSuspendingTask test_mixed_awaits(bool ready) {
+  co_await MaybeSuspendingAwaiter{ready}; // Suspends `MaybeSuspendingTask`
+  co_await DestroyingAwaiter{ready};
 }
 
-// CHECK-INITIAL-LABEL: define{{.*}} void @_Z17test_mixed_awaitsv
-// CHECK-INITIAL: call{{.*}} @llvm.coro.alloc
-// CHECK-INITIAL: call{{.*}} @llvm.coro.begin
+// Both the unlowered & optimized IR have a heap allocation because not all
+// awaits destroy the coroutine.
 
-// CHECK-OPTIMIZED-LABEL: define{{.*}} void @_Z17test_mixed_awaitsv
-// CHECK-OPTIMIZED: call{{.*}} @_Znwm
+// CHECK-LABEL: define{{.*}} void @_Z17test_mixed_awaitsb
+// CHECK: call{{.*}} @_Znwm
+// CHECK: call{{.*}} @_ZdlPvm
+// CHECK: }
 
+// CHECK-OPT-LABEL: define{{.*}} void @_Z17test_mixed_awaitsb
+// CHECK-OPT: call{{.*}} @_Znwm
+// CHECK-OPT: call{{.*}} @_ZdlPvm
+// CHECK-OPT: }
 
-// Check the attribute detection affects control flow.  
-Task test_attribute_detection() {
-  co_await DestroyingAwaitable{};
+MaybeSuspendingTask test_unreachable_normal_suspend(bool ready) {
+  co_await DestroyingAwaiter{false};
   // Unreachable in OPTIMIZED, so those builds don't see an allocation.
-  co_await NormalAwaitable{};
+  co_await MaybeSuspendingAwaiter{ready}; // Would suspend `MaybeSuspendingTask`
 }
 
-// Check that we skip the normal suspend intrinsic and go directly to cleanup.
-//
-// CHECK-INITIAL-LABEL: define{{.*}} void @_Z24test_attribute_detectionv
-// CHECK-INITIAL: call{{.*}} @_Z24test_attribute_detectionv.__await_suspend_wrapper__await
-// CHECK-INITIAL-NEXT: br label %cleanup5
-// CHECK-INITIAL-NOT: call{{.*}} @llvm.coro.suspend
-// CHECK-INITIAL: call{{.*}} @_Z24test_attribute_detectionv.__await_suspend_wrapper__await
-// CHECK-INITIAL: call{{.*}} @llvm.coro.suspend
-// CHECK-INITIAL: call{{.*}} @_Z24test_attribute_detectionv.__await_suspend_wrapper__final
-
-// Since `co_await DestroyingAwaitable{}` gets converted into an unconditional
-// branch, the `co_await NormalAwaitable{}` is unreachable in optimized builds.
-// 
-// CHECK-OPTIMIZED-NOT: call{{.*}} @llvm.coro.alloc
-// CHECK-OPTIMIZED-NOT: call{{.*}} malloc
-// CHECK-OPTIMIZED-NOT: call{{.*}} @_Znwm
-
-// Template awaitable with `coro_await_suspend_destroy` attribute
+// The unlowered IR has heap allocs, but the optimized IR does not, since
+// `co_await DestroyingAwaiter{false}` is effectively a `co_return`.
+
+// CHECK-LABEL: define{{.*}} void @_Z31test_unreachable_normal_suspendb
+// CHECK: call{{.*}} @_Znwm
+// CHECK: call{{.*}} @_ZdlPvm
+// CHECK: }
+
+// CHECK-OPT-LABEL: define{{.*}} void @_Z31test_unreachable_normal_suspendb
+// CHECK-OPT-NOT: call{{.*}} @llvm.coro.alloc
+// CHECK-OPT-NOT: call{{.*}} malloc
+// CHECK-OPT-NOT: call{{.*}} @_Znwm
+// CHECK-OPT: }
+
+// Template awaitable with `coro_await_suspend_destroy` attribute. Checks for
+// bugs where we don't handle dependent types appropriately.
 template<typename T>
-struct [[clang::coro_await_suspend_destroy]] TemplateDestroyingAwaitable {
-  bool await_ready() { return false; }
+struct TemplateDestroyingAwaiter : BaseAwaiter {
   void await_suspend_destroy(auto& promise) {}
-  void await_suspend(auto handle) {
-    await_suspend_destroy(handle.promise());
-    handle.destroy();
-  }
-  void await_resume() {}
+  [[clang::coro_await_suspend_destroy]]
+  void await_suspend(auto handle) { STUB_AWAIT_SUSPEND(handle); }
 };
 
-Task test_template_destroying_await() {
-  co_await TemplateDestroyingAwaitable<int>{};
+template <typename T>
+NonSuspendingTask test_template_destroying_await(bool ready) {
+  co_await TemplateDestroyingAwaiter<T>{ready};
 }
 
-// CHECK-OPTIMIZED-LABEL: define{{.*}} void @_Z30test_template_destroying_awaitv
-// CHECK-OPTIMIZED-NOT: call{{.*}} @llvm.coro.alloc
-// CHECK-OPTIMIZED-NOT: call{{.*}} malloc
-// CHECK-OPTIMIZED-NOT: call{{.*}} @_Znwm
+template NonSuspendingTask test_template_destroying_await<int>(bool ready);
+
+// CHECK-LABEL: define{{.*}} void @_Z30test_template_destroying_awaitIiE17NonSuspendingTaskb
+// CHECK: call{{.*}} @_Znwm
+// CHECK: call{{.*}} @_ZdlPvm
+// CHECK: }
+
+// CHECK-OPT-LABEL: define{{.*}} void @_Z30test_template_destroying_awaitIiE17NonSuspendingTaskb
+// CHECK-OPT-NOT: call{{.*}} @llvm.coro.alloc
+// CHECK-OPT-NOT: call{{.*}} malloc
+// CHECK-OPT-NOT: call{{.*}} @_Znwm
+// CHECK-OPT: }
diff --git a/clang/test/CodeGenCoroutines/issue148380.cpp b/clang/test/CodeGenCoroutines/issue148380.cpp
new file mode 100644
index 0000000000000..a3dc429f20b64
--- /dev/null
+++ b/clang/test/CodeGenCoroutines/issue148380.cpp
@@ -0,0 +1,42 @@
+// RUN: %clang_cc1 -std=c++20 -triple x86_64-unknown-linux-gnu -emit-llvm -o - %s \
+// RUN:   -O2 | FileCheck %s --check-prefix=CHECK-OPT
+
+// This test just confirms that `[[clang::coro_await_suspend_destroy]]` works
+// around the optimization problem from issue #148380.
+//
+// See `coro-await-suspend-destroy.cpp` for a test showing the detailed control
+// flow in un-lowered, un-optimized IR.
+
+#include "Inputs/coroutine.h"
+
+struct coro {
+  struct promise_type {
+    auto get_return_object() { return coro{}; }
+    auto initial_suspend() noexcept { return std::suspend_never{}; }
+    auto final_suspend() noexcept { return std::suspend_never{}; }
+    auto unhandled_exception() {}
+    auto return_void() {}
+  };
+
+  auto await_ready() { return false; }
+  void await_suspend_destroy(auto& promise) {}
+  [[clang::coro_await_suspend_destroy]] auto await_suspend(auto handle) {
+    // The attribute causes this stub not to be called.  Instead, we call
+    // `await_suspend_destroy()`, as on the next line.
+    await_suspend_destroy(handle.promise());
+    handle.destroy();
+  }
+  auto await_resume() {}
+};
+
+coro f1() noexcept;
+coro f2() noexcept
+{
+    co_await f1();
+}
+
+// CHECK-OPT: define{{.+}} void @_Z2f2v({{.+}} {
+// CHECK-OPT-NEXT: entry:
+// CHECK-OPT-NEXT: tail call void @_Z2f1v()
+// CHECK-OPT-NEXT: ret void
+// CHECK-OPT-NEXT: }
diff --git a/clang/test/Misc/pragma-attribute-supported-attributes-list.test b/clang/test/Misc/pragma-attribute-supported-attributes-list.test
index 4c1f3d3a1fc66..830c681303824 100644
--- a/clang/test/Misc/pragma-attribute-supported-attributes-list.test
+++ b/clang/test/Misc/pragma-attribute-supported-attributes-list.test
@@ -63,7 +63,7 @@
 // CHECK-NEXT: Convergent (SubjectMatchRule_function)
 // CHECK-NEXT: CoroAwaitElidable (SubjectMatchRule_record)
 // CHECK-NEXT: CoroAwaitElidableArgument (SubjectMatchRule_variable_is_parameter)
-// CHECK-NEXT: CoroAwaitSuspendDestroy (SubjectMatchRule_record)
+// CHECK-NEXT: CoroAwaitSuspendDestroy (SubjectMatchRule_function_is_member)
 // CHECK-NEXT: CoroDisableLifetimeBound (SubjectMatchRule_function)
 // CHECK-NEXT: CoroLifetimeBound (SubjectMatchRule_record)
 // CHECK-NEXT: CoroOnlyDestroyWhenComplete (SubjectMatchRule_record)
diff --git a/clang/test/SemaCXX/coro-await-suspend-destroy-errors.cpp b/clang/test/SemaCXX/coro-await-suspend-destroy-errors.cpp
new file mode 100644
index 0000000000000..3666fa0e28d20
--- /dev/null
+++ b/clang/test/SemaCXX/coro-await-suspend-destroy-errors.cpp
@@ -0,0 +1,61 @@
+// RUN: %clang_cc1 -std=c++20 -verify %s
+
+#include "Inputs/std-coroutine.h"
+
+// Coroutine type with `std::suspend_never` for initial/final suspend
+struct Task {
+  struct promise_type {
+    Task get_return_object() { return {}; }
+    std::suspend_never initial_suspend() { return {}; }
+    std::suspend_never final_suspend() noexcept { return {}; }
+    void return_void() {}
+    void unhandled_exception() {}
+  };
+};
+
+struct WrongReturnTypeAwaitable {
+  bool await_ready() { return false; }
+  bool await_suspend_destroy(auto& promise) { return true; } // expected-error {{return type of 'await_suspend_destroy' is required to be 'void' (have 'bool')}}
+  [[clang::coro_await_suspend_destroy]] 
+  bool await_suspend(auto handle) {}
+  void await_resume() {}
+};
+
+Task test_wrong_return_type() {
+  co_await WrongReturnTypeAwaitable{}; // expected-note {{call to 'await_suspend_destroy<Task::promise_type>' implicitly required by coroutine function here}}
+}
+
+struct NoSuchMemberAwaitable {
+  bool await_ready() { return false; }
+  [[clang::coro_await_suspend_destroy]] 
+  void await_suspend(auto handle) {}
+  void await_resume() {}
+};
+
+Task test_no_method() {
+  co_await NoSuchMemberAwaitable{}; // expected-error {{no member named 'await_suspend_destroy' in 'NoSuchMemberAwaitable'}}
+}
+
+struct WrongOverloadAwaitable {
+  bool await_ready() { return false; }
+  void await_suspend_destroy(int x) {} // expected-note {{passing argument to parameter 'x' here}}
+  [[clang::coro_await_suspend_destroy]] 
+  void await_suspend(auto handle) {}
+  void await_resume() {}
+};
+
+Task test_wrong_overload() {
+  co_await WrongOverloadAwaitable{}; // expected-error {{no viable conversion from 'std::coroutine_traits<Task>::promise_type' (aka 'typename Task::promise_type') to 'int'}}
+}
+
+struct ReturnTypeMismatchAwaiter {
+  bool await_ready() { return false; }
+  void await_suspend_destroy(auto& promise) {} // expected-error {{return type of 'await_suspend' ('bool') must match return type of 'await_suspend_destroy' ('void')}}
+  [[clang::coro_await_suspend_destroy]] 
+  bool await_suspend(auto handle) { return true; }
+  void await_resume() {}
+};
+
+Task test_return_type_mismatch() {
+  co_await ReturnTypeMismatchAwaiter{}; // expected-note {{call to 'await_suspend_destroy<Task::promise_type>' implicitly required by coroutine function here}}
+}

>From 72274d2346ccd5091770717f6be682c22a0e4194 Mon Sep 17 00:00:00 2001
From: Alexey <snarkmaster at gmail.com>
Date: Mon, 18 Aug 2025 22:42:28 -0700
Subject: [PATCH 14/15] Fix bad merge & some doc backticks

---
 clang/docs/ReleaseNotes.rst                   | 19 +++++++++++++++----
 clang/include/clang/Basic/AttrDocs.td         | 19 ++++++++++---------
 .../clang/Basic/DiagnosticSemaKinds.td        | 12 +++++++++---
 3 files changed, 34 insertions(+), 16 deletions(-)

diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 2da187c6be627..f239fd7be01fa 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -137,6 +137,12 @@ Non-comprehensive list of changes in this release
 - ``__builtin_elementwise_max`` and ``__builtin_elementwise_min`` functions for integer types can
   now be used in constant expressions.
 
+- Use of ``__has_feature`` to detect the ``ptrauth_qualifier`` and ``ptrauth_intrinsics``
+  features has been deprecated, and is restricted to the arm64e target only. The
+  correct method to check for these features is to test for the ``__PTRAUTH__``
+  macro.
+
+
 New Compiler Flags
 ------------------
 - New option ``-fno-sanitize-annotate-debug-info-traps`` added to disable emitting trap reasons into the debug info when compiling with trapping UBSan (e.g. ``-fsanitize-trap=undefined``).
@@ -154,10 +160,10 @@ Attribute Changes in Clang
 --------------------------
 
 - Introduced a new attribute ``[[clang::coro_await_suspend_destroy]]``.  When
-  applied to an `await_suspend(std::coroutine_handle<Promise>)` member of a
+  applied to an ``await_suspend(std::coroutine_handle<Promise>)`` member of a
   coroutine awaiter, it causes suspensions into this awaiter to use a new
-  `await_suspend_destroy(Promise&)` method.  The coroutine is then immediately
-  destroyed.  This flow bypasses the original `await_suspend()` (though it
+  ``await_suspend_destroy(Promise&)`` method.  The coroutine is then immediately
+  destroyed.  This flow bypasses the original ``await_suspend()`` (though it
   must contain a compatibility stub), and omits suspend intrinsics.  The net
   effect is improved code speed & size for "short-circuiting" coroutines.
 
@@ -170,6 +176,8 @@ Improvements to Clang's diagnostics
   an override of a virtual method.
 - Fixed fix-it hint for fold expressions. Clang now correctly places the suggested right
   parenthesis when diagnosing malformed fold expressions. (#GH151787)
+- ``-Wstring-concatenation`` now diagnoses every missing comma in an initializer list,
+  rather than stopping after the first. (#GH153745)
 
 - Fixed an issue where emitted format-signedness diagnostics were not associated with an appropriate
   diagnostic id. Besides being incorrect from an API standpoint, this was user visible, e.g.:
@@ -181,7 +189,7 @@ Improvements to Clang's diagnostics
   "format specifies type 'unsigned int' but the argument has type 'int', which differs in signedness [-Wformat-signedness]"
   "signedness of format specifier 'u' is incompatible with 'c' [-Wformat-signedness]"
   and the API-visible diagnostic id will be appropriate.
-
+  
 - Fixed false positives in ``-Waddress-of-packed-member`` diagnostics when
   potential misaligned members get processed before they can get discarded.
   (#GH144729)
@@ -201,6 +209,8 @@ Bug Fixes in This Version
   targets that treat ``_Float16``/``__fp16`` as native scalar types. Previously
   the warning was silently lost because the operands differed only by an implicit
   cast chain. (#GH149967).
+- Fixed a crash with incompatible pointer to integer conversions in designated
+  initializers involving string literals. (#GH154046)
 
 Bug Fixes to Compiler Builtins
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -322,6 +332,7 @@ AST Matchers
 
 clang-format
 ------------
+- Add ``SpaceInEmptyBraces`` option and set it to ``Always`` for WebKit style.
 
 libclang
 --------
diff --git a/clang/include/clang/Basic/AttrDocs.td b/clang/include/clang/Basic/AttrDocs.td
index 19ad3e452fed4..1a6a814147d06 100644
--- a/clang/include/clang/Basic/AttrDocs.td
+++ b/clang/include/clang/Basic/AttrDocs.td
@@ -9384,14 +9384,14 @@ Each annotated ``await_suspend`` member must contain a compatibility stub:
   }
 
 An awaiter type may provide both annotated and non-annotated overloads of
-`await_suspend()`, as long as each invocation of an annotated overload has a
-corresponding `await_suspend_destroy(Promise&)` overload.
+``await_suspend()``, as long as each invocation of an annotated overload has a
+corresponding ``await_suspend_destroy(Promise&)`` overload.
 
 Instead of calling the annotated ``await_suspend()``, the coroutine calls
 ``await_suspend_destroy(Promise&)`` and immediately destroys the coroutine
 ``await_suspend_destroy()`` must return ``void`` (Note: if desired, it
 would be straightforward to also support the "symmetric transfer"
-`std::coroutine_handle` return type.)
+``std::coroutine_handle`` return type.)
 
 This optimization improves code speed and size for "short-circuiting"
 coroutines — those that use coroutine syntax **exclusively** for early returns
@@ -9433,7 +9433,7 @@ the coroutine.  User code can only access the handle via suspend intrinsics,
 and annotated short-circuiting awaiters simply don't use any.
 
 Note that a short-circuiting coroutine differs in one important way from a
-function that replaced each `co_await awaiter` with explicit control flow:
+function that replaced each ``co_await awaiter`` with explicit control flow:
 
 .. code-block:: c++
 
@@ -9445,13 +9445,14 @@ function that replaced each `co_await awaiter` with explicit control flow:
     return /* early-termination return object */;
   }
 
-That key difference is that `unhandled_exception()` lets the promise type wrap
-the function body in an implicit try-catch.  This automatic exception boundary
-behavior can be desirable in robust, return-value-oriented programs that
-benefit from short-circuiting coroutines.  If not, the promise can re-throw.
+The key difference is that ``unhandled_exception()`` lets the promise type
+wrap the function body in an implicit try-catch.  This automatic exception
+boundary behavior can be desirable in robust, return-value-oriented programs
+that benefit from short-circuiting coroutines.  If not, the promise can
+re-throw.
 
 Here is an example of a short-circuiting awaiter for a hypothetical
-`std::optional` coroutine:
+``std::optional`` coroutine:
 
 .. code-block:: c++
 
diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index 8b2c187313765..a151de47d2657 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -10671,9 +10671,15 @@ def warn_dangling_reference_captured_by_unknown : Warning<
    "object whose reference is captured will be destroyed at the end of "
    "the full-expression">, InGroup<DanglingCapture>;
 
-def warn_experimental_lifetime_safety_dummy_warning : Warning<
-   "todo: remove this warning after we have atleast one warning based on the lifetime analysis">, 
-   InGroup<LifetimeSafety>, DefaultIgnore;
+// Diagnostics based on the Lifetime safety analysis.
+def warn_lifetime_safety_loan_expires_permissive : Warning<
+   "object whose reference is captured does not live long enough">, 
+   InGroup<LifetimeSafetyPermissive>, DefaultIgnore;
+def warn_lifetime_safety_loan_expires_strict : Warning<
+   "object whose reference is captured may not live long enough">,
+   InGroup<LifetimeSafetyStrict>, DefaultIgnore;
+def note_lifetime_safety_used_here : Note<"later used here">;
+def note_lifetime_safety_destroyed_here : Note<"destroyed here">;
 
 // For non-floating point, expressions of the form x == x or x != x
 // should result in a warning, since these always evaluate to a constant.

>From 543bf07df36a2f10f5fcb356f3150e779a9858d2 Mon Sep 17 00:00:00 2001
From: Alexey <snarkmaster at gmail.com>
Date: Mon, 18 Aug 2025 23:18:48 -0700
Subject: [PATCH 15/15] Remove another leftover file from bad merge

---
 .../coro-await-suspend-destroy-errors.cpp     | 55 -------------------
 1 file changed, 55 deletions(-)
 delete mode 100644 clang/test/CodeGenCoroutines/coro-await-suspend-destroy-errors.cpp

diff --git a/clang/test/CodeGenCoroutines/coro-await-suspend-destroy-errors.cpp b/clang/test/CodeGenCoroutines/coro-await-suspend-destroy-errors.cpp
deleted file mode 100644
index 6a082c15f2581..0000000000000
--- a/clang/test/CodeGenCoroutines/coro-await-suspend-destroy-errors.cpp
+++ /dev/null
@@ -1,55 +0,0 @@
-// RUN: %clang_cc1 -std=c++20 -verify %s 
-
-#include "Inputs/coroutine.h"
-
-// Coroutine type with `std::suspend_never` for initial/final suspend
-struct Task {
-  struct promise_type {
-    Task get_return_object() { return {}; }
-    std::suspend_never initial_suspend() { return {}; }
-    std::suspend_never final_suspend() noexcept { return {}; }
-    void return_void() {}
-    void unhandled_exception() {}
-  };
-};
-
-struct [[clang::coro_await_suspend_destroy]] WrongReturnTypeAwaitable {
-  bool await_ready() { return false; }
-  bool await_suspend_destroy(auto& promise) { return true; } // expected-error {{return type of 'await_suspend_destroy' is required to be 'void' (have 'bool')}}
-  void await_suspend(auto handle) {
-    await_suspend_destroy(handle.promise());
-    handle.destroy();
-  }
-  void await_resume() {}
-};
-
-Task test_invalid_destroying_await() {
-  co_await WrongReturnTypeAwaitable{}; // expected-note {{call to 'await_suspend_destroy<Task::promise_type>' implicitly required by coroutine function here}}
-}
-
-struct [[clang::coro_await_suspend_destroy]] MissingMethodAwaitable {
-  bool await_ready() { return false; }
-  // Missing await_suspend_destroy method
-  void await_suspend(auto handle) {
-    handle.destroy();
-  }
-  void await_resume() {}
-};
-
-Task test_missing_method() {
-  co_await MissingMethodAwaitable{}; // expected-error {{no member named 'await_suspend_destroy' in 'MissingMethodAwaitable'}}
-}
-
-struct [[clang::coro_await_suspend_destroy]] WrongParameterTypeAwaitable {
-  bool await_ready() { return false; }
-  void await_suspend_destroy(int x) {} // expected-note {{passing argument to parameter 'x' here}}
-  void await_suspend(auto handle) {
-    await_suspend_destroy(handle.promise());
-    handle.destroy();
-  }
-  void await_resume() {}
-};
-
-Task test_wrong_parameter_type() {
-  co_await WrongParameterTypeAwaitable{}; // expected-error {{no viable conversion from 'std::coroutine_traits<Task>::promise_type' (aka 'Task::promise_type') to 'int'}}
-}