[clang] [llvm] Reland: [llvm][clang] Allocate a new stack instead of spawning a new thread to get more stack space (PR #136046)
via llvm-commits
llvm-commits at lists.llvm.org
Wed Apr 16 15:35:05 PDT 2025
llvmbot wrote:
@llvm/pr-subscribers-clang
Author: Michael Spencer (Bigcheese)
Reland https://github.com/llvm/llvm-project/pull/133173. This reland changes the assembly to emit `.cfi_{start,end}proc` directly in a file-scope asm statement and restricts the feature to MachO to help ensure the assembly is only used where it's valid. The previous code broke under the `-fno-asynchronous-unwind-tables -g0` configuration because the naked functions were responsible for emitting `.cfi_{start,end}proc`. There's no good way to detect whether CFI directives are being emitted in general, so we always emit them. Removing CFI is only done as a size optimization, so having it for this one function is fine.
GCC does define `__GCC_HAVE_DWARF2_CFI_ASM` when it is emitting CFI directives, but Clang does not. compiler-rt simply always emits CFI directives when built with Clang.
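To illustrate the problem, here is a minimal sketch with hypothetical function names (not code from this PR): a naked function relies on the compiler to emit `.cfi_startproc`/`.cfi_endproc` around its body, so under `-fno-asynchronous-unwind-tables -g0` any hand-written `.cfi_*` directives inside it become orphaned and the assembler rejects them, whereas a file-scope asm statement can always write the complete, self-contained sequence.

```c++
// Hypothetical sketch, not the PR's actual code (AArch64).

// Naked function: the compiler owns .cfi_startproc/.cfi_endproc. With
// -fno-asynchronous-unwind-tables -g0 it emits neither, so the directive
// below has no enclosing CFI region and the assembler reports an error.
__attribute__((naked)) void brokenSwitch() {
  __asm__(".cfi_def_cfa w29, 16\n\t"
          "ret");
}

// File-scope asm: the CFI directives are written explicitly as a matched
// pair, so they stay valid regardless of how Clang was invoked.
__asm__(".globl _workingSwitch\n\t"
        "_workingSwitch:\n\t"
        ".cfi_startproc\n\t"
        "ret\n\t"
        ".cfi_endproc");
```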
This reland also includes the other follow-up fixes for GCC compatibility and for thread.h.
---
Clang spawns a new thread to avoid running out of stack space. This can make debugging and performance analysis more difficult, because it is hard to recover how the threads are connected.
This patch introduces `runOnNewStack` and applies it in Clang. On platforms with good support for it, this allocates a new stack and moves to it using assembly. Switching stacks like this actually works on most platforms, but many debuggers and unwinders reject the large or backwards stack offsets that result. Apple platforms and tools are known to support it, so it is only enabled there for now.
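As a rough usage sketch (the caller here is hypothetical; the real signatures and the AArch64 stack-switching code are in the diff below):

```c++
#include "llvm/Support/ProgramStack.h"

// Hypothetical caller; a StackSize of 0 requests the platform default.
// Where split stacks are enabled (Apple AArch64) this switches to a freshly
// allocated stack via assembly; elsewhere it falls back to running the
// callback on a new thread and joining it, so the call stays synchronous.
void runDeeplyRecursiveWork() {
  llvm::runOnNewStack(/*StackSize=*/0, [] {
    // work that might otherwise exhaust the caller's stack
  });
}
```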
---
Full diff: https://github.com/llvm/llvm-project/pull/136046.diff
12 Files Affected:
- (modified) clang/docs/ReleaseNotes.rst (+4)
- (modified) clang/include/clang/Basic/Stack.h (+4-1)
- (modified) clang/lib/Basic/Stack.cpp (+12-28)
- (modified) clang/lib/Frontend/CompilerInstance.cpp (+1-1)
- (modified) llvm/include/llvm/Support/CrashRecoveryContext.h (+3)
- (added) llvm/include/llvm/Support/ProgramStack.h (+63)
- (modified) llvm/include/llvm/Support/thread.h (+1)
- (modified) llvm/lib/Support/CMakeLists.txt (+1)
- (modified) llvm/lib/Support/CrashRecoveryContext.cpp (+11)
- (added) llvm/lib/Support/ProgramStack.cpp (+127)
- (modified) llvm/unittests/Support/CMakeLists.txt (+1)
- (added) llvm/unittests/Support/ProgramStackTest.cpp (+35)
``````````diff
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 07ff1251fc1ad..7d714d93ce854 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -195,6 +195,10 @@ Non-comprehensive list of changes in this release
- Added `__builtin_elementwise_exp10`.
- For AMDPGU targets, added `__builtin_v_cvt_off_f32_i4` that maps to the `v_cvt_off_f32_i4` instruction.
- Added `__builtin_elementwise_minnum` and `__builtin_elementwise_maxnum`.
+- Clang itself now uses split stacks instead of threads for allocating more
+ stack space when running on Apple AArch64 based platforms. This means that
+ stack traces of Clang from debuggers, crashes, and profilers may look
+ different than before.
New Compiler Flags
------------------
diff --git a/clang/include/clang/Basic/Stack.h b/clang/include/clang/Basic/Stack.h
index 30ebd94aedd1f..9674b9d9b62c3 100644
--- a/clang/include/clang/Basic/Stack.h
+++ b/clang/include/clang/Basic/Stack.h
@@ -27,7 +27,10 @@ namespace clang {
/// Call this once on each thread, as soon after starting the thread as
/// feasible, to note the approximate address of the bottom of the stack.
- void noteBottomOfStack();
+ ///
+ /// \param ForceSet set to true if you know the call is near the bottom of a
+ /// new stack. Used for split stacks.
+ void noteBottomOfStack(bool ForceSet = false);
/// Determine whether the stack is nearly exhausted.
bool isStackNearlyExhausted();
diff --git a/clang/lib/Basic/Stack.cpp b/clang/lib/Basic/Stack.cpp
index aa15d8e66950f..8cbb84943f8d3 100644
--- a/clang/lib/Basic/Stack.cpp
+++ b/clang/lib/Basic/Stack.cpp
@@ -13,33 +13,13 @@
#include "clang/Basic/Stack.h"
#include "llvm/Support/CrashRecoveryContext.h"
+#include "llvm/Support/ProgramStack.h"
-#ifdef _MSC_VER
-#include <intrin.h> // for _AddressOfReturnAddress
-#endif
+static LLVM_THREAD_LOCAL uintptr_t BottomOfStack = 0;
-static LLVM_THREAD_LOCAL void *BottomOfStack = nullptr;
-
-static void *getStackPointer() {
-#if __GNUC__ || __has_builtin(__builtin_frame_address)
- return __builtin_frame_address(0);
-#elif defined(_MSC_VER)
- return _AddressOfReturnAddress();
-#else
- char CharOnStack = 0;
- // The volatile store here is intended to escape the local variable, to
- // prevent the compiler from optimizing CharOnStack into anything other
- // than a char on the stack.
- //
- // Tested on: MSVC 2015 - 2019, GCC 4.9 - 9, Clang 3.2 - 9, ICC 13 - 19.
- char *volatile Ptr = &CharOnStack;
- return Ptr;
-#endif
-}
-
-void clang::noteBottomOfStack() {
- if (!BottomOfStack)
- BottomOfStack = getStackPointer();
+void clang::noteBottomOfStack(bool ForceSet) {
+ if (!BottomOfStack || ForceSet)
+ BottomOfStack = llvm::getStackPointer();
}
bool clang::isStackNearlyExhausted() {
@@ -51,7 +31,8 @@ bool clang::isStackNearlyExhausted() {
if (!BottomOfStack)
return false;
- intptr_t StackDiff = (intptr_t)getStackPointer() - (intptr_t)BottomOfStack;
+ intptr_t StackDiff =
+ (intptr_t)llvm::getStackPointer() - (intptr_t)BottomOfStack;
size_t StackUsage = (size_t)std::abs(StackDiff);
// If the stack pointer has a surprising value, we do not understand this
@@ -66,9 +47,12 @@ bool clang::isStackNearlyExhausted() {
void clang::runWithSufficientStackSpaceSlow(llvm::function_ref<void()> Diag,
llvm::function_ref<void()> Fn) {
llvm::CrashRecoveryContext CRC;
- CRC.RunSafelyOnThread([&] {
- noteBottomOfStack();
+ // Preserve the BottomOfStack in case RunSafelyOnNewStack uses split stacks.
+ uintptr_t PrevBottom = BottomOfStack;
+ CRC.RunSafelyOnNewStack([&] {
+ noteBottomOfStack(true);
Diag();
Fn();
}, DesiredStackSize);
+ BottomOfStack = PrevBottom;
}
diff --git a/clang/lib/Frontend/CompilerInstance.cpp b/clang/lib/Frontend/CompilerInstance.cpp
index 243e0a3c15b05..5fe80fc16482e 100644
--- a/clang/lib/Frontend/CompilerInstance.cpp
+++ b/clang/lib/Frontend/CompilerInstance.cpp
@@ -1265,7 +1265,7 @@ bool CompilerInstance::compileModule(SourceLocation ImportLoc,
// Execute the action to actually build the module in-place. Use a separate
// thread so that we get a stack large enough.
- bool Crashed = !llvm::CrashRecoveryContext().RunSafelyOnThread(
+ bool Crashed = !llvm::CrashRecoveryContext().RunSafelyOnNewStack(
[&]() {
GenerateModuleFromModuleMapAction Action;
Instance.ExecuteAction(Action);
diff --git a/llvm/include/llvm/Support/CrashRecoveryContext.h b/llvm/include/llvm/Support/CrashRecoveryContext.h
index 26ddf97b3ef02..31293d6715757 100644
--- a/llvm/include/llvm/Support/CrashRecoveryContext.h
+++ b/llvm/include/llvm/Support/CrashRecoveryContext.h
@@ -97,6 +97,9 @@ class CrashRecoveryContext {
return RunSafelyOnThread([&]() { Fn(UserData); }, RequestedStackSize);
}
+ bool RunSafelyOnNewStack(function_ref<void()>,
+ unsigned RequestedStackSize = 0);
+
/// Explicitly trigger a crash recovery in the current process, and
/// return failure from RunSafely(). This function does not return.
[[noreturn]] void HandleExit(int RetCode);
diff --git a/llvm/include/llvm/Support/ProgramStack.h b/llvm/include/llvm/Support/ProgramStack.h
new file mode 100644
index 0000000000000..55964c9779209
--- /dev/null
+++ b/llvm/include/llvm/Support/ProgramStack.h
@@ -0,0 +1,63 @@
+//===--- ProgramStack.h -----------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_SUPPORT_PROGRAMSTACK_H
+#define LLVM_SUPPORT_PROGRAMSTACK_H
+
+#include "llvm/ADT/STLFunctionalExtras.h"
+
+// LLVM_HAS_SPLIT_STACKS is exposed in the header because CrashRecoveryContext
+// needs to know if it's running on another thread or not.
+//
+// Currently only Apple AArch64 is known to support split stacks in the debugger
+// and other tooling.
+#if defined(__APPLE__) && defined(__MACH__) && defined(__aarch64__) && \
+ __has_extension(gnu_asm)
+# define LLVM_HAS_SPLIT_STACKS
+# define LLVM_HAS_SPLIT_STACKS_AARCH64
+#endif
+
+namespace llvm {
+
+/// \returns an address close to the current value of the stack pointer.
+///
+/// The value is not guaranteed to point to anything specific. It can be used to
+/// estimate how much stack space has been used since the previous call.
+uintptr_t getStackPointer();
+
+/// \returns the default stack size for this platform.
+///
+/// Based on \p RLIMIT_STACK or the equivalent.
+unsigned getDefaultStackSize();
+
+/// Runs Fn on a new stack of at least the given size.
+///
+/// \param StackSize requested stack size. A size of 0 uses the default stack
+/// size of the platform.
+///
+/// The preferred implementation is split stacks on platforms that have a good
+/// debugging experience for them. On other platforms a new thread is used.
+void runOnNewStack(unsigned StackSize, function_ref<void()> Fn);
+
+template <typename R, typename... Ts>
+std::enable_if_t<!std::is_same_v<R, void>, R>
+runOnNewStack(unsigned StackSize, function_ref<R(Ts...)> Fn, Ts &&...Args) {
+ std::optional<R> Ret;
+ runOnNewStack(StackSize, [&]() { Ret = Fn(std::forward<Ts>(Args)...); });
+ return std::move(*Ret);
+}
+
+template <typename... Ts>
+void runOnNewStack(unsigned StackSize, function_ref<void(Ts...)> Fn,
+ Ts &&...Args) {
+ runOnNewStack(StackSize, [&]() { Fn(std::forward<Ts>(Args)...); });
+}
+
+} // namespace llvm
+
+#endif // LLVM_SUPPORT_PROGRAMSTACK_H
diff --git a/llvm/include/llvm/Support/thread.h b/llvm/include/llvm/Support/thread.h
index e3005fdb63175..ef2fba822cb1c 100644
--- a/llvm/include/llvm/Support/thread.h
+++ b/llvm/include/llvm/Support/thread.h
@@ -213,6 +213,7 @@ inline thread::id get_id() { return std::this_thread::get_id(); }
#else // !LLVM_ENABLE_THREADS
+#include "llvm/Support/ErrorHandling.h"
#include <utility>
namespace llvm {
diff --git a/llvm/lib/Support/CMakeLists.txt b/llvm/lib/Support/CMakeLists.txt
index 98ffd829d80b8..def37f3f278d0 100644
--- a/llvm/lib/Support/CMakeLists.txt
+++ b/llvm/lib/Support/CMakeLists.txt
@@ -295,6 +295,7 @@ add_llvm_component_library(LLVMSupport
Path.cpp
Process.cpp
Program.cpp
+ ProgramStack.cpp
RWMutex.cpp
Signals.cpp
Threading.cpp
diff --git a/llvm/lib/Support/CrashRecoveryContext.cpp b/llvm/lib/Support/CrashRecoveryContext.cpp
index f53aea177d612..88c38d7526e71 100644
--- a/llvm/lib/Support/CrashRecoveryContext.cpp
+++ b/llvm/lib/Support/CrashRecoveryContext.cpp
@@ -10,6 +10,7 @@
#include "llvm/Config/llvm-config.h"
#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/ExitCodes.h"
+#include "llvm/Support/ProgramStack.h"
#include "llvm/Support/Signals.h"
#include "llvm/Support/thread.h"
#include <cassert>
@@ -523,3 +524,13 @@ bool CrashRecoveryContext::RunSafelyOnThread(function_ref<void()> Fn,
CRC->setSwitchedThread();
return Info.Result;
}
+
+bool CrashRecoveryContext::RunSafelyOnNewStack(function_ref<void()> Fn,
+ unsigned RequestedStackSize) {
+#ifdef LLVM_HAS_SPLIT_STACKS
+ return runOnNewStack(RequestedStackSize,
+ function_ref<bool()>([&]() { return RunSafely(Fn); }));
+#else
+ return RunSafelyOnThread(Fn, RequestedStackSize);
+#endif
+}
diff --git a/llvm/lib/Support/ProgramStack.cpp b/llvm/lib/Support/ProgramStack.cpp
new file mode 100644
index 0000000000000..66f74ff660b20
--- /dev/null
+++ b/llvm/lib/Support/ProgramStack.cpp
@@ -0,0 +1,127 @@
+//===--- RunOnNewStack.cpp - Crash Recovery -------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/Support/ProgramStack.h"
+#include "llvm/Config/config.h"
+#include "llvm/Support/Compiler.h"
+
+#ifdef LLVM_ON_UNIX
+# include <sys/resource.h> // for getrlimit
+#endif
+
+#ifdef _MSC_VER
+# include <intrin.h> // for _AddressOfReturnAddress
+#endif
+
+#ifndef LLVM_HAS_SPLIT_STACKS
+# include "llvm/Support/thread.h"
+#endif
+
+using namespace llvm;
+
+uintptr_t llvm::getStackPointer() {
+#if __GNUC__ || __has_builtin(__builtin_frame_address)
+ return (uintptr_t)__builtin_frame_address(0);
+#elif defined(_MSC_VER)
+ return (uintptr_t)_AddressOfReturnAddress();
+#else
+ volatile char CharOnStack = 0;
+ // The volatile store here is intended to escape the local variable, to
+ // prevent the compiler from optimizing CharOnStack into anything other
+ // than a char on the stack.
+ //
+ // Tested on: MSVC 2015 - 2019, GCC 4.9 - 9, Clang 3.2 - 9, ICC 13 - 19.
+ char *volatile Ptr = &CharOnStack;
+ return (uintptr_t)Ptr;
+#endif
+}
+
+unsigned llvm::getDefaultStackSize() {
+#ifdef LLVM_ON_UNIX
+ rlimit RL;
+ getrlimit(RLIMIT_STACK, &RL);
+ return RL.rlim_cur;
+#else
+ // Clang recursively parses, instantiates templates, and evaluates constant
+ // expressions. We've found 8MiB to be a reasonable stack size given the way
+ // Clang works and the way C++ is commonly written.
+ return 8 << 20;
+#endif
+}
+
+// Not an anonymous namespace to avoid warning about undefined local function.
+namespace llvm {
+#ifdef LLVM_HAS_SPLIT_STACKS_AARCH64
+void runOnNewStackImpl(void *Stack, void (*Fn)(void *), void *Ctx) __asm__(
+ "_ZN4llvm17runOnNewStackImplEPvPFvS0_ES0_");
+
+// This can't use naked functions because there is no way to know if cfi
+// directives are being emitted or not.
+//
+// When adding new platforms it may be better to move to a .S file with macros
+// for dealing with platform differences.
+__asm__ (
+ ".globl _ZN4llvm17runOnNewStackImplEPvPFvS0_ES0_\n\t"
+ ".p2align 2\n\t"
+ "_ZN4llvm17runOnNewStackImplEPvPFvS0_ES0_:\n\t"
+ ".cfi_startproc\n\t"
+ "mov x16, sp\n\t"
+ "sub x0, x0, #0x20\n\t" // subtract space from stack
+ "stp xzr, x16, [x0, #0x00]\n\t" // save old sp
+ "stp x29, x30, [x0, #0x10]\n\t" // save fp, lr
+ "mov sp, x0\n\t" // switch to new stack
+ "add x29, x0, #0x10\n\t" // switch to new frame
+ ".cfi_def_cfa w29, 16\n\t"
+ ".cfi_offset w30, -8\n\t" // lr
+ ".cfi_offset w29, -16\n\t" // fp
+
+ "mov x0, x2\n\t" // Ctx is the only argument
+ "blr x1\n\t" // call Fn
+
+ "ldp x29, x30, [sp, #0x10]\n\t" // restore fp, lr
+ "ldp xzr, x16, [sp, #0x00]\n\t" // load old sp
+ "mov sp, x16\n\t"
+ "ret\n\t"
+ ".cfi_endproc"
+);
+#endif
+} // namespace llvm
+
+namespace {
+#ifdef LLVM_HAS_SPLIT_STACKS
+void callback(void *Ctx) {
+ (*reinterpret_cast<function_ref<void()> *>(Ctx))();
+}
+#endif
+} // namespace
+
+#ifdef LLVM_HAS_SPLIT_STACKS
+void llvm::runOnNewStack(unsigned StackSize, function_ref<void()> Fn) {
+ if (StackSize == 0)
+ StackSize = getDefaultStackSize();
+
+ // We use malloc here instead of mmap because:
+ // - it's simpler,
+ // - many malloc implementations will reuse the allocation in cases where
+  //   we're bouncing across the edge of a stack boundary, and
+  // - many malloc implementations will already provide guard pages for
+ // allocations this large.
+ void *Stack = malloc(StackSize);
+ void *BottomOfStack = (char *)Stack + StackSize;
+
+ runOnNewStackImpl(BottomOfStack, callback, &Fn);
+
+ free(Stack);
+}
+#else
+void llvm::runOnNewStack(unsigned StackSize, function_ref<void()> Fn) {
+ llvm::thread Thread(
+ StackSize == 0 ? std::nullopt : std::optional<unsigned>(StackSize), Fn);
+ Thread.join();
+}
+#endif
diff --git a/llvm/unittests/Support/CMakeLists.txt b/llvm/unittests/Support/CMakeLists.txt
index 4a12a928af119..76cdc0e11d40c 100644
--- a/llvm/unittests/Support/CMakeLists.txt
+++ b/llvm/unittests/Support/CMakeLists.txt
@@ -71,6 +71,7 @@ add_llvm_unittest(SupportTests
PerThreadBumpPtrAllocatorTest.cpp
ProcessTest.cpp
ProgramTest.cpp
+ ProgramStackTest.cpp
RecyclerTest.cpp
RegexTest.cpp
ReverseIterationTest.cpp
diff --git a/llvm/unittests/Support/ProgramStackTest.cpp b/llvm/unittests/Support/ProgramStackTest.cpp
new file mode 100644
index 0000000000000..31dfb3b88ade6
--- /dev/null
+++ b/llvm/unittests/Support/ProgramStackTest.cpp
@@ -0,0 +1,35 @@
+//===- unittest/Support/ProgramStackTest.cpp ------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/Support/ProgramStack.h"
+#include "llvm/Support/Process.h"
+#include "gtest/gtest.h"
+
+using namespace llvm;
+
+static uintptr_t func(int &A) {
+ A = 7;
+ return getStackPointer();
+}
+
+static void func2(int &A) {
+ A = 5;
+}
+
+TEST(ProgramStackTest, runOnNewStack) {
+ int A = 0;
+ uintptr_t Stack = runOnNewStack(0, function_ref<uintptr_t(int &)>(func), A);
+ EXPECT_EQ(A, 7);
+ intptr_t StackDiff = (intptr_t)llvm::getStackPointer() - (intptr_t)Stack;
+ size_t StackDistance = (size_t)std::abs(StackDiff);
+  // Page size is used as it's large enough to guarantee we're not on the same
+ // stack but not too large to cause spurious failures.
+ EXPECT_GT(StackDistance, llvm::sys::Process::getPageSizeEstimate());
+ runOnNewStack(0, function_ref<void(int &)>(func2), A);
+ EXPECT_EQ(A, 5);
+}
``````````
https://github.com/llvm/llvm-project/pull/136046