[clang] [clang][docs] Add documentation for EH codegen (PR #176236)
Andy Kaylor via cfe-commits
cfe-commits at lists.llvm.org
Fri Jan 16 11:35:22 PST 2026
https://github.com/andykaylor updated https://github.com/llvm/llvm-project/pull/176236
>From 0a3deb68485d7c79aa19be60483e5518956a929b Mon Sep 17 00:00:00 2001
From: Andy Kaylor <akaylor at nvidia.com>
Date: Thu, 15 Jan 2026 12:04:27 -0800
Subject: [PATCH 1/2] [clang][docs] Add documentation for EH codegen
This adds a document describing the implementation of LLVM IR generation
for exceptions and C++ cleanup handling. This will be used as a point of
reference for future CIR exception handling design work.
This document was generated using AI, with some modifications afterwards.
---
clang/docs/LLVMExceptionHandlingCodeGen.rst | 266 ++++++++++++++++++++
clang/docs/index.rst | 1 +
2 files changed, 267 insertions(+)
create mode 100644 clang/docs/LLVMExceptionHandlingCodeGen.rst
diff --git a/clang/docs/LLVMExceptionHandlingCodeGen.rst b/clang/docs/LLVMExceptionHandlingCodeGen.rst
new file mode 100644
index 0000000000000..3dbe2fc6dd618
--- /dev/null
+++ b/clang/docs/LLVMExceptionHandlingCodeGen.rst
@@ -0,0 +1,266 @@
+========================================
+LLVM IR Generation for EH and Cleanups
+========================================
+
+.. contents::
+ :local:
+
+Overview
+========
+
+This document describes how Clang's LLVM IR generation represents exception
+handling (EH) and C++ cleanups. It focuses on the data structures and control
+flow patterns used to model normal and exceptional exits, and it outlines how
+the generated IR differs across common ABI models.
+
+Core Model
+==========
+
+EH and cleanup handling is centered around an ``EHScopeStack`` that records
+nested scopes for:
+
+- **Cleanups**, which run on normal control flow, exceptional control flow, or
+ both. These are used for destructors, full-expression cleanups, and other
+ scope-exit actions.
+- **Catch scopes**, which represent ``try``/``catch`` handlers.
+- **Filter scopes**, used to model dynamic exception specifications and some
+ platform-specific filters.
+- **Terminate scopes**, used for ``noexcept`` and similar termination paths.
+
+Each cleanup is a small object with an ``Emit`` method. When a cleanup scope is
+popped, the IR generator decides whether it must materialize a normal cleanup
+block (for fallthrough, branch-through, or unresolved ``goto`` fixups) and/or an
+EH cleanup entry (when exceptional control flow can reach the cleanup). This
+results in a flattened CFG where cleanup lifetime is represented by the blocks
+and edges that flow into those blocks.
+
+Key Components
+==============
+
+The LLVM IR generation for EH and cleanups is spread across several core
+components:
+
+- ``CodeGenModule`` owns module-wide state such as the LLVM module, target
+ information, and the selected EH personality function. It provides access to
+ ABI helpers via ``CGCXXABI`` and target-specific hooks.
+- ``CodeGenFunction`` manages per-function state and IR building. It owns the
+ ``EHScopeStack``, tracks the current insertion point, and emits blocks, calls,
+ and branches. Most cleanup and EH control flow is built here.
+- ``EHScopeStack`` is the central stack of scopes used to model EH and cleanup
+ semantics. It stores ``EHCleanupScope`` entries for cleanups, along with
+ ``EHCatchScope``, ``EHFilterScope``, and ``EHTerminateScope`` for handlers and
+ termination logic.
+- ``EHCleanupScope`` stores the cleanup object plus state data (active flags,
+ fixup depth, and enclosing scope links). When a cleanup scope is popped,
+ ``CodeGenFunction`` decides whether to emit a normal cleanup block, an EH
+ cleanup entry, or both.
+- Cleanup emission helpers implement the mechanics of branching through
+ cleanups, threading fixups, and emitting cleanup blocks.
+- Exception emission helpers implement landing pads, dispatch blocks,
+ personality selection, and helper routines for try/catch, filters, and
+ terminate handling.
+- ``CGCXXABI`` (and its ABI-specific implementations such as
+ ``ItaniumCXXABI`` and ``MicrosoftCXXABI``) provide ABI-specific lowering for
+ throws, catch handling, and destructor emission details.
+- C++ expression, class, and statement emission logic drives construction and
+ destruction, and is responsible for pushing/popping cleanups in response to
+ AST constructs.
+
+These components interact along a consistent pattern: AST traversal in
+``CodeGenFunction`` emits code and pushes cleanups or EH scopes; ``EHScopeStack``
+records scope nesting; cleanup and exception helpers materialize the CFG as
+scopes are popped; and ``CGCXXABI`` supplies ABI-specific details for landing
+pads or funclets.
+
+Normal Cleanups and Branch Fixups
+=================================
+
+Normal control flow exits (``return``, ``break``, ``goto``, fallthrough, etc.)
+are threaded through cleanups by creating explicit cleanup blocks. The
+implementation supports unresolved branches to labels by emitting an optimistic
+branch and recording a fixup. When a cleanup is popped, fixups are threaded
+through the cleanup by turning that optimistic branch into a switch that
+dispatches to the correct destination after the cleanup runs.
+
+Cleanups use a switch on an internal "cleanup destination" slot even for simple
+source constructs. It is a general mechanism that allows multiple exits to share
+the same cleanup code while still reaching the correct final destination.
+
+Exceptional Cleanups and EH Dispatch
+====================================
+
+Exceptional exits (``throw``, ``invoke`` unwinds) are routed through EH cleanup
+entries, which are reached via a landing pad or a funclet dispatch block,
+depending on the target ABI.
+
+For Itanium-style EH (such as is used on x86-64 Linux), the IR uses ``invoke``
+to call potentially-throwing operations and a ``landingpad`` instruction to
+capture the exception and selector values. The landing pad aggregates the
+in-scope catch, filter, and cleanup clauses, then branches to a dispatch block
+that compares the selector to type IDs and jumps to the appropriate handler.
+
+For Windows, LLVM IR uses funclet-style EH: ``catchswitch`` and ``catchpad`` for
+handlers, and ``cleanuppad`` for cleanups, with ``catchret`` and ``cleanupret``
+edges to resume normal flow. The personality function determines how these pads
+are interpreted by the backend.
+
+Personality and ABI Selection
+=============================
+
+The IR generation selects a personality function based on language options and
+the target ABI (e.g., Itanium, MSVC SEH, SJLJ, Wasm EH). This decision affects:
+
+- Whether the IR uses landing pads or funclet pads.
+- The shape of dispatch logic for catch and filter scopes.
+- How termination or rethrow paths are modeled.
+
+Because the personality choice is made during IR generation, the CFG shape
+directly reflects ABI-specific details.
+
+Example: Array of Objects with Throwing Constructor
+===================================================
+
+Consider:
+
+.. code-block:: c++
+
+ class MyClass {
+ public:
+ MyClass(); // may throw
+ ~MyClass();
+ };
+ void doSomething(); // may throw
+ void f() {
+ MyClass arr[4];
+ doSomething();
+ }
+
+High-level behavior
+-------------------
+
+- Construction of ``arr`` proceeds element-by-element. If an element constructor
+ throws, destructors must run for any elements that were successfully
+ constructed before the throw, in reverse order of construction.
+- After full construction, the call to ``doSomething`` may throw, in which case
+ the destructors for all constructed elements must run, in reverse order.
+- On normal exit, destructors for all elements run in reverse order.
+
+Codegen flow and key components
+-------------------------------
+
+- ``CodeGenFunction::EmitDecl`` routes the local variable to
+ ``CodeGenFunction::EmitVarDecl`` and then ``CodeGenFunction::EmitAutoVarDecl``,
+ which in turn calls ``EmitAutoVarAlloca``, ``EmitAutoVarInit``, and
+ ``EmitAutoVarCleanups``.
+- ``CodeGenFunction::EmitCXXAggrConstructorCall`` emits the array constructor
+ loop. While emitting the loop body, it enters a ``RunCleanupsScope`` and uses
+ ``CodeGenFunction::pushRegularPartialArrayCleanup`` to register a
+ cleanup before calling ``CodeGenFunction::EmitCXXConstructorCall`` for one
+ element in the loop iteration. If this constructor were to throw an exception,
+ the cleanup handler would destroy the previously constructed elements in
+ reverse order.
+- ``CodeGenFunction::EmitAutoVarCleanups`` calls ``emitAutoVarTypeCleanup``,
+ which ultimately registers a ``DestroyObject`` cleanup via
+ ``CodeGenFunction::pushDestroy`` / ``pushFullExprCleanup`` for the full-array
+ destructor path.
+- ``DestroyObject`` uses ``CodeGenFunction::destroyCXXObject``, which emits the
+ actual destructor call via ``CodeGenFunction::EmitCXXDestructorCall``.
+- Cleanup emission helpers (e.g., ``CodeGenFunction::PopCleanupBlock`` and
+ ``CodeGenFunction::EmitBranchThroughCleanup``) thread both normal and EH exits
+ through the cleanup blocks as scopes are popped.
+- The cleanup is represented as an ``EHCleanupScope`` on ``EHScopeStack``, and
+ its ``Emit`` method generates a loop that calls the destructor on the
+ initialized range in reverse order.
+
+Call-Graph Summary
+------------------
+
+.. code-block:: text
+
+ EmitDecl
+ -> EmitVarDecl
+ -> EmitAutoVarDecl
+ -> EmitAutoVarAlloca
+ -> EmitAutoVarInit
+ -> EmitCXXAggrConstructorCall
+ -> RunCleanupsScope
+ -> pushRegularPartialArrayCleanup
+ -> EmitCXXConstructorCall (per element)
+ -> EmitAutoVarCleanups
+ -> emitAutoVarTypeCleanup
+ -> pushDestroy / pushFullExprCleanup
+ -> DestroyObject cleanup
+ -> destroyCXXObject
+ -> EmitCXXDestructorCall
+
+Example: Temporary object materialization
+=========================================
+
+Consider:
+
+.. code-block:: c++
+
+ class MyClass {
+ public:
+ MyClass();
+ ~MyClass();
+ };
+ void useMyClass(const MyClass &);
+ void f() {
+ useMyClass(MyClass());
+ }
+
+High-level behavior
+-------------------
+
+- The temporary ``MyClass`` is materialized for the call argument.
+- The temporary must be destroyed at the end of the full-expression, both on
+ the normal path and on the exceptional path if ``useMyClass`` throws.
+- If the constructor throws, the temporary is not considered constructed and no
+ destructor runs.
+
+Codegen flow and key functions
+------------------------------
+
+- ``CodeGenFunction::EmitExprWithCleanups`` wraps the full-expression in a
+ ``RunCleanupsScope`` so that full-expression cleanups are run after the call.
+- ``CodeGenFunction::EmitMaterializeTemporaryExpr`` creates storage for the
+ temporary via ``createReferenceTemporary`` and initializes it. For record
+ temporaries this flows through ``EmitAnyExprToMem`` and
+ ``CodeGenFunction::EmitCXXConstructExpr``, which calls
+ ``CodeGenFunction::EmitCXXConstructorCall``.
+- ``pushTemporaryCleanup`` registers the destructor as a full-expression
+ cleanup by calling ``CodeGenFunction::pushDestroy`` for
+ ``SD_FullExpression`` temporaries.
+- The cleanup ultimately uses ``DestroyObject`` and
+ ``CodeGenFunction::destroyCXXObject``, which emits
+ ``CodeGenFunction::EmitCXXDestructorCall``.
+- The call to ``useMyClass`` is emitted while the temporary is live, and the
+ cleanup scope ensures the destructor runs on both normal and EH exits.
+
+Call-Graph Summary
+------------------
+
+.. code-block:: text
+
+ EmitExprWithCleanups
+ -> RunCleanupsScope
+ -> EmitMaterializeTemporaryExpr
+ -> createReferenceTemporary
+ -> EmitAnyExprToMem
+ -> EmitCXXConstructExpr
+ -> EmitCXXConstructorCall
+ -> pushTemporaryCleanup
+ -> pushDestroy
+ -> DestroyObject cleanup
+ -> destroyCXXObject
+ -> EmitCXXDestructorCall
+
+Notes on Variations
+===================
+
+The exact shape of generated LLVM IR depends on target ABI, language options,
+and optimization level. For example, filters, ``noexcept`` termination scopes,
+and async EH options can introduce additional dispatch blocks, personality
+selection differences, or outlined helper functions. The patterns above capture
+the essential structure used for EH and cleanup handling on the named targets.
diff --git a/clang/docs/index.rst b/clang/docs/index.rst
index a0d0401ed1c86..c4464c4dbf0a2 100644
--- a/clang/docs/index.rst
+++ b/clang/docs/index.rst
@@ -122,6 +122,7 @@ Design Documents
ControlFlowIntegrityDesign
HardwareAssistedAddressSanitizerDesign.rst
ConstantInterpreter
+ LLVMExceptionHandlingCodeGen
ClangIRCodeDuplication
Indices and tables
>From d924d5ead2934436caf19c80e31b0276be65f824 Mon Sep 17 00:00:00 2001
From: Andy Kaylor <akaylor at nvidia.com>
Date: Fri, 16 Jan 2026 11:34:34 -0800
Subject: [PATCH 2/2] Address review feedback
---
clang/docs/LLVMExceptionHandlingCodeGen.rst | 139 +++++++++-----------
1 file changed, 62 insertions(+), 77 deletions(-)
diff --git a/clang/docs/LLVMExceptionHandlingCodeGen.rst b/clang/docs/LLVMExceptionHandlingCodeGen.rst
index 3dbe2fc6dd618..b3c995f0d60b4 100644
--- a/clang/docs/LLVMExceptionHandlingCodeGen.rst
+++ b/clang/docs/LLVMExceptionHandlingCodeGen.rst
@@ -13,6 +13,9 @@ handling (EH) and C++ cleanups. It focuses on the data structures and control
flow patterns used to model normal and exceptional exits, and it outlines how
the generated IR differs across common ABI models.
+For details on the LLVM IR representation of exception handling, see
+`LLVM Exception Handling <https://llvm.org/docs/ExceptionHandling.html>`_.
+
Core Model
==========
@@ -62,29 +65,44 @@ components:
- ``CGCXXABI`` (and its ABI-specific implementations such as
``ItaniumCXXABI`` and ``MicrosoftCXXABI``) provide ABI-specific lowering for
throws, catch handling, and destructor emission details.
-- C++ expression, class, and statement emission logic drives construction and
- destruction, and is responsible for pushing/popping cleanups in response to
- AST constructs.
-
-These components interact along a consistent pattern: AST traversal in
-``CodeGenFunction`` emits code and pushes cleanups or EH scopes; ``EHScopeStack``
-records scope nesting; cleanup and exception helpers materialize the CFG as
-scopes are popped; and ``CGCXXABI`` supplies ABI-specific details for landing
-pads or funclets.
-
-Normal Cleanups and Branch Fixups
-=================================
-
-Normal control flow exits (``return``, ``break``, ``goto``, fallthrough, etc.)
-are threaded through cleanups by creating explicit cleanup blocks. The
-implementation supports unresolved branches to labels by emitting an optimistic
-branch and recording a fixup. When a cleanup is popped, fixups are threaded
-through the cleanup by turning that optimistic branch into a switch that
-dispatches to the correct destination after the cleanup runs.
-
-Cleanups use a switch on an internal "cleanup destination" slot even for simple
-source constructs. It is a general mechanism that allows multiple exits to share
-the same cleanup code while still reaching the correct final destination.
+- The cleanup and exception handling code generation is driven by the flow of
+ ``CodeGenFunction`` and its helper classes traversing the AST to emit IR for
+ C++ expressions, classes, and statements.
+
+AST traversal in ``CodeGenFunction`` emits code and pushes cleanups or EH scopes;
+``EHScopeStack`` records scope nesting; cleanup and exception helpers materialize
+the CFG as scopes are popped; and ``CGCXXABI`` supplies ABI-specific details for
+landing pads or funclets.
+
+Cleanup Destination Routing
+===========================
+
+When multiple control flow exits (``return``, ``break``, ``continue``,
+fallthrough) pass through the same cleanup, the generated IR shares a single
+cleanup block among them. Before entering the cleanup, each exit path stores a
+unique index into a "cleanup destination" slot. After the cleanup code runs, a
+``switch`` instruction loads this index and dispatches to the appropriate final
+destination. This avoids duplicating cleanup code for each exit while preserving
+correct control flow.
+
+For example, if a function has both a ``return`` and a ``break`` that exit
+through the same destructor cleanup, both paths branch to the shared cleanup
+block after storing their respective destination indices. The cleanup epilogue
+then switches on the stored index to reach either the return block or the
+loop-exit block.
+
+When only a single exit passes through a cleanup (the common case), the switch
+is unnecessary and the cleanup block branches directly to its sole destination.
+
+Branch Fixups for Forward Gotos
+-------------------------------
+
+A ``goto`` statement that jumps forward to a label not yet seen poses a special
+problem. The destination's enclosing cleanup scope is unknown at the point the
+``goto`` is emitted. This is handled by emitting an optimistic branch and
+recording a "fixup." When the cleanup scope is later popped, any recorded fixups
+are resolved by rewriting the branch to thread through the cleanup block and
+adding the destination to the cleanup's switch.
Exceptional Cleanups and EH Dispatch
====================================
@@ -95,9 +113,10 @@ depending on the target ABI.
For Itanium-style EH (such as is used on x86-64 Linux), the IR uses ``invoke``
to call potentially-throwing operations and a ``landingpad`` instruction to
-capture the exception and selector values. The landing pad aggregates the
-in-scope catch, filter, and cleanup clauses, then branches to a dispatch block
-that compares the selector to type IDs and jumps to the appropriate handler.
+capture the exception and selector values. The landing pad aggregates any
+catch and cleanup clauses for the current scope, and branches to a dispatch
+block that compares the selector to type IDs and jumps to the appropriate
+handler.
For Windows, LLVM IR uses funclet-style EH: ``catchswitch`` and ``catchpad`` for
handlers, and ``cleanuppad`` for cleanups, with ``catchret`` and ``cleanupret``
@@ -107,12 +126,17 @@ are interpreted by the backend.
Personality and ABI Selection
=============================
-The IR generation selects a personality function based on language options and
-the target ABI (e.g., Itanium, MSVC SEH, SJLJ, Wasm EH). This decision affects:
+Each function with exception handling constructs is associated with a
+personality function (e.g., ``__gxx_personality_v0`` for C++ on Linux). The
+personality function determines the ABI-specific EH behavior of the
+function. The IR generation selects a personality function based on language
+options and the target ABI (e.g., Itanium, MSVC SEH, SJLJ, Wasm EH). This
+decision affects:
- Whether the IR uses landing pads or funclet pads.
- The shape of dispatch logic for catch and filter scopes.
- How termination or rethrow paths are modeled.
+- Whether certain helper functions such as exception filters must be outlined.
Because the personality choice is made during IR generation, the CFG shape
directly reflects ABI-specific details.
@@ -148,6 +172,9 @@ High-level behavior
Codegen flow and key components
-------------------------------
+- The surrounding compound statement enters a ``CodeGenFunction::LexicalScope``,
+ which is a ``RunCleanupsScope`` and is responsible for popping local cleanups
+ at the end of the block.
- ``CodeGenFunction::EmitDecl`` routes the local variable to
``CodeGenFunction::EmitVarDecl`` and then ``CodeGenFunction::EmitAutoVarDecl``,
which in turn calls ``EmitAutoVarAlloca``, ``EmitAutoVarInit``, and
@@ -172,26 +199,9 @@ Codegen flow and key components
its ``Emit`` method generates a loop that calls the destructor on the
initialized range in reverse order.
-Call-Graph Summary
-------------------
-
-.. code-block:: text
-
- EmitDecl
- -> EmitVarDecl
- -> EmitAutoVarDecl
- -> EmitAutoVarAlloca
- -> EmitAutoVarInit
- -> EmitCXXAggrConstructorCall
- -> RunCleanupsScope
- -> pushRegularPartialArrayCleanup
- -> EmitCXXConstructorCall (per element)
- -> EmitAutoVarCleanups
- -> emitAutoVarTypeCleanup
- -> pushDestroy / pushFullExprCleanup
- -> DestroyObject cleanup
- -> destroyCXXObject
- -> EmitCXXDestructorCall
+The function names and flow above are accurate as of LLVM 22.0. They may change
+as the code evolves, and this document might not be updated to reflect the
+exact functions used.
Example: Temporary object materialization
=========================================
@@ -235,32 +245,7 @@ Codegen flow and key functions
- The cleanup ultimately uses ``DestroyObject`` and
``CodeGenFunction::destroyCXXObject``, which emits
``CodeGenFunction::EmitCXXDestructorCall``.
-- The call to ``useMyClass`` is emitted while the temporary is live, and the
- cleanup scope ensures the destructor runs on both normal and EH exits.
-
-Call-Graph Summary
-------------------
-
-.. code-block:: text
-
- EmitExprWithCleanups
- -> RunCleanupsScope
- -> EmitMaterializeTemporaryExpr
- -> createReferenceTemporary
- -> EmitAnyExprToMem
- -> EmitCXXConstructExpr
- -> EmitCXXConstructorCall
- -> pushTemporaryCleanup
- -> pushDestroy
- -> DestroyObject cleanup
- -> destroyCXXObject
- -> EmitCXXDestructorCall
-
-Notes on Variations
-===================
-
-The exact shape of generated LLVM IR depends on target ABI, language options,
-and optimization level. For example, filters, ``noexcept`` termination scopes,
-and async EH options can introduce additional dispatch blocks, personality
-selection differences, or outlined helper functions. The patterns above capture
-the essential structure used for EH and cleanup handling on the named targets.
+
+The function names and flow above are accurate as of LLVM 22.0. They may change
+as the code evolves, and this document might not be updated to reflect the
+exact functions used.