[clang] [ClangIR] Add ABI Lowering Design Document (PR #178326)
Andy Kaylor via cfe-commits
cfe-commits at lists.llvm.org
Fri Feb 13 14:57:40 PST 2026
================
@@ -0,0 +1,602 @@
+# ClangIR ABI Lowering - Design Document
+
+## 1. Introduction
+
+This design describes calling convention lowering that **builds on the GSoC ABI
+Lowering Library** (PR #140112): we use its `abi::Type*` and target ABI logic
+and add an MLIR integration layer (MLIRTypeMapper, ABI lowering pass, and
+dialect rewriters). The framework relies on the LLVM ABI library in
+`llvm/lib/ABI/` as the single source of truth for ABI classification; MLIR
+dialects use it via an adapter layer. The design enables CIR to perform
+ABI-compliant calling convention lowering, be reusable by other MLIR dialects
+(particularly FIR), and achieve parity with the CIR incubator for x86_64 and
+AArch64. **What the design is, in concrete terms:** inputs are high-level
+function signatures in CIR, FIR, or other MLIR dialects; outputs are ABI-lowered
+signatures and call sites; lowering runs as an MLIR pass in the compilation
+pipeline, before dialect lowering to LLVM IR or other back ends.
+
+### 1.1 Problem Statement
+
+Calling convention lowering is currently implemented separately for each MLIR
+dialect that needs it. The CIR incubator has a partial implementation, but it's
+tightly coupled to CIR-specific types and operations, making it unsuitable for
+reuse by other dialects. This means that FIR (Fortran IR) and future MLIR
+dialects would need to duplicate this complex logic. While Classic Clang
+CodeGen contains mature ABI lowering code, it cannot be reused directly because
+it's tightly coupled to Clang's AST representation and LLVM IR generation.
+
+### 1.2 Design Goals
+
+Building on the GSoC library and adding an MLIR integration layer avoids
+duplicating complex ABI logic across MLIR dialects, reduces maintenance, and
+keeps a single source of truth for ABI classification in `llvm/lib/ABI/`.
+Separating classification (the GSoC library) from rewriting (the
+dialect-specific `ABIRewriteContext`) makes each part easier to test in
+isolation. It also gives a straightforward migration path from the CIR
+incubator: useful algorithms are ported into the GSoC library where
+appropriate.
+
+A central goal is that generated code be **call-compatible with Classic Clang
+CodeGen** (and other compilers). Parity is with Classic Clang CodeGen output,
+not only with the incubator. Success means CIR correctly lowers x86_64 and
+AArch64 calling conventions with full ABI compliance using the GSoC library
+and MLIR integration layer; FIR can adopt the same infrastructure with minimal
+dialect-specific adaptation (e.g. cdecl when calling C from Fortran). ABI
+compliance will be validated through differential testing against Classic Clang
+CodeGen, and performance overhead should remain under 5% compared to a direct,
+dialect-specific implementation. Initial scope focuses on fixed-argument
+functions; variadic support (varargs) is deferred.
+
+## 2. Background and Context
+
+### 2.1 What is Calling Convention Lowering?
+
+Calling convention lowering transforms high-level function signatures to match
+target ABI (Application Binary Interface) requirements. When a function is
+declared at the source level with convenient, language-level types, these types
+must be translated into the specific register assignments, memory layouts, and
+calling sequences that the target architecture expects. For example, under the
+x86_64 System V ABI, a struct containing two 64-bit integers might be
+"expanded" into two separate arguments passed in registers, rather than being
+passed as a single aggregate:
+
+```
+// High-level CIR
+func @foo(i32, struct<i64, i64>) -> i32
+
+// After ABI lowering
+func @foo(i32 %arg0, i64 %arg1, i64 %arg2) -> i32
+//        ^          ^          ^
+//        |          |          +---- second field (struct expanded into fields)
+//        |          +---- first field passed in register
+//        +---- small integer passed in register
+```
+
+Calling convention lowering is complex for several reasons: it is highly
+target-specific (each architecture has different rules for registers vs.
+memory), type-dependent (rules differ for integers, floats, structs, unions,
+arrays), and context-sensitive (varargs, virtual calls, conventions like
+vectorcall or preserve_most). The same target may have multiple ABI variants
+(e.g. x86_64 System V vs. Windows x64), adding further complexity.
+
+### 2.2 Existing Implementations
+
+#### Classic Clang CodeGen
+
+Classic Clang CodeGen (located in `clang/lib/CodeGen/`) transforms calling
+conventions during the AST-to-LLVM-IR lowering process. This implementation is
+mature and well-tested, handling all supported targets with comprehensive ABI
+coverage. However, it's tightly coupled to both Clang's AST representation and
+LLVM IR, making it difficult to reuse for MLIR-based frontends.
+
+#### CIR Incubator
+
+The CIR incubator includes a calling convention lowering pass in
+`clang/lib/CIR/Dialect/Transforms/TargetLowering/` that transforms CIR
+operations into ABI-lowered CIR operations as an MLIR pass. This implementation
+successfully adapted logic from Classic Clang CodeGen to work within the MLIR
+framework. However, it relies on CIR-specific types and operations, preventing
+reuse by other MLIR dialects.
+
+#### GSoC ABI Lowering Library
+
+A 2025 Google Summer of Code project produced [PR
+#140112](https://github.com/llvm/llvm-project/pull/140112), which proposes
+extracting Clang's ABI logic into a reusable library in `llvm/lib/ABI/`. The
+design centers on a shadow type system (`abi::Type*`) separate from both Clang's
+AST types and LLVM IR types, enabling the ABI classification algorithms to work
+independently of any specific frontend representation. The library includes
+abstract `ABIInfo` base classes and target-specific implementations (e.g.
+x86_64, BPF) and provides QualTypeMapper for Clang to map `QualType` to
+`abi::Type*`.
+
+Our approach is to complete and extend this library and use it as the single
+source of truth for ABI classification. One implementation in one place reduces
+duplication, simplifies bug fixes, and creates a path for Classic Clang CodeGen
+to use the same logic in the future. MLIR dialects (CIR, FIR, and others) will
+use the library via an adapter layer rather than reimplementing ABI logic.
+
+**Current state.** The x86_64 implementation is largely complete and under
+review. AArch64 and some other targets are not yet implemented, and there is
+no MLIR integration today. The work is being upstreamed in smaller parts (e.g.
+[PR 158329](https://github.com/llvm/llvm-project/pull/158329)); progress is
+limited by reviewer bandwidth, so our approach depends on the GSoC library
+being merged upstream or on our contributions to it being accepted. The
+overhead of the shadow type system (converting to and from `abi::Type*`) has
+been measured at under 0.1% for `clang -O0`, so it should be negligible for
+CIR.
+
+**Our approach.** We will complete and extend the GSoC library (e.g. AArch64,
+review feedback, tests) and add an **MLIR integration layer** so that MLIR
+dialects can use it:
+
+- **MLIRTypeMapper**: maps `mlir::Type` to `abi::Type*`, analogous to
+  QualTypeMapper for Clang.
+
+- **MLIR ABI lowering pass**: uses the library's `ABIInfo` for classification,
+  then performs dialect-specific rewriting via `ABIRewriteContext` for CIR, FIR,
+  and other dialects.
+
+The CIR incubator serves as a **reference only** (e.g. for AArch64 algorithms).
+We do not upstream the incubator's CIR-specific ABI implementation as the
+long-term solution; we port useful algorithms into the GSoC library where
+appropriate.
+
+### 2.3 Requirements for MLIR Dialects
+
+CIR needs to lower C/C++ calling conventions correctly, with initial support for
+x86_64 and AArch64 targets. It must handle structs, unions, and complex types,
+as well as support instance methods and virtual calls. FIR's initial need is
+**cdecl for calling C from Fortran** (C interop); that is in scope.
+Fortran-specific ABI semantics (e.g. CHARACTER hidden length parameters, array
+descriptors) are out of initial scope; full Fortran ABI lowering is a broader
+goal. Both dialects share common requirements: strict target ABI compliance,
+efficient lowering with minimal overhead, extensibility to new target
+architectures, and comprehensive testing and validation.
+
+## 3. Proposed Solution
+
+**Core.** The GSoC library in `llvm/lib/ABI/` performs ABI classification on
+`abi::Type*`. It provides `ABIInfo` and target-specific implementations
+(x86_64, BPF, and eventually AArch64 and others). This is the single place
+where ABI rules are implemented.
+
+**MLIR side.** To use this library from MLIR dialects we add an integration
+layer: (1) **MLIRTypeMapper** maps `mlir::Type` to `abi::Type*` (analogous to
+QualTypeMapper for Clang). (2) A **generic ABI lowering pass** invokes the
+library's `ABIInfo` for classification, then (3) performs **dialect-specific
+rewriting** via the `ABIRewriteContext` interface—each dialect (CIR, FIR, etc.)
+implements only the glue to create its own operations (e.g. `cir.call`,
+`fir.call`). Classification logic is shared; operation creation is
+dialect-specific.
+
+The following diagram shows the layering. At the top, the GSoC library holds
+the ABI logic. In the middle, adapters connect frontends to it: Classic Clang
+CodeGen uses QualTypeMapper; MLIR uses MLIRTypeMapper and the ABI lowering pass.
+At the bottom, each dialect implements `ABIRewriteContext` only; FIR is shown as
+a consumer for cdecl/C interop (e.g. calling C from Fortran).
+
+```
+┌─────────────────────────────────────────────────────────────────┐
+│  GSoC ABI Library (llvm/lib/ABI/)                               │
+│  ABIInfo, abi::Type*, target implementations (X86, AArch64,…)   │
+└─────────────────────────────────────────────────────────────────┘
+                              │
+            ┌─────────────────┴─────────────────┐
+            │                                   │
+            ▼                                   ▼
+┌───────────────────────┐         ┌───────────────────────────────┐
+│  Classic CodeGen      │         │  MLIR adapter                 │
+│  QualTypeMapper       │         │  MLIRTypeMapper + ABI pass    │
+└───────────────────────┘         └───────────────────────────────┘
+                                                  │
+                                 ┌────────────────┼────────────────┐
+                                 │                │                │
+                                 ▼                ▼                ▼
+                           ┌────────────┐   ┌────────────┐   ┌────────────┐
+                           │    CIR     │   │    FIR     │   │   Future   │
+                           │ ABIRewrite │   │  (cdecl/C  │   │  Dialects  │
+                           │  Context   │   │  interop)  │   │            │
+                           └────────────┘   └────────────┘   └────────────┘
+```
+
+## 4. Design Overview
+
+### 4.1 Architecture Diagram
+
+The following diagram shows how the design builds on the GSoC library (Section
+3). At the top, GSoC holds the ABI classification logic. The middle layer
+adapts MLIR to GSoC: MLIRTypeMapper converts `mlir::Type` to `abi::Type*`, and
+the MLIR ABI lowering pass invokes GSoC's `ABIInfo` and uses the classification
+to drive rewriting. At the bottom, each dialect implements only
+`ABIRewriteContext` for operation creation; there is no separate type
+abstraction layer in MLIR for classification—that lives in GSoC.
+
+```
+┌─────────────────────────────────────────────────────────────────────────┐
+│  GSoC ABI Library (llvm/lib/ABI/): single source of truth               │
+│  abi::Type*, ABIInfo, target implementations (X86_64, AArch64, …)       │
+│  Input: abi::Type* → Output: classification (ABIArgInfo, etc.)          │
+└─────────────────────────────────────────────────────────────────────────┘
+                                     │
+                                     ▼
+┌─────────────────────────────────────────────────────────────────────────┐
+│  MLIR adapter                                                           │
+│  MLIRTypeMapper (mlir::Type → abi::Type*) + MLIR ABI lowering pass      │
+│  (1) Map types  (2) Call GSoC ABIInfo  (3) Drive rewriting from         │
+│                                            classification result        │
+└─────────────────────────────────────────────────────────────────────────┘
+                                     │
+                   ┌─────────────────┼─────────────────┐
+                   ▼                 ▼                 ▼
+             ┌────────────┐    ┌────────────┐    ┌────────────┐
+             │    CIR     │    │    FIR     │    │   Future   │
+             │ ABIRewrite │    │ ABIRewrite │    │  Dialects  │
+             │  Context   │    │  Context   │    │            │
+             └────────────┘    └────────────┘    └────────────┘
+             Dialect-specific operation creation only (no type
+             abstraction for classification in MLIR)
+```
+
+### 4.2 GSoC, Adapter, and Dialect Layers
+
+The architecture has three parts. **GSoC** (`llvm/lib/ABI/`) is the single
+source of truth for ABI classification: it operates on `abi::Type*` and produces
+classification results (e.g. ABIArgInfo, ABIFunctionInfo as defined in GSoC).
+Target-specific `ABIInfo` implementations (X86_64, AArch64, etc.) live there.
+The **adapter layer** is MLIR-specific: MLIRTypeMapper maps `mlir::Type` to
+`abi::Type*`, and the MLIR ABI lowering pass (1) maps types, (2) calls GSoC's
+ABIInfo, and (3) uses the classification to drive rewriting. The **dialect
+layer** is only ABIRewriteContext: each dialect (CIR, FIR) implements operation
+creation (createFunction, createCall, createExtractValue, etc.). There is no
+type abstraction layer in MLIR for classification; type queries for ABI are
+performed on `abi::Type*` inside GSoC.
+
+### 4.3 Key Components
+
+The framework is built from the following components:
+
+- **GSoC** (`llvm/lib/ABI/`): the single source of truth for ABI
+  classification. It provides the `abi::Type*` type system, the `ABIInfo` base
+  and target-specific implementations (e.g. X86_64, AArch64), and the
+  classification result types (e.g. `ABIArgInfo`, `ABIFunctionInfo`).
+
+- **MLIRTypeMapper**: maps `mlir::Type` to `abi::Type*` so that MLIR dialect
+  types can be classified by GSoC.
+
+- **MLIR ABI lowering pass**: orchestrates the flow. It uses MLIRTypeMapper,
+  calls GSoC's `ABIInfo`, and drives rewriting from the classification result.
+
+- **ABIRewriteContext**: the dialect-specific interface for operation creation
+  (each dialect implements it to produce e.g. `cir.call`, `fir.call`).
+
+- **Target registry** (or equivalent): selects the appropriate GSoC `ABIInfo`
+  for the compilation target.
+
+There is no ABITypeInterface or separate "ABIInfo in MLIR"; classification
+lives entirely in GSoC.
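
One way to picture the target registry is a table from architecture name to a factory for the matching classifier. The sketch below is an illustration only, not code from the patch or from `llvm/lib/ABI/`: `ToyABIInfo`, `ToyTargetRegistry`, and `makeDefaultRegistry` are hypothetical names, and a real registry would presumably key on the module's target triple rather than a bare architecture string.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a target-specific classifier. The real
// llvm::abi::ABIInfo interface is not reproduced here.
struct ToyABIInfo {
  virtual ~ToyABIInfo() = default;
  virtual std::string name() const = 0;
};

struct ToyX86_64ABIInfo : ToyABIInfo {
  std::string name() const override { return "x86_64-sysv"; }
};

struct ToyAArch64ABIInfo : ToyABIInfo {
  std::string name() const override { return "aarch64-aapcs64"; }
};

// Hypothetical registry keyed by architecture name.
class ToyTargetRegistry {
public:
  using Factory = std::function<std::unique_ptr<ToyABIInfo>()>;

  void add(const std::string &arch, Factory factory) {
    assert(factory && "factory must be callable");
    factories[arch] = std::move(factory);
  }

  // Returns nullptr when no classifier is registered for `arch`.
  std::unique_ptr<ToyABIInfo> create(const std::string &arch) const {
    auto it = factories.find(arch);
    if (it == factories.end())
      return nullptr;
    return it->second();
  }

private:
  std::map<std::string, Factory> factories;
};

inline ToyTargetRegistry makeDefaultRegistry() {
  ToyTargetRegistry registry;
  registry.add("x86_64", [] { return std::make_unique<ToyX86_64ABIInfo>(); });
  registry.add("aarch64", [] { return std::make_unique<ToyAArch64ABIInfo>(); });
  return registry;
}
```

Returning `nullptr` for an unregistered target lets the pass report an unsupported target instead of silently picking the wrong rules.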
+
+### 4.4 ABI Lowering Flow: How the Pieces Fit Together
+
+This section describes the end-to-end flow of ABI lowering, showing how all
+interfaces and components work together.
+
+#### Step 1: Function Signature Analysis
+
+The ABI lowering pass begins by analyzing the function signature. When it
+encounters a function operation, it extracts the parameter types and return type
+to prepare them for classification. At this stage, the types are still in their
+high-level, dialect-specific form (e.g., `!cir.struct` for CIR, or `!fir.type`
+for FIR). The pass collects these types into a list that will be fed to the
+classification logic in the next step.
+
+```
+Input: func @foo(%arg0: !cir.int<u, 32>,
+                 %arg1: !cir.struct<{!cir.int<u, 64>,
+                                     !cir.int<u, 64>}>) -> !cir.int<u, 32>
+```
+
+#### Step 2: Type Mapping via MLIRTypeMapper
+
+For each argument and the return type, the pass maps `mlir::Type` to
+`abi::Type*` using MLIRTypeMapper. The mapper produces the representation that
+GSoC's `ABIInfo` expects. When classification yields a coercion type, the
+mapper can optionally map it back to an MLIR type.
+
+```cpp
+// Map dialect types to GSoC's type system
+MLIRTypeMapper mlirTypeMapper(module.getDataLayout());
+abi::Type *arg0Abi = mlirTypeMapper.map(arg0Type); // i32 -> IntegerType
+abi::Type *arg1Abi = mlirTypeMapper.map(arg1Type); // struct -> RecordType
+abi::Type *retAbi = mlirTypeMapper.map(returnType);
+```
+
+**Key Point**: Classification runs in GSoC on `abi::Type*`; MLIRTypeMapper is
+the only bridge from dialect types to that representation.
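
To make the mapping step concrete, here is a minimal, self-contained sketch of what such a mapper might do. It does not use MLIR or the GSoC library: `SrcType`, `MappedType`, and `ToyTypeMapper` are hypothetical stand-ins, and the layout rule (integers aligned to their own size, struct fields laid out in order with padding) is a simplifying assumption rather than a real data layout query.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical dialect-side type: an integer of `bits` width, or a struct.
struct SrcType {
  bool isStruct = false;
  uint64_t bits = 0;           // used when !isStruct
  std::vector<SrcType> fields; // used when isStruct
};

// Hypothetical ABI-side type, loosely modelled on the idea of abi::Type.
struct MappedType {
  enum Kind { Integer, Record } kind;
  uint64_t sizeInBits;
  uint64_t alignInBits;
  std::vector<const MappedType *> fields;
};

class ToyTypeMapper {
public:
  // Maps a source type to an ABI-side type owned by this mapper.
  const MappedType *map(const SrcType &t) {
    if (!t.isStruct) // assumption: integers are naturally aligned
      return make({MappedType::Integer, t.bits, t.bits, {}});
    MappedType rec{MappedType::Record, 0, 8, {}};
    for (const SrcType &f : t.fields) {
      const MappedType *m = map(f);
      // Pad to the field's alignment, then append the field.
      rec.sizeInBits = alignTo(rec.sizeInBits, m->alignInBits) + m->sizeInBits;
      rec.alignInBits = std::max(rec.alignInBits, m->alignInBits);
      rec.fields.push_back(m);
    }
    rec.sizeInBits = alignTo(rec.sizeInBits, rec.alignInBits); // tail padding
    return make(std::move(rec));
  }

private:
  static uint64_t alignTo(uint64_t value, uint64_t align) {
    assert(align > 0);
    return (value + align - 1) / align * align;
  }

  const MappedType *make(MappedType t) {
    owned.push_back(std::make_unique<MappedType>(std::move(t)));
    return owned.back().get();
  }

  std::vector<std::unique_ptr<MappedType>> owned; // keeps pointers stable
};
```

The mapper owns every node it creates, so the raw pointers it hands out stay valid for as long as the mapper lives, which mirrors the pointer-based style of `abi::Type*`.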
+
+#### Step 3: ABI Classification (GSoC ABIInfo)
+
+GSoC's target-specific `ABIInfo` (e.g. X86_64) performs classification on
+`abi::Type*` and produces GSoC's classification result (e.g. ABIFunctionInfo
+and ABIArgInfo as defined in `llvm/lib/ABI/`):
+
+```cpp
+// Pass holds a GSoC ABIInfo (from target registry or module target)
+llvm::abi::ABIInfo *abiInfo = getABIInfo(); // e.g. X86_64
+llvm::abi::ABIFunctionInfo abiFI;
+abiInfo->computeInfo(abiFI, arg0Abi, arg1Abi, retAbi);
+// For struct<i64,i64> on x86_64: produces Expand (two i64 args)
+```
+
+Output: GSoC's classification (e.g. ABIFunctionInfo) for all arguments and
+return:
+- `%arg0 (i32)` → Direct (pass as-is)
+- `%arg1 (struct)` → Expand (split into two i64 fields)
+- Return type → Direct
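
The classification step can be illustrated with a deliberately simplified classifier that uses the document's three outcomes. This is not the x86_64 System V algorithm (which classifies each eightbyte into register classes) and not the GSoC API: `ClsType`, `ArgKind`, `classifyToyArg`, `classifyToyReturn`, and the 128-bit threshold are assumptions chosen only to reproduce the example above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified ABI-side type.
struct ClsType {
  bool isRecord;
  uint64_t sizeInBits;
  std::vector<ClsType> fields;
};

enum class ArgKind { Direct, Expand, Indirect };

struct ToyFunctionInfo {
  ArgKind ret;
  std::vector<ArgKind> args;
};

// Toy argument rule: scalars are Direct, flat records of at most 128 bits
// are Expand, everything else is passed Indirect (in memory).
inline ArgKind classifyToyArg(const ClsType &t) {
  assert(t.isRecord || t.fields.empty());
  if (!t.isRecord)
    return ArgKind::Direct;
  if (t.sizeInBits > 128)
    return ArgKind::Indirect;
  for (const ClsType &f : t.fields)
    if (f.isRecord)
      return ArgKind::Indirect; // keep the toy rule flat
  return ArgKind::Expand;
}

// Toy return rule: values of at most 128 bits are Direct, larger ones are
// returned Indirect through a caller-provided buffer.
inline ArgKind classifyToyReturn(const ClsType &t) {
  return t.sizeInBits > 128 ? ArgKind::Indirect : ArgKind::Direct;
}

inline ToyFunctionInfo computeToyInfo(const ClsType &ret,
                                      const std::vector<ClsType> &args) {
  ToyFunctionInfo info{classifyToyReturn(ret), {}};
  for (const ClsType &a : args)
    info.args.push_back(classifyToyArg(a));
  return info;
}
```

For the running example, `computeToyInfo` yields Direct for the `i32` argument and return value and Expand for the two-field struct, matching the list above.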
+
+#### Step 4: Function Signature Rewriting
+
+After GSoC's classification is complete, the pass rewrites the function to match
+the ABI requirements using the dialect's `ABIRewriteContext`. The
+classification result (from GSoC) describes the lowered signature; the rewrite
+context creates the actual dialect operations. For example, if a struct is
+classified as "Expand", the new function signature will have multiple scalar
+parameters instead of the single struct parameter.
+
+```cpp
+ABIRewriteContext &ctx = getDialectRewriteContext();
+
+// Create new function with lowered signature
+FunctionType newType = ...; // (i32, i64, i64) -> i32
+Operation *newFunc = ctx.createFunction(loc, "foo", newType);
+```
+
+**Key Point**: The original function had signature `(i32, struct) -> i32`, but
+the ABI-lowered function has signature `(i32, i64, i64) -> i32` with the struct
+expanded into its constituent fields.
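
How the `...` in `newType` might be filled in can be sketched as a walk over the classified parameters. The code below is an assumption-laden illustration using plain strings for types: `PassKind`, `ParamInfo`, and `lowerParams` are hypothetical names, and spelling an indirect parameter as `ptr` is a placeholder for whatever pointer type the dialect uses.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class PassKind { Direct, Expand, Indirect };

// Hypothetical per-parameter record: original type plus its classification.
struct ParamInfo {
  std::string type;                // e.g. "i32" or "struct<i64,i64>"
  PassKind kind;
  std::vector<std::string> fields; // field types, used for Expand
};

// Builds the lowered parameter list: Direct keeps the type, Expand splices
// in the field types, Indirect becomes a pointer.
inline std::vector<std::string>
lowerParams(const std::vector<ParamInfo> &params) {
  std::vector<std::string> lowered;
  for (const ParamInfo &p : params) {
    switch (p.kind) {
    case PassKind::Direct:
      lowered.push_back(p.type);
      break;
    case PassKind::Expand:
      assert(!p.fields.empty() && "Expand needs at least one field");
      lowered.insert(lowered.end(), p.fields.begin(), p.fields.end());
      break;
    case PassKind::Indirect:
      lowered.push_back("ptr");
      break;
    }
  }
  return lowered;
}
```

Applied to `(i32, struct<i64,i64>)` with the classifications from Step 3, this produces `(i32, i64, i64)`.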
+
+#### Step 5: Argument Expansion
+
+With the function signature rewritten, the pass updates all call sites to match
+the new signature, using the classification from GSoC to drive rewriting via
+`ABIRewriteContext`. For arguments classified as "Expand", the pass breaks down
+the aggregate into its constituent parts (e.g. struct into two i64 values).
+The rewrite context provides operations to extract fields and construct the new
+call with the expanded argument list.
+
+```cpp
+// Original call: call @foo(%val0, %structVal)
+// Need to extract struct fields:
+
+Value field0 = ctx.createExtractValue(loc, structVal, {0}); // extract 1st i64
+Value field1 = ctx.createExtractValue(loc, structVal, {1}); // extract 2nd i64
+
+// New call with expanded arguments
+ctx.createCall(loc, newFunc, {resultType}, {val0, field0, field1});
+```
+
+**Key Point**: `ABIRewriteContext` abstracts the dialect-specific operation
+creation, so the lowering logic doesn't need to know about CIR operations.
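
The split between shared lowering logic and dialect-specific operation creation can be sketched with a small abstract interface and a mock dialect. The method names follow the document, but everything else is an assumption: values are plain strings, `ToyRewriteContext` and `PrintingContext` are hypothetical, and the printed op spellings (such as `cir.extract`) are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, string-based stand-in for the ABIRewriteContext idea.
class ToyRewriteContext {
public:
  virtual ~ToyRewriteContext() = default;
  virtual std::string createExtractValue(const std::string &aggregate,
                                         unsigned index) = 0;
  virtual std::string createCall(const std::string &callee,
                                 const std::vector<std::string> &args) = 0;
};

// Mock "dialect" that records the ops it would create as text.
class PrintingContext : public ToyRewriteContext {
public:
  explicit PrintingContext(std::string prefix) : prefix(std::move(prefix)) {}

  std::string createExtractValue(const std::string &aggregate,
                                 unsigned index) override {
    std::string result = "%f" + std::to_string(ops.size());
    ops.push_back(result + " = " + prefix + ".extract " + aggregate + "[" +
                  std::to_string(index) + "]");
    return result;
  }

  std::string createCall(const std::string &callee,
                         const std::vector<std::string> &args) override {
    std::string op = prefix + ".call @" + callee + "(";
    for (std::size_t i = 0; i < args.size(); ++i)
      op += (i ? ", " : "") + args[i];
    ops.push_back(op + ")");
    return ops.back();
  }

  std::vector<std::string> ops; // ops "created" so far, in order

private:
  std::string prefix;
};

// Dialect-independent logic: pass `scalar` directly and expand `aggregate`
// into `numFields` separate arguments.
inline void callWithExpandedStruct(ToyRewriteContext &ctx,
                                   const std::string &callee,
                                   const std::string &scalar,
                                   const std::string &aggregate,
                                   unsigned numFields) {
  assert(numFields > 0);
  std::vector<std::string> args{scalar};
  for (unsigned i = 0; i < numFields; ++i)
    args.push_back(ctx.createExtractValue(aggregate, i));
  ctx.createCall(callee, args);
}
```

`callWithExpandedStruct` never names a dialect, so constructing `PrintingContext("fir")` instead of `PrintingContext("cir")` changes only the emitted op names, not the lowering logic.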
+
+#### Step 6: Return Value Handling
+
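+For functions returning large structs, the return value is classified as
+Indirect: the caller allocates storage for the result, passes a pointer to it
+as a hidden (sret) argument, and loads the result after the call: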
+
+```cpp
+// If return type is classified as Indirect:
+Value sretPtr = ctx.createAlloca(loc, retType, alignment);
+ctx.createCall(loc, func, {}, {sretPtr, ...otherArgs});
+Value result = ctx.createLoad(loc, sretPtr);
+```
+
+#### Complete Flow Diagram
+
+```
+┌─────────────────────────────────────────────────────────────────┐
+│  Input: High-Level Function (CIR/FIR/other dialect)             │
+│  func @foo(%arg0: i32, %arg1: struct<i64,i64>) -> i32           │
+└────────────────────────┬────────────────────────────────────────┘
+                         │
+                         ▼
+┌─────────────────────────────────────────────────────────────────┐
+│  Step 1: Extract Types                                          │
+│  For each parameter: mlir::Type                                 │
+└────────────────────────┬────────────────────────────────────────┘
+                         │
+                         ▼
+┌─────────────────────────────────────────────────────────────────┐
+│  Step 2: Map Types (MLIRTypeMapper → abi::Type*)                │
+│  mlirTypeMapper.map(argType) → abi::Type*                       │
+│  └─> Dialect types converted for GSoC                           │
+└────────────────────────┬────────────────────────────────────────┘
+                         │
+                         ▼
+┌─────────────────────────────────────────────────────────────────┐
+│  Step 3: Classify (GSoC ABIInfo)                                │
+│  abiInfo->computeInfo(abiFI, ...) on abi::Type*                 │
+│  Applies target rules (e.g. x86_64 System V)                    │
+│  └─> Produces: GSoC ABIFunctionInfo / ABIArgInfo                │
+└────────────────────────┬────────────────────────────────────────┘
+                         │
+                         ▼
+┌─────────────────────────────────────────────────────────────────┐
+│  Step 4: Rewrite Function (ABIRewriteContext)                   │
+│  Use GSoC classification to build lowered signature             │
+│  └─> ctx.createFunction(loc, name, newType); (i32, i64, i64)    │
+└────────────────────────┬────────────────────────────────────────┘
+                         │
+                         ▼
+┌─────────────────────────────────────────────────────────────────┐
+│  Step 5: Rewrite Call Sites (ABIRewriteContext)                 │
+│  ctx.createExtractValue() - expand struct; ctx.createCall()     │
+│  └─> Dialect-specific operation creation                        │
+└────────────────────────┬────────────────────────────────────────┘
+                         │
+                         ▼
+┌─────────────────────────────────────────────────────────────────┐
+│  Output: ABI-Lowered Function                                   │
+│  func @foo(%arg0: i32, %arg1: i64, %arg2: i64) -> i32           │
+└─────────────────────────────────────────────────────────────────┘
+```
+
+#### Key Interactions Between Components
+
+Classification lives in GSoC: `ABIInfo` operates on `abi::Type*` and produces
+classification results (e.g. ABIArgInfo, ABIFunctionInfo). MLIR types reach
+GSoC only via MLIRTypeMapper, which converts `mlir::Type` to `abi::Type*`. The
+lowering pass (1) maps types with MLIRTypeMapper, (2) calls GSoC's ABIInfo to
+get classification, and (3) uses that result to drive rewriting through the
+dialect's ABIRewriteContext.
+
+ABIRewriteContext consumes the classification (e.g. "Expand" for a struct) and
+performs the actual IR changes: createFunction with the lowered signature,
+createExtractValue and createCall at call sites. Each dialect implements
+ABIRewriteContext to produce its own operations (e.g. cir.call, fir.call).
+This keeps classification in one place (GSoC) and limits dialect code to
+operation creation.
+
+## 5. Detailed Component Design
----------------
andykaylor wrote:
Except for 5.6 this section doesn't seem to provide any new information.
https://github.com/llvm/llvm-project/pull/178326