[clang] [llvm] [clang/LLVM] Add flatten_deep attribute for depth-limited inlining (1/2) (PR #165777)

Mon Nov 3 12:51:17 PST 2025

grigorypas wrote:

> > Can you please elaborate what do you mean by it "changes the semantics of alwaysinline"? I am introducing a new attribute flatten_deep both on clang side and LLVM side. alwaysinline should still mean the same thing.
> 
> You said patch 2 will update the alwaysinliner pass. `alwaysinline` has previously always inlined a function unless it was illegal to do so. You're now maybe not inlining depending on the `flatten_deep` attribute, which seems like a cost heuristic encoded in the IR to me.
> 
> > To clarify, our primary use case at Meta is to completely flatten functions by inlining the entire call tree. The max depth parameter is not intended as a core part of the user workflow, but rather as a safeguard to prevent issues if the call tree happens to be extremely deep.
> 
> So you want to completely flatten functions but not completely flatten functions? What exactly is the use case of flattening these functions?

Thank you for the feedback! Let me clarify the design:
## `alwaysinline` Semantics Are Preserved
The `alwaysinline` semantics are **not** being changed. The existing `alwaysinline` logic runs first and takes precedence; the `flatten_deep` logic runs in the same pass, but only afterwards. A function marked `alwaysinline` is still inlined according to the existing rules (unless it is illegal to do so), independent of any `flatten_deep` attribute.
You can see this in the suggested implementation here: [https://github.com/grigorypas/llvm-project/tree/full_flattening](https://github.com/grigorypas/llvm-project/tree/full_flattening)
## `flatten_deep` as a Natural Extension of `flatten`
`flatten_deep(N)` is a natural extension of the existing `flatten` attribute. The implementations differ, but the motivation is similar:
- **`flatten`**: Inlines all immediate callsites (a single level). It is implemented in the frontend by marking the direct calls with `alwaysinline`.
- **`flatten_deep(N)`**: Inlines transitively, up to N levels deep. It requires backend support to propagate through the call tree.
Importantly, **full/deep flattening cannot be achieved today with existing attributes**: no current mechanism inlines transitively across the entire call tree.
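For illustration, a function might opt in roughly as follows. This is only a sketch: `flatten_deep` is the attribute proposed in this PR and is not available in released compilers, and the GNU-style spelling used here is an assumption. The `FLATTEN_DEEP` macro and the function names are hypothetical; the macro falls back to the existing `flatten` where the new attribute is missing.

```cpp
// Hypothetical opt-in macro. `flatten_deep` is the attribute proposed in this
// PR (GNU-style spelling assumed); compilers without it fall back to `flatten`,
// and compilers without either get no attribute at all.
#if defined(__has_attribute)
#  if __has_attribute(flatten_deep)
#    define FLATTEN_DEEP(n) __attribute__((flatten_deep(n)))
#  elif __has_attribute(flatten)
#    define FLATTEN_DEEP(n) __attribute__((flatten))
#  endif
#endif
#ifndef FLATTEN_DEEP
#  define FLATTEN_DEEP(n)
#endif

// A three-level call chain below the annotated function.
static int leaf(int x) { return x + 1; }
static int middle(int x) { return leaf(x) * 2; }
static int outer(int x) { return middle(x) + 3; }

// Per the description above, `flatten` would inline only the direct call to
// outer(), while the proposed flatten_deep(3) is meant to continue through
// middle() and leaf() as well.
FLATTEN_DEEP(3) int hot_path(int x) { return outer(x); }
```

The attribute only affects how the code is compiled, not what it computes, so `hot_path` returns the same value under either spelling.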
## Max Depth as a Safeguard
The max depth parameter is not a cost heuristic - it's a safety limit:
- **Primary use case**: Complete flattening of the call tree (large N)
- **Max depth parameter**: A safeguard to prevent compile-time explosions with unexpectedly deep call trees
This is similar to other compiler safety limits (e.g., `-fconstexpr-depth=N`) - we want to flatten the entire call tree in normal cases, but need a circuit breaker for pathological edge cases.
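To make the "circuit breaker" idea concrete, here is a toy model of a depth-limited walk over a call graph. It is a sketch under stated assumptions and not the algorithm used by the pass: the `CallGraph` type, the `edges_to_inline` helper, the convention that depth 1 means the direct calls of the root, and the rule of skipping calls back into a function already on the current path are all hypothetical choices made for this example.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy call graph: each function name maps to the functions it calls.
using CallGraph = std::map<std::string, std::vector<std::string>>;
using Edge = std::pair<std::string, std::string>;  // (caller, callee)

// Walk the calls reachable from `fn`, recording each call edge that a
// depth-limited flattening would inline. `depth_left` is the remaining budget.
static void collect(const CallGraph &g, const std::string &fn,
                    unsigned depth_left, std::set<std::string> &on_path,
                    std::set<Edge> &inlined) {
  if (depth_left == 0) return;  // circuit breaker: stop descending
  auto it = g.find(fn);
  if (it == g.end()) return;    // no recorded callees
  on_path.insert(fn);
  for (const std::string &callee : it->second) {
    // Assumption of this sketch: a call back into a function that is
    // already on the current path (recursion) is left as a call.
    if (on_path.count(callee)) continue;
    inlined.emplace(fn, callee);
    collect(g, callee, depth_left - 1, on_path, inlined);
  }
  on_path.erase(fn);
}

// Edges inlined when flattening `root` with the given maximum depth.
std::set<Edge> edges_to_inline(const CallGraph &g, const std::string &root,
                               unsigned max_depth) {
  std::set<std::string> on_path;
  std::set<Edge> inlined;
  collect(g, root, max_depth, on_path, inlined);
  return inlined;
}
```

In this model a large `max_depth` covers the whole call tree in the normal case, and the limit only changes the result when the tree is deeper than the limit.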
## Use Case
This feature is useful for performance-critical code where eliminating call overhead across the entire call tree is beneficial, such as:
- Deeply nested hot paths in performance-sensitive applications
- **PGO scenarios with stale profiles**: When adding new functions to hot paths, `flatten_deep(N)` may help where default bottom-up inlining decisions rely on incomplete or stale profile data
Does this clarification address your concerns?

https://github.com/llvm/llvm-project/pull/165777