[clang] [llvm] [KeyInstr] Add docs (PR #137991)

Tue Jul 15 04:01:13 PDT 2025

================
@@ -0,0 +1,114 @@
+# Key Instructions debug info in LLVM
+
+Key Instructions reduces the jumpiness of optimized code debug stepping. This document explains the feature and how it is implemented in LLVM. For Clang support please see the [Clang docs](../../clang/docs/KeyInstructionsClang.md)
+
+## Status
+
+In development, but mostly complete. The feature is currently disabled for coroutines.
+
+Tell Clang [not] to produce Key Instructions metadata with `-g[no-]key-instructions`. See the Clang docs for implementation info.
+
+The feature improves optimized code stepping; it's intended for the feature to be used with optimisations enabled. Although the feature works at O0 it is not recommended because in some cases the effect of editing variables may not always be immediately realised. (This is a quirk of the current implementation, rather than fundemental limitation, covered in more detail later).
+
+This is a DWARF-based feature. There is currently no plan to support CodeView.
+
+Set LLVM flag `-dwarf-use-key-instructions` to `false` to ignore Key Instructions metadata when emitting DWARF.
+
+## Problem statement
+
+A lot of the noise in stepping comes from code motion and instruction scheduling. Consider a long expression on a single line. It may involve multiple operations that optimisations move, re-order, and interleave with other instructions that have different line numbers.
+
+DWARF provides a helpful tool the compiler can employ to mitigate this jumpiness, the `is_stmt` flag, which indicates that an instruction is a recommended breakpoint location. However, LLVM's current approach to deciding `is_stmt` placement essentially reduces down to "is the associated line number different to the previous instruction's?".
+
+(Note: It's up to the debugger if it wants to interpret `is_stmt` or not, and at time of writing LLDB doesn't; possibly because until now LLVM's is_stmts convey no information that can't already be deduced from the rest of the line table.)
+
+## Solution overview
+
+Taking ideas from two papers [1][2] that explore the issue, especially C. Tice's:
+
+From the perspective of a source-level debugger user:
+
+* Source code is made up of interesting constructs; the level of granularity for “interesting” while stepping is typically assignments, calls, control flow. We’ll call these interesting constructs Atoms.
+
+* Atoms usually have one instruction that implements the functionality that a user can observe; once they step “off” that instruction, the atom is finalised. We’ll call that a Key Instruction.
+
+* Communicating where the key instructions are to the debugger (using DWARF’s is_stmt) avoids jumpiness introduced by scheduling non-key instructions without losing source attribution (because non-key instructions retain an associated source location, they’re just ignored for stepping).
+
+## Solution implementation
+
+1. `DILocation` has 2 new fields, `atomGroup` and `atomRank`. `DISubprogram` has a new field `keyInstructions`.
+2. Clang creates `DILocations` using the new fields to communicate which instructions are "interesting", and sets `keyInstructions` true in `DISubprogram`s to tell LLVM to interpret the new metadata in those functions.
+3. There’s some bookkeeping required by optimisations that duplicate control flow.
+4. During DWARF emission, the new metadata is collected (linear scan over instructions) to decide `is_stmt` placements.
+
+1. *The metadata* - The two new `DILocation` fields are `atomGroup` and `atomRank` and are both are unsigned integers. `atomGroup` is 61 bits and `atomRank` 3 bits. Instructions in the same function with the same `(atomGroup, inlinedAt)` pair are part of the same source atom. `atomRank` determines `is_stmt` preference within that group, where a lower number is higher precedence. Higher rank instructions act as "backup" `is_stmt` locations, providing good fallback locations if/when the primary candidate gets optimized away. The default values of 0 indicate the instruction isn’t interesting - it's not an `is_stmt` candidate. If `keyInstructions` in `DISubprogram` is false (default) then the new `DILocation` metadata is ignored for the function (including inlined instances) when emitting DWARF.
----------------
OCHyams wrote:

Ah, good catch and yes you're right. I've done a local build to fix link problems etc too. 

https://github.com/llvm/llvm-project/pull/137991