[clang] [llvm] [not for merge][RFC] Key Instructions front end demo (PR #130943)

Wed Mar 12 04:05:03 PDT 2025

llvmbot wrote:




Author: Orlando Cazalet-Hyams (OCHyams)

<details>
<summary>Changes</summary>

This draft pull request demonstrates our proposed approach to annotating instructions with Key Instructions metadata. It's not complete, but it works as a proof of concept. I welcome and encourage feedback on the approach, as I'm not particularly familiar with Clang.

The Key Instructions project is introduced here: https://discourse.llvm.org/t/rfc-improving-is-stmt-placement-for-better-interactive-debugging/82668. The "quick summary" section at the top of that post gives context for this PR.

I'll walk through the changes first, followed by some questions at the end.

Note: this PR doesn't include the LLVM parts, so you won't be able to try it out locally.

## Overview

We'd like Clang to annotate instructions with additional metadata (bundled inside DILocations) so that LLVM can make better `is_stmt` placement decisions. Specifically, we'll add two new fields to DILocations: an "atom group" number for instructions that perform key functionality ("interesting" behaviour from a debugger-user perspective; see _Key Instructions: Solving the Code Location Problem for Optimized Code (C. Tice, S. L. Graham, 2000)_), and an "atom rank" number that indicates precedence within a set of instructions that share the same atom group number.

In the project prototype we interpreted IR in a pre-optimisation pass to decide the metadata placement. Having Clang perform the annotation has several benefits: it allows the front end to be opinionated about stepping locations, it's more efficient for the front end to do it, and it hopefully improves the chance that the approach can be adopted by other front ends for which the prototype approach isn't appropriate.

First, here's an example to highlight the new metadata we'll be producing:

`$ cat -n test.cpp`
```
1.  void f(int a) {
2.    int x[256] = {a};
3.  }
```
`$ clang++ test.cpp -gmlt -emit-llvm -S`
```
define hidden void @_Z1fi(i32 noundef %a) #0 !dbg !11 {
entry:
  %a.addr = alloca i32, align 4
  %x = alloca [256 x i32], align 16
  store i32 %a, ptr %a.addr, align 4
  call void @llvm.memset.p0.i64(ptr align 16 %x, i8 0, i64 1024, i1 false), !dbg !14      ; atom 1, rank 1
  %arrayinit.begin = getelementptr inbounds [256 x i32], ptr %x, i64 0, i64 0, !dbg !15
  %0 = load i32, ptr %a.addr, align 4, !dbg !16                                           ; atom 1, rank 2
  store i32 %0, ptr %arrayinit.begin, align 4, !dbg !17                                   ; atom 1, rank 1
  ret void, !dbg !18                                                                      ; atom 2, rank 1
}

...
!14 = !DILocation(line: 2, column: 7, scope: !11, atomGroup: 1, atomRank: 1)
!15 = !DILocation(line: 2, column: 16, scope: !11)
!16 = !DILocation(line: 2, column: 17, scope: !11, atomGroup: 1, atomRank: 2)
!17 = !DILocation(line: 2, column: 16, scope: !11, atomGroup: 1, atomRank: 1)
!18 = !DILocation(line: 3, column: 1, scope: !11, atomGroup: 2, atomRank: 1)
```
Atom 1 represents the aggregate initialization of `x`. The `memset` and the `store` have rank 1, which gives them precedence over the load of `a` with rank 2; the load is essentially a "backup instruction" for the purposes of assigning `is_stmt`. If all rank 1 instructions are optimized away, we'll use the load instead. Atom 2 represents the implicit return (given the source location of the closing brace).
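
As a rough illustration of the rank fallback, here is a minimal C++ sketch over a hypothetical simplified model (the `Inst` struct and `keyInstructions` function are made up for illustration and are not LLVM API): within one atom group, the instructions that share the lowest surviving rank are treated as key.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for an IR instruction carrying Key Instructions
// metadata (not the real llvm::Instruction / DILocation API).
struct Inst {
  std::string Name;
  uint64_t AtomGroup; // 0 means "not part of any atom".
  uint8_t AtomRank;   // 1 is the highest precedence.
};

// Return the names of the instructions in Group that would be treated as key:
// those sharing the lowest rank among the instructions that still exist.
std::vector<std::string> keyInstructions(const std::vector<Inst> &Insts,
                                         uint64_t Group) {
  uint8_t BestRank = UINT8_MAX;
  for (const Inst &I : Insts)
    if (I.AtomGroup == Group)
      BestRank = std::min(BestRank, I.AtomRank);

  std::vector<std::string> Result;
  for (const Inst &I : Insts)
    if (I.AtomGroup == Group && I.AtomRank == BestRank)
      Result.push_back(I.Name);
  return Result;
}
```

With the instructions from the example above, `keyInstructions` picks the `memset` and `store` for atom 1; if both were optimized away, only the `load` would remain, so it would be picked instead.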

<details>
<summary>
DWARF emission info for additional context (feel free to ignore).
</summary>

Not shown in this patch: during DWARF emission, for each atom group, the last instruction in each block that has the group's lowest rank becomes a candidate for `is_stmt`. The wording is tricky, but essentially: if any rank 1 instructions exist, higher rank (lower precedence) instructions won't be candidates for `is_stmt`, even if they're in different blocks; however, the final rank 1 instruction of the group in each block will be.

As a convenience optimisation (mostly to minimise unnecessary differences when the feature is enabled, but also to improve variable availability in some cases), we apply a heuristic that "floats" the `is_stmt` up to the first instruction in a contiguous run of instructions with the same line number.

In the example above, the `store` in atom 1 is the candidate for `is_stmt` (the `memset` comes before the `store`, and the `load` both has lower precedence and comes before the `store`). Because of the heuristic described above, the `is_stmt` flag floats up to the `memset`, as all the instructions between them have the same line number.
</details>
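
The candidate selection and the "float up" heuristic described in the DWARF emission notes can be sketched roughly as follows. This is a hypothetical, simplified model (the `Inst`, `Block` and `isStmtPositions` names are made up, and a line of 0 stands for "no location"), not the LLVM DWARF emission code, which isn't part of this patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical simplified instruction: a source line plus atom metadata.
struct Inst {
  unsigned Line;      // 0 means "no source location".
  uint64_t AtomGroup; // 0 means "not part of any atom".
  uint8_t AtomRank;
};
using Block = std::vector<Inst>;

// For one atom group, return (block index, instruction index) pairs for the
// instructions that receive is_stmt: the last lowest-rank member of the group
// in each block, floated up to the start of its run of same-line instructions.
std::vector<std::pair<std::size_t, std::size_t>>
isStmtPositions(const std::vector<Block> &Func, uint64_t Group) {
  // The lowest rank present anywhere in the function wins, so higher-rank
  // (backup) instructions are ignored even when they sit in other blocks.
  uint8_t BestRank = UINT8_MAX;
  for (const Block &B : Func)
    for (const Inst &I : B)
      if (I.AtomGroup == Group && I.AtomRank < BestRank)
        BestRank = I.AtomRank;

  std::vector<std::pair<std::size_t, std::size_t>> Result;
  for (std::size_t BI = 0; BI < Func.size(); ++BI) {
    const Block &B = Func[BI];
    // Find the last instruction in this block with the winning rank.
    std::size_t Cand = B.size();
    for (std::size_t II = 0; II < B.size(); ++II)
      if (B[II].AtomGroup == Group && B[II].AtomRank == BestRank)
        Cand = II;
    if (Cand == B.size())
      continue; // No candidate in this block.
    // Float is_stmt up through directly preceding same-line instructions.
    std::size_t Pos = Cand;
    while (Pos > 0 && B[Pos - 1].Line == B[Cand].Line)
      --Pos;
    Result.emplace_back(BI, Pos);
  }
  return Result;
}
```

Encoding the entry block of `_Z1fi` from the overview (three location-less instructions, then the `memset`, GEP, `load` and `store` on line 2, then the `ret` on line 3), the sketch places atom 1's `is_stmt` on the `memset` (index 3) and atom 2's on the `ret`.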

## Implementation

We need to annotate assignments, conditional branches, some unconditional branches, and calls, as these are the instructions implementing "key functionality". The GEP used to compute the offset at which to store a value for an assignment is not interesting, but the store itself is. We also want to annotate "backup instructions", e.g. the instruction computing the value being stored, or the `cmp` used for a conditional branch.

`ApplyAtomGroup` works similarly to, and alongside, `ApplyDebugLocation`: it is an RAII wrapper around a "current atom group" number during CodeGen. The current atom group number is applied to certain instructions as they're emitted (stores, branches, etc.) using `addInstToCurrentSourceAtom`.
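
To make the RAII model concrete, here is a standalone sketch of a scoped "current atom group". It is loosely modelled on the `ApplyAtomGroup` changes in the patch below, but `AtomState`, `AtomGroupScope` and `addToCurrentAtom` are hypothetical names, and the real class operates on `CGDebugInfo` rather than a standalone struct.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical per-function state, loosely modelled on the fields the patch
// adds to CGDebugInfo.
struct AtomState {
  uint64_t NextAtomGroup = 1;           // Next unused group number.
  uint64_t CurrentAtomGroup = 0;        // 0 means "no active group".
  uint64_t HighestEmittedAtomGroup = 0; // Highest group attached to an inst.
};

// RAII helper: open a fresh atom group for the lifetime of this object and
// restore the enclosing group when it is destroyed.
class AtomGroupScope {
public:
  explicit AtomGroupScope(AtomState &State)
      : S(State), Saved(State.CurrentAtomGroup) {
    S.CurrentAtomGroup = S.NextAtomGroup++;
  }
  ~AtomGroupScope() {
    // Recycle group numbers that were handed out but never attached to an
    // instruction, without rewinding below the enclosing scope's group.
    uint64_t Floor = std::max(S.HighestEmittedAtomGroup, Saved);
    if (Floor < S.NextAtomGroup - 1)
      S.NextAtomGroup = Floor + 1;
    S.CurrentAtomGroup = Saved;
  }
  AtomGroupScope(const AtomGroupScope &) = delete;
  AtomGroupScope &operator=(const AtomGroupScope &) = delete;

private:
  AtomState &S;
  uint64_t Saved; // The enclosing scope's group (0 if none).
};

// Stand-in for tagging an emitted instruction: returns the group it joins
// (0 if no group is active) and records that the group has been used.
uint64_t addToCurrentAtom(AtomState &S) {
  uint64_t G = S.CurrentAtomGroup;
  if (G > S.HighestEmittedAtomGroup)
    S.HighestEmittedAtomGroup = G;
  return G;
}
```

One design choice in this sketch: when the destructor recycles group numbers that were handed out but never used, it never rewinds below the enclosing scope's group, so a later sibling scope can't be handed the number that the enclosing scope is still using.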

Here's how it looks with the aggregate initialisation example from the overview section:

<details>
<summary>
`EmitAutoVarInit` creates a new atom group with `ApplyAtomGroup`.
</summary>

```
> clang::CodeGen::ApplyAtomGroup::ApplyAtomGroup(clang::CodeGen::CodeGenFunction & CGF) Line 184
  clang::CodeGen::CodeGenFunction::EmitAutoVarInit(const clang::CodeGen::CodeGenFunction::AutoVarEmission & emission) Line 1958
  clang::CodeGen::CodeGenFunction::EmitAutoVarDecl(const clang::VarDecl & D) Line 1358
  clang::CodeGen::CodeGenFunction::EmitVarDecl(const clang::VarDecl & D) Line 219
  ...
```
</details>

<details>
<summary>
`CheckAggExprForMemSetUse` calls `addInstToCurrentSourceAtom` to add the memset to the current atom group.
</summary>

```
> clang::CodeGen::CodeGenFunction::addInstToCurrentSourceAtom(llvm::Instruction * KeyInstruction, llvm::Value * Backup, unsigned char KeyInstRank) Line 2551
  CheckAggExprForMemSetUse(clang::CodeGen::AggValueSlot & Slot, const clang::Expr * E, clang::CodeGen::CodeGenFunction & CGF) Line 2026
  clang::CodeGen::CodeGenFunction::EmitAggExpr(const clang::Expr * E, clang::CodeGen::AggValueSlot Slot) Line 2043
  clang::CodeGen::CodeGenFunction::EmitExprAsInit(const clang::Expr * init, const clang::ValueDecl * D, clang::CodeGen::LValue lvalue, bool capturedByInit) Line 2104
  clang::CodeGen::CodeGenFunction::EmitAutoVarInit(const clang::CodeGen::CodeGenFunction::AutoVarEmission & emission) Line 2047
  clang::CodeGen::CodeGenFunction::EmitAutoVarDecl(const clang::VarDecl & D) Line 1358
  clang::CodeGen::CodeGenFunction::EmitVarDecl(const clang::VarDecl & D) Line 219
  ...
```
</details>

<details>
<summary>
`EmitStoreOfScalar` does the same for the store (note - this is the same atom group number as the memset).
</summary>

```
> clang::CodeGen::CodeGenFunction::addInstToCurrentSourceAtom(llvm::Instruction * KeyInstruction, llvm::Value * Backup, unsigned char KeyInstRank) Line 2551
  clang::CodeGen::CodeGenFunction::EmitStoreOfScalar(llvm::Value * Value, clang::CodeGen::Address Addr, bool Volatile, clang::QualType Ty, clang::CodeGen::LValueBaseInfo BaseInfo, clang::CodeGen::TBAAAccessInfo TBAAInfo, bool isInit, bool isNontemporal) Line 2133
  clang::CodeGen::CodeGenFunction::EmitStoreOfScalar(llvm::Value * value, clang::CodeGen::LValue lvalue, bool isInit) Line 2152
  clang::CodeGen::CodeGenFunction::EmitStoreThroughLValue(clang::CodeGen::RValue Src, clang::CodeGen::LValue Dst, bool isInit) Line 2466
  clang::CodeGen::CodeGenFunction::EmitScalarInit(const clang::Expr * init, const clang::ValueDecl * D, clang::CodeGen::LValue lvalue, bool capturedByInit) Line 803
  `anonymous namespace'::AggExprEmitter::EmitInitializationToLValue(clang::Expr * E, clang::CodeGen::LValue LV) Line 1592
  `anonymous namespace'::AggExprEmitter::EmitArrayInit(clang::CodeGen::Address DestPtr, llvm::ArrayType * AType, clang::QualType ArrayQTy, clang::Expr * ExprToVisit, llvm::ArrayRef<clang::Expr *> Args, clang::Expr * ArrayFiller) Line 614
  `anonymous namespace'::AggExprEmitter::VisitCXXParenListOrInitListExpr(clang::Expr * ExprToVisit, llvm::ArrayRef<clang::Expr *> InitExprs, clang::FieldDecl * InitializedFieldInUnion, clang::Expr * ArrayFiller) Line 1672
  `anonymous namespace'::AggExprEmitter::VisitInitListExpr(clang::InitListExpr * E) Line 1641
  clang::StmtVisitorBase<std::add_pointer,`anonymous namespace'::AggExprEmitter,void>::Visit(clang::Stmt * S) Line 352
  `anonymous namespace'::AggExprEmitter::Visit(clang::Expr * E) Line 114
  clang::CodeGen::CodeGenFunction::EmitAggExpr(const clang::Expr * E, clang::CodeGen::AggValueSlot Slot) Line 2045
  clang::CodeGen::CodeGenFunction::EmitExprAsInit(const clang::Expr * init, const clang::ValueDecl * D, clang::CodeGen::LValue lvalue, bool capturedByInit) Line 2104
  clang::CodeGen::CodeGenFunction::EmitAutoVarInit(const clang::CodeGen::CodeGenFunction::AutoVarEmission & emission) Line 2047
  clang::CodeGen::CodeGenFunction::EmitAutoVarDecl(const clang::VarDecl & D) Line 1358
  clang::CodeGen::CodeGenFunction::EmitVarDecl(const clang::VarDecl & D) Line 219
  ...
```

</details>

`addInstToCurrentSourceAtom` also annotates the instruction that produces the stored value, giving it a higher rank (lower precedence) than the store.
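
The rank bookkeeping can be sketched with a toy instruction type. This mirrors the rules in the `CGDebugInfo` changes in the patch below (the key instruction takes the given rank, the backup takes the next rank, each cast feeding the backup takes the rank after that, ranks saturate at 7, and an instruction already in an atom at a lower rank stays there), but `Inst`, `tag` and `addToAtom` are hypothetical stand-ins, not the patch's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical toy instruction: an optional operand link plus atom metadata.
struct Inst {
  bool IsCast = false;     // e.g. a zext/sext/trunc.
  Inst *Operand = nullptr; // Operand 0, if it is an instruction.
  uint64_t AtomGroup = 0;  // 0 means "not in any atom".
  uint8_t AtomRank = 0;
};

// Tag I with (Group, Rank), keeping an existing membership when it already
// has higher precedence (lower rank) there. Ranks saturate at 7 (3 bits).
void tag(Inst &I, uint64_t Group, uint8_t Rank) {
  Rank = std::min<uint8_t>(Rank, 7);
  if (I.AtomGroup != 0 && I.AtomRank != 0 && I.AtomRank < Rank)
    return; // Already in an atom with higher precedence.
  I.AtomGroup = Group;
  I.AtomRank = Rank;
}

// Tag the key instruction, then the backup instruction producing its value
// with the next rank, then any chain of casts feeding the backup.
void addToAtom(Inst &Key, Inst *Backup, uint64_t Group, uint8_t Rank = 1) {
  tag(Key, Group, Rank);
  for (Inst *I = Backup; I; I = I->IsCast ? I->Operand : nullptr)
    tag(*I, Group, ++Rank);
}
```

For a `store` of a `sext` of a `load`, this gives the store rank 1, the `sext` rank 2 and the `load` rank 3, all in the same atom; a later atom that only uses the `load` as a lower-precedence backup leaves it where it is.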


## Rough edges and questions

* The single-block return statement handling (see changes in `EmitReturnBlock` in clang/lib/CodeGen/CGStmt.cpp and `addRetToOverrideOrNewSourceAtom` usage in `EmitFunctionEpilog`) doesn't fit the RAII model (I've not found any other cases yet though).
* There's some inefficiency: DILocations are attached to instructions then immediately replaced with a new version with Key Instructions metadata.
* It doesn't fail gracefully: if we accidentally fail to annotate instructions that should be key, those instructions won't be candidates for `is_stmt` and therefore will likely be stepped over while debugging.
    * Is there a way to implement this that errs on the side of over-applying annotations?
        * Perhaps every statement (in `EmitStmt`) could be assumed to be in an atom group by default, with particular statements opting out, rather than interesting ones opting in?
        * I think we could add all stores (and memsets, etc.) to a "new" atom group by default, unless there's an "active" RAII group already or they've opted out. That would offer a graceful fallback for stores. It doesn't help so much with control flow (as I think only some branches are interesting, whereas most stores are).
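
A minimal sketch of the store fallback floated in the last bullet, assuming a simplified model where the only state is the next free group number and the currently active RAII group (the `FallbackAtoms` struct and `groupForStore` are hypothetical and not part of this patch):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the "graceful fallback" idea: a store emitted while
// no RAII atom group is active gets a fresh group of its own, unless the code
// emitting it has explicitly opted out.
struct FallbackAtoms {
  uint64_t NextAtomGroup = 1;
  uint64_t CurrentAtomGroup = 0; // Non-zero while an RAII group is active.

  // Returns the atom group assigned to a newly emitted store (0 = none).
  uint64_t groupForStore(bool OptedOut) {
    if (OptedOut)
      return 0;
    if (CurrentAtomGroup != 0)
      return CurrentAtomGroup; // Join the active group.
    return NextAtomGroup++;    // Fallback: a new single-store atom.
  }
};
```

As the bullet says, this only helps stores; deciding which branches deserve their own atom would still need explicit opt-in.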

Finally, does this direction look reasonable overall?

---

Patch is 90.18 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/130943.diff


48 Files Affected:

- (modified) clang/lib/CodeGen/CGBuiltin.cpp (+35-14) 
- (modified) clang/lib/CodeGen/CGCall.cpp (+15-1) 
- (modified) clang/lib/CodeGen/CGClass.cpp (+1) 
- (modified) clang/lib/CodeGen/CGCleanup.cpp (+3) 
- (modified) clang/lib/CodeGen/CGDebugInfo.cpp (+148-1) 
- (modified) clang/lib/CodeGen/CGDebugInfo.h (+51) 
- (modified) clang/lib/CodeGen/CGDecl.cpp (+31-22) 
- (modified) clang/lib/CodeGen/CGDeclCXX.cpp (+3-1) 
- (modified) clang/lib/CodeGen/CGExpr.cpp (+21-5) 
- (modified) clang/lib/CodeGen/CGExprAgg.cpp (+7-2) 
- (modified) clang/lib/CodeGen/CGExprComplex.cpp (+10-2) 
- (modified) clang/lib/CodeGen/CGExprScalar.cpp (+4) 
- (modified) clang/lib/CodeGen/CGStmt.cpp (+57-9) 
- (modified) clang/lib/CodeGen/CodeGenFunction.cpp (+44-3) 
- (modified) clang/lib/CodeGen/CodeGenFunction.h (+17-1) 
- (modified) clang/lib/CodeGen/ItaniumCXXABI.cpp (+1) 
- (added) clang/test/KeyInstructions/agg-cpy.cpp (+13) 
- (added) clang/test/KeyInstructions/agg-init-bzero-plus-stores.cpp (+25) 
- (added) clang/test/KeyInstructions/agg-init.cpp (+24) 
- (added) clang/test/KeyInstructions/agg-null-init.cpp (+21) 
- (added) clang/test/KeyInstructions/agg-return-va-arg.cpp (+23) 
- (added) clang/test/KeyInstructions/assign.c (+11) 
- (added) clang/test/KeyInstructions/binop.c (+17) 
- (added) clang/test/KeyInstructions/bitfield.cpp (+15) 
- (added) clang/test/KeyInstructions/cast.cpp (+17) 
- (added) clang/test/KeyInstructions/complex.cpp (+28) 
- (added) clang/test/KeyInstructions/compound-assign.cpp (+13) 
- (added) clang/test/KeyInstructions/do.cpp (+20) 
- (added) clang/test/KeyInstructions/double-assign.cpp (+22) 
- (added) clang/test/KeyInstructions/for.cpp (+34) 
- (added) clang/test/KeyInstructions/if.cpp (+42) 
- (added) clang/test/KeyInstructions/inc.cpp (+13) 
- (added) clang/test/KeyInstructions/member-init.cpp (+13) 
- (added) clang/test/KeyInstructions/memset.c (+10) 
- (added) clang/test/KeyInstructions/multi-func.cpp (+28) 
- (added) clang/test/KeyInstructions/new.cpp (+36) 
- (added) clang/test/KeyInstructions/ret-agg.cpp (+23) 
- (added) clang/test/KeyInstructions/return-no-loc.c (+26) 
- (added) clang/test/KeyInstructions/return-ref.cpp (+19) 
- (added) clang/test/KeyInstructions/return.c (+36) 
- (added) clang/test/KeyInstructions/return.cpp (+26) 
- (added) clang/test/KeyInstructions/scalar-init.cpp (+17) 
- (added) clang/test/KeyInstructions/static-init.cpp (+13) 
- (added) clang/test/KeyInstructions/switch.cpp (+48) 
- (added) clang/test/KeyInstructions/try-catch.cpp (+20) 
- (added) clang/test/KeyInstructions/while.cpp (+20) 
- (modified) llvm/include/llvm/IR/LLVMContext.h (+4) 
- (modified) llvm/lib/IR/LLVMContext.cpp (+4) 


``````````diff

diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index afd9798cd639b..8add3d7296c2e 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -13,6 +13,7 @@
 #include "ABIInfo.h"
 #include "CGCUDARuntime.h"
 #include "CGCXXABI.h"
+#include "CGDebugInfo.h"
 #include "CGHLSLRuntime.h"
 #include "CGObjCRuntime.h"
 #include "CGOpenCLRuntime.h"
@@ -39,11 +40,13 @@
 #include "llvm/ADT/APFloat.h"
 #include "llvm/ADT/APInt.h"
 #include "llvm/ADT/FloatingPointMode.h"
+#include "llvm/ADT/ScopeExit.h"
 #include "llvm/ADT/SmallPtrSet.h"
 #include "llvm/ADT/StringExtras.h"
 #include "llvm/Analysis/ValueTracking.h"
 #include "llvm/IR/DataLayout.h"
 #include "llvm/IR/InlineAsm.h"
+#include "llvm/IR/Instruction.h"
 #include "llvm/IR/Intrinsics.h"
 #include "llvm/IR/IntrinsicsAArch64.h"
 #include "llvm/IR/IntrinsicsAMDGPU.h"
@@ -97,6 +100,7 @@ static void initializeAlloca(CodeGenFunction &CGF, AllocaInst *AI, Value *Size,
   if (CGF.CGM.stopAutoInit())
     return;
   auto *I = CGF.Builder.CreateMemSet(AI, Byte, Size, AlignmentInBytes);
+  CGF.addInstToCurrentSourceAtom(I, nullptr);
   I->addAnnotationMetadata("auto-init");
 }
 
@@ -3543,6 +3547,14 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     }
   };
 
+  // FIXME(OCH): Should any of the maths builtins be key instructions?
+  auto Grp = ApplyAtomGroup(*this);
+  llvm::Instruction *InstForAtomGrp = nullptr;
+  auto Cleanup = llvm::make_scope_exit([&]() {
+    if (InstForAtomGrp)
+      addInstToCurrentSourceAtom(InstForAtomGrp, nullptr);
+  });
+
   switch (BuiltinIDIfNoAsmLabel) {
   default: break;
   case Builtin::BI__builtin___CFStringMakeConstantString:
@@ -3948,6 +3960,8 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
   case Builtin::BI_byteswap_ushort:
   case Builtin::BI_byteswap_ulong:
   case Builtin::BI_byteswap_uint64: {
+    // FIXME(OCH): Should bswap and similar intrinsics be key instructions?
+    // If the result is stored then that will be key - is that enough?
     return RValue::get(
         emitBuiltinWithOneOverloadedType<1>(*this, E, Intrinsic::bswap));
   }
@@ -4080,7 +4094,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     return RValue::get(Builder.CreateCall(F, {Begin, End}));
   }
   case Builtin::BI__builtin_trap:
-    EmitTrapCall(Intrinsic::trap);
+    InstForAtomGrp = EmitTrapCall(Intrinsic::trap);
     return RValue::get(nullptr);
   case Builtin::BI__builtin_verbose_trap: {
     llvm::DILocation *TrapLocation = Builder.getCurrentDebugLocation();
@@ -4095,10 +4109,10 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     return RValue::get(nullptr);
   }
   case Builtin::BI__debugbreak:
-    EmitTrapCall(Intrinsic::debugtrap);
+    InstForAtomGrp = EmitTrapCall(Intrinsic::debugtrap);
     return RValue::get(nullptr);
   case Builtin::BI__builtin_unreachable: {
-    EmitUnreachable(E->getExprLoc());
+    InstForAtomGrp = EmitUnreachable(E->getExprLoc());
 
     // We do need to preserve an insertion point.
     EmitBlock(createBasicBlock("unreachable.cont"));
@@ -4547,6 +4561,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
         Matrix, Dst.emitRawPointer(*this),
         Align(Dst.getAlignment().getQuantity()), Stride, IsVolatile,
         MatrixTy->getNumRows(), MatrixTy->getNumColumns());
+    InstForAtomGrp = cast<llvm::Instruction>(Result);
     return RValue::get(Result);
   }
 
@@ -4667,6 +4682,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
             .getAsAlign();
     AllocaInst *AI = Builder.CreateAlloca(Builder.getInt8Ty(), Size);
     AI->setAlignment(SuitableAlignmentInBytes);
+    // NOTE(OCH): `initializeAlloca` adds Key Instruction metadata.
     if (BuiltinID != Builtin::BI__builtin_alloca_uninitialized)
       initializeAlloca(*this, AI, Size, SuitableAlignmentInBytes);
     LangAS AAS = getASTAllocaAddressSpace();
@@ -4689,6 +4705,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
         CGM.getContext().toCharUnitsFromBits(AlignmentInBits).getAsAlign();
     AllocaInst *AI = Builder.CreateAlloca(Builder.getInt8Ty(), Size);
     AI->setAlignment(AlignmentInBytes);
+    // NOTE(OCH): `initializeAlloca` adds Key Instruction metadata.
     if (BuiltinID != Builtin::BI__builtin_alloca_with_align_uninitialized)
       initializeAlloca(*this, AI, Size, AlignmentInBytes);
     LangAS AAS = getASTAllocaAddressSpace();
@@ -4707,7 +4724,8 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     Value *SizeVal = EmitScalarExpr(E->getArg(1));
     EmitNonNullArgCheck(Dest, E->getArg(0)->getType(),
                         E->getArg(0)->getExprLoc(), FD, 0);
-    Builder.CreateMemSet(Dest, Builder.getInt8(0), SizeVal, false);
+    InstForAtomGrp =
+        Builder.CreateMemSet(Dest, Builder.getInt8(0), SizeVal, false);
     return RValue::get(nullptr);
   }
 
@@ -4722,7 +4740,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     EmitNonNullArgCheck(RValue::get(Dest.emitRawPointer(*this)),
                         E->getArg(1)->getType(), E->getArg(1)->getExprLoc(), FD,
                         0);
-    Builder.CreateMemMove(Dest, Src, SizeVal, false);
+    InstForAtomGrp = Builder.CreateMemMove(Dest, Src, SizeVal, false);
     return RValue::get(nullptr);
   }
 
@@ -4735,7 +4753,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     Value *SizeVal = EmitScalarExpr(E->getArg(2));
     EmitArgCheck(TCK_Store, Dest, E->getArg(0), 0);
     EmitArgCheck(TCK_Load, Src, E->getArg(1), 1);
-    Builder.CreateMemCpy(Dest, Src, SizeVal, false);
+    InstForAtomGrp = Builder.CreateMemCpy(Dest, Src, SizeVal, false);
     if (BuiltinID == Builtin::BImempcpy ||
         BuiltinID == Builtin::BI__builtin_mempcpy)
       return RValue::get(Builder.CreateInBoundsGEP(
@@ -4751,7 +4769,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
         E->getArg(2)->EvaluateKnownConstInt(getContext()).getZExtValue();
     EmitArgCheck(TCK_Store, Dest, E->getArg(0), 0);
     EmitArgCheck(TCK_Load, Src, E->getArg(1), 1);
-    Builder.CreateMemCpyInline(Dest, Src, Size);
+    InstForAtomGrp = Builder.CreateMemCpyInline(Dest, Src, Size);
     return RValue::get(nullptr);
   }
 
@@ -4772,7 +4790,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     Address Dest = EmitPointerWithAlignment(E->getArg(0));
     Address Src = EmitPointerWithAlignment(E->getArg(1));
     Value *SizeVal = llvm::ConstantInt::get(Builder.getContext(), Size);
-    Builder.CreateMemCpy(Dest, Src, SizeVal, false);
+    InstForAtomGrp = Builder.CreateMemCpy(Dest, Src, SizeVal, false);
     return RValue::get(Dest, *this);
   }
 
@@ -4798,7 +4816,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     Address Dest = EmitPointerWithAlignment(E->getArg(0));
     Address Src = EmitPointerWithAlignment(E->getArg(1));
     Value *SizeVal = llvm::ConstantInt::get(Builder.getContext(), Size);
-    Builder.CreateMemMove(Dest, Src, SizeVal, false);
+    InstForAtomGrp = Builder.CreateMemMove(Dest, Src, SizeVal, false);
     return RValue::get(Dest, *this);
   }
 
@@ -4809,7 +4827,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     Value *SizeVal = EmitScalarExpr(E->getArg(2));
     EmitArgCheck(TCK_Store, Dest, E->getArg(0), 0);
     EmitArgCheck(TCK_Load, Src, E->getArg(1), 1);
-    Builder.CreateMemMove(Dest, Src, SizeVal, false);
+    InstForAtomGrp = Builder.CreateMemMove(Dest, Src, SizeVal, false);
     return RValue::get(Dest, *this);
   }
   case Builtin::BImemset:
@@ -4820,7 +4838,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     Value *SizeVal = EmitScalarExpr(E->getArg(2));
     EmitNonNullArgCheck(Dest, E->getArg(0)->getType(),
                         E->getArg(0)->getExprLoc(), FD, 0);
-    Builder.CreateMemSet(Dest, ByteVal, SizeVal, false);
+    InstForAtomGrp = Builder.CreateMemSet(Dest, ByteVal, SizeVal, false);
     return RValue::get(Dest, *this);
   }
   case Builtin::BI__builtin_memset_inline: {
@@ -4832,7 +4850,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     EmitNonNullArgCheck(RValue::get(Dest.emitRawPointer(*this)),
                         E->getArg(0)->getType(), E->getArg(0)->getExprLoc(), FD,
                         0);
-    Builder.CreateMemSetInline(Dest, ByteVal, Size);
+    InstForAtomGrp = Builder.CreateMemSetInline(Dest, ByteVal, Size);
     return RValue::get(nullptr);
   }
   case Builtin::BI__builtin___memset_chk: {
@@ -4849,10 +4867,13 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     Value *ByteVal = Builder.CreateTrunc(EmitScalarExpr(E->getArg(1)),
                                          Builder.getInt8Ty());
     Value *SizeVal = llvm::ConstantInt::get(Builder.getContext(), Size);
-    Builder.CreateMemSet(Dest, ByteVal, SizeVal, false);
+    InstForAtomGrp = Builder.CreateMemSet(Dest, ByteVal, SizeVal, false);
     return RValue::get(Dest, *this);
   }
   case Builtin::BI__builtin_wmemchr: {
+    // FIXME(OCH): Probably ok for none of the inline implementation to be key.
+    // If the result is stored, that store should be a stepping location.
+
     // The MSVC runtime library does not provide a definition of wmemchr, so we
     // need an inline implementation.
     if (!getTarget().getTriple().isOSMSVCRT())
@@ -6462,7 +6483,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     Value *Val = EmitScalarExpr(E->getArg(0));
     Address Address = EmitPointerWithAlignment(E->getArg(1));
     Value *HalfVal = Builder.CreateFPTrunc(Val, Builder.getHalfTy());
-    Builder.CreateStore(HalfVal, Address);
+    InstForAtomGrp = Builder.CreateStore(HalfVal, Address);
     return RValue::get(nullptr);
   }
   case Builtin::BI__builtin_load_half: {
diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp
index bfcbc273dbda7..9c7286147d564 100644
--- a/clang/lib/CodeGen/CGCall.cpp
+++ b/clang/lib/CodeGen/CGCall.cpp
@@ -17,6 +17,7 @@
 #include "CGBlocks.h"
 #include "CGCXXABI.h"
 #include "CGCleanup.h"
+#include "CGDebugInfo.h"
 #include "CGRecordLayout.h"
 #include "CodeGenFunction.h"
 #include "CodeGenModule.h"
@@ -37,6 +38,7 @@
 #include "llvm/IR/CallingConv.h"
 #include "llvm/IR/DataLayout.h"
 #include "llvm/IR/InlineAsm.h"
+#include "llvm/IR/Instructions.h"
 #include "llvm/IR/IntrinsicInst.h"
 #include "llvm/IR/Intrinsics.h"
 #include "llvm/IR/Type.h"
@@ -3883,7 +3885,8 @@ void CodeGenFunction::EmitFunctionEpilog(const CGFunctionInfo &FI,
 
   // Functions with no result always return void.
   if (!ReturnValue.isValid()) {
-    Builder.CreateRetVoid();
+    auto *I = Builder.CreateRetVoid();
+    addRetToOverrideOrNewSourceAtom(I, nullptr);
     return;
   }
 
@@ -4065,6 +4068,9 @@ void CodeGenFunction::EmitFunctionEpilog(const CGFunctionInfo &FI,
 
   if (RetDbgLoc)
     Ret->setDebugLoc(std::move(RetDbgLoc));
+
+  llvm::Value *Backup = RV ? Ret->getOperand(0) : nullptr;
+  addRetToOverrideOrNewSourceAtom(cast<llvm::ReturnInst>(Ret), Backup);
 }
 
 void CodeGenFunction::EmitReturnValueCheck(llvm::Value *RV) {
@@ -5829,6 +5835,14 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo,
                               BundleList);
     EmitBlock(Cont);
   }
+
+  // NOTE(OCH) We only want the group to apply to the call instuction
+  // specifically. N.B. we currently apply is_stmt to all calls at DWARF
+  // emission time. That makes it easy to avoid "over propagating" is_stmt when
+  // calls are lowered. That's easiest, so we continue to do that for now.
+  // FIXME(OCH): Reinstate this once that is no longer the case.
+  // addInstToNewSourceAtom(CI, nullptr);
+
   if (CI->getCalledFunction() && CI->getCalledFunction()->hasName() &&
       CI->getCalledFunction()->getName().starts_with("_Z4sqrt")) {
     SetSqrtFPAccuracy(CI);
diff --git a/clang/lib/CodeGen/CGClass.cpp b/clang/lib/CodeGen/CGClass.cpp
index e54fd543f217b..d69fe1e509ffe 100644
--- a/clang/lib/CodeGen/CGClass.cpp
+++ b/clang/lib/CodeGen/CGClass.cpp
@@ -1339,6 +1339,7 @@ void CodeGenFunction::EmitCtorPrologue(const CXXConstructorDecl *CD,
     assert(!Member->isBaseInitializer());
     assert(Member->isAnyMemberInitializer() &&
            "Delegating initializer on non-delegating constructor");
+    auto Grp = ApplyAtomGroup(*this);
     CM.addMemberInitializer(Member);
   }
   CM.finish();
diff --git a/clang/lib/CodeGen/CGCleanup.cpp b/clang/lib/CodeGen/CGCleanup.cpp
index 7e1c5b7da9552..8462e4a1f3791 100644
--- a/clang/lib/CodeGen/CGCleanup.cpp
+++ b/clang/lib/CodeGen/CGCleanup.cpp
@@ -17,6 +17,7 @@
 //===----------------------------------------------------------------------===//
 
 #include "CGCleanup.h"
+#include "CGDebugInfo.h"
 #include "CodeGenFunction.h"
 #include "llvm/Support/SaveAndRestore.h"
 
@@ -1118,6 +1119,8 @@ void CodeGenFunction::EmitBranchThroughCleanup(JumpDest Dest) {
 
   // Create the branch.
   llvm::BranchInst *BI = Builder.CreateBr(Dest.getBlock());
+  // This is the primary instruction for this atom, acting as a ret.
+  addInstToCurrentSourceAtom(BI, nullptr);
 
   // Calculate the innermost active normal cleanup.
   EHScopeStack::stable_iterator
diff --git a/clang/lib/CodeGen/CGDebugInfo.cpp b/clang/lib/CodeGen/CGDebugInfo.cpp
index 0e6daa42ee7bf..8bf5d1418f431 100644
--- a/clang/lib/CodeGen/CGDebugInfo.cpp
+++ b/clang/lib/CodeGen/CGDebugInfo.cpp
@@ -43,6 +43,7 @@
 #include "llvm/IR/Constants.h"
 #include "llvm/IR/DataLayout.h"
 #include "llvm/IR/DerivedTypes.h"
+#include "llvm/IR/Instruction.h"
 #include "llvm/IR/Instructions.h"
 #include "llvm/IR/Intrinsics.h"
 #include "llvm/IR/Metadata.h"
@@ -52,6 +53,7 @@
 #include "llvm/Support/SHA1.h"
 #include "llvm/Support/SHA256.h"
 #include "llvm/Support/TimeProfiler.h"
+#include <cstdint>
 #include <optional>
 using namespace clang;
 using namespace clang::CodeGen;
@@ -119,6 +121,144 @@ CGDebugInfo::~CGDebugInfo() {
          "Region stack mismatch, stack not empty!");
 }
 
+void CGDebugInfo::addInstSourceAtomMetadata(llvm::Instruction *I,
+                                            uint64_t Group, uint8_t Rank) {
+  if (!I->getDebugLoc() || Group == 0 || !I->getDebugLoc()->getLine())
+    return;
+
+  // Saturate the 3-bit rank.
+  Rank = std::min<uint8_t>(Rank, 7);
+
+  const llvm::DebugLoc &DL = I->getDebugLoc();
+
+  // Each instruction can only be attributed to one source atom (as an
+  // limitation of the implementation). If this instruction is already
+  // part of a source atom, pick the group in which it has highest
+  // precedence (lowest rank).
+  // TODO(OCH): Is there a better way to handle merging? Ideally we'd like
+  // to be able to have it attributed to both atoms.
+  if (DL.get()->getAtomGroup() && DL.get()->getAtomRank() &&
+      DL.get()->getAtomRank() < Rank) {
+    Group = DL.get()->getAtomGroup();
+    Rank = DL.get()->getAtomRank();
+  }
+
+  // Update the watermark to indicate this Group ID has been used
+  // in this function.
+  HighestEmittedAtomGroup = std::max(Group, HighestEmittedAtomGroup);
+
+  llvm::DILocation *NewDL = llvm::DILocation::get(
+      I->getContext(), DL.getLine(), DL.getCol(), DL.getScope(),
+      DL.getInlinedAt(), DL.isImplicitCode(), Group, Rank);
+  I->setDebugLoc(NewDL);
+};
+
+void CGDebugInfo::addInstToCurrentSourceAtom(llvm::Instruction *KeyInstruction,
+                                             llvm::Value *Backup,
+                                             uint8_t KeyInstRank) {
+  /* TODO(OCH):
+  if (key-instructions-is-not-enabled)
+    return;
+  */
+  uint64_t Group = CurrentAtomGroup;
+  if (!Group)
+    return;
+
+  KeyInstRank += CurrentAtomRankBase;
+  addInstSourceAtomMetadata(KeyInstruction, Group, KeyInstRank);
+
+  llvm::Instruction *BackupI =
+      llvm::dyn_cast_or_null<llvm::Instruction>(Backup);
+  if (!BackupI)
+    return;
+
+  // Add the backup instruction to the group.
+  addInstSourceAtomMetadata(BackupI, Group, /*Rank*/ ++KeyInstRank);
+
+  // Look through chains of casts too, as they're probably going to evaporate.
+  // FIXME(OCH): And other nops like zero length geps?
+  // FIXME(OCH): Should use Cast->isNoopCast()?
+  while (auto *Cast = dyn_cast<llvm::CastInst>(BackupI)) {
+    BackupI = dyn_cast<llvm::Instruction>(Cast->getOperand(0));
+    if (!BackupI)
+      break;
+    addInstSourceAtomMetadata(BackupI, Group, ++KeyInstRank);
+  }
+}
+
+void CGDebugInfo::addRetToOverrideOrNewSourceAtom(llvm::ReturnInst *Ret,
+                                                  llvm::Value *Backup,
+                                                  uint8_t KeyInstRank) {
+  if (RetAtomGroupOverride) {
+    uint64_t CurGrp = CurrentAtomGroup;
+    CurrentAtomGroup = RetAtomGroupOverride;
+    addInstToCurrentSourceAtom(Ret, Backup, KeyInstRank);
+    CurrentAtomGroup = CurGrp;
+    RetAtomGroupOverride = 0;
+  } else {
+    auto Grp = ApplyAtomGroup(this);
+    addInstToCurrentSourceAtom(Ret, Backup, KeyInstRank);
+  }
+}
+
+void CGDebugInfo::setRetInstSourceAtomOverride(uint64_t Group) {
+  assert(RetAtomGroupOverride == 0);
+  RetAtomGroupOverride = Group;
+}
+
+void CGDebugInfo::completeFunction() {
+  // Atoms are identified by a {AtomGroup, InlinedAt} pair, meaning AtomGroup
+  // numbers can be repeated across different functions. LLVM optimisations may
+  // need to assign new AtomGroups. In order to guarentee that those future
+  // transformations keep the numbers within functions unique, we just need to
+  // track the highest number used across all functions.
+  CGM.getLLVMContext().updateAtomGroupWaterline(NextAtomGroup);
+  NextAtomGroup = 1;
+  HighestEmittedAtomGroup = 0;
+  CurrentAtomGroup = 0;
+  RetAtomGroupOverride = 0;
+  CurrentAtomRankBase = 0;
+}
+
+ApplyAtomGroup::ApplyAtomGroup(CodeGenFunction &CGF)
+    : ApplyAtomGroup(CGF.getDebugInfo()) {}
+
+ApplyAtomGroup::ApplyAtomGroup(CGDebugInfo *DI) : DI(DI) {
+  if (!DI)
+    return;
+  OriginalAtomGroup = DI->CurrentAtomGroup;
+  DI->CurrentAtomGroup = DI->NextAtomGroup++;
+  // Reset rank-base as it should only apply to the group it was added to.
+  OriginalRankBase = DI->CurrentAtomRankBase;
+  DI->CurrentAtomRankBase = 0;
+}
+
+ApplyAtomGroup::~ApplyAtomGroup() {
+  if (!DI)
+    return;
+  if (DI->HighestEmittedAtomGroup < DI->NextAtomGroup - 1)
+    DI->NextAtomGroup = DI->HighestEmittedAtomGroup + 1;
+  DI->CurrentAtomGroup = OriginalAtomGroup;
+
+  DI->CurrentAtomRankBase = OriginalRankBase;
+}
+
+IncAtomRank::IncAtomRank(CodeGenFunction &CGF)
+    : IncAtomRank(CGF.getDebugInfo()) {}
+
+IncAtomRank::IncAtomRank(CGDebugInfo *DI) : DI(DI) {
+  if (!DI)
+    return;
+  ++DI->CurrentAtomRankBase;
+}
+
+IncAtomRank::~IncAtomRank() {
+  if (!DI)
+    return;
+  assert(DI->CurrentAtomRankBase);
+  --DI->CurrentAtomRankBase;
+}
+
 ApplyDebugLocation::ApplyDebugLocation(CodeGenFunction &CGF,
                                        SourceLocation TemporaryLocation)
     : CGF(&CGF) {
@@ -174,8 +314,15 @@ ApplyDebugLocation::ApplyDebugLocation(CodeGenFunction &CGF, llvm::DebugLoc Loc)
     return;
   }
   OriginalLocation = CGF.Builder.getCurrentDebugLocation();
-  if (Loc)
+  if (Loc) {
+    // Key Instructions: drop the atom group and rank to avoid accidentally
+    // propagating it around.
+    if (Loc->getAtomGroup())
+      Loc = llvm::DILocation::get(Loc->getContext(), Loc.getLine(),
+                                  Loc->getColumn(), Loc->getScope(),
+                                  Loc->getInlinedAt(), Loc.isImplicitCode());
     CGF.Builder.SetCurrentDebugLocation(std::move(Loc));
+  }
 }
 
 ApplyDebugLocation::~ApplyDebugLocation() {
diff --git a/clang/lib/CodeGen/CGDebugInfo.h b/clang/lib/CodeGen/CGDebugInfo.h
index 38f73eca561b7..997cf0e3db215 100644
--- a/clang/lib/CodeGen/CGDebugInfo.h
+++ b/clang/lib/CodeGen/CGDebugInfo.h
@@ -58,6 +58,9 @@ class CGBlockInfo;
 class CGDebugInfo {
   friend class ApplyDebugLocation;
   friend class SaveAndRestoreLocation;
+  friend class ApplyAtomGroup;
+  friend class IncAtomRank;
+
   CodeGenModule &CGM;
   const llvm::codegenoptions::DebugInfoKind Debu...
[truncated]

``````````

</details>


https://github.com/llvm/llvm-project/pull/130943