[llvm] [Docs][DebugInfo][RemoveDIs] Document some debug-info transition info (PR #79167)

Tue Jan 23 09:24:33 PST 2024

https://github.com/jmorse updated https://github.com/llvm/llvm-project/pull/79167

>From 56c15aaa396c7eb344da47e973669de1319d0561 Mon Sep 17 00:00:00 2001
From: Jeremy Morse <jeremy.morse at sony.com>
Date: Tue, 23 Jan 2024 16:26:55 +0000
Subject: [PATCH 1/2] [Docs][DebugInfo][RemoveDIs] Document some debug-info
 transition info

This is a high level description and FAQ for what we're doing in RemoveDIs,
and how old code should behave with new debug-info (exactly the same 99%
of the time).
---
 llvm/docs/RemoveDIsDebugInfo.md | 106 ++++++++++++++++++++++++++++++++
 llvm/docs/UserGuides.rst        |   5 ++
 2 files changed, 111 insertions(+)
 create mode 100644 llvm/docs/RemoveDIsDebugInfo.md

diff --git a/llvm/docs/RemoveDIsDebugInfo.md b/llvm/docs/RemoveDIsDebugInfo.md
new file mode 100644
index 00000000000000..21cbfba2927bd8
--- /dev/null
+++ b/llvm/docs/RemoveDIsDebugInfo.md
@@ -0,0 +1,106 @@
+# What's all this then?
+
+We're planning on removing debug-info intrinsics from LLVM, as they're slow, unweildy and can confuse optimisation passes if they're not expecting them. Instead of having a sequence of instructions that looks like this:
+
+```text
+    %add = add i32 %foo, %bar
+    call void @llvm.dbg.value(metadata %add, ...
+    %sub = sub i32 %add, %tosub
+    call void @llvm.dbg.value(metadata %sub, ...
+    call void @a_normal_function()
+```
+
+with dbg.value intrinsics representing debug-info records, it would instead be printed as:
+
+```text
+    %add = add i32 %foo, %bar
+      #dbg_value(%add, ...
+    %sub = sub i32 %add, %tosub
+      #dbg_value(%sub, ...
+    call void @a_normal_function()
+```
+
+Where the debug records are not instructions, do not appear in the instruction list, and won't appear in your optimisation passes unless you go digging for them deliberately.
+
+# Great, what do I need to do!
+
+Approximately nothing -- we've already instrumented all of LLVM to handle these new records ("DPValues") and behave identically to past LLVM behaviour. We plan on turning this on by default some time soon, with IR converted to the intrinsic-form of debug-info at terminals (textual IR, Bitcode) for a short while, before then changing the textual IR and Bitcode formats.
+
+There are two significant changes to be aware of. Firstly, we're adding a single bit of debug-info relevant data to the BasicBlock::iterator class (it's so that we can determine whether ranges intend on including debug-info at the beginning of a block or not). That means when writing passes that insert LLVM-IR instructions, you need to identify positions with BasicBlock::iterator rather than just a bare Instruction *. Most of the time this means that after identifying where you intend on inserting something, you must also call getIterator on the instruction position -- however when inserting at the start of a block you _must_ use getFirstInsertionPt, getFirstNonPHIIt or begin and use that iterator to insert, rather than just fetching a pointer to the first instruction.
+
+The second matter is that if you transfer sequences of instructions from one place to another manually, i.e. repeatedly using moveBefore where you might have used splice, then you should instead use the method moveBeforePreserving. moveBeforePreserving will transfer debug-info records with the instruction they're attached to. This is something that happens automatically today -- if you use moveBefore on every element of an instruction sequence, then debug intrinsics will be moved in the normal course of your code, but we lose this behaviour with non-instruction debug-info.
+
+# Anything else?
+
+Not really, but here's an "old vs new" comparison of how to do certain things and quickstart for how this "new" debug-info is structured.
+
+## Skipping debug records, ignoring debug-uses of Values, stably counting instructions...
+
+This will all happen transparently without needing to think about it!
+
+## What exactly have you replaced debug intrinsics with?
+
+We're using a dedicated C++ class called DPValue to store debug-info, with a one-to-one relationship between each instance of a debug intrinsic and each DPValue object in any LLVM-IR program. This class stores exactly the same information as is stored in debugging intrinsics. It also has almost entirely the same set of methods, that behave in the same way:
+
+  https://llvm.org/docs/doxygen/classllvm_1_1DPValue.html
+
+This allows you to treat a DPValue as if it's a dbg.value intrinsic most of the time, for example in generic (auto-param) lambdas.
+
+## How do these DPValues fit into the instruction stream?
+
+Like so:
+
+```text
+                 +---------------+          +---------------+
+---------------->|  Instruction  +--------->|  Instruction  |
+                 +-------+-------+          +---------------+
+                         |
+                         |
+                         |
+                         |
+                         v
+                  +------------+
+            <-----+  DPMarker  |<----
+           /      +------------+     \
+          /                           \
+         /                             \
+        v                               ^
+ +-----------+    +-----------+   +-----------+
+ |  DPValue  +--->|  DPValue  +-->|  DPValue  |
+ +-----------+    +-----------+   +-----------+
+```
+
+Each instruction has a pointer to a DPMarker (which will become optional), that contains a list of DPValue objects. No debugging records appear in the instruction list at all. DPValues have a parent pointer to their owning DPMarker, and each DPMarker has a pointer back to its owning instruction.
+
+Not shown are the links from DPValues to other parts of the Value/Metadata hierarchy: DPValues have raw pointers to DILocalVariable, DIExpression and DILocation objects, and references to Values are stored in a DebugValueUser base class. This refers to a ValueAsMetadata object referring to Values, via the TrackingMetadata facility.
+
+The various kinds of debug intrinsic (value, declare, assign) are all stored in the DPValue object, with a "Type" field disamgibuating which is which.
+
+## Finding debug-info records
+
+Utilities such as findDbgUsers and the like now have an optional argument that will return the set of DPValue records that refer to a Value. You should be able to treat them the same as intrinsics.
+
+## Examining debug-info records at positions
+
+Call "Instruction::getDbgValueRange()" to get the range of DPValue objects that are attached to an instruction.
+
+## Moving around, deleting
+
+You can use DPValue::removeFromParent to unlink a DPValue from its marker, and then BasicBlock::insertDPValueBefore or BasicBlock::insertDPValueAfter to re-insert the DPValue somewhere else. You cannot insert a DPValue at an arbitrary point in a list of DPValues (if you're doing this with dbg.values then it's unlikely to be correct).
+
+Erase a DPValue by calling eraseFromParent, or deleteInstr if it has already been removed from its marker.
+
+## What about dangling DPValues?
+
+If you have a block like so:
+
+```text
+    foo:
+      %bar = add i32 %baz...
+      dbg.value(metadata i32 %bar,...
+      br label %xyzzy
+```
+
+your optimisation pass may wish to erase the terminator and then do something to the block. This is easy to do when debug-info is kept in instructions, but with DPValues there is no trailing instruction to attach the variable information to in the lbock above, once the terminator is erased. For such degenerate blocks, DPValues are stored temporarily in a map in LLVMContext, and are re-inserted when a terminator is reinserted to the block or other instruction inserted at end().
+
+This can technically lead to trouble in the vanishingly rare scenario where an optimisation pass erases a terminator and then decides to erase the whole block. (We recommend not doing that).
diff --git a/llvm/docs/UserGuides.rst b/llvm/docs/UserGuides.rst
index 2f450ef46025aa..eb459b69a49a3f 100644
--- a/llvm/docs/UserGuides.rst
+++ b/llvm/docs/UserGuides.rst
@@ -62,6 +62,7 @@ intermediate LLVM representation.
    ReportingGuide
    ResponseGuide
    Remarks
+   RemoveDIsDebugInfo
    RISCVUsage
    SourceLevelDebugging
    SPIRVUsage
@@ -178,6 +179,10 @@ Optimizations
    referencing, to determine variable locations for debug info in the final
    stages of compilation.
 
+:doc:`RemoveDIsDebugInfo`
+   This is a migration guide describing how to move from debug-info using
+   intrinsics such as dbg.value to using the non-instruction DPValue object.
+
 :doc:`InstrProfileFormat`
    This document explains two binary formats of instrumentation-based profiles.
 

>From f11c6ec3f95c0957a6ef51e1f0ea4f78326b34f3 Mon Sep 17 00:00:00 2001
From: Jeremy Morse <jeremy.morse at sony.com>
Date: Tue, 23 Jan 2024 17:24:01 +0000
Subject: [PATCH 2/2] Address feedback from jryans

---
 llvm/docs/RemoveDIsDebugInfo.md | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/llvm/docs/RemoveDIsDebugInfo.md b/llvm/docs/RemoveDIsDebugInfo.md
index 21cbfba2927bd8..dcf6b48b772370 100644
--- a/llvm/docs/RemoveDIsDebugInfo.md
+++ b/llvm/docs/RemoveDIsDebugInfo.md
@@ -1,6 +1,6 @@
 # What's all this then?
 
-We're planning on removing debug-info intrinsics from LLVM, as they're slow, unweildy and can confuse optimisation passes if they're not expecting them. Instead of having a sequence of instructions that looks like this:
+We're planning on removing debug info intrinsics from LLVM, as they're slow, unwieldy and can confuse optimisation passes if they're not expecting them. Instead of having a sequence of instructions that looks like this:
 
 ```text
     %add = add i32 %foo, %bar
@@ -10,7 +10,7 @@ We're planning on removing debug-info intrinsics from LLVM, as they're slow, unw
     call void @a_normal_function()
 ```
 
-with dbg.value intrinsics representing debug-info records, it would instead be printed as:
+with dbg.value intrinsics representing debug info records, it would instead be printed as:
 
 ```text
     %add = add i32 %foo, %bar
@@ -20,19 +20,19 @@ with dbg.value intrinsics representing debug-info records, it would instead be p
     call void @a_normal_function()
 ```
 
-Where the debug records are not instructions, do not appear in the instruction list, and won't appear in your optimisation passes unless you go digging for them deliberately.
+The debug records are not instructions, do not appear in the instruction list, and won't appear in your optimisation passes unless you go digging for them deliberately.
 
 # Great, what do I need to do!
 
-Approximately nothing -- we've already instrumented all of LLVM to handle these new records ("DPValues") and behave identically to past LLVM behaviour. We plan on turning this on by default some time soon, with IR converted to the intrinsic-form of debug-info at terminals (textual IR, Bitcode) for a short while, before then changing the textual IR and Bitcode formats.
+Approximately nothing -- we've already instrumented all of LLVM to handle these new records ("DPValues") and behave identically to past LLVM behaviour. We plan on turning this on by default some time soon, with IR converted to the intrinsic form of debug info at terminals (textual IR, bitcode) for a short while, before then changing the textual IR and bitcode formats.
 
-There are two significant changes to be aware of. Firstly, we're adding a single bit of debug-info relevant data to the BasicBlock::iterator class (it's so that we can determine whether ranges intend on including debug-info at the beginning of a block or not). That means when writing passes that insert LLVM-IR instructions, you need to identify positions with BasicBlock::iterator rather than just a bare Instruction *. Most of the time this means that after identifying where you intend on inserting something, you must also call getIterator on the instruction position -- however when inserting at the start of a block you _must_ use getFirstInsertionPt, getFirstNonPHIIt or begin and use that iterator to insert, rather than just fetching a pointer to the first instruction.
+There are two significant changes to be aware of. Firstly, we're adding a single bit of debug-relevant data to the BasicBlock::iterator class (it's so that we can determine whether ranges intend on including debug info at the beginning of a block or not). That means when writing passes that insert LLVM-IR instructions, you need to identify positions with BasicBlock::iterator rather than just a bare Instruction *. Most of the time this means that after identifying where you intend on inserting something, you must also call getIterator on the instruction position -- however when inserting at the start of a block you _must_ use getFirstInsertionPt, getFirstNonPHIIt or begin and use that iterator to insert, rather than just fetching a pointer to the first instruction.
 
-The second matter is that if you transfer sequences of instructions from one place to another manually, i.e. repeatedly using moveBefore where you might have used splice, then you should instead use the method moveBeforePreserving. moveBeforePreserving will transfer debug-info records with the instruction they're attached to. This is something that happens automatically today -- if you use moveBefore on every element of an instruction sequence, then debug intrinsics will be moved in the normal course of your code, but we lose this behaviour with non-instruction debug-info.
+The second matter is that if you transfer sequences of instructions from one place to another manually, i.e. repeatedly using `moveBefore` where you might have used `splice`, then you should instead use the method `moveBeforePreserving`. `moveBeforePreserving` will transfer debug info records with the instruction they're attached to. This is something that happens automatically today -- if you use `moveBefore` on every element of an instruction sequence, then debug intrinsics will be moved in the normal course of your code, but we lose this behaviour with non-instruction debug info.
 
 # Anything else?
 
-Not really, but here's an "old vs new" comparison of how to do certain things and quickstart for how this "new" debug-info is structured.
+Not really, but here's an "old vs new" comparison of how to do certain things and quickstart for how this "new" debug info is structured.
 
 ## Skipping debug records, ignoring debug-uses of Values, stably counting instructions...
 
@@ -40,7 +40,7 @@ This will all happen transparently without needing to think about it!
 
 ## What exactly have you replaced debug intrinsics with?
 
-We're using a dedicated C++ class called DPValue to store debug-info, with a one-to-one relationship between each instance of a debug intrinsic and each DPValue object in any LLVM-IR program. This class stores exactly the same information as is stored in debugging intrinsics. It also has almost entirely the same set of methods, that behave in the same way:
+We're using a dedicated C++ class called DPValue to store debug info, with a one-to-one relationship between each instance of a debug intrinsic and each DPValue object in any LLVM-IR program. This class stores exactly the same information as is stored in debugging intrinsics. It also has almost entirely the same set of methods, that behave in the same way:
 
   https://llvm.org/docs/doxygen/classllvm_1_1DPValue.html
 
@@ -76,11 +76,11 @@ Not shown are the links from DPValues to other parts of the Value/Metadata hiera
 
 The various kinds of debug intrinsic (value, declare, assign) are all stored in the DPValue object, with a "Type" field disamgibuating which is which.
 
-## Finding debug-info records
+## Finding debug info records
 
 Utilities such as findDbgUsers and the like now have an optional argument that will return the set of DPValue records that refer to a Value. You should be able to treat them the same as intrinsics.
 
-## Examining debug-info records at positions
+## Examining debug info records at positions
 
 Call "Instruction::getDbgValueRange()" to get the range of DPValue objects that are attached to an instruction.
 
@@ -101,6 +101,6 @@ If you have a block like so:
       br label %xyzzy
 ```
 
-your optimisation pass may wish to erase the terminator and then do something to the block. This is easy to do when debug-info is kept in instructions, but with DPValues there is no trailing instruction to attach the variable information to in the lbock above, once the terminator is erased. For such degenerate blocks, DPValues are stored temporarily in a map in LLVMContext, and are re-inserted when a terminator is reinserted to the block or other instruction inserted at end().
+your optimisation pass may wish to erase the terminator and then do something to the block. This is easy to do when debug info is kept in instructions, but with DPValues there is no trailing instruction to attach the variable information to in the block above, once the terminator is erased. For such degenerate blocks, DPValues are stored temporarily in a map in LLVMContext, and are re-inserted when a terminator is reinserted into the block or another instruction is inserted at end().
 
 This can technically lead to trouble in the vanishingly rare scenario where an optimisation pass erases a terminator and then decides to erase the whole block. (We recommend not doing that).
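
For readers who want to poke at the ideas in the new document without building LLVM, here is a rough standalone sketch of the marker layout, the `moveBefore` / `moveBeforePreserving` distinction, and the dangling-record case. It is a toy model, not the LLVM implementation: `ToyInst`, `ToyBlock`, `Trailing`, `eraseLast`, `pushBack` and `makeExample` are hypothetical names, records are plain strings, and the "records are handed to the following instruction on a plain move" behaviour is an assumption made for illustration. The document says dangling records live in a map in LLVMContext; the toy keeps them on the block for brevity.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for an instruction: Marker holds the debug records that are
// printed immediately before this instruction.
struct ToyInst {
  std::string Name;
  std::vector<std::string> Marker;
};

// Toy stand-in for a basic block. Trailing holds records that have no
// instruction left to attach to (the "dangling" case).
struct ToyBlock {
  std::list<ToyInst> Insts;
  std::vector<std::string> Trailing;
  using iterator = std::list<ToyInst>::iterator;

  // Plain move (assumed behaviour): the records stay at the old position,
  // so they are handed to whatever followed the moved instruction.
  void moveBefore(iterator I, iterator Pos) {
    iterator Next = std::next(I);
    auto &Dest = (Next == Insts.end()) ? Trailing : Next->Marker;
    Dest.insert(Dest.begin(), I->Marker.begin(), I->Marker.end());
    I->Marker.clear();
    Insts.splice(Pos, Insts, I);
  }

  // Preserving move: the records travel with the instruction.
  void moveBeforePreserving(iterator I, iterator Pos) {
    Insts.splice(Pos, Insts, I);
  }

  // Erasing the last instruction leaves its records dangling.
  void eraseLast() {
    ToyInst &T = Insts.back();
    Trailing.insert(Trailing.end(), T.Marker.begin(), T.Marker.end());
    Insts.pop_back();
  }

  // Inserting at the end re-attaches any dangling records.
  void pushBack(std::string Name) {
    Insts.push_back({std::move(Name), std::move(Trailing)});
    Trailing.clear();
  }
};

// The add / sub / call sequence from the top of the document: each record
// is attached to the instruction that follows it.
ToyBlock makeExample() {
  ToyBlock B;
  B.Insts = {{"add", {}},
             {"sub", {"dbg_value(%add)"}},
             {"call", {"dbg_value(%sub)"}}};
  return B;
}
```

In this model, moving `sub` to the front with `moveBefore` leaves both records sitting in front of `call`, while `moveBeforePreserving` keeps `dbg_value(%add)` attached to `sub`. Calling `eraseLast()` followed by `pushBack("ret")` shows the dangling record being re-attached to the new final instruction.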