[llvm] [DebugInfo][DWARF] Emit Per-Function Line Table Offsets and End Sequences (PR #110192)
via llvm-commits
llvm-commits at lists.llvm.org
Thu Sep 26 17:44:50 PDT 2024
llvmbot wrote:
@llvm/pr-subscribers-mc
Author: None (alx32)
<details>
<summary>Changes</summary>
**Summary**
This patch introduces a new compiler option `-mllvm -emit-func-debug-line-table-offsets` that enables the emission of per-function line table offsets and end sequences in DWARF debug information. This enhancement allows tools and debuggers to accurately attribute line number information to their corresponding functions, even in scenarios where functions are merged or share the same address space due to optimizations like Identical Code Folding (ICF) in the linker.
**Background**
RFC: [New DWARF Attribute for Symbolication of Merged Functions](https://discourse.llvm.org/t/rfc-new-dwarf-attribute-for-symbolication-of-merged-functions/79434)
Previous similar PR: [#93137](https://github.com/llvm/llvm-project/pull/93137). That PR was very similar to this one, but at the time the assembler could not emit labels within the line table. That support was added in PR [#99710](https://github.com/llvm/llvm-project/pull/99710), and this PR builds on it.
In the current implementation, Clang generates line information in the `.debug_line` section without directly associating line entries with their originating `DW_TAG_subprogram` DIEs. This can lead to issues when post-compilation optimizations merge functions, resulting in overlapping address ranges and ambiguous line information.
For example, when functions are merged by ICF in LLD, multiple functions may end up sharing the same address range. Without explicit linkage between functions and their line entries, tools cannot accurately attribute line information to the correct function, adversely affecting debugging and call stack resolution.
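To make the ambiguity concrete, here is a minimal C++ sketch. It is not LLVM or LLD code: `FuncRange`, `functionsCovering`, `mergedExample`, and all addresses are hypothetical names and values chosen for illustration. It models two identical functions that ICF has folded onto one address range, and shows that a lookup by address alone matches both.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified view of a function's address range after linking.
struct FuncRange {
  std::string Name;
  uint64_t LowPC;
  uint64_t HighPC; // Exclusive upper bound.
};

// Return the name of every function whose range covers Addr.
std::vector<std::string> functionsCovering(const std::vector<FuncRange> &Funcs,
                                           uint64_t Addr) {
  std::vector<std::string> Names;
  for (const FuncRange &F : Funcs) {
    assert(F.LowPC <= F.HighPC && "malformed address range");
    if (Addr >= F.LowPC && Addr < F.HighPC)
      Names.push_back(F.Name);
  }
  return Names;
}

// Illustrative data only: func_a and func_b were folded onto the same range.
std::vector<FuncRange> mergedExample() {
  return {{"func_a", 0x1000, 0x1010},
          {"func_b", 0x1000, 0x1010},
          {"main", 0x1010, 0x1030}};
}
```

Calling `functionsCovering(mergedExample(), 0x1004)` yields both `func_a` and `func_b`, so a symbolizer that relies only on addresses has no basis for choosing between them.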
**Implementation Details**
To address the above issue, the patch makes the following key changes:
- **`DW_AT_LLVM_stmt_sequence` Attribute**: Adds a new LLVM-specific attribute, `DW_AT_LLVM_stmt_sequence`, to each `DW_TAG_subprogram` DIE. The attribute holds a label that resolves to the offset in the line table where the function's line entries begin.
- **End-of-Sequence Markers**: Emits an explicit `DW_LNE_end_sequence` after each function's line entries in the line table, so that each function's line entries are correctly delimited.
- **Assembler and Streamer Modifications**: Modifies `MCStreamer` and related classes to emit the necessary labels and track the current function's first line entry. A new flag, `GenerateFuncLineTableOffsets`, controls this behavior.
- **Compiler Option**: Introduces the `-mllvm -emit-func-debug-line-table-offsets` option to enable this functionality, so users can opt in as needed.
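A consumer might use the new attribute roughly as sketched below. This is an illustration under stated assumptions, not an LLVM or debugger API: `LineRow`, `SequenceMap`, `Subprogram`, and `findRow` are hypothetical, and the sketch assumes the line table has already been parsed into sequences keyed by the `.debug_line` offset at which each one starts. The lookup searches only the sequence the function's `DW_AT_LLVM_stmt_sequence` value points at.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical decoded line-table row.
struct LineRow {
  uint64_t Address;
  unsigned Line;
};

// Assumption: sequences keyed by the line-table offset where each begins.
using SequenceMap = std::map<uint64_t, std::vector<LineRow>>;

// Hypothetical view of a DW_TAG_subprogram DIE.
struct Subprogram {
  std::string Name;
  bool HasStmtSequence;  // True if DW_AT_LLVM_stmt_sequence is present.
  uint64_t StmtSequence; // Line-table offset; valid only if HasStmtSequence.
};

// Return the last row at or before Addr within SP's own sequence, or nullptr
// if SP has no attribute, the offset is unknown, or no row qualifies.
const LineRow *findRow(const Subprogram &SP, const SequenceMap &Seqs,
                       uint64_t Addr) {
  if (!SP.HasStmtSequence)
    return nullptr;
  SequenceMap::const_iterator It = Seqs.find(SP.StmtSequence);
  if (It == Seqs.end())
    return nullptr;
  const LineRow *Best = nullptr;
  for (const LineRow &R : It->second) {
    assert((!Best || Best->Address <= R.Address) &&
           "rows must be sorted by address");
    if (R.Address > Addr)
      break;
    Best = &R;
  }
  return Best;
}
```

With two merged functions whose sequences start at different offsets (for example 0x28 and 0x33, values borrowed from this PR's test expectations purely for illustration), the same address resolves to a different source line depending on which subprogram DIE the lookup starts from.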
---
Full diff: https://github.com/llvm/llvm-project/pull/110192.diff
7 Files Affected:
- (modified) llvm/include/llvm/BinaryFormat/Dwarf.def (+1)
- (modified) llvm/include/llvm/MC/MCDwarf.h (+4-1)
- (modified) llvm/include/llvm/MC/MCStreamer.h (+27)
- (modified) llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp (+8)
- (modified) llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp (+29-1)
- (modified) llvm/lib/MC/MCDwarf.cpp (+18-4)
- (added) llvm/test/DebugInfo/X86/DW_AT_LLVM_stmt_seq_sec_offset.ll (+82)
``````````diff
diff --git a/llvm/include/llvm/BinaryFormat/Dwarf.def b/llvm/include/llvm/BinaryFormat/Dwarf.def
index d55947fc5103ac..b1fa81a2fc6abd 100644
--- a/llvm/include/llvm/BinaryFormat/Dwarf.def
+++ b/llvm/include/llvm/BinaryFormat/Dwarf.def
@@ -617,6 +617,7 @@ HANDLE_DW_AT(0x3e07, LLVM_apinotes, 0, APPLE)
HANDLE_DW_AT(0x3e08, LLVM_ptrauth_isa_pointer, 0, LLVM)
HANDLE_DW_AT(0x3e09, LLVM_ptrauth_authenticates_null_values, 0, LLVM)
HANDLE_DW_AT(0x3e0a, LLVM_ptrauth_authentication_mode, 0, LLVM)
+HANDLE_DW_AT(0x3e0b, LLVM_stmt_sequence, 0, LLVM)
// Apple extensions.
diff --git a/llvm/include/llvm/MC/MCDwarf.h b/llvm/include/llvm/MC/MCDwarf.h
index bea79545d1ab96..e7e1bef1ad2d72 100644
--- a/llvm/include/llvm/MC/MCDwarf.h
+++ b/llvm/include/llvm/MC/MCDwarf.h
@@ -123,6 +123,9 @@ class MCDwarfLoc {
friend class MCContext;
friend class MCDwarfLineEntry;
+ // DwarfDebug::endFunctionImpl needs to construct MCDwarfLoc(IsEndOfFunction)
+ friend class DwarfDebug;
+
MCDwarfLoc(unsigned fileNum, unsigned line, unsigned column, unsigned flags,
unsigned isa, unsigned discriminator)
: FileNum(fileNum), Line(line), Column(column), Flags(flags), Isa(isa),
@@ -239,7 +242,7 @@ class MCLineSection {
// Add an end entry by cloning the last entry, if exists, for the section
// the given EndLabel belongs to. The label is replaced by the given EndLabel.
- void addEndEntry(MCSymbol *EndLabel);
+ void addEndEntry(MCSymbol *EndLabel, bool generatingFuncLineTableOffsets);
using MCDwarfLineEntryCollection = std::vector<MCDwarfLineEntry>;
using iterator = MCDwarfLineEntryCollection::iterator;
diff --git a/llvm/include/llvm/MC/MCStreamer.h b/llvm/include/llvm/MC/MCStreamer.h
index 707aecc5dc578e..d6d5970917401d 100644
--- a/llvm/include/llvm/MC/MCStreamer.h
+++ b/llvm/include/llvm/MC/MCStreamer.h
@@ -251,6 +251,15 @@ class MCStreamer {
/// discussion for future inclusion.
bool AllowAutoPadding = false;
+ // Flag specifying whether functions will have an offset into the line table
+ // where the line data for that function starts
+ bool GenerateFuncLineTableOffsets = false;
+
+ // Symbol that tracks the stream symbol for first line of the current function
+ // being generated. This symbol can be used to reference where the line
+ // entries for the function start in the generated line table.
+ MCSymbol *CurrentFuncFirstLineStreamSym = nullptr;
+
protected:
MCFragment *CurFrag = nullptr;
@@ -313,6 +322,24 @@ class MCStreamer {
void setAllowAutoPadding(bool v) { AllowAutoPadding = v; }
bool getAllowAutoPadding() const { return AllowAutoPadding; }
+ void setGenerateFuncLineTableOffsets(bool v) {
+ GenerateFuncLineTableOffsets = v;
+ }
+ bool getGenerateFuncLineTableOffsets() const {
+ return GenerateFuncLineTableOffsets;
+ }
+
+ // Use the below functions to track the symbol that points to the current
+ // function's line info in the output stream.
+ void beginFunction() { CurrentFuncFirstLineStreamSym = nullptr; }
+ void emittedLineStreamSym(MCSymbol *StreamSym) {
+ if (!CurrentFuncFirstLineStreamSym)
+ CurrentFuncFirstLineStreamSym = StreamSym;
+ }
+ MCSymbol *getCurrentFuncFirstLineStreamSym() {
+ return CurrentFuncFirstLineStreamSym;
+ }
+
/// When emitting an object file, create and emit a real label. When emitting
/// textual assembly, this should do nothing to avoid polluting our output.
virtual MCSymbol *emitCFILabel();
diff --git a/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp b/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp
index 0a1ff189bedbc4..c62075cf77c45a 100644
--- a/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp
@@ -527,6 +527,14 @@ DIE &DwarfCompileUnit::updateSubprogramScopeDIE(const DISubprogram *SP) {
*DD->getCurrentFunction()))
addFlag(*SPDie, dwarf::DW_AT_APPLE_omit_frame_ptr);
+ if (Asm->OutStreamer->getGenerateFuncLineTableOffsets() &&
+ Asm->OutStreamer->getCurrentFuncFirstLineStreamSym()) {
+ addSectionLabel(
+ *SPDie, dwarf::DW_AT_LLVM_stmt_sequence,
+ Asm->OutStreamer->getCurrentFuncFirstLineStreamSym(),
+ Asm->getObjFileLowering().getDwarfLineSection()->getBeginSymbol());
+ }
+
// Only include DW_AT_frame_base in full debug info
if (!includeMinimalInlineScopes()) {
const TargetFrameLowering *TFI = Asm->MF->getSubtarget().getFrameLowering();
diff --git a/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp b/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
index e9649f9ff81658..bd6d5e0ea7a363 100644
--- a/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
@@ -170,6 +170,12 @@ static cl::opt<DwarfDebug::MinimizeAddrInV5> MinimizeAddrInV5Option(
"Stuff")),
cl::init(DwarfDebug::MinimizeAddrInV5::Default));
+static cl::opt<bool> EmitFuncLineTableOffsetsOption(
+ "emit-func-debug-line-table-offsets", cl::Hidden,
+ cl::desc("Include line table offset in function's debug info and emit end "
+ "sequence after each function's line data."),
+ cl::init(false));
+
static constexpr unsigned ULEB128PadSize = 4;
void DebugLocDwarfExpression::emitOp(uint8_t Op, const char *Comment) {
@@ -443,6 +449,8 @@ DwarfDebug::DwarfDebug(AsmPrinter *A)
Asm->OutStreamer->getContext().setDwarfVersion(DwarfVersion);
Asm->OutStreamer->getContext().setDwarfFormat(Dwarf64 ? dwarf::DWARF64
: dwarf::DWARF32);
+ Asm->OutStreamer->setGenerateFuncLineTableOffsets(
+ EmitFuncLineTableOffsetsOption);
}
// Define out of line so we don't have to include DwarfUnit.h in DwarfDebug.h.
@@ -2221,6 +2229,10 @@ void DwarfDebug::beginFunctionImpl(const MachineFunction *MF) {
if (SP->getUnit()->getEmissionKind() == DICompileUnit::NoDebug)
return;
+ // Notify the streamer that we are beginning a function - this will reset the
+ // label pointing to the currently generated function's first line entry
+ Asm->OutStreamer->beginFunction();
+
DwarfCompileUnit &CU = getOrCreateDwarfCompileUnit(SP->getUnit());
Asm->OutStreamer->getContext().setDwarfCompileUnitID(
@@ -2249,7 +2261,8 @@ void DwarfDebug::terminateLineTable(const DwarfCompileUnit *CU) {
getDwarfCompileUnitIDForLineTable(*CU));
// Add the last range label for the given CU.
LineTable.getMCLineSections().addEndEntry(
- const_cast<MCSymbol *>(CURanges.back().End));
+ const_cast<MCSymbol *>(CURanges.back().End),
+ EmitFuncLineTableOffsetsOption);
}
void DwarfDebug::skippedNonDebugFunction() {
@@ -2342,6 +2355,21 @@ void DwarfDebug::endFunctionImpl(const MachineFunction *MF) {
// Construct call site entries.
constructCallSiteEntryDIEs(*SP, TheCU, ScopeDIE, *MF);
+ // If we're emitting line table offsets, we also need to emit an end label
+ // after all function's line entries
+ if (EmitFuncLineTableOffsetsOption) {
+ MCSymbol *LineSym = Asm->OutStreamer->getContext().createTempSymbol();
+ Asm->OutStreamer->emitLabel(LineSym);
+ MCDwarfLoc DwarfLoc(
+ 1, 1, 0, DWARF2_LINE_DEFAULT_IS_STMT ? DWARF2_FLAG_IS_STMT : 0, 0, 0);
+ MCDwarfLineEntry LineEntry(LineSym, DwarfLoc);
+ Asm->OutStreamer->getContext()
+ .getMCDwarfLineTable(
+ Asm->OutStreamer->getContext().getDwarfCompileUnitID())
+ .getMCLineSections()
+ .addLineEntry(LineEntry, Asm->OutStreamer->getCurrentSectionOnly());
+ }
+
// Clear debug info
// Ownership of DbgVariables is a bit subtle - ScopeVariables owns all the
// DbgVariables except those that are also in AbstractVariables (since they
diff --git a/llvm/lib/MC/MCDwarf.cpp b/llvm/lib/MC/MCDwarf.cpp
index 8ff097f29aebd1..34a9541bbbcc3a 100644
--- a/llvm/lib/MC/MCDwarf.cpp
+++ b/llvm/lib/MC/MCDwarf.cpp
@@ -104,8 +104,17 @@ void MCDwarfLineEntry::make(MCStreamer *MCOS, MCSection *Section) {
// Get the current .loc info saved in the context.
const MCDwarfLoc &DwarfLoc = MCOS->getContext().getCurrentDwarfLoc();
+ MCSymbol *LineStreamLabel = nullptr;
+ // If functions need offsets into the generated line table, then we need to
+ // create a label referencing where the line was generated in the output
+ // stream
+ if (MCOS->getGenerateFuncLineTableOffsets()) {
+ LineStreamLabel = MCOS->getContext().createTempSymbol();
+ MCOS->emittedLineStreamSym(LineStreamLabel);
+ }
+
// Create a (local) line entry with the symbol and the current .loc info.
- MCDwarfLineEntry LineEntry(LineSym, DwarfLoc);
+ MCDwarfLineEntry LineEntry(LineSym, DwarfLoc, LineStreamLabel);
// clear DwarfLocSeen saying the current .loc info is now used.
MCOS->getContext().clearDwarfLocSeen();
@@ -145,7 +154,8 @@ makeStartPlusIntExpr(MCContext &Ctx, const MCSymbol &Start, int IntVal) {
return Res;
}
-void MCLineSection::addEndEntry(MCSymbol *EndLabel) {
+void MCLineSection::addEndEntry(MCSymbol *EndLabel,
+ bool generatingFuncLineTableOffsets) {
auto *Sec = &EndLabel->getSection();
// The line table may be empty, which we should skip adding an end entry.
// There are two cases:
@@ -158,8 +168,12 @@ void MCLineSection::addEndEntry(MCSymbol *EndLabel) {
if (I != MCLineDivisions.end()) {
auto &Entries = I->second;
auto EndEntry = Entries.back();
- EndEntry.setEndLabel(EndLabel);
- Entries.push_back(EndEntry);
+ // If generatingFuncLineTableOffsets is set, then we already generated an
+ // end label at the end of the last function, so skip generating another one
+ if (!generatingFuncLineTableOffsets) {
+ EndEntry.setEndLabel(EndLabel);
+ Entries.push_back(EndEntry);
+ }
}
}
diff --git a/llvm/test/DebugInfo/X86/DW_AT_LLVM_stmt_seq_sec_offset.ll b/llvm/test/DebugInfo/X86/DW_AT_LLVM_stmt_seq_sec_offset.ll
new file mode 100644
index 00000000000000..ef8b0c817cfb67
--- /dev/null
+++ b/llvm/test/DebugInfo/X86/DW_AT_LLVM_stmt_seq_sec_offset.ll
@@ -0,0 +1,82 @@
+; RUN: llc -mtriple=i686-w64-mingw32 -o %t -filetype=obj %s
+; RUN: llvm-dwarfdump -v -all %t | FileCheck %s -check-prefix=NO_STMT_SEQ
+
+; RUN: llc -mtriple=i686-w64-mingw32 -o %t -filetype=obj %s -emit-func-debug-line-table-offsets
+; RUN: llvm-dwarfdump -v -all %t | FileCheck %s -check-prefix=STMT_SEQ
+
+; NO_STMT_SEQ-NOT: DW_AT_LLVM_stmt_sequence
+
+; STMT_SEQ: [[[ABBREV_CODE:[0-9]+]]] DW_TAG_subprogram
+; STMT_SEQ: DW_AT_LLVM_stmt_sequence DW_FORM_sec_offset
+; STMT_SEQ: DW_TAG_subprogram [[[ABBREV_CODE]]]
+; STMT_SEQ: DW_AT_LLVM_stmt_sequence [DW_FORM_sec_offset] (0x00000028)
+; STMT_SEQ: DW_AT_name {{.*}}func01
+; STMT_SEQ: DW_TAG_subprogram [[[ABBREV_CODE]]]
+; STMT_SEQ: DW_AT_LLVM_stmt_sequence [DW_FORM_sec_offset] (0x00000033)
+; STMT_SEQ: DW_AT_name {{.*}}main
+
+;; Check that the line table starts at 0x00000028 (first function)
+; STMT_SEQ: Address Line Column File ISA Discriminator OpIndex Flags
+; STMT_SEQ-NEXT: ------------------ ------ ------ ------ --- ------------- ------- -------------
+; STMT_SEQ-NEXT: 0x00000028: 00 DW_LNE_set_address (0x00000006)
+
+;; Check that we have an 'end_sequence' just before the next function (0x00000033)
+; STMT_SEQ: 0x0000000000000006 1 0 1 0 0 0 is_stmt end_sequence
+; STMT_SEQ-NEXT: 0x00000033: 00 DW_LNE_set_address (0x00000027)
+
+;; Check that the end of the line table still has an 'end_sequence'
+; STMT_SEQ: 0x00000049: 00 DW_LNE_end_sequence
+; STMT_SEQ-NEXT: 0x0000000000000027 6 3 1 0 0 0 end_sequence
+
+
+; generated from:
+; clang -g -S -emit-llvm test.c -o test.ll
+; ======= test.c ======
+; int func01() {
+; return 1;
+; }
+; int main() {
+; return 0;
+; }
+; =====================
+
+
+; ModuleID = 'test.c'
+source_filename = "test.c"
+target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
+target triple = "arm64-apple-macosx14.0.0"
+
+; Function Attrs: noinline nounwind optnone ssp uwtable(sync)
+define i32 @func01() #0 !dbg !9 {
+ ret i32 1, !dbg !13
+}
+
+; Function Attrs: noinline nounwind optnone ssp uwtable(sync)
+define i32 @main() #0 !dbg !14 {
+ %1 = alloca i32, align 4
+ store i32 0, ptr %1, align 4
+ ret i32 0, !dbg !15
+}
+
+attributes #0 = { noinline nounwind optnone ssp uwtable(sync) "frame-pointer"="non-leaf" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="apple-m1" "target-features"="+aes,+crc,+dotprod,+fp-armv8,+fp16fml,+fullfp16,+lse,+neon,+ras,+rcpc,+rdm,+sha2,+sha3,+v8.1a,+v8.2a,+v8.3a,+v8.4a,+v8.5a,+v8a,+zcm,+zcz" }
+
+!llvm.dbg.cu = !{!0}
+!llvm.module.flags = !{!2, !3, !4, !5, !6, !7}
+!llvm.ident = !{!8}
+
+!0 = distinct !DICompileUnit(language: DW_LANG_C11, file: !1, producer: "Homebrew clang version 17.0.6", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, splitDebugInlining: false, nameTableKind: Apple, sysroot: "/Library/Developer/CommandLineTools/SDKs/MacOSX14.sdk", sdk: "MacOSX14.sdk")
+!1 = !DIFile(filename: "test.c", directory: "/tmp/clang_test")
+!2 = !{i32 7, !"Dwarf Version", i32 4}
+!3 = !{i32 2, !"Debug Info Version", i32 3}
+!4 = !{i32 1, !"wchar_size", i32 4}
+!5 = !{i32 8, !"PIC Level", i32 2}
+!6 = !{i32 7, !"uwtable", i32 1}
+!7 = !{i32 7, !"frame-pointer", i32 1}
+!8 = !{!"Homebrew clang version 17.0.6"}
+!9 = distinct !DISubprogram(name: "func01", scope: !1, file: !1, line: 1, type: !10, scopeLine: 1, spFlags: DISPFlagDefinition, unit: !0)
+!10 = !DISubroutineType(types: !11)
+!11 = !{!12}
+!12 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
+!13 = !DILocation(line: 2, column: 3, scope: !9)
+!14 = distinct !DISubprogram(name: "main", scope: !1, file: !1, line: 5, type: !10, scopeLine: 5, spFlags: DISPFlagDefinition, unit: !0)
+!15 = !DILocation(line: 6, column: 3, scope: !14)
``````````
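The streamer-side bookkeeping in the diff above follows a "first label wins" pattern: the tracked symbol is reset when a function begins, and only the first line-entry label emitted afterwards is kept. The standalone model below mirrors that logic outside of LLVM. `Symbol` and `FirstLineTracker` are hypothetical stand-ins for `MCSymbol` and the new `MCStreamer` members, so treat this as a sketch of the idea rather than the patch's actual classes.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for MCSymbol; only pointer identity matters here.
struct Symbol {
  std::string Name;
};

// Models the per-function tracking added to MCStreamer in this patch.
class FirstLineTracker {
  const Symbol *CurrentFuncFirstLineStreamSym = nullptr;

public:
  // Called at the start of each function: forget the previous function's label.
  void beginFunction() { CurrentFuncFirstLineStreamSym = nullptr; }

  // Called for every line entry: remember only the first label seen.
  void emittedLineStreamSym(const Symbol *StreamSym) {
    assert(StreamSym && "expected a valid symbol");
    if (!CurrentFuncFirstLineStreamSym)
      CurrentFuncFirstLineStreamSym = StreamSym;
  }

  // Null until the current function has emitted at least one line entry.
  const Symbol *getCurrentFuncFirstLineStreamSym() const {
    return CurrentFuncFirstLineStreamSym;
  }
};
```

Because later labels in the same function are ignored, the value read when the subprogram DIE is updated always refers to the start of that function's line entries, and a function with no line entries yields a null symbol, which is why the diff checks for null before adding the attribute.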
</details>
https://github.com/llvm/llvm-project/pull/110192