[llvm] f9193f3 - [DebugInfo] Preserve line and column number when merging debug info. (#129960)
via llvm-commits
llvm-commits at lists.llvm.org
Fri Apr 4 09:37:28 PDT 2025
Author: Snehasish Kumar
Date: 2025-04-04T09:37:25-07:00
New Revision: f9193f3b18f08547e2f92b5e354a44655bfc1b94
URL: https://github.com/llvm/llvm-project/commit/f9193f3b18f08547e2f92b5e354a44655bfc1b94
DIFF: https://github.com/llvm/llvm-project/commit/f9193f3b18f08547e2f92b5e354a44655bfc1b94.diff
LOG: [DebugInfo] Preserve line and column number when merging debug info. (#129960)
This patch introduces a new option `-pick-merged-source-locations` to
preserve an arbitrary but deterministic version of debug information
when DILocations are merged. This is intended to be used in production
environments from which sample-based profiles, such as AutoFDO and
MemProf, are derived.
With this patch we have seen a 0.2% improvement on an internal workload
at Google when generating AutoFDO profiles. It also significantly
helps MemProf by preserving debug info for merged call instructions
used in the contextual profile.
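
As an illustration of what "arbitrary but deterministic" means here, the
following standalone sketch mirrors the tuple comparison added to
DILocation::getMergedLocation in this commit. The `Loc` struct and the
`pickMergedLocation` helper are hypothetical stand-ins, not LLVM APIs;
only the ordering of the compared fields (line, column, discriminator,
filename, directory) is taken from the patch.

```cpp
#include <string>
#include <tuple>

// Hypothetical stand-in for the DILocation fields that the patch compares.
struct Loc {
  unsigned Line;
  unsigned Column;
  unsigned Discriminator;
  std::string Filename;
  std::string Directory;
};

// Returns the lexicographically smaller of the two locations. On a tie the
// second argument is returned, matching `A < B ? LocA : LocB` in the patch.
const Loc &pickMergedLocation(const Loc &A, const Loc &B) {
  auto KeyA =
      std::tie(A.Line, A.Column, A.Discriminator, A.Filename, A.Directory);
  auto KeyB =
      std::tie(B.Line, B.Column, B.Discriminator, B.Filename, B.Directory);
  return KeyA < KeyB ? A : B;
}
```

With the two `bar` call sites from the new test (line 9, column 16 and
line 11, column 10), this comparison selects line 9, column 16
regardless of argument order, which is the location the ENABLED check
expects.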
---------
Co-authored-by: Krzysztof Pszeniczny <kpszeniczny at google.com>
Added:
llvm/test/DebugInfo/pick-merged-source-locations.ll
Modified:
llvm/docs/HowToUpdateDebugInfo.rst
llvm/docs/SourceLevelDebugging.rst
llvm/lib/IR/DebugInfoMetadata.cpp
Removed:
################################################################################
diff --git a/llvm/docs/HowToUpdateDebugInfo.rst b/llvm/docs/HowToUpdateDebugInfo.rst
index d8c300f2f3a70..3088f59c1066a 100644
--- a/llvm/docs/HowToUpdateDebugInfo.rst
+++ b/llvm/docs/HowToUpdateDebugInfo.rst
@@ -9,7 +9,8 @@ Introduction
============
Certain kinds of code transformations can inadvertently result in a loss of
-debug info, or worse, make debug info misrepresent the state of a program.
+debug info, or worse, make debug info misrepresent the state of a program. Debug
+info availability is also essential for SamplePGO.
This document specifies how to correctly update debug info in various kinds of
code transformations, and offers suggestions for how to create targeted debug
@@ -89,9 +90,14 @@ has a location with an accurate scope attached, and b) to prevent misleading
single-stepping (or breakpoint) behavior. Often, merged instructions are memory
accesses which can trap: having an accurate scope attached greatly assists in
crash triage by identifying the (possibly inlined) function where the bad
-memory access occurred. This rule is also meant to assist SamplePGO by banning
-scenarios in which a sample of a block containing a merged instruction is
-misattributed to a block containing one of the instructions-to-be-merged.
+memory access occurred.
+
+To maintain distinct source locations for SamplePGO, it is often beneficial to
+retain an arbitrary but deterministic location instead of discarding line and
+column information as part of merging. In particular, loss of location
+information for calls inhibit optimizations such as indirect call promotion.
+This behavior can be optionally enabled until support for accurately
+representing merged instructions in the line table is implemented.
Examples of transformations that should follow this rule include:
diff --git a/llvm/docs/SourceLevelDebugging.rst b/llvm/docs/SourceLevelDebugging.rst
index b3007756a8d07..8a11dcf5254a9 100644
--- a/llvm/docs/SourceLevelDebugging.rst
+++ b/llvm/docs/SourceLevelDebugging.rst
@@ -55,6 +55,8 @@ the stored debug information into source-language specific information. As
such, a debugger must be aware of the source-language, and is thus tied to a
specific language or family of languages.
+.. _intro_consumers:
+
Debug information consumers
---------------------------
@@ -71,6 +73,17 @@ as Visual Studio and WinDBG. LLVM's debug information format is mostly derived
from and inspired by DWARF, but it is feasible to translate into other target
debug info formats such as STABS.
+SamplePGO (also known as `AutoFDO <https://gcc.gnu.org/wiki/AutoFDO>`_)
+is a variant of profile guided optimizations which uses hardware sampling based
+profilers to collect branch frequency data with low overhead in production
+environments. It relies on debug information to associate profile information
+to LLVM IR which is then used to guide optimization heuristics. Maintaining
+deterministic and distinct source locations is necessary to maximize the
+accuracy of mapping hardware sample counts to LLVM IR. For example, DWARF
+`discriminators <https://wiki.dwarfstd.org/Path_Discriminators.md>`_ allow
+SamplePGO to distinguish between multiple paths of execution which map to the
+same source line.
+
It would also be reasonable to use debug information to feed profiling tools
for analysis of generated code, or, tools for reconstructing the original
source from generated code.
diff --git a/llvm/lib/IR/DebugInfoMetadata.cpp b/llvm/lib/IR/DebugInfoMetadata.cpp
index f8c24d896df32..12aba7d2bd123 100644
--- a/llvm/lib/IR/DebugInfoMetadata.cpp
+++ b/llvm/lib/IR/DebugInfoMetadata.cpp
@@ -21,6 +21,7 @@
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/Type.h"
#include "llvm/IR/Value.h"
+#include "llvm/Support/CommandLine.h"
#include <numeric>
#include <optional>
@@ -34,6 +35,12 @@ cl::opt<bool> EnableFSDiscriminator(
cl::desc("Enable adding flow sensitive discriminators"));
} // namespace llvm
+// When true, preserves line and column number by picking one of the merged
+// location info in a deterministic manner to assist sample based PGO.
+static cl::opt<bool> PickMergedSourceLocations(
+ "pick-merged-source-locations", cl::init(false), cl::Hidden,
+ cl::desc("Preserve line and column number when merging locations."));
+
uint32_t DIType::getAlignInBits() const {
return (getTag() == dwarf::DW_TAG_LLVM_ptrauth_type ? 0 : SubclassData32);
}
@@ -125,6 +132,20 @@ DILocation *DILocation::getMergedLocation(DILocation *LocA, DILocation *LocB) {
if (LocA == LocB)
return LocA;
+ // For some use cases (SamplePGO), it is important to retain distinct source
+ // locations. When this flag is set, we choose arbitrarily between A and B,
+ // rather than computing a merged location using line 0, which is typically
+ // not useful for PGO.
+ if (PickMergedSourceLocations) {
+ auto A = std::make_tuple(LocA->getLine(), LocA->getColumn(),
+ LocA->getDiscriminator(), LocA->getFilename(),
+ LocA->getDirectory());
+ auto B = std::make_tuple(LocB->getLine(), LocB->getColumn(),
+ LocB->getDiscriminator(), LocB->getFilename(),
+ LocB->getDirectory());
+ return A < B ? LocA : LocB;
+ }
+
LLVMContext &C = LocA->getContext();
using LocVec = SmallVector<const DILocation *>;
diff --git a/llvm/test/DebugInfo/pick-merged-source-locations.ll b/llvm/test/DebugInfo/pick-merged-source-locations.ll
new file mode 100644
index 0000000000000..2a9387e039232
--- /dev/null
+++ b/llvm/test/DebugInfo/pick-merged-source-locations.ll
@@ -0,0 +1,77 @@
+;; This test verifies that we assign a deterministic location for merged
+;; instructions when -pick-merged-source-locations is enabled. We use the
+;; simplifycfg pass to test this behaviour since it was a common source of
+;; merged instructions, however we intend this to apply to all users of the
+;; getMergedLocation API.
+
+;; Run simplifycfg and check that only 1 call to bar remains and it's debug
+;; location has a valid line number (lexicographically smallest).
+; RUN: opt %s -passes=simplifycfg -hoist-common-insts -pick-merged-source-locations -S | FileCheck %s --check-prefix=ENABLED
+; ENABLED: call i32 @bar{{.*!dbg !}}[[TAG:[0-9]+]]
+; ENABLED-NOT: call i32 @bar
+; ENABLED: ![[TAG]] = !DILocation(line: 9, column: 16, scope: !9)
+
+;; Run simplifycfg without the pass to ensure that we don't spuriously start
+;; passing the test if simplifycfg behaviour changes.
+; RUN: opt %s -passes=simplifycfg -hoist-common-insts -pick-merged-source-locations=false -S | FileCheck %s --check-prefix=DISABLED
+; DISABLED: call i32 @bar{{.*!dbg !}}[[TAG:[0-9]+]]
+; DISABLED-NOT: call i32 @bar
+; DISABLED: ![[TAG]] = !DILocation(line: 0, scope: !9)
+
+; ModuleID = '../llvm/test/DebugInfo/Inputs/debug-info-merge-call.c'
+source_filename = "../llvm/test/DebugInfo/Inputs/debug-info-merge-call.c"
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+; Function Attrs: nounwind uwtable
+define dso_local i32 @test(i32 %n) !dbg !9 {
+entry:
+ %call = call i32 @foo(i32 %n), !dbg !12
+ %cmp1 = icmp sgt i32 %n, 100, !dbg !13
+ br i1 %cmp1, label %if.then, label %if.else, !dbg !13
+
+if.then: ; preds = %entry
+ %call2 = call i32 @bar(i32 %n), !dbg !14
+ %add = add nsw i32 %call2, %call, !dbg !15
+ br label %if.end, !dbg !16
+
+if.else: ; preds = %entry
+ %call4 = call i32 @bar(i32 %n), !dbg !17
+ br label %if.end
+
+if.end: ; preds = %if.else, %if.then
+ %r.0 = phi i32 [ %add, %if.then ], [ %call4, %if.else ], !dbg !18
+ ret i32 %r.0, !dbg !19
+}
+
+declare !dbg !20 i32 @foo(i32)
+
+declare !dbg !21 i32 @bar(i32)
+
+!llvm.dbg.cu = !{!0}
+!llvm.module.flags = !{!2, !3, !4, !5, !6, !7}
+!llvm.ident = !{!8}
+
+!0 = distinct !DICompileUnit(language: DW_LANG_C11, file: !1, producer: "clang version 21.0.0git (git at github.com:snehasish/llvm-project.git 6ce41db6b0275d060d6e60f88b96a1657024345c)", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly, splitDebugInlining: false, nameTableKind: None)
+!1 = !DIFile(filename: "../llvm/test/DebugInfo/Inputs/debug-info-merge-call.c", directory: "/usr/local/google/home/snehasishk/working/llvm-project/build-assert", checksumkind: CSK_MD5, checksum: "ac1be6c40dad11691922d600f9d55c55")
+!2 = !{i32 7, !"Dwarf Version", i32 5}
+!3 = !{i32 2, !"Debug Info Version", i32 3}
+!4 = !{i32 1, !"wchar_size", i32 4}
+!5 = !{i32 8, !"PIC Level", i32 2}
+!6 = !{i32 7, !"PIE Level", i32 2}
+!7 = !{i32 7, !"uwtable", i32 2}
+!8 = !{!"clang version 21.0.0git (git at github.com:snehasish/llvm-project.git 6ce41db6b0275d060d6e60f88b96a1657024345c)"}
+!9 = distinct !DISubprogram(name: "test", scope: !1, file: !1, line: 5, type: !10, scopeLine: 5, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0)
+!10 = !DISubroutineType(types: !11)
+!11 = !{}
+!12 = !DILocation(line: 7, column: 13, scope: !9)
+!13 = !DILocation(line: 8, column: 8, scope: !9)
+!14 = !DILocation(line: 9, column: 16, scope: !9)
+!15 = !DILocation(line: 9, column: 14, scope: !9)
+!16 = !DILocation(line: 10, column: 3, scope: !9)
+!17 = !DILocation(line: 11, column: 10, scope: !9)
+!18 = !DILocation(line: 0, scope: !9)
+!19 = !DILocation(line: 13, column: 3, scope: !9)
+!20 = !DISubprogram(name: "foo", scope: !1, file: !1, line: 2, type: !10, flags: DIFlagPrototyped, spFlags: DISPFlagOptimized)
+!21 = !DISubprogram(name: "bar", scope: !1, file: !1, line: 1, type: !10, flags: DIFlagPrototyped, spFlags: DISPFlagOptimized)
+