[llvm] [DWARFVerifier] Fix debug_str_offsets DWARF version detection (PR #81210)

Felipe de Azevedo Piovezan via llvm-commits llvm-commits at lists.llvm.org
Thu Feb 8 17:01:28 PST 2024


https://github.com/felipepiovezan created https://github.com/llvm/llvm-project/pull/81210

The DWARF 5 debug_str_offsets section starts with a header, which must be skipped in order to access the underlying `strp`s.

However, the verifier supports some pre-standardization version of this section (with the same section name), which does not have a header. In this case, the offsets start on the first byte of the section, although it's not clear where this is documented.
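
To make the difference between the two layouts concrete, here is a minimal sketch of where the first offset lives in each one. The names `Format`, `firstEntryOffset`, and `entrySize` are hypothetical and are not part of the verifier. The header sizes assume the DWARF 5 layout of `unit_length`, a 2-byte `version`, and 2 bytes of padding.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names, for illustration only; not the verifier's API.
enum class Format { DWARF32, DWARF64 };

// Byte offset of the first string offset within one contribution.
// Assumed DWARF 5 header: unit_length (4 bytes in DWARF32; 12 bytes in
// DWARF64, i.e. the 0xffffffff escape plus an 8-byte length), then
// version (2 bytes) and padding (2 bytes).
// The pre-standardization layout has no header at all.
constexpr uint64_t firstEntryOffset(Format F, bool HasHeader) {
  if (!HasHeader)
    return 0;
  const uint64_t LengthFieldSize = (F == Format::DWARF32) ? 4 : 12;
  return LengthFieldSize + 2 /*version*/ + 2 /*padding*/;
}

// Size in bytes of each entry in the offsets array.
constexpr uint64_t entrySize(Format F) {
  return F == Format::DWARF32 ? 4 : 8;
}
```

Under these assumptions a DWARF32 header is 8 bytes, which matches the `0x8` used for the str offsets base in the test added by this patch.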

How does the DWARF verifier figure out which version to use? It manually reads the **first** unit header in debug_info and uses that version. This is wrong when multiple debug_str_offsets sections have been linked together. In particular, it is wrong in the following two cases:

1. A standard DWARF 4 object file (i.e. no debug_str_offsets) linked with a standard DWARF 5 object file.
2. A non-standard DWARF 4 object file (i.e. containing the header-less debug_str_offsets section) linked with a standard DWARF 5 object file.

This patch provides a quick fix for case (1): we use the `MaxVersion` from the DWARFContext instead of reading the version from the debug_info section manually. Since this case involves only standard-conforming formats, which should be linked together without issues, the verifier must handle it.

Fixing case (2) would require a lot of rework to restructure how each piece of debug_str_offsets is visited. Since it deals with a non-standard format, it is left for the future in case anyone cares enough about it.
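
The difference between the old and new version detection can be sketched with a toy model. This is not the verifier's code: `UnitVersions`, `firstUnitVersion`, `maxUnitVersion`, and `assumesHeaderless` are hypothetical names, and the sketch assumes that `MaxVersion` is the highest version among the units in debug_info.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model: the DWARF versions of the units in debug_info, in order.
using UnitVersions = std::vector<uint16_t>;

// Old behavior: only the first unit header is consulted.
inline uint16_t firstUnitVersion(const UnitVersions &Units) {
  return Units.empty() ? 0 : Units.front();
}

// New behavior (assumed meaning of MaxVersion): highest version of any unit.
inline uint16_t maxUnitVersion(const UnitVersions &Units) {
  return Units.empty() ? 0 : *std::max_element(Units.begin(), Units.end());
}

// Mirrors the `== 4` check in the patch: version 4 selects the
// header-less, pre-standardization layout.
inline bool assumesHeaderless(uint16_t Version) { return Version == 4; }
```

For case (1), with units `{4, 5}`, `assumesHeaderless(firstUnitVersion(...))` is true, so the standard DWARF 5 section would be misread, while `assumesHeaderless(maxUnitVersion(...))` is false. Case (2) is not helped by this choice: the section is then treated as having headers throughout, so a header-less contribution in it would still be misinterpreted.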

>From 9ed2b48d68a56d023733c42817c36e7cf5bdf99b Mon Sep 17 00:00:00 2001
From: Felipe de Azevedo Piovezan <fpiovezan at apple.com>
Date: Tue, 6 Feb 2024 12:36:15 -0800
Subject: [PATCH] [DWARFVerifier] Fix debug_str_offsets DWARF version detection

The DWARF 5 debug_str_offsets section starts with a header, which must be
skipped in order to access the underlying `strp`s.

However, the verifier supports some pre-standardization version of this section
(with the same section name), which does not have a header. In this case, the
offsets start on the first byte of the section, although it's not clear where
this is documented.

How does the DWARF verifier figure out which version to use? It manually reads
the **first** unit header in debug_info and uses that version. This is wrong
when multiple debug_str_offsets sections have been linked together. In
particular, it is wrong in the following two cases:

1. A standard DWARF 4 object file (i.e. no debug_str_offsets) linked with a
standard DWARF 5 object file.
2. A non-standard DWARF 4 object file (i.e. containing the header-less
debug_str_offsets section) linked with a standard DWARF 5 object file.

This patch provides a quick fix for case (1): we use the `MaxVersion` from the
DWARFContext instead of reading the version from the debug_info section
manually. Since this case involves only standard-conforming formats, which
should be linked together without issues, the verifier must handle it.

Fixing case (2) would require a lot of rework to restructure how each piece of
debug_str_offsets is visited. Since it deals with a non-standard format, it is
left for the future in case anyone cares enough about it.
---
 .../llvm/DebugInfo/DWARF/DWARFVerifier.h      |  3 +-
 llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp    | 25 ++++----
 .../debug-str-offsets-mixed-dwarf-4-5.yaml    | 57 +++++++++++++++++++
 3 files changed, 73 insertions(+), 12 deletions(-)
 create mode 100644 llvm/test/tools/llvm-dwarfdump/X86/debug-str-offsets-mixed-dwarf-4-5.yaml

diff --git a/llvm/include/llvm/DebugInfo/DWARF/DWARFVerifier.h b/llvm/include/llvm/DebugInfo/DWARF/DWARFVerifier.h
index ea73664b1e46ca..6c5df409fe6de8 100644
--- a/llvm/include/llvm/DebugInfo/DWARF/DWARFVerifier.h
+++ b/llvm/include/llvm/DebugInfo/DWARF/DWARFVerifier.h
@@ -361,7 +361,8 @@ class DWARFVerifier {
   /// \returns true if the .debug_line verifies successfully, false otherwise.
   bool handleDebugStrOffsets();
   bool verifyDebugStrOffsets(
-      StringRef SectionName, const DWARFSection &Section, StringRef StrData,
+      uint8_t MaxVersion, StringRef SectionName, const DWARFSection &Section,
+      StringRef StrData,
       void (DWARFObject::*)(function_ref<void(const DWARFSection &)>) const);
 
   /// Emits any aggregate information collected, depending on the dump options
diff --git a/llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp b/llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp
index 2124ff835c5727..805f40af217e1e 100644
--- a/llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp
+++ b/llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp
@@ -1882,29 +1882,31 @@ bool DWARFVerifier::handleDebugStrOffsets() {
   const DWARFObject &DObj = DCtx.getDWARFObj();
   bool Success = true;
   Success &= verifyDebugStrOffsets(
-      ".debug_str_offsets.dwo", DObj.getStrOffsetsDWOSection(),
-      DObj.getStrDWOSection(), &DWARFObject::forEachInfoDWOSections);
+      DCtx.getMaxDWOVersion(), ".debug_str_offsets.dwo",
+      DObj.getStrOffsetsDWOSection(), DObj.getStrDWOSection(),
+      &DWARFObject::forEachInfoDWOSections);
   Success &= verifyDebugStrOffsets(
-      ".debug_str_offsets", DObj.getStrOffsetsSection(), DObj.getStrSection(),
-      &DWARFObject::forEachInfoSections);
+      DCtx.getMaxVersion(), ".debug_str_offsets", DObj.getStrOffsetsSection(),
+      DObj.getStrSection(), &DWARFObject::forEachInfoSections);
   return Success;
 }
 
 bool DWARFVerifier::verifyDebugStrOffsets(
-    StringRef SectionName, const DWARFSection &Section, StringRef StrData,
+    uint8_t MaxVersion, StringRef SectionName, const DWARFSection &Section,
+    StringRef StrData,
     void (DWARFObject::*VisitInfoSections)(
         function_ref<void(const DWARFSection &)>) const) {
   const DWARFObject &DObj = DCtx.getDWARFObj();
-  uint16_t InfoVersion = 0;
-  DwarfFormat InfoFormat = DwarfFormat::DWARF32;
+
+  std::optional<DwarfFormat> MaybeInfoFormat;
   (DObj.*VisitInfoSections)([&](const DWARFSection &S) {
-    if (InfoVersion)
+    if (MaybeInfoFormat)
       return;
     DWARFDataExtractor DebugInfoData(DObj, S, DCtx.isLittleEndian(), 0);
     uint64_t Offset = 0;
-    InfoFormat = DebugInfoData.getInitialLength(&Offset).second;
-    InfoVersion = DebugInfoData.getU16(&Offset);
+    MaybeInfoFormat = DebugInfoData.getInitialLength(&Offset).second;
   });
+  DwarfFormat InfoFormat = MaybeInfoFormat.value_or(DwarfFormat::DWARF32);
 
   DWARFDataExtractor DA(DObj, Section, DCtx.isLittleEndian(), 0);
 
@@ -1915,7 +1917,8 @@ bool DWARFVerifier::verifyDebugStrOffsets(
     DwarfFormat Format;
     uint64_t Length;
     uint64_t StartOffset = C.tell();
-    if (InfoVersion == 4) {
+    if (MaxVersion == 4) {
+      // Pre-standardization debug_str_offsets had no header.
       Format = InfoFormat;
       Length = DA.getData().size();
       NextUnit = C.tell() + Length;
diff --git a/llvm/test/tools/llvm-dwarfdump/X86/debug-str-offsets-mixed-dwarf-4-5.yaml b/llvm/test/tools/llvm-dwarfdump/X86/debug-str-offsets-mixed-dwarf-4-5.yaml
new file mode 100644
index 00000000000000..d10460896171d6
--- /dev/null
+++ b/llvm/test/tools/llvm-dwarfdump/X86/debug-str-offsets-mixed-dwarf-4-5.yaml
@@ -0,0 +1,57 @@
+# RUN: yaml2obj %s -o %t.o
+# RUN: llvm-dwarfdump -debug-str-offsets -verify %t.o | FileCheck %s
+
+# CHECK: Verifying .debug_str_offsets...
+# CHECK: No errors
+
+# Check that when mixing standard DWARF 4 debug information with standard DWARF
+# 5 debug information, the verifier correctly interprets the debug_str_offsets
+# section as a standards-conforming DWARF 5 section.
+
+--- !ELF
+FileHeader:
+  Class: ELFCLASS64
+  Data:  ELFDATA2LSB
+  Type:  ET_EXEC
+DWARF:
+  debug_str:
+    - 'cu1'
+    - 'cu2'
+  debug_str_offsets:
+    - Offsets:
+        - 0x0
+  debug_abbrev:
+    - Table:
+        - Code:            0x1
+          Tag:             DW_TAG_compile_unit
+          Children:        DW_CHILDREN_no
+          Attributes:
+            - Attribute:       DW_AT_name
+              Form:            DW_FORM_strp
+        - Code:            0x2
+          Tag:             DW_TAG_compile_unit
+          Children:        DW_CHILDREN_no
+          Attributes:
+            - Attribute:       DW_AT_name
+              Form:            DW_FORM_strx1
+            - Attribute:       DW_AT_str_offsets_base
+              Form:            DW_FORM_sec_offset
+  debug_info:
+    - Version:         4
+      AbbrevTableID:   0
+      AbbrOffset:      0x0
+      AddrSize:        8
+      Entries:
+        - AbbrCode:        0x1
+          Values:
+            - Value:           0x4
+    - Version:         5
+      UnitType:        DW_UT_compile
+      AbbrOffset:      0x0
+      AddrSize:        8
+      AbbrevTableID:   0
+      Entries:
+        - AbbrCode:        0x2
+          Values:
+            - Value:           0x0
+            - Value:           0x8 # str offsets base
