[llvm] [DWARFVerifier] Fix debug_str_offsets DWARF version detection (PR #81303)

Felipe de Azevedo Piovezan via llvm-commits llvm-commits at lists.llvm.org
Fri Feb 9 11:38:52 PST 2024


https://github.com/felipepiovezan updated https://github.com/llvm/llvm-project/pull/81303

>From 7e965c125bea705daab3a6f02ea799277d3d1cc4 Mon Sep 17 00:00:00 2001
From: Felipe de Azevedo Piovezan <fpiovezan at apple.com>
Date: Tue, 6 Feb 2024 12:36:15 -0800
Subject: [PATCH 1/2] [DWARFVerifier] Fix debug_str_offsets DWARF version
 detection

The DWARF 5 debug_str_offsets section starts with a header, which must be
skipped in order to access the underlying `strp`s.

However, the verifier supports some pre-standardization version of this section
(with the same section name), which does not have a header. In this case, the
offsets start on the first byte of the section, although it's not clear where
this is documented.

How does The DWARF verifier figure out which version to use? It manually reads
the **first** header in debug_info and uses that. This is wrong when multiple
debug_str_offset sections have been linked together, in particular it is wrong
in the following two cases:

1. A standard DWARF 4 object file (i.e. no debug_str_offsets) linked with a
standard DWARF 5 object file.
2. A non-standard DWARF 4 object file (i.e. containing the header-less
debug_str_offsets section) linked with a standard DWARF 5 object file.

Based on discussions in https://github.com/llvm/llvm-project/pull/81210,
the legacy version is only possible with dwo files, and dwo files cannot mix the
legacy version with the dwarf 5 version. As such, we change the verifier to only
check the debug_info header in the case of dwo files. If it sees a dwarf 4
version, it handles it the legacy way.

Note: the modified test _never_ worked to being with. It was emitting the error
message by accident, because it treated the "legacy" dwarf 4 section as a dwarf
5 section. To see why, simply note that the test contained no `debug_info.dwo`
sections, so the call to DWARFObject::forEachInfoDWOSections was doing nothing.
---
 .../llvm/DebugInfo/DWARF/DWARFVerifier.h      |  6 +-
 llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp    | 46 ++++++++-------
 .../debug-str-offsets-mixed-dwarf-4-5.yaml    | 57 +++++++++++++++++++
 .../X86/verify_invalid_str_offsets.yaml       | 17 +++---
 4 files changed, 94 insertions(+), 32 deletions(-)
 create mode 100644 llvm/test/tools/llvm-dwarfdump/X86/debug-str-offsets-mixed-dwarf-4-5.yaml

diff --git a/llvm/include/llvm/DebugInfo/DWARF/DWARFVerifier.h b/llvm/include/llvm/DebugInfo/DWARF/DWARFVerifier.h
index ea73664b1e46ca..c2365a4c7cf647 100644
--- a/llvm/include/llvm/DebugInfo/DWARF/DWARFVerifier.h
+++ b/llvm/include/llvm/DebugInfo/DWARF/DWARFVerifier.h
@@ -360,9 +360,9 @@ class DWARFVerifier {
   ///
   /// \returns true if the .debug_line verifies successfully, false otherwise.
   bool handleDebugStrOffsets();
-  bool verifyDebugStrOffsets(
-      StringRef SectionName, const DWARFSection &Section, StringRef StrData,
-      void (DWARFObject::*)(function_ref<void(const DWARFSection &)>) const);
+  bool verifyDebugStrOffsets(std::optional<dwarf::DwarfFormat> LegacyFormat,
+                             StringRef SectionName, const DWARFSection &Section,
+                             StringRef StrData);
 
   /// Emits any aggregate information collected, depending on the dump options
   void summarize();
diff --git a/llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp b/llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp
index 2124ff835c5727..f56ac9889a296e 100644
--- a/llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp
+++ b/llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp
@@ -1881,33 +1881,37 @@ bool DWARFVerifier::handleDebugStrOffsets() {
   OS << "Verifying .debug_str_offsets...\n";
   const DWARFObject &DObj = DCtx.getDWARFObj();
   bool Success = true;
+
+  // dwo sections may contain the legacy debug_str_offsets format (and they
+  // can't be mixed with dwarf 5's format). This section format contains no
+  // header.
+  // As such, check the version from debug_info and, if we are in the legacy
+  // mode (Dwarf <= 4), extract Dwarf32/Dwarf64.
+  std::optional<DwarfFormat> DwoLegacyDwarf4Format;
+  DObj.forEachInfoDWOSections([&](const DWARFSection &S) {
+    if (DwoLegacyDwarf4Format)
+      return;
+    DWARFDataExtractor DebugInfoData(DObj, S, DCtx.isLittleEndian(), 0);
+    uint64_t Offset = 0;
+    DwarfFormat InfoFormat = DebugInfoData.getInitialLength(&Offset).second;
+    if (uint16_t InfoVersion = DebugInfoData.getU16(&Offset); InfoVersion <= 4)
+      DwoLegacyDwarf4Format = InfoFormat;
+  });
+
   Success &= verifyDebugStrOffsets(
-      ".debug_str_offsets.dwo", DObj.getStrOffsetsDWOSection(),
-      DObj.getStrDWOSection(), &DWARFObject::forEachInfoDWOSections);
+      DwoLegacyDwarf4Format, ".debug_str_offsets.dwo",
+      DObj.getStrOffsetsDWOSection(), DObj.getStrDWOSection());
   Success &= verifyDebugStrOffsets(
-      ".debug_str_offsets", DObj.getStrOffsetsSection(), DObj.getStrSection(),
-      &DWARFObject::forEachInfoSections);
+      /*LegacyFormat=*/std::nullopt, ".debug_str_offsets",
+      DObj.getStrOffsetsSection(), DObj.getStrSection());
   return Success;
 }
 
-bool DWARFVerifier::verifyDebugStrOffsets(
-    StringRef SectionName, const DWARFSection &Section, StringRef StrData,
-    void (DWARFObject::*VisitInfoSections)(
-        function_ref<void(const DWARFSection &)>) const) {
+bool DWARFVerifier::verifyDebugStrOffsets(std::optional<DwarfFormat> LegacyFormat,
+    StringRef SectionName, const DWARFSection &Section, StringRef StrData) {
   const DWARFObject &DObj = DCtx.getDWARFObj();
-  uint16_t InfoVersion = 0;
-  DwarfFormat InfoFormat = DwarfFormat::DWARF32;
-  (DObj.*VisitInfoSections)([&](const DWARFSection &S) {
-    if (InfoVersion)
-      return;
-    DWARFDataExtractor DebugInfoData(DObj, S, DCtx.isLittleEndian(), 0);
-    uint64_t Offset = 0;
-    InfoFormat = DebugInfoData.getInitialLength(&Offset).second;
-    InfoVersion = DebugInfoData.getU16(&Offset);
-  });
 
   DWARFDataExtractor DA(DObj, Section, DCtx.isLittleEndian(), 0);
-
   DataExtractor::Cursor C(0);
   uint64_t NextUnit = 0;
   bool Success = true;
@@ -1915,8 +1919,8 @@ bool DWARFVerifier::verifyDebugStrOffsets(
     DwarfFormat Format;
     uint64_t Length;
     uint64_t StartOffset = C.tell();
-    if (InfoVersion == 4) {
-      Format = InfoFormat;
+    if (LegacyFormat) {
+      Format = *LegacyFormat;
       Length = DA.getData().size();
       NextUnit = C.tell() + Length;
     } else {
diff --git a/llvm/test/tools/llvm-dwarfdump/X86/debug-str-offsets-mixed-dwarf-4-5.yaml b/llvm/test/tools/llvm-dwarfdump/X86/debug-str-offsets-mixed-dwarf-4-5.yaml
new file mode 100644
index 00000000000000..d10460896171d6
--- /dev/null
+++ b/llvm/test/tools/llvm-dwarfdump/X86/debug-str-offsets-mixed-dwarf-4-5.yaml
@@ -0,0 +1,57 @@
+# RUN: yaml2obj %s -o %t.o
+# RUN: llvm-dwarfdump -debug-str-offsets -verify %t.o | FileCheck %s
+
+# CHECK: Verifying .debug_str_offsets...
+# CHECK: No errors
+
+# Check that when mixing standard DWARF 4 debug information with standard DWARF
+# 5 debug information, the verifier correctly interprets the debug_str_offsets
+# section as a standards-conforming DWARF 5 section.
+
+--- !ELF
+FileHeader:
+  Class: ELFCLASS64
+  Data:  ELFDATA2LSB
+  Type:  ET_EXEC
+DWARF:
+  debug_str:
+    - 'cu1'
+    - 'cu2'
+  debug_str_offsets:
+    - Offsets:
+        - 0x0
+  debug_abbrev:
+    - Table:
+        - Code:            0x1
+          Tag:             DW_TAG_compile_unit
+          Children:        DW_CHILDREN_no
+          Attributes:
+            - Attribute:       DW_AT_name
+              Form:            DW_FORM_strp
+        - Code:            0x2
+          Tag:             DW_TAG_compile_unit
+          Children:        DW_CHILDREN_no
+          Attributes:
+            - Attribute:       DW_AT_name
+              Form:            DW_FORM_strx1
+            - Attribute:       DW_AT_str_offsets_base
+              Form:            DW_FORM_sec_offset
+  debug_info:
+    - Version:         4
+      AbbrevTableID:   0
+      AbbrOffset:      0x0
+      AddrSize:        8
+      Entries:
+        - AbbrCode:        0x1
+          Values:
+            - Value:           0x4
+    - Version:         5
+      UnitType:        DW_UT_compile
+      AbbrOffset:      0x0
+      AddrSize:        8
+      AbbrevTableID:   0
+      Entries:
+        - AbbrCode:        0x2
+          Values:
+            - Value:           0x0
+            - Value:           0x8 # str offsets base
diff --git a/llvm/test/tools/llvm-dwarfdump/X86/verify_invalid_str_offsets.yaml b/llvm/test/tools/llvm-dwarfdump/X86/verify_invalid_str_offsets.yaml
index 37f376352a19a4..1bdc640acae6dc 100644
--- a/llvm/test/tools/llvm-dwarfdump/X86/verify_invalid_str_offsets.yaml
+++ b/llvm/test/tools/llvm-dwarfdump/X86/verify_invalid_str_offsets.yaml
@@ -13,7 +13,7 @@
 # CHECK-NEXT: error: .debug_str_offsets: contribution 0x29: length exceeds available space (contribution offset (0x29) + length field space (0x4) + length (0x5000000) == 0x500002D > section size 0x30)
 # Errors detected.
 
-# V4: error: .debug_str_offsets: contribution 0x0: index 0x2: invalid string offset *0x8 == 0x2, is neither zero nor immediately following a null character
+# V4: error: .debug_str_offsets.dwo: contribution 0x0: index 0x2: invalid string offset *0x8 == 0x2, is neither zero nor immediately following a null character
 
 
 #--- v4.yaml
@@ -23,16 +23,17 @@ FileHeader:
   Data:  ELFDATA2LSB
   Type:  ET_EXEC
 DWARF:
-  debug_str:
-    - 'foo'
-    - 'bar'
-  debug_info:
-    - Version: 4
-      AddrSize: 4
 Sections:
-  - Name: '.debug_str_offsets'
+  - Name: '.debug_info.dwo'
+    Type: SHT_PROGBITS
+    Content: "0700000004000000000004"
+  - Name: '.debug_str_offsets.dwo'
     Type: SHT_PROGBITS
     Content: "000000000400000002000000"
+  - Name: 'debug_str.dwo'
+    Type: SHT_PROGBITS
+    Content: "666F6F0062617200"
+
 
 #--- v5.yaml
 --- !ELF

>From aa4847deef7dc1c57a545037cc96c80965bdc35f Mon Sep 17 00:00:00 2001
From: Felipe de Azevedo Piovezan <fpiovezan at apple.com>
Date: Fri, 9 Feb 2024 11:38:41 -0800
Subject: [PATCH 2/2] fixup! clang-format

---
 llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp b/llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp
index f56ac9889a296e..25a14f4330ba8a 100644
--- a/llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp
+++ b/llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp
@@ -1907,8 +1907,9 @@ bool DWARFVerifier::handleDebugStrOffsets() {
   return Success;
 }
 
-bool DWARFVerifier::verifyDebugStrOffsets(std::optional<DwarfFormat> LegacyFormat,
-    StringRef SectionName, const DWARFSection &Section, StringRef StrData) {
+bool DWARFVerifier::verifyDebugStrOffsets(
+    std::optional<DwarfFormat> LegacyFormat, StringRef SectionName,
+    const DWARFSection &Section, StringRef StrData) {
   const DWARFObject &DObj = DCtx.getDWARFObj();
 
   DWARFDataExtractor DA(DObj, Section, DCtx.isLittleEndian(), 0);



More information about the llvm-commits mailing list