[lld] e618ccb - [ELF] Resolve relocations in .debug_* referencing (discarded symbols or ICF folded section symbols) to tombstone values

Fangrui Song via llvm-commits llvm-commits at lists.llvm.org
Tue Jun 23 11:48:55 PDT 2020


Author: Fangrui Song
Date: 2020-06-23T11:48:46-07:00
New Revision: e618ccbf431f6730edb6d1467a127c3a52fd57f7

URL: https://github.com/llvm/llvm-project/commit/e618ccbf431f6730edb6d1467a127c3a52fd57f7
DIFF: https://github.com/llvm/llvm-project/commit/e618ccbf431f6730edb6d1467a127c3a52fd57f7.diff

LOG: [ELF] Resolve relocations in .debug_* referencing (discarded symbols or ICF folded section symbols) to tombstone values

See D59553, https://lists.llvm.org/pipermail/llvm-dev/2020-May/141885.html and
https://sourceware.org/pipermail/binutils/2020-May/111357.html for
extensive discussions on a tombstone value.
See http://www.dwarfstd.org/ShowIssue.php?issue=200609.1
(Reserve an address value for "not present") for a DWARF enhancement proposal.

We resolve such relocations to a tombstone value to indicate that the address is invalid.
This solves several problems (the normal behavior is to resolve the relocation to the addend):

* For an empty function in a collected section, a pair of (0,0) can
  terminate .debug_loc and .debug_ranges (as of binutils 2.34, GNU ld
  resolves such a relocation to 1 to avoid the .debug_ranges issue)
* If DW_AT_high_pc is sufficiently large, the address range can collide
  with a regular code range of low address (https://bugs.llvm.org/show_bug.cgi?id=41124 )
* If a text section is folded into another by ICF, we may leave entries
  in multiple CUs claiming ownership of the same range of code, which can
  confuse consumers.
* Debug information associated with COMDAT sections can have problems
  similar to ICF, but is more complex - thus not addressed by this patch.

For pre-DWARF-v5 .debug_loc and .debug_ranges, a pair of 0 can terminate
entries (invalidating subsequent ranges).
-1 is a reserved value with special meaning (base address selection entry) which can't be used either.
Use -2 instead.

For all other .debug_*, use UINT32_MAX for 32-bit targets and UINT64_MAX
for 64-bit targets. In the code, we intentionally use
`uint64_t tombstone = UINT64_MAX` for 32-bit targets as well: this matches
SignExtend64 as used in `relocateAlloc`. (Actually UINT32_MAX does not work for R_386_32)

Note 0, we only special case `target->symbolicRel` (R_X86_64_64, R_AARCH64_ABS64, R_PPC64_ADDR64), not
short-range absolute relocations (e.g. R_X86_64_32). Only forms like DW_FORM_addr need to be special cased.
They can hold an arbitrary address (must be 64-bit on a 64-bit target). (In theory,
producers can make use of small code model to emit 32-bit relocations. This doesn't seem to be leveraged.)

Note 1, we have to ignore the addend, because we don't want to resolve
DW_AT_low_pc (which may have a non-zero addend) to -1+addend (wrap
around to a low address):

  __attribute__((section(".text.x"))) void f1() { }
  __attribute__((section(".text.x"))) void f2() { } // DW_AT_low_pc has a non-zero addend

Note 2, if the prevailing copy does not have debugging information while
a non-prevailing copy has (partial debug build), we don't do extra work
to attach debugging information to the prevailing definition.  (clang
has a lot of debug info optimizations that are on-by-default that assume
the whole program is built with debug info).

  clang -c -ffunction-sections a.cc    # prevailing copy has no debug info
  clang -c -ffunction-sections -g b.cc

Reviewed By: dblaikie, avl, jhenderson

Differential Revision: https://reviews.llvm.org/D81784

Added: 
    lld/test/ELF/debug-dead-reloc-32.s
    lld/test/ELF/debug-dead-reloc-icf.s
    lld/test/ELF/debug-dead-reloc.s

Modified: 
    lld/ELF/InputSection.cpp

Removed: 
    


################################################################################
diff  --git a/lld/ELF/InputSection.cpp b/lld/ELF/InputSection.cpp
index d5e1370eeda0..d439db0039db 100644
--- a/lld/ELF/InputSection.cpp
+++ b/lld/ELF/InputSection.cpp
@@ -852,6 +852,9 @@ uint64_t InputSectionBase::getRelocTargetVA(const InputFile *file, RelType type,
 template <class ELFT, class RelTy>
 void InputSection::relocateNonAlloc(uint8_t *buf, ArrayRef<RelTy> rels) {
   const unsigned bits = sizeof(typename ELFT::uint) * 8;
+  const bool isDebug = isDebugSection(*this);
+  const bool isDebugLocOrRanges =
+      isDebug && (name == ".debug_loc" || name == ".debug_ranges");
 
   for (const RelTy &rel : rels) {
     RelType type = rel.getType(config->isMips64EL);
@@ -902,11 +905,38 @@ void InputSection::relocateNonAlloc(uint8_t *buf, ArrayRef<RelTy> rels) {
       continue;
     }
 
-    if (sym.isTls() && !Out::tlsPhdr)
+    if (sym.isTls() && !Out::tlsPhdr) {
       target->relocateNoSym(bufLoc, type, 0);
-    else
-      target->relocateNoSym(bufLoc, type,
-                            SignExtend64<bits>(sym.getVA(addend)));
+      continue;
+    }
+
+    if (isDebug && type == target->symbolicRel) {
+      // Resolve relocations in .debug_* referencing (discarded symbols or ICF
+      // folded section symbols) to a tombstone value. Resolving to addend is
+      // unsatisfactory because the result address range may collide with a
+      // valid range of low address, or leave multiple CUs claiming ownership of
+      // the same range of code, which may confuse consumers.
+      //
+      // To address the problems, we use -1 as a tombstone value for most
+      // .debug_* sections. We have to ignore the addend because we don't want
+      // to resolve an address attribute (which may have a non-zero addend) to
+      // -1+addend (wrap around to a low address).
+      //
+      // If the referenced symbol is discarded (made Undefined), or the
+      // section defining the referenced symbol is garbage collected,
+      // sym.getOutputSection() is nullptr. `ds->section->repl != ds->section`
+      // catches the ICF folded case.
+      //
+      // For pre-DWARF-v5 .debug_loc and .debug_ranges, -1 is a reserved value
+      // (base address selection entry), so -2 is used.
+      auto *ds = dyn_cast<Defined>(&sym);
+      if (!sym.getOutputSection() || (ds && ds->section->repl != ds->section)) {
+        target->relocateNoSym(bufLoc, type,
+                              isDebugLocOrRanges ? UINT64_MAX - 1 : UINT64_MAX);
+        continue;
+      }
+    }
+    target->relocateNoSym(bufLoc, type, SignExtend64<bits>(sym.getVA(addend)));
   }
 }
 

diff  --git a/lld/test/ELF/debug-dead-reloc-32.s b/lld/test/ELF/debug-dead-reloc-32.s
new file mode 100644
index 000000000000..a7496798c580
--- /dev/null
+++ b/lld/test/ELF/debug-dead-reloc-32.s
@@ -0,0 +1,29 @@
+# REQUIRES: x86
+## Test we resolve symbolic relocations in .debug_* sections to a tombstone
+## value if the referenced symbol is discarded (--gc-sections, non-prevailing
+## section group, SHF_EXCLUDE, /DISCARD/, etc).
+
+# RUN: llvm-mc -filetype=obj -triple=i386 %s -o %t.o
+# RUN: ld.lld %t.o -o %t
+# RUN: llvm-objdump -s %t | FileCheck %s
+
+# CHECK:      Contents of section .debug_loc:
+# CHECK-NEXT:  0000 feffffff
+# CHECK-NEXT: Contents of section .debug_ranges:
+# CHECK-NEXT:  0000 feffffff
+# CHECK-NEXT: Contents of section .debug_addr:
+# CHECK-NEXT:  0000 ffffffff
+
+.section .text.1,"axe"
+  .byte 0
+
+## Resolved to UINT32_C(-2), with the addend ignored.
+## UINT32_C(-1) is a reserved value (base address selection entry) which can't be used.
+.section .debug_loc
+  .long .text.1+8
+.section .debug_ranges
+  .long .text.1+16
+
+## Resolved to UINT32_C(-1), with the addend ignored.
+.section .debug_addr
+  .long .text.1+8

diff  --git a/lld/test/ELF/debug-dead-reloc-icf.s b/lld/test/ELF/debug-dead-reloc-icf.s
new file mode 100644
index 000000000000..3230656f1fbc
--- /dev/null
+++ b/lld/test/ELF/debug-dead-reloc-icf.s
@@ -0,0 +1,24 @@
+# REQUIRES: x86
+## Test we resolve symbolic relocations in .debug_* sections to a tombstone
+## value if the referenced section symbol is folded into another section by ICF.
+## Otherwise, we would leave entries in multiple CUs claiming ownership of the
+## same range of code, which can confuse consumers.
+
+# RUN: llvm-mc -filetype=obj -triple=x86_64 %s -o %t.o
+# RUN: ld.lld --icf=all %t.o -o %t
+# RUN: llvm-objdump -s %t | FileCheck %s
+
+# CHECK:      Contents of section .debug_info:
+# CHECK-NEXT:  0000 {{[0-9a-f]+}}000 00000000 ffffffff ffffffff
+
+.globl _start
+_start:
+  ret
+
+## .text.1 will be folded by ICF.
+.section .text.1,"ax"
+  ret
+
+.section .debug_info
+  .quad .text+8
+  .quad .text.1+8

diff  --git a/lld/test/ELF/debug-dead-reloc.s b/lld/test/ELF/debug-dead-reloc.s
new file mode 100644
index 000000000000..7e6dc8d52b37
--- /dev/null
+++ b/lld/test/ELF/debug-dead-reloc.s
@@ -0,0 +1,53 @@
+# REQUIRES: x86
+## Test we resolve symbolic relocations in .debug_* sections to a tombstone
+## value if the referenced symbol is discarded (--gc-sections, non-prevailing
+## section group, SHF_EXCLUDE, /DISCARD/, etc).
+
+# RUN: echo '.globl _start; _start: call group' | llvm-mc -filetype=obj -triple=x86_64 - -o %t.o
+# RUN: llvm-mc -filetype=obj -triple=x86_64 %s -o %t1.o
+# RUN: ld.lld --gc-sections %t.o %t1.o %t1.o -o %t
+# RUN: llvm-objdump -s %t | FileCheck %s
+
+# CHECK:      Contents of section .debug_loc:
+# CHECK-NEXT:  0000 feffffff ffffffff feffffff ffffffff
+# CHECK-NEXT: Contents of section .debug_ranges:
+# CHECK-NEXT:  0000 feffffff ffffffff feffffff ffffffff
+# CHECK-NEXT: Contents of section .debug_addr:
+# CHECK-NEXT:  0000 {{.*}}000 00000000 {{.*}}000 00000000
+# CHECK-NEXT:  0010 ffffffff  ffffffff {{.*}}000 00000000
+# CHECK-NEXT: Contents of section .debug_foo:
+# CHECK-NEXT:  0000 ffffffff ffffffff 08000000 00000000
+# CHECK-NEXT:  0010 ffffffff ffffffff 08000000 00000000
+
+.section .text.1,"ax"
+  .byte 0
+.section .text.2,"axe"
+  .byte 0
+.section .text.3,"axG", at progbits,group,comdat
+.globl group
+group:
+  .byte 0
+
+## Resolved to UINT64_C(-2), with the addend ignored.
+## UINT64_C(-1) is a reserved value (base address selection entry) which can't be used.
+.section .debug_loc
+  .quad .text.1+8
+.section .debug_ranges
+  .quad .text.2+16
+
+.section .debug_addr
+## .text.3 is a local symbol. The symbol defined in a non-prevailing group is
+## discarded. Resolved to UINT64_C(-1).
+  .quad .text.3+24
+## group is a non-local symbol. The relocation from the second %t1.o gets
+## resolved to the prevailing copy.
+  .quad group+32
+
+.section .debug_foo
+  .quad .text.1+8
+
+## We only deal with DW_FORM_addr. Don't special case short-range absolute
+## relocations. Treat them like regular absolute relocations referencing
+## discarded symbols, which are resolved to the addend.
+  .long .text.1+8
+  .long 0


        


More information about the llvm-commits mailing list