[lld] r358646 - [LLD][ELF] - Fix the different behavior of the linker script symbols on different platforms.

George Rimar via llvm-commits llvm-commits at lists.llvm.org
Thu Apr 18 01:15:54 PDT 2019


Author: grimar
Date: Thu Apr 18 01:15:54 2019
New Revision: 358646

URL: http://llvm.org/viewvc/llvm-project?rev=358646&view=rev
Log:
[LLD][ELF] - Fix the different behavior of the linker script symbols on different platforms.

This generalizes code and also fixes the broken behavior shown in
one of our test cases for some targets, like x86-64.

The issue occurs when the forward declarations are used in the script.
One of the samples is:

SECTIONS {
  foo = ADDR(.text) - ABSOLUTE(ADDR(.text));
};

In that case, we have a broken output when output target does
not use thunks. That happens because thunks creating code
(called from maybeAddThunks)
calls Script->assignAddresses() at least one more time,
what fixups the values. As a result final symbols values can
be different on AArch64 and x86, for example.

In this patch, I generalize and rename maybeAddThunks to
finalizeAddressDependentContent and now it is used and called
by all targets.

Differential revision: https://reviews.llvm.org/D55550

Modified:
    lld/trunk/ELF/Writer.cpp
    lld/trunk/test/ELF/linkerscript/addr-zero.test

Modified: lld/trunk/ELF/Writer.cpp
URL: http://llvm.org/viewvc/llvm-project/lld/trunk/ELF/Writer.cpp?rev=358646&r1=358645&r2=358646&view=diff
==============================================================================
--- lld/trunk/ELF/Writer.cpp (original)
+++ lld/trunk/ELF/Writer.cpp Thu Apr 18 01:15:54 2019
@@ -55,7 +55,7 @@ private:
   void forEachRelSec(llvm::function_ref<void(InputSectionBase &)> Fn);
   void sortSections();
   void resolveShfLinkOrder();
-  void maybeAddThunks();
+  void finalizeAddressDependentContent();
   void sortInputSections();
   void finalizeSections();
   void checkExecuteOnly();
@@ -1460,19 +1460,15 @@ template <class ELFT> void Writer<ELFT>:
   }
 }
 
-// For most RISC ISAs, we need to generate content that depends on the address
-// of InputSections. For example some architectures such as AArch64 use small
-// displacements for jump instructions that is the linker's responsibility for
-// creating range extension thunks for. As the generation of the content may
-// also alter InputSection addresses we must converge to a fixed point.
-template <class ELFT> void Writer<ELFT>::maybeAddThunks() {
-  if (!Target->NeedsThunks && !Config->AndroidPackDynRelocs &&
-      !Config->RelrPackDynRelocs)
-    return;
-
+// We need to generate and finalize the content that depends on the address of
+// InputSections. As the generation of the content may also alter InputSection
+// addresses we must converge to a fixed point. We do that here. See the comment
+// in Writer<ELFT>::finalizeSections().
+template <class ELFT> void Writer<ELFT>::finalizeAddressDependentContent() {
   ThunkCreator TC;
   AArch64Err843419Patcher A64P;
 
+  // For some targets, like x86, this loop iterates only once.
   for (;;) {
     bool Changed = false;
 
@@ -1756,19 +1752,30 @@ template <class ELFT> void Writer<ELFT>:
   // known but before addresses are allocated.
   resolveShfLinkOrder();
 
-  // Jump instructions in many ISAs have small displacements, and therefore they
-  // cannot jump to arbitrary addresses in memory. For example, RISC-V JAL
-  // instruction can target only +-1 MiB from PC. It is a linker's
-  // responsibility to create and insert small pieces of code between sections
-  // to extend the ranges if jump targets are out of range. Such code pieces are
-  // called "thunks".
+  // This is used to:
+  // 1) Create "thunks":
+  //    Jump instructions in many ISAs have small displacements, and therefore
+  //    they cannot jump to arbitrary addresses in memory. For example, RISC-V
+  //    JAL instruction can target only +-1 MiB from PC. It is a linker's
+  //    responsibility to create and insert small pieces of code between
+  //    sections to extend the ranges if jump targets are out of range. Such
+  //    code pieces are called "thunks".
+  //
+  //    We add thunks at this stage. We couldn't do this before this point
+  //    because this is the earliest point where we know sizes of sections and
+  //    their layouts (that are needed to determine if jump targets are in
+  //    range).
+  //
+  // 2) Update the sections. We need to generate content that depends on the
+  //    address of InputSections. For example, MIPS GOT section content or
+  //    android packed relocations sections content.
   //
-  // We add thunks at this stage. We couldn't do this before this point because
-  // this is the earliest point where we know sizes of sections and their
-  // layouts (that are needed to determine if jump targets are in range).
-  maybeAddThunks();
+  // 3) Assign the final values for the linker script symbols. Linker scripts
+  //    sometimes using forward symbol declarations. We want to set the correct
+  //    values. They also might change after adding the thunks.
+  finalizeAddressDependentContent();
 
-  // maybeAddThunks may have added local symbols to the static symbol table.
+  // finalizeAddressDependentContent may have added local symbols to the static symbol table.
   finalizeSynthetic(In.SymTab);
   finalizeSynthetic(In.PPC64LongBranchTarget);
 

Modified: lld/trunk/test/ELF/linkerscript/addr-zero.test
URL: http://llvm.org/viewvc/llvm-project/lld/trunk/test/ELF/linkerscript/addr-zero.test?rev=358646&r1=358645&r2=358646&view=diff
==============================================================================
--- lld/trunk/test/ELF/linkerscript/addr-zero.test (original)
+++ lld/trunk/test/ELF/linkerscript/addr-zero.test Thu Apr 18 01:15:54 2019
@@ -1,14 +1,14 @@
-# REQUIRES: x86
+# REQUIRES: x86, aarch64
 # RUN: llvm-mc -filetype=obj -triple=x86_64-unknown-linux /dev/null -o %t.o
 # RUN: ld.lld -o %t.so --script %s %t.o -shared
 # RUN: llvm-readobj --symbols %t.so | FileCheck %s
 
-# Test that the script creates a non absolute symbol with value
-# 0 I.E., a symbol that refers to the load address.
+## Test that the script creates a non absolute symbol with value
+## 0 I.E., a symbol that refers to the load address.
 
 # CHECK:      Symbol {
 # CHECK:        Name: foo
-# CHECK-NEXT:   Value: 0x70
+# CHECK-NEXT:   Value: 0x0
 # CHECK-NEXT:   Size: 0
 # CHECK-NEXT:   Binding: Global
 # CHECK-NEXT:   Type: None
@@ -16,6 +16,12 @@
 # CHECK-NEXT:   Section: .text
 # CHECK-NEXT: }
 
+## Because of a bug we had a different behavior (different symbol 'foo' value)
+## on a platforms that might use thunks, like AArch64. Check that issue is fixed.
+# RUN: llvm-mc -filetype=obj -triple=aarch64-linux-gnux /dev/null -o %t.o
+# RUN: ld.lld -o %t.so --script %s %t.o -shared
+# RUN: llvm-readobj --symbols %t.so | FileCheck %s
+
 SECTIONS {
   foo = ADDR(.text) - ABSOLUTE(ADDR(.text));
 };




More information about the llvm-commits mailing list