[PATCH] D36351: [lld][ELF] Add profile guided section layout

Michael Spencer via llvm-commits llvm-commits at lists.llvm.org
Wed Jan 31 15:00:34 PST 2018

On Wed, Jan 31, 2018 at 12:50 PM, Rafael Avila de Espindola <
rafael.espindola at gmail.com> wrote:

> Rafael Avila de Espindola <rafael.espindola at gmail.com> writes:
> > Rafael Avila de Espindola <rafael.espindola at gmail.com> writes:
> >
> >> The patch needs to be rebased and clang formatted
> >>
> >> We should start with a version that doesn't depend on the llvm side, as
> >> there are too many moving pieces right now.
> >>
> >> I have started working on the above.
> >
> > OK, I have a patch against lld trunk that I can build standalone. I will
> > test it with perf generated call graphs and report.
> I am getting a crash when trying to link lld itself.
> The testcase is at
> https://s3-us-west-2.amazonaws.com/linker-tests/t.tar.xz.
> Please take a look and upload a new version of the patch I emailed. That
> is: read the call graph from a text file, with no dependencies on llvm changes.
> The error also happens with --no-threads. The valgrind reported errors
> are attached.
> Cheers,
> Rafael
I wasn't able to duplicate the crash at all (Windows or Linux, debug with
asserts), but from the log I'm pretty sure it was related to the issue I was
working on. Here's an updated patch.

- Michael Spencer
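
For reference, the text format discussed above is one edge per line, written as `from to weight` (the test in the patch builds such a file with echo). A minimal sketch of parsing one such line in plain standard C++ follows. `CallEdge` and `parseCallGraphLine` are hypothetical names used only for illustration; the patch itself uses LLVM's StringRef helpers, and this sketch only roughly mirrors its three-field check.

```cpp
#include <charconv>
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>
#include <system_error>
#include <vector>

// Hypothetical record for one profile line: caller, callee, call count.
struct CallEdge {
  std::string From;
  std::string To;
  uint64_t Weight;
};

// Parses "from to weight". Returns std::nullopt instead of calling fatal(),
// so the caller decides how to report a malformed line.
std::optional<CallEdge> parseCallGraphLine(const std::string &Line) {
  // Split on single spaces and require exactly three fields.
  std::vector<std::string> Fields;
  std::istringstream In(Line);
  std::string Field;
  while (std::getline(In, Field, ' '))
    Fields.push_back(Field);
  if (Fields.size() != 3)
    return std::nullopt;

  // The third field must be a complete unsigned decimal integer.
  uint64_t Weight = 0;
  const char *Begin = Fields[2].data();
  const char *End = Begin + Fields[2].size();
  auto Res = std::from_chars(Begin, End, Weight);
  if (Res.ec != std::errc() || Res.ptr != End)
    return std::nullopt;

  return CallEdge{Fields[0], Fields[1], Weight};
}
```

In the patch, each parsed name is then looked up in the symbol table, and edges whose symbols are not found are silently skipped.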
-------------- next part --------------
diff --git a/ELF/CMakeLists.txt b/ELF/CMakeLists.txt
index 7ec8378..6889a36 100644
--- a/ELF/CMakeLists.txt
+++ b/ELF/CMakeLists.txt
@@ -19,6 +19,7 @@ add_lld_library(lldELF
+  CallGraphSort.cpp
diff --git a/ELF/CallGraphSort.cpp b/ELF/CallGraphSort.cpp
new file mode 100644
index 0000000..64d4c23
--- /dev/null
+++ b/ELF/CallGraphSort.cpp
@@ -0,0 +1,371 @@
+//===- CallGraphSort.cpp --------------------------------------------------===//
+//                             The LLVM Linker
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+/// Implementation of Call-Chain Clustering from: Optimizing Function Placement
+/// for Large-Scale Data-Center Applications
+/// https://research.fb.com/wp-content/uploads/2017/01/cgo2017-hfsort-final1.pdf
+/// The goal of this algorithm is to improve runtime performance of the final
+/// executable by arranging code sections such that page table and i-cache
+/// misses are minimized.
+/// Definitions:
+/// * Cluster
+///   * An ordered list of input sections which are laid out as a unit. At the
+///     beginning of the algorithm each input section has its own cluster and
+///     the weight of the cluster is the sum of the weight of all incoming
+///     edges.
+/// * Call-Chain Clustering (C³) Heuristic
+///   * Defines when and how clusters are combined. Pick the highest weight edge
+///     from cluster _u_ to _v_ then move the sections in _v_ and append them to
+///     _u_ unless the combined size would be larger than the page size.
+/// * Density
+///   * The weight of the cluster divided by the size of the cluster. This is a
+///     proxy for the amount of execution time spent per byte of the cluster.
+/// It does so given a call graph profile by the following:
+/// * Build a call graph from the profile
+/// * While there are unresolved edges
+///   * Find the edge with the highest weight
+///   * Check if merging the two clusters would create a cluster larger than the
+///     target page size
+///   * If not, contract that edge putting the callee after the caller
+/// * Sort remaining clusters by density
+#include "CallGraphSort.h"
+#include "SymbolTable.h"
+#include "Symbols.h"
+#include "Target.h"
+#include "llvm/Support/MathExtras.h"
+#include <queue>
+#include <set>
+#include <unordered_map>
+using namespace llvm;
+using namespace lld;
+using namespace lld::elf;
+namespace {
+using NodeIndex = std::ptrdiff_t;
+using EdgeIndex = std::ptrdiff_t;
+struct Edge;
+struct EdgePriorityCmp {
+  std::vector<Edge> &Edges;
+  bool operator()(EdgeIndex A, EdgeIndex B) const;
+using PriorityQueue = std::multiset<EdgeIndex, EdgePriorityCmp>;
+struct Node {
+  Node() = default;
+  Node(const InputSectionBase *IS);
+  std::vector<const InputSectionBase *> Sections;
+  std::vector<EdgeIndex> IncidentEdges;
+  int64_t Size = 0;
+  uint64_t Weight = 0;
+struct Edge {
+  NodeIndex From;
+  NodeIndex To;
+  uint64_t Weight;
+  PriorityQueue::iterator PriorityPos;
+  bool operator==(const Edge Other) const;
+  bool operator<(const Edge Other) const;
+bool EdgePriorityCmp::operator()(EdgeIndex A, EdgeIndex B) const {
+  return Edges[A].Weight < Edges[B].Weight;
+struct EdgeDenseMapInfo {
+  static Edge getEmptyKey() {
+    return {DenseMapInfo<NodeIndex>::getEmptyKey(),
+            DenseMapInfo<NodeIndex>::getEmptyKey(), 0,
+            PriorityQueue::iterator()};
+  }
+  static Edge getTombstoneKey() {
+    return {DenseMapInfo<NodeIndex>::getTombstoneKey(),
+            DenseMapInfo<NodeIndex>::getTombstoneKey(), 0,
+            PriorityQueue::iterator()};
+  }
+  static unsigned getHashValue(const Edge &Val) {
+    return hash_combine(DenseMapInfo<NodeIndex>::getHashValue(Val.From),
+                        DenseMapInfo<NodeIndex>::getHashValue(Val.To));
+  }
+  static bool isEqual(const Edge &LHS, const Edge &RHS) { return LHS == RHS; }
+class CallGraphSort {
+  CallGraphSort();
+  DenseMap<const InputSectionBase *, int> run();
+  std::vector<Node> Nodes;
+  std::vector<Edge> Edges;
+  PriorityQueue WorkQueue{EdgePriorityCmp{Edges}};
+  bool killEdge(EdgeIndex EI);
+  void contractEdge(EdgeIndex CEI);
+  void generateClusters();
+} // end anonymous namespace
+Node::Node(const InputSectionBase *IS) {
+  Sections.push_back(IS);
+  Size = IS->getSize();
+bool Edge::operator==(const Edge Other) const {
+  return From == Other.From && To == Other.To;
+bool Edge::operator<(const Edge Other) const {
+  if (From != Other.From)
+    return From < Other.From;
+  return To < Other.To;
+// Take the edge list in Config->CallGraphProfile, resolve symbol names to
+// Symbols, and generate a graph between InputSections with the provided
+// weights.
+CallGraphSort::CallGraphSort() {
+  DenseMap<std::pair<const Symbol *, const Symbol *>, uint64_t> &Profile =
+      Config->CallGraphProfile;
+  DenseMap<const InputSectionBase *, NodeIndex> SecToNode;
+  DenseMap<Edge, EdgeIndex, EdgeDenseMapInfo> EdgeMap;
+  auto GetOrCreateNode = [&](const InputSectionBase *IS) -> NodeIndex {
+    auto Res = SecToNode.insert(std::make_pair(IS, Nodes.size()));
+    if (Res.second)
+      Nodes.emplace_back(IS);
+    return Res.first->second;
+  };
+  // Create the graph.
+  for (const auto &C : Profile) {
+    const Symbol *FromSym = C.first.first;
+    const Symbol *ToSym = C.first.second;
+    uint64_t Weight = C.second;
+    if (Weight == 0)
+      continue;
+    // Get the input section for a given symbol.
+    auto *FromDR = dyn_cast_or_null<Defined>(FromSym);
+    auto *ToDR = dyn_cast_or_null<Defined>(ToSym);
+    if (!FromDR || !ToDR)
+      continue;
+    auto *FromSB = dyn_cast_or_null<const InputSectionBase>(FromDR->Section);
+    auto *ToSB = dyn_cast_or_null<const InputSectionBase>(ToDR->Section);
+    if (!FromSB || !ToSB || FromSB->getSize() == 0 || ToSB->getSize() == 0)
+      continue;
+    NodeIndex From = GetOrCreateNode(FromSB);
+    NodeIndex To = GetOrCreateNode(ToSB);
+    Nodes[To].Weight = SaturatingAdd(Nodes[To].Weight, Weight);
+    if (From == To)
+      continue;
+    Edge E{From, To, Weight, WorkQueue.end()};
+    // Add or increment an edge
+    auto Res = EdgeMap.insert(std::make_pair(E, Edges.size()));
+    EdgeIndex EI = Res.first->second;
+    if (Res.second) {
+      Edges.push_back(E);
+      Nodes[From].IncidentEdges.push_back(EI);
+      Nodes[To].IncidentEdges.push_back(EI);
+    } else
+      Edges[EI].Weight = SaturatingAdd(Edges[EI].Weight, Weight);
+  }
+/// Like std::unique, but calls Merge on equal values. Merge is allowed
+/// to modify its first argument.
+/// Merge is a callable with signature
+///   Merge(*declval<ForwardIt>(), *declval<ForwardIt>())
+/// Example:
+///   int a[] = {1, 2, 2, 3, 4, 5, 5};
+///   auto end = merge_unique(std::begin(a), std::end(a),
+///     [](int a, int b) { return a == b; },
+///     [](int &a, int b) { a += b; });
+///   for (auto i = a; i != end; ++i)
+///     std::cout << *i << " ";
+///   -- 1 4 3 4 10
+template <class ForwardIt, class PredTy, class MergeTy>
+static ForwardIt merge_unique(ForwardIt First, ForwardIt Last, PredTy Pred,
+                              MergeTy Merge) {
+  if (First == Last)
+    return Last;
+  ForwardIt I = First;
+  while (++I != Last) {
+    if (Pred(*First, *I))
+      Merge(*First, *I);
+    else if (++First != I)
+      *First = std::move(*I);
+  }
+  return ++First;
+/// Marks an edge as dead and removes it from the work queue.
+/// \returns true if the edge was killed, false if it was already dead.
+bool CallGraphSort::killEdge(EdgeIndex EI) {
+  Edge &E = Edges[EI];
+  if (E.PriorityPos != WorkQueue.end()) {
+    WorkQueue.erase(E.PriorityPos);
+    E.PriorityPos = WorkQueue.end();
+    return true;
+  }
+  return false;
+/// Remove edge \p CEI from the graph while simultaneously merging its two
+/// incident vertices u and v. This merges any duplicate edges between u and v
+/// by accumulating their weights.
+void CallGraphSort::contractEdge(EdgeIndex CEI) {
+  // Make a copy of the edge as the original will be marked killed while being
+  // used.
+  Edge CE = Edges[CEI];
+  assert(CE.From != CE.To && "Got self edge!");
+  std::vector<EdgeIndex> &FE = Nodes[CE.From].IncidentEdges;
+  // Remove the self edge from From.
+  FE.erase(std::remove(FE.begin(), FE.end(), CEI), FE.end());
+  std::vector<EdgeIndex> &TE = Nodes[CE.To].IncidentEdges;
+  // Update all edges incident with To to reference From instead. Then if they
+  // aren't self edges add them to From.
+  for (EdgeIndex EI : TE) {
+    Edge &E = Edges[EI];
+    if (E.From == CE.To)
+      E.From = CE.From;
+    if (E.To == CE.To)
+      E.To = CE.From;
+    if (E.To == E.From) {
+      killEdge(EI);
+      continue;
+    }
+    FE.push_back(EI);
+  }
+  // Free memory. Otherwise we end up with N^2 memory usage.
+  std::vector<EdgeIndex>().swap(TE);
+  if (FE.empty())
+    return;
+  // Sort edges so they can be merged. The stability of this sort doesn't matter
+  // as equal edges will be merged in an order independent manner.
+  std::sort(FE.begin(), FE.end(),
+            [&](EdgeIndex AI, EdgeIndex BI) { return Edges[AI] < Edges[BI]; });
+  FE.erase(merge_unique(FE.begin(), FE.end(),
+                        [&](EdgeIndex AI, EdgeIndex BI) {
+                          return Edges[AI] == Edges[BI];
+                        },
+                        [&](EdgeIndex AI, EdgeIndex BI) {
+                          Edge &A = Edges[AI];
+                          Edge &B = Edges[BI];
+                          killEdge(BI);
+                          bool Restore = killEdge(AI);
+                          A.Weight = SaturatingAdd(A.Weight, B.Weight);
+                          if (Restore)
+                            A.PriorityPos = WorkQueue.insert(AI);
+                        }),
+           FE.end());
+// Group InputSections into clusters using the Call-Chain Clustering heuristic
+// then sort the clusters by density.
+void CallGraphSort::generateClusters() {
+  for (size_t I = 0; I < Edges.size(); ++I) {
+    Edges[I].PriorityPos = WorkQueue.insert(I);
+  }
+  // Collapse the graph.
+  while (!WorkQueue.empty()) {
+    PriorityQueue::const_iterator I = --WorkQueue.end();
+    EdgeIndex MaxI = *I;
+    const Edge MaxE = Edges[MaxI];
+    killEdge(MaxI);
+    // Merge the Nodes.
+    Node &From = Nodes[MaxE.From];
+    Node &To = Nodes[MaxE.To];
+    if (From.Size + To.Size > Target->PageSize)
+      continue;
+    contractEdge(MaxI);
+    From.Sections.insert(From.Sections.end(), To.Sections.begin(),
+                         To.Sections.end());
+    From.Size += To.Size;
+    From.Weight = SaturatingAdd(From.Weight, To.Weight);
+    To.Sections.clear();
+    To.Size = 0;
+    To.Weight = 0;
+  }
+  // Remove empty or dead nodes.
+  Nodes.erase(std::remove_if(Nodes.begin(), Nodes.end(),
+                             [](const Node &N) {
+                               return N.Size == 0 || N.Sections.empty();
+                             }),
+              Nodes.end());
+  // Sort by density. Invalidates all NodeIndex values.
+  std::sort(Nodes.begin(), Nodes.end(), [](const Node &A, const Node &B) {
+    return (APFloat(APFloat::IEEEdouble(), A.Weight) /
+            APFloat(APFloat::IEEEdouble(), A.Size))
+               .compare(APFloat(APFloat::IEEEdouble(), B.Weight) /
+                        APFloat(APFloat::IEEEdouble(), B.Size)) ==
+           APFloat::cmpLessThan;
+  });
+DenseMap<const InputSectionBase *, int> CallGraphSort::run() {
+  generateClusters();
+  // Generate order.
+  llvm::DenseMap<const InputSectionBase *, int> OrderMap;
+  ssize_t CurOrder = 1;
+  for (const Node &N : Nodes)
+    for (const InputSectionBase *IS : N.Sections)
+      OrderMap[IS] = CurOrder++;
+  return OrderMap;
+// Sort sections by the profile data provided by --call-graph-ordering-file.
+// This first builds a call graph based on the profile data then iteratively
+// merges the hottest call edges as long as it would not create a cluster larger
+// than the page size. All clusters are then sorted by a density metric to
+// further improve locality.
+DenseMap<const InputSectionBase *, int> elf::computeCallGraphProfileOrder() {
+  return CallGraphSort().run();
diff --git a/ELF/CallGraphSort.h b/ELF/CallGraphSort.h
new file mode 100644
index 0000000..3f96dc8
--- /dev/null
+++ b/ELF/CallGraphSort.h
@@ -0,0 +1,23 @@
+//===- CallGraphSort.h ------------------------------------------*- C++ -*-===//
+//                             The LLVM Linker
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+#include "llvm/ADT/DenseMap.h"
+namespace lld {
+namespace elf {
+class InputSectionBase;
+llvm::DenseMap<const InputSectionBase *, int> computeCallGraphProfileOrder();
+} // namespace elf
+} // namespace lld
diff --git a/ELF/Config.h b/ELF/Config.h
index 28326a8..63f5818 100644
--- a/ELF/Config.h
+++ b/ELF/Config.h
@@ -11,6 +11,7 @@
 #include "lld/Common/ErrorHandler.h"
+#include "llvm/ADT/DenseMap.h"
 #include "llvm/ADT/MapVector.h"
 #include "llvm/ADT/StringRef.h"
 #include "llvm/ADT/StringSet.h"
@@ -25,6 +26,7 @@ namespace lld {
 namespace elf {
 class InputFile;
+class Symbol;
 enum ELFKind {
@@ -104,6 +106,8 @@ struct Configuration {
   std::vector<SymbolVersion> VersionScriptGlobals;
   std::vector<SymbolVersion> VersionScriptLocals;
   std::vector<uint8_t> BuildIdVector;
+  llvm::DenseMap<std::pair<const Symbol *, const Symbol *>, uint64_t>
+      CallGraphProfile;
   bool AllowMultipleDefinition;
   bool AndroidPackDynRelocs = false;
   bool ARMHasBlx = false;
diff --git a/ELF/Driver.cpp b/ELF/Driver.cpp
index bbf28ad..2df1acd 100644
--- a/ELF/Driver.cpp
+++ b/ELF/Driver.cpp
@@ -570,6 +570,26 @@ getBuildId(opt::InputArgList &Args) {
   return {BuildIdKind::None, {}};
+static void readCallGraph(MemoryBufferRef MB) {
+  DenseMap<std::pair<StringRef, StringRef>, uint64_t> Ret;
+  std::vector<StringRef> Lines = args::getLines(MB);
+  for (StringRef L : Lines) {
+    SmallVector<StringRef, 3> Fields;
+    L.split(Fields, ' ');
+    if (Fields.size() != 3)
+      fatal("parse error");
+    uint64_t Count;
+    if (!to_integer(Fields[2], Count))
+      fatal("parse error");
+    StringRef From = Fields[0];
+    Symbol *FromSym = Symtab->find(From);
+    StringRef To = Fields[1];
+    Symbol *ToSym = Symtab->find(To);
+    if (FromSym && ToSym)
+      Config->CallGraphProfile[std::make_pair(FromSym, ToSym)] = Count;
+  }
 static bool getCompressDebugSections(opt::InputArgList &Args) {
   StringRef S = Args.getLastArgValue(OPT_compress_debug_sections, "none");
   if (S == "none")
@@ -1082,6 +1102,10 @@ template <class ELFT> void LinkerDriver::link(opt::InputArgList &Args) {
   // Apply symbol renames for -wrap.
+  if (auto *Arg = Args.getLastArg(OPT_call_graph_ordering_file))
+    if (Optional<MemoryBufferRef> Buffer = readFile(Arg->getValue()))
+      readCallGraph(*Buffer);
   // Now that we have a complete list of input files.
   // Beyond this point, no new files are added.
   // Aggregate all input sections into one place.
diff --git a/ELF/Options.td b/ELF/Options.td
index 86aa99e..e6c1016 100644
--- a/ELF/Options.td
+++ b/ELF/Options.td
@@ -51,6 +51,9 @@ def allow_multiple_definition: F<"allow-multiple-definition">,
 def as_needed: F<"as-needed">,
   HelpText<"Only set DT_NEEDED for shared libraries if used">;
+def call_graph_ordering_file: S<"call-graph-ordering-file">,
+  HelpText<"Layout sections to optimize the given callgraph">;
 // -chroot doesn't have a help text because it is an internal option.
 def chroot: S<"chroot">;
diff --git a/ELF/Writer.cpp b/ELF/Writer.cpp
index 4367eec..7ade7d7 100644
--- a/ELF/Writer.cpp
+++ b/ELF/Writer.cpp
@@ -9,6 +9,7 @@
 #include "Writer.h"
 #include "AArch64ErrataFix.h"
+#include "CallGraphSort.h"
 #include "Config.h"
 #include "Filesystem.h"
 #include "LinkerScript.h"
@@ -1050,6 +1051,15 @@ static DenseMap<SectionBase *, int> buildSectionOrder() {
 // If no layout was provided by linker script, we want to apply default
 // sorting for special input sections. This also handles --symbol-ordering-file.
 template <class ELFT> void Writer<ELFT>::sortInputSections() {
+  // Use the rarely used option -call-graph-ordering-file to sort sections.
+  if (!Config->CallGraphProfile.empty()) {
+    DenseMap<const InputSectionBase *, int> OrderMap =
+        computeCallGraphProfileOrder();
+    if (OutputSection *Sec = findSection(".text"))
+      Sec->sort([&](InputSectionBase *S) { return OrderMap.lookup(S); });
+  }
   // Sort input sections by priority using the list provided
   // by --symbol-ordering-file.
   DenseMap<SectionBase *, int> Order = buildSectionOrder();
diff --git a/test/ELF/cgprofile-object.s b/test/ELF/cgprofile-object.s
new file mode 100644
index 0000000..43b67bf
--- /dev/null
+++ b/test/ELF/cgprofile-object.s
@@ -0,0 +1,50 @@
+# REQUIRES: x86
+# RUN: llvm-mc -filetype=obj -triple=x86_64-unknown-linux %s -o %t
+# RUN: ld.lld %t -o %t2
+# RUN: llvm-readobj -symbols %t2 | FileCheck %s --check-prefix=NOSORT
+# RUN: echo "_start zed 1" > %t.call_graph
+# RUN: echo "foo bar 1" >> %t.call_graph
+# RUN: echo "zed foo 1" >> %t.call_graph
+# RUN: ld.lld %t --call-graph-ordering-file %t.call_graph -o %t2
+# RUN: llvm-readobj -symbols %t2 | FileCheck %s
+    .section    .text.foo,"ax", at progbits
+    .globl  foo
+    retq
+    .section    .text.bar,"ax", at progbits
+    .globl  bar
+    retq
+    .section    .text.zed,"ax", at progbits
+    .globl  zed
+    retq
+    .section    .text._start,"ax", at progbits
+    .globl  _start
+    retq
+# CHECK:          Name: _start
+# CHECK-NEXT:     Value: 0x201000
+# CHECK:          Name: bar
+# CHECK-NEXT:     Value: 0x201003
+# CHECK:          Name: foo
+# CHECK-NEXT:     Value: 0x201002
+# CHECK:          Name: zed
+# CHECK-NEXT:     Value: 0x201001
+# NOSORT:          Name: _start
+# NOSORT-NEXT:     Value: 0x201003
+# NOSORT:          Name: bar
+# NOSORT-NEXT:     Value: 0x201001
+# NOSORT:          Name: foo
+# NOSORT-NEXT:     Value: 0x201000
+# NOSORT:          Name: zed
+# NOSORT-NEXT:     Value: 0x201002
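
As a reading aid for the clustering described in the CallGraphSort.cpp header comment, here is a much simplified, self-contained sketch of the greedy merge in standard C++. It is not the code in the patch: `SketchEdge`, `SketchCluster` and `sketchC3Order` are hypothetical names, sections are plain integer indices with nonzero sizes, edges are visited once in descending weight order rather than through a priority queue with parallel-edge merging, and clusters are emitted hottest-first, which is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical types for illustration only.
struct SketchEdge {
  int From;        // caller section index
  int To;          // callee section index
  uint64_t Weight; // call count
};

struct SketchCluster {
  std::vector<int> Sections; // layout order inside the cluster
  uint64_t Size = 0;
  uint64_t Weight = 0;
};

// Returns section indices in layout order. Sizes must all be nonzero.
std::vector<int> sketchC3Order(const std::vector<uint64_t> &Sizes,
                               std::vector<SketchEdge> Edges,
                               uint64_t PageSize) {
  // Every section starts in its own cluster.
  std::vector<SketchCluster> Clusters(Sizes.size());
  std::vector<int> Owner(Sizes.size());
  for (size_t I = 0; I < Sizes.size(); ++I) {
    Clusters[I].Sections.push_back(static_cast<int>(I));
    Clusters[I].Size = Sizes[I];
    Owner[I] = static_cast<int>(I);
  }
  // A cluster's initial weight is the sum of its incoming edge weights.
  for (const SketchEdge &E : Edges)
    Clusters[E.To].Weight += E.Weight;

  // Visit the heaviest edges first; ties keep their input order.
  std::stable_sort(Edges.begin(), Edges.end(),
                   [](const SketchEdge &A, const SketchEdge &B) {
                     return A.Weight > B.Weight;
                   });

  for (const SketchEdge &E : Edges) {
    int U = Owner[E.From];
    int V = Owner[E.To];
    // Skip self edges and merges that would exceed the page size.
    if (U == V || Clusters[U].Size + Clusters[V].Size > PageSize)
      continue;
    // Append the callee's cluster after the caller's cluster.
    for (int S : Clusters[V].Sections) {
      Owner[S] = U;
      Clusters[U].Sections.push_back(S);
    }
    Clusters[U].Size += Clusters[V].Size;
    Clusters[U].Weight += Clusters[V].Weight;
    Clusters[V] = SketchCluster();
  }

  // Drop emptied clusters, then order the rest by density (weight / size).
  Clusters.erase(std::remove_if(Clusters.begin(), Clusters.end(),
                                [](const SketchCluster &C) {
                                  return C.Sections.empty();
                                }),
                 Clusters.end());
  std::stable_sort(Clusters.begin(), Clusters.end(),
                   [](const SketchCluster &A, const SketchCluster &B) {
                     return double(A.Weight) / double(A.Size) >
                            double(B.Weight) / double(B.Size);
                   });

  std::vector<int> Order;
  for (const SketchCluster &C : Clusters)
    Order.insert(Order.end(), C.Sections.begin(), C.Sections.end());
  return Order;
}
```

With four one-byte sections numbered foo=0, bar=1, zed=2, _start=3 and the three weight-1 edges from the test above, this sketch yields the order _start, zed, foo, bar, which is also the address order the CHECK lines expect. That agreement says nothing about whether the patch and the sketch behave the same on larger graphs.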
