[PATCH] D36351: [lld][ELF] Add profile guided section layout

Michael Spencer via llvm-commits llvm-commits at lists.llvm.org
Wed Feb 7 14:44:46 PST 2018


On Tue, Feb 6, 2018 at 6:53 PM, Rafael Avila de Espindola <
rafael.espindola at gmail.com> wrote:

> I have benchmarked this by timing lld ltoing FileCheck. The working set
> is much larger this time. The old callgraph had 4079 calls, this one has
> 30616.
>
> The results are somewhat similar:
>
>  Performance counter stats for '../default-ld.lld @response.txt' (10 runs):
>
>            498,771      iTLB-load-misses
>             ( +-  0.10% )
>        224,751,360      L1-icache-load-misses
>            ( +-  0.00% )
>
>        2.339864606 seconds time elapsed
>       ( +-  0.06% )
>
>  Performance counter stats for '../sorted-ld.lld @response.txt' (10 runs):
>
>            556,999      iTLB-load-misses
>             ( +-  0.17% )
>        216,788,838      L1-icache-load-misses
>            ( +-  0.01% )
>
>        2.326596163 seconds time elapsed
>       ( +-  0.04% )
>
> As with the previous test iTLB gets worse and L1 gets better. The net
> result is a very small speedup.
>
> Do you know how big the chromium call graph is?
>

Not sure, but the call graph for a high profile internal game I tested is
about 10k functions and 17 MiB of .text, and I got a %2-%4 speedup.  Given
that it's a game it runs a decent portion of that 17MiB 60 times a second,
while llvm is heavily pass based, so I don't expect the instruction working
set over a small period of time to be that high.

I am however surprised by the 10% increase in iTLB misses.

- Michael Spencer


>
> Cheers,
> Rafael
>
> Michael Spencer via Phabricator <reviews at reviews.llvm.org> writes:
>
> > Bigcheese updated this revision to Diff 132667.
> > Bigcheese added a comment.
> >
> > - Don't reorder non-reorderable sections
> > - Skip edges across output sections
> > - Add tests
> >
> >
> > https://reviews.llvm.org/D36351
> >
> > Files:
> >   ELF/CMakeLists.txt
> >   ELF/CallGraphSort.cpp
> >   ELF/CallGraphSort.h
> >   ELF/Config.h
> >   ELF/Driver.cpp
> >   ELF/Options.td
> >   ELF/Writer.cpp
> >   test/ELF/cgprofile-txt.s
> >
> > Index: test/ELF/cgprofile-txt.s
> > ===================================================================
> > --- /dev/null
> > +++ test/ELF/cgprofile-txt.s
> > @@ -0,0 +1,106 @@
> > +# REQUIRES: x86
> > +
> > +# RUN: llvm-mc -filetype=obj -triple=x86_64-unknown-linux %s -o %t
> > +# RUN: ld.lld -e A %t -o %t2
> > +# RUN: llvm-readobj -symbols %t2 | FileCheck %s --check-prefix=NOSORT
> > +
> > +# RUN: echo "A B 100" > %t.call_graph
> > +# RUN: echo "A C 40" >> %t.call_graph
> > +# RUN: echo "B C 30" >> %t.call_graph
> > +# RUN: echo "C D 90" >> %t.call_graph
> > +# RUN: echo "PP TS 100" >> %t.call_graph
> > +# RUN: echo "_init2 _init 24567837" >> %t.call_graph
> > +# RUN: echo "TS QC 9001" >> %t.call_graph
> > +# RUN: ld.lld -e A %t --call-graph-ordering-file %t.call_graph -o %t2
> > +# RUN: llvm-readobj -symbols %t2 | FileCheck %s
> > +
> > +    .section    .text.D,"ax", at progbits
> > +D:
> > +    retq
> > +
> > +    .section    .text.C,"ax", at progbits
> > +    .globl  C
> > +C:
> > +    retq
> > +
> > +    .section    .text.B,"ax", at progbits
> > +    .globl  B
> > +B:
> > +    retq
> > +
> > +    .section    .text.A,"ax", at progbits
> > +    .globl  A
> > +A:
> > +    retq
> > +
> > +    .section    .ponies,"ax", at progbits,unique,1
> > +    .globl TS
> > +TS:
> > +    retq
> > +
> > +    .section    .ponies,"ax", at progbits,unique,2
> > +    .globl PP
> > +PP:
> > +    retq
> > +
> > +    .section    .other,"ax", at progbits,unique,1
> > +    .globl QC
> > +QC:
> > +    retq
> > +
> > +    .section    .other,"ax", at progbits,unique,2
> > +    .globl GB
> > +GB:
> > +    retq
> > +
> > +    .section    .init,"ax", at progbits,unique,1
> > +    .globl _init
> > +_init:
> > +    retq
> > +
> > +    .section    .init,"ax", at progbits,unique,2
> > +    .globl _init2
> > +_init2:
> > +    retq
> > +
> > +# CHECK:          Name: D
> > +# CHECK-NEXT:     Value: 0x201003
> > +# CHECK:          Name: A
> > +# CHECK-NEXT:     Value: 0x201000
> > +# CHECK:          Name: B
> > +# CHECK-NEXT:     Value: 0x201001
> > +# CHECK:          Name: C
> > +# CHECK-NEXT:     Value: 0x201002
> > +# CHECK:          Name: GB
> > +# CHECK-NEXT:     Value: 0x201007
> > +# CHECK:          Name: PP
> > +# CHECK-NEXT:     Value: 0x201004
> > +# CHECK:          Name: QC
> > +# CHECK-NEXT:     Value: 0x201006
> > +# CHECK:          Name: TS
> > +# CHECK-NEXT:     Value: 0x201005
> > +# CHECK:          Name: _init
> > +# CHECK-NEXT:     Value: 0x201008
> > +# CHECK:          Name: _init2
> > +# CHECK-NEXT:     Value: 0x201009
> > +
> > +# NOSORT:          Name: D
> > +# NOSORT-NEXT:     Value: 0x201000
> > +# NOSORT:          Name: A
> > +# NOSORT-NEXT:     Value: 0x201003
> > +# NOSORT:          Name: B
> > +# NOSORT-NEXT:     Value: 0x201002
> > +# NOSORT:          Name: C
> > +# NOSORT-NEXT:     Value: 0x201001
> > +# NOSORT:          Name: GB
> > +# NOSORT-NEXT:     Value: 0x201007
> > +# NOSORT:          Name: PP
> > +# NOSORT-NEXT:     Value: 0x201005
> > +# NOSORT:          Name: QC
> > +# NOSORT-NEXT:     Value: 0x201006
> > +# NOSORT:          Name: TS
> > +# NOSORT-NEXT:     Value: 0x201004
> > +# NOSORT:          Name: _init
> > +# NOSORT-NEXT:     Value: 0x201008
> > +# NOSORT:          Name: _init2
> > +# NOSORT-NEXT:     Value: 0x201009
> > Index: ELF/Writer.cpp
> > ===================================================================
> > --- ELF/Writer.cpp
> > +++ ELF/Writer.cpp
> > @@ -9,6 +9,7 @@
> >
> >  #include "Writer.h"
> >  #include "AArch64ErrataFix.h"
> > +#include "CallGraphSort.h"
> >  #include "Config.h"
> >  #include "Filesystem.h"
> >  #include "LinkerScript.h"
> > @@ -1050,6 +1051,17 @@
> >  // If no layout was provided by linker script, we want to apply default
> >  // sorting for special input sections. This also handles
> --symbol-ordering-file.
> >  template <class ELFT> void Writer<ELFT>::sortInputSections() {
> > +  // Use the rarely used option -call-graph-ordering-file to sort
> sections.
> > +  if (!Config->CallGraphProfile.empty()) {
> > +    DenseMap<const InputSectionBase *, int> OrderMap =
> > +        computeCallGraphProfileOrder();
> > +
> > +    for (BaseCommand *Base : Script->SectionCommands)
> > +      if (auto *Sec = dyn_cast<OutputSection>(Base))
> > +        if (Sec->Live)
> > +          Sec->sort([&](InputSectionBase *S) { return
> OrderMap.lookup(S); });
> > +  }
> > +
> >    // Sort input sections by priority using the list provided
> >    // by --symbol-ordering-file.
> >    DenseMap<SectionBase *, int> Order = buildSectionOrder();
> > Index: ELF/Options.td
> > ===================================================================
> > --- ELF/Options.td
> > +++ ELF/Options.td
> > @@ -51,6 +51,9 @@
> >  def as_needed: F<"as-needed">,
> >    HelpText<"Only set DT_NEEDED for shared libraries if used">;
> >
> > +def call_graph_ordering_file: S<"call-graph-ordering-file">,
> > +  HelpText<"Layout sections to optimize the given callgraph">;
> > +
> >  // -chroot doesn't have a help text because it is an internal option.
> >  def chroot: S<"chroot">;
> >
> > Index: ELF/Driver.cpp
> > ===================================================================
> > --- ELF/Driver.cpp
> > +++ ELF/Driver.cpp
> > @@ -570,6 +570,31 @@
> >    return {BuildIdKind::None, {}};
> >  }
> >
> > +static void readCallGraph(MemoryBufferRef MB) {
> > +  // Build a map from symbol name to section
> > +  DenseMap<StringRef, InputSectionBase *> SymbolSection;
> > +  for (InputFile *File : ObjectFiles)
> > +    for (Symbol *Sym : File->getSymbols())
> > +      if (auto *D = dyn_cast<Defined>(Sym))
> > +        if (auto *IS = dyn_cast_or_null<InputSectionBase>(D->Section))
> > +          SymbolSection[D->getName()] = IS;
> > +
> > +  std::vector<StringRef> Lines = args::getLines(MB);
> > +  for (StringRef L : Lines) {
> > +    SmallVector<StringRef, 3> Fields;
> > +    L.split(Fields, ' ');
> > +    if (Fields.size() != 3)
> > +      fatal("parse error");
> > +    uint64_t Count;
> > +    if (!to_integer(Fields[2], Count))
> > +      fatal("parse error");
> > +    InputSectionBase *FromSec = SymbolSection.lookup(Fields[0]);
> > +    InputSectionBase *ToSec = SymbolSection.lookup(Fields[1]);
> > +    if (FromSec && ToSec)
> > +      Config->CallGraphProfile[std::make_pair(FromSec, ToSec)] = Count;
> > +  }
> > +}
> > +
> >  static bool getCompressDebugSections(opt::InputArgList &Args) {
> >    StringRef S = Args.getLastArgValue(OPT_compress_debug_sections,
> "none");
> >    if (S == "none")
> > @@ -1084,6 +1109,10 @@
> >    // Apply symbol renames for -wrap.
> >    Symtab->applySymbolWrap();
> >
> > +  if (auto *Arg = Args.getLastArg(OPT_call_graph_ordering_file))
> > +    if (Optional<MemoryBufferRef> Buffer = readFile(Arg->getValue()))
> > +      readCallGraph(*Buffer);
> > +
> >    // Now that we have a complete list of input files.
> >    // Beyond this point, no new files are added.
> >    // Aggregate all input sections into one place.
> > Index: ELF/Config.h
> > ===================================================================
> > --- ELF/Config.h
> > +++ ELF/Config.h
> > @@ -25,6 +25,7 @@
> >  namespace elf {
> >
> >  class InputFile;
> > +class InputSectionBase;
> >
> >  enum ELFKind {
> >    ELFNoneKind,
> > @@ -104,6 +105,9 @@
> >    std::vector<SymbolVersion> VersionScriptGlobals;
> >    std::vector<SymbolVersion> VersionScriptLocals;
> >    std::vector<uint8_t> BuildIdVector;
> > +  llvm::MapVector<std::pair<const InputSectionBase *, const
> InputSectionBase *>,
> > +                  uint64_t>
> > +      CallGraphProfile;
> >    bool AllowMultipleDefinition;
> >    bool AndroidPackDynRelocs = false;
> >    bool ARMHasBlx = false;
> > Index: ELF/CallGraphSort.h
> > ===================================================================
> > --- /dev/null
> > +++ ELF/CallGraphSort.h
> > @@ -0,0 +1,23 @@
> > +//===- CallGraphSort.h ------------------------------------------*-
> C++ -*-===//
> > +//
> > +//                             The LLVM Linker
> > +//
> > +// This file is distributed under the University of Illinois Open Source
> > +// License. See LICENSE.TXT for details.
> > +//
> > +//===------------------------------------------------------
> ----------------===//
> > +
> > +#ifndef LLD_ELF_CALL_GRAPH_SORT_H
> > +#define LLD_ELF_CALL_GRAPH_SORT_H
> > +
> > +#include "llvm/ADT/DenseMap.h"
> > +
> > +namespace lld {
> > +namespace elf {
> > +class InputSectionBase;
> > +
> > +llvm::DenseMap<const InputSectionBase *, int>
> computeCallGraphProfileOrder();
> > +} // namespace elf
> > +} // namespace lld
> > +
> > +#endif
> > Index: ELF/CallGraphSort.cpp
> > ===================================================================
> > --- /dev/null
> > +++ ELF/CallGraphSort.cpp
> > @@ -0,0 +1,373 @@
> > +//===- CallGraphSort.cpp ------------------------------
> --------------------===//
> > +//
> > +//                             The LLVM Linker
> > +//
> > +// This file is distributed under the University of Illinois Open Source
> > +// License. See LICENSE.TXT for details.
> > +//
> > +//===------------------------------------------------------
> ----------------===//
> > +///
> > +/// Implementation of Call-Chain Clustering from: Optimizing Function
> Placement
> > +/// for Large-Scale Data-Center Applications
> > +/// https://research.fb.com/wp-content/uploads/2017/01/
> cgo2017-hfsort-final1.pdf
> > +///
> > +/// The goal of this algorithm is to improve runtime performance of the
> final
> > +/// executable by arranging code sections such that page table and
> i-cache
> > +/// misses are minimized.
> > +///
> > +/// Definitions:
> > +/// * Cluster
> > +///   * An ordered list of input sections which are layed out as a
> unit. At the
> > +///     beginning of the algorithm each input section has its own
> cluster and
> > +///     the weight of the cluster is the sum of the weight of all
> incomming
> > +///     edges.
> > +/// * Call-Chain Clustering (Cウ) Heuristic
> > +///   * Defines when and how clusters are combined. Pick the highest
> weight edge
> > +///     from cluster _u_ to _v_ then move the sections in _v_ and
> append them to
> > +///     _u_ unless the combined size would be larger than the page size.
> > +/// * Density
> > +///   * The weight of the cluster divided by the size of the cluster.
> This is a
> > +///     proxy for the ammount of execution time spent per byte of the
> cluster.
> > +///
> > +/// It does so given a call graph profile by the following:
> > +/// * Build a call graph from the profile
> > +/// * While there are unresolved edges
> > +///   * Find the edge with the highest weight
> > +///   * Check if merging the two clusters would create a cluster larger
> than the
> > +///     target page size
> > +///   * If not, contract that edge putting the callee after the caller
> > +/// * Sort remaining clusters by density
> > +///
> > +//===------------------------------------------------------
> ----------------===//
> > +
> > +#include "CallGraphSort.h"
> > +#include "OutputSections.h"
> > +#include "SymbolTable.h"
> > +#include "Symbols.h"
> > +#include "Target.h"
> > +
> > +#include "llvm/Support/MathExtras.h"
> > +
> > +#include <queue>
> > +#include <set>
> > +#include <unordered_map>
> > +
> > +using namespace llvm;
> > +using namespace lld;
> > +using namespace lld::elf;
> > +
> > +namespace {
> > +using NodeIndex = std::ptrdiff_t;
> > +using EdgeIndex = std::ptrdiff_t;
> > +
> > +struct Edge;
> > +
> > +struct EdgePriorityCmp {
> > +  std::vector<Edge> &Edges;
> > +  bool operator()(EdgeIndex A, EdgeIndex B) const;
> > +};
> > +
> > +using PriorityQueue = std::multiset<EdgeIndex, EdgePriorityCmp>;
> > +
> > +struct Node {
> > +  Node() = default;
> > +  Node(const InputSectionBase *IS);
> > +  std::vector<const InputSectionBase *> Sections;
> > +  std::vector<EdgeIndex> IncidentEdges;
> > +  int64_t Size = 0;
> > +  uint64_t Weight = 0;
> > +};
> > +
> > +struct Edge {
> > +  NodeIndex From;
> > +  NodeIndex To;
> > +  uint64_t Weight;
> > +  PriorityQueue::iterator PriorityPos;
> > +  bool operator==(const Edge Other) const;
> > +  bool operator<(const Edge Other) const;
> > +};
> > +
> > +bool EdgePriorityCmp::operator()(EdgeIndex A, EdgeIndex B) const {
> > +  return Edges[A].Weight < Edges[B].Weight;
> > +}
> > +
> > +struct EdgeDenseMapInfo {
> > +  static Edge getEmptyKey() {
> > +    return {DenseMapInfo<NodeIndex>::getEmptyKey(),
> > +            DenseMapInfo<NodeIndex>::getEmptyKey(), 0,
> > +            PriorityQueue::iterator()};
> > +  }
> > +  static Edge getTombstoneKey() {
> > +    return {DenseMapInfo<NodeIndex>::getTombstoneKey(),
> > +            DenseMapInfo<NodeIndex>::getTombstoneKey(), 0,
> > +            PriorityQueue::iterator()};
> > +  }
> > +  static unsigned getHashValue(const Edge &Val) {
> > +    return hash_combine(DenseMapInfo<NodeIndex>::getHashValue(Val.
> From),
> > +                        DenseMapInfo<NodeIndex>::getHashValue(Val.To));
> > +  }
> > +  static bool isEqual(const Edge &LHS, const Edge &RHS) { return LHS ==
> RHS; }
> > +};
> > +
> > +class CallGraphSort {
> > +public:
> > +  CallGraphSort();
> > +
> > +  DenseMap<const InputSectionBase *, int> run();
> > +
> > +private:
> > +  std::vector<Node> Nodes;
> > +  std::vector<Edge> Edges;
> > +
> > +  PriorityQueue WorkQueue{EdgePriorityCmp{Edges}};
> > +
> > +  bool killEdge(EdgeIndex EI);
> > +  void contractEdge(EdgeIndex CEI);
> > +  void generateClusters();
> > +};
> > +} // end anonymous namespace
> > +
> > +Node::Node(const InputSectionBase *IS) {
> > +  Sections.push_back(IS);
> > +  Size = IS->getSize();
> > +}
> > +
> > +bool Edge::operator==(const Edge Other) const {
> > +  return From == Other.From && To == Other.To;
> > +}
> > +
> > +bool Edge::operator<(const Edge Other) const {
> > +  if (From != Other.From)
> > +    return From < Other.From;
> > +  return To < Other.To;
> > +}
> > +
> > +static bool isKnownNonreorderableSection(const OutputSection *OS) {
> > +  return llvm::StringSwitch<bool>(OS->Name)
> > +      .Cases(".init", ".fini", ".init_array.", ".fini_array.",
> ".ctors.",
> > +             ".dtors.", true)
> > +      .Default(false);
> > +}
> > +
> > +// Take the edge list in Config->CallGraphProfile, resolve symbol names
> to
> > +// Symbols, and generate a graph between InputSections with the provided
> > +// weights.
> > +CallGraphSort::CallGraphSort() {
> > +  MapVector<std::pair<const InputSectionBase *, const InputSectionBase
> *>,
> > +            uint64_t> &Profile = Config->CallGraphProfile;
> > +  DenseMap<const InputSectionBase *, NodeIndex> SecToNode;
> > +  DenseMap<Edge, EdgeIndex, EdgeDenseMapInfo> EdgeMap;
> > +
> > +  auto GetOrCreateNode = [&](const InputSectionBase *IS) -> NodeIndex {
> > +    auto Res = SecToNode.insert(std::make_pair(IS, Nodes.size()));
> > +    if (Res.second)
> > +      Nodes.emplace_back(IS);
> > +    return Res.first->second;
> > +  };
> > +
> > +  // Create the graph.
> > +  for (const auto &C : Profile) {
> > +    const InputSectionBase *FromSB = C.first.first;
> > +    const InputSectionBase *ToSB = C.first.second;
> > +    uint64_t Weight = C.second;
> > +
> > +    if (Weight == 0)
> > +      continue;
> > +
> > +    if (FromSB->getOutputSection() != ToSB->getOutputSection())
> > +      continue;
> > +
> > +    if (isKnownNonreorderableSection(FromSB->getOutputSection()))
> > +      continue;
> > +
> > +    NodeIndex From = GetOrCreateNode(FromSB);
> > +    NodeIndex To = GetOrCreateNode(ToSB);
> > +
> > +    Nodes[To].Weight = SaturatingAdd(Nodes[To].Weight, Weight);
> > +
> > +    if (From == To)
> > +      continue;
> > +
> > +    Edge E{From, To, Weight, WorkQueue.end()};
> > +
> > +    // Add or increment an edge
> > +    auto Res = EdgeMap.insert(std::make_pair(E, Edges.size()));
> > +    EdgeIndex EI = Res.first->second;
> > +    if (Res.second) {
> > +      Edges.push_back(E);
> > +      Nodes[From].IncidentEdges.push_back(EI);
> > +      Nodes[To].IncidentEdges.push_back(EI);
> > +    } else
> > +      Edges[EI].Weight = SaturatingAdd(Edges[EI].Weight, Weight);
> > +  }
> > +}
> > +
> > +/// Like std::unique, but calls Merge on equal values. Merge is allowed
> > +/// to modifiy its first argument.
> > +///
> > +/// Merge is a callable with signature
> > +///   Merge(*declval<ForwardIt>(), *declval<ForwardIt>())
> > +///
> > +/// Example:
> > +///
> > +///   int a[] = {1, 2, 2, 3, 4, 5, 5};
> > +///   auto end = merge_unique(std::begin(a), std::end(a),
> > +///     [](int a, int b) { return a == b; },
> > +///     [](int &a, int b) { a += b; });
> > +///
> > +///   for (auto i = a; i != end; ++i)
> > +///     std::cout << *i << " ";
> > +///
> > +///   -- 1 4 3 4 10
> > +template <class ForwardIt, class PredTy, class MergeTy>
> > +static ForwardIt merge_unique(ForwardIt First, ForwardIt Last, PredTy
> Pred,
> > +                              MergeTy Merge) {
> > +  if (First == Last)
> > +    return Last;
> > +
> > +  ForwardIt I = First;
> > +  while (++I != Last) {
> > +    if (Pred(*First, *I))
> > +      Merge(*First, *I);
> > +    else if (++First != I)
> > +      *First = std::move(*I);
> > +  }
> > +  return ++First;
> > +}
> > +
> > +/// Marks an edge as head and removes it from the work queue.
> > +/// Returns true if the edge was killed, false if it was already dead.
> > +bool CallGraphSort::killEdge(EdgeIndex EI) {
> > +  Edge &E = Edges[EI];
> > +  if (E.PriorityPos != WorkQueue.end()) {
> > +    WorkQueue.erase(E.PriorityPos);
> > +    E.PriorityPos = WorkQueue.end();
> > +    return true;
> > +  }
> > +  return false;
> > +}
> > +
> > +/// Remove edge \p CEI from the graph while simultaneously merging its
> two
> > +/// incident vertices u and v. This merges any duplicate edges between
> u and v
> > +/// by accumulating their weights.
> > +void CallGraphSort::contractEdge(EdgeIndex CEI) {
> > +  // Make a copy of the edge as the original will be marked killed
> while being
> > +  // used.
> > +  Edge CE = Edges[CEI];
> > +  assert(CE.From != CE.To && "Got self edge!");
> > +  std::vector<EdgeIndex> &FE = Nodes[CE.From].IncidentEdges;
> > +
> > +  // Remove the self edge from From.
> > +  FE.erase(std::remove(FE.begin(), FE.end(), CEI));
> > +  std::vector<EdgeIndex> &TE = Nodes[CE.To].IncidentEdges;
> > +
> > +  // Update all edges incident with To to reference From instead. Then
> if they
> > +  // aren't self edges add them to From.
> > +  for (EdgeIndex EI : TE) {
> > +    Edge &E = Edges[EI];
> > +    if (E.From == CE.To)
> > +      E.From = CE.From;
> > +    if (E.To == CE.To)
> > +      E.To = CE.From;
> > +    if (E.To == E.From) {
> > +      killEdge(EI);
> > +      continue;
> > +    }
> > +    FE.push_back(EI);
> > +  }
> > +
> > +  // Free memory. Otherwise we end up with N^2 memory usage.
> > +  std::vector<EdgeIndex>().swap(TE);
> > +
> > +  if (FE.empty())
> > +    return;
> > +
> > +  // Sort edges so they can be merged. The stability of this sort
> doesn't matter
> > +  // as equal edges will be merged in an order independent manner.
> > +  std::sort(FE.begin(), FE.end(),
> > +            [&](EdgeIndex AI, EdgeIndex BI) { return Edges[AI] <
> Edges[BI]; });
> > +
> > +  FE.erase(merge_unique(FE.begin(), FE.end(),
> > +                        [&](EdgeIndex AI, EdgeIndex BI) {
> > +                          return Edges[AI] == Edges[BI];
> > +                        },
> > +                        [&](EdgeIndex AI, EdgeIndex BI) {
> > +                          Edge &A = Edges[AI];
> > +                          Edge &B = Edges[BI];
> > +                          killEdge(BI);
> > +                          bool Restore = killEdge(AI);
> > +                          A.Weight = SaturatingAdd(A.Weight, B.Weight);
> > +                          if (Restore)
> > +                            A.PriorityPos = WorkQueue.insert(AI);
> > +                        }),
> > +           FE.end());
> > +}
> > +
> > +// Group InputSections into clusters using the Call-Chain Clustering
> heuristic
> > +// then sort the clusters by density.
> > +void CallGraphSort::generateClusters() {
> > +  for (size_t I = 0; I < Edges.size(); ++I) {
> > +    Edges[I].PriorityPos = WorkQueue.insert(I);
> > +  }
> > +
> > +  // Collapse the graph.
> > +  while (!WorkQueue.empty()) {
> > +    PriorityQueue::const_iterator I = --WorkQueue.end();
> > +    EdgeIndex MaxI = *I;
> > +    const Edge MaxE = Edges[MaxI];
> > +    killEdge(MaxI);
> > +    // Merge the Nodes.
> > +    Node &From = Nodes[MaxE.From];
> > +    Node &To = Nodes[MaxE.To];
> > +    if (From.Size + To.Size > Target->PageSize)
> > +      continue;
> > +    contractEdge(MaxI);
> > +    From.Sections.insert(From.Sections.end(), To.Sections.begin(),
> > +                         To.Sections.end());
> > +    From.Size += To.Size;
> > +    From.Weight = SaturatingAdd(From.Weight, To.Weight);
> > +    To.Sections.clear();
> > +    To.Size = 0;
> > +    To.Weight = 0;
> > +  }
> > +
> > +  // Remove empty or dead nodes.
> > +  Nodes.erase(std::remove_if(Nodes.begin(), Nodes.end(),
> > +                             [](const Node &N) {
> > +                               return N.Size == 0 || N.Sections.empty();
> > +                             }),
> > +              Nodes.end());
> > +
> > +  // Sort by density. Invalidates all NodeIndexs.
> > +  std::sort(Nodes.begin(), Nodes.end(), [](const Node &A, const Node
> &B) {
> > +    return (APFloat(APFloat::IEEEdouble(), A.Weight) /
> > +            APFloat(APFloat::IEEEdouble(), A.Size))
> > +               .compare(APFloat(APFloat::IEEEdouble(), B.Weight) /
> > +                        APFloat(APFloat::IEEEdouble(), B.Size)) ==
> > +           APFloat::cmpLessThan;
> > +  });
> > +}
> > +
> > +DenseMap<const InputSectionBase *, int> CallGraphSort::run() {
> > +  generateClusters();
> > +
> > +  // Generate order.
> > +  llvm::DenseMap<const InputSectionBase *, int> OrderMap;
> > +  ssize_t CurOrder = 1;
> > +
> > +  for (const Node &N : Nodes)
> > +    for (const InputSectionBase *IS : N.Sections)
> > +      OrderMap[IS] = CurOrder++;
> > +
> > +  return OrderMap;
> > +}
> > +
> > +// Sort sections by the profile data provided by -callgraph-profile-file
> > +//
> > +// This first builds a call graph based on the profile data then
> iteratively
> > +// merges the hottest call edges as long as it would not create a
> cluster larger
> > +// than the page size. All clusters are then sorted by a density metric
> to
> > +// further improve locality.
> > +DenseMap<const InputSectionBase *, int> elf::computeCallGraphProfileOrder()
> {
> > +  return CallGraphSort().run();
> > +}
> > Index: ELF/CMakeLists.txt
> > ===================================================================
> > --- ELF/CMakeLists.txt
> > +++ ELF/CMakeLists.txt
> > @@ -19,6 +19,7 @@
> >    Arch/SPARCV9.cpp
> >    Arch/X86.cpp
> >    Arch/X86_64.cpp
> > +  CallGraphSort.cpp
> >    Driver.cpp
> >    DriverUtils.cpp
> >    EhFrame.cpp
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20180207/77d2fd10/attachment.html>


More information about the llvm-commits mailing list