<div dir="ltr"><div class="gmail_extra"><div><div class="gmail_signature">On Tue, Feb 6, 2018 at 6:53 PM, Rafael Avila de Espindola <span dir="ltr"><<a href="mailto:rafael.espindola@gmail.com" target="_blank">rafael.espindola@gmail.com</a>></span> wrote:<br></div></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">I have benchmarked this by timing lld running an LTO link of FileCheck. The working set<br>

is much larger this time. The old call graph had 4079 calls; this one has<br>

30616.<br>

<br>

The results are somewhat similar:<br>

<br>

 Performance counter stats for '../default-ld.lld @response.txt' (10 runs):<br>

<br>

           498,771      iTLB-load-misses                                              ( +-  0.10% )<br>

       224,751,360      L1-icache-load-misses                                         ( +-  0.00% )<br>

<br>

       2.339864606 seconds time elapsed                                          ( +-  0.06% )<br>

<br>

 Performance counter stats for '../sorted-ld.lld @response.txt' (10 runs):<br>

<br>

           556,999      iTLB-load-misses                                              ( +-  0.17% )<br>

       216,788,838      L1-icache-load-misses                                         ( +-  0.01% )<br>

<br>

       2.326596163 seconds time elapsed                                          ( +-  0.04% )<br>

<br>

As with the previous test, iTLB misses get worse and L1 i-cache misses get better. The net<br>

result is a very small speedup (about 0.6% in elapsed time).<br>
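<br>
For reference, the relative changes behind that summary can be recomputed from the counters above. This is only a minimal sketch; relativeChangePercent is a hypothetical helper name, not something from lld or perf:<br>

```cpp
// Hypothetical helper, not part of lld or perf: percentage change from
// Before to After. A positive result means the counter went up.
double relativeChangePercent(double Before, double After) {
  return (After - Before) / Before * 100.0;
}
```

With the numbers above this gives roughly +11.7% for iTLB-load-misses (498,771 to 556,999), roughly -3.5% for L1-icache-load-misses, and roughly -0.6% for elapsed time.<br>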

<br>

Do you know how big the Chromium call graph is?<br></blockquote><div><br></div><div>Not sure, but the call graph for a high-profile internal game I tested is about 10k functions and 17 MiB of .text, and I got a 2%-4% speedup. Given that it's a game, it runs a decent portion of that 17 MiB 60 times a second, while LLVM is heavily pass-based, so I don't expect its instruction working set over a small period of time to be that large.</div><div><br></div><div>I am, however, surprised by the roughly 12% increase in iTLB misses (498,771 to 556,999).</div><div><br></div><div>- Michael Spencer</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

Cheers,<br>

Rafael<br>

<span class="gmail-"><br>

Michael Spencer via Phabricator <<a href="mailto:reviews@reviews.llvm.org">reviews@reviews.llvm.org</a>> writes:<br>

<br>

</span><div><div class="gmail-h5">> Bigcheese updated this revision to Diff 132667.<br>

> Bigcheese added a comment.<br>

><br>

> - Don't reorder non-reorderable sections<br>

> - Skip edges across output sections<br>

> - Add tests<br>

><br>

><br>

> <a href="https://reviews.llvm.org/D36351" rel="noreferrer" target="_blank">https://reviews.llvm.org/<wbr>D36351</a><br>

><br>

> Files:<br>

>   ELF/CMakeLists.txt<br>

>   ELF/CallGraphSort.cpp<br>

>   ELF/CallGraphSort.h<br>

>   ELF/Config.h<br>

>   ELF/Driver.cpp<br>

>   ELF/Options.td<br>

>   ELF/Writer.cpp<br>

>   test/ELF/cgprofile-txt.s<br>

><br>

</div></div><span class="gmail-">> Index: test/ELF/cgprofile-txt.s<br>

> ==============================<wbr>==============================<wbr>=======<br>

> --- /dev/null<br>

> +++ test/ELF/cgprofile-txt.s<br>

</span>> @@ -0,0 +1,106 @@<br>

<span class="gmail-">> +# REQUIRES: x86<br>

> +<br>

> +# RUN: llvm-mc -filetype=obj -triple=x86_64-unknown-linux %s -o %t<br>

> +# RUN: ld.lld -e A %t -o %t2<br>

> +# RUN: llvm-readobj -symbols %t2 | FileCheck %s --check-prefix=NOSORT<br>

> +<br>

> +# RUN: echo "A B 100" > %t.call_graph<br>

> +# RUN: echo "A C 40" >> %t.call_graph<br>

> +# RUN: echo "B C 30" >> %t.call_graph<br>

> +# RUN: echo "C D 90" >> %t.call_graph<br>

> +# RUN: echo "PP TS 100" >> %t.call_graph<br>

</span>> +# RUN: echo "_init2 _init 24567837" >> %t.call_graph<br>

> +# RUN: echo "TS QC 9001" >> %t.call_graph<br>

<span class="gmail-">> +# RUN: ld.lld -e A %t --call-graph-ordering-file %t.call_graph -o %t2<br>

> +# RUN: llvm-readobj -symbols %t2 | FileCheck %s<br>

> +<br>

> +    .section    .text.D,"ax",@progbits<br>

> +D:<br>

</span>> +    retq<br>

> +<br>

> +    .section    .text.C,"ax",@progbits<br>

> +    .globl  C<br>

> +C:<br>

> +    retq<br>

> +<br>

> +    .section    .text.B,"ax",@progbits<br>

> +    .globl  B<br>

> +B:<br>

> +    retq<br>

> +<br>

> +    .section    .text.A,"ax",@progbits<br>

> +    .globl  A<br>

> +A:<br>

<span class="gmail-">> +    retq<br>

> +<br>

> +    .section    .ponies,"ax",@progbits,unique,<wbr>1<br>

> +    .globl TS<br>

> +TS:<br>

> +    retq<br>

> +<br>

> +    .section    .ponies,"ax",@progbits,unique,<wbr>2<br>

> +    .globl PP<br>

> +PP:<br>

> +    retq<br>

> +<br>

</span>> +    .section    .other,"ax",@progbits,unique,1<br>

> +    .globl QC<br>

> +QC:<br>

> +    retq<br>

> +<br>

> +    .section    .other,"ax",@progbits,unique,2<br>

> +    .globl GB<br>

> +GB:<br>

> +    retq<br>

> +<br>

> +    .section    .init,"ax",@progbits,unique,1<br>

> +    .globl _init<br>

> +_init:<br>

> +    retq<br>

> +<br>

> +    .section    .init,"ax",@progbits,unique,2<br>

> +    .globl _init2<br>

> +_init2:<br>

> +    retq<br>

> +<br>

> +# CHECK:          Name: D<br>

> +# CHECK-NEXT:     Value: 0x201003<br>

> +# CHECK:          Name: A<br>

> +# CHECK-NEXT:     Value: 0x201000<br>

> +# CHECK:          Name: B<br>

> +# CHECK-NEXT:     Value: 0x201001<br>

> +# CHECK:          Name: C<br>

> +# CHECK-NEXT:     Value: 0x201002<br>

> +# CHECK:          Name: GB<br>

> +# CHECK-NEXT:     Value: 0x201007<br>

> +# CHECK:          Name: PP<br>

> +# CHECK-NEXT:     Value: 0x201004<br>

> +# CHECK:          Name: QC<br>

> +# CHECK-NEXT:     Value: 0x201006<br>

> +# CHECK:          Name: TS<br>

> +# CHECK-NEXT:     Value: 0x201005<br>

> +# CHECK:          Name: _init<br>

> +# CHECK-NEXT:     Value: 0x201008<br>

> +# CHECK:          Name: _init2<br>

> +# CHECK-NEXT:     Value: 0x201009<br>

> +<br>

> +# NOSORT:          Name: D<br>

> +# NOSORT-NEXT:     Value: 0x201000<br>

> +# NOSORT:          Name: A<br>

> +# NOSORT-NEXT:     Value: 0x201003<br>

> +# NOSORT:          Name: B<br>

> +# NOSORT-NEXT:     Value: 0x201002<br>

> +# NOSORT:          Name: C<br>

> +# NOSORT-NEXT:     Value: 0x201001<br>

> +# NOSORT:          Name: GB<br>

> +# NOSORT-NEXT:     Value: 0x201007<br>

> +# NOSORT:          Name: PP<br>

> +# NOSORT-NEXT:     Value: 0x201005<br>

> +# NOSORT:          Name: QC<br>

> +# NOSORT-NEXT:     Value: 0x201006<br>

> +# NOSORT:          Name: TS<br>

> +# NOSORT-NEXT:     Value: 0x201004<br>

> +# NOSORT:          Name: _init<br>

> +# NOSORT-NEXT:     Value: 0x201008<br>

> +# NOSORT:          Name: _init2<br>

> +# NOSORT-NEXT:     Value: 0x201009<br>

> Index: ELF/Writer.cpp<br>

> ==============================<wbr>==============================<wbr>=======<br>

> --- ELF/Writer.cpp<br>

> +++ ELF/Writer.cpp<br>

<span class="gmail-">> @@ -9,6 +9,7 @@<br>

><br>

>  #include "Writer.h"<br>

>  #include "AArch64ErrataFix.h"<br>

> +#include "CallGraphSort.h"<br>

>  #include "Config.h"<br>

>  #include "Filesystem.h"<br>

>  #include "LinkerScript.h"<br>

</span>> @@ -1050,6 +1051,17 @@<br>

<span class="gmail-">>  // If no layout was provided by linker script, we want to apply default<br>

>  // sorting for special input sections. This also handles --symbol-ordering-file.<br>

</span><span class="gmail-">>  template <class ELFT> void Writer<ELFT>::<wbr>sortInputSections() {<br>

</span><span class="gmail-">> +  // Use the rarely used option --call-graph-ordering-file to sort sections.<br>

> +  if (!Config->CallGraphProfile.<wbr>empty()) {<br>

> +    DenseMap<const InputSectionBase *, int> OrderMap =<br>

> +        computeCallGraphProfileOrder()<wbr>;<br>

> +<br>

</span>> +    for (BaseCommand *Base : Script->SectionCommands)<br>

> +      if (auto *Sec = dyn_cast<OutputSection>(Base))<br>

> +        if (Sec->Live)<br>

> +          Sec->sort([&](InputSectionBase *S) { return OrderMap.lookup(S); });<br>

<span class="gmail-">> +  }<br>

> +<br>

>    // Sort input sections by priority using the list provided<br>

>    // by --symbol-ordering-file.<br>

</span><span class="gmail-">>    DenseMap<SectionBase *, int> Order = buildSectionOrder();<br>
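<br>
The Writer.cpp hunk above sorts each live output section using OrderMap.lookup as the key. DenseMap::lookup returns a default-constructed value (0 here) for a missing key, and the order map produced by this patch starts numbering at 1, so sections without profile data get key 0. A minimal sketch of that keying follows. It assumes an ascending, stable sort (OutputSection::sort is not shown in this diff, so that is an assumption), and it uses hypothetical names with std::map and strings standing in for DenseMap and InputSectionBase pointers:<br>

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the sort in Writer::sortInputSections.
// Sections missing from OrderMap get priority 0, like DenseMap::lookup.
void sortSectionsByOrder(std::vector<std::string> &Sections,
                         const std::map<std::string, int> &OrderMap) {
  auto Priority = [&](const std::string &S) {
    auto It = OrderMap.find(S);
    return It == OrderMap.end() ? 0 : It->second;
  };
  // Assumption: ascending and stable, so ties keep their input order.
  std::stable_sort(Sections.begin(), Sections.end(),
                   [&](const std::string &A, const std::string &B) {
                     return Priority(A) < Priority(B);
                   });
}
```

Under those assumptions, sections that never appear in the profile end up ahead of the profiled ones and keep their original relative order.<br>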

</span>> Index: ELF/Options.td<br>

> ==============================<wbr>==============================<wbr>=======<br>

> --- ELF/Options.td<br>

> +++ ELF/Options.td<br>

> @@ -51,6 +51,9 @@<br>

<span class="gmail-">>  def as_needed: F<"as-needed">,<br>

>    HelpText<"Only set DT_NEEDED for shared libraries if used">;<br>

><br>

> +def call_graph_ordering_file: S<"call-graph-ordering-file">,<br>

> +  HelpText<"Lay out sections to optimize the given call graph">;<br>

> +<br>

>  // -chroot doesn't have a help text because it is an internal option.<br>

>  def chroot: S<"chroot">;<br>

><br>

</span>> Index: ELF/Driver.cpp<br>

> ==============================<wbr>==============================<wbr>=======<br>

> --- ELF/Driver.cpp<br>

> +++ ELF/Driver.cpp<br>

> @@ -570,6 +570,31 @@<br>

<span class="gmail-">>    return {BuildIdKind::None, {}};<br>

>  }<br>

><br>

> +static void readCallGraph(MemoryBufferRef MB) {<br>

> +  // Build a map from symbol name to section<br>

> +  DenseMap<StringRef, InputSectionBase *> SymbolSection;<br>

> +  for (InputFile *File : ObjectFiles)<br>

> +    for (Symbol *Sym : File->getSymbols())<br>

> +      if (auto *D = dyn_cast<Defined>(Sym))<br>

> +        if (auto *IS = dyn_cast_or_null<<wbr>InputSectionBase>(D->Section))<br>

> +          SymbolSection[D->getName()] = IS;<br>

> +<br>

</span><span class="gmail-">> +  std::vector<StringRef> Lines = args::getLines(MB);<br>

> +  for (StringRef L : Lines) {<br>

> +    SmallVector<StringRef, 3> Fields;<br>

> +    L.split(Fields, ' ');<br>

> +    if (Fields.size() != 3)<br>

> +      fatal("parse error");<br>

> +    uint64_t Count;<br>

> +    if (!to_integer(Fields[2], Count))<br>

> +      fatal("parse error");<br>

</span>> +    InputSectionBase *FromSec = SymbolSection.lookup(Fields[0]<wbr>);<br>

> +    InputSectionBase *ToSec = SymbolSection.lookup(Fields[1]<wbr>);<br>

<span class="gmail-">> +    if (FromSec && ToSec)<br>

> +      Config->CallGraphProfile[std::<wbr>make_pair(FromSec, ToSec)] = Count;<br>

> +  }<br>

> +}<br>

> +<br>

>  static bool getCompressDebugSections(opt::<wbr>InputArgList &Args) {<br>

>    StringRef S = Args.getLastArgValue(OPT_<wbr>compress_debug_sections, "none");<br>

>    if (S == "none")<br>

</span>> @@ -1084,6 +1109,10 @@<br>

<span class="gmail-">>    // Apply symbol renames for -wrap.<br>

>    Symtab->applySymbolWrap();<br>

><br>

> +  if (auto *Arg = Args.getLastArg(OPT_call_<wbr>graph_ordering_file))<br>

> +    if (Optional<MemoryBufferRef> Buffer = readFile(Arg->getValue()))<br>

> +      readCallGraph(*Buffer);<br>

> +<br>

>    // Now that we have a complete list of input files.<br>

>    // Beyond this point, no new files are added.<br>

>    // Aggregate all input sections into one place.<br>
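<br>
readCallGraph in the Driver.cpp hunk above expects one edge per line in the form "from to count", as in the RUN lines of the test (for example "A B 100"). A minimal standalone sketch of that line format is below. CallEdge and parseCallGraphLine are hypothetical names, and the sketch uses std::istringstream rather than the StringRef::split and to_integer calls in the patch, so it is more lenient about whitespace:<br>

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical standalone type; the patch itself stores a pair of
// InputSectionBase pointers mapped to a uint64_t count.
struct CallEdge {
  std::string From;
  std::string To;
  std::uint64_t Count = 0;
};

// Parses one "<from> <to> <count>" line. Returns false on malformed input,
// where readCallGraph in the patch would call fatal("parse error").
bool parseCallGraphLine(const std::string &Line, CallEdge &Out) {
  std::istringstream In(Line);
  CallEdge E;
  if (!(In >> E.From >> E.To >> E.Count))
    return false;
  std::string Extra;
  if (In >> Extra) // A fourth field makes the line invalid.
    return false;
  Out = E;
  return true;
}
```

In the patch, an edge is only recorded when both symbol names resolve to an input section; this sketch stops at the parsing step.<br>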

</span><span class="gmail-">> Index: ELF/Config.h<br>

> ==============================<wbr>==============================<wbr>=======<br>

> --- ELF/Config.h<br>

> +++ ELF/Config.h<br>

</span>> @@ -25,6 +25,7 @@<br>

<span class="gmail-">>  namespace elf {<br>

><br>

>  class InputFile;<br>

> +class InputSectionBase;<br>

><br>

>  enum ELFKind {<br>

>    ELFNoneKind,<br>

</span>> @@ -104,6 +105,9 @@<br>

<span class="gmail-">>    std::vector<SymbolVersion> VersionScriptGlobals;<br>

>    std::vector<SymbolVersion> VersionScriptLocals;<br>

>    std::vector<uint8_t> BuildIdVector;<br>

</span>> +  llvm::MapVector<std::pair<<wbr>const InputSectionBase *, const InputSectionBase *>,<br>

> +                  uint64_t><br>

<span class="gmail-">> +      CallGraphProfile;<br>

>    bool AllowMultipleDefinition;<br>

>    bool AndroidPackDynRelocs = false;<br>

>    bool ARMHasBlx = false;<br>

</span>> Index: ELF/CallGraphSort.h<br>

> ==============================<wbr>==============================<wbr>=======<br>

> --- /dev/null<br>

> +++ ELF/CallGraphSort.h<br>

<span class="gmail-">> @@ -0,0 +1,23 @@<br>

> +//===- CallGraphSort.h ------------------------------<wbr>------------*- C++ -*-===//<br>

> +//<br>

> +//                             The LLVM Linker<br>

> +//<br>

> +// This file is distributed under the University of Illinois Open Source<br>

> +// License. See LICENSE.TXT for details.<br>

> +//<br>

> +//===------------------------<wbr>------------------------------<wbr>----------------===//<br>

> +<br>

> +#ifndef LLD_ELF_CALL_GRAPH_SORT_H<br>

> +#define LLD_ELF_CALL_GRAPH_SORT_H<br>

> +<br>

> +#include "llvm/ADT/DenseMap.h"<br>

> +<br>

> +namespace lld {<br>

> +namespace elf {<br>

> +class InputSectionBase;<br>

> +<br>

> +llvm::DenseMap<const InputSectionBase *, int> computeCallGraphProfileOrder()<wbr>;<br>

> +} // namespace elf<br>

> +} // namespace lld<br>

> +<br>

> +#endif<br>

</span>> Index: ELF/CallGraphSort.cpp<br>

> ==============================<wbr>==============================<wbr>=======<br>

> --- /dev/null<br>

> +++ ELF/CallGraphSort.cpp<br>

> @@ -0,0 +1,373 @@<br>

<div><div class="gmail-h5">> +//===- CallGraphSort.cpp ------------------------------<wbr>--------------------===//<br>

> +//<br>

> +//                             The LLVM Linker<br>

> +//<br>

> +// This file is distributed under the University of Illinois Open Source<br>

> +// License. See LICENSE.TXT for details.<br>

> +//<br>

> +//===------------------------<wbr>------------------------------<wbr>----------------===//<br>

> +///<br>

> +/// Implementation of Call-Chain Clustering from: Optimizing Function Placement<br>

> +/// for Large-Scale Data-Center Applications<br>

> +/// <a href="https://research.fb.com/wp-content/uploads/2017/01/cgo2017-hfsort-final1.pdf" rel="noreferrer" target="_blank">https://research.fb.com/wp-<wbr>content/uploads/2017/01/<wbr>cgo2017-hfsort-final1.pdf</a><br>

> +///<br>

> +/// The goal of this algorithm is to improve runtime performance of the final<br>

> +/// executable by arranging code sections such that page table and i-cache<br>

> +/// misses are minimized.<br>

> +///<br>

> +/// Definitions:<br>

> +/// * Cluster<br>

> +///   * An ordered list of input sections which are laid out as a unit. At the<br>

> +///     beginning of the algorithm each input section has its own cluster and<br>

> +///     the weight of the cluster is the sum of the weight of all incoming<br>

> +///     edges.<br>

> +/// * Call-Chain Clustering (C³) Heuristic<br>

> +///   * Defines when and how clusters are combined. Pick the highest weight edge<br>

> +///     from cluster _u_ to _v_ then move the sections in _v_ and append them to<br>

> +///     _u_ unless the combined size would be larger than the page size.<br>

> +/// * Density<br>

> +///   * The weight of the cluster divided by the size of the cluster. This is a<br>

> +///     proxy for the amount of execution time spent per byte of the cluster.<br>

> +///<br>

> +/// It does so given a call graph profile by the following:<br>

> +/// * Build a call graph from the profile<br>

> +/// * While there are unresolved edges<br>

> +///   * Find the edge with the highest weight<br>

> +///   * Check if merging the two clusters would create a cluster larger than the<br>

> +///     target page size<br>

> +///   * If not, contract that edge putting the callee after the caller<br>

> +/// * Sort remaining clusters by density<br>

> +///<br>

> +//===------------------------<wbr>------------------------------<wbr>----------------===//<br>

> +<br>

> +#include "CallGraphSort.h"<br>

</div></div>> +#include "OutputSections.h"<br>

<div><div class="gmail-h5">> +#include "SymbolTable.h"<br>

> +#include "Symbols.h"<br>

> +#include "Target.h"<br>

> +<br>

> +#include "llvm/Support/MathExtras.h"<br>

> +<br>

> +#include <queue><br>

> +#include <set><br>

> +#include <unordered_map><br>

> +<br>

> +using namespace llvm;<br>

> +using namespace lld;<br>

> +using namespace lld::elf;<br>

> +<br>

> +namespace {<br>

> +using NodeIndex = std::ptrdiff_t;<br>

> +using EdgeIndex = std::ptrdiff_t;<br>

> +<br>

> +struct Edge;<br>

> +<br>

> +struct EdgePriorityCmp {<br>

> +  std::vector<Edge> &Edges;<br>

> +  bool operator()(EdgeIndex A, EdgeIndex B) const;<br>

> +};<br>

> +<br>

> +using PriorityQueue = std::multiset<EdgeIndex, EdgePriorityCmp>;<br>

> +<br>

> +struct Node {<br>

> +  Node() = default;<br>

> +  Node(const InputSectionBase *IS);<br>

> +  std::vector<const InputSectionBase *> Sections;<br>

> +  std::vector<EdgeIndex> IncidentEdges;<br>

> +  int64_t Size = 0;<br>

> +  uint64_t Weight = 0;<br>

> +};<br>

> +<br>

> +struct Edge {<br>

> +  NodeIndex From;<br>

> +  NodeIndex To;<br>

> +  uint64_t Weight;<br>

> +  PriorityQueue::iterator PriorityPos;<br>

> +  bool operator==(const Edge Other) const;<br>

> +  bool operator<(const Edge Other) const;<br>

> +};<br>

> +<br>

> +bool EdgePriorityCmp::operator()(<wbr>EdgeIndex A, EdgeIndex B) const {<br>

> +  return Edges[A].Weight < Edges[B].Weight;<br>

> +}<br>

> +<br>

> +struct EdgeDenseMapInfo {<br>

> +  static Edge getEmptyKey() {<br>

> +    return {DenseMapInfo<NodeIndex>::<wbr>getEmptyKey(),<br>

> +            DenseMapInfo<NodeIndex>::<wbr>getEmptyKey(), 0,<br>

> +            PriorityQueue::iterator()};<br>

> +  }<br>

> +  static Edge getTombstoneKey() {<br>

> +    return {DenseMapInfo<NodeIndex>::<wbr>getTombstoneKey(),<br>

> +            DenseMapInfo<NodeIndex>::<wbr>getTombstoneKey(), 0,<br>

> +            PriorityQueue::iterator()};<br>

> +  }<br>

> +  static unsigned getHashValue(const Edge &Val) {<br>

> +    return hash_combine(DenseMapInfo<<wbr>NodeIndex>::getHashValue(Val.<wbr>From),<br>

> +                        DenseMapInfo<NodeIndex>::<wbr>getHashValue(Val.To));<br>

> +  }<br>

> +  static bool isEqual(const Edge &LHS, const Edge &RHS) { return LHS == RHS; }<br>

> +};<br>

> +<br>

> +class CallGraphSort {<br>

> +public:<br>

> +  CallGraphSort();<br>

> +<br>

</div></div>> +  DenseMap<const InputSectionBase *, int> run();<br>

<div><div class="gmail-h5">> +<br>

> +private:<br>

> +  std::vector<Node> Nodes;<br>

> +  std::vector<Edge> Edges;<br>

> +<br>

> +  PriorityQueue WorkQueue{EdgePriorityCmp{<wbr>Edges}};<br>

> +<br>

> +  bool killEdge(EdgeIndex EI);<br>

> +  void contractEdge(EdgeIndex CEI);<br>

> +  void generateClusters();<br>

> +};<br>

> +} // end anonymous namespace<br>

> +<br>

> +Node::Node(const InputSectionBase *IS) {<br>

> +  Sections.push_back(IS);<br>

> +  Size = IS->getSize();<br>

> +}<br>

> +<br>

> +bool Edge::operator==(const Edge Other) const {<br>

> +  return From == Other.From && To == Other.To;<br>

> +}<br>

> +<br>

> +bool Edge::operator<(const Edge Other) const {<br>

> +  if (From != Other.From)<br>

> +    return From < Other.From;<br>

> +  return To < Other.To;<br>

> +}<br>

> +<br>

</div></div><span class="gmail-">> +static bool isKnownNonreorderableSection(<wbr>const OutputSection *OS) {<br>

> +  return llvm::StringSwitch<bool>(OS-><wbr>Name)<br>

> +      .Cases(".init", ".fini", ".init_array.", ".fini_array.", ".ctors.",<br>

> +             ".dtors.", true)<br>

> +      .Default(false);<br>

</span><div><div class="gmail-h5">> +}<br>

> +<br>

> +// Take the edge list in Config->CallGraphProfile, whose symbol names were<br>

> +// already resolved to InputSections by readCallGraph, and generate a graph<br>

> +// between InputSections with the provided weights.<br>

> +CallGraphSort::CallGraphSort(<wbr>) {<br>

> +  MapVector<std::pair<const InputSectionBase *, const InputSectionBase *>,<br>

> +            uint64_t> &Profile = Config->CallGraphProfile;<br>

> +  DenseMap<const InputSectionBase *, NodeIndex> SecToNode;<br>

> +  DenseMap<Edge, EdgeIndex, EdgeDenseMapInfo> EdgeMap;<br>

> +<br>

> +  auto GetOrCreateNode = [&](const InputSectionBase *IS) -> NodeIndex {<br>

> +    auto Res = SecToNode.insert(std::make_<wbr>pair(IS, Nodes.size()));<br>

> +    if (Res.second)<br>

> +      Nodes.emplace_back(IS);<br>

> +    return Res.first->second;<br>

> +  };<br>

> +<br>

> +  // Create the graph.<br>

> +  for (const auto &C : Profile) {<br>

> +    const InputSectionBase *FromSB = C.first.first;<br>

> +    const InputSectionBase *ToSB = C.first.second;<br>

> +    uint64_t Weight = C.second;<br>

> +<br>

> +    if (Weight == 0)<br>

> +      continue;<br>

> +<br>

> +    if (FromSB->getOutputSection() != ToSB->getOutputSection())<br>

> +      continue;<br>

</div></div>> +<br>

> +    if (isKnownNonreorderableSection(<wbr>FromSB->getOutputSection()))<br>

<div><div class="gmail-h5">> +      continue;<br>

> +<br>

> +    NodeIndex From = GetOrCreateNode(FromSB);<br>

> +    NodeIndex To = GetOrCreateNode(ToSB);<br>

> +<br>

> +    Nodes[To].Weight = SaturatingAdd(Nodes[To].<wbr>Weight, Weight);<br>

> +<br>

> +    if (From == To)<br>

> +      continue;<br>

> +<br>

> +    Edge E{From, To, Weight, WorkQueue.end()};<br>

> +<br>

> +    // Add or increment an edge<br>

> +    auto Res = EdgeMap.insert(std::make_pair(<wbr>E, Edges.size()));<br>

> +    EdgeIndex EI = Res.first->second;<br>

> +    if (Res.second) {<br>

> +      Edges.push_back(E);<br>

> +      Nodes[From].IncidentEdges.<wbr>push_back(EI);<br>

> +      Nodes[To].IncidentEdges.push_<wbr>back(EI);<br>

> +    } else<br>

> +      Edges[EI].Weight = SaturatingAdd(Edges[EI].<wbr>Weight, Weight);<br>

> +  }<br>

> +}<br>

> +<br>

> +/// Like std::unique, but calls Merge on equal values. Merge is allowed<br>

> +/// to modify its first argument.<br>

> +///<br>

> +/// Merge is a callable with signature<br>

> +///   Merge(*declval<ForwardIt>(), *declval<ForwardIt>())<br>

> +///<br>

> +/// Example:<br>

> +///<br>

> +///   int a[] = {1, 2, 2, 3, 4, 5, 5};<br>

> +///   auto end = merge_unique(std::begin(a), std::end(a),<br>

> +///     [](int a, int b) { return a == b; },<br>

> +///     [](int &a, int b) { a += b; });<br>

> +///<br>

> +///   for (auto i = a; i != end; ++i)<br>

> +///     std::cout << *i << " ";<br>

> +///<br>

> +///   -- 1 4 3 4 10<br>

> +template <class ForwardIt, class PredTy, class MergeTy><br>

> +static ForwardIt merge_unique(ForwardIt First, ForwardIt Last, PredTy Pred,<br>

> +                              MergeTy Merge) {<br>

> +  if (First == Last)<br>

> +    return Last;<br>

> +<br>

> +  ForwardIt I = First;<br>

> +  while (++I != Last) {<br>

> +    if (Pred(*First, *I))<br>

> +      Merge(*First, *I);<br>

> +    else if (++First != I)<br>

> +      *First = std::move(*I);<br>

> +  }<br>

> +  return ++First;<br>

> +}<br>

> +<br>

> +/// Marks an edge as dead and removes it from the work queue.<br>

> +/// Returns true if the edge was killed, false if it was already dead.<br>

> +bool CallGraphSort::killEdge(<wbr>EdgeIndex EI) {<br>

> +  Edge &E = Edges[EI];<br>

> +  if (E.PriorityPos != WorkQueue.end()) {<br>

> +    WorkQueue.erase(E.PriorityPos)<wbr>;<br>

> +    E.PriorityPos = WorkQueue.end();<br>

> +    return true;<br>

> +  }<br>

> +  return false;<br>

> +}<br>

> +<br>

> +/// Remove edge \p CEI from the graph while simultaneously merging its two<br>

> +/// incident vertices u and v. This merges any duplicate edges between u and v<br>

> +/// by accumulating their weights.<br>

> +void CallGraphSort::contractEdge(<wbr>EdgeIndex CEI) {<br>

> +  // Make a copy of the edge as the original will be marked killed while being<br>

> +  // used.<br>

> +  Edge CE = Edges[CEI];<br>

> +  assert(CE.From != CE.To && "Got self edge!");<br>

> +  std::vector<EdgeIndex> &FE = Nodes[CE.From].IncidentEdges;<br>

> +<br>

> +  // Remove the contracted edge from From's incident edges.<br>

> +  FE.erase(std::remove(FE.begin(), FE.end(), CEI), FE.end());<br>

> +  std::vector<EdgeIndex> &TE = Nodes[CE.To].IncidentEdges;<br>

> +<br>

> +  // Update all edges incident with To to reference From instead. Then if they<br>

> +  // aren't self edges add them to From.<br>

> +  for (EdgeIndex EI : TE) {<br>

> +    Edge &E = Edges[EI];<br>

> +    if (E.From == CE.To)<br>

> +      E.From = CE.From;<br>

> +    if (E.To == CE.To)<br>

> +      E.To = CE.From;<br>

> +    if (E.To == E.From) {<br>

> +      killEdge(EI);<br>

> +      continue;<br>

> +    }<br>

> +    FE.push_back(EI);<br>

> +  }<br>

> +<br>

> +  // Free memory. Otherwise we end up with N^2 memory usage.<br>

> +  std::vector<EdgeIndex>().swap(<wbr>TE);<br>

> +<br>

> +  if (FE.empty())<br>

> +    return;<br>

> +<br>

> +  // Sort edges so they can be merged. The stability of this sort doesn't matter<br>

> +  // as equal edges will be merged in an order independent manner.<br>

> +  std::sort(FE.begin(), FE.end(),<br>

> +            [&](EdgeIndex AI, EdgeIndex BI) { return Edges[AI] < Edges[BI]; });<br>

> +<br>

> +  FE.erase(merge_unique(FE.<wbr>begin(), FE.end(),<br>

> +                        [&](EdgeIndex AI, EdgeIndex BI) {<br>

> +                          return Edges[AI] == Edges[BI];<br>

> +                        },<br>

> +                        [&](EdgeIndex AI, EdgeIndex BI) {<br>

> +                          Edge &A = Edges[AI];<br>

> +                          Edge &B = Edges[BI];<br>

> +                          killEdge(BI);<br>

> +                          bool Restore = killEdge(AI);<br>

> +                          A.Weight = SaturatingAdd(A.Weight, B.Weight);<br>

> +                          if (Restore)<br>

> +                            A.PriorityPos = WorkQueue.insert(AI);<br>

> +                        }),<br>

> +           FE.end());<br>

> +}<br>

> +<br>

> +// Group InputSections into clusters using the Call-Chain Clustering heuristic<br>

> +// then sort the clusters by density.<br>

> +void CallGraphSort::<wbr>generateClusters() {<br>

> +  for (size_t I = 0; I < Edges.size(); ++I) {<br>

> +    Edges[I].PriorityPos = WorkQueue.insert(I);<br>

> +  }<br>

> +<br>

> +  // Collapse the graph.<br>

> +  while (!WorkQueue.empty()) {<br>

> +    PriorityQueue::const_iterator I = --WorkQueue.end();<br>

> +    EdgeIndex MaxI = *I;<br>

> +    const Edge MaxE = Edges[MaxI];<br>

> +    killEdge(MaxI);<br>

> +    // Merge the Nodes.<br>

> +    Node &From = Nodes[MaxE.From];<br>

> +    Node &To = Nodes[MaxE.To];<br>

> +    if (From.Size + To.Size > Target->PageSize)<br>

> +      continue;<br>

> +    contractEdge(MaxI);<br>

> +    From.Sections.insert(From.<wbr>Sections.end(), To.Sections.begin(),<br>

> +                         To.Sections.end());<br>

> +    From.Size += To.Size;<br>

> +    From.Weight = SaturatingAdd(From.Weight, To.Weight);<br>

> +    To.Sections.clear();<br>

> +    To.Size = 0;<br>

> +    To.Weight = 0;<br>

> +  }<br>

> +<br>

> +  // Remove empty or dead nodes.<br>

> +  Nodes.erase(std::remove_if(<wbr>Nodes.begin(), Nodes.end(),<br>

> +                             [](const Node &N) {<br>

> +                               return N.Size == 0 || N.Sections.empty();<br>

> +                             }),<br>

> +              Nodes.end());<br>

> +<br>

> +  // Sort by density. Invalidates all NodeIndex values.<br>

> +  std::sort(Nodes.begin(), Nodes.end(), [](const Node &A, const Node &B) {<br>

> +    return (APFloat(APFloat::IEEEdouble()<wbr>, A.Weight) /<br>

> +            APFloat(APFloat::IEEEdouble(), A.Size))<br>

> +               .compare(APFloat(APFloat::<wbr>IEEEdouble(), B.Weight) /<br>

> +                        APFloat(APFloat::IEEEdouble(), B.Size)) ==<br>

> +           APFloat::cmpLessThan;<br>

> +  });<br>

> +}<br>

> +<br>

> +DenseMap<const InputSectionBase *, int> CallGraphSort::run() {<br>

> +  generateClusters();<br>

> +<br>

> +  // Generate order.<br>

</div></div>> +  llvm::DenseMap<const InputSectionBase *, int> OrderMap;<br>

<span class="gmail-">> +  ssize_t CurOrder = 1;<br>

> +<br>

> +  for (const Node &N : Nodes)<br>

> +    for (const InputSectionBase *IS : N.Sections)<br>

> +      OrderMap[IS] = CurOrder++;<br>

> +<br>

> +  return OrderMap;<br>

> +}<br>

> +<br>

> +// Sort sections by the profile data provided by --call-graph-ordering-file.<br>

> +//<br>

> +// This first builds a call graph based on the profile data then iteratively<br>

> +// merges the hottest call edges as long as it would not create a cluster larger<br>

> +// than the page size. All clusters are then sorted by a density metric to<br>

> +// further improve locality.<br>

> +DenseMap<const InputSectionBase *, int> elf::<wbr>computeCallGraphProfileOrder() {<br>

> +  return CallGraphSort().run();<br>

> +}<br>
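<br>
The clustering described in the CallGraphSort.cpp header comment can be sketched in a much smaller standalone form. This is a simplification and not the patch's implementation: it visits the original edges once from hottest to coldest and does not re-accumulate weights of parallel edges after a merge, it uses plain addition instead of SaturatingAdd, and all type and function names are hypothetical. It also sorts clusters by decreasing density, which is an assumption about the intended direction:<br>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical types for illustration; not the ones used in the patch.
struct SketchEdge {
  std::size_t From;     // index of the caller's section
  std::size_t To;       // index of the callee's section
  std::uint64_t Weight; // call count from the profile
};

struct SketchCluster {
  std::vector<std::size_t> Sections; // section indices in layout order
  std::uint64_t Size = 0;
  std::uint64_t Weight = 0;
};

// Returns section indices in their final layout order.
std::vector<std::size_t>
sketchCallChainOrder(const std::vector<std::uint64_t> &SectionSizes,
                     std::vector<SketchEdge> Edges, std::uint64_t PageSize) {
  const std::size_t N = SectionSizes.size();
  std::vector<SketchCluster> Clusters(N);
  std::vector<std::size_t> ClusterOf(N);
  for (std::size_t I = 0; I < N; ++I) {
    Clusters[I].Sections.push_back(I);
    Clusters[I].Size = SectionSizes[I];
    ClusterOf[I] = I;
  }
  // A cluster's initial weight is the sum of its incoming edge weights.
  for (const SketchEdge &E : Edges) {
    assert(E.From < N && E.To < N && "edge refers to unknown section");
    Clusters[E.To].Weight += E.Weight;
  }

  // Visit edges from hottest to coldest.
  std::stable_sort(Edges.begin(), Edges.end(),
                   [](const SketchEdge &A, const SketchEdge &B) {
                     return A.Weight > B.Weight;
                   });
  for (const SketchEdge &E : Edges) {
    const std::size_t U = ClusterOf[E.From];
    const std::size_t V = ClusterOf[E.To];
    if (U == V || Clusters[U].Size + Clusters[V].Size > PageSize)
      continue;
    // Append the callee's cluster to the caller's cluster.
    for (std::size_t S : Clusters[V].Sections) {
      ClusterOf[S] = U;
      Clusters[U].Sections.push_back(S);
    }
    Clusters[U].Size += Clusters[V].Size;
    Clusters[U].Weight += Clusters[V].Weight;
    Clusters[V] = SketchCluster();
  }

  // Drop empty or zero-sized clusters, then order the rest by decreasing
  // density (Weight / Size), compared by cross-multiplication.
  std::vector<SketchCluster> Live;
  for (const SketchCluster &C : Clusters)
    if (!C.Sections.empty() && C.Size != 0)
      Live.push_back(C);
  std::stable_sort(Live.begin(), Live.end(),
                   [](const SketchCluster &A, const SketchCluster &B) {
                     return double(A.Weight) * double(B.Size) >
                            double(B.Weight) * double(A.Size);
                   });

  std::vector<std::size_t> Order;
  for (const SketchCluster &C : Live)
    Order.insert(Order.end(), C.Sections.begin(), C.Sections.end());
  return Order;
}
```

With four one-byte sections A, B, C, D (indices 0 to 3) and the edges from the test's call graph file (A B 100, A C 40, B C 30, C D 90), the sketch merges everything into a single cluster laid out as A, B, C, D, the same order the CHECK lines in cgprofile-txt.s expect for those symbols.<br>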

</span>> Index: ELF/CMakeLists.txt<br>

> ==============================<wbr>==============================<wbr>=======<br>

> --- ELF/CMakeLists.txt<br>

> +++ ELF/CMakeLists.txt<br>

> @@ -19,6 +19,7 @@<br>

<div class="gmail-HOEnZb"><div class="gmail-h5">>    Arch/SPARCV9.cpp<br>

>    Arch/X86.cpp<br>

>    Arch/X86_64.cpp<br>

> +  CallGraphSort.cpp<br>

>    Driver.cpp<br>

>    DriverUtils.cpp<br>

>    EhFrame.cpp<br>

</div></div></blockquote></div><br></div></div>