[PATCH] D36351: [lld][ELF] Add profile guided section layout

Michael Spencer via llvm-commits llvm-commits at lists.llvm.org
Wed Nov 1 19:07:29 PDT 2017


On Thu, Oct 26, 2017 at 7:50 PM, Michael Spencer <bigcheesegs at gmail.com>
wrote:

> On Tue, Oct 24, 2017 at 9:30 PM, Rui Ueyama <ruiu at google.com> wrote:
>
>> Unfortunately, even with --full-shutdown, the result was the same.
>>
>>
> I tested this out and it worked for me with --full-shutdown while linking
> a trivial object file. One thing I forgot to mention is that you need to
> also compile the compiler-rt profile runtime at the same version as clang
> and llvm. It's probably falling back to an incompatible system version.
>
> - Michael Spencer
>

I've attached an updated patch based on r317141 that also includes fixes
for most review comments.

- Michael Spencer


>
>
>> On Tue, Oct 24, 2017 at 9:25 PM, Rui Ueyama <ruiu at google.com> wrote:
>>
>>> On Tue, Oct 24, 2017 at 9:23 PM, Michael Spencer <bigcheesegs at gmail.com>
>>> wrote:
>>>
>>>> On Tue, Oct 24, 2017 at 8:58 PM, Rui Ueyama <ruiu at google.com> wrote:
>>>>
>>>>> Sorry, as you pointed out, I made a mistake when I applied the
>>>>> patches. I have now managed to build it, but the output file of
>>>>> llvm-profdata seems too small. Is this correct?
>>>>>
>>>>> This is what I did.
>>>>>
>>>>> $ CC=`which clang` CXX=`which clang++` /ssd/cmake/bin/cmake -GNinja
>>>>> -DCMAKE_BUILD_TYPE=RelWithDebInfo -DLLVM_ENABLE_PROJECTS='clang;lld'
>>>>> -DCMAKE_C{,XX}_FLAGS=-fprofile-instr-generate ../llvm-project/llvm/
>>>>>
>>>>> $ ninja clang lld
>>>>>
>>>>> $ bin/ld.lld <options-to-link-clang>
>>>>>
>>>>> $ bin/llvm-profdata merge default.profraw -o default.profdata
>>>>>
>>>>> $ ls -l default.*
>>>>> -rw-r----- 1 ruiu eng     560 Oct 24 20:55 default.profdata
>>>>> -rw-r----- 1 ruiu eng 2151240 Oct 24 20:55 default.profraw
>>>>>
>>>>> $ bin/llvm-profdata show default.profraw
>>>>> error: default.profraw: Empty raw profile file
>>>>>
>>>>
>>>> It could be that lld is using quick exit and not properly flushing the
>>>> profile data to disk. I'll test out this configuration.
>>>>
>>>
>>> Ah, that's likely. I'll try again with the `--full-shutdown` option.
>>>
>>>
>>>> - Michael Spencer
>>>>
>>>>
>>>>>
>>>>> On Mon, Oct 23, 2017 at 7:52 PM, Michael Spencer <
>>>>> bigcheesegs at gmail.com> wrote:
>>>>>
>>>>>> On Mon, Oct 23, 2017 at 6:53 PM, Rui Ueyama <ruiu at google.com> wrote:
>>>>>>
>>>>>>> It didn't pass cmake because lld/ELF/CMakeLists.txt lacked
>>>>>>> "CallGraphSort.cpp", so I added that to the CMakeLists.txt.
>>>>>>>
>>>>>>> It still didn't compile because `Config::CallGraphProfile` doesn't
>>>>>>> exist, so I added that boolean variable.
>>>>>>>
>>>>>>> /ssd/llvm-project/lld/ELF/CallGraphSort.cpp:274:29: error: no
>>>>>>> member named 'CallGraphProfile' in 'lld::elf::Configuration'
>>>>>>>   CallGraphSort CGS(Config->CallGraphProfile);
>>>>>>>                       ~~~~~~  ^
>>>>>>>
>>>>>>> Now, I'm seeing the following errors. How can I fix them?
>>>>>>>
>>>>>>> /ssd/llvm-project/lld/ELF/CallGraphSort.cpp:274:17: error: no
>>>>>>> matching constructor for initialization of '(anonymous
>>>>>>> namespace)::CallGraphSort'
>>>>>>>   CallGraphSort CGS(Config->CallGraphProfile);
>>>>>>>                 ^   ~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>>> /ssd/llvm-project/lld/ELF/CallGraphSort.cpp:43:7: note: candidate
>>>>>>> constructor (the implicit copy constructor) not viable: no known conversion
>>>>>>> from 'bool' to 'const (anonymous namespace)::CallGraphSort' for 1st argument
>>>>>>> class CallGraphSort {
>>>>>>>       ^
>>>>>>> /ssd/llvm-project/lld/ELF/CallGraphSort.cpp:43:7: note: candidate
>>>>>>> constructor (the implicit move constructor) not viable: no known conversion
>>>>>>> from 'bool' to '(anonymous namespace)::CallGraphSort' for 1st argument
>>>>>>> class CallGraphSort {
>>>>>>>       ^
>>>>>>> /ssd/llvm-project/lld/ELF/CallGraphSort.cpp:115:16: note: candidate
>>>>>>> constructor not viable: no known conversion from 'bool' to
>>>>>>> 'DenseMap<std::pair<const SymbolBody *, const SymbolBody *>, uint64_t> &'
>>>>>>> (aka 'DenseMap<pair<const lld::elf::SymbolBody *, const
>>>>>>> lld::elf::SymbolBody *>, unsigned long> &') for 1st argument
>>>>>>> CallGraphSort::CallGraphSort(
>>>>>>>                ^
>>>>>>>
>>>>>>>
>>>>>> Are you sure you applied the patch correctly? I just checked the diff
>>>>>> and it has all that.
>>>>>>
>>>>>> diff --git a/ELF/CMakeLists.txt b/ELF/CMakeLists.txt
>>>>>> index 205702975..3b3c388d2 100644
>>>>>> --- a/ELF/CMakeLists.txt
>>>>>> +++ b/ELF/CMakeLists.txt
>>>>>> @@ -18,6 +18,7 @@ add_lld_library(lldELF
>>>>>>    Arch/SPARCV9.cpp
>>>>>>    Arch/X86.cpp
>>>>>>    Arch/X86_64.cpp
>>>>>> +  CallGraphSort.cpp
>>>>>>    Driver.cpp
>>>>>>    DriverUtils.cpp
>>>>>>    EhFrame.cpp
>>>>>>
>>>>>> It also has the change that adds CallGraphProfile to Config.h.
>>>>>>
>>>>>> - Michael Spencer
>>>>>>
>>>>>>
>>>>>>> On Mon, Oct 23, 2017 at 5:02 PM, Michael Spencer <
>>>>>>> bigcheesegs at gmail.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> - Michael Spencer
>>>>>>>>
>>>>>>>> On Tue, Oct 3, 2017 at 6:43 PM, Rui Ueyama via Phabricator <
>>>>>>>> reviews at reviews.llvm.org> wrote:
>>>>>>>>
>>>>>>>>> ruiu added a comment.
>>>>>>>>>
>>>>>>>>> Could you send me a patch to produce a call graph file so that I
>>>>>>>>> can try this patch on my machine?
>>>>>>>>>
>>>>>>>>
>>>>>>>> Sorry for the delay. I've attached the llvm and lld patches that
>>>>>>>> implement a full testable version of the feature.
>>>>>>>>
>>>>>>>> To use:
>>>>>>>>
>>>>>>>> -1) Have an ELF system with working clang instrumentation-based
>>>>>>>> profiling
>>>>>>>> 0) Compile clang and lld with the supplied patch
>>>>>>>> 1) Compile the code with `-fprofile-instr-generate` and link with
>>>>>>>> any linker
>>>>>>>> 2) Run the program on a representative sample
>>>>>>>> 3) `$ llvm-profdata merge default.profraw -o default.profdata`
>>>>>>>> 4) Compile the code again with `-fprofile-instr-use=default.profdata
>>>>>>>> -ffunction-sections -fuse-ld=lld`
>>>>>>>>
>>>>>>>> The output of #4 is the program with sections ordered by profile
>>>>>>>> data. You can add -Wl,-no-call-graph-profile-sort to disable
>>>>>>>> sorting to measure the difference.
>>>>>>>>
>>>>>>>> `llvm-readobj -elf-cg-profile` will dump the cg profile section.
>>>>>>>>
>>>>>>>> This also works with LTO.
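For reference, the patch attached to this message defines `Elf_CGProfile_Impl` with the fields `cgp_from`, `cgp_to` and `cgp_weight`, and the object writer emits each entry as two 32-bit symbol indices followed by a 64-bit count. A minimal sketch of decoding such entries, assuming the data is in host byte order; `CGProfileRecord` and `decodeCGProfile` are hypothetical names, not part of the patch:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Mirrors the fields of Elf_CGProfile_Impl in the patch: two symbol-table
// indices and a weight. The struct name is illustrative only.
struct CGProfileRecord {
  uint32_t From;   // cgp_from: symbol index of the caller
  uint32_t To;     // cgp_to: symbol index of the callee
  uint64_t Weight; // cgp_weight: call count
};
static_assert(sizeof(CGProfileRecord) == 16, "expected a 16-byte record");

// Decodes a raw section body into records. Assumes host byte order; trailing
// bytes that do not form a whole record are ignored.
std::vector<CGProfileRecord> decodeCGProfile(const unsigned char *Data,
                                             size_t Size) {
  std::vector<CGProfileRecord> Out;
  for (size_t Off = 0; Off + sizeof(CGProfileRecord) <= Size;
       Off += sizeof(CGProfileRecord)) {
    CGProfileRecord R;
    std::memcpy(&R, Data + Off, sizeof(R)); // avoids unaligned reads
    Out.push_back(R);
  }
  return Out;
}
```

A real reader would also have to handle the object file's endianness and map the indices back to names through the symbol table, as `getStaticSymbolName` does in the llvm-readobj part of the patch.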
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ================
>>>>>>>>> Comment at: ELF/CallGraphSort.cpp:99-101
>>>>>>>>> +  if (To != Other.To)
>>>>>>>>> +    return To < Other.To;
>>>>>>>>> +  return false;
>>>>>>>>> ----------------
>>>>>>>>> You can just return `To < Other.To`.
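The rest of the comparison is not shown in the quoted hunk. Assuming it orders by `From` first and then by `To`, the suggested simplification could look like the sketch below; the `Edge` struct here is a stand-in, not the patch's type.

```cpp
#include <cassert>
#include <cstdint>
#include <tuple>

// Stand-in for the patch's edge type; only the two ordering keys are modeled.
struct Edge {
  uint32_t From;
  uint32_t To;

  // Assumed original shape: the last `if` plus `return false` is redundant.
  bool lessVerbose(const Edge &Other) const {
    if (From != Other.From)
      return From < Other.From;
    if (To != Other.To)
      return To < Other.To;
    return false;
  }

  // Simplified: when To == Other.To, `To < Other.To` is already false.
  bool operator<(const Edge &Other) const {
    if (From != Other.From)
      return From < Other.From;
    return To < Other.To;
  }
};

// Equivalent lexicographic comparison written with std::tie.
inline bool lessTie(const Edge &A, const Edge &B) {
  return std::tie(A.From, A.To) < std::tie(B.From, B.To);
}
```

All three forms give the same ordering; the `std::tie` version scales better if more keys are added later.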
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ================
>>>>>>>>> Comment at: ELF/CallGraphSort.cpp:127
>>>>>>>>> +  // Create the graph.
>>>>>>>>> +  for (const auto &C : Profile) {
>>>>>>>>> +    if (C.second == 0)
>>>>>>>>> ----------------
>>>>>>>>> This loop is too dense. I can't follow it without reading each
>>>>>>>>> line carefully, because I don't have the whole picture. Please
>>>>>>>>> insert a blank line between code blocks. More comments would
>>>>>>>>> also help.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ================
>>>>>>>>> Comment at: ELF/CallGraphSort.cpp:128
>>>>>>>>> +  for (const auto &C : Profile) {
>>>>>>>>> +    if (C.second == 0)
>>>>>>>>> +      continue;
>>>>>>>>> ----------------
>>>>>>>>> Please define local variables for `C.first.first`,
>>>>>>>>> `C.first.second` and `C.second` so that they are accessed through
>>>>>>>>> meaningful names.
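A small sketch of what named locals for the nested pair members might look like. The `Profile` alias and `outgoingWeight` are hypothetical stand-ins built on standard containers; the patch's map uses different key and container types.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Stand-in for the profile map: (caller, callee) -> call count.
using Profile = std::map<std::pair<std::string, std::string>, uint64_t>;

// Sums the weights of all non-zero, non-self edges that leave `Caller`.
uint64_t outgoingWeight(const Profile &P, const std::string &Caller) {
  uint64_t Total = 0;
  for (const auto &C : P) {
    // Bind the nested pair members to names once, up front.
    const std::string &From = C.first.first;
    const std::string &To = C.first.second;
    uint64_t Weight = C.second;

    if (Weight == 0)
      continue;

    if (From == Caller && To != Caller)
      Total += Weight;
  }
  return Total;
}
```

After the three bindings, the loop body reads in terms of `From`, `To` and `Weight` rather than `C.first.first` and friends.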
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ================
>>>>>>>>> Comment at: ELF/CallGraphSort.cpp:130-135
>>>>>>>>> +    auto FromDR = dyn_cast_or_null<DefinedRegular>(Symtab->find(C.first.first));
>>>>>>>>> +    auto ToDR = dyn_cast_or_null<DefinedRegular>(Symtab->find(C.first.second));
>>>>>>>>> +    if (!FromDR || !ToDR)
>>>>>>>>> +      continue;
>>>>>>>>> +    auto FromSB = dyn_cast_or_null<const InputSectionBase>(FromDR->Section);
>>>>>>>>> +    auto ToSB = dyn_cast_or_null<const InputSectionBase>(ToDR->Section);
>>>>>>>>> ----------------
>>>>>>>>> auto -> auto *
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ================
>>>>>>>>> Comment at: ELF/CallGraphSort.cpp:147
>>>>>>>>> +      Nodes[To].IncidentEdges.push_back(EI);
>>>>>>>>> +    } else
>>>>>>>>> +      Edges[EI].Weight = SaturatingAdd(Edges[EI].Weight, C.second);
>>>>>>>>> ----------------
>>>>>>>>> nit: add {}
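The hunk above merges the weight of a duplicate edge with `SaturatingAdd`, so a very hot edge clamps at the maximum instead of wrapping around to a small value. A standalone sketch of the idea for `uint64_t`; this is not LLVM's implementation, and `saturatingAdd` is a hypothetical name:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Adds two unsigned 64-bit values, clamping at the maximum on overflow.
inline uint64_t saturatingAdd(uint64_t A, uint64_t B) {
  uint64_t Sum = A + B; // unsigned overflow wraps, which is well defined
  if (Sum < A)          // a wrapped sum is smaller than either operand
    return std::numeric_limits<uint64_t>::max();
  return Sum;
}
```

Clamping keeps the relative order of hot edges sensible even when counts are extremely large.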
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ================
>>>>>>>>> Comment at: ELF/CallGraphSort.cpp:153
>>>>>>>>> +
>>>>>>>>> +void CallGraphSort::contractEdge(EdgeIndex CEI) {
>>>>>>>>> +  // Make a copy of the edge as the original will be marked killed while being
>>>>>>>>> ----------------
>>>>>>>>> Please add a function comment explaining what this function is
>>>>>>>>> intended to do. I can't follow it because I don't have the whole
>>>>>>>>> picture.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ================
>>>>>>>>> Comment at: ELF/CallGraphSort.cpp:158-163
>>>>>>>>> +  // Remove the self edge from From.
>>>>>>>>> +  FE.erase(std::remove(FE.begin(), FE.end(), CEI));
>>>>>>>>> +  std::vector<EdgeIndex> &TE = Nodes[CE.To].IncidentEdges;
>>>>>>>>> +  // Update all edges incident with To to reference From instead. Then if they
>>>>>>>>> +  // aren't self edges add them to From.
>>>>>>>>> +  for (EdgeIndex EI : TE) {
>>>>>>>>> ----------------
>>>>>>>>> Add blank lines before comments.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ================
>>>>>>>>> Comment at: ELF/CallGraphSort.cpp:165-166
>>>>>>>>> +    Edge &E = Edges[EI];
>>>>>>>>> +    // E.From = E.From == CE.To ? CE.From : E.From;
>>>>>>>>> +    // E.To = E.To == CE.To ? CE.From : E.To;
>>>>>>>>> +    if (E.From == CE.To)
>>>>>>>>> ----------------
>>>>>>>>> Please remove debug code.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ================
>>>>>>>>> Comment at: ELF/CallGraphSort.cpp:178-180
>>>>>>>>> +  // Free memory.
>>>>>>>>> +  std::vector<EdgeIndex>().swap(TE);
>>>>>>>>> +
>>>>>>>>> ----------------
>>>>>>>>> This looks odd. Why do you need to do this? I think you can just
>>>>>>>>> leave it alone.
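For context, swapping with an empty temporary is a common idiom for releasing a vector's heap buffer, since `clear()` alone leaves the capacity unchanged. A minimal sketch; `releaseStorage` is a hypothetical helper name, not something in the patch:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Swaps V with an empty temporary. The temporary takes ownership of V's old
// buffer and frees it when it is destroyed at the end of the statement.
template <typename T> void releaseStorage(std::vector<T> &V) {
  std::vector<T>().swap(V);
}
```

Whether the early release is worth the extra line depends on how large the per-node edge lists get, which is the question the review raises.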
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ================
>>>>>>>>> Comment at: ELF/CallGraphSort.cpp:207-208
>>>>>>>>> +
>>>>>>>>> +// Group InputSections into clusters using the Call-Chain Clustering heuristic
>>>>>>>>> +// then sort the clusters by density.
>>>>>>>>> +void CallGraphSort::generateClusters() {
>>>>>>>>> ----------------
>>>>>>>>> This may be clear to those who have read the paper, but I don't
>>>>>>>>> think that is enough. Please add more comments explaining what
>>>>>>>>> clusters are, what density is, and what the heuristic is.
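As a rough illustration of the last step named in the quoted comment, sorting clusters by density, here is a sketch that assumes density means call weight divided by size in bytes. That definition is an assumption; the quoted hunk does not spell it out. The `Cluster` struct and `sortByDensity` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical cluster summary: total call weight and total size in bytes.
struct Cluster {
  uint64_t Weight;
  uint64_t Size;

  // Assumed definition: weight per byte; empty clusters get density 0.
  double density() const {
    return Size == 0 ? 0.0 : double(Weight) / double(Size);
  }
};

// Orders clusters from densest to least dense, keeping ties in input order.
void sortByDensity(std::vector<Cluster> &Clusters) {
  std::stable_sort(Clusters.begin(), Clusters.end(),
                   [](const Cluster &A, const Cluster &B) {
                     return A.density() > B.density();
                   });
}
```

Under this assumption, small and frequently called clusters end up first, so the hottest code is packed into the fewest pages.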
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ================
>>>>>>>>> Comment at: ELF/CallGraphSort.cpp:272-273
>>>>>>>>> +DenseMap<const InputSectionBase *, int> elf::computeCallGraphProfileOrder() {
>>>>>>>>> +  CallGraphSort CGS(Config->CallGraphProfile);
>>>>>>>>> +  return CGS.run();
>>>>>>>>> +}
>>>>>>>>> ----------------
>>>>>>>>> You can do this in one line without defining a local variable.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ================
>>>>>>>>> Comment at: ELF/Writer.cpp:926-928
>>>>>>>>> +    for (BaseCommand *Base : Script->Opt.Commands)
>>>>>>>>> +      if (auto *OS = dyn_cast<OutputSection>(Base))
>>>>>>>>> +        if (OS->Name == ".text") {
>>>>>>>>> ----------------
>>>>>>>>> I'd factor this code out as
>>>>>>>>> `OutputSection *findOutputSection(StringRef Name)`.
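A sketch of what such a helper might look like, using stand-in types so that it compiles on its own. The `BaseCommand`, `OutputSection` and `OtherCommand` structs below are simplified placeholders rather than lld's classes, `dynamic_cast` stands in for LLVM's `dyn_cast`, and the command list is passed explicitly instead of being read from `Script->Opt.Commands`.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Stand-ins for lld's types, reduced to what the lookup needs.
struct BaseCommand {
  virtual ~BaseCommand() = default;
};
struct OutputSection : BaseCommand {
  std::string Name;
  explicit OutputSection(std::string N) : Name(std::move(N)) {}
};
struct OtherCommand : BaseCommand {};

// Returns the first output section with the given name, or nullptr if no
// command in the list is an output section with that name.
OutputSection *findOutputSection(const std::vector<BaseCommand *> &Commands,
                                 const std::string &Name) {
  for (BaseCommand *Base : Commands)
    if (auto *OS = dynamic_cast<OutputSection *>(Base))
      if (OS->Name == Name)
        return OS;
  return nullptr;
}
```

With a helper like this, the caller in Writer.cpp would reduce to a single lookup for ".text" followed by a null check.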
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> https://reviews.llvm.org/D36351
-------------- next part --------------
diff --git a/include/llvm/InitializePasses.h b/include/llvm/InitializePasses.h
index c3ad8fe41af..cde58e78387 100644
--- a/include/llvm/InitializePasses.h
+++ b/include/llvm/InitializePasses.h
@@ -83,6 +83,7 @@ void initializeBreakCriticalEdgesPass(PassRegistry&);
 void initializeCFGOnlyPrinterLegacyPassPass(PassRegistry&);
 void initializeCFGOnlyViewerLegacyPassPass(PassRegistry&);
 void initializeCFGPrinterLegacyPassPass(PassRegistry&);
+void initializeCFGProfilePassPass(PassRegistry&);
 void initializeCFGSimplifyPassPass(PassRegistry&);
 void initializeCFGViewerLegacyPassPass(PassRegistry&);
 void initializeCFLAndersAAWrapperPassPass(PassRegistry&);
diff --git a/include/llvm/LinkAllPasses.h b/include/llvm/LinkAllPasses.h
index 765e63926da..aeba78be086 100644
--- a/include/llvm/LinkAllPasses.h
+++ b/include/llvm/LinkAllPasses.h
@@ -75,6 +75,7 @@ namespace {
       (void) llvm::createCallGraphDOTPrinterPass();
       (void) llvm::createCallGraphViewerPass();
       (void) llvm::createCFGSimplificationPass();
+      (void) llvm::createCFGProfilePass();
       (void) llvm::createCFLAndersAAWrapperPass();
       (void) llvm::createCFLSteensAAWrapperPass();
       (void) llvm::createStructurizeCFGPass();
diff --git a/include/llvm/MC/MCAssembler.h b/include/llvm/MC/MCAssembler.h
index 1ce6b09355d..30912377da6 100644
--- a/include/llvm/MC/MCAssembler.h
+++ b/include/llvm/MC/MCAssembler.h
@@ -393,6 +393,13 @@ public:
   const MCLOHContainer &getLOHContainer() const {
     return const_cast<MCAssembler *>(this)->getLOHContainer();
   }
+
+  struct CGProfileEntry {
+    const MCSymbol *From;
+    const MCSymbol *To;
+    uint64_t Count;
+  };
+  std::vector<CGProfileEntry> CGProfile;
   /// @}
   /// \name Backend Data Access
   /// @{
diff --git a/include/llvm/MC/MCELFStreamer.h b/include/llvm/MC/MCELFStreamer.h
index c5b66a163c8..3402980c13b 100644
--- a/include/llvm/MC/MCELFStreamer.h
+++ b/include/llvm/MC/MCELFStreamer.h
@@ -66,6 +66,9 @@ public:
 
   void EmitValueToAlignment(unsigned, int64_t, unsigned, unsigned) override;
 
+  void emitCGProfileEntry(const MCSymbol *From, const MCSymbol *To,
+                          uint64_t Count) override;
+
   void FinishImpl() override;
 
   void EmitBundleAlignMode(unsigned AlignPow2) override;
diff --git a/include/llvm/MC/MCStreamer.h b/include/llvm/MC/MCStreamer.h
index 58003d7d596..ec86dc39741 100644
--- a/include/llvm/MC/MCStreamer.h
+++ b/include/llvm/MC/MCStreamer.h
@@ -848,6 +848,9 @@ public:
                                 SMLoc Loc = SMLoc());
   virtual void EmitWinEHHandlerData(SMLoc Loc = SMLoc());
 
+  virtual void emitCGProfileEntry(const MCSymbol *From, const MCSymbol *To,
+                                  uint64_t Count);
+
   /// Get the .pdata section used for the given section. Typically the given
   /// section is either the main .text section or some other COMDAT .text
   /// section, but it may be any section containing code.
diff --git a/include/llvm/Object/ELFTypes.h b/include/llvm/Object/ELFTypes.h
index 83b688548fd..905916e910c 100644
--- a/include/llvm/Object/ELFTypes.h
+++ b/include/llvm/Object/ELFTypes.h
@@ -40,6 +40,7 @@ template <class ELFT> struct Elf_Versym_Impl;
 template <class ELFT> struct Elf_Hash_Impl;
 template <class ELFT> struct Elf_GnuHash_Impl;
 template <class ELFT> struct Elf_Chdr_Impl;
+template <class ELFT> struct Elf_CGProfile_Impl;
 
 template <endianness E, bool Is64> struct ELFType {
 private:
@@ -66,6 +67,7 @@ public:
   using Hash = Elf_Hash_Impl<ELFType<E, Is64>>;
   using GnuHash = Elf_GnuHash_Impl<ELFType<E, Is64>>;
   using Chdr = Elf_Chdr_Impl<ELFType<E, Is64>>;
+  using CGProfile = Elf_CGProfile_Impl<ELFType<E, Is64>>;
   using DynRange = ArrayRef<Dyn>;
   using ShdrRange = ArrayRef<Shdr>;
   using SymRange = ArrayRef<Sym>;
@@ -590,6 +592,14 @@ struct Elf_Chdr_Impl<ELFType<TargetEndianness, true>> {
   Elf_Xword ch_addralign;
 };
 
+template <class ELFT>
+struct Elf_CGProfile_Impl {
+  LLVM_ELF_IMPORT_TYPES_ELFT(ELFT)
+  Elf_Word cgp_from;
+  Elf_Word cgp_to;
+  Elf_Xword cgp_weight;
+};
+
 // MIPS .reginfo section
 template <class ELFT>
 struct Elf_Mips_RegInfo;
diff --git a/include/llvm/Transforms/Instrumentation.h b/include/llvm/Transforms/Instrumentation.h
index fe458e7be06..51374b7cb5a 100644
--- a/include/llvm/Transforms/Instrumentation.h
+++ b/include/llvm/Transforms/Instrumentation.h
@@ -206,6 +206,8 @@ inline ModulePass *createDataFlowSanitizerPassForJIT(
 // checking on loads, stores, and other memory intrinsics.
 FunctionPass *createBoundsCheckingPass();
 
+ModulePass *createCFGProfilePass();
+
 /// \brief Calculate what to divide by to scale counts.
 ///
 /// Given the maximum count, calculate a divisor that will scale all the
diff --git a/lib/CodeGen/TargetLoweringObjectFileImpl.cpp b/lib/CodeGen/TargetLoweringObjectFileImpl.cpp
index e45cdee4368..130b62cb3cd 100644
--- a/lib/CodeGen/TargetLoweringObjectFileImpl.cpp
+++ b/lib/CodeGen/TargetLoweringObjectFileImpl.cpp
@@ -97,16 +97,60 @@ void TargetLoweringObjectFileELF::emitModuleMetadata(
   StringRef Section;
 
   GetObjCImageInfo(M, Version, Flags, Section);
-  if (Section.empty())
-    return;
+  if (!Section.empty()) {
+    auto &C = getContext();
+    auto *S = C.getELFSection(Section, ELF::SHT_PROGBITS, ELF::SHF_ALLOC);
+    Streamer.SwitchSection(S);
+    Streamer.EmitLabel(C.getOrCreateSymbol(StringRef("OBJC_IMAGE_INFO")));
+    Streamer.EmitIntValue(Version, 4);
+    Streamer.EmitIntValue(Flags, 4);
+    Streamer.AddBlankLine();
+  }
 
-  auto &C = getContext();
-  auto *S = C.getELFSection(Section, ELF::SHT_PROGBITS, ELF::SHF_ALLOC);
-  Streamer.SwitchSection(S);
-  Streamer.EmitLabel(C.getOrCreateSymbol(StringRef("OBJC_IMAGE_INFO")));
-  Streamer.EmitIntValue(Version, 4);
-  Streamer.EmitIntValue(Flags, 4);
-  Streamer.AddBlankLine();
+  SmallVector<Module::ModuleFlagEntry, 8> ModuleFlags;
+  M.getModuleFlagsMetadata(ModuleFlags);
+
+  MDNode *CFGProfile = nullptr;
+
+  for (const auto &MFE : ModuleFlags) {
+    StringRef Key = MFE.Key->getString();
+    if (Key == "CFG Profile") {
+      CFGProfile = cast<MDNode>(MFE.Val);
+      break;
+    }
+  }
+
+  if (!CFGProfile)
+    return;
+  /*MCSectionELF *Sec =
+      getContext().getELFSection(".note.llvm.callgraph", ELF::SHT_NOTE, 0);
+  Streamer.SwitchSection(Sec);
+  SmallString<256> Out;
+  for (const auto &Edge : CFGProfile->operands()) {
+    raw_svector_ostream O(Out);
+    MDNode *E = cast<MDNode>(Edge);
+    O << cast<MDString>(E->getOperand(0))->getString() << " "
+      << cast<MDString>(E->getOperand(1))->getString() << " "
+      << cast<ConstantAsMetadata>(E->getOperand(2))
+             ->getValue()
+             ->getUniqueInteger()
+             .getZExtValue()
+      << "\n";
+    Streamer.EmitBytes(O.str());
+    Out.clear();
+  }*/
+  for (const auto &Edge : CFGProfile->operands()) {
+    MDNode *E = cast<MDNode>(Edge);
+    const MCSymbol *From = Streamer.getContext().getOrCreateSymbol(
+        cast<MDString>(E->getOperand(0))->getString());
+    const MCSymbol *To = Streamer.getContext().getOrCreateSymbol(
+        cast<MDString>(E->getOperand(1))->getString());
+    uint64_t Count = cast<ConstantAsMetadata>(E->getOperand(2))
+                         ->getValue()
+                         ->getUniqueInteger()
+                         .getZExtValue();
+    Streamer.emitCGProfileEntry(From, To, Count);
+  }
 }
 
 MCSymbol *TargetLoweringObjectFileELF::getCFIPersonalitySymbol(
diff --git a/lib/MC/ELFObjectWriter.cpp b/lib/MC/ELFObjectWriter.cpp
index e11eaaa3060..b54fc1693aa 100644
--- a/lib/MC/ELFObjectWriter.cpp
+++ b/lib/MC/ELFObjectWriter.cpp
@@ -1299,6 +1299,13 @@ void ELFObjectWriter::writeObject(MCAssembler &Asm,
     }
   }
 
+  MCSectionELF *CGProfileSection = nullptr;
+  if (!Asm.CGProfile.empty()) {
+    CGProfileSection =
+      Ctx.getELFSection(".note.llvm.cgprofile", ELF::SHT_NOTE, 0, 16, "");
+    SectionIndexMap[CGProfileSection] = addToSectionTable(CGProfileSection);
+  }
+
   for (MCSectionELF *Group : Groups) {
     align(Group->getAlignment());
 
@@ -1333,6 +1340,17 @@ void ELFObjectWriter::writeObject(MCAssembler &Asm,
     SectionOffsets[RelSection] = std::make_pair(SecStart, SecEnd);
   }
 
+  if (CGProfileSection) {
+    uint64_t SecStart = getStream().tell();
+    for (const MCAssembler::CGProfileEntry &CGPE : Asm.CGProfile) {
+      write32(CGPE.From->getIndex());
+      write32(CGPE.To->getIndex());
+      write64(CGPE.Count);
+    }
+    uint64_t SecEnd = getStream().tell();
+    SectionOffsets[CGProfileSection] = std::make_pair(SecStart, SecEnd);
+  }
+
   {
     uint64_t SecStart = getStream().tell();
     const MCSectionELF *Sec = createStringTable(Ctx);
diff --git a/lib/MC/MCAsmStreamer.cpp b/lib/MC/MCAsmStreamer.cpp
index f48ae84950e..ba94f70a462 100644
--- a/lib/MC/MCAsmStreamer.cpp
+++ b/lib/MC/MCAsmStreamer.cpp
@@ -290,6 +290,9 @@ public:
                         SMLoc Loc) override;
   void EmitWinEHHandlerData(SMLoc Loc) override;
 
+  void emitCGProfileEntry(const MCSymbol *From, const MCSymbol *To,
+                          uint64_t Count) override;
+
   void EmitInstruction(const MCInst &Inst, const MCSubtargetInfo &STI,
                        bool PrintSchedInfo) override;
 
@@ -1548,6 +1551,16 @@ void MCAsmStreamer::EmitWinCFIEndProlog(SMLoc Loc) {
   EmitEOL();
 }
 
+void MCAsmStreamer::emitCGProfileEntry(const MCSymbol *From, const MCSymbol *To,
+                                       uint64_t Count) {
+  OS << "\t.cg_profile ";
+  From->print(OS, MAI);
+  OS << ", ";
+  To->print(OS, MAI);
+  OS << ", " << Count;
+  EmitEOL();
+}
+
 void MCAsmStreamer::AddEncodingComment(const MCInst &Inst,
                                        const MCSubtargetInfo &STI,
                                        bool PrintSchedInfo) {
diff --git a/lib/MC/MCELFStreamer.cpp b/lib/MC/MCELFStreamer.cpp
index 366125962a5..292e365b058 100644
--- a/lib/MC/MCELFStreamer.cpp
+++ b/lib/MC/MCELFStreamer.cpp
@@ -365,6 +365,11 @@ void MCELFStreamer::EmitValueToAlignment(unsigned ByteAlignment,
                                          ValueSize, MaxBytesToEmit);
 }
 
+void MCELFStreamer::emitCGProfileEntry(const MCSymbol *From, const MCSymbol *To,
+                                       uint64_t Count) {
+  getAssembler().CGProfile.push_back({From, To, Count});
+}
+
 void MCELFStreamer::EmitIdent(StringRef IdentString) {
   MCSection *Comment = getAssembler().getContext().getELFSection(
       ".comment", ELF::SHT_PROGBITS, ELF::SHF_MERGE | ELF::SHF_STRINGS, 1, "");
diff --git a/lib/MC/MCParser/ELFAsmParser.cpp b/lib/MC/MCParser/ELFAsmParser.cpp
index 38720c23ff2..3a62a49968a 100644
--- a/lib/MC/MCParser/ELFAsmParser.cpp
+++ b/lib/MC/MCParser/ELFAsmParser.cpp
@@ -85,6 +85,7 @@ public:
     addDirectiveHandler<
       &ELFAsmParser::ParseDirectiveSymbolAttribute>(".hidden");
     addDirectiveHandler<&ELFAsmParser::ParseDirectiveSubsection>(".subsection");
+    addDirectiveHandler<&ELFAsmParser::ParseDirectiveCGProfile>(".cg_profile");
   }
 
   // FIXME: Part of this logic is duplicated in the MCELFStreamer. What is
@@ -149,6 +150,7 @@ public:
   bool ParseDirectiveWeakref(StringRef, SMLoc);
   bool ParseDirectiveSymbolAttribute(StringRef, SMLoc);
   bool ParseDirectiveSubsection(StringRef, SMLoc);
+  bool ParseDirectiveCGProfile(StringRef, SMLoc);
 
 private:
   bool ParseSectionName(StringRef &SectionName);
@@ -838,6 +840,40 @@ bool ELFAsmParser::ParseDirectiveSubsection(StringRef, SMLoc) {
   return false;
 }
 
+/// ParseDirectiveCGProfile
+///  ::= .cg_profile identifier, identifier, <number>
+bool ELFAsmParser::ParseDirectiveCGProfile(StringRef, SMLoc) {
+  StringRef From;
+  if (getParser().parseIdentifier(From))
+    return TokError("expected identifier in directive");
+
+  if (getLexer().isNot(AsmToken::Comma))
+    return TokError("expected a comma");
+  Lex();
+
+  StringRef To;
+  if (getParser().parseIdentifier(To))
+    return TokError("expected identifier in directive");
+
+  if (getLexer().isNot(AsmToken::Comma))
+    return TokError("expected a comma");
+  Lex();
+
+  int64_t Count;
+  if (getParser().parseIntToken(
+          Count, "expected integer count in '.cg_profile' directive"))
+    return true;
+
+  if (getLexer().isNot(AsmToken::EndOfStatement))
+    return TokError("unexpected token in directive");
+
+  MCSymbol *FromSym = getContext().getOrCreateSymbol(From);
+  MCSymbol *ToSym = getContext().getOrCreateSymbol(To);
+
+  getStreamer().emitCGProfileEntry(FromSym, ToSym, Count);
+  return false;
+}
+
 namespace llvm {
 
 MCAsmParserExtension *createELFAsmParser() {
diff --git a/lib/MC/MCStreamer.cpp b/lib/MC/MCStreamer.cpp
index 4067df0eaf5..e6a00fb46e4 100644
--- a/lib/MC/MCStreamer.cpp
+++ b/lib/MC/MCStreamer.cpp
@@ -639,6 +639,10 @@ void MCStreamer::EmitWinEHHandlerData(SMLoc Loc) {
     getContext().reportError(Loc, "Chained unwind areas can't have handlers!");
 }
 
+void MCStreamer::emitCGProfileEntry(const MCSymbol *From, const MCSymbol *To,
+                                    uint64_t Count) {
+}
+
 static MCSection *getWinCFISection(MCContext &Context, unsigned *NextWinCFIID,
                                    MCSection *MainCFISec,
                                    const MCSection *TextSec) {
diff --git a/lib/Transforms/IPO/PassManagerBuilder.cpp b/lib/Transforms/IPO/PassManagerBuilder.cpp
index 828eb5eee29..a6b4523485c 100644
--- a/lib/Transforms/IPO/PassManagerBuilder.cpp
+++ b/lib/Transforms/IPO/PassManagerBuilder.cpp
@@ -665,6 +665,8 @@ void PassManagerBuilder::populateModulePassManager(
     MPM.add(createConstantMergePass());     // Merge dup global constants
   }
 
+  MPM.add(createCFGProfilePass());
+
   if (MergeFunctions)
     MPM.add(createMergeFunctionsPass());
 
diff --git a/lib/Transforms/Instrumentation/CFGProfile.cpp b/lib/Transforms/Instrumentation/CFGProfile.cpp
new file mode 100644
index 00000000000..6aa76d35a24
--- /dev/null
+++ b/lib/Transforms/Instrumentation/CFGProfile.cpp
@@ -0,0 +1,103 @@
+//===-- CFGProfile.cpp ----------------------------------------------------===//
+//
+//                      The LLVM Compiler Infrastructure
+//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/Analysis/BlockFrequencyInfo.h"
+#include "llvm/Analysis/BranchProbabilityInfo.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/Instructions.h"
+#include "llvm/IR/MDBuilder.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/Transforms/Instrumentation.h"
+
+#include <array>
+
+using namespace llvm;
+
+class CFGProfilePass : public ModulePass {
+public:
+  static char ID;
+
+  CFGProfilePass() : ModulePass(ID) {
+    initializeCFGProfilePassPass(
+      *PassRegistry::getPassRegistry());
+  }
+
+  StringRef getPassName() const override { return "CFGProfilePass"; }
+
+private:
+  bool runOnModule(Module &M) override;
+
+  void getAnalysisUsage(AnalysisUsage &AU) const override {
+    AU.addRequired<BlockFrequencyInfoWrapperPass>();
+    AU.addRequired<BranchProbabilityInfoWrapperPass>();
+  }
+};
+
+bool CFGProfilePass::runOnModule(Module &M) {
+  if (skipModule(M))
+    return false;
+
+  llvm::DenseMap<std::pair<StringRef, StringRef>, uint64_t> Counts;
+
+  for (auto &F : M) {
+    if (F.isDeclaration())
+      continue;
+    getAnalysis<BranchProbabilityInfoWrapperPass>(F).getBPI();
+    auto &BFI = getAnalysis<BlockFrequencyInfoWrapperPass>(F).getBFI();
+    for (auto &BB : F) {
+      Optional<uint64_t> BBCount = BFI.getBlockProfileCount(&BB);
+      if (!BBCount)
+        continue;
+      for (auto &I : BB) {
+        auto *CI = dyn_cast<CallInst>(&I);
+        if (!CI)
+          continue;
+        Function *CalledF = CI->getCalledFunction();
+        if (!CalledF || CalledF->isIntrinsic())
+          continue;
+
+        uint64_t &Count =
+            Counts[std::make_pair(F.getName(), CalledF->getName())];
+        Count = SaturatingAdd(Count, *BBCount);
+      }
+    }
+  }
+
+  if (Counts.empty())
+    return false;
+
+  LLVMContext &Context = M.getContext();
+  MDBuilder MDB(Context);
+  std::vector<Metadata *> Nodes;
+
+  for (auto E : Counts) {
+    SmallVector<Metadata *, 3> Vals;
+    Vals.push_back(MDB.createString(E.first.first));
+    Vals.push_back(MDB.createString(E.first.second));
+    Vals.push_back(MDB.createConstant(
+        ConstantInt::get(Type::getInt64Ty(Context), E.second)));
+    Nodes.push_back(MDNode::get(Context, Vals));
+  }
+
+  M.addModuleFlag(Module::Append, "CFG Profile", MDNode::get(Context, Nodes));
+
+  return true;
+}
+
+char CFGProfilePass::ID = 0;
+INITIALIZE_PASS_BEGIN(CFGProfilePass, "cfg-profile",
+  "Generate profile information from the CFG.", false, false)
+  INITIALIZE_PASS_DEPENDENCY(BlockFrequencyInfoWrapperPass)
+  INITIALIZE_PASS_DEPENDENCY(BranchProbabilityInfoWrapperPass)
+  INITIALIZE_PASS_END(CFGProfilePass, "cfg-profile",
+    "Generate profile information from the CFG.", false, false)
+
+ModulePass *llvm::createCFGProfilePass() {
+  return new CFGProfilePass();
+}
diff --git a/lib/Transforms/Instrumentation/CMakeLists.txt b/lib/Transforms/Instrumentation/CMakeLists.txt
index f2806e278e6..9b33edf0631 100644
--- a/lib/Transforms/Instrumentation/CMakeLists.txt
+++ b/lib/Transforms/Instrumentation/CMakeLists.txt
@@ -1,6 +1,7 @@
 add_llvm_library(LLVMInstrumentation
   AddressSanitizer.cpp
   BoundsChecking.cpp
+  CFGProfile.cpp
   DataFlowSanitizer.cpp
   GCOVProfiling.cpp
   MemorySanitizer.cpp
diff --git a/lib/Transforms/Instrumentation/Instrumentation.cpp b/lib/Transforms/Instrumentation/Instrumentation.cpp
index 7bb62d2c845..d147a521683 100644
--- a/lib/Transforms/Instrumentation/Instrumentation.cpp
+++ b/lib/Transforms/Instrumentation/Instrumentation.cpp
@@ -60,6 +60,7 @@ void llvm::initializeInstrumentation(PassRegistry &Registry) {
   initializeAddressSanitizerModulePass(Registry);
   initializeBoundsCheckingPass(Registry);
   initializeGCOVProfilerLegacyPassPass(Registry);
+  initializeCFGProfilePassPass(Registry);
   initializePGOInstrumentationGenLegacyPassPass(Registry);
   initializePGOInstrumentationUseLegacyPassPass(Registry);
   initializePGOIndirectCallPromotionLegacyPassPass(Registry);
diff --git a/tools/llvm-readobj/ELFDumper.cpp b/tools/llvm-readobj/ELFDumper.cpp
index 9f56a28d934..35f0dfbc006 100644
--- a/tools/llvm-readobj/ELFDumper.cpp
+++ b/tools/llvm-readobj/ELFDumper.cpp
@@ -97,6 +97,7 @@ using namespace ELF;
   using Elf_Vernaux = typename ELFO::Elf_Vernaux;                              \
   using Elf_Verdef = typename ELFO::Elf_Verdef;                                \
   using Elf_Verdaux = typename ELFO::Elf_Verdaux;                              \
+  using Elf_CGProfile = typename ELFT::CGProfile;                              \
   using uintX_t = typename ELFO::uintX_t;
 
 namespace {
@@ -161,6 +162,8 @@ public:
 
   void printHashHistogram() override;
 
+  void printCGProfile() override;
+
   void printNotes() override;
 
 private:
@@ -205,6 +208,7 @@ private:
   const Elf_Hash *HashTable = nullptr;
   const Elf_GnuHash *GnuHashTable = nullptr;
   const Elf_Shdr *DotSymtabSec = nullptr;
+  const Elf_Shdr *DotCGProfileSec = nullptr;
   StringRef DynSymtabName;
   ArrayRef<Elf_Word> ShndxTable;
 
@@ -249,9 +253,11 @@ public:
   Elf_Rela_Range dyn_relas() const;
   std::string getFullSymbolName(const Elf_Sym *Symbol, StringRef StrTable,
                                 bool IsDynamic) const;
+  StringRef getStaticSymbolName(uint32_t Index) const;
 
   void printSymbolsHelper(bool IsDynamic) const;
   const Elf_Shdr *getDotSymtabSec() const { return DotSymtabSec; }
+  const Elf_Shdr *getDotCGProfileSec() const { return DotCGProfileSec; }
   ArrayRef<Elf_Word> getShndxTable() const { return ShndxTable; }
   StringRef getDynamicStringTable() const { return DynamicStringTable; }
   const DynRegionInfo &getDynRelRegion() const { return DynRelRegion; }
@@ -309,6 +315,7 @@ public:
                            bool IsDynamic) = 0;
   virtual void printProgramHeaders(const ELFFile<ELFT> *Obj) = 0;
   virtual void printHashHistogram(const ELFFile<ELFT> *Obj) = 0;
+  virtual void printCGProfile(const ELFFile<ELFT> *Obj) = 0;
   virtual void printNotes(const ELFFile<ELFT> *Obj) = 0;
   const ELFDumper<ELFT> *dumper() const { return Dumper; }
 
@@ -336,6 +343,7 @@ public:
                           size_t Offset) override;
   void printProgramHeaders(const ELFO *Obj) override;
   void printHashHistogram(const ELFFile<ELFT> *Obj) override;
+  void printCGProfile(const ELFFile<ELFT> *Obj) override;
   void printNotes(const ELFFile<ELFT> *Obj) override;
 
 private:
@@ -394,6 +402,7 @@ public:
   void printDynamicRelocations(const ELFO *Obj) override;
   void printProgramHeaders(const ELFO *Obj) override;
   void printHashHistogram(const ELFFile<ELFT> *Obj) override;
+  void printCGProfile(const ELFFile<ELFT> *Obj) override;
   void printNotes(const ELFFile<ELFT> *Obj) override;
 
 private:
@@ -734,6 +743,16 @@ std::string ELFDumper<ELFT>::getFullSymbolName(const Elf_Sym *Symbol,
   return FullSymbolName;
 }
 
+template <typename ELFT>
+StringRef ELFDumper<ELFT>::getStaticSymbolName(uint32_t Index) const {
+  StringRef StrTable = unwrapOrError(Obj->getStringTableForSymtab(*DotSymtabSec));
+  Elf_Sym_Range Syms = unwrapOrError(Obj->symbols(DotSymtabSec));
+  if (Index >= Syms.size())
+    reportError("Invalid symbol index");
+  const Elf_Sym *Sym = &Syms[Index];
+  return unwrapOrError(Sym->getName(StrTable));
+}
+
 template <typename ELFT>
 static void
 getSectionNameIndex(const ELFFile<ELFT> &Obj, const typename ELFT::Sym *Symbol,
@@ -1341,6 +1360,12 @@ ELFDumper<ELFT>::ELFDumper(const ELFFile<ELFT> *Obj, ScopedPrinter &Writer)
         reportError("Multiple SHT_GNU_verneed");
       dot_gnu_version_r_sec = &Sec;
       break;
+    case ELF::SHT_NOTE:
+      if (unwrapOrError(Obj->getSectionName(&Sec)) != ".note.llvm.cgprofile")
+        break;
+      if (DotCGProfileSec != nullptr)
+        reportError("Multiple .note.llvm.cgprofile");
+      DotCGProfileSec = &Sec;
     }
   }
 
@@ -1485,6 +1510,10 @@ template <class ELFT> void ELFDumper<ELFT>::printHashHistogram() {
   ELFDumperStyle->printHashHistogram(Obj);
 }
 
+template <class ELFT> void ELFDumper<ELFT>::printCGProfile() {
+  ELFDumperStyle->printCGProfile(Obj);
+}
+
 template <class ELFT> void ELFDumper<ELFT>::printNotes() {
   ELFDumperStyle->printNotes(Obj);
 }
@@ -3365,6 +3394,11 @@ void GNUStyle<ELFT>::printHashHistogram(const ELFFile<ELFT> *Obj) {
   }
 }
 
+template <class ELFT>
+void GNUStyle<ELFT>::printCGProfile(const ELFFile<ELFT> *Obj) {
+  OS << "GNUStyle::printCGProfile not implemented\n";
+}
+
 static std::string getGNUNoteTypeName(const uint32_t NT) {
   static const struct {
     uint32_t ID;
@@ -3965,6 +3999,22 @@ void LLVMStyle<ELFT>::printHashHistogram(const ELFFile<ELFT> *Obj) {
   W.startLine() << "Hash Histogram not implemented!\n";
 }
 
+
+
+template <class ELFT>
+void LLVMStyle<ELFT>::printCGProfile(const ELFFile<ELFT> *Obj) {
+  ListScope L(W, "CGProfile");
+  if (!this->dumper()->getDotCGProfileSec())
+    return;
+  auto CGProfile = unwrapOrError(Obj->template getSectionContentsAsArray<Elf_CGProfile>(this->dumper()->getDotCGProfileSec()));
+  for (const Elf_CGProfile &CGPE : CGProfile) {
+    DictScope D(W, "CGProfileEntry");
+    W.printNumber("From", this->dumper()->getStaticSymbolName(CGPE.cgp_from), CGPE.cgp_from);
+    W.printNumber("To", this->dumper()->getStaticSymbolName(CGPE.cgp_to), CGPE.cgp_to);
+    W.printNumber("Weight", CGPE.cgp_weight);
+  }
+}
+
 template <class ELFT>
 void LLVMStyle<ELFT>::printNotes(const ELFFile<ELFT> *Obj) {
   W.startLine() << "printNotes not implemented!\n";
diff --git a/tools/llvm-readobj/ObjDumper.h b/tools/llvm-readobj/ObjDumper.h
index f283e559e2a..84c259e3fb4 100644
--- a/tools/llvm-readobj/ObjDumper.h
+++ b/tools/llvm-readobj/ObjDumper.h
@@ -47,6 +47,7 @@ public:
   virtual void printVersionInfo() {}
   virtual void printGroupSections() {}
   virtual void printHashHistogram() {}
+  virtual void printCGProfile() {}
   virtual void printNotes() {}
 
   // Only implemented for ARM ELF at this time.
diff --git a/tools/llvm-readobj/llvm-readobj.cpp b/tools/llvm-readobj/llvm-readobj.cpp
index 05b7c800cc1..1eb39b1bb3a 100644
--- a/tools/llvm-readobj/llvm-readobj.cpp
+++ b/tools/llvm-readobj/llvm-readobj.cpp
@@ -284,6 +284,8 @@ namespace opts {
   cl::alias HashHistogramShort("I", cl::desc("Alias for -elf-hash-histogram"),
                                cl::aliasopt(HashHistogram));
 
+  cl::opt<bool> CGProfile("elf-cg-profile", cl::desc("Display callgraph profile section"));
+
   cl::opt<OutputStyleTy>
       Output("elf-output-style", cl::desc("Specify ELF dump style"),
              cl::values(clEnumVal(LLVM, "LLVM default style"),
@@ -439,6 +441,8 @@ static void dumpObject(const ObjectFile *Obj) {
       Dumper->printGroupSections();
     if (opts::HashHistogram)
       Dumper->printHashHistogram();
+    if (opts::CGProfile)
+      Dumper->printCGProfile();
     if (opts::Notes)
       Dumper->printNotes();
   }
-------------- next part --------------
diff --git a/ELF/CMakeLists.txt b/ELF/CMakeLists.txt
index aef5ee68f..135d2f561 100644
--- a/ELF/CMakeLists.txt
+++ b/ELF/CMakeLists.txt
@@ -18,6 +18,7 @@ add_lld_library(lldELF
   Arch/SPARCV9.cpp
   Arch/X86.cpp
   Arch/X86_64.cpp
+  CallGraphSort.cpp
   Driver.cpp
   DriverUtils.cpp
   EhFrame.cpp
diff --git a/ELF/CallGraphSort.cpp b/ELF/CallGraphSort.cpp
new file mode 100644
index 000000000..ad1bfc442
--- /dev/null
+++ b/ELF/CallGraphSort.cpp
@@ -0,0 +1,318 @@
+//===- CallGraphSort.cpp --------------------------------------------------===//
+//
+//                             The LLVM Linker
+//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+//===----------------------------------------------------------------------===//
+///
+/// \file This file implements Call-Chain Clustering from:
+/// Optimizing Function Placement for Large-Scale Data-Center Applications
+/// https://research.fb.com/wp-content/uploads/2017/01/cgo2017-hfsort-final1.pdf
+///
+/// The goal of this algorithm is to improve runtime performance of the final
+/// executable by arranging code sections such that page table and i-cache
+/// misses are minimized.
+///
+/// Definitions:
+/// * Cluster
+///   * An ordered list of input sections which are laid out as a unit. At the
+///     beginning of the algorithm each input section has its own cluster and
+///     the weight of the cluster is the sum of the weight of all incoming
+///     edges.
+/// * Call-Chain Clustering (C³) Heuristic
+///   * Defines when and how clusters are combined. Pick the highest weight edge
+///     from cluster _u_ to _v_ then move the sections in _v_ and append them to
+///     _u_ unless the combined size would be larger than the page size.
+/// * Density
+///   * The weight of the cluster divided by the size of the cluster. This is a
+///     proxy for the amount of execution time spent per byte of the cluster.
+///
+/// It does so given a call graph profile by the following:
+/// * Build a call graph from the profile
+/// * While there are unresolved edges
+///   * Find the edge with the highest weight
+///   * Check if merging the two clusters would create a cluster larger than the
+///     target page size
+///   * If not, contract that edge putting the callee after the caller
+/// * Sort remaining clusters by density
+///
+//===----------------------------------------------------------------------===//
+
+#include "CallGraphSort.h"
+#include "SymbolTable.h"
+#include "Target.h"
+
+#include "llvm/Support/MathExtras.h"
+
+#include <algorithm>
+#include <queue>
+
+using namespace llvm;
+using namespace lld;
+using namespace lld::elf;
+
+namespace {
+class CallGraphSort {
+  using NodeIndex = std::ptrdiff_t;
+  using EdgeIndex = std::ptrdiff_t;
+
+  struct Node {
+    Node() = default;
+    Node(const InputSectionBase *IS);
+    std::vector<const InputSectionBase *> Sections;
+    std::vector<EdgeIndex> IncidentEdges;
+    int64_t Size = 0;
+    uint64_t Weight = 0;
+  };
+
+  struct Edge {
+    NodeIndex From;
+    NodeIndex To;
+    mutable uint64_t Weight;
+    bool operator==(const Edge Other) const;
+    bool operator<(const Edge Other) const;
+    void kill();
+    bool isDead() const;
+  };
+
+  struct EdgeDenseMapInfo {
+    static Edge getEmptyKey() {
+      return {DenseMapInfo<NodeIndex>::getEmptyKey(),
+              DenseMapInfo<NodeIndex>::getEmptyKey(), 0};
+    }
+    static Edge getTombstoneKey() {
+      return {DenseMapInfo<NodeIndex>::getTombstoneKey(),
+              DenseMapInfo<NodeIndex>::getTombstoneKey(), 0};
+    }
+    static unsigned getHashValue(const Edge &Val) {
+      return hash_combine(DenseMapInfo<NodeIndex>::getHashValue(Val.From),
+                          DenseMapInfo<NodeIndex>::getHashValue(Val.To));
+    }
+    static bool isEqual(const Edge &LHS, const Edge &RHS) { return LHS == RHS; }
+  };
+
+  std::vector<Node> Nodes;
+  std::vector<Edge> Edges;
+  struct EdgePriorityCmp {
+    std::vector<Edge> &Edges;
+    bool operator()(EdgeIndex A, EdgeIndex B) const {
+      return Edges[A].Weight < Edges[B].Weight;
+    }
+  };
+  std::priority_queue<EdgeIndex, std::vector<EdgeIndex>, EdgePriorityCmp>
+      WorkQueue{EdgePriorityCmp{Edges}};
+
+  void contractEdge(EdgeIndex CEI);
+  void generateClusters();
+
+public:
+  CallGraphSort(DenseMap<std::pair<const SymbolBody *, const SymbolBody *>,
+                         uint64_t> &Profile);
+
+  DenseMap<const InputSectionBase *, int> run();
+};
+} // end anonymous namespace
+
+CallGraphSort::Node::Node(const InputSectionBase *IS) {
+  Sections.push_back(IS);
+  Size = IS->getSize();
+}
+
+bool CallGraphSort::Edge::operator==(const Edge Other) const {
+  return From == Other.From && To == Other.To;
+}
+
+bool CallGraphSort::Edge::operator<(const Edge Other) const {
+  if (From != Other.From)
+    return From < Other.From;
+  return To < Other.To;
+}
+
+void CallGraphSort::Edge::kill() {
+  From = 0;
+  To = 0;
+}
+
+bool CallGraphSort::Edge::isDead() const { return From == 0 && To == 0; }
+
+// Take the edge list in Config->CallGraphProfile, resolve symbol names to
+// SymbolBodys, and generate a graph between InputSections with the provided
+// weights.
+CallGraphSort::CallGraphSort(
+    DenseMap<std::pair<const SymbolBody *, const SymbolBody *>, uint64_t>
+        &Profile) {
+  DenseMap<const InputSectionBase *, NodeIndex> SecToNode;
+  DenseMap<Edge, EdgeIndex, EdgeDenseMapInfo> EdgeMap;
+
+  auto GetOrCreateNode = [&](const InputSectionBase *IS) -> NodeIndex {
+    auto Res = SecToNode.insert(std::make_pair(IS, Nodes.size()));
+    if (Res.second)
+      Nodes.emplace_back(IS);
+    return Res.first->second;
+  };
+
+  // Create the graph.
+  for (const auto &C : Profile) {
+    const SymbolBody *FromSym = C.first.first;
+    const SymbolBody *ToSym = C.first.second;
+    uint64_t Weight = C.second;
+
+    if (Weight == 0)
+      continue;
+
+    // Get the input section for a given symbol.
+    auto *FromDR = dyn_cast_or_null<DefinedRegular>(FromSym);
+    auto *ToDR = dyn_cast_or_null<DefinedRegular>(ToSym);
+    if (!FromDR || !ToDR)
+      continue;
+
+    auto *FromSB = dyn_cast_or_null<const InputSectionBase>(FromDR->Section);
+    auto *ToSB = dyn_cast_or_null<const InputSectionBase>(ToDR->Section);
+    if (!FromSB || !ToSB || FromSB->getSize() == 0 || ToSB->getSize() == 0)
+      continue;
+
+    NodeIndex From = GetOrCreateNode(FromSB);
+    NodeIndex To = GetOrCreateNode(ToSB);
+    Edge E{From, To, Weight};
+
+    // Add or increment an edge
+    auto Res = EdgeMap.insert(std::make_pair(E, Edges.size()));
+    EdgeIndex EI = Res.first->second;
+    if (Res.second) {
+      Edges.push_back(E);
+      Nodes[From].IncidentEdges.push_back(EI);
+      Nodes[To].IncidentEdges.push_back(EI);
+    } else
+      Edges[EI].Weight = SaturatingAdd(Edges[EI].Weight, Weight);
+
+    Nodes[To].Weight = SaturatingAdd(Nodes[To].Weight, Weight);
+  }
+}
+
+/// Remove edge \p CEI from the graph while simultaneously merging its two
+/// incident vertices u and v. This merges any duplicate edges between u and v
+/// by accumulating their weights.
+void CallGraphSort::contractEdge(EdgeIndex CEI) {
+  // Make a copy of the edge as the original will be marked killed while being
+  // used.
+  Edge CE = Edges[CEI];
+  std::vector<EdgeIndex> &FE = Nodes[CE.From].IncidentEdges;
+
+  // Remove the self edge from From.
+  FE.erase(std::remove(FE.begin(), FE.end(), CEI), FE.end());
+  std::vector<EdgeIndex> &TE = Nodes[CE.To].IncidentEdges;
+
+  // Update all edges incident with To to reference From instead. Then if they
+  // aren't self edges add them to From.
+  for (EdgeIndex EI : TE) {
+    Edge &E = Edges[EI];
+    if (E.From == CE.To)
+      E.From = CE.From;
+    if (E.To == CE.To)
+      E.To = CE.From;
+    if (E.To == E.From) {
+      E.kill();
+      continue;
+    }
+    FE.push_back(EI);
+  }
+
+  // Free memory.
+  std::vector<EdgeIndex>().swap(TE);
+
+  if (FE.empty())
+    return;
+
+  // Sort edges so they can be merged. The stability of this sort doesn't matter
+  // as equal edges will be merged in an order independent manner.
+  std::sort(FE.begin(), FE.end(),
+            [&](EdgeIndex AI, EdgeIndex BI) { return Edges[AI] < Edges[BI]; });
+
+  // std::unique, but also merge equal values.
+  auto First = FE.begin();
+  auto Last = FE.end();
+  auto Result = First;
+  while (++First != Last) {
+    if (Edges[*Result] == Edges[*First]) {
+      Edges[*Result].Weight =
+          SaturatingAdd(Edges[*Result].Weight, Edges[*First].Weight);
+      Edges[*First].kill();
+      // Add the updated edge to the work queue without removing the previous
+      // entry. Edges will never be contracted twice as they are marked as dead.
+      WorkQueue.push(*Result);
+    } else if (++Result != First)
+      *Result = *First;
+  }
+  FE.erase(++Result, FE.end());
+}
+
+// Group InputSections into clusters using the Call-Chain Clustering heuristic
+// then sort the clusters by density.
+void CallGraphSort::generateClusters() {
+  for (size_t I = 0; I < Edges.size(); ++I)
+    WorkQueue.push(I);
+
+  // Collapse the graph.
+  while (!WorkQueue.empty()) {
+    EdgeIndex MaxI = WorkQueue.top();
+    const Edge MaxE = Edges[MaxI];
+    WorkQueue.pop();
+    if (MaxE.isDead())
+      continue;
+    // Merge the Nodes.
+    Node &From = Nodes[MaxE.From];
+    Node &To = Nodes[MaxE.To];
+    if (From.Size + To.Size > Target->PageSize)
+      continue;
+    contractEdge(MaxI);
+    From.Sections.insert(From.Sections.end(), To.Sections.begin(),
+                         To.Sections.end());
+    From.Size += To.Size;
+    From.Weight = SaturatingAdd(From.Weight, To.Weight);
+    To.Sections.clear();
+    To.Size = 0;
+    To.Weight = 0;
+  }
+
+  // Remove empty or dead nodes.
+  Nodes.erase(std::remove_if(Nodes.begin(), Nodes.end(),
+                             [](const Node &N) {
+                               return N.Size == 0 || N.Sections.empty();
+                             }),
+              Nodes.end());
+
+  // Sort by density. Invalidates all NodeIndexs.
+  std::sort(Nodes.begin(), Nodes.end(), [](const Node &A, const Node &B) {
+    return (APFloat(APFloat::IEEEdouble(), A.Weight) /
+            APFloat(APFloat::IEEEdouble(), A.Size))
+               .compare(APFloat(APFloat::IEEEdouble(), B.Weight) /
+                        APFloat(APFloat::IEEEdouble(), B.Size)) ==
+           APFloat::cmpLessThan;
+  });
+}
+
+DenseMap<const InputSectionBase *, int> CallGraphSort::run() {
+  generateClusters();
+
+  // Generate order.
+  llvm::DenseMap<const InputSectionBase *, int> OrderMap;
+  int CurOrder = 1;
+
+  for (const Node &N : Nodes)
+    for (const InputSectionBase *IS : N.Sections)
+      OrderMap[IS] = CurOrder++;
+
+  return OrderMap;
+}
+
+// Sort sections by the profile data provided by -call-graph-profile-file
+//
+// This first builds a call graph based on the profile data then iteratively
+// merges the hottest call edges as long as it would not create a cluster larger
+// than the page size. All clusters are then sorted by a density metric to
+// further improve locality.
+DenseMap<const InputSectionBase *, int> elf::computeCallGraphProfileOrder() {
+  return CallGraphSort(Config->CallGraphProfile).run();
+}
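The clustering steps described in the CallGraphSort.cpp header comment can be sketched outside of lld as follows. This is only an illustration under stated assumptions: `CallEdge`, `Cluster`, and `clusterOrder` are hypothetical names, sections are identified by name rather than by `InputSectionBase *`, a linear scan replaces the priority queue, and self-calls are not turned into edges. It keeps the patch's rules: merge along the heaviest edge unless the result exceeds the page size, then sort clusters by density with the same ascending comparison.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for lld's types; section names replace pointers.
struct CallEdge {
  std::string From, To;
  uint64_t Weight;
};

struct Cluster {
  std::vector<std::string> Sections;
  uint64_t Size = 0;
  uint64_t Weight = 0; // Sum of incoming edge weights.
};

static uint64_t saturatingAdd(uint64_t A, uint64_t B) {
  uint64_t R = A + B;
  return R < A ? UINT64_MAX : R;
}

std::vector<std::string>
clusterOrder(const std::map<std::string, uint64_t> &Sizes,
             const std::vector<CallEdge> &Profile, uint64_t PageSize) {
  assert(PageSize > 0 && "page size must be positive");
  std::vector<Cluster> Clusters;
  std::map<std::string, size_t> Index;
  std::map<std::pair<size_t, size_t>, uint64_t> Edges;

  auto GetOrCreate = [&](const std::string &Name) {
    auto Res = Index.insert({Name, Clusters.size()});
    if (Res.second) {
      Cluster C;
      C.Sections.push_back(Name);
      C.Size = Sizes.at(Name);
      Clusters.push_back(C);
    }
    return Res.first->second;
  };

  // Build the graph, skipping zero weights and unknown or empty sections.
  for (const CallEdge &E : Profile) {
    auto F = Sizes.find(E.From), T = Sizes.find(E.To);
    if (E.Weight == 0 || F == Sizes.end() || T == Sizes.end() ||
        F->second == 0 || T->second == 0)
      continue;
    size_t From = GetOrCreate(E.From), To = GetOrCreate(E.To);
    Clusters[To].Weight = saturatingAdd(Clusters[To].Weight, E.Weight);
    if (From == To)
      continue; // Sketch choice: a self-call adds weight but no edge.
    uint64_t &W = Edges[{From, To}];
    W = saturatingAdd(W, E.Weight);
  }

  // Repeatedly take the heaviest remaining edge.
  while (!Edges.empty()) {
    auto Best = Edges.begin();
    for (auto It = Edges.begin(); It != Edges.end(); ++It)
      if (It->second > Best->second)
        Best = It;
    size_t From = Best->first.first, To = Best->first.second;
    Edges.erase(Best);
    if (Clusters[From].Size + Clusters[To].Size > PageSize)
      continue; // The merged cluster would not fit in a page.

    // Contract the edge: the callee's sections follow the caller's.
    Cluster &U = Clusters[From], &V = Clusters[To];
    U.Sections.insert(U.Sections.end(), V.Sections.begin(), V.Sections.end());
    U.Size += V.Size;
    U.Weight = saturatingAdd(U.Weight, V.Weight);
    V = Cluster();

    // Redirect edges from V to U, dropping self edges and merging duplicates.
    std::map<std::pair<size_t, size_t>, uint64_t> Next;
    for (const auto &KV : Edges) {
      size_t F = KV.first.first == To ? From : KV.first.first;
      size_t T = KV.first.second == To ? From : KV.first.second;
      if (F == T)
        continue;
      uint64_t &W = Next[{F, T}];
      W = saturatingAdd(W, KV.second);
    }
    Edges.swap(Next);
  }

  // Drop emptied clusters, then sort by density (weight per byte), using the
  // same ascending comparison as the patch.
  Clusters.erase(std::remove_if(Clusters.begin(), Clusters.end(),
                                [](const Cluster &C) { return C.Size == 0; }),
                 Clusters.end());
  std::stable_sort(Clusters.begin(), Clusters.end(),
                   [](const Cluster &A, const Cluster &B) {
                     return double(A.Weight) / double(A.Size) <
                            double(B.Weight) / double(B.Size);
                   });

  std::vector<std::string> Order;
  for (const Cluster &C : Clusters)
    Order.insert(Order.end(), C.Sections.begin(), C.Sections.end());
  return Order;
}
```

Fed the section sizes from test/ELF/cgprofile.s and the edges from test/ELF/Inputs/cgprofile.txt with a 4096-byte page, this sketch yields the order c, b, a, f, e, d, which agrees with the relative addresses the test checks for those symbols.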
diff --git a/ELF/CallGraphSort.h b/ELF/CallGraphSort.h
new file mode 100644
index 000000000..46455489c
--- /dev/null
+++ b/ELF/CallGraphSort.h
@@ -0,0 +1,24 @@
+//===- CallGraphSort.h ------------------------------------------*- C++ -*-===//
+//
+//                             The LLVM Linker
+//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLD_ELF_CALL_GRAPH_SORT_H
+#define LLD_ELF_CALL_GRAPH_SORT_H
+
+#include "llvm/ADT/DenseMap.h"
+
+namespace lld {
+namespace elf {
+class InputSectionBase;
+
+llvm::DenseMap<const InputSectionBase *, int>
+computeCallGraphProfileOrder();
+}
+}
+
+#endif
diff --git a/ELF/Config.h b/ELF/Config.h
index d4b8510f4..a70eab558 100644
--- a/ELF/Config.h
+++ b/ELF/Config.h
@@ -10,6 +10,7 @@
 #ifndef LLD_ELF_CONFIG_H
 #define LLD_ELF_CONFIG_H
 
+#include "llvm/ADT/DenseMap.h"
 #include "llvm/ADT/MapVector.h"
 #include "llvm/ADT/StringRef.h"
 #include "llvm/ADT/StringSet.h"
@@ -24,6 +25,7 @@ namespace lld {
 namespace elf {
 
 class InputFile;
+class SymbolBody;
 
 enum ELFKind {
   ELFNoneKind,
@@ -91,6 +93,7 @@ struct Configuration {
   llvm::StringRef SoName;
   llvm::StringRef Sysroot;
   llvm::StringRef ThinLTOCacheDir;
+  llvm::StringRef CallGraphProfileFile;
   std::string Rpath;
   std::vector<VersionDefinition> VersionDefinitions;
   std::vector<llvm::StringRef> Argv;
@@ -103,11 +106,14 @@ struct Configuration {
   std::vector<SymbolVersion> VersionScriptGlobals;
   std::vector<SymbolVersion> VersionScriptLocals;
   std::vector<uint8_t> BuildIdVector;
+  llvm::DenseMap<std::pair<const SymbolBody *, const SymbolBody *>, uint64_t>
+      CallGraphProfile;
   bool AllowMultipleDefinition;
   bool AndroidPackDynRelocs = false;
   bool AsNeeded = false;
   bool Bsymbolic;
   bool BsymbolicFunctions;
+  bool CallGraphProfileSort = true;
   bool CompressDebugSections;
   bool DefineCommon;
   bool Demangle = true;
diff --git a/ELF/Driver.cpp b/ELF/Driver.cpp
index 8176a37c7..bfcad1fde 100644
--- a/ELF/Driver.cpp
+++ b/ELF/Driver.cpp
@@ -610,6 +610,42 @@ static std::vector<StringRef> getLines(MemoryBufferRef MB) {
   return Ret;
 }
 
+// This reads a list of call edges with weights one line at a time from a file
+// with the following format for each line:
+//
+// ^[^ ]+ [^ ]+ [^ ]+$
+//
+// It interprets the first value as an unsigned 64 bit weight, the second as
+// the symbol the call is from, and the third as the symbol the call is to.
+//
+// Example:
+//
+// 5000 c a
+// 4000 c b
+// 18446744073709551615 e d
+//
+template <typename ELFT>
+void readCallGraphProfile(MemoryBufferRef MB) {
+  for (StringRef L : getLines(MB)) {
+    SmallVector<StringRef, 3> Fields;
+    L.split(Fields, ' ');
+    if (Fields.size() != 3) {
+      error("parse error: " + MB.getBufferIdentifier() + ": " + L);
+      return;
+    }
+    uint64_t Count;
+    if (!to_integer(Fields[0], Count)) {
+      error("parse error: " + MB.getBufferIdentifier() + ": " + L);
+      return;
+    }
+    StringRef From = Fields[1];
+    StringRef To = Fields[2];
+    Config->CallGraphProfile[std::make_pair(Symtab->addUndefined<ELFT>(From),
+                                            Symtab->addUndefined<ELFT>(To))] =
+        Count;
+  }
+}
+
 static bool getCompressDebugSections(opt::InputArgList &Args) {
   StringRef S = Args.getLastArgValue(OPT_compress_debug_sections, "none");
   if (S == "none")
@@ -635,6 +671,8 @@ void LinkerDriver::readConfigs(opt::InputArgList &Args) {
   Config->AuxiliaryList = getArgs(Args, OPT_auxiliary);
   Config->Bsymbolic = Args.hasArg(OPT_Bsymbolic);
   Config->BsymbolicFunctions = Args.hasArg(OPT_Bsymbolic_functions);
+  Config->CallGraphProfileSort = Args.hasFlag(
+      OPT_call_graph_profile_sort, OPT_no_call_graph_profile_sort, true);
   Config->Chroot = Args.getLastArgValue(OPT_chroot);
   Config->CompressDebugSections = getCompressDebugSections(Args);
   Config->DefineCommon = Args.hasFlag(OPT_define_common, OPT_no_define_common,
@@ -788,6 +826,9 @@ void LinkerDriver::readConfigs(opt::InputArgList &Args) {
     if (Optional<MemoryBufferRef> Buffer = readFile(Arg->getValue()))
       Config->SymbolOrderingFile = getLines(*Buffer);
 
+  if (auto *Arg = Args.getLastArg(OPT_call_graph_profile_file))
+    Config->CallGraphProfileFile = Arg->getValue();
+
   // If --retain-symbol-file is used, we'll keep only the symbols listed in
   // the file and discard all others.
   if (auto *Arg = Args.getLastArg(OPT_retain_symbols_file)) {
@@ -1063,6 +1104,11 @@ template <class ELFT> void LinkerDriver::link(opt::InputArgList &Args) {
   Config->HasDynSymTab =
       !SharedFiles.empty() || Config->Pic || Config->ExportDynamic;
 
+  if (!Config->CallGraphProfileFile.empty())
+    if (Optional<MemoryBufferRef> Buffer =
+            readFile(Config->CallGraphProfileFile))
+      readCallGraphProfile<ELFT>(*Buffer);
+
   // Some symbols (such as __ehdr_start) are defined lazily only when there
   // are undefined symbols for them, so we add these to trigger that logic.
   for (StringRef Sym : Script->ReferencedSymbols)
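The line format read by `readCallGraphProfile` (a weight followed by two symbol names, separated by single spaces) can be illustrated with a standalone parser. This is a sketch using only the standard library, not the code in the patch: `ProfileEdge` and `parseProfileLine` are hypothetical names, and it assumes C++17 and a decimal-only weight, whereas the patch relies on LLVM's `to_integer`.

```cpp
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <system_error>
#include <vector>

// Hypothetical record for one parsed line: "<weight> <from> <to>".
struct ProfileEdge {
  uint64_t Weight;
  std::string From, To;
};

std::optional<ProfileEdge> parseProfileLine(const std::string &Line) {
  // Split on single spaces, keeping empty fields so that doubled spaces
  // produce an extra field and are rejected below.
  std::vector<std::string> Fields;
  size_t Start = 0;
  for (;;) {
    size_t Pos = Line.find(' ', Start);
    if (Pos == std::string::npos) {
      Fields.push_back(Line.substr(Start));
      break;
    }
    Fields.push_back(Line.substr(Start, Pos - Start));
    Start = Pos + 1;
  }
  if (Fields.size() != 3)
    return std::nullopt;

  // The weight must be a complete unsigned 64-bit decimal number.
  uint64_t Weight = 0;
  const char *Begin = Fields[0].data();
  const char *End = Begin + Fields[0].size();
  auto Res = std::from_chars(Begin, End, Weight);
  if (Res.ec != std::errc() || Res.ptr != End)
    return std::nullopt;

  return ProfileEdge{Weight, Fields[1], Fields[2]};
}
```

A caller would report a parse error and stop at the first line for which this returns `std::nullopt`, which is how the patch treats a malformed line.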
diff --git a/ELF/InputFiles.cpp b/ELF/InputFiles.cpp
index 98329cdc0..1dacc0197 100644
--- a/ELF/InputFiles.cpp
+++ b/ELF/InputFiles.cpp
@@ -173,6 +173,15 @@ std::string ObjFile<ELFT>::getLineInfo(InputSectionBase *S, uint64_t Offset) {
   return "";
 }
 
+template<class ELFT>
+void lld::elf::ObjFile<ELFT>::parseCGProfile() {
+  for (const Elf_CGProfile &CGPE : CGProfile) {
+    uint64_t &C = Config->CallGraphProfile[std::make_pair(
+        &getSymbolBody(CGPE.cgp_from), &getSymbolBody(CGPE.cgp_to))];
+    C = std::max(C, (uint64_t)CGPE.cgp_weight);
+  }
+}
+
 // Returns "<internal>", "foo.a(bar.o)" or "baz.o".
 std::string lld::toString(const InputFile *F) {
   if (!F)
@@ -238,6 +247,7 @@ void ObjFile<ELFT>::parse(DenseSet<CachedHashStringRef> &ComdatGroups) {
   // Read section and symbol tables.
   initializeSections(ComdatGroups);
   initializeSymbols();
+  parseCGProfile();
 }
 
 // Sections with SHT_GROUP and comdat bits define comdat section groups.
@@ -554,6 +564,13 @@ InputSectionBase *ObjFile<ELFT>::createInputSection(const Elf_Shdr &Sec) {
   if (Name == ".eh_frame" && !Config->Relocatable)
     return make<EhInputSection>(this, &Sec, Name);
 
+  // Profile data.
+  if (Name == ".note.llvm.cgprofile") {
+    CGProfile = check(
+        this->getObj().template getSectionContentsAsArray<Elf_CGProfile>(&Sec));
+    return &InputSection::Discarded;
+  }
+
   if (shouldMerge(Sec))
     return make<MergeInputSection>(this, &Sec, Name);
   return make<InputSection>(this, &Sec, Name);
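`parseCGProfile` keeps the largest weight seen for a given (from, to) pair when several object files describe the same edge. A minimal sketch of that merge rule follows, with symbol names standing in for `SymbolBody` pointers; `EdgeKey`, `Profile`, and `recordEdge` are hypothetical names introduced only for this illustration.

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

using EdgeKey = std::pair<std::string, std::string>;
using Profile = std::map<EdgeKey, uint64_t>;

// Record one edge, keeping the maximum weight seen so far for that pair.
void recordEdge(Profile &P, const std::string &From, const std::string &To,
                uint64_t Weight) {
  uint64_t &C = P[EdgeKey(From, To)]; // Value-initialized to 0 on first use.
  C = std::max(C, Weight);
}
```

By contrast, the file-based path in Driver.cpp assigns the count directly, so a later line for the same pair overwrites an earlier one.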
diff --git a/ELF/InputFiles.h b/ELF/InputFiles.h
index e1eaa5d33..4b27fccaa 100644
--- a/ELF/InputFiles.h
+++ b/ELF/InputFiles.h
@@ -154,6 +154,7 @@ template <class ELFT> class ObjFile : public ELFFileBase<ELFT> {
   typedef typename ELFT::Sym Elf_Sym;
   typedef typename ELFT::Shdr Elf_Shdr;
   typedef typename ELFT::Word Elf_Word;
+  typedef typename ELFT::CGProfile Elf_CGProfile;
 
   StringRef getShtGroupSignature(ArrayRef<Elf_Shdr> Sections,
                                  const Elf_Shdr &Sec);
@@ -202,6 +203,7 @@ private:
   initializeSections(llvm::DenseSet<llvm::CachedHashStringRef> &ComdatGroups);
   void initializeSymbols();
   void initializeDwarf();
+  void parseCGProfile();
   InputSectionBase *getRelocTarget(const Elf_Shdr &Sec);
   InputSectionBase *createInputSection(const Elf_Shdr &Sec);
   StringRef getSectionName(const Elf_Shdr &Sec);
@@ -219,6 +221,8 @@ private:
   std::unique_ptr<llvm::DWARFDebugLine> DwarfLine;
   llvm::DenseMap<StringRef, std::pair<unsigned, unsigned>> VariableLoc;
   llvm::once_flag InitDwarfLine;
+
+  ArrayRef<Elf_CGProfile> CGProfile;
 };
 
 // LazyObjFile is analogous to ArchiveFile in the sense that
diff --git a/ELF/Options.td b/ELF/Options.td
index e326f5b65..b599e45b9 100644
--- a/ELF/Options.td
+++ b/ELF/Options.td
@@ -51,6 +51,12 @@ def allow_multiple_definition: F<"allow-multiple-definition">,
 def as_needed: F<"as-needed">,
   HelpText<"Only set DT_NEEDED for shared libraries if used">;
 
+def call_graph_profile_file: S<"call-graph-profile-file">,
+  HelpText<"Lay out sections to optimize the given call graph">;
+
+def call_graph_profile_sort: F<"call-graph-profile-sort">,
+  HelpText<"Sort sections by call graph profile information">;
+
 // -chroot doesn't have a help text because it is an internal option.
 def chroot: S<"chroot">;
 
@@ -163,6 +169,9 @@ def nostdlib: F<"nostdlib">,
 def no_as_needed: F<"no-as-needed">,
   HelpText<"Always DT_NEEDED for shared libraries">;
 
+def no_call_graph_profile_sort: F<"no-call-graph-profile-sort">,
+  HelpText<"Don't sort sections by call graph profile information">;
+
 def no_color_diagnostics: F<"no-color-diagnostics">,
   HelpText<"Do not use colors in diagnostics">;
 
diff --git a/ELF/Writer.cpp b/ELF/Writer.cpp
index 944c683ed..752972df7 100644
--- a/ELF/Writer.cpp
+++ b/ELF/Writer.cpp
@@ -8,6 +8,7 @@
 //===----------------------------------------------------------------------===//
 
 #include "Writer.h"
+#include "CallGraphSort.h"
 #include "Config.h"
 #include "Filesystem.h"
 #include "LinkerScript.h"
@@ -1003,6 +1004,15 @@ findOrphanPos(std::vector<BaseCommand *>::iterator B,
 template <class ELFT> void Writer<ELFT>::sortInputSections() {
   assert(!Script->HasSectionsCommand);
 
+  // Sort sections using the call graph profile (-call-graph-profile-file).
+  if (Config->CallGraphProfileSort && !Config->CallGraphProfile.empty()) {
+    DenseMap<const InputSectionBase *, int> OrderMap =
+      computeCallGraphProfileOrder();
+
+    if (OutputSection *Sec = findSection(".text"))
+      Sec->sort([&](InputSectionBase *S) { return OrderMap.lookup(S); });
+  }
+
   // Sort input sections by priority using the list provided
   // by --symbol-ordering-file.
   DenseMap<SectionBase *, int> Order = buildSectionOrder();
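The effect of passing `OrderMap.lookup(S)` as the sort key can be sketched as below. This assumes, without confirmation from the patch itself, that `OutputSection::sort` performs a stable ascending sort on the returned integer; `applyOrder` is a hypothetical helper and section names replace pointers. Because `DenseMap::lookup` returns a default-constructed value (0) for missing keys and the computed order starts at 1, sections absent from the profile sort ahead of every profiled section and keep their input order.

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <vector>

// Stable-sort sections by their order key; unknown sections get key 0.
std::vector<std::string>
applyOrder(std::vector<std::string> Sections,
           const std::unordered_map<std::string, int> &OrderMap) {
  auto Key = [&](const std::string &S) {
    auto It = OrderMap.find(S);
    return It == OrderMap.end() ? 0 : It->second;
  };
  std::stable_sort(Sections.begin(), Sections.end(),
                   [&](const std::string &A, const std::string &B) {
                     return Key(A) < Key(B);
                   });
  return Sections;
}
```

With sections a through h in input order and the keys c=1, b=2, a=3, f=4, e=5, d=6, the result is g, h, c, b, a, f, e, d, which matches the address order expected by test/ELF/cgprofile.s.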
diff --git a/test/ELF/Inputs/cgprofile.txt b/test/ELF/Inputs/cgprofile.txt
new file mode 100644
index 000000000..6b60397a6
--- /dev/null
+++ b/test/ELF/Inputs/cgprofile.txt
@@ -0,0 +1,7 @@
+5000 c a
+4000 c b
+0 d e
+18446744073709551615 e d
+18446744073709551611 f d
+18446744073709551612 f e
+6000 c h
diff --git a/test/ELF/cgprofile-object.s b/test/ELF/cgprofile-object.s
new file mode 100644
index 000000000..b308d58de
--- /dev/null
+++ b/test/ELF/cgprofile-object.s
@@ -0,0 +1,50 @@
+# REQUIRES: x86
+
+# RUN: llvm-mc -filetype=obj -triple=x86_64-unknown-linux %s -o %t
+# RUN: ld.lld %t -o %t2
+# RUN: llvm-readobj -symbols %t2 | FileCheck %s
+# RUN: ld.lld %t -o %t2 -no-call-graph-profile-sort
+# RUN: llvm-readobj -symbols %t2 | FileCheck %s --check-prefix=NOSORT
+
+    .section    .text.hot._Z4fooav,"ax",@progbits
+    .globl  _Z4fooav
+_Z4fooav:
+    retq
+
+    .section    .text.hot._Z4foobv,"ax",@progbits
+    .globl  _Z4foobv
+_Z4foobv:
+    retq
+
+    .section    .text.hot._Z3foov,"ax",@progbits
+    .globl  _Z3foov
+_Z3foov:
+    retq
+
+    .section    .text.hot._start,"ax",@progbits
+    .globl  _start
+_start:
+    retq
+
+
+    .cg_profile _start, _Z3foov, 1
+    .cg_profile _Z4fooav, _Z4foobv, 1
+    .cg_profile _Z3foov, _Z4fooav, 1
+
+# CHECK:          Name: _Z3foov
+# CHECK-NEXT:     Value: 0x201001
+# CHECK:          Name: _Z4fooav
+# CHECK-NEXT:     Value: 0x201002
+# CHECK:          Name: _Z4foobv
+# CHECK-NEXT:     Value: 0x201003
+# CHECK:          Name: _start
+# CHECK-NEXT:     Value: 0x201000
+	
+# NOSORT:          Name: _Z3foov
+# NOSORT-NEXT:     Value: 0x201002
+# NOSORT:          Name: _Z4fooav
+# NOSORT-NEXT:     Value: 0x201000
+# NOSORT:          Name: _Z4foobv
+# NOSORT-NEXT:     Value: 0x201001
+# NOSORT:          Name: _start
+# NOSORT-NEXT:     Value: 0x201003
diff --git a/test/ELF/cgprofile.s b/test/ELF/cgprofile.s
new file mode 100644
index 000000000..ce0e0a51b
--- /dev/null
+++ b/test/ELF/cgprofile.s
@@ -0,0 +1,128 @@
+# REQUIRES: x86
+#
+# RUN: llvm-mc -filetype=obj -triple=x86_64-unknown-linux %s -o %t1
+# RUN: ld.lld %t1 -e a -o %t -call-graph-profile-file %p/Inputs/cgprofile.txt
+# RUN: llvm-readobj -symbols %t | FileCheck %s
+
+    .section .text.a,"ax",@progbits
+    .global a
+a:
+    .zero 20
+
+    .section .text.b,"ax",@progbits
+    .global b
+b:
+    .zero 1
+    
+    .section .text.c,"ax",@progbits
+    .global c
+c:
+    .zero 4095
+    
+    .section .text.d,"ax",@progbits
+    .global d
+d:
+    .zero 51
+    
+    .section .text.e,"ax",@progbits
+    .global e
+e:
+    .zero 42
+
+    .section .text.f,"ax",@progbits
+    .global f
+f:
+    .zero 42
+	
+    .section .text.g,"ax",@progbits
+    .global g
+g:
+	.zero 34
+	
+    .section .text.h,"ax",@progbits
+    .global h
+h:
+
+# CHECK:     Symbols [
+# CHECK-NEXT:  Symbol {
+# CHECK-NEXT:    Name:  (0)
+# CHECK-NEXT:    Value: 0x0
+# CHECK-NEXT:    Size: 0
+# CHECK-NEXT:    Binding: Local (0x0)
+# CHECK-NEXT:    Type: None (0x0)
+# CHECK-NEXT:    Other: 0
+# CHECK-NEXT:    Section: Undefined (0x0)
+# CHECK-NEXT:  }
+# CHECK-NEXT:  Symbol {
+# CHECK-NEXT:    Name: a
+# CHECK-NEXT:    Value: 0x202022
+# CHECK-NEXT:    Size: 0
+# CHECK-NEXT:    Binding: Global (0x1)
+# CHECK-NEXT:    Type: None (0x0)
+# CHECK-NEXT:    Other: 0
+# CHECK-NEXT:    Section: .text
+# CHECK-NEXT:  }
+# CHECK-NEXT:  Symbol {
+# CHECK-NEXT:    Name: b
+# CHECK-NEXT:    Value: 0x202021
+# CHECK-NEXT:    Size: 0
+# CHECK-NEXT:    Binding: Global (0x1)
+# CHECK-NEXT:    Type: None (0x0)
+# CHECK-NEXT:    Other: 0
+# CHECK-NEXT:    Section: .text
+# CHECK-NEXT:  }
+# CHECK-NEXT:  Symbol {
+# CHECK-NEXT:    Name: c
+# CHECK-NEXT:    Value: 0x201022
+# CHECK-NEXT:    Size: 0
+# CHECK-NEXT:    Binding: Global (0x1)
+# CHECK-NEXT:    Type: None (0x0)
+# CHECK-NEXT:    Other: 0
+# CHECK-NEXT:    Section: .text
+# CHECK-NEXT:  }
+# CHECK-NEXT:  Symbol {
+# CHECK-NEXT:    Name: d
+# CHECK-NEXT:    Value: 0x20208A
+# CHECK-NEXT:    Size: 0
+# CHECK-NEXT:    Binding: Global (0x1)
+# CHECK-NEXT:    Type: None (0x0)
+# CHECK-NEXT:    Other: 0
+# CHECK-NEXT:    Section: .text
+# CHECK-NEXT:  }
+# CHECK-NEXT:  Symbol {
+# CHECK-NEXT:    Name: e
+# CHECK-NEXT:    Value: 0x202060
+# CHECK-NEXT:    Size: 0
+# CHECK-NEXT:    Binding: Global (0x1)
+# CHECK-NEXT:    Type: None (0x0)
+# CHECK-NEXT:    Other: 0
+# CHECK-NEXT:    Section: .text
+# CHECK-NEXT:  }
+# CHECK-NEXT:  Symbol {
+# CHECK-NEXT:    Name: f
+# CHECK-NEXT:    Value: 0x202036
+# CHECK-NEXT:    Size: 0
+# CHECK-NEXT:    Binding: Global (0x1)
+# CHECK-NEXT:    Type: None (0x0)
+# CHECK-NEXT:    Other: 0
+# CHECK-NEXT:    Section: .text
+# CHECK-NEXT:  }
+# CHECK-NEXT:  Symbol {
+# CHECK-NEXT:    Name: g
+# CHECK-NEXT:    Value: 0x201000
+# CHECK-NEXT:    Size: 0
+# CHECK-NEXT:    Binding: Global (0x1)
+# CHECK-NEXT:    Type: None (0x0)
+# CHECK-NEXT:    Other: 0
+# CHECK-NEXT:    Section: .text
+# CHECK-NEXT:  }
+# CHECK-NEXT:  Symbol {
+# CHECK-NEXT:    Name: h
+# CHECK-NEXT:    Value: 0x201022
+# CHECK-NEXT:    Size: 0
+# CHECK-NEXT:    Binding: Global (0x1)
+# CHECK-NEXT:    Type: None (0x0)
+# CHECK-NEXT:    Other: 0
+# CHECK-NEXT:    Section: .text
+# CHECK-NEXT:  }
+# CHECK-NEXT:]

