[PATCH] D36351: [lld][ELF] Add profile guided section layout
Michael Spencer via llvm-commits
llvm-commits at lists.llvm.org
Mon Oct 23 17:02:39 PDT 2017
- Michael Spencer
On Tue, Oct 3, 2017 at 6:43 PM, Rui Ueyama via Phabricator <
reviews at reviews.llvm.org> wrote:
> ruiu added a comment.
>
> Could you send me a patch to produce a call graph file so that I can try
> this patch on my machine?
>
Sorry for the delay. I've attached the LLVM and lld patches that implement
a full, testable version of the feature.
To use:
-1) Have an ELF system with working Clang instrumentation-based profiling
0) Compile clang and lld with the supplied patches
1) Compile the code with `-fprofile-instr-generate` and link with any linker
2) Run the program on a representative sample
3) `$ llvm-profdata merge default.profraw -o default.profdata`
4) Compile the code again with
`-fprofile-instr-use=default.profdata -ffunction-sections -fuse-ld=lld`
The output of step 4 is the program with sections ordered by profile data. You
can add `-Wl,-no-call-graph-profile-sort` to disable sorting and measure the
difference.
`llvm-readobj -elf-cg-profile` will dump the cg profile section.
This also works with LTO.
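
For reference, the attached LLVM patch writes each entry of the cg profile
section as two 32-bit symbol-table indices followed by a 64-bit count
(`cgp_from`, `cgp_to`, `cgp_weight`). Below is a minimal sketch of decoding
such a buffer. It assumes little-endian byte order, and the names `CGEntry`,
`readLE`, and `decodeCGProfile` are hypothetical helpers for illustration,
not part of the patch:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Mirrors the 16-byte on-disk entry: two 32-bit symbol indices, then a
// 64-bit weight. (Hypothetical type, not the patch's Elf_CGProfile_Impl.)
struct CGEntry {
  uint32_t From;
  uint32_t To;
  uint64_t Weight;
};

// Read an N-byte unsigned integer stored in little-endian order.
// Assumption: the object file targets a little-endian machine.
static uint64_t readLE(const uint8_t *P, size_t N) {
  uint64_t V = 0;
  for (size_t I = 0; I < N; ++I)
    V |= static_cast<uint64_t>(P[I]) << (8 * I);
  return V;
}

// Decode a raw section buffer into entries. Trailing bytes that do not form
// a complete 16-byte entry are ignored.
std::vector<CGEntry> decodeCGProfile(const std::vector<uint8_t> &Data) {
  std::vector<CGEntry> Out;
  for (size_t Off = 0; Off + 16 <= Data.size(); Off += 16) {
    const uint8_t *P = Data.data() + Off;
    Out.push_back({static_cast<uint32_t>(readLE(P, 4)),
                   static_cast<uint32_t>(readLE(P + 4, 4)),
                   readLE(P + 8, 8)});
  }
  return Out;
}
```

The indices refer to the object's static symbol table, which is how the
llvm-readobj change resolves them back to names.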
>
>
> ================
> Comment at: ELF/CallGraphSort.cpp:99-101
> + if (To != Other.To)
> + return To < Other.To;
> + return false;
> ----------------
> You can just return `To < Other.To`.
>
>
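A sketch of what the simplified ordering could look like, assuming `From` and
`To` remain plain integer indices. The `Edge` below is a stand-in with only
the two fields the comparison uses, and `lessByTie` is a hypothetical
alternative that is not in the patch:

```cpp
#include <cstddef>
#include <tuple>

// Stand-in for CallGraphSort::Edge, reduced to the fields used for ordering.
struct Edge {
  std::ptrdiff_t From;
  std::ptrdiff_t To;

  // Order by From first, then by To. The final comparison already yields
  // false when both fields are equal, so no separate branch is needed.
  bool operator<(const Edge &Other) const {
    if (From != Other.From)
      return From < Other.From;
    return To < Other.To;
  }
};

// The same lexicographic ordering expressed with std::tie.
inline bool lessByTie(const Edge &A, const Edge &B) {
  return std::tie(A.From, A.To) < std::tie(B.From, B.To);
}
```

Either form is a strict weak ordering, which is what `std::map<Edge, EdgeIndex>`
requires of its key comparison.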
> ================
> Comment at: ELF/CallGraphSort.cpp:127
> + // Create the graph.
> + for (const auto &C : Profile) {
> + if (C.second == 0)
> ----------------
> This loop is a bit too dense. It cannot be understood without reading each
> line carefully as I don't understand the whole picture. Please insert a
> blank line between code blocks. Adding more comment would help.
>
>
> ================
> Comment at: ELF/CallGraphSort.cpp:128
> + for (const auto &C : Profile) {
> + if (C.second == 0)
> + continue;
> ----------------
> Please define local variables for `C.first.first`, `C.first.second` and
> `C.second` so that they are accessed through meaningful names.
>
>
> ================
> Comment at: ELF/CallGraphSort.cpp:130-135
> + auto FromDR = dyn_cast_or_null<DefinedRegular>(Symtab->find(C.first.first));
> + auto ToDR = dyn_cast_or_null<DefinedRegular>(Symtab->find(C.first.second));
> + if (!FromDR || !ToDR)
> + continue;
> + auto FromSB = dyn_cast_or_null<const InputSectionBase>(FromDR->Section);
> + auto ToSB = dyn_cast_or_null<const InputSectionBase>(ToDR->Section);
> ----------------
> auto -> auto *
>
>
> ================
> Comment at: ELF/CallGraphSort.cpp:147
> + Nodes[To].IncidentEdges.push_back(EI);
> + } else
> + Edges[EI].Weight = SaturatingAdd(Edges[EI].Weight, C.second);
> ----------------
> nit: add {}
>
>
> ================
> Comment at: ELF/CallGraphSort.cpp:153
> +
> +void CallGraphSort::contractEdge(EdgeIndex CEI) {
> + // Make a copy of the edge as the original will be marked killed while being
> ----------------
> Please add a function comment as to what this function is intended to do.
> I do not understand this function because I don't get a whole picture.
>
>
> ================
> Comment at: ELF/CallGraphSort.cpp:158-163
> + // Remove the self edge from From.
> + FE.erase(std::remove(FE.begin(), FE.end(), CEI));
> + std::vector<EdgeIndex> &TE = Nodes[CE.To].IncidentEdges;
> + // Update all edges incident with To to reference From instead. Then if they
> + // aren't self edges add them to From.
> + for (EdgeIndex EI : TE) {
> ----------------
> Add blank lines before comments.
>
>
> ================
> Comment at: ELF/CallGraphSort.cpp:165-166
> + Edge &E = Edges[EI];
> + // E.From = E.From == CE.To ? CE.From : E.From;
> + // E.To = E.To == CE.To ? CE.From : E.To;
> + if (E.From == CE.To)
> ----------------
> Please remove debug code.
>
>
> ================
> Comment at: ELF/CallGraphSort.cpp:178-180
> + // Free memory.
> + std::vector<EdgeIndex>().swap(TE);
> +
> ----------------
> This looks odd. Why do you need to do this? I think you can just leave it
> alone.
>
>
> ================
> Comment at: ELF/CallGraphSort.cpp:207-208
> +
> +// Group InputSections into clusters using the Call-Chain Clustering heuristic
> +// then sort the clusters by density.
> +void CallGraphSort::generateClusters() {
> ----------------
> This might be understood for those who read the paper, but I don't think
> that is enough. Please write more comment as to what are clusters, what is
> density, and what is the heuristic.
>
>
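To make the terms concrete: a cluster is an ordered run of sections that will
be laid out contiguously, and density is taken here to be a cluster's call
weight divided by its size in bytes (my reading of the paper, so treat it as
an assumption). The sketch below follows the steps listed in the file header:
visit edges from heaviest to lightest, merge the callee's cluster after the
caller's unless the result would exceed the page size, then order clusters by
density. It is a simplification, not the patch's code: it sorts the edges once
instead of contracting them through a priority queue, and `Call`, `Cluster`,
and `clusterSections` are hypothetical names. Section indices in `Calls` are
assumed to be valid indices into `Sizes`.

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <vector>

struct Call {
  int From;        // caller section index
  int To;          // callee section index
  uint64_t Weight; // call count from the profile
};

struct Cluster {
  std::vector<int> Sections; // layout order inside the cluster
  uint64_t Size = 0;         // total bytes
  uint64_t Weight = 0;       // total incoming call weight
  double density() const {
    return Size ? static_cast<double>(Weight) / static_cast<double>(Size) : 0.0;
  }
};

// Returns section indices in their final layout order.
std::vector<int> clusterSections(const std::vector<uint64_t> &Sizes,
                                 std::vector<Call> Calls, uint64_t PageSize) {
  const int N = static_cast<int>(Sizes.size());
  std::vector<Cluster> Clusters(N);
  std::vector<int> Owner(N); // section index -> index of the owning cluster
  std::iota(Owner.begin(), Owner.end(), 0);
  for (int I = 0; I < N; ++I) {
    Clusters[I].Sections = {I};
    Clusters[I].Size = Sizes[I];
  }
  for (const Call &C : Calls)
    Clusters[C.To].Weight += C.Weight;

  // Visit the heaviest edges first.
  std::stable_sort(Calls.begin(), Calls.end(), [](const Call &A, const Call &B) {
    return A.Weight > B.Weight;
  });

  for (const Call &C : Calls) {
    int F = Owner[C.From], T = Owner[C.To];
    // Skip edges inside one cluster and merges that would exceed a page.
    if (F == T || Clusters[F].Size + Clusters[T].Size > PageSize)
      continue;
    // Place the callee's cluster directly after the caller's cluster.
    for (int S : Clusters[T].Sections) {
      Owner[S] = F;
      Clusters[F].Sections.push_back(S);
    }
    Clusters[F].Size += Clusters[T].Size;
    Clusters[F].Weight += Clusters[T].Weight;
    Clusters[T] = Cluster(); // now empty; contributes nothing to the layout
  }

  // Densest clusters first.
  std::vector<int> Order(N);
  std::iota(Order.begin(), Order.end(), 0);
  std::stable_sort(Order.begin(), Order.end(), [&](int A, int B) {
    return Clusters[A].density() > Clusters[B].density();
  });

  std::vector<int> Layout;
  for (int I : Order)
    Layout.insert(Layout.end(), Clusters[I].Sections.begin(),
                  Clusters[I].Sections.end());
  return Layout;
}
```

The page-size cap is what keeps a hot call chain from growing into a cluster
that spans several pages, and sorting by density packs the hottest bytes
together at the front of the output section.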
> ================
> Comment at: ELF/CallGraphSort.cpp:272-273
> +DenseMap<const InputSectionBase *, int> elf::computeCallGraphProfileOrder() {
> + CallGraphSort CGS(Config->CallGraphProfile);
> + return CGS.run();
> +}
> ----------------
> You can do this in one line without defining a local variable.
>
>
> ================
> Comment at: ELF/Writer.cpp:926-928
> + for (BaseCommand *Base : Script->Opt.Commands)
> + if (auto *OS = dyn_cast<OutputSection>(Base))
> + if (OS->Name == ".text") {
> ----------------
> I'd factor this code out as `OutputSection *findOutputSection(StringRef
> Name)`.
>
>
> https://reviews.llvm.org/D36351
>
>
>
>
-------------- next part --------------
diff --git a/include/llvm/InitializePasses.h b/include/llvm/InitializePasses.h
index bf54b6471f4..175cf9fd7b6 100644
--- a/include/llvm/InitializePasses.h
+++ b/include/llvm/InitializePasses.h
@@ -83,6 +83,7 @@ void initializeBreakCriticalEdgesPass(PassRegistry&);
void initializeCFGOnlyPrinterLegacyPassPass(PassRegistry&);
void initializeCFGOnlyViewerLegacyPassPass(PassRegistry&);
void initializeCFGPrinterLegacyPassPass(PassRegistry&);
+void initializeCFGProfilePassPass(PassRegistry&);
void initializeCFGSimplifyPassPass(PassRegistry&);
void initializeCFGViewerLegacyPassPass(PassRegistry&);
void initializeCFLAndersAAWrapperPassPass(PassRegistry&);
diff --git a/include/llvm/LinkAllPasses.h b/include/llvm/LinkAllPasses.h
index 29314617177..c9c0a5e19ea 100644
--- a/include/llvm/LinkAllPasses.h
+++ b/include/llvm/LinkAllPasses.h
@@ -75,6 +75,7 @@ namespace {
(void) llvm::createCallGraphDOTPrinterPass();
(void) llvm::createCallGraphViewerPass();
(void) llvm::createCFGSimplificationPass();
+ (void) llvm::createCFGProfilePass();
(void) llvm::createLateCFGSimplificationPass();
(void) llvm::createCFLAndersAAWrapperPass();
(void) llvm::createCFLSteensAAWrapperPass();
diff --git a/include/llvm/MC/MCAssembler.h b/include/llvm/MC/MCAssembler.h
index 4f1b5a8b3d7..6535661bafc 100644
--- a/include/llvm/MC/MCAssembler.h
+++ b/include/llvm/MC/MCAssembler.h
@@ -391,6 +391,13 @@ public:
const MCLOHContainer &getLOHContainer() const {
return const_cast<MCAssembler *>(this)->getLOHContainer();
}
+
+ struct CGProfileEntry {
+ const MCSymbol *From;
+ const MCSymbol *To;
+ uint64_t Count;
+ };
+ std::vector<CGProfileEntry> CGProfile;
/// @}
/// \name Backend Data Access
/// @{
diff --git a/include/llvm/MC/MCELFStreamer.h b/include/llvm/MC/MCELFStreamer.h
index c5b66a163c8..3402980c13b 100644
--- a/include/llvm/MC/MCELFStreamer.h
+++ b/include/llvm/MC/MCELFStreamer.h
@@ -66,6 +66,9 @@ public:
void EmitValueToAlignment(unsigned, int64_t, unsigned, unsigned) override;
+ void emitCGProfileEntry(const MCSymbol *From, const MCSymbol *To,
+ uint64_t Count) override;
+
void FinishImpl() override;
void EmitBundleAlignMode(unsigned AlignPow2) override;
diff --git a/include/llvm/MC/MCStreamer.h b/include/llvm/MC/MCStreamer.h
index 479a7e2a748..e013f0097a4 100644
--- a/include/llvm/MC/MCStreamer.h
+++ b/include/llvm/MC/MCStreamer.h
@@ -841,6 +841,9 @@ public:
SMLoc Loc = SMLoc());
virtual void EmitWinEHHandlerData(SMLoc Loc = SMLoc());
+ virtual void emitCGProfileEntry(const MCSymbol *From, const MCSymbol *To,
+ uint64_t Count);
+
/// Get the .pdata section used for the given section. Typically the given
/// section is either the main .text section or some other COMDAT .text
/// section, but it may be any section containing code.
diff --git a/include/llvm/Object/ELFTypes.h b/include/llvm/Object/ELFTypes.h
index 83b688548fd..905916e910c 100644
--- a/include/llvm/Object/ELFTypes.h
+++ b/include/llvm/Object/ELFTypes.h
@@ -40,6 +40,7 @@ template <class ELFT> struct Elf_Versym_Impl;
template <class ELFT> struct Elf_Hash_Impl;
template <class ELFT> struct Elf_GnuHash_Impl;
template <class ELFT> struct Elf_Chdr_Impl;
+template <class ELFT> struct Elf_CGProfile_Impl;
template <endianness E, bool Is64> struct ELFType {
private:
@@ -66,6 +67,7 @@ public:
using Hash = Elf_Hash_Impl<ELFType<E, Is64>>;
using GnuHash = Elf_GnuHash_Impl<ELFType<E, Is64>>;
using Chdr = Elf_Chdr_Impl<ELFType<E, Is64>>;
+ using CGProfile = Elf_CGProfile_Impl<ELFType<E, Is64>>;
using DynRange = ArrayRef<Dyn>;
using ShdrRange = ArrayRef<Shdr>;
using SymRange = ArrayRef<Sym>;
@@ -590,6 +592,14 @@ struct Elf_Chdr_Impl<ELFType<TargetEndianness, true>> {
Elf_Xword ch_addralign;
};
+template <class ELFT>
+struct Elf_CGProfile_Impl {
+ LLVM_ELF_IMPORT_TYPES_ELFT(ELFT)
+ Elf_Word cgp_from;
+ Elf_Word cgp_to;
+ Elf_Xword cgp_weight;
+};
+
// MIPS .reginfo section
template <class ELFT>
struct Elf_Mips_RegInfo;
diff --git a/include/llvm/Transforms/Instrumentation.h b/include/llvm/Transforms/Instrumentation.h
index fe458e7be06..51374b7cb5a 100644
--- a/include/llvm/Transforms/Instrumentation.h
+++ b/include/llvm/Transforms/Instrumentation.h
@@ -206,6 +206,8 @@ inline ModulePass *createDataFlowSanitizerPassForJIT(
// checking on loads, stores, and other memory intrinsics.
FunctionPass *createBoundsCheckingPass();
+ModulePass *createCFGProfilePass();
+
/// \brief Calculate what to divide by to scale counts.
///
/// Given the maximum count, calculate a divisor that will scale all the
diff --git a/lib/CodeGen/TargetLoweringObjectFileImpl.cpp b/lib/CodeGen/TargetLoweringObjectFileImpl.cpp
index e45cdee4368..130b62cb3cd 100644
--- a/lib/CodeGen/TargetLoweringObjectFileImpl.cpp
+++ b/lib/CodeGen/TargetLoweringObjectFileImpl.cpp
@@ -97,16 +97,60 @@ void TargetLoweringObjectFileELF::emitModuleMetadata(
StringRef Section;
GetObjCImageInfo(M, Version, Flags, Section);
- if (Section.empty())
- return;
+ if (!Section.empty()) {
+ auto &C = getContext();
+ auto *S = C.getELFSection(Section, ELF::SHT_PROGBITS, ELF::SHF_ALLOC);
+ Streamer.SwitchSection(S);
+ Streamer.EmitLabel(C.getOrCreateSymbol(StringRef("OBJC_IMAGE_INFO")));
+ Streamer.EmitIntValue(Version, 4);
+ Streamer.EmitIntValue(Flags, 4);
+ Streamer.AddBlankLine();
+ }
- auto &C = getContext();
- auto *S = C.getELFSection(Section, ELF::SHT_PROGBITS, ELF::SHF_ALLOC);
- Streamer.SwitchSection(S);
- Streamer.EmitLabel(C.getOrCreateSymbol(StringRef("OBJC_IMAGE_INFO")));
- Streamer.EmitIntValue(Version, 4);
- Streamer.EmitIntValue(Flags, 4);
- Streamer.AddBlankLine();
+ SmallVector<Module::ModuleFlagEntry, 8> ModuleFlags;
+ M.getModuleFlagsMetadata(ModuleFlags);
+
+ MDNode *CFGProfile = nullptr;
+
+ for (const auto &MFE : ModuleFlags) {
+ StringRef Key = MFE.Key->getString();
+ if (Key == "CFG Profile") {
+ CFGProfile = cast<MDNode>(MFE.Val);
+ break;
+ }
+ }
+
+ if (!CFGProfile)
+ return;
+ /*MCSectionELF *Sec =
+ getContext().getELFSection(".note.llvm.callgraph", ELF::SHT_NOTE, 0);
+ Streamer.SwitchSection(Sec);
+ SmallString<256> Out;
+ for (const auto &Edge : CFGProfile->operands()) {
+ raw_svector_ostream O(Out);
+ MDNode *E = cast<MDNode>(Edge);
+ O << cast<MDString>(E->getOperand(0))->getString() << " "
+ << cast<MDString>(E->getOperand(1))->getString() << " "
+ << cast<ConstantAsMetadata>(E->getOperand(2))
+ ->getValue()
+ ->getUniqueInteger()
+ .getZExtValue()
+ << "\n";
+ Streamer.EmitBytes(O.str());
+ Out.clear();
+ }*/
+ for (const auto &Edge : CFGProfile->operands()) {
+ MDNode *E = cast<MDNode>(Edge);
+ const MCSymbol *From = Streamer.getContext().getOrCreateSymbol(
+ cast<MDString>(E->getOperand(0))->getString());
+ const MCSymbol *To = Streamer.getContext().getOrCreateSymbol(
+ cast<MDString>(E->getOperand(1))->getString());
+ uint64_t Count = cast<ConstantAsMetadata>(E->getOperand(2))
+ ->getValue()
+ ->getUniqueInteger()
+ .getZExtValue();
+ Streamer.emitCGProfileEntry(From, To, Count);
+ }
}
MCSymbol *TargetLoweringObjectFileELF::getCFIPersonalitySymbol(
diff --git a/lib/MC/ELFObjectWriter.cpp b/lib/MC/ELFObjectWriter.cpp
index e11eaaa3060..b54fc1693aa 100644
--- a/lib/MC/ELFObjectWriter.cpp
+++ b/lib/MC/ELFObjectWriter.cpp
@@ -1299,6 +1299,13 @@ void ELFObjectWriter::writeObject(MCAssembler &Asm,
}
}
+ MCSectionELF *CGProfileSection = nullptr;
+ if (!Asm.CGProfile.empty()) {
+ CGProfileSection =
+ Ctx.getELFSection(".note.llvm.cgprofile", ELF::SHT_NOTE, 0, 16, "");
+ SectionIndexMap[CGProfileSection] = addToSectionTable(CGProfileSection);
+ }
+
for (MCSectionELF *Group : Groups) {
align(Group->getAlignment());
@@ -1333,6 +1340,17 @@ void ELFObjectWriter::writeObject(MCAssembler &Asm,
SectionOffsets[RelSection] = std::make_pair(SecStart, SecEnd);
}
+ if (CGProfileSection) {
+ uint64_t SecStart = getStream().tell();
+ for (const MCAssembler::CGProfileEntry &CGPE : Asm.CGProfile) {
+ write32(CGPE.From->getIndex());
+ write32(CGPE.To->getIndex());
+ write64(CGPE.Count);
+ }
+ uint64_t SecEnd = getStream().tell();
+ SectionOffsets[CGProfileSection] = std::make_pair(SecStart, SecEnd);
+ }
+
{
uint64_t SecStart = getStream().tell();
const MCSectionELF *Sec = createStringTable(Ctx);
diff --git a/lib/MC/MCAsmStreamer.cpp b/lib/MC/MCAsmStreamer.cpp
index f48ae84950e..ba94f70a462 100644
--- a/lib/MC/MCAsmStreamer.cpp
+++ b/lib/MC/MCAsmStreamer.cpp
@@ -290,6 +290,9 @@ public:
SMLoc Loc) override;
void EmitWinEHHandlerData(SMLoc Loc) override;
+ void emitCGProfileEntry(const MCSymbol *From, const MCSymbol *To,
+ uint64_t Count) override;
+
void EmitInstruction(const MCInst &Inst, const MCSubtargetInfo &STI,
bool PrintSchedInfo) override;
@@ -1548,6 +1551,16 @@ void MCAsmStreamer::EmitWinCFIEndProlog(SMLoc Loc) {
EmitEOL();
}
+void MCAsmStreamer::emitCGProfileEntry(const MCSymbol *From, const MCSymbol *To,
+ uint64_t Count) {
+ OS << "\t.cg_profile ";
+ From->print(OS, MAI);
+ OS << ", ";
+ To->print(OS, MAI);
+ OS << ", " << Count;
+ EmitEOL();
+}
+
void MCAsmStreamer::AddEncodingComment(const MCInst &Inst,
const MCSubtargetInfo &STI,
bool PrintSchedInfo) {
diff --git a/lib/MC/MCELFStreamer.cpp b/lib/MC/MCELFStreamer.cpp
index 366125962a5..292e365b058 100644
--- a/lib/MC/MCELFStreamer.cpp
+++ b/lib/MC/MCELFStreamer.cpp
@@ -365,6 +365,11 @@ void MCELFStreamer::EmitValueToAlignment(unsigned ByteAlignment,
ValueSize, MaxBytesToEmit);
}
+void MCELFStreamer::emitCGProfileEntry(const MCSymbol *From, const MCSymbol *To,
+ uint64_t Count) {
+ getAssembler().CGProfile.push_back({From, To, Count});
+}
+
void MCELFStreamer::EmitIdent(StringRef IdentString) {
MCSection *Comment = getAssembler().getContext().getELFSection(
".comment", ELF::SHT_PROGBITS, ELF::SHF_MERGE | ELF::SHF_STRINGS, 1, "");
diff --git a/lib/MC/MCParser/ELFAsmParser.cpp b/lib/MC/MCParser/ELFAsmParser.cpp
index 38720c23ff2..3a62a49968a 100644
--- a/lib/MC/MCParser/ELFAsmParser.cpp
+++ b/lib/MC/MCParser/ELFAsmParser.cpp
@@ -85,6 +85,7 @@ public:
addDirectiveHandler<
&ELFAsmParser::ParseDirectiveSymbolAttribute>(".hidden");
addDirectiveHandler<&ELFAsmParser::ParseDirectiveSubsection>(".subsection");
+ addDirectiveHandler<&ELFAsmParser::ParseDirectiveCGProfile>(".cg_profile");
}
// FIXME: Part of this logic is duplicated in the MCELFStreamer. What is
@@ -149,6 +150,7 @@ public:
bool ParseDirectiveWeakref(StringRef, SMLoc);
bool ParseDirectiveSymbolAttribute(StringRef, SMLoc);
bool ParseDirectiveSubsection(StringRef, SMLoc);
+ bool ParseDirectiveCGProfile(StringRef, SMLoc);
private:
bool ParseSectionName(StringRef &SectionName);
@@ -838,6 +840,40 @@ bool ELFAsmParser::ParseDirectiveSubsection(StringRef, SMLoc) {
return false;
}
+/// ParseDirectiveCGProfile
+/// ::= .cg_profile identifier, identifier, <number>
+bool ELFAsmParser::ParseDirectiveCGProfile(StringRef, SMLoc) {
+ StringRef From;
+ if (getParser().parseIdentifier(From))
+ return TokError("expected identifier in directive");
+
+ if (getLexer().isNot(AsmToken::Comma))
+ return TokError("expected a comma");
+ Lex();
+
+ StringRef To;
+ if (getParser().parseIdentifier(To))
+ return TokError("expected identifier in directive");
+
+ if (getLexer().isNot(AsmToken::Comma))
+ return TokError("expected a comma");
+ Lex();
+
+ int64_t Count;
+ if (getParser().parseIntToken(
+ Count, "expected integer count in '.cg_profile' directive"))
+ return true;
+
+ if (getLexer().isNot(AsmToken::EndOfStatement))
+ return TokError("unexpected token in directive");
+
+ MCSymbol *FromSym = getContext().getOrCreateSymbol(From);
+ MCSymbol *ToSym = getContext().getOrCreateSymbol(To);
+
+ getStreamer().emitCGProfileEntry(FromSym, ToSym, Count);
+ return false;
+}
+
namespace llvm {
MCAsmParserExtension *createELFAsmParser() {
diff --git a/lib/MC/MCStreamer.cpp b/lib/MC/MCStreamer.cpp
index 4067df0eaf5..e6a00fb46e4 100644
--- a/lib/MC/MCStreamer.cpp
+++ b/lib/MC/MCStreamer.cpp
@@ -639,6 +639,10 @@ void MCStreamer::EmitWinEHHandlerData(SMLoc Loc) {
getContext().reportError(Loc, "Chained unwind areas can't have handlers!");
}
+void MCStreamer::emitCGProfileEntry(const MCSymbol *From, const MCSymbol *To,
+ uint64_t Count) {
+}
+
static MCSection *getWinCFISection(MCContext &Context, unsigned *NextWinCFIID,
MCSection *MainCFISec,
const MCSection *TextSec) {
diff --git a/lib/Transforms/IPO/PassManagerBuilder.cpp b/lib/Transforms/IPO/PassManagerBuilder.cpp
index 4fa780fa22c..38da7b8ba81 100644
--- a/lib/Transforms/IPO/PassManagerBuilder.cpp
+++ b/lib/Transforms/IPO/PassManagerBuilder.cpp
@@ -654,6 +654,8 @@ void PassManagerBuilder::populateModulePassManager(
MPM.add(createConstantMergePass()); // Merge dup global constants
}
+ MPM.add(createCFGProfilePass());
+
if (MergeFunctions)
MPM.add(createMergeFunctionsPass());
diff --git a/lib/Transforms/Instrumentation/CFGProfile.cpp b/lib/Transforms/Instrumentation/CFGProfile.cpp
new file mode 100644
index 00000000000..6aa76d35a24
--- /dev/null
+++ b/lib/Transforms/Instrumentation/CFGProfile.cpp
@@ -0,0 +1,103 @@
+//===-- CFGProfile.cpp ----------------------------------------------------===//
+//
+// The LLVM Compiler Infrastructure
+//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/Analysis/BlockFrequencyInfo.h"
+#include "llvm/Analysis/BranchProbabilityInfo.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/Instructions.h"
+#include "llvm/IR/MDBuilder.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/Transforms/Instrumentation.h"
+
+#include <array>
+
+using namespace llvm;
+
+class CFGProfilePass : public ModulePass {
+public:
+ static char ID;
+
+ CFGProfilePass() : ModulePass(ID) {
+ initializeCFGProfilePassPass(
+ *PassRegistry::getPassRegistry());
+ }
+
+ StringRef getPassName() const override { return "CFGProfilePass"; }
+
+private:
+ bool runOnModule(Module &M) override;
+
+ void getAnalysisUsage(AnalysisUsage &AU) const override {
+ AU.addRequired<BlockFrequencyInfoWrapperPass>();
+ AU.addRequired<BranchProbabilityInfoWrapperPass>();
+ }
+};
+
+bool CFGProfilePass::runOnModule(Module &M) {
+ if (skipModule(M))
+ return false;
+
+ llvm::DenseMap<std::pair<StringRef, StringRef>, uint64_t> Counts;
+
+ for (auto &F : M) {
+ if (F.isDeclaration())
+ continue;
+ getAnalysis<BranchProbabilityInfoWrapperPass>(F).getBPI();
+ auto &BFI = getAnalysis<BlockFrequencyInfoWrapperPass>(F).getBFI();
+ for (auto &BB : F) {
+ Optional<uint64_t> BBCount = BFI.getBlockProfileCount(&BB);
+ if (!BBCount)
+ continue;
+ for (auto &I : BB) {
+ auto *CI = dyn_cast<CallInst>(&I);
+ if (!CI)
+ continue;
+ Function *CalledF = CI->getCalledFunction();
+ if (!CalledF || CalledF->isIntrinsic())
+ continue;
+
+ uint64_t &Count =
+ Counts[std::make_pair(F.getName(), CalledF->getName())];
+ Count = SaturatingAdd(Count, *BBCount);
+ }
+ }
+ }
+
+ if (Counts.empty())
+ return false;
+
+ LLVMContext &Context = M.getContext();
+ MDBuilder MDB(Context);
+ std::vector<Metadata *> Nodes;
+
+ for (auto E : Counts) {
+ SmallVector<Metadata *, 3> Vals;
+ Vals.push_back(MDB.createString(E.first.first));
+ Vals.push_back(MDB.createString(E.first.second));
+ Vals.push_back(MDB.createConstant(
+ ConstantInt::get(Type::getInt64Ty(Context), E.second)));
+ Nodes.push_back(MDNode::get(Context, Vals));
+ }
+
+ M.addModuleFlag(Module::Append, "CFG Profile", MDNode::get(Context, Nodes));
+
+ return true;
+}
+
+char CFGProfilePass::ID = 0;
+INITIALIZE_PASS_BEGIN(CFGProfilePass, "cfg-profile",
+ "Generate profile information from the CFG.", false, false)
+ INITIALIZE_PASS_DEPENDENCY(BlockFrequencyInfoWrapperPass)
+ INITIALIZE_PASS_DEPENDENCY(BranchProbabilityInfoWrapperPass)
+ INITIALIZE_PASS_END(CFGProfilePass, "cfg-profile",
+ "Generate profile information from the CFG.", false, false)
+
+ModulePass *llvm::createCFGProfilePass() {
+ return new CFGProfilePass();
+}
diff --git a/lib/Transforms/Instrumentation/CMakeLists.txt b/lib/Transforms/Instrumentation/CMakeLists.txt
index f2806e278e6..9b33edf0631 100644
--- a/lib/Transforms/Instrumentation/CMakeLists.txt
+++ b/lib/Transforms/Instrumentation/CMakeLists.txt
@@ -1,6 +1,7 @@
add_llvm_library(LLVMInstrumentation
AddressSanitizer.cpp
BoundsChecking.cpp
+ CFGProfile.cpp
DataFlowSanitizer.cpp
GCOVProfiling.cpp
MemorySanitizer.cpp
diff --git a/lib/Transforms/Instrumentation/Instrumentation.cpp b/lib/Transforms/Instrumentation/Instrumentation.cpp
index 7bb62d2c845..d147a521683 100644
--- a/lib/Transforms/Instrumentation/Instrumentation.cpp
+++ b/lib/Transforms/Instrumentation/Instrumentation.cpp
@@ -60,6 +60,7 @@ void llvm::initializeInstrumentation(PassRegistry &Registry) {
initializeAddressSanitizerModulePass(Registry);
initializeBoundsCheckingPass(Registry);
initializeGCOVProfilerLegacyPassPass(Registry);
+ initializeCFGProfilePassPass(Registry);
initializePGOInstrumentationGenLegacyPassPass(Registry);
initializePGOInstrumentationUseLegacyPassPass(Registry);
initializePGOIndirectCallPromotionLegacyPassPass(Registry);
diff --git a/tools/llvm-readobj/ELFDumper.cpp b/tools/llvm-readobj/ELFDumper.cpp
index be976ca8826..d7dd759051a 100644
--- a/tools/llvm-readobj/ELFDumper.cpp
+++ b/tools/llvm-readobj/ELFDumper.cpp
@@ -97,6 +97,7 @@ using namespace ELF;
using Elf_Vernaux = typename ELFO::Elf_Vernaux; \
using Elf_Verdef = typename ELFO::Elf_Verdef; \
using Elf_Verdaux = typename ELFO::Elf_Verdaux; \
+ using Elf_CGProfile = typename ELFT::CGProfile; \
using uintX_t = typename ELFO::uintX_t;
namespace {
@@ -161,6 +162,8 @@ public:
void printHashHistogram() override;
+ void printCGProfile() override;
+
void printNotes() override;
private:
@@ -205,6 +208,7 @@ private:
const Elf_Hash *HashTable = nullptr;
const Elf_GnuHash *GnuHashTable = nullptr;
const Elf_Shdr *DotSymtabSec = nullptr;
+ const Elf_Shdr *DotCGProfileSec = nullptr;
StringRef DynSymtabName;
ArrayRef<Elf_Word> ShndxTable;
@@ -249,9 +253,11 @@ public:
Elf_Rela_Range dyn_relas() const;
std::string getFullSymbolName(const Elf_Sym *Symbol, StringRef StrTable,
bool IsDynamic) const;
+ StringRef getStaticSymbolName(uint32_t Index) const;
void printSymbolsHelper(bool IsDynamic) const;
const Elf_Shdr *getDotSymtabSec() const { return DotSymtabSec; }
+ const Elf_Shdr *getDotCGProfileSec() const { return DotCGProfileSec; }
ArrayRef<Elf_Word> getShndxTable() const { return ShndxTable; }
StringRef getDynamicStringTable() const { return DynamicStringTable; }
const DynRegionInfo &getDynRelRegion() const { return DynRelRegion; }
@@ -309,6 +315,7 @@ public:
bool IsDynamic) = 0;
virtual void printProgramHeaders(const ELFFile<ELFT> *Obj) = 0;
virtual void printHashHistogram(const ELFFile<ELFT> *Obj) = 0;
+ virtual void printCGProfile(const ELFFile<ELFT> *Obj) = 0;
virtual void printNotes(const ELFFile<ELFT> *Obj) = 0;
const ELFDumper<ELFT> *dumper() const { return Dumper; }
@@ -336,6 +343,7 @@ public:
size_t Offset) override;
void printProgramHeaders(const ELFO *Obj) override;
void printHashHistogram(const ELFFile<ELFT> *Obj) override;
+ void printCGProfile(const ELFFile<ELFT> *Obj) override;
void printNotes(const ELFFile<ELFT> *Obj) override;
private:
@@ -394,6 +402,7 @@ public:
void printDynamicRelocations(const ELFO *Obj) override;
void printProgramHeaders(const ELFO *Obj) override;
void printHashHistogram(const ELFFile<ELFT> *Obj) override;
+ void printCGProfile(const ELFFile<ELFT> *Obj) override;
void printNotes(const ELFFile<ELFT> *Obj) override;
private:
@@ -734,6 +743,16 @@ std::string ELFDumper<ELFT>::getFullSymbolName(const Elf_Sym *Symbol,
return FullSymbolName;
}
+template <typename ELFT>
+StringRef ELFDumper<ELFT>::getStaticSymbolName(uint32_t Index) const {
+ StringRef StrTable = unwrapOrError(Obj->getStringTableForSymtab(*DotSymtabSec));
+ Elf_Sym_Range Syms = unwrapOrError(Obj->symbols(DotSymtabSec));
+ if (Index >= Syms.size())
+ reportError("Invalid symbol index");
+ const Elf_Sym *Sym = &Syms[Index];
+ return unwrapOrError(Sym->getName(StrTable));
+}
+
template <typename ELFT>
static void
getSectionNameIndex(const ELFFile<ELFT> &Obj, const typename ELFT::Sym *Symbol,
@@ -1341,6 +1360,12 @@ ELFDumper<ELFT>::ELFDumper(const ELFFile<ELFT> *Obj, ScopedPrinter &Writer)
reportError("Multiple SHT_GNU_verneed");
dot_gnu_version_r_sec = &Sec;
break;
+ case ELF::SHT_NOTE:
+ if (unwrapOrError(Obj->getSectionName(&Sec)) != ".note.llvm.cgprofile")
+ break;
+ if (DotCGProfileSec != nullptr)
+ reportError("Multiple .note.llvm.cgprofile");
+ DotCGProfileSec = &Sec;
}
}
@@ -1482,6 +1507,10 @@ template <class ELFT> void ELFDumper<ELFT>::printHashHistogram() {
ELFDumperStyle->printHashHistogram(Obj);
}
+template <class ELFT> void ELFDumper<ELFT>::printCGProfile() {
+ ELFDumperStyle->printCGProfile(Obj);
+}
+
template <class ELFT> void ELFDumper<ELFT>::printNotes() {
ELFDumperStyle->printNotes(Obj);
}
@@ -3344,6 +3373,11 @@ void GNUStyle<ELFT>::printHashHistogram(const ELFFile<ELFT> *Obj) {
}
}
+template <class ELFT>
+void GNUStyle<ELFT>::printCGProfile(const ELFFile<ELFT> *Obj) {
+ OS << "GNUStyle::printCGProfile not implemented\n";
+}
+
static std::string getGNUNoteTypeName(const uint32_t NT) {
static const struct {
uint32_t ID;
@@ -3937,6 +3971,22 @@ void LLVMStyle<ELFT>::printHashHistogram(const ELFFile<ELFT> *Obj) {
W.startLine() << "Hash Histogram not implemented!\n";
}
+
+
+template <class ELFT>
+void LLVMStyle<ELFT>::printCGProfile(const ELFFile<ELFT> *Obj) {
+ ListScope L(W, "CGProfile");
+ if (!this->dumper()->getDotCGProfileSec())
+ return;
+ auto CGProfile = unwrapOrError(Obj->template getSectionContentsAsArray<Elf_CGProfile>(this->dumper()->getDotCGProfileSec()));
+ for (const Elf_CGProfile &CGPE : CGProfile) {
+ DictScope D(W, "CGProfileEntry");
+ W.printNumber("From", this->dumper()->getStaticSymbolName(CGPE.cgp_from), CGPE.cgp_from);
+ W.printNumber("To", this->dumper()->getStaticSymbolName(CGPE.cgp_to), CGPE.cgp_to);
+ W.printNumber("Weight", CGPE.cgp_weight);
+ }
+}
+
template <class ELFT>
void LLVMStyle<ELFT>::printNotes(const ELFFile<ELFT> *Obj) {
W.startLine() << "printNotes not implemented!\n";
diff --git a/tools/llvm-readobj/ObjDumper.h b/tools/llvm-readobj/ObjDumper.h
index f283e559e2a..84c259e3fb4 100644
--- a/tools/llvm-readobj/ObjDumper.h
+++ b/tools/llvm-readobj/ObjDumper.h
@@ -47,6 +47,7 @@ public:
virtual void printVersionInfo() {}
virtual void printGroupSections() {}
virtual void printHashHistogram() {}
+ virtual void printCGProfile() {}
virtual void printNotes() {}
// Only implemented for ARM ELF at this time.
diff --git a/tools/llvm-readobj/llvm-readobj.cpp b/tools/llvm-readobj/llvm-readobj.cpp
index 05b7c800cc1..1eb39b1bb3a 100644
--- a/tools/llvm-readobj/llvm-readobj.cpp
+++ b/tools/llvm-readobj/llvm-readobj.cpp
@@ -284,6 +284,8 @@ namespace opts {
cl::alias HashHistogramShort("I", cl::desc("Alias for -elf-hash-histogram"),
cl::aliasopt(HashHistogram));
+ cl::opt<bool> CGProfile("elf-cg-profile", cl::desc("Display callgraph profile section"));
+
cl::opt<OutputStyleTy>
Output("elf-output-style", cl::desc("Specify ELF dump style"),
cl::values(clEnumVal(LLVM, "LLVM default style"),
@@ -439,6 +441,8 @@ static void dumpObject(const ObjectFile *Obj) {
Dumper->printGroupSections();
if (opts::HashHistogram)
Dumper->printHashHistogram();
+ if (opts::CGProfile)
+ Dumper->printCGProfile();
if (opts::Notes)
Dumper->printNotes();
}
-------------- next part --------------
diff --git a/ELF/CMakeLists.txt b/ELF/CMakeLists.txt
index 205702975..3b3c388d2 100644
--- a/ELF/CMakeLists.txt
+++ b/ELF/CMakeLists.txt
@@ -18,6 +18,7 @@ add_lld_library(lldELF
Arch/SPARCV9.cpp
Arch/X86.cpp
Arch/X86_64.cpp
+ CallGraphSort.cpp
Driver.cpp
DriverUtils.cpp
EhFrame.cpp
diff --git a/ELF/CallGraphSort.cpp b/ELF/CallGraphSort.cpp
new file mode 100644
index 000000000..3e4e12867
--- /dev/null
+++ b/ELF/CallGraphSort.cpp
@@ -0,0 +1,276 @@
+//===- CallGraphSort.cpp --------------------------------------------------===//
+//
+// The LLVM Linker
+//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+//===----------------------------------------------------------------------===//
+///
+/// \file This file implements Call-Chain Clustering from:
+/// Optimizing Function Placement for Large-Scale Data-Center Applications
+/// https://research.fb.com/wp-content/uploads/2017/01/cgo2017-hfsort-final1.pdf
+///
+/// The goal of this algorithm is to improve runtime performance of the final
+/// executable by arranging code sections such that page table and i-cache
+/// misses are minimized.
+///
+/// It does so given a call graph profile by the following:
+/// * Build a call graph from the profile
+/// * While there are unresolved edges
+/// * Find the edge with the highest weight
+/// * Check if merging the two clusters would create a cluster larger than the
+/// target page size
+/// * If not, contract that edge putting the callee after the caller
+/// * Sort remaining clusters by density
+///
+//===----------------------------------------------------------------------===//
+
+#include "CallGraphSort.h"
+#include "SymbolTable.h"
+#include "Target.h"
+
+#include "llvm/Support/MathExtras.h"
+
+#include <queue>
+#include <unordered_set>
+
+using namespace llvm;
+using namespace lld;
+using namespace lld::elf;
+
+namespace {
+class CallGraphSort {
+ using NodeIndex = std::ptrdiff_t;
+ using EdgeIndex = std::ptrdiff_t;
+
+ struct Node {
+ Node() = default;
+ Node(const InputSectionBase *IS);
+ std::vector<const InputSectionBase *> Sections;
+ std::vector<EdgeIndex> IncidentEdges;
+ int64_t Size = 0;
+ uint64_t Weight = 0;
+ };
+
+ struct Edge {
+ NodeIndex From;
+ NodeIndex To;
+ mutable uint64_t Weight;
+ bool operator==(const Edge Other) const;
+ bool operator<(const Edge Other) const;
+ void kill();
+ bool isDead() const;
+ };
+
+ std::vector<Node> Nodes;
+ std::vector<Edge> Edges;
+ struct EdgePriorityCmp {
+ std::vector<Edge> &Edges;
+ bool operator()(EdgeIndex A, EdgeIndex B) const {
+ return Edges[A].Weight < Edges[B].Weight;
+ }
+ };
+ std::priority_queue<EdgeIndex, std::vector<EdgeIndex>, EdgePriorityCmp>
+ WorkQueue{EdgePriorityCmp{Edges}};
+
+ void contractEdge(EdgeIndex CEI);
+ void generateClusters();
+
+public:
+ CallGraphSort(DenseMap<std::pair<const SymbolBody *, const SymbolBody *>,
+ uint64_t> &Profile);
+
+ DenseMap<const InputSectionBase *, int> run();
+};
+} // end anonymous namespace
+
+CallGraphSort::Node::Node(const InputSectionBase *IS) {
+ Sections.push_back(IS);
+ Size = IS->getSize();
+}
+
+bool CallGraphSort::Edge::operator==(const Edge Other) const {
+ return From == Other.From && To == Other.To;
+}
+
+bool CallGraphSort::Edge::operator<(const Edge Other) const {
+ if (From != Other.From)
+ return From < Other.From;
+ if (To != Other.To)
+ return To < Other.To;
+ return false;
+}
+
+void CallGraphSort::Edge::kill() {
+ From = 0;
+ To = 0;
+}
+
+bool CallGraphSort::Edge::isDead() const { return From == 0 && To == 0; }
+
+// Take the edge list in Config->CallGraphProfile, resolve symbol names to
+// SymbolBodys, and generate a graph between InputSections with the provided
+// weights.
+CallGraphSort::CallGraphSort(
+ DenseMap<std::pair<const SymbolBody *, const SymbolBody *>, uint64_t>
+ &Profile) {
+ DenseMap<const InputSectionBase *, NodeIndex> SecToNode;
+ std::map<Edge, EdgeIndex> EdgeMap;
+
+ auto GetOrCreateNode = [&](const InputSectionBase *IS) -> NodeIndex {
+ auto Res = SecToNode.insert(std::make_pair(IS, Nodes.size()));
+ if (Res.second)
+ Nodes.emplace_back(IS);
+ return Res.first->second;
+ };
+
+ // Create the graph.
+ for (const auto &C : Profile) {
+ if (C.second == 0)
+ continue;
+ auto FromDR = dyn_cast_or_null<DefinedRegular>(C.first.first);
+ auto ToDR = dyn_cast_or_null<DefinedRegular>(C.first.second);
+ if (!FromDR || !ToDR)
+ continue;
+ auto FromSB = dyn_cast_or_null<const InputSectionBase>(FromDR->Section);
+ auto ToSB = dyn_cast_or_null<const InputSectionBase>(ToDR->Section);
+ if (!FromSB || !ToSB || FromSB->getSize() == 0 || ToSB->getSize() == 0)
+ continue;
+ NodeIndex From = GetOrCreateNode(FromSB);
+ NodeIndex To = GetOrCreateNode(ToSB);
+ Edge E{From, To, C.second};
+ auto Res = EdgeMap.insert(std::make_pair(E, Edges.size()));
+ EdgeIndex EI = Res.first->second;
+ if (Res.second) {
+ Edges.push_back(E);
+ Nodes[From].IncidentEdges.push_back(EI);
+ Nodes[To].IncidentEdges.push_back(EI);
+ } else
+ Edges[EI].Weight = SaturatingAdd(Edges[EI].Weight, C.second);
+ Nodes[To].Weight = SaturatingAdd(Nodes[To].Weight, C.second);
+ }
+}
+
+void CallGraphSort::contractEdge(EdgeIndex CEI) {
+ // Make a copy of the edge as the original will be marked killed while being
+ // used.
+ Edge CE = Edges[CEI];
+ std::vector<EdgeIndex> &FE = Nodes[CE.From].IncidentEdges;
+ // Remove the contracted edge from From; it would otherwise become a self edge.
+ FE.erase(std::remove(FE.begin(), FE.end(), CEI), FE.end());
+ std::vector<EdgeIndex> &TE = Nodes[CE.To].IncidentEdges;
+ // Update all edges incident with To to reference From instead. Then if they
+ // aren't self edges add them to From.
+ for (EdgeIndex EI : TE) {
+ Edge &E = Edges[EI];
+ // E.From = E.From == CE.To ? CE.From : E.From;
+ // E.To = E.To == CE.To ? CE.From : E.To;
+ if (E.From == CE.To)
+ E.From = CE.From;
+ if (E.To == CE.To)
+ E.To = CE.From;
+ if (E.To == E.From) {
+ E.kill();
+ continue;
+ }
+ FE.push_back(EI);
+ }
+
+ // Free memory.
+ std::vector<EdgeIndex>().swap(TE);
+
+ if (FE.empty())
+ return;
+
+ // Sort edges so they can be merged. The stability of this sort doesn't matter
+ // as equal edges will be merged in an order-independent manner.
+ std::sort(FE.begin(), FE.end(),
+ [&](EdgeIndex AI, EdgeIndex BI) { return Edges[AI] < Edges[BI]; });
+
+ // std::unique, but also merge equal values.
+ auto First = FE.begin();
+ auto Last = FE.end();
+ auto Result = First;
+ while (++First != Last) {
+ if (Edges[*Result] == Edges[*First]) {
+ Edges[*Result].Weight =
+ SaturatingAdd(Edges[*Result].Weight, Edges[*First].Weight);
+ Edges[*First].kill();
+ // Add the updated edge to the work queue without removing the previous
+ // entry. Edges will never be contracted twice as they are marked as dead.
+ WorkQueue.push(*Result);
+ } else if (++Result != First)
+ *Result = *First;
+ }
+ FE.erase(++Result, FE.end());
+}
+
+// Group InputSections into clusters using the Call-Chain Clustering heuristic
+// then sort the clusters by density.
+void CallGraphSort::generateClusters() {
+ for (size_t I = 0; I < Edges.size(); ++I)
+ WorkQueue.push(I);
+ // Collapse the graph.
+ while (!WorkQueue.empty()) {
+ EdgeIndex MaxI = WorkQueue.top();
+ const Edge MaxE = Edges[MaxI];
+ WorkQueue.pop();
+ if (MaxE.isDead())
+ continue;
+ // Merge the Nodes.
+ Node &From = Nodes[MaxE.From];
+ Node &To = Nodes[MaxE.To];
+ if (From.Size + To.Size > Target->PageSize)
+ continue;
+ contractEdge(MaxI);
+ From.Sections.insert(From.Sections.end(), To.Sections.begin(),
+ To.Sections.end());
+ From.Size += To.Size;
+ From.Weight = SaturatingAdd(From.Weight, To.Weight);
+ To.Sections.clear();
+ To.Size = 0;
+ To.Weight = 0;
+ }
+
+ // Remove empty or dead nodes.
+ Nodes.erase(std::remove_if(Nodes.begin(), Nodes.end(),
+ [](const Node &N) {
+ return N.Size == 0 || N.Sections.empty();
+ }),
+ Nodes.end());
+
+ // Sort by density. Invalidates all NodeIndexs.
+ std::sort(Nodes.begin(), Nodes.end(), [](const Node &A, const Node &B) {
+ return (APFloat(APFloat::IEEEdouble(), A.Weight) /
+ APFloat(APFloat::IEEEdouble(), A.Size))
+ .compare(APFloat(APFloat::IEEEdouble(), B.Weight) /
+ APFloat(APFloat::IEEEdouble(), B.Size)) ==
+ APFloat::cmpLessThan;
+ });
+}
+
+DenseMap<const InputSectionBase *, int> CallGraphSort::run() {
+ generateClusters();
+
+ // Generate order.
+ llvm::DenseMap<const InputSectionBase *, int> OrderMap;
+ ssize_t CurOrder = 1;
+
+ for (const Node &N : Nodes)
+ for (const InputSectionBase *IS : N.Sections)
+ OrderMap[IS] = CurOrder++;
+
+ return OrderMap;
+}
+
+// Sort sections by the profile data provided by -call-graph-profile-file
+//
+// This first builds a call graph based on the profile data then iteratively
+// merges the hottest call edges as long as it would not create a cluster larger
+// than the page size. All clusters are then sorted by a density metric to
+// further improve locality.
+DenseMap<const InputSectionBase *, int> elf::computeCallGraphProfileOrder() {
+ CallGraphSort CGS(Config->CallGraphProfile);
+ return CGS.run();
+}
diff --git a/ELF/CallGraphSort.h b/ELF/CallGraphSort.h
new file mode 100644
index 000000000..46455489c
--- /dev/null
+++ b/ELF/CallGraphSort.h
@@ -0,0 +1,24 @@
+//===- CallGraphSort.h ------------------------------------------*- C++ -*-===//
+//
+// The LLVM Linker
+//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLD_ELF_CALL_GRAPH_SORT_H
+#define LLD_ELF_CALL_GRAPH_SORT_H
+
+#include "llvm/ADT/DenseMap.h"
+
+namespace lld {
+namespace elf {
+class InputSectionBase;
+
+llvm::DenseMap<const InputSectionBase *, int>
+computeCallGraphProfileOrder();
+}
+}
+
+#endif
diff --git a/ELF/Config.h b/ELF/Config.h
index d5d830921..5ac01a3ee 100644
--- a/ELF/Config.h
+++ b/ELF/Config.h
@@ -10,6 +10,7 @@
#ifndef LLD_ELF_CONFIG_H
#define LLD_ELF_CONFIG_H
+#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/MapVector.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/ADT/StringSet.h"
@@ -24,7 +25,7 @@ namespace lld {
namespace elf {
class InputFile;
-struct Symbol;
+class SymbolBody;
enum ELFKind {
ELFNoneKind,
@@ -89,6 +90,7 @@ struct Configuration {
llvm::StringRef SoName;
llvm::StringRef Sysroot;
llvm::StringRef ThinLTOCacheDir;
+ llvm::StringRef CallGraphProfileFile;
std::string Rpath;
std::vector<VersionDefinition> VersionDefinitions;
std::vector<llvm::StringRef> Argv;
@@ -101,10 +103,13 @@ struct Configuration {
std::vector<SymbolVersion> VersionScriptGlobals;
std::vector<SymbolVersion> VersionScriptLocals;
std::vector<uint8_t> BuildIdVector;
+ llvm::DenseMap<std::pair<const SymbolBody *, const SymbolBody *>, uint64_t>
+ CallGraphProfile;
bool AllowMultipleDefinition;
bool AsNeeded = false;
bool Bsymbolic;
bool BsymbolicFunctions;
+ bool CallGraphProfileSort = true;
bool ColorDiagnostics = false;
bool CompressDebugSections;
bool DefineCommon;
diff --git a/ELF/Driver.cpp b/ELF/Driver.cpp
index af56b2ba3..f5aebfa7e 100644
--- a/ELF/Driver.cpp
+++ b/ELF/Driver.cpp
@@ -601,6 +601,42 @@ static std::vector<StringRef> getLines(MemoryBufferRef MB) {
return Ret;
}
+// This reads a list of call edges with weights one line at a time from a file
+// with the following format for each line:
+//
+// ^[^ ]+ [^ ]+ [^ ]+$
+//
+// It interprets the first value as an unsigned 64 bit weight, the second as
+// the symbol the call is from, and the third as the symbol the call is to.
+//
+// Example:
+//
+// 5000 c a
+// 4000 c b
+// 18446744073709551615 e d
+//
+template <typename ELFT>
+void readCallGraphProfile(MemoryBufferRef MB) {
+ for (StringRef L : getLines(MB)) {
+ SmallVector<StringRef, 3> Fields;
+ L.split(Fields, ' ');
+ if (Fields.size() != 3) {
+ error("parse error: " + MB.getBufferIdentifier() + ": " + L);
+ return;
+ }
+ uint64_t Count;
+ if (!to_integer(Fields[0], Count)) {
+ error("parse error: " + MB.getBufferIdentifier() + ": " + L);
+ return;
+ }
+ StringRef From = Fields[1];
+ StringRef To = Fields[2];
+ Config->CallGraphProfile[std::make_pair(
+ Symtab->addUndefined<ELFT>(From)->body(),
+ Symtab->addUndefined<ELFT>(To)->body())] = Count;
+ }
+}
+
static bool getCompressDebugSections(opt::InputArgList &Args) {
StringRef S = Args.getLastArgValue(OPT_compress_debug_sections, "none");
if (S == "none")
@@ -626,6 +662,8 @@ void LinkerDriver::readConfigs(opt::InputArgList &Args) {
Config->AuxiliaryList = getArgs(Args, OPT_auxiliary);
Config->Bsymbolic = Args.hasArg(OPT_Bsymbolic);
Config->BsymbolicFunctions = Args.hasArg(OPT_Bsymbolic_functions);
+ Config->CallGraphProfileSort = getArg(Args, OPT_call_graph_profile_sort,
+ OPT_no_call_graph_profile_sort, true);
Config->Chroot = Args.getLastArgValue(OPT_chroot);
Config->CompressDebugSections = getCompressDebugSections(Args);
Config->DefineCommon = getArg(Args, OPT_define_common, OPT_no_define_common,
@@ -768,6 +806,9 @@ void LinkerDriver::readConfigs(opt::InputArgList &Args) {
if (Optional<MemoryBufferRef> Buffer = readFile(Arg->getValue()))
Config->SymbolOrderingFile = getLines(*Buffer);
+ if (auto *Arg = Args.getLastArg(OPT_call_graph_profile_file))
+ Config->CallGraphProfileFile = Arg->getValue();
+
// If --retain-symbol-file is used, we'll keep only the symbols listed in
// the file and discard all others.
if (auto *Arg = Args.getLastArg(OPT_retain_symbols_file)) {
@@ -1033,6 +1074,11 @@ template <class ELFT> void LinkerDriver::link(opt::InputArgList &Args) {
Config->HasDynSymTab =
!SharedFiles.empty() || Config->Pic || Config->ExportDynamic;
+ if (!Config->CallGraphProfileFile.empty())
+ if (Optional<MemoryBufferRef> Buffer =
+ readFile(Config->CallGraphProfileFile))
+ readCallGraphProfile<ELFT>(*Buffer);
+
// Some symbols (such as __ehdr_start) are defined lazily only when there
// are undefined symbols for them, so we add these to trigger that logic.
for (StringRef Sym : Script->ReferencedSymbols)
diff --git a/ELF/InputFiles.cpp b/ELF/InputFiles.cpp
index fb9559131..1b6b82e0a 100644
--- a/ELF/InputFiles.cpp
+++ b/ELF/InputFiles.cpp
@@ -112,6 +112,15 @@ std::string ObjFile<ELFT>::getLineInfo(InputSectionBase *S, uint64_t Offset) {
return "";
}
+template<class ELFT>
+void lld::elf::ObjFile<ELFT>::parseCGProfile() {
+ for (const Elf_CGProfile &CGPE : CGProfile) {
+ uint64_t &C = Config->CallGraphProfile[std::make_pair(
+ &getSymbolBody(CGPE.cgp_from), &getSymbolBody(CGPE.cgp_to))];
+ C = std::max(C, (uint64_t)CGPE.cgp_weight);
+ }
+}
+
// Returns "<internal>", "foo.a(bar.o)" or "baz.o".
std::string lld::toString(const InputFile *F) {
if (!F)
@@ -177,6 +186,7 @@ void ObjFile<ELFT>::parse(DenseSet<CachedHashStringRef> &ComdatGroups) {
// Read section and symbol tables.
initializeSections(ComdatGroups);
initializeSymbols();
+ parseCGProfile();
}
// Sections with SHT_GROUP and comdat bits define comdat section groups.
@@ -496,6 +506,13 @@ InputSectionBase *ObjFile<ELFT>::createInputSection(const Elf_Shdr &Sec) {
if (Name == ".eh_frame" && !Config->Relocatable)
return make<EhInputSection>(this, &Sec, Name);
+ // Profile data.
+ if (Name == ".note.llvm.cgprofile") {
+ CGProfile = check(
+ this->getObj().template getSectionContentsAsArray<Elf_CGProfile>(&Sec));
+ return &InputSection::Discarded;
+ }
+
if (shouldMerge(Sec))
return make<MergeInputSection>(this, &Sec, Name);
return make<InputSection>(this, &Sec, Name);
diff --git a/ELF/InputFiles.h b/ELF/InputFiles.h
index 00c8ee936..17692299c 100644
--- a/ELF/InputFiles.h
+++ b/ELF/InputFiles.h
@@ -154,6 +154,7 @@ template <class ELFT> class ObjFile : public ELFFileBase<ELFT> {
typedef typename ELFT::Sym Elf_Sym;
typedef typename ELFT::Shdr Elf_Shdr;
typedef typename ELFT::Word Elf_Word;
+ typedef typename ELFT::CGProfile Elf_CGProfile;
StringRef getShtGroupSignature(ArrayRef<Elf_Shdr> Sections,
const Elf_Shdr &Sec);
@@ -201,6 +202,7 @@ private:
initializeSections(llvm::DenseSet<llvm::CachedHashStringRef> &ComdatGroups);
void initializeSymbols();
void initializeDwarfLine();
+ void parseCGProfile();
InputSectionBase *getRelocTarget(const Elf_Shdr &Sec);
InputSectionBase *createInputSection(const Elf_Shdr &Sec);
StringRef getSectionName(const Elf_Shdr &Sec);
@@ -217,6 +219,8 @@ private:
// parse it only once for each object file we link.
std::unique_ptr<llvm::DWARFDebugLine> DwarfLine;
llvm::once_flag InitDwarfLine;
+
+ ArrayRef<Elf_CGProfile> CGProfile;
};
// LazyObjFile is analogous to ArchiveFile in the sense that
diff --git a/ELF/Options.td b/ELF/Options.td
index 316c1162d..1334fbb80 100644
--- a/ELF/Options.td
+++ b/ELF/Options.td
@@ -51,6 +51,12 @@ def allow_multiple_definition: F<"allow-multiple-definition">,
def as_needed: F<"as-needed">,
HelpText<"Only set DT_NEEDED for shared libraries if used">;
+def call_graph_profile_file: S<"call-graph-profile-file">,
+ HelpText<"Lay out sections to optimize the given call graph">;
+
+def call_graph_profile_sort: F<"call-graph-profile-sort">,
+ HelpText<"Sort sections by call graph profile information">;
+
// -chroot doesn't have a help text because it is an internal option.
def chroot: S<"chroot">;
@@ -163,6 +169,9 @@ def nostdlib: F<"nostdlib">,
def no_as_needed: F<"no-as-needed">,
HelpText<"Always DT_NEEDED for shared libraries">;
+def no_call_graph_profile_sort: F<"no-call-graph-profile-sort">,
+ HelpText<"Don't sort sections by call graph profile information">;
+
def no_color_diagnostics: F<"no-color-diagnostics">,
HelpText<"Do not use colors in diagnostics">;
diff --git a/ELF/Writer.cpp b/ELF/Writer.cpp
index 836d36a51..8daa84118 100644
--- a/ELF/Writer.cpp
+++ b/ELF/Writer.cpp
@@ -8,6 +8,7 @@
//===----------------------------------------------------------------------===//
#include "Writer.h"
+#include "CallGraphSort.h"
#include "Config.h"
#include "Filesystem.h"
#include "LinkerScript.h"
@@ -869,6 +870,20 @@ template <class ELFT> void Writer<ELFT>::createSections() {
Vec.end());
Script->fabricateDefaultCommands();
+
+ // Sort .text by the call graph profile, unless sorting has been disabled.
+ if (Config->CallGraphProfileSort && !Config->CallGraphProfile.empty()) {
+ DenseMap<const InputSectionBase *, int> OrderMap =
+ computeCallGraphProfileOrder();
+
+ for (BaseCommand *Base : Script->SectionCommands)
+ if (auto *Sec = dyn_cast<OutputSection>(Base))
+ if (Sec->Name == ".text") {
+ Sec->sort([&](InputSectionBase *S) { return OrderMap.lookup(S); });
+ break;
+ }
+ }
+
sortBySymbolsOrder();
sortInitFini(findSection(".init_array"));
sortInitFini(findSection(".fini_array"));
diff --git a/test/ELF/Inputs/cgprofile.txt b/test/ELF/Inputs/cgprofile.txt
new file mode 100644
index 000000000..6b60397a6
--- /dev/null
+++ b/test/ELF/Inputs/cgprofile.txt
@@ -0,0 +1,7 @@
+5000 c a
+4000 c b
+0 d e
+18446744073709551615 e d
+18446744073709551611 f d
+18446744073709551612 f e
+6000 c h
diff --git a/test/ELF/cgprofile-object.s b/test/ELF/cgprofile-object.s
new file mode 100644
index 000000000..b308d58de
--- /dev/null
+++ b/test/ELF/cgprofile-object.s
@@ -0,0 +1,50 @@
+# REQUIRES: x86
+
+# RUN: llvm-mc -filetype=obj -triple=x86_64-unknown-linux %s -o %t
+# RUN: ld.lld %t -o %t2
+# RUN: llvm-readobj -symbols %t2 | FileCheck %s
+# RUN: ld.lld %t -o %t2 -no-call-graph-profile-sort
+# RUN: llvm-readobj -symbols %t2 | FileCheck %s --check-prefix=NOSORT
+
+ .section .text.hot._Z4fooav,"ax",@progbits
+ .globl _Z4fooav
+_Z4fooav:
+ retq
+
+ .section .text.hot._Z4foobv,"ax",@progbits
+ .globl _Z4foobv
+_Z4foobv:
+ retq
+
+ .section .text.hot._Z3foov,"ax",@progbits
+ .globl _Z3foov
+_Z3foov:
+ retq
+
+ .section .text.hot._start,"ax",@progbits
+ .globl _start
+_start:
+ retq
+
+
+ .cg_profile _start, _Z3foov, 1
+ .cg_profile _Z4fooav, _Z4foobv, 1
+ .cg_profile _Z3foov, _Z4fooav, 1
+
+# CHECK: Name: _Z3foov
+# CHECK-NEXT: Value: 0x201001
+# CHECK: Name: _Z4fooav
+# CHECK-NEXT: Value: 0x201002
+# CHECK: Name: _Z4foobv
+# CHECK-NEXT: Value: 0x201003
+# CHECK: Name: _start
+# CHECK-NEXT: Value: 0x201000
+
+# NOSORT: Name: _Z3foov
+# NOSORT-NEXT: Value: 0x201002
+# NOSORT: Name: _Z4fooav
+# NOSORT-NEXT: Value: 0x201000
+# NOSORT: Name: _Z4foobv
+# NOSORT-NEXT: Value: 0x201001
+# NOSORT: Name: _start
+# NOSORT-NEXT: Value: 0x201003
diff --git a/test/ELF/cgprofile.s b/test/ELF/cgprofile.s
new file mode 100644
index 000000000..ce0e0a51b
--- /dev/null
+++ b/test/ELF/cgprofile.s
@@ -0,0 +1,128 @@
+# REQUIRES: x86
+#
+# RUN: llvm-mc -filetype=obj -triple=x86_64-unknown-linux %s -o %t1
+# RUN: ld.lld %t1 -e a -o %t -call-graph-profile-file %p/Inputs/cgprofile.txt
+# RUN: llvm-readobj -symbols %t | FileCheck %s
+
+ .section .text.a,"ax",@progbits
+ .global a
+a:
+ .zero 20
+
+ .section .text.b,"ax",@progbits
+ .global b
+b:
+ .zero 1
+
+ .section .text.c,"ax",@progbits
+ .global c
+c:
+ .zero 4095
+
+ .section .text.d,"ax",@progbits
+ .global d
+d:
+ .zero 51
+
+ .section .text.e,"ax",@progbits
+ .global e
+e:
+ .zero 42
+
+ .section .text.f,"ax",@progbits
+ .global f
+f:
+ .zero 42
+
+ .section .text.g,"ax",@progbits
+ .global g
+g:
+ .zero 34
+
+ .section .text.h,"ax",@progbits
+ .global h
+h:
+
+# CHECK: Symbols [
+# CHECK-NEXT: Symbol {
+# CHECK-NEXT: Name: (0)
+# CHECK-NEXT: Value: 0x0
+# CHECK-NEXT: Size: 0
+# CHECK-NEXT: Binding: Local (0x0)
+# CHECK-NEXT: Type: None (0x0)
+# CHECK-NEXT: Other: 0
+# CHECK-NEXT: Section: Undefined (0x0)
+# CHECK-NEXT: }
+# CHECK-NEXT: Symbol {
+# CHECK-NEXT: Name: a
+# CHECK-NEXT: Value: 0x202022
+# CHECK-NEXT: Size: 0
+# CHECK-NEXT: Binding: Global (0x1)
+# CHECK-NEXT: Type: None (0x0)
+# CHECK-NEXT: Other: 0
+# CHECK-NEXT: Section: .text
+# CHECK-NEXT: }
+# CHECK-NEXT: Symbol {
+# CHECK-NEXT: Name: b
+# CHECK-NEXT: Value: 0x202021
+# CHECK-NEXT: Size: 0
+# CHECK-NEXT: Binding: Global (0x1)
+# CHECK-NEXT: Type: None (0x0)
+# CHECK-NEXT: Other: 0
+# CHECK-NEXT: Section: .text
+# CHECK-NEXT: }
+# CHECK-NEXT: Symbol {
+# CHECK-NEXT: Name: c
+# CHECK-NEXT: Value: 0x201022
+# CHECK-NEXT: Size: 0
+# CHECK-NEXT: Binding: Global (0x1)
+# CHECK-NEXT: Type: None (0x0)
+# CHECK-NEXT: Other: 0
+# CHECK-NEXT: Section: .text
+# CHECK-NEXT: }
+# CHECK-NEXT: Symbol {
+# CHECK-NEXT: Name: d
+# CHECK-NEXT: Value: 0x20208A
+# CHECK-NEXT: Size: 0
+# CHECK-NEXT: Binding: Global (0x1)
+# CHECK-NEXT: Type: None (0x0)
+# CHECK-NEXT: Other: 0
+# CHECK-NEXT: Section: .text
+# CHECK-NEXT: }
+# CHECK-NEXT: Symbol {
+# CHECK-NEXT: Name: e
+# CHECK-NEXT: Value: 0x202060
+# CHECK-NEXT: Size: 0
+# CHECK-NEXT: Binding: Global (0x1)
+# CHECK-NEXT: Type: None (0x0)
+# CHECK-NEXT: Other: 0
+# CHECK-NEXT: Section: .text
+# CHECK-NEXT: }
+# CHECK-NEXT: Symbol {
+# CHECK-NEXT: Name: f
+# CHECK-NEXT: Value: 0x202036
+# CHECK-NEXT: Size: 0
+# CHECK-NEXT: Binding: Global (0x1)
+# CHECK-NEXT: Type: None (0x0)
+# CHECK-NEXT: Other: 0
+# CHECK-NEXT: Section: .text
+# CHECK-NEXT: }
+# CHECK-NEXT: Symbol {
+# CHECK-NEXT: Name: g
+# CHECK-NEXT: Value: 0x201000
+# CHECK-NEXT: Size: 0
+# CHECK-NEXT: Binding: Global (0x1)
+# CHECK-NEXT: Type: None (0x0)
+# CHECK-NEXT: Other: 0
+# CHECK-NEXT: Section: .text
+# CHECK-NEXT: }
+# CHECK-NEXT: Symbol {
+# CHECK-NEXT: Name: h
+# CHECK-NEXT: Value: 0x201022
+# CHECK-NEXT: Size: 0
+# CHECK-NEXT: Binding: Global (0x1)
+# CHECK-NEXT: Type: None (0x0)
+# CHECK-NEXT: Other: 0
+# CHECK-NEXT: Section: .text
+# CHECK-NEXT: }
+# CHECK-NEXT:]
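
For anyone who wants to experiment with the heuristic outside of lld, here is a
minimal standalone sketch of the clustering described in the CallGraphSort.cpp
comments: merge along the hottest call edges as long as the merged cluster fits
in a page, putting the callee after the caller, then order clusters by density
(weight / size). It is not the patch's implementation. `Section`, `CallEdge`,
`Cluster`, `density` and `layoutByCallGraph` are hypothetical names that do not
appear in the patch. As simplifications, edges are visited once in descending
weight order instead of being merged and re-queued after each contraction,
plain addition stands in for SaturatingAdd, every section gets a cluster even
if no edge references it, and the densest cluster is placed first (a choice of
this sketch; the patch's comparator is written as a less-than on density).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for an input section.
struct Section {
  std::string Name;
  uint64_t Size;
};

// Hypothetical stand-in for one profile entry.
struct CallEdge {
  size_t From;     // index of the caller's section
  size_t To;       // index of the callee's section
  uint64_t Weight; // call count
};

struct Cluster {
  std::vector<size_t> Sections; // section indices in layout order
  uint64_t Size = 0;
  uint64_t Weight = 0; // sum of incoming call counts
};

static double density(const Cluster &C) {
  return C.Size == 0 ? 0.0
                     : static_cast<double>(C.Weight) /
                           static_cast<double>(C.Size);
}

// Returns section names in layout order. Assumes weights do not overflow.
std::vector<std::string> layoutByCallGraph(const std::vector<Section> &Secs,
                                           std::vector<CallEdge> Edges,
                                           uint64_t PageSize) {
  // Start with one cluster per section.
  std::vector<Cluster> Clusters(Secs.size());
  std::vector<size_t> Owner(Secs.size()); // section index -> cluster index
  for (size_t I = 0; I < Secs.size(); ++I) {
    Clusters[I].Sections = {I};
    Clusters[I].Size = Secs[I].Size;
    Owner[I] = I;
  }
  // A cluster's weight is the number of calls into it.
  for (const CallEdge &E : Edges)
    Clusters[E.To].Weight += E.Weight;

  // Visit edges from hottest to coldest.
  std::stable_sort(Edges.begin(), Edges.end(),
                   [](const CallEdge &A, const CallEdge &B) {
                     return A.Weight > B.Weight;
                   });

  for (const CallEdge &E : Edges) {
    size_t F = Owner[E.From];
    size_t T = Owner[E.To];
    if (F == T || E.Weight == 0)
      continue; // already merged, or a never-taken edge
    if (Clusters[F].Size + Clusters[T].Size > PageSize)
      continue; // merged cluster would not fit in a page

    // Contract the edge: append the callee's cluster to the caller's.
    for (size_t S : Clusters[T].Sections) {
      Owner[S] = F;
      Clusters[F].Sections.push_back(S);
    }
    Clusters[F].Size += Clusters[T].Size;
    Clusters[F].Weight += Clusters[T].Weight;
    Clusters[T] = Cluster();
  }

  // Drop emptied clusters, then sort the rest by density, densest first.
  std::vector<Cluster> Live;
  for (const Cluster &C : Clusters)
    if (!C.Sections.empty())
      Live.push_back(C);
  std::stable_sort(Live.begin(), Live.end(),
                   [](const Cluster &A, const Cluster &B) {
                     return density(A) > density(B);
                   });

  std::vector<std::string> Order;
  for (const Cluster &C : Live)
    for (size_t S : C.Sections)
      Order.push_back(Secs[S].Name);
  return Order;
}
```

As an example, take sections main (100 bytes), hot (50), cold (50) and big
(4090) with edges main->hot 1000, main->big 500 and cold->hot 10. With a
4096-byte page the sketch returns cold, main, hot, big: main and hot merge
first, big is rejected because the merged cluster would exceed the page, and
cold then absorbs the main+hot cluster as the caller.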