[LLVMdev] [Proposal] Parallelize post-IPO stage.
Shuxin Yang
shuxin.llvm at gmail.com
Fri Jul 12 15:49:06 PDT 2013
Hi, There:
This is the proposal for parallelizing post-ipo stage. See the
following for details.
I also attach a toy-grade rudimentary implementation. This
implementation can be
used to illustrate some concepts here. This patch is not going to be
committed.
Unfortunately, this weekend I will be too busy to read emails. Please
do not construe
delayed response as being rude :-).
Thanks a lot in advance for your time insightful comments!
Shuxin
The proposal
------------
It is organized as following:
1) background info, if you heard "/usr/bin/ls", please skip it
2) the motivation of parallelize post-IPO stage
3) how to parallelize post-IPO
4) the linker problems.
5) the toy-grade rudimentary implementation
6) misc
1.Some background
------------------
The Interprocedural-optimization compilation, aka IPO or IPA, typically
consists of three stages:
S1) pre-IPO
Each function goes through some analysis and not-very-aggressive
optimizations.
Some information is collected during this stage, this info will be
to IPO stages.
This info is usually called summary info.
The result of this stage is "fake-objects" which is binary files using
some known object format to encapsulate IR as well as summary info
along with
other stuff.
S2) IPO:
Compiler works with linker to resolve and merge symbols in the
"fake-objects"
Then Interprocedural analyses (IPA) are invoked to perform
interprocedural
analysis either based on the summary-info, or directly on the IR.
Interprocedural optimizations (IPO) are called based on the IPA result.
In some compilers, IPA and IPO are separated. One reason is that
many IPAs can
be directly conduct on the concise summary info, while many IPOs
need to load
IRs and bulky annotation/metadata into memory.
S3) post-IPO:
Typically consist of Loop-Nest-Opt, Scalar Opt, Code-Gen etc etc.
While they
are intra-procedural analyses/optimizers, they may directly benefit
from
the info collected in the IPO stages and pass down the road.
LLVM collectively call S2 and S3 as "LTO CodeGen", which is very
confusing.
2. Why parallelize post-IPO stage
==============================
R1) To improve the scalarbility
It is quite obvious that we are not able to put everything about a
monster
program in memory at once.
Even if you can afford a expensive computer, the address space of a
single compiler process cannot accommodate a monster program.
R2) to take advantage of ample HW resource to shorten compile time.
R3) make debugging lot easier.
One can triage problems in a much smaller partition rather than
the huge monster program.
This proposal is not able to shoot the goal R1 at this moment,
because during
the IPO stage, currently the compiler brings everything into memory at once.
3. How to parallelize post-IPO stage
====================================
From 5k' high, the concept is very simple, just to
step 1).divide the merged IR into small pieces,
step 2).and compile each of this pieces independendly.
step 3) the objects of each piece are fed back to linker to are linked
into an executable, or a dynamic lib.
Section 3.1 through 3.3 describe these three steps respectively.
3.1. Partitioning
-----------------
Partitioning is to cut a resonabely-sized chunk from the big merged IRs.
It roughly consists of two steps, 1) determine the partition scheme, which
is relatively easy step, and 2) physically scoop the partition out of
the merged IR, which is much more involved.
3.1.1 Figure out Partition scheme
----------------------------------
we randomly pick up some function and put them in a partition.
It would be nice to perform some optimization at this moment. One opt
in my mind is to reorder functions in order to reduce working-set and
improve locality.
Unfortunately, this opt seems to be bit blind at this time, because
- CallGraph is not annotated with estimated or profiled frequency.
- some linkers don't respect the order. It seems they just
remembers the function order of the pristine input obj/fake-obj,
and enforce this order at final link (link-exec/shared-lib) stage.
Anyway, I try to ignore all these problems, and try to perform partition
via following steps. Maybe we have some luck on some platforms:
o. DFS the call-graph, ignoring the self-resursive edges, if freq is
available, prioritizing the edges (i.e. corresponding to call-sites)
such that frequent edges are visited first.
o. Cut the DFS spanning tree obtained from the previous step bottom-up,
Each cut/partition contains reasonable # of functions, and the
aggregate
size of the functions of the partition should not exceeds predefined
threshold.
o. repeat the previous step until the Call-graph's DFS spanning tree
is empty.
3.1.2 Partition transformation
------------------------------
This is bit involved. There are bunch of problems we have to tackle.
1) When the use/def of a symbol are separated in different modules,
its attribute, like linkage, visibility, need to be changed
as well.
[Example 1], if a symbol is flagged as "internal" to the module where
the it is defined, the linkage need to be changed into "internal"
to the executable/lib being compiled.
[Example 2], For compile-time constants, their initialized value
needs to to cloned to the partitions where it is referenced,
The rationale is to make the post-ipo passes to take advantage
of the initlized value to squeeeze some performance.
In order to not bloat the code size, the cloned constant should
mark "don't emit". [end of eg2]
Being able to precisely update symbols' attribute is not only
vital to correctness, it has significant impact to the the
performance as well.
I have not yet taken a thorough investigation of this issue. My
rudimentary implementation is simply flag symbol "external" when its
use/def are separated in different module. I believe this is one
of the most difficult part of this work. I guess it is going to
take long time to become stable.
2) In order to compile each partition in each separate thread (see
Section 3.2), we have to put partitions in distinct LLVMContext.
I could be wrong, but I don't find the code which is able to
perform function cloning across LLVMContext.
My workaround in the patch is to perform function cloning in
one LLVMContext (but in different Module, of course), then
save the module to disk file, and load it to memory using a
new LLVMContext.
It is bit circuitous and expensive.
One random observation:
Currently, function-scoped static variables are considered
as "global variables". When cloning a function with static variable,
compiler has no idea if the static variables are used only by
the function being cloned, and hence separate the function
and the variables.
I guess it would be nice if we organized symbols by its scope
instead of its live-time. it would be convenient for this situation.
3.2 Compile partitions independently
--------------------------------------
There are two camps: one camp advocate compiling partitions via
multi-process,
the other one favor multi-thread.
Inside Apple compiler teams, I'm the only one belong to the 1st comp.
I think
while multi-proc sounds bit red-neck, it has its advantage for this
purpose, and
while multi-thread is certainly more eye-popping, it has its advantage
as well.
The advantage of multi-proc are:
1) easier to implement, the process run in its own address space.
We don't need to worry about they can interfere with each other.
2)huge, or not unlimited, address space.
The disadvantage is that it's expensive. But I guess the cost is
almost negligible compared to the overall IPO compilation.
The advantage of multi-threads I can imagine are:
1) sound fancy
2) it is light-weight
3) inter-thread communication is easier than IPC.
Its disadvantage are:
1). Oftentime we will come across race-condition, and it took
awful long time to figure it out. While the code is supposed
to be mult-thread safe, we might miss some tricky case.
Trouble-shooting race condition is a nightmare.
2) Small address space. This is big problem if we the compiler
is built 32-bit . In that case, the compiler is not able to bring
lots of stuff in memory even if the HW dose
provide ample mem.
3) The thread-safe run-time lib is more expensive.
I once linked a compiler using -lpthread (I dose not have to) on a
UNIX platform, and saw the compiler slow down by about 1/3.
I'm not able to convince the folks in other camp, neither are they
able to convince me. I decide to implement both. Fortunately, this
part is not difficult, it seems to be rather easy to crank out one
within short
period of time. It would be interesting to compare them side-by-side,
and see which camp lose:-). On the other hand, if we run into
race-condition
problem, we choose multi-proc version as a fall-back.
Regardless which tech is going to use to compile partition
independently, in order to judiciously and adaptively choose appropriate
parallel-factor, the compiler certainly need a lib which is able to
figure out the load the entire system is in. I don't know if there are
such magic lib or not.
4. the tale of two kinds of linker
----------------------------------
As far as I can tell, llvm suports two kind linker for its IPO
compilation,
and the supports are embodied by two set of APIs/interfaces.
o. Interface 1, those stuff named lto_xxxx().
o. GNU gold interface,
The compiler interact with GNU gold via the adapter implemented
in tools/gold/gold-plugin.cpp.
This adpater calls the interface-1 to control the IPO process.
It dose not have to call the interface APIs, I think it is definitely
ok it call internal functions.
The compiler used to generate a single object file from the merged
IR, now it will generate multiple of them, one for each partition.
So, the interface 1 is *NOT* sufficient any more.
For gold linker users, it is easy to make them happy just by
hacking the adapter, informing the linker new input object
files. This is done transparently, the users don't need to install new ld.
For those system which invoke ld interacting with the libLTO.{so,dylib},
it has to accept the new APIs I added to the interface-1 in order to
enable the new functionality. Or maybe we can invoke '/the/path/to/ld -r
*.o -o merged.o'
and feed the merged.o the linker (this will keep the interface
interact)? Unfortunately, it dose not work at all, how can I know the path
the ld? the libLTO.{so,dydlib} is invoked as plugin; it cannot see the argv.
How about hack them by adding a nasty flag pointing to the right ld?
Well, it works. However, I don't believe many people like to do it this way,
that means I loose huge number of "QA" who are working hard for this
compiler.
What's wrong with the interface-1? The ld side is more active than
the compiler side, however, in concept the IPO is driven by the compiler
side.
This mean this interface is changing over time.
In contrast, the gold interface (as I rever-engineer from the adpator
code) is more symbol-centric, taking little IPO-thing into account.
That interface is simple and stable.
5. the rudimentary implementation
---------------------------------
I make it works for bzip2 at cpu2kint yesterday. bzip2 is "tiny"
program, I intentionally lower the partition size to get 3 partitions.
There is no comment in the code, and it definitely need rewrite. I
just check the correctness (with ref input), and I don't measure how much
it degrade the performance. (due to the problem I have not got chance
to tackle, see section 3.1.2, the symbol attribute stuff).
The control flow basically is:
1. add a module pass to the IPO pass-manager, and figure
out the partition scheme.
2) physically partition the merged partition.
the IR and the obj of partition are placed in a new dir.
"llvmipo" by default
--
ls llvmipo/
Makefile merged part1.bc part1.o part2.bc
part2.o part3.bc part3.o
--
3) For demo purpose, I drive the post-IPO stage via a makefile,
which encapsulate
hack and other nasty stuff.
NOTE that the post-ipo pass in my hack contains only CodeGen
pass, we need to
reorganize the PassManagerBuilder::populateLTOPassManager(), which
intermingle
IPO pass along with intra-proc scalar pass, we need to separate
them and the intra-proc
scalar pass to post-IPO stage.
1 .PHONY = all
2
3
4 BC = part1.bc part2.bc part3.bc
5 OBJ = ${BC:.bc=.o}
6
7 all : merged
8 %.o : %.bc
9 $(HOME)/tmp/lto.llc -filetype=obj $+ -o $@
10
11 merged : $(OBJ)
12 /usr/bin/ld $+ -r -o $@
13
4. as the Makefile sugguest, the *.o of the partions are linked into
a single obj "merged"
and feed back to link.
6) Miscellaneous
===========
Will partitioning degrade performance in theory. I think it depends
on the definition of
performance. If performance means execution-time, I guess it dose not.
However, if performance includes code-size, I think it may have some
negative impact.
Following is few scenario:
- constants generated by the post-IPO passes are not shared across
partitions
- dead func may be detected during the post-IPO stage, and they may
not be deleted.
-------------- next part --------------
Index: tools/lto/LTOCodeGenerator.cpp
===================================================================
--- tools/lto/LTOCodeGenerator.cpp (revision 186109)
+++ tools/lto/LTOCodeGenerator.cpp (working copy)
@@ -17,8 +17,10 @@
#include "llvm/ADT/StringExtras.h"
#include "llvm/Analysis/Passes.h"
#include "llvm/Analysis/Verifier.h"
+#include "llvm/Analysis/CallGraph.h"
#include "llvm/Bitcode/ReaderWriter.h"
#include "llvm/Config/config.h"
+#include "llvm/ADT/SetVector.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/DerivedTypes.h"
@@ -29,16 +31,19 @@
#include "llvm/MC/MCContext.h"
#include "llvm/MC/SubtargetFeature.h"
#include "llvm/PassManager.h"
+#include "llvm/IRReader/IRReader.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/FileSystem.h"
#include "llvm/Support/FormattedStream.h"
#include "llvm/Support/Host.h"
#include "llvm/Support/MemoryBuffer.h"
+#include "llvm/Support/Program.h"
#include "llvm/Support/Signals.h"
#include "llvm/Support/TargetRegistry.h"
#include "llvm/Support/TargetSelect.h"
#include "llvm/Support/ToolOutputFile.h"
#include "llvm/Support/system_error.h"
+#include "llvm/Support/SourceMgr.h"
#include "llvm/Target/Mangler.h"
#include "llvm/Target/TargetMachine.h"
#include "llvm/Target/TargetOptions.h"
@@ -46,8 +51,11 @@
#include "llvm/Transforms/IPO.h"
#include "llvm/Transforms/IPO/PassManagerBuilder.h"
#include "llvm/Transforms/ObjCARC.h"
+#include "llvm/Transforms/Utils/ValueMapper.h"
+#include "llvm/Transforms/Utils/Cloning.h"
using namespace llvm;
+
static cl::opt<bool>
DisableOpt("disable-opt", cl::init(false),
cl::desc("Do not run any optimization passes"));
@@ -68,12 +76,154 @@
#endif
}
+class ModPartScheme {
+public:
+ typedef SetVector<Function*> PartitionTy;
+ typedef PartitionTy::iterator iterator;
+ typedef PartitionTy::const_iterator const_iterator;
+
+ ModPartScheme() {}
+ ModPartScheme(const ModPartScheme &That) : Partition(That.Partition) {}
+
+ void AddFunction(Function *F) { Partition.insert(F); }
+
+ iterator begin() { return Partition.begin(); }
+ iterator end() { return Partition.end(); }
+ const_iterator begin() const { return Partition.begin(); }
+ const_iterator end() const { return Partition.end(); }
+ int size() { return Partition.size(); }
+ bool count(Function *F) const { return Partition.count(F); }
+
+private:
+ PartitionTy Partition;
+};
+
+class ModPartSchemeMgr {
+public:
+ typedef std::vector<ModPartScheme *> MPSchemeTy;
+ typedef MPSchemeTy::iterator iterator;
+ typedef MPSchemeTy::const_iterator const_iterator;
+
+ ~ModPartSchemeMgr();
+
+ ModPartScheme *CreateEmptyPartition(void) {
+ ModPartScheme *P = new ModPartScheme;
+ PartSchemes.push_back(P);
+ return P;
+ }
+ iterator begin() { return PartSchemes.begin(); }
+ iterator end() { return PartSchemes.end(); }
+ const_iterator begin() const { return PartSchemes.begin(); }
+ const_iterator end() const { return PartSchemes.end(); }
+ int empty() const { return PartSchemes.empty(); }
+
+private:
+ MPSchemeTy PartSchemes;
+};
+
+class ModPartAnalysis : public ModulePass {
+public:
+ static char ID;
+ ModPartAnalysis(ModPartSchemeMgr &MPSM):
+ ModulePass(ID), ModPartMgr(MPSM), CG(0) {}
+
+ virtual bool runOnModule(Module &M);
+ virtual void getAnalysisUsage(AnalysisUsage &AU) const;
+
+private:
+ // Partition threshold, currently the metric for "size" is the number
+ // of functions in a partition.
+ enum {
+ MaxFuncInPart = 3
+ };
+
+ class SizeMetric {
+ public:
+ SizeMetric(int func_num=0) : FuncNum(func_num) {};
+ bool ExceedThreshold() const { return FuncNum > MaxFuncInPart; }
+ bool ExceedThresholdTooMuch() const
+ { return FuncNum >= MaxFuncInPart * 3 / 2; }
+ void IncFuncNum(int amt = 1) { FuncNum += amt; };
+ const SizeMetric& operator+=(const SizeMetric &That)
+ { FuncNum += That.FuncNum; return *this; }
+ void Reset() { FuncNum = 0; }
+
+ private:
+ int FuncNum;
+ };
+
+ void setVisited(CallGraphNode *N) { Visited[N] = true; }
+ bool isVisited(CallGraphNode *N) const {
+ return Visited.find(N) != Visited.end();
+ }
+
+ SizeMetric PerformPartitionHelper(CallGraphNode *Root);
+ void EmitPartition(CallGraphNode *DFSRoot, SizeMetric &SM);
+ SizeMetric EvaluateModuleSize(const Module *M) const;
+
+private:
+ ModPartSchemeMgr &ModPartMgr;
+ CallGraph *CG;
+ std::vector<CallGraphNode *> DFSStack;
+ SizeMetric RemainingModSize;
+ DenseMap<CallGraphNode *, bool> Visited;
+};
+
+char ModPartAnalysis::ID = 0;
+
+class ModPartXform {
+public:
+ ModPartXform(Module *Mod, ModPartSchemeMgr &MPSM, IPOPartMgr &PM) :
+ PartSchemeMgr(MPSM), IPOPartMgr(PM), MergedModule(Mod), NextPartId(0) {}
+
+ void getWorkDir();
+
+ void PerformTransform();
+
+private:
+ IPOPartition *PerformTransform(ModPartScheme &PartScheme);
+
+ void CollectGlobalSymbol(ModPartScheme &Part, Module *New,
+ ValueToValueMapTy &VMap);
+ void CollectGlobalSymbol(Function *F, Module *New,
+ ValueToValueMapTy &VMap);
+
+ Function *CreateFuncDecl(const Function *F, Module *NewMod);
+ GlobalVariable *CreateVarDecl(const GlobalVariable *GV, Module *NewMod);
+
+private:
+ ModPartSchemeMgr &PartSchemeMgr;
+ IPOPartMgr &IPOPartMgr;
+ Module *MergedModule;
+ int NextPartId;
+};
+
+class PostIPOCompile {
+public:
+ PostIPOCompile(IPOPartMgr &IPM, IPOFileMgr &IFM, bool ToMergeObjs = false) :
+ PartMgr(IPM), FileMgr(IFM), MergedObjFile(0), MergeObjs(ToMergeObjs) {}
+
+ IPOFile *getMergedObjFile() const { return MergedObjFile; }
+
+ bool Compile(std::string &ErrMsg);
+
+private:
+ bool generateMakefile(std::string &ErrMsg);
+
+private:
+ IPOPartMgr &PartMgr;
+ IPOFileMgr &FileMgr;
+ IPOFile *MergedObjFile;
+ bool MergeObjs;
+};
+
LTOCodeGenerator::LTOCodeGenerator()
: _context(getGlobalContext()),
_linker(new Module("ld-temp.o", _context)), _target(NULL),
_emitDwarfDebugInfo(false), _scopeRestrictionsDone(false),
_codeModel(LTO_CODEGEN_PIC_MODEL_DYNAMIC),
- _nativeObjectFile(NULL) {
+ _nativeObjectFile(NULL),
+ _IPOPartMgr(_IPOFileMgr) {
InitializeAllTargets();
InitializeAllTargetMCs();
InitializeAllAsmPrinters();
@@ -161,34 +311,42 @@
}
bool LTOCodeGenerator::compile_to_file(const char** name, std::string& errMsg) {
- // make unique temp .o file to put generated object file
- SmallString<128> Filename;
- int FD;
- error_code EC = sys::fs::createTemporaryFile("lto-llvm", "o", FD, Filename);
- if (EC) {
- errMsg = EC.message();
+ if (determineTarget(errMsg))
return true;
- }
- // generate object file
- tool_output_file objFile(Filename.c_str(), FD);
+ PostIPOCompile PostIPOStage(_IPOPartMgr, _IPOFileMgr, true/*merge objects*/);
+ if (!_IPOFileMgr.CreateWorkDir(errMsg))
+ return true;
- bool genResult = generateObjectFile(objFile.os(), errMsg);
- objFile.os().close();
- if (objFile.os().has_error()) {
- objFile.os().clear_error();
- sys::fs::remove(Twine(Filename));
+ performIPO(errMsg, true);
+
+ if (!PostIPOStage.Compile(errMsg))
return true;
- }
- objFile.keep();
- if (genResult) {
- sys::fs::remove(Twine(Filename));
+ *name = PostIPOStage.getMergedObjFile()->getPath().c_str();
+ return false;
+}
+
+bool LTOCodeGenerator::compile_to_files(const char** name, std::string& errMsg) {
+ if (determineTarget(errMsg))
return true;
+
+ performIPO(errMsg);
+
+ // Parallelize post-IPO
+ _nativeObjectPath.clear();
+ PostIPOCompile PostIPOStage(_IPOPartMgr, _IPOFileMgr);
+ if (!PostIPOStage.Compile(errMsg))
+ return true;
+
+ for (IPOPartMgr::iterator I = _IPOPartMgr.begin(), E = _IPOPartMgr.end();
+ I != E; I++) {
+ _nativeObjectPath.append((*I)->getObjFilePath().data());
+ _nativeObjectPath.append('\0');
}
+ _nativeObjectPath.append('\0');
+ *name = _nativeObjectPath.c_str();
- _nativeObjectPath = Filename.c_str();
- *name = _nativeObjectPath.c_str();
return false;
}
@@ -357,16 +515,12 @@
_scopeRestrictionsDone = true;
}
-/// Optimize merged modules using various IPO passes
-bool LTOCodeGenerator::generateObjectFile(raw_ostream &out,
- std::string &errMsg) {
- if (this->determineTarget(errMsg))
- return true;
+void LTOCodeGenerator::performIPO(std::string &errMsg, bool PerformPartition) {
Module* mergedModule = _linker.getModule();
// Mark which symbols can not be internalized
- this->applyScopeRestrictions();
+ applyScopeRestrictions();
// Instantiate the pass manager to organize the passes.
PassManager passes;
@@ -390,13 +544,30 @@
// Make sure everything is still good.
passes.add(createVerifierPass());
+ ModPartSchemeMgr MPSM;
+ if (PerformPartition)
+ passes.add(new ModPartAnalysis(MPSM));
+
+ passes.run(*mergedModule);
+
+ if (!MPSM.empty()) {
+ ModPartXform MPT(mergedModule, MPSM, _IPOPartMgr);
+ MPT.PerformTransform();
+ } else {
+ IPOPartition *P = _IPOPartMgr.CreateIPOPart(mergedModule);
+ P->SaveBitCode();
+ }
+}
+
+bool LTOCodeGenerator::performPostLTO(Module *Mod, formatted_raw_ostream &Out,
+ std::string &errMsg) {
+ // placeholder for post-IPO scalar opt
+
PassManager codeGenPasses;
codeGenPasses.add(new DataLayout(*_target->getDataLayout()));
_target->addAnalysisPasses(codeGenPasses);
- formatted_raw_ostream Out(out);
-
// If the bitcode files contain ARC code and were compiled with optimization,
// the ObjCARCContractPass must be run, so do it unconditionally here.
codeGenPasses.add(createObjCARCContractPass());
@@ -404,16 +575,31 @@
if (_target->addPassesToEmitFile(codeGenPasses, Out,
TargetMachine::CGFT_ObjectFile)) {
errMsg = "target file type not supported";
+ return true;;
+ }
+
+ // Run the code generator, and write assembly file
+ codeGenPasses.run(*Mod);
+ return false;
+}
+
+/// Optimize merged modules using various IPO passes
+bool LTOCodeGenerator::generateObjectFile(Module *Mod, const char *FN,
+ std::string &errMsg) {
+ std::string errFile;
+ tool_output_file OutFile(FN, errMsg, raw_fd_ostream::F_Binary);
+
+ if (!errFile.empty()) {
+ errMsg += errFile;
return true;
}
+ OutFile.keep();
- // Run our queue of passes all at once now, efficiently.
- passes.run(*mergedModule);
+ formatted_raw_ostream OS(OutFile.os());
+ bool Fail = performPostLTO(Mod, OS, errMsg);
+ OutFile.os().close();
- // Run the code generator, and write assembly file
- codeGenPasses.run(*mergedModule);
-
- return false; // success
+ return Fail;
}
/// setCodeGenDebugOptions - Set codegen debugging options to aid in debugging
@@ -428,3 +614,495 @@
_codegenOptions.push_back(strdup(o.first.str().c_str()));
}
}
+
+///////////////////////////////////////////////////////////////////////////
+//
+// Implementation of ModPartSchemeMgr, ModPartXform
+//
+///////////////////////////////////////////////////////////////////////////
+//
+ModPartSchemeMgr::~ModPartSchemeMgr() {
+ while (!PartSchemes.empty()) {
+ delete PartSchemes.back();
+ PartSchemes.pop_back();
+ }
+}
+
+Function *ModPartXform::CreateFuncDecl(const Function *F, Module *NewModule) {
+ Function *NF = Function::Create(F->getFunctionType(),
+ GlobalValue::ExternalLinkage,
+ F->getName(), NewModule);
+ NF->copyAttributesFrom(F);
+ return NF;
+}
+
+static void PromoteGlobalVarLinkage(GlobalVariable *GV) {
+ GV->setLinkage(GlobalValue::ExternalLinkage);
+}
+
+static void PromoteGlobalFuncLinkage(Function *F) {
+ F->setLinkage(GlobalValue::ExternalLinkage);
+}
+
+GlobalVariable *ModPartXform::CreateVarDecl(const GlobalVariable *GV,
+ Module *NewMod) {
+ GlobalVariable *G;
+ G = new GlobalVariable(*NewMod, GV->getType()->getElementType(),
+ GV->isConstant(),
+ GlobalValue::ExternalLinkage,
+ 0 /* InitVal */,
+ Twine(GV->getName()),
+ 0 /* Before */,
+ GV->getThreadLocalMode(),
+ GV->getType()->getPointerAddressSpace(),
+ GV->hasInitializer() ? true : false);
+ return G;
+}
+
+void ModPartXform::CollectGlobalSymbol(Function *F,
+ Module *New,
+ ValueToValueMapTy &VMap) {
+ DenseMap<Value *, bool> Visited;
+ SmallVector<Constant *, 16> WorkList;
+
+ for (Function::iterator BI = F->begin(), BE = F->end(); BI != BE; BI++) {
+ for (BasicBlock::iterator II = BI->begin(), IE = BI->end();
+ II != IE; II++) {
+ Instruction &Inst = *II;
+ for (User::op_iterator op = Inst.op_begin(), E = Inst.op_end();
+ op != E; ++op) {
+ if (Constant *C = dyn_cast<Constant>(*op)) {
+ if (!isa<BasicBlock>(C) && Visited.find(C) == Visited.end()) {
+ Visited[C] = true;
+ WorkList.push_back(C);
+ }
+ }
+ }
+ }
+
+ while(!WorkList.empty()) {
+ Constant *C = WorkList.pop_back_val();
+ if (GlobalVariable *GV = dyn_cast<GlobalVariable>(C)) {
+ if (VMap.find(GV) == VMap.end()) {
+ VMap[GV] = CreateVarDecl(GV, New);
+ PromoteGlobalVarLinkage(GV);
+ }
+ continue;
+ } else if (Function *Func = dyn_cast<Function>(C)) {
+ if (VMap.find(Func) == VMap.end()) {
+ VMap[Func] = CreateFuncDecl(Func, New);
+ PromoteGlobalFuncLinkage(Func);
+ }
+ continue;
+ }
+
+ for (User::const_op_iterator I = C->op_begin(), E = C->op_end();
+ I != E; ++I) {
+ Constant *C2 = dyn_cast<Constant>(*I);
+ if (C2 && Visited.find(C2) == Visited.end()) {
+ Visited[C2] = true;
+ WorkList.push_back(C2);
+ }
+ }
+ }
+ }
+}
+
+void ModPartXform::CollectGlobalSymbol(ModPartScheme &Part, Module *New,
+ ValueToValueMapTy &VMap) {
+ for (ModPartScheme::iterator I = Part.begin(), E = Part.end();
+ I != E; I++) {
+ const Function *F = *I;
+ VMap[F] = CreateFuncDecl(F, New);
+ }
+
+ for (ModPartScheme::iterator I = Part.begin(), E = Part.end();
+ I != E; I++)
+ CollectGlobalSymbol(*I, New, VMap);
+}
+
+void __attribute__((used)) dump_module(Module *M) {
+ std::string EI;
+ tool_output_file f("module.ll", EI);
+ f.keep();
+
+ M->print(f.os(), 0);
+}
+
+void __attribute__((used)) dump_type(Type *T) {
+ T->dump();
+}
+
+// Splitting the merged module by moving the specified functions to the
+// new module
+IPOPartition *ModPartXform::PerformTransform(ModPartScheme &PartScheme) {
+ std::string FN;
+ raw_string_ostream OS(FN);
+ OS << "partition" << NextPartId++;
+
+ Module *NewMod = new Module(OS.str(), MergedModule->getContext());
+ NewMod->setDataLayout(MergedModule->getDataLayout());
+ NewMod->setTargetTriple(MergedModule->getTargetTriple());
+
+ ValueToValueMapTy VMap;
+ CollectGlobalSymbol(PartScheme, NewMod, VMap);
+
+ // Copy over functions in the partition
+ for (ModPartScheme::iterator I = PartScheme.begin(), E = PartScheme.end();
+ I != E; I++) {
+ Function *OF = *I;
+ Function *NF = cast<Function>(VMap[OF]);
+
+ // Steal some code from llvm::CloneFunction.
+ {
+ // Loop over the arguments, copying the names of the mapped arguments over...
+ Function::arg_iterator DestI = NF->arg_begin();
+ for (Function::const_arg_iterator I = OF->arg_begin(), E = OF->arg_end();
+ I != E; ++I)
+ // Is this argument preserved? WTF, how come an argument is preserved?
+ if (VMap.count(I) == 0) {
+ DestI->setName(I->getName()); // Copy the name over...
+ VMap[I] = DestI++; // Add mapping to VMap
+ }
+ }
+ SmallVector<ReturnInst*, 8> Returns;
+ CloneFunctionInto(NF, OF, VMap, true, Returns);
+
+ OF->deleteBody();
+ }
+
+ IPOPartition *NewPart = IPOPartMgr.CreateIPOPart(NewMod);
+
+ // We have to save the module to disk such that next time it's loaded
+ // it relong to different context.
+ NewPart->SaveBitCode();
+
+ return NewPart;
+}
+
+void ModPartXform::PerformTransform() {
+ for (ModPartSchemeMgr::iterator I = PartSchemeMgr.begin(),
+ E = PartSchemeMgr.end(); I != E; I++)
+ (void)PerformTransform(**I);
+
+ IPOPartition *MP = IPOPartMgr.CreateIPOPart(MergedModule);
+ MP->SaveBitCode();
+}
+
+///////////////////////////////////////////////////////////////////////////
+//
+// Implementation of ModPartAnalysis
+//
+///////////////////////////////////////////////////////////////////////////
+//
+void ModPartAnalysis::getAnalysisUsage(AnalysisUsage &AU) const {
+ AU.setPreservesAll();
+ AU.addRequired<CallGraph>();
+}
+
+void ModPartAnalysis::EmitPartition(CallGraphNode *DFSRoot, SizeMetric &SM) {
+ ModPartScheme *P = ModPartMgr.CreateEmptyPartition();
+ while (!DFSStack.empty()) {
+ CallGraphNode *N = DFSStack.back();
+ P->AddFunction(N->getFunction());
+ DFSStack.pop_back();
+ if (N == DFSRoot)
+ break;
+ }
+}
+
+ModPartAnalysis::SizeMetric
+ModPartAnalysis::PerformPartitionHelper(CallGraphNode *R) {
+ SizeMetric SM;
+ setVisited(R);
+
+ // Skip dummy call-graph node or declaration
+ {
+ Function *F = R->getFunction();
+ if (!F || F->isDeclaration())
+ return SM;
+ }
+
+ DFSStack.push_back(R);
+ SM.IncFuncNum();
+
+ for (CallGraphNode::iterator I = R->begin(), E = R->end(); I != E; I++) {
+ CallGraphNode *Callee = (*I).second;
+ if (isVisited(Callee))
+ continue;
+
+ setVisited(Callee);
+
+ // Skip dummy call-graph node or declaration
+ Function *F = R->getFunction();
+ if (!F || F->isDeclaration())
+ continue;
+
+ SizeMetric T = PerformPartitionHelper(Callee);
+ bool Emit = false;
+
+ if (T.ExceedThreshold())
+ Emit = true;
+ else {
+ SM += T;
+ Emit = SM.ExceedThreshold();
+ }
+
+ if (Emit) {
+ EmitPartition(R, SM);
+ SM.Reset();
+ if (!RemainingModSize.ExceedThresholdTooMuch())
+ break;
+ }
+ }
+ return SM;
+}
+
+// Return the "size" of given module.
+ModPartAnalysis::SizeMetric ModPartAnalysis::EvaluateModuleSize
+ (const Module *M) const {
+ SizeMetric S;
+ for (Module::const_iterator I = M->begin(), E = M->end(); I != E; I++) {
+ const Function &F = *I;
+ if (!F.isDeclaration())
+ S.IncFuncNum();
+ }
+ return S;
+}
+
+bool ModPartAnalysis::runOnModule(Module &M) {
+ SizeMetric S = EvaluateModuleSize(&M);
+ if (!S.ExceedThresholdTooMuch()) {
+ // While it may be big, it is okay.
+ return false;
+ }
+
+ if (!(CG = getAnalysisIfAvailable<CallGraph>()))
+ return false;
+
+ CallGraphNode *R = CG->getRoot();
+ (void)PerformPartitionHelper(R);
+
+ return false;
+}
+
+// /////////////////////////////////////////////////////////////////////////////
+//
+// Implementation of IPOPartition and IPOPartMgr
+//
+// /////////////////////////////////////////////////////////////////////////////
+//
+IPOPartition::IPOPartition(Module *M, const char *NameWoExt, IPOFileMgr &FM) :
+ Mod(0), Ctx(0), IRFile(0), ObjFile(0), FileNameWoExt(NameWoExt), FileMgr(FM) {
+}
+
+IPOFile &IPOPartition::getIRFile() const {
+ if (IRFile)
+ return *IRFile;
+ else {
+ std::string FN(FileNameWoExt + ".bc");
+ return *(IRFile = FileMgr.CreateIRFile(FN.c_str()));
+ }
+}
+
+IPOFile &IPOPartition::getObjFile() const {
+ if (ObjFile)
+ return *ObjFile;
+ else {
+ std::string FN(FileNameWoExt + ".o");
+ return *(ObjFile = FileMgr.CreateObjFile(FN.c_str()));
+ }
+}
+
+
+bool IPOPartition::SaveBitCode() {
+ if (!Mod) {
+ // the bit-code have already saved in disk
+ return true;
+ }
+
+ IPOFile &F = getIRFile();
+ if (F.ErrOccur())
+ return false;
+
+ raw_fd_ostream OF(F.getPath().c_str(), F.getLastErrStr(),
+ raw_fd_ostream::F_Binary);
+ WriteBitcodeToFile(Mod, OF);
+ OF.close();
+
+ Mod = 0;
+ delete Ctx;
+ Ctx = 0;
+
+ return !F.ErrOccur();
+}
+
+bool IPOPartition::LoadBitCode() {
+ if (Mod)
+ return true;
+
+ IPOFile &F = getIRFile();
+ if (F.ErrOccur())
+ return false;
+
+ Ctx = new LLVMContext;
+ SMDiagnostic Diag;
+ Mod = ParseIRFile(getIRFilePath(), Diag, *Ctx);
+ if (!Mod) {
+ F.getLastErrStr() = Diag.getMessage();
+ return false;
+ }
+
+ return true;
+}
+
+IPOPartition *IPOPartMgr::CreateIPOPart(Module *M) {
+ std::string PartName;
+ raw_string_ostream OS(PartName);
+ OS << "part" << NextPartId++;
+
+ IPOPartition *P = new IPOPartition(M, OS.str().c_str(), FileMgr);
+ P->Mod = M;
+ IPOParts.push_back(P);
+ return P;
+}
+
+// /////////////////////////////////////////////////////////////////////////////
+//
+// Implementation of IPOFile and IPOFileMgr
+//
+// /////////////////////////////////////////////////////////////////////////////
+IPOFile::IPOFile(const char *DirName, const char* BaseName, bool KeepFile)
+ : Fname(BaseName), Fpath(DirName), Keep(KeepFile) {
+ Fpath = Fpath + "/" + BaseName;
+ Keep = true;
+}
+
+IPOFile::~IPOFile() {
+ if (Keep)
+ sys::fs::remove(Fpath);
+}
+
+IPOFileMgr::IPOFileMgr(): WorkDir("llvmipo") {
+ IRFiles.reserve(20);
+ ObjFiles.reserve(20);
+ OtherFiles.reserve(8);
+ KeepFiles = true;
+ WorkDirCreated = false;
+}
+
+IPOFileMgr::~IPOFileMgr() {
+ if (!KeepFiles) {
+ uint32_t NumRm;
+ sys::fs::remove_all(Twine(WorkDir), NumRm);
+ }
+}
+
+bool IPOFileMgr::CreateWorkDir(std::string &ErrorInfo) {
+ if (WorkDirCreated)
+ return true;
+
+ bool Exist;
+ error_code EC = sys::fs::create_directory(Twine(WorkDir), Exist);
+ if (EC == error_code::success()) {
+ WorkDirCreated = true;
+ return true;
+ }
+
+ return false;
+}
+
+IPOFile *IPOFileMgr::CreateIRFile(const char *Name) {
+ IPOFile *F = CreateFile(Name);
+ IRFiles.push_back(F);
+ return F;
+}
+
+IPOFile *IPOFileMgr::CreateObjFile(const char *Name) {
+ IPOFile *F = CreateFile(Name);
+ ObjFiles.push_back(F);
+ return F;
+}
+
+IPOFile *IPOFileMgr::CreateMakefile(const char *Name) {
+ IPOFile *F = CreateFile(Name);
+ OtherFiles.push_back(F);
+ return F;
+}
+
+// /////////////////////////////////////////////////////////////////////////////
+//
+// Implementation of PostIPOCompile
+//
+// /////////////////////////////////////////////////////////////////////////////
+
+// The makefile looks something like this:
+//
+// .PHONY = all
+//
+// BC = part1.bc part2.bc part3.bc
+// OBJ = ${BC:.bc=.o}
+//
+// all : merged.o
+// %.o : %.bc
+// $(HOME)/tmp/lto.llc -filetype=obj $< -o $@
+//
+// merged.o : $(OBJ)
+// /usr/bin/ld $+ -r -o $@
+//
+bool PostIPOCompile::generateMakefile(std::string &ErrMsg) {
+
+ IPOFile *MkFile = FileMgr.CreateMakefile("Makefile");
+
+ std::string NewErrMsg;
+ raw_fd_ostream Mk(MkFile->getPath().c_str(), NewErrMsg, 0);
+
+ if (!NewErrMsg.empty()) {
+ ErrMsg += NewErrMsg;
+ return false;
+ }
+
+ std::string BCFiles;
+ for (IPOPartMgr::iterator I = PartMgr.begin(), E = PartMgr.end();
+ I != E; I++) {
+ BCFiles += (*I)->getIRFile().getName();
+ BCFiles += " ";
+ }
+
+ Mk << ".PHONY = all\n\n";
+
+ Mk << "\nBC = " << BCFiles << "\n";
+ Mk << "OBJ = ${BC:.bc=.o}\n\n";
+
+ if (MergeObjs)
+ Mk << "all : " << MergedObjFile->getName() << "\n";
+ else
+ Mk << "all : $(OBJ)\n";
+
+ // Emit rule
+ Mk << "%.o : %.bc\n\t$(HOME)/tmp/lto.llc -filetype=obj $+ -o $@\n\n";
+
+ if (MergeObjs) {
+ Mk << MergedObjFile->getName() << " : $(OBJ)\n";
+ Mk << "\t/usr/bin/ld $+ -r -o $@\n\n";
+ }
+
+ Mk.close();
+
+ return true;
+}
+
+bool PostIPOCompile::Compile(std::string &ErrMsg) {
+ if (MergeObjs)
+ MergedObjFile = FileMgr.CreateObjFile("merged");
+
+ if (!generateMakefile(ErrMsg))
+ return false;
+
+ const char *args[] = {"/usr/bin/make", "-C", 0, 0};
+ args[2] = FileMgr.getWorkDir().c_str();
+
+ bool Fail;
+ sys::ExecuteAndWait("/usr/bin/make", args, 0/*envp*/, 0/*redirect*/, 0/*wait*/, 0, &ErrMsg, &Fail);
+ return !Fail;
+}
Index: tools/lto/LTOCodeGenerator.h
===================================================================
--- tools/lto/LTOCodeGenerator.h (revision 186109)
+++ tools/lto/LTOCodeGenerator.h (working copy)
@@ -18,6 +18,8 @@
#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/StringMap.h"
#include "llvm/Linker.h"
+#include "llvm/Support/FormattedStream.h"
+#include "llvm/Support/system_error.h"
#include <string>
#include <vector>
@@ -28,6 +30,111 @@
class MemoryBuffer;
class TargetMachine;
class raw_ostream;
+
+ class IPOFile;
+ class IPOFileMgr;
+ class IPOPartition {
+ public:
+ bool isInMemory() const { return Mod != 0; }
+ bool SaveBitCode();
+ bool LoadBitCode();
+ const std::string &getIRFilePath() const;
+ const std::string &getObjFilePath() const;
+ Module *getModule() const { return Mod; }
+
+ IPOFile &getIRFile() const;
+ IPOFile &getObjFile() const;
+
+ private:
+ friend class IPOPartMgr;
+ IPOPartition(Module *M, const char *FileNameWoExt, IPOFileMgr &FM);
+
+ Module *Mod;
+ LLVMContext *Ctx;
+ mutable IPOFile *IRFile;
+ mutable IPOFile *ObjFile;
+ std::string FileNameWoExt;
+ IPOFileMgr &FileMgr;
+ };
+
+ class IPOPartMgr {
+ public:
+ typedef std::vector<IPOPartition *> IPOPartsTy;
+ typedef IPOPartsTy::iterator iterator;
+ typedef IPOPartsTy::const_iterator const_iterator;
+
+ iterator begin() { return IPOParts.begin(); }
+ iterator end() { return IPOParts.end(); }
+ const_iterator begin() const { return IPOParts.begin(); }
+ const_iterator end() const { return IPOParts.end(); }
+
+ IPOPartition *CreateIPOPart(Module *);
+
+ IPOPartMgr(IPOFileMgr &IFM) : FileMgr(IFM), NextPartId(1) {}
+
+ private:
+ IPOPartsTy IPOParts;
+ IPOFileMgr &FileMgr;
+ int NextPartId;
+ };
+
+ class IPOFile {
+ public:
+ const std::string &getName() { return Fname; }
+ const std::string &getPath() { return Fpath; }
+
+ error_code &getLastErrCode() { return LastErr; }
+ std::string &getLastErrStr() { return LastErrStr; }
+
+ bool ErrOccur() const {
+ return LastErr != error_code::success() || !LastErrStr.empty();
+ }
+
+ void setKeep() { Keep = true; }
+ bool toKeep() const { return Keep; }
+
+ private:
+ friend class IPOFileMgr;
+ IPOFile(const char* DirName, const char *BaseName, bool Keep=false);
+ ~IPOFile();
+
+ private:
+ std::string Fname;
+ std::string Fpath;
+ error_code LastErr;
+ std::string LastErrStr;
+ bool Keep;
+ };
+
+ class IPOFileMgr {
+ public:
+ IPOFileMgr();
+ ~IPOFileMgr();
+
+ bool CreateWorkDir(std::string &ErrorInfo);
+ const std::string &getWorkDir() { return WorkDir; }
+
+ IPOFile *CreateIRFile(const char *Name);
+ IPOFile *CreateObjFile(const char *Name);
+ IPOFile *CreateMakefile(const char *Name);
+
+ typedef std::vector<IPOFile *> FileVect;
+ FileVect &getIRFiles() { return IRFiles; }
+ FileVect &getObjFiles() { return ObjFiles; }
+
+ private:
+ IPOFile *CreateFile(const char *Name) {
+ return new IPOFile(WorkDir.c_str(), Name);
+ }
+
+ private:
+ FileVect IRFiles;
+ FileVect ObjFiles;
+ FileVect OtherFiles;
+ std::string WorkDir;
+ bool KeepFiles;
+ bool WorkDirCreated;
+ };
}
//===----------------------------------------------------------------------===//
@@ -52,11 +159,16 @@
bool writeMergedModules(const char *path, std::string &errMsg);
bool compile_to_file(const char **name, std::string &errMsg);
+ bool compile_to_files(const char** name, std::string& errMsg);
const void *compile(size_t *length, std::string &errMsg);
void setCodeGenDebugOptions(const char *opts);
private:
- bool generateObjectFile(llvm::raw_ostream &out, std::string &errMsg);
+ void performIPO(std::string &errMsg, bool PerformPartition=false);
+ bool performPostLTO(llvm::Module *Mod, llvm::formatted_raw_ostream &Out,
+ std::string &errMsg);
+
+ bool generateObjectFile(llvm::Module *, const char *Out, std::string &errMsg);
void applyScopeRestrictions();
void applyRestriction(llvm::GlobalValue &GV,
std::vector<const char*> &mustPreserveList,
@@ -78,6 +190,8 @@
std::vector<char*> _codegenOptions;
std::string _mCpu;
std::string _nativeObjectPath;
+ llvm::IPOPartMgr _IPOPartMgr;
+ llvm::IPOFileMgr _IPOFileMgr;
};
#endif // LTO_CODE_GENERATOR_H
More information about the llvm-dev
mailing list