[LLVMdev] Cross-module function inlining
mark.i.r.muir at gmail.com
Wed Jan 13 08:38:15 PST 2010
I've developed a working LLVM back-end (based on LLVM 2.6) for a custom architecture with its own tool chain. This tool chain creates stand-alone programs from a single assembly file. We previously used GCC, which could produce a single machine assembly file from multiple source files.
I modified Clang to accept the architecture, but discovered that clang-cc (or the Clang Tool subclass inside Clang) doesn't allow multiple source files to be lowered to a single machine assembly. The ToolChain subclasses inside Clang make use of the normal system linker to combine multiple modules, but this isn't possible on our system.
So, I created a new Clang ToolChain subclass that forms a tool pipeline based on the following:
- Run the existing Clang tool on each source file, using -emit-llvm to generate a .bc file for each module.
- Run llvm-link to merge them into a single .bc file.
- Run llc to generate a complete machine assembly.
The last two steps were implemented together in a single Tool, which performs the job of the linker. Optimisation options are passed on to each tool.
This does the trick.
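The three steps above can be sketched roughly as follows. This is only an illustration of the command sequence, not the actual ToolChain code: the file names (main.c, util.c, merged.bc, program.s) and the helper build_pipeline are hypothetical, and it assumes the optimisation level is forwarded to clang and llc only, since llvm-link does not optimise. The sketch builds the command lines without running them.

```python
from pathlib import Path


def build_pipeline(sources, opt_level="-O3", output="program.s"):
    """Return the argv lists for the three stages; nothing is executed.

    Hypothetical helper: names and defaults are assumptions for illustration.
    """
    commands = []
    bitcode = []
    # Step 1: compile each source file to its own LLVM bitcode module.
    for src in sources:
        bc = str(Path(src).with_suffix(".bc"))
        commands.append(["clang", opt_level, "-emit-llvm", "-c", src, "-o", bc])
        bitcode.append(bc)
    # Step 2: merge all bitcode modules into a single module.
    merged = "merged.bc"
    commands.append(["llvm-link", *bitcode, "-o", merged])
    # Step 3: lower the merged module to one machine assembly file.
    commands.append(["llc", opt_level, merged, "-o", output])
    return commands


if __name__ == "__main__":
    for cmd in build_pipeline(["main.c", "util.c"]):
        print(" ".join(cmd))
```

Note that in this arrangement the only optimisation passes that ever see the whole program are whatever llc chooses to run on merged.bc.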
However, with optimisations enabled, the resulting code is not as efficient as it would be if all the code were in a single module. In particular, function inlining is only performed by Clang (i.e. only on a module-by-module basis), and not by llvm-link or llc. This can be seen in the resulting pass arguments with -O3 (obtained using '-Xclang -debug-only=Execution' and '-Xlinker -debug-only=Execution'; the first list is from the Clang stage, the second from the link stage):
Pass Arguments: -raiseallocs -simplifycfg -domtree -domfrontier -mem2reg -globalopt -globaldce -ipconstprop -deadargelim -instcombine -simplifycfg -basiccg -prune-eh -functionattrs -inline -argpromotion -simplify-libcalls -instcombine -jump-threading -simplifycfg -domtree -domfrontier -scalarrepl -instcombine -break-crit-edges -condprop -tailcallelim -simplifycfg -reassociate -domtree -loops -loopsimplify -domfrontier -lcssa -loop-rotate -licm -lcssa -loop-unswitch -instcombine -scalar-evolution -lcssa -iv-users -indvars -loop-deletion -lcssa -loop-unroll -instcombine -memdep -gvn -memdep -memcpyopt -sccp -instcombine -break-crit-edges -condprop -domtree -memdep -dse -adce -simplifycfg -strip-dead-prototypes -print-used-types -deadtypeelim -constmerge
Pass Arguments: -preverify -domtree -verify -loops -loopsimplify -scalar-evolution -iv-users -loop-reduce -lowerinvoke -unreachableblockelim -codegenprepare -stack-protector -machine-function-analysis -machinedomtree -machine-loops -machinelicm -machine-sink -unreachable-mbb-elimination -livevars -phi-node-elimination -twoaddressinstruction -liveintervals -simple-register-coalescing -livestacks -virtregmap -linearscan-regalloc -stack-slot-coloring -prologepilog -machinedomtree -machine-loops -machine-loops
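A quick way to confirm where -inline appears is to split each "Pass Arguments:" line into pass names and test for membership. The snippet below is a small sketch using shortened excerpts of the two lists quoted above; the helper name passes is hypothetical.

```python
def passes(line):
    """Split a 'Pass Arguments:' debug line into a list of pass names."""
    prefix = "Pass Arguments:"
    body = line[len(prefix):] if line.startswith(prefix) else line
    # Drop the leading '-' from each option to get the bare pass name.
    return [tok.lstrip("-") for tok in body.split()]


# Shortened excerpts of the two lists (not the full pass lists).
compile_stage = "Pass Arguments: -basiccg -prune-eh -functionattrs -inline -argpromotion"
link_stage = "Pass Arguments: -preverify -domtree -verify -loops -loopsimplify"

print("inline in compile stage:", "inline" in passes(compile_stage))
print("inline in link stage:", "inline" in passes(link_stage))
```

Run against the full lists, the same check finds -inline only in the first one, which matches the observation that inlining happens per module and never on the merged module.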
I'm sure I can hack away to add the inlining passes to the link stage manually, but I'd prefer an informed opinion on the best way to do this, or on whether there is a more proper way to achieve the same thing (i.e. inter-module function inlining).
Also, I've noticed another problem with this approach: when functions are declared 'inline __attribute__((always_inline))' in header files, and the corresponding function definition is in a different module from the one in which the function is called, LLVM will not inline the function at the call site, but will happily strip away the function body, resulting in broken code. Is there a way to stop this?
Any guidance is much appreciated.