[LLVMdev] Proposal: MCLinker - an LLVM integrated linker
Jush Lu
jush.lu at mediatek.com
Thu Nov 3 04:38:17 PDT 2011
Hi
Thank you for reply, Object Files is really an impressive work. We've
studied it as a important related work. Sorry for our relative work without
the mention of Object Files, I sent proposal in a rush.
In fact, I ever implemented readers based on Object Files in MCLinker, and
these readers are used to parse libraries into MCLinker IR. That is, we
design structures to be used as IR for core linking algorithms, not just
for "Integrated".
We intend to design unified abstractions which support the common linking
algorithm. For instances, you can find the definition of LDSymbol in
http://code.google.com/p/mclinker/source/browse/include/mcld/LD/LDSymbol.h.
The readers in MCLinker will convert ELF Elf_Sym, Mach-O nlist, and COFF
symbol to LDSymbol, and running the same symbol resolution algorithm.
Currently, we are implementing symbol tables and symbol resolution in
MCLinker. We will release our code asap.
As LDSymbol.h shows, LDSymbol has a pointer to MCSectionData. Once we
replace MCSectionData here with smaller data units, I think LDSymbol can be
used with a similar concept as atom. As Nick mentioned, ELF will be easier
for parsing with the fields such as st_size if we apply this change, and it
may need efforts to parse Mach-O and COFF to produce such smaller units.
Jush
On Wed, Nov 2, 2011 at 9:26 AM, Michael Spencer <bigcheesegs at gmail.com>wrote:
> 2011/11/1 Jush Lu (盧育龍) <Jush.Lu at mediatek.com>:
> > Hi all,
> >
> > We are developing a linker, MCLinker.
> >
> > MCLinker is a linker for LLVM. It leverages the LLVM machine code (MC)
> layer to link object files and bitcodes, and generate shared objects and
> executable files.
> >
> >
> > Motivation
> > ----------
> >
> > The development of MCLinker was started out of the need for an LLVM
> integrated linker. LLVM lacks an integrated linker; hence, it relies on
> external linkers to generate executables and dynamic shared objects (DSO,
> .so). MCLinker complements LLVM toolchain for direct generation of all
> kinds of outputs without temporary files. Therefore, LLVM toolchain can
> remove all unnecessary read and emission of temporary files. As we know, MC
> layer can generate object files without external assemblers, we look
> forward MCLinker can help LLVM to get rid of external linkers as well.
> Since MCLinker derives from MC layer, it supports multiple targets and
> formats naturally. As a result, it can make the cross-compilation process
> easier.
> >
> > Besides being an integrated linker, MCLinker has other ambition.
> Embedded systems developers are looking for a linker which performs well
> even on platforms with limited memory. MCLinker performs fast and
> format-independent linking with low memory usage. We will illustrate how we
> achieve this later. Furthermore, MCLinker is under UIUC BSD-Style license.
> With that, we hope that more platforms and commercial products can adopt it.
> >
> >
> > Features
> > --------
> >
> > MCLinker provides the following features:
> >
> > * Support Various Formats of Object Files
> >
> > MCLinker, leveraging LLVM machine code (MC) layer, provides a unified
> representation for various formats of object files. Different formats of
> object files, including ELF, COFF, and Mach-O, are read by the
> corresponding readers and translated into the same representation. This
> clearly separates the linking algorithms from reading and writing files in
> different formats. That is, the linking algorithms of MCLinker, such as
> merging symbol tables, applying relocations and relaxing instructions, are
> format-independent.
> >
> > * Fast Linking and Low Memory Usage
> >
> > The linking algorithms in MCLinker are fast and efficient with limited
> memory budget. Here is how we achieve these.
> >
> > + Improve cache locality of both string tables and symbol tables
> >
> > MCLinker keeps symbols and corresponding strings in the same cache line.
> Because symbols and strings are often used in pairs, putting them together
> can improve spatial locality. Furthermore, MCLinker uses a global symbol
> table and avoids copying symbols from input to output symbol tables. Since
> MCLinker always uses the same instance of the symbols with the same name,
> the locality is improved.
> >
> > + Reduce the number of walks over symbol tables
> >
> > Walk over global symbol table is time-consuming, because programs may
> have more than 100,000 symbols. GNU ld reads symbols and relocation entries
> at the same time. Since symbols in that stage are not resolved, GNU ld
> needs to track relocation entries applied to each symbol. This bookkeeping
> causes additional traversals of the symbol table. Following Google gold's
> approach, MCLinker keeps the number of walks over symbol table as few as
> possible. MCLinker resolves symbols before reading relocation entries, and
> this avoids extra symbol table traversals.
> >
> > + Reduce the overhead of traversing all the symbols
> >
> > The symbols are categorized into different groups according to their
> types and bindings. As a result, MCLinker visits a specific group to get
> the symbols it needs instead of traversing the whole symbol pool. For
> example, dynamic symbols are grouped together. When MCLinker generates the
> dynamic symbol section, it only visits the symbols in the dynamic group.
> >
> > + Efficient use of memory mapped I/O
> >
> > Linkers are sensitive to the use of memory mapped I/O. Memory mapped I/O
> brings data to the physical memory only when accessing it, so that it can
> reduce the physical memory usage. However, memory mapped I/O will not
> release pages until pages are un-mapped. In embedded system with limited
> memory, this means the throughput of the system is degraded. MCLinker
> optimizes the use of memory mapped I/O. It requests mapped memory according
> to the average size of object files in the typical case. Thus, more pages
> are available during linking, and it improves the system throughput.
> >
> >
> > The Design of MCLinker
> > ----------------------
> >
> > MCLinker is an integrated linker for LLVM. "Integrated" means MCLinker
> has an adapter to LLVM. MC Layer uses a function pass, called AsmPrinter,
> as an adapter to the standard compilation flow. Like MC Layer, MCLinker
> also provides a function pass, called SectLinker, to be integrated into the
> last stage of the standard compilation flow.
> >
> > Traditional toolchain has a linker driver to prepare parameters for
> linker. In GCC, collect2 is one of such tools to prepare parameters such as
> sysroot, the path of glibc, and so on. In MCLinker, MCLDDriver plays the
> same role as collect2 to prepare all implicit parameters for MCLinker.
> >
> > MCLinker, a class with the same name as the project, provides numerous
> APIs to perform major linking tasks, such as symbol resolution, applying
> relocations, sections merge, and so on. These APIs are defined at high
> level abstraction which is target and format independent. Those target and
> format dependent operations are encapsulated in another class, called
> TargetLDBackend. Therefore, MCLinker can perform linking at a high level
> abstraction without the concern about targets and formats.
> >
> > In order to simplify linking and improve performance, MCLinker defines
> its own symbol data structures, called LDSymbol, instead of using MCSymbol
> directly. Symbols defined in LLVM are separated into two different classes
> – MCSymbol and MCSymbolData, and they are very different from the symbols'
> definition in most object files. If a linker adopts them as the data
> structure of symbols, it needs extra overhead to convert symbols of object
> files into MCSymbol and MCSymbolData. In order to overcome this problem,
> the definition of LDSymbol is close to the symbols' definition in the
> object files, and the overhead of conversion is low. Actually, not only
> reading object files, but also symbol resolution become faster and easier
> when adopting LDSymbol.
> >
> > MCLinker also defines a unified representation for relocation entries,
> called LDRelocation, since LLVM does not provide it. The design of
> relocation in MCLinker is similar to the other portable linkers. Readers
> convert format-dependent relocation entries into general LDRelocation. All
> targets need to provide functions for applying relocations. MCLinker
> connects the functions with corresponding LDRelocations during the
> initialization.
> >
> >
> > Related Work
> > ------------
> >
> > GNU ld and Google gold are well-known linkers with unique features.
> These linkers have their own linking algorithms and data structures. We
> discuss GNU ld and Google gold below.
> >
> > * GNU ld
> >
> > GNU ld is designed for the portable manipulation to support various
> formats of object files. Object files in various formats are represented in
> a common abstraction, provided by the Binary File Descriptor (BFD) library.
> The main job of GNU ld is to read and parse link scripts, that is, it
> behaves as a frontend of the BFD library, and the real linking is done by
> BFD.
> >
> > GNU ld reads symbols and relocations at the same time. This means it
> suffers from the extra overhead to do bookkeeping for relocations. That is
> what both Google gold and MCLinker want to eliminate. Additionally, BFD is
> mainly designed for COFF, so ELF is not handled efficiently in BFD. Both
> Google gold and MCLinker are designed mainly for ELF, and can take
> advantage of ELF features.
> >
> > * Google gold
> >
> > Google gold is designed to be a fast ELF linker. Unlike GUN ld, Google
> gold does not use the BFD library, so it can efficiently use ELF features
> to speed up performance.
> >
> > Another feature of Google gold is multithreading. Google gold uses
> threads to run tasks in parallel, e.g., reading multiple input symbol
> tables in the same time. This improves performance in some cases.
> >
> > In contrast to GNU ld, the linking flow of gold is simplified. This
> makes Google gold's linking stages significantly less than GNU ld, thus
> Google gold saves more linking time.
> >
> > Moreover, on the file operations, Google gold is faster than GNU ld as
> well. GNU ld uses function pointers to deal with format-dependent issues,
> e.g., byte swapping for endianness. In contrast, Google gold uses C++
> template specialization. In GNU ld, all sections and symbols of object
> files are read by calling function pointers, and this leads GNU ld to being
> slower.
> >
> >
> > Current Status
> > --------------
> >
> > So far, the framework of MCLinker is established. Symbol resolution is
> completed and tested. Sections merge, applying relocation, instruction
> relaxation, and writers are ongoing works. If you are interested in
> MCLinker, please find design documents and source code in our website.
> http://code.google.com/p/mclinker/
> >
> > Tested platform: Currently, only Linux, Mac OS X 10.7, and FreeBSD 9.0
> are tested. However, we think you won't run into to any problem on modern
> UN*X machines.
>
> Hello, I'm glad to see other people are interested in building an LLVM
> linker.
>
> I have been working in the dirrection of an LLVM linker for some time
> now. The first key component of this is the libObject library in tree.
> See my talk from the November 2010 Dev Meeting called "Object Files in
> LLVM" <http://www.llvm.org/devmtg/2010-11/>.
>
> We are currently seeking to remove the inherent duplication of object
> file handling in all of the llvm projects.
>
> In your design a lot of the structure seems to be dedicated to the
> idea of integrated linking. While we want to support this kind of use,
> I believe it can be accomplished without as much structure. Most
> projects are built as libraries, and the final executable is generally
> one source file which links to a bunch
> of rather large libraries. The small speedup you get from not writing
> this one object file to disk is negligible, considering it's going to
> be in cache when the link starts, and it will be the first object
> processed.
>
> This structure might make it hardter to use the facilities the linker
> provides to do dynamic loading, and linking JITed code. Both of these
> are cases where having the linker integrated makes a huge performance
> and complexity difference.
>
> A major feature of the linker design I have been working on is the
> object file IR it is based on. It takes the concept of an atom (a
> named range of data) and formalizes it into a graph based data
> structure where edges represent the relations between atoms such as
> relocations, groupings, and layout constraints. This provides a format
> for the core linker algorithms to work on, and also for plugins and
> file-format specific processing to happen. This can also enable more
> efficent intermidate object and debug file formats.
>
> We would all like to avoid duplication and have the best linker
> possible for LLVM. So I think it's important to discuss these issues
> and decide how to best combine our efforts.
>
> - Michael Spencer
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20111103/93dbfbf4/attachment.html>
More information about the llvm-dev
mailing list