[llvm-dev] [RFC] Proposal: llvm-tapi, adding YAML/stub generation for ELF linking support

Wed Sep 26 10:24:27 PDT 2018

I'd like to know a bit more about the benefit of allowing conversion from
text to binary. I can imagine that the feature is useful in some tricky
cases, but in general you need a complete, non-stub DSO to run your
executable, so you cannot freely make up a text file, convert it to a stub
DSO, and static-link against it. Perhaps the least error-prone way of using
the tool is to actually create a DSO, strip it using the tool, and then
link against it. So I wonder what the use case is for going from .tbe to a
stub .so.

On Wed, Sep 26, 2018 at 10:03 AM Armando Montanez <amontanez at google.com>
wrote:

> Absolutely. The goal of the tool is to produce both textual and binary
> DSO stubs. This means you could take a DSO, produce a textual stub,
> modify it however you wish, and then produce a linkable binary stub
> from that modified .tbe. That, or you could bypass the textual portion
> altogether and just produce binary stubs from DSOs. While the textual
> format is useful, the goal is to make the tool complete and maximally
> applicable by producing ELF stubs as well.
>
> Alphabetical symbol sorting is currently part of the plan as well; it
> makes producing a diff easier.
>
> On Wed, Sep 26, 2018 at 9:51 AM Rui Ueyama <ruiu at google.com> wrote:
> >
> > Hi,
> >
> > Have you considered writing a tool to strip DSOs so that they contain
> > only the information needed for dynamic linking? Because the linker uses
> > only the symbol table and the symbol version table when linking against a
> > DSO, all the other sections such as .text or .data can be removed from a
> > file without affecting the output.
> >
> > Obviously that stripped DSO is not human readable, but it looks like it
> > has a few merits over inventing a new text description format: (1) you
> > don't need to invent something new at all, (2) it is backward compatible
> > with existing linkers and other tools, (3) all the details of the ELF
> > format (such as symbol versions) are naturally preserved, and (4) it is
> > perhaps faster than reading text (especially given that the LLVM YAML
> > library is slow). You can make the tool sort symbols alphabetically, so
> > that it produces the exact same output for two different files that are
> > semantically equivalent to the linker.
> >
> > On Wed, Sep 26, 2018 at 8:30 AM Armando Montanez via llvm-dev
> > <llvm-dev at lists.llvm.org> wrote:
> >>
> >> Hello all,
> >>
> >> LLVM-TAPI seeks to decouple the necessary link-time information for a
> >> dynamic shared object from the implementation of the runtime object.
> >> This process will be referred to as dynamic shared object (DSO)
> >> stubbing throughout this proposal. A number of projects have
> >> implemented their own versions of shared object stubbing for a variety
> >> of reasons related to improving the overall linking experience. This
> >> functionality is absent from LLVM despite how close the practice is to
> >> LLVM’s domain. The goal of this project would be to produce a library
> >> for LLVM that not only provides a means for DSO stubbing, but also
> >> gives meaningful insight into the contents of these stubs and how they
> >> change. I’ve collected a few example instances of object stubbing as
> >> part of larger tools and the key benefits that resulted from them:
> >>
> >> - Apple’s TAPI [1]: Stubbing used to reduce SDK size and improve build
> >> times.
> >> - Oracle’s Solaris OS linker [2]: Stubbing used to improve build
> >> times and improve robustness of the build system (against dependency
> >> cycles and race conditions).
> >> - Google’s Bazel [3]: Stubbing used to improve build times.
> >> - Google’s Fuchsia [4] [5]: Stubbing used to improve build times.
> >> - Android NDK: Stubbing used to reduce the size of the native SDK,
> >> control exported symbols, and improve build times.
> >>
> >> Somewhat tangentially, a tool called libabigail [6] provides utilities
> >> for tracking changes relevant to ELF files in a meaningful way. One of
> >> libabigail’s tools provides very detailed textual XML representations
> >> of objects, which is especially useful in the absence of a preexisting
> >> textual representation of shared objects’ exposed interfaces. Glibc
> >> [7] and libc++ [8] have made an effort to address this in their own
> >> ways by using scripts to produce textual representations of object
> >> interfaces. This functionality makes it significantly easier to
> >> analyze and control symbol visibility, though the existing solutions
> >> are quite bespoke. Controlling these symbols can have an implicit
> >> benefit of reducing binary size by pruning visible symbols, but the
> >> more critical feature is being able to easily view and edit the
> >> exposed symbols in the first place. Using human-readable stubs
> >> addresses the issues of DSO analysis and control without requiring
> >> highly specialized tools. This does not strive to replace tools
> >> altogether; it just makes small tasks significantly more approachable.
> >>
> >> llvm-tapi would strive to combine a means of producing and linking
> >> against stubs with tools that offer more control over and insight
> >> into the public interfaces of DSOs. More fundamentally, llvm-tapi
> >> would introduce a library to generate and ingest human-readable stubs
> >> from DSOs to address these issues directly in LLVM. Overall, this idea
> >> is closest in spirit to Apple’s TAPI, as the original TAPI also uses
> >> human-readable stubs.
> >>
> >> In general, llvm-tapi should:
> >>
> >> 1. Produce human-readable text files from dynamic shared objects that
> >> are concise, readable, and contain everything required for linking
> >> that can’t be implicitly derived.
> >> 2. Produce linkable files from said human-readable text files.
> >> 3. Provide tools to track and control the exposed interfaces of object
> >> files.
> >> 4. Integrate well with LLVM’s existing tools.
> >> 5. Strive to enable integration of the original TAPI code for Mach-O
> >> support.
> >>
> >> There are a number of key benefits to using stubs and text-based
> >> application binary interfaces, such as:
> >> - Reducing the size of dynamic shared objects used exclusively for
> >> linking.
> >> - The ability to avoid re-linking an object when its dependencies’
> >> exposed interfaces do not change but their implementation does (which
> >> happens frequently).
> >> - Simplicity of viewing a diff for a changed DSO interface.
> >>
> >> A large number of other use cases exist; this would open up the floor
> >> for a variety of other tools and future work as the concept is rather
> >> generic.
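
The re-linking benefit listed above can be illustrated with a minimal sketch. It assumes a build system that stores a digest of each dependency's text stub and re-links only when that digest changes. The names interface_digest and needs_relink are hypothetical and are not part of the proposal or of any llvm-tapi API.

```python
import hashlib

# Hypothetical helpers for illustration only; not an llvm-tapi interface.

def interface_digest(stub_text):
    """Return a SHA-256 hex digest of a dependency's text stub."""
    return hashlib.sha256(stub_text.encode("utf-8")).hexdigest()


def needs_relink(previous_digest, stub_text):
    """Re-link only if the stub text differs from what was last linked.

    previous_digest is None when the dependency has never been linked.
    """
    return previous_digest != interface_digest(stub_text)


# An implementation-only change leaves the stub text, and therefore the
# digest, untouched, so a dependent object would not be re-linked.
stub = "soname: someobj.so\nsymbols:\n - name: fish\n   type: object\n"
recorded = interface_digest(stub)
print(needs_relink(recorded, stub))
```

This only works if the stub text is deterministic for a given interface, which is one reason a canonical symbol order matters.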
> >>
> >> The proposed YAML format would be analogous to Apple’s .tbd format but
> >> differ in a few ways to support ELF object types. An example would be
> >> as follows:
> >>
> >> --- !tapi-tbe-v1
> >> soname: someobj.so
> >> architecture: aarch64
> >> symbols:
> >>  - name: fish
> >>    type: object
> >>    size: 48
> >>  - name: foobar
> >>    type: function
> >>    warning-text: "deprecated in SOMEOBJ_1.3"
> >>  - name: printf
> >>    type: function
> >>  - name: rndfunc
> >>    type: function
> >>    undefined: true
> >> ...
> >>
> >> (Note that this doesn’t account for version sets, but such
> >> functionality can be included in a later version.)
> >>
> >> Most of the fields are self-explanatory, with size not being relevant
> >> to function symbols, and warning text being purely optional. One
> >> reason this departs from .tbd format is to make diffs much easier:
> >> sorting symbols alphabetically on individual lines makes it much more
> >> obvious which symbols are added, removed, or modified. Despite the
> >> differences, the desire is for llvm-tapi to be structured such that
> >> integrating Apple’s Mach-O TAPI will be plausible and welcomed. Prior
> >> discussion [9] indicated interest in integrating Apple TAPI into LLVM,
> >> so I’d definitely like to leave that door open and encourage that in
> >> the future.
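
The diff argument above can be sketched with Python's standard difflib module, assuming each symbol occupies its own line and the lines are sorted. The function name interface_diff and the file labels are hypothetical and used only for this example.

```python
import difflib

# Hypothetical helper for illustration only; not an llvm-tapi interface.

def interface_diff(old_names, new_names):
    """Return unified-diff lines between two sets of symbol names.

    Both inputs are sorted first, so only real additions and removals
    appear in the result, never reorderings.
    """
    old = sorted(old_names)
    new = sorted(new_names)
    return list(
        difflib.unified_diff(old, new, "old.tbe", "new.tbe", n=0, lineterm="")
    )


# foobar is removed and rndfunc is added; the input order is irrelevant.
for line in interface_diff(
    ["printf", "fish", "foobar"], ["fish", "rndfunc", "printf"]
):
    print(line)
```

With one symbol per line in a fixed order, each added or removed symbol shows up as a single changed line.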
> >>
> >> I feel the best place to start this is as a library to best facilitate
> >> integration into other areas of LLVM, later wrapping it in a
> >> standalone tool and eventually considering direct integration into
> >> LLD. The tool will initially support basic generation of .tbe and stub
> >> files from .tbe or ELF. This should give enough functionality for
> >> manually checking shared object interface diffs, as well as having
> >> access to linkable stubs. The goal is for the tool to eventually
> >> provide additional functionality such as compatibility checking, but
> >> that’s a ways into the future.
> >>
> >> There are multiple options for integrating llvm-tapi with LLD: LLD
> >> could use llvm-tapi to produce and ingest .tbe files directly, or
> >> llvm-tapi could be used to produce stubs that LLD can be taught to
> >> use. From a technical standpoint, these are not mutually exclusive.
> >> This step is a ways down the road, but is definitely a high-priority
> >> goal.
> >>
> >> I’m interested to hear your thoughts and feedback on this.
> >>
> >> Best,
> >> Armando
> >>
> >>
> >> [1] https://github.com/ributzka/tapi
> >> [2] https://docs.oracle.com/cd/E23824_01/html/819-0690/chapter2-22.html
> >> [3] https://docs.bazel.build/versions/master/user-manual.html#flag--interface_shared_objects
> >> [4] https://fuchsia.googlesource.com/zircon/+/master/scripts/shlib-symbols
> >> [5] https://fuchsia.googlesource.com/zircon/+/master/scripts/dso-abi.h
> >> [6] https://sourceware.org/libabigail/
> >> [7] https://sourceware.org/git/?p=glibc.git;a=blob;f=scripts/abilist.awk;h=bad7c3807e478e50e63c3834aa8969214bdd6f63;hb=HEAD
> >> [8] https://github.com/llvm-mirror/libcxx/blob/master/utils/sym_extract.py
> >> [9] http://lists.llvm.org/pipermail/cfe-dev/2018-April/thread.html#57576