[llvm-dev] [RFC] Proposal: llvm-tapi, adding YAML/stub generation for ELF linking support

Wed Sep 26 12:39:16 PDT 2018

You can also reduce the size of an SDK by replacing regular DSOs with
stripped-down versions of those DSOs. I honestly doubt that the .tbe format
is smaller than the stripped ELF format, as YAML is not a very compact file
format, but even if there's a difference, I think it's negligible compared
to the size reduction from stripping DSOs. Or am I missing something?
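
As a rough back-of-the-envelope check of that size comparison, the sketch
below estimates the per-symbol cost in each format. It assumes ELF64, where
each symbol table entry (Elf64_Sym) is 24 bytes plus a NUL-terminated name in
the string table, and it ignores hash tables, version tables and headers. The
YAML layout is copied from the example in the proposal quoted below. The
function names are made up for illustration.

```python
from typing import Optional

# sizeof(Elf64_Sym): st_name(4) + st_info(1) + st_other(1) + st_shndx(2)
# + st_value(8) + st_size(8)
ELF64_SYM_SIZE = 24


def elf_symbol_bytes(name: str) -> int:
    """Approximate bytes one symbol costs in .dynsym plus .dynstr (ELF64).

    Assumption: hash tables, version tables and headers are ignored.
    """
    return ELF64_SYM_SIZE + len(name.encode()) + 1  # +1 for the NUL terminator


def tbe_symbol_bytes(name: str, sym_type: str, size: Optional[int] = None) -> int:
    """Bytes one symbol costs as a YAML entry laid out like the quoted example."""
    text = f" - name: {name}\n   type: {sym_type}\n"
    if size is not None:
        text += f"   size: {size}\n"
    return len(text.encode())
```

Under these assumptions, `printf` costs 31 bytes in ELF and 34 bytes as a
.tbe entry, so the two are in the same ballpark per symbol.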

A tool to convert DSOs to text files is itself useful for analyzing DSOs,
but in most cases existing tools such as objdump should suffice. So I
don't see a strong reason to define a new file format, because the
stripped ELF DSO is almost functionally equivalent for your purpose. I
agree with you that a text file is sometimes slightly more appealing than
a binary file, but I feel that's not strong enough to convince
me that lld needs to support it as an input file format along with the
existing ELF, COFF and Mach-O formats. (For the same reason, we don't
directly support the obj2yaml file format.)
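
To illustrate, once the dynamic symbols have been dumped with an existing
tool, comparing two versions of an interface takes only a few lines of
script. The sketch below assumes the dump has already been parsed into
(name, type) pairs; the parsing step and the function name are hypothetical
and not part of any existing tool.

```python
def interface_changes(old, new):
    """Return removed ('-') and added ('+') symbols between two dumps.

    `old` and `new` are iterables of (name, type) pairs, assumed to have been
    parsed from the output of a tool such as objdump. Input order does not
    matter, so two dumps that differ only in ordering compare as equal.
    """
    old_set, new_set = set(old), set(new)
    removed = sorted(old_set - new_set)
    added = sorted(new_set - old_set)
    return ([f"-{name} {kind}" for name, kind in removed]
            + [f"+{name} {kind}" for name, kind in added])
```

Because the symbols are compared as sets and reported in sorted order, the
result depends only on the exported interface, not on the order in which the
tool happened to print it.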

On Wed, Sep 26, 2018 at 11:40 AM Armando Montanez <amontanez at google.com>
wrote:

> Right. Usually you wouldn't want to write a .tbe from scratch, but for
> the sake of linking against a DSO you might only have access to a .tbe
> stub that was produced from the DSO. This specific functionality
> becomes critical when DSOs used only for linking are replaced entirely
> by .tbe stubs, because at an SDK level the complete DSO isn't needed.
> This is what Apple has done to significantly reduce their SDK size.
> The .tbe format brings linking functionality, readability, and smaller
> size than a stripped ELF file, making it slightly more appealing for
> reducing SDK size than a standard ELF stub.
> On Wed, Sep 26, 2018 at 10:24 AM Rui Ueyama <ruiu at google.com> wrote:
> >
> > I'd like to know a bit more about the benefit of allowing conversion
> > from text to binary. I can imagine that the feature is useful in some
> > tricky cases, but in general you need complete, non-stub DSOs to
> > run your executable, so you cannot freely make up a text file, convert it
> > to a stub DSO and static-link against it. Perhaps the least error-prone
> > way of using the tool is to actually create a DSO, strip it using the
> > tool and then link against it. So I wonder what the use case would be
> > for going from .tbe to a stub .so.
> >
> > On Wed, Sep 26, 2018 at 10:03 AM Armando Montanez <amontanez at google.com>
> > wrote:
> >>
> >> Absolutely. The goal of the tool is to produce both textual and binary
> >> DSO stubs. This means you could take a DSO, produce a textual stub,
> >> modify it however you wish, and then produce a linkable binary stub
> >> from that modified .tbe. That, or you could bypass the textual portion
> >> altogether and just produce binary stubs from DSOs. While the textual
> >> format is useful, the goal is to make the tool complete and maximally
> >> applicable by producing ELF stubs as well.
> >>
> >> Alphabetical symbol sorting is currently part of the plan too.
> >> It also makes producing a diff easier.
> >> On Wed, Sep 26, 2018 at 9:51 AM Rui Ueyama <ruiu at google.com> wrote:
> >> >
> >> > Hi,
> >> >
> >> > Have you considered writing a tool to strip DSOs so that they contain
> >> > only the information needed for dynamic linking? Because the linker
> >> > uses only the symbol table and the symbol version table when linking
> >> > against a DSO, all the other sections such as .text or .data can be
> >> > removed from a file without affecting the output.
> >> >
> >> > Obviously that stripped DSO is not human readable, but it looks like
> >> > it has a few merits over inventing a new text description format:
> >> > (1) you don't need to invent something new at all, (2) it is backward
> >> > compatible with existing linkers and other tools, (3) all the details
> >> > of the ELF format (such as symbol versions) are naturally preserved,
> >> > and (4) it is perhaps faster than reading a text file (especially
> >> > given that the LLVM YAML library is slow). You can make the tool sort
> >> > symbols alphabetically, so that it produces the exact same output for
> >> > two different files that are semantically equivalent to the linker.
> >> >
> >> > On Wed, Sep 26, 2018 at 8:30 AM Armando Montanez via llvm-dev
> >> > <llvm-dev at lists.llvm.org> wrote:
> >> >>
> >> >> Hello all,
> >> >>
> >> >> LLVM-TAPI seeks to decouple the necessary link-time information for a
> >> >> dynamic shared object from the implementation of the runtime object.
> >> >> This process will be referred to as dynamic shared object (DSO)
> >> >> stubbing throughout this proposal. A number of projects have
> >> >> implemented their own versions of shared object stubbing for a
> >> >> variety of reasons related to improving the overall linking
> >> >> experience. This functionality is absent from LLVM despite how close
> >> >> the practice is to LLVM’s domain. The goal of this project would be
> >> >> to produce a library for LLVM that not only provides a means for DSO
> >> >> stubbing, but also gives meaningful insight into the contents of
> >> >> these stubs and how they change. I’ve collected a few example
> >> >> instances of object stubbing as part of larger tools and the key
> >> >> benefits that resulted from them:
> >> >>
> >> >> - Apple’s TAPI [1]: Stubbing used to reduce SDK size and improve
> >> >> build times.
> >> >> - Oracle’s Solaris OS linker [2]: Stubbing used to improve build
> >> >> times, and improve robustness of build system (against dependency
> >> >> cycles and race conditions).
> >> >> - Google’s Bazel [3]: Stubbing used to improve build times.
> >> >> - Google’s Fuchsia [4] [5]: Stubbing used to improve build times.
> >> >> - Android NDK: Stubbing used to reduce size of native SDK, control
> >> >> exported symbols, and improve build times.
> >> >>
> >> >> Somewhat tangentially, a tool called libabigail [6] provides
> >> >> utilities for tracking changes relevant to ELF files in a meaningful
> >> >> way. One of libabigail’s tools provides very detailed textual XML
> >> >> representations of objects, which is especially useful in the
> >> >> absence of a preexisting textual representation of shared objects’
> >> >> exposed interfaces. Glibc [7] and libc++ [8] have made an effort to
> >> >> address this in their own ways by using scripts to produce textual
> >> >> representations of object interfaces. This functionality makes it
> >> >> significantly easier to analyze and control symbol visibility,
> >> >> though the existing solutions are quite bespoke. Controlling these
> >> >> symbols can have an implicit benefit of reducing binary size by
> >> >> pruning visible symbols, but the more critical feature is being able
> >> >> to easily view and edit the exposed symbols in the first place.
> >> >> Using human-readable stubs addresses the issues of DSO analysis and
> >> >> control without requiring highly specialized tools. This does not
> >> >> strive to replace tools altogether; it just makes small tasks
> >> >> significantly more approachable.
> >> >>
> >> >> llvm-tapi would strive to sit at the intersection of a means to
> >> >> produce and link against stubs and tools that offer more
> >> >> control and insight into the public interfaces of DSOs. More
> >> >> fundamentally, llvm-tapi would introduce a library to generate and
> >> >> ingest human-readable stubs from DSOs to address these issues
> >> >> directly in LLVM. Overall, this idea is most similar in vein to
> >> >> Apple’s TAPI, as the original TAPI also uses human-readable stubs.
> >> >>
> >> >> In general, llvm-tapi should:
> >> >>
> >> >> 1. Produce human-readable text files from dynamic shared objects that
> >> >> are concise, readable, and contain everything required for linking
> >> >> that can’t be implicitly derived.
> >> >> 2. Produce linkable files from said human-readable text files.
> >> >> 3. Provide tools to track and control the exposed interfaces of
> >> >> object files.
> >> >> 4. Integrate well with LLVM’s existing tools.
> >> >> 5. Strive to enable integration of the original TAPI code for Mach-O
> >> >> support.
> >> >>
> >> >> There are a number of key benefits to using stubs and text-based
> >> >> application binary interfaces, such as:
> >> >> - Reducing the size of dynamic shared objects used exclusively for
> >> >> linking.
> >> >> - The ability to avoid re-linking an object when its dependencies’
> >> >> exposed interfaces do not change but their implementation does (which
> >> >> happens frequently).
> >> >> - Simplicity of viewing a diff for a changed DSO interface.
> >> >> A large number of other use cases exist; this would open up the floor
> >> >> for a variety of other tools and future work as the concept is rather
> >> >> generic.
> >> >>
> >> >> The proposed YAML format would be analogous to Apple’s .tbd format
> >> >> but differ in a few ways to support ELF object types. An example
> >> >> would be as follows:
> >> >>
> >> >> --- !tapi-tbe-v1
> >> >> soname: someobj.so
> >> >> architecture: aarch64
> >> >> symbols:
> >> >>  - name: fish
> >> >>    type: object
> >> >>    size: 48
> >> >>  - name: foobar
> >> >>    type: function
> >> >>    warning-text: "deprecated in SOMEOBJ_1.3"
> >> >>  - name: printf
> >> >>    type: function
> >> >>  - name: rndfunc
> >> >>    type: function
> >> >>    undefined: true
> >> >> ...
> >> >>
> >> >> (Note that this doesn’t account for version sets, but such
> >> >> functionality can be included in a later version.)
> >> >>
> >> >> Most of the fields are self-explanatory, with size not being relevant
> >> >> to function symbols, and warning text being purely optional. One
> >> >> reason this departs from .tbd format is to make diffs much easier:
> >> >> sorting symbols alphabetically on individual lines makes it much more
> >> >> obvious which symbols are added, removed, or modified. Despite the
> >> >> differences, the desire is for llvm-tapi to be structured such that
> >> >> integrating Apple’s Mach-O TAPI will be plausible and welcomed. Prior
> >> >> discussion [9] indicated interest in integrating Apple TAPI into
> >> >> LLVM, so I’d definitely like to leave that door open and encourage
> >> >> that in the future.
> >> >>
> >> >> I feel the best place to start is as a library, to best facilitate
> >> >> integration into other areas of LLVM, later wrapping it in a
> >> >> standalone tool and eventually considering direct integration into
> >> >> LLD. The tool will initially support basic generation of .tbe and
> >> >> stub files from .tbe or ELF. This should give enough functionality
> >> >> for manually checking shared object interface diffs, as well as
> >> >> having access to linkable stubs. The goal is for the tool to
> >> >> eventually provide additional functionality such as compatibility
> >> >> checking, but that’s a ways into the future.
> >> >>
> >> >> There are multiple options for integrating llvm-tapi with LLD:
> >> >> LLD could use llvm-tapi to produce and ingest .tbe files
> >> >> directly, or llvm-tapi could be used to produce stubs that LLD can be
> >> >> taught to use. From a technical standpoint, these are not mutually
> >> >> exclusive. This step is a ways down the road, but is definitely a
> >> >> high-priority goal.
> >> >>
> >> >> I’m interested to hear your thoughts and feedback on this.
> >> >>
> >> >> Best,
> >> >> Armando
> >> >>
> >> >>
> >> >> [1] https://github.com/ributzka/tapi
> >> >> [2] https://docs.oracle.com/cd/E23824_01/html/819-0690/chapter2-22.html
> >> >> [3] https://docs.bazel.build/versions/master/user-manual.html#flag--interface_shared_objects
> >> >> [4] https://fuchsia.googlesource.com/zircon/+/master/scripts/shlib-symbols
> >> >> [5] https://fuchsia.googlesource.com/zircon/+/master/scripts/dso-abi.h
> >> >> [6] https://sourceware.org/libabigail/
> >> >> [7] https://sourceware.org/git/?p=glibc.git;a=blob;f=scripts/abilist.awk;h=bad7c3807e478e50e63c3834aa8969214bdd6f63;hb=HEAD
> >> >> [8] https://github.com/llvm-mirror/libcxx/blob/master/utils/sym_extract.py
> >> >> [9] http://lists.llvm.org/pipermail/cfe-dev/2018-April/thread.html#57576
> >> >> _______________________________________________
> >> >> LLVM Developers mailing list
> >> >> llvm-dev at lists.llvm.org
> >> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
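
For reference, the text layout shown in the quoted proposal could be produced
along the following lines. This is only a sketch under stated assumptions: the
field names and ordering are copied from the quoted example, while the
function name and the dict-based input are hypothetical and are not
llvm-tapi's actual interface. Symbols are sorted by name, so two inputs that
differ only in ordering render to identical text.

```python
def render_tbe(soname, architecture, symbols):
    """Render symbol records as text shaped like the quoted .tbe example.

    `symbols` is a list of dicts with keys 'name' and 'type', plus optional
    'size', 'warning-text' and 'undefined' (hypothetical input shape).
    """
    lines = [
        "--- !tapi-tbe-v1",
        f"soname: {soname}",
        f"architecture: {architecture}",
        "symbols:",
    ]
    # Sort by name so the output is independent of input order.
    for sym in sorted(symbols, key=lambda s: s["name"]):
        lines.append(f" - name: {sym['name']}")
        lines.append(f"   type: {sym['type']}")
        if "size" in sym:
            lines.append(f"   size: {sym['size']}")
        if "warning-text" in sym:
            lines.append(f"   warning-text: \"{sym['warning-text']}\"")
        if sym.get("undefined"):
            lines.append("   undefined: true")
    lines.append("...")
    return "\n".join(lines) + "\n"
```

Rendering the same set of symbols in two different orders gives byte-identical
text, which is what makes a plain line-based diff of two such files
meaningful.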