[llvm-dev] [RFC] Proposal: llvm-tapi, adding YAML/stub generation for ELF linking support

Thu Sep 27 17:56:26 PDT 2018

This isn't the same as obj2yaml. It would only contain information relevant
to linking. obj2yaml attempts to be a full textual representation. Also
calling the output of obj2yaml machine readable is kind of dubious since it
has a reasonably complex output format and is *not* an inverse of yaml2obj
as the name might suggest. No inverse of it exists as far as I am aware.
obj2yaml is better for testing than reviewing the public interface of a DSO.
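
To make the distinction concrete, here is a rough sketch (Python, purely
illustrative) of how little a link-only text file needs to carry. The field
names follow the .tbe example in Armando's proposal quoted below; the `Symbol`
record and the `emit_stub` helper are hypothetical names for this sketch, not
an existing LLVM or obj2yaml API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Symbol:
    """Hypothetical record holding only what a linker needs to know."""
    name: str
    type: str                    # "function" or "object"
    size: Optional[int] = None   # only meaningful for object symbols
    undefined: bool = False


def emit_stub(soname: str, arch: str, symbols: List[Symbol]) -> str:
    """Render a .tbe-like document, one symbol per entry, sorted by name."""
    lines = [
        "--- !tapi-tbe-v1",
        f"soname: {soname}",
        f"architecture: {arch}",
        "symbols:",
    ]
    for sym in sorted(symbols, key=lambda s: s.name):
        lines.append(f"- name: {sym.name}")
        lines.append(f"  type: {sym.type}")
        # Size is skipped for functions, as in the proposal's example.
        if sym.type == "object" and sym.size is not None:
            lines.append(f"  size: {sym.size}")
        if sym.undefined:
            lines.append("  undefined: true")
    lines.append("...")
    return "\n".join(lines) + "\n"
```

Things obj2yaml would additionally dump, such as section contents and
relocations, are simply absent here, which is what keeps the output short
enough to review and diff.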

On Thu, Sep 27, 2018, 4:37 PM Rui Ueyama via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> On Thu, Sep 27, 2018 at 4:27 PM Petr Hosek <phosek at chromium.org> wrote:
>
>> On Thu, Sep 27, 2018 at 3:12 PM Rui Ueyama via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> On Thu, Sep 27, 2018 at 2:42 PM Armando Montanez via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>>
>>>> Since the goal is to start llvm-tapi more or less from scratch, I feel
>>>> the best approach initially is to focus on the structure as a key
>>>> point of feedback in initial reviews. Once the foundations are set,
>>>> integrating Mach-O TAPI in parallel with the ELF implementation should
>>>> be relatively straightforward. The features outside of stubbing aren't
>>>> as appealing for ELF, so I probably won't be working on extending that
>>>> functionality. With that being said, the overall design goal is
>>>> generalization/abstraction where possible to welcome feature parity in
>>>> case it is eventually desired. I'm sure we'll run into things that
>>>> belong in the tool but end up being uniquely specialized, and it will
>>>> probably be best to address them on a case-by-case basis.
>>>>
>>>
>>> I'm not very sure what you meant by generalizing it, but given that
>>>
>>> 1) implementing a text-based ELF stub format is not appealing, and we
>>> probably won't implement it, and
>>>
>>
>> You could use readelf/objdump, but these tools weren't designed with that
>> use-case in mind and their output isn't adequate for many of the common use
>> cases we're considering: it's not well-specified, it's not designed to be
>> machine (and often human) readable or easily diffable. All the ad-hoc
>> solutions out there, many of which were pointed out in Armando's proposal,
>> demonstrate the need for a text-based representation. While I understand
>> your concerns, I think we should focus on the tool and library itself and
>> leave the discussion of direct linker support for later.
>>
>
> I don't have an objection to creating a tool to dump a DSO's contents in a
> machine-readable format, though it looks like its goal overlaps with the
> existing obj2yaml tool, as obj2yaml is intended to convert a native binary
> object file to a YAML text file.
>
>> We're also not firmly set on YAML. We've chosen YAML because it's already
>> used by Apple's Mach-O implementation, but we could consider a different
>> format and we're open to suggestions. However, given our requirements
>> (machine and human readable, easily diffable) I'm not sure if we're going
>> to come up with something that's significantly different from YAML.
>> Furthermore, YAML has the advantage of already being supported in a
>> variety of languages.
>>
>
> I guess that YAML is fine. LLVM's YAML reader is kind of slow, but that's
> an implementation matter.
>
>
>>
>>> 2) COFF already has its own (binary) stub format,
>>>
>>
>> Do you have a reference that describes the format and the tooling?
>>
>
> Maybe this one?
> https://docs.microsoft.com/en-us/windows/desktop/dlls/dynamic-link-library-creation
>
> When you create a DLL on Windows, the linker produces two files. One is a
> .dll file and the other is a .lib file. The DLL file contains the actual
> code and is dynamically linked to an executable. The LIB file is an archive
> file that contains one fake object file per exported symbol, each of which
> describes that symbol. When you link your program against a DLL, you
> don't link directly against the DLL. Instead, you need to pass the .lib file
> that corresponds to the desired .dll file.
>
> That way, Windows SDKs don't have to include actual DLL files if you just
> want to allow linking against DLLs. (Of course you need actual DLLs to run
> your program though.)
>
>
>>
>>> I don't see the point of generalizing it. Isn't it just a Mach-O only
>>> thing? If (1) is not true, then maybe we should generalize it, so I think
>>> you need to show evidence that we need a text-based ELF stub format.
>>>
>>> On Wed, Sep 26, 2018 at 2:42 PM Steven Wu <stevenwu at apple.com> wrote:
>>>> >
>>>> > Hi Armando
>>>> >
>>>> > Thanks for the detailed RFC and all the background research. I think
>>>> > the concept is good and I will be happy to work with you to integrate
>>>> > the ELF implementation with Apple's Mach-O implementation and
>>>> > contribute it upstream. Do you have any proposal on how to integrate
>>>> > with Apple's tapi and how we should collaborate?
>>>> >
>>>> > Also, Apple's tapi does more than just stubbing. Are you interested
>>>> > in adding ELF support for other features as well? (I guess it should
>>>> > not be too hard to do that.)
>>>> >
>>>> > Thanks
>>>> >
>>>> > Steven
>>>> >
>>>> > > On Sep 26, 2018, at 8:29 AM, Armando Montanez via llvm-dev <
>>>> llvm-dev at lists.llvm.org> wrote:
>>>> > >
>>>> > > Hello all,
>>>> > >
>>>> > > LLVM-TAPI seeks to decouple the necessary link-time information for
>>>> a
>>>> > > dynamic shared object from the implementation of the runtime object.
>>>> > > This process will be referred to as dynamic shared object (DSO)
>>>> > > stubbing throughout this proposal. A number of projects have
>>>> > > implemented their own versions of shared object stubbing for a
>>>> variety
>>>> > > of reasons related to improving the overall linking experience. This
>>>> > > functionality is absent from LLVM despite how close the practice is
>>>> to
>>>> > > LLVM’s domain. The goal of this project would be to produce a
>>>> library
>>>> > > for LLVM that not only provides a means for DSO stubbing, but also
>>>> > > gives meaningful insight into the contents of these stubs and how
>>>> they
>>>> > > change. I’ve collected a few example instances of object stubbing as
>>>> > > part of larger tools and the key benefits that resulted from them:
>>>> > >
>>>> > > - Apple’s TAPI [1]: Stubbing used to reduce SDK size and improve
>>>> build times.
>>>> > > - Oracle’s Solaris OS linker [2]: Stubbing used to improve build
>>>> > > times, and improve robustness of build system (against dependency
>>>> > > cycles and race conditions).
>>>> > > - Google’s Bazel [3]: Stubbing used to improve build times.
>>>> > > - Google’s Fuchsia [4] [5]: Stubbing used to improve build times.
>>>> > > - Android NDK: Stubbing used to reduce size of native sdk, control
>>>> > > exported symbols, and improve build times.
>>>> > >
>>>> > > Somewhat tangentially, a tool called libabigail [6] provides
>>>> > > utilities for tracking changes relevant to ELF files in a
>>>> > > meaningful way. One of libabigail’s tools provides very detailed
>>>> > > textual XML representations of objects, which is especially useful
>>>> > > in the absence of a preexisting textual representation of shared
>>>> > > objects’ exposed interfaces. Glibc
>>>> > > [7] and libc++ [8] have made an effort to address this in their own
>>>> > > ways by using scripts to produce textual representations of object
>>>> > > interfaces. This functionality makes it significantly easier to
>>>> > > analyze and control symbol visibility, though the existing solutions
>>>> > > are quite bespoke. Controlling these symbols can have an implicit
>>>> > > benefit of reducing binary size by pruning visible symbols, but the
>>>> > > more critical feature is being able to easily view and edit the
>>>> > > exposed symbols in the first place. Using human-readable stubs
>>>> > > addresses the issues of DSO analysis and control without requiring
>>>> > > highly specialized tools. This does not strive to replace tools
>>>> > > altogether; it just makes small tasks significantly more
>>>> approachable.
>>>> > >
>>>> > > llvm-tapi would strive to be both a means to produce and link
>>>> > > against stubs and a set of tools that offer more control and
>>>> > > insight into the public interfaces of DSOs. More fundamentally,
>>>> > > llvm-tapi would introduce a library to generate and ingest
>>>> > > human-readable stubs from DSOs to address these issues directly
>>>> > > in LLVM. Overall, this idea is closest in spirit to Apple’s
>>>> > > TAPI, as the original TAPI also uses human-readable stubs.
>>>> > >
>>>> > > In general, llvm-tapi should:
>>>> > >
>>>> > > 1. Produce human-readable text files from dynamic shared objects
>>>> that
>>>> > > are concise, readable, and contain everything required for linking
>>>> > > that can’t be implicitly derived.
>>>> > > 2. Produce linkable files from said human readable text files.
>>>> > > 3. Provide tools to track and control the exposed interfaces of
>>>> object files.
>>>> > > 4. Integrate well with LLVM’s existing tools.
>>>> > > 5. Strive to enable integration of the original TAPI code for
>>>> Mach-O support.
>>>> > >
>>>> > > There are a number of key benefits to using stubs and text-based
>>>> > > application binary interfaces such as:
>>>> > > - Reducing the size of dynamic shared objects used exclusively for
>>>> linking.
>>>> > > - The ability to avoid re-linking an object when its dependencies’
>>>> > > exposed interfaces do not change but their implementation does
>>>> (which
>>>> > > happens frequently).
>>>> > > - Simplicity of viewing a diff for a changed DSO interface.
>>>> > > A large number of other use cases exist; this would open up the
>>>> floor
>>>> > > for a variety of other tools and future work as the concept is
>>>> rather
>>>> > > generic.
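
The re-linking point above can be illustrated with a small sketch (Python).
It assumes a build system that tracks the text stub of each dependency rather
than the DSO itself; the `interface_digest` and `needs_relink` helpers are
hypothetical names for illustration and are not part of the proposal or of any
existing build tool.

```python
import hashlib


def interface_digest(stub_text: str) -> str:
    """Hash the text stub; implementation-only changes never reach this text."""
    return hashlib.sha256(stub_text.encode("utf-8")).hexdigest()


def needs_relink(old_stub: str, new_stub: str) -> bool:
    """Relink a dependent only when the dependency's exposed interface changed."""
    return interface_digest(old_stub) != interface_digest(new_stub)
```

If a dependency is rebuilt and its stub text is byte-for-byte identical,
`needs_relink` returns False and the dependent link step can be skipped.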
>>>> > >
>>>> > > The proposed YAML format would be analogous to Apple’s .tbd format
>>>> but
>>>> > > differ in a few ways to support ELF object types. An example would
>>>> be
>>>> > > as follows:
>>>> > >
>>>> > > --- !tapi-tbe-v1
>>>> > > soname: someobj.so
>>>> > > architecture: aarch64
>>>> > > symbols:
>>>> > > - name: fish
>>>> > >   type: object
>>>> > >   size: 48
>>>> > > - name: foobar
>>>> > >   type: function
>>>> > >   warning-text: "deprecated in SOMEOBJ_1.3"
>>>> > > - name: printf
>>>> > >   type: function
>>>> > > - name: rndfunc
>>>> > >   type: function
>>>> > >   undefined: true
>>>> > > ...
>>>> > >
>>>> > > (Note that this doesn’t account for version sets, but such
>>>> > > functionality can be included in a later version.)
>>>> > >
>>>> > > Most of the fields are self-explanatory, with size not being
>>>> relevant
>>>> > > to function symbols, and warning text being purely optional. One
>>>> > > reason this departs from .tbd format is to make diffs much easier:
>>>> > > sorting symbols alphabetically on individual lines makes it much
>>>> more
>>>> > > obvious which symbols are added, removed, or modified. Despite the
>>>> > > differences, the desire is for llvm-tapi to be structured such that
>>>> > > integrating Apple’s Mach-O TAPI will be plausible and welcomed.
>>>> Prior
>>>> > > discussion [9] indicated interest in integrating Apple TAPI into
>>>> LLVM,
>>>> > > so I’d definitely like to leave that door open and encourage that in
>>>> > > the future.
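
The added/removed/modified classification described above can also be done
structurally. A minimal sketch (Python), assuming each stub has already been
parsed into a dict keyed by symbol name; the `diff_symbols` helper and the
dict layout are hypothetical and not part of the proposal:

```python
from typing import Dict, List, Tuple

# Maps a symbol name to its attributes, e.g. {"type": "object", "size": 48}.
SymbolTable = Dict[str, Dict[str, object]]


def diff_symbols(
    old: SymbolTable, new: SymbolTable
) -> Tuple[List[str], List[str], List[str]]:
    """Return (added, removed, modified) symbol names, each sorted by name."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    modified = sorted(
        name for name in set(old) & set(new) if old[name] != new[name]
    )
    return added, removed, modified


old = {
    "fish": {"type": "object", "size": 48},
    "foobar": {"type": "function"},
}
new = {
    "fish": {"type": "object", "size": 64},
    "printf": {"type": "function"},
}
added, removed, modified = diff_symbols(old, new)
# added == ["printf"], removed == ["foobar"], modified == ["fish"]
```

Because the text format keeps one attribute per line and sorts symbols by
name, a plain line diff of two stubs surfaces the same three groups.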
>>>> > >
>>>> > > I feel the best place to start this is as a library to best
>>>> facilitate
>>>> > > integration into other areas of LLVM, later wrapping it in a
>>>> > > standalone tool and eventually considering direct integration into
>>>> > > LLD. The tool will initially support basic generation of .tbe and
>>>> stub
>>>> > > files from .tbe or ELF. This should give enough functionality for
>>>> > > manually checking shared object interface diffs, as well as having
>>>> > > access to linkable stubs. The goal is for the tool to eventually
>>>> > > provide additional functionality such as compatibility checking, but
>>>> > > that’s a ways into the future.
>>>> > >
>>>> > > There are multiple options for integrating llvm-tapi to work with
>>>> > > LLD; LLD could use llvm-tapi directly to produce and ingest .tbe
>>>> > > files, or llvm-tapi could be used to produce stubs that LLD can be
>>>> > > taught to use. From a technical standpoint, these are not mutually
>>>> > > exclusive. This step is a ways down the road, but is definitely a
>>>> > > high-priority goal.
>>>> > >
>>>> > > I’m interested to hear your thoughts and feedback on this.
>>>> > >
>>>> > > Best,
>>>> > > Armando
>>>> > >
>>>> > >
>>>> > > [1] https://github.com/ributzka/tapi
>>>> > > [2]
>>>> https://docs.oracle.com/cd/E23824_01/html/819-0690/chapter2-22.html
>>>> > > [3]
>>>> https://docs.bazel.build/versions/master/user-manual.html#flag--interface_shared_objects
>>>> > > [4]
>>>> https://fuchsia.googlesource.com/zircon/+/master/scripts/shlib-symbols
>>>> > > [5]
>>>> https://fuchsia.googlesource.com/zircon/+/master/scripts/dso-abi.h
>>>> > > [6] https://sourceware.org/libabigail/
>>>> > > [7]
>>>> https://sourceware.org/git/?p=glibc.git;a=blob;f=scripts/abilist.awk;h=bad7c3807e478e50e63c3834aa8969214bdd6f63;hb=HEAD
>>>> > > [8]
>>>> https://github.com/llvm-mirror/libcxx/blob/master/utils/sym_extract.py
>>>> > > [9]
>>>> http://lists.llvm.org/pipermail/cfe-dev/2018-April/thread.html#57576
>>>> > > _______________________________________________
>>>> > > LLVM Developers mailing list
>>>> > > llvm-dev at lists.llvm.org
>>>> > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>> >