[LLVMdev] LLVM as a shared library
Eric Christopher
echristo at gmail.com
Wed Aug 6 10:43:12 PDT 2014
I think you've got some good points here and I think getting the right
balance will be hard, but if it seems that there's community demand
for this then some concrete proposals sound like a good thing here.
I'm also of the opinion that one of the reasons we don't run into this
much is that we're careful about what we open up in the C API. I could
be wrong here, but am cautious about screwing it up and then learning
:)
-eric
On Wed, Aug 6, 2014 at 10:28 AM, Filip Pizlo <fpizlo at apple.com> wrote:
>
> On Aug 6, 2014, at 12:00 AM, Nick Lewycky <nicholas at mxc.ca> wrote:
>
> Filip Pizlo wrote:
>
> This is exciting!
>
> I would be happy to help.
>
>
> On Aug 5, 2014, at 12:38 PM, Chris Bieneman <beanz at apple.com> wrote:
>
> Hello LLVM community,
>
> Over the last few years the LLVM team here at Apple and development teams
> elsewhere have been busily working on finding new and interesting uses for
> LLVM. Some of these uses are traditional compilers, but a growing number of
> them aren’t. Some of LLVM’s new clients, like WebKit, are embedding LLVM
> into existing applications. These embedded uses of LLVM have their own
> unique challenges.
>
> Over the next few months, a few of us at Apple are going to be working on
> tackling a few new problems that we would like solved in open source so
> other projects can benefit from them. Some of these efforts will be
> non-trivial, so we’d like to start a few discussions over the next few
> weeks.
>
> Our primary goals are to (1) make it easier to embed LLVM into external
> projects as a shared library, and (2) generally improve the performance of
> LLVM as a shared library.
>
> The list of the problems we’re currently planning to tackle is:
>
> (1) Reduce or eliminate static initializers, global constructors, and global
> destructors
> (2) Clean up cross compiling in the CMake build system
> (3) Update LLVM debugging mechanisms for being part of a dynamic library
> (4) Move overridden sys calls (like abort) into the tools, rather than the
> libraries
> (5) Update TableGen to support stripping unused content (i.e. Intrinsics for
> backends you’re not building)
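For item (1), one common way to avoid a static initializer and its matching global destructor is to construct the object on first use and never destroy it. The sketch below is only an illustration under that assumption; `ExampleRegistry`, `getRegistry`, and `registryConstructions` are hypothetical names and not part of LLVM.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Counts constructions so the laziness is observable (illustration only).
inline int &registryConstructions() {
  static int count = 0; // constant-initialized, so no static constructor runs
  return count;
}

// Hypothetical registry type standing in for any global with a constructor.
struct ExampleRegistry {
  std::vector<std::string> names;
  ExampleRegistry() { ++registryConstructions(); }
};

// Instead of `static ExampleRegistry TheRegistry;` at namespace scope, which
// would run its constructor when the library is loaded, build the object on
// first use. The pointer is trivially destructible and the object is never
// deleted, so no global destructor is registered either.
inline ExampleRegistry &getRegistry() {
  static ExampleRegistry *instance = new ExampleRegistry();
  return *instance;
}
```

Nothing is constructed until `getRegistry()` is first called, and every later call returns the same object. The trade-off is that the object is intentionally leaked at process exit.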
>
>
> Also:
>
> (6) Determine if command line options are the best way of passing
> configuration settings into LLVM.
>
>
> They're already banned, so there isn't anything left to determine here, just
> code to fix.
>
> It’s an awkward abstraction when LLVM is embedded. I suspect (6) will be
> closely related to (1) since command line option parsing was the hardest
> impediment to getting rid of static initializers.
>
>
> Yes, for all these reasons. Two libraries may be using LLVM under the hood,
> unaware of each other, and they can't both share global state. Command line
> flags block that. Our command-line tools should be parsing their own flags
> and setting state through some other mechanism, and that state mustn't be
> more global than an LLVMContext.
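A minimal sketch of the idea above, assuming options live on a per-embedder context object and the tool (not the library) parses its own flags. `Context`, `ContextOptions`, and `applyToolFlags` are hypothetical names for illustration; this is not how LLVMContext is actually defined.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical per-context options table; nothing here is process-global.
class ContextOptions {
public:
  void set(const std::string &name, const std::string &value) {
    values_[name] = value;
  }
  // Returns the stored value, or `fallback` when the option was never set.
  std::string get(const std::string &name, const std::string &fallback) const {
    auto it = values_.find(name);
    return it == values_.end() ? fallback : it->second;
  }

private:
  std::map<std::string, std::string> values_;
};

// Hypothetical context: each embedder owns its own instance.
struct Context {
  ContextOptions options;
};

// Lives in the tool: translates argv into options on one specific context.
// Only arguments of the form "-name=value" are accepted; others are ignored.
inline void applyToolFlags(Context &ctx, int argc, const char *const *argv) {
  for (int i = 1; i < argc; ++i) {
    std::string arg = argv[i];
    std::string::size_type eq = arg.find('=');
    if (arg.empty() || arg[0] != '-' || eq == std::string::npos || eq < 2)
      continue;
    ctx.options.set(arg.substr(1, eq - 1), arg.substr(eq + 1));
  }
}
```

With this shape, two libraries in the same process can each hold a `Context` with different settings, and setting an option on one has no effect on the other.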
>
> My understanding of the shared library proposal is that the library only
> exposes the C API since the C++ API is not intended to allow for binary
> compatibility. So, I think we need to add the following, either as an
> explicit goal of the shared library work or as a closely related project:
>
> (7) Make the C API truly great.
>
> I think it’s harmful to LLVM in the long run if external embedders use the
> C++ API.
>
>
> The quality with which we maintain the C API today suggests that we
> collectively think of it as an albatross to be suffered. There is work
> necessary to change that perception too.
>
> I think that one way of ensuring that they don’t have an excuse to do it is
> to flesh out some things:
>
>
> - Add more tests of the C API to ensure that people don’t break it
> accidentally and to give more gravitas to the C API backwards compatibility
> claims.
>
>
> Yes, for well-designed high level APIs like libLTO and libIndex. For other
> APIs, we should remove the backwards compatibility guarantees ...
>
> - Increase C API coverage.
>
>
> ... which in turn allows us to do this.
>
> Designing a good high-level API is hard (even libLTO has very ugly cracks in
> its API surface) and that makes it hard to do. What actually happens is that
> people write C APIs that closely match the C++ APIs in order to access them
> through other languages, but there's no way we can guarantee compatibility
> without freezing the C++ API too. Which we never will. This isn't a
> theoretical problem either; look at this case:
> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140804/229354.html
> where we made a straightforward update to the LLVM IR, but in theory a user
> of the C API would be able to observe the difference, and that could in turn
> break a C API user that was relying on the way old LLVM worked.
>
> The solution is to offer two levels of C API, one intended for people to use
> to bind to their own language. This matches the C++ API closely and changes
> when the C++ API changes. (It could even be partially/wholly auto-generated
> via a clang tool?) Users of it will be broken with newer versions.
>
>
> I want to understand what it is specifically that you’re proposing and how
> it would differ from the current C API.
>
> To me, there are two separate concerns here: the stability of the C API
> itself and the stability of the IR and other data formats which the C API
> necessarily exposes.
>
> I agree that C APIs that directly match some underlying C++ APIs are a bad
> idea. This is avoidable. I think the C API usually does a good job of
> avoiding it. I don’t think this was the direct problem in the global_ctors
> case that you just brought up.
>
> The deeper problem is that the C API currently reveals the full power of the
> LLVM language, and also details of other data formats - for example, MCJIT
> clients are now encouraged to parse sections and those sections may contain
> things formatted in tricky ways deep inside LLVM. Much of the LLVM IR is
> essentially baked into a bunch of C function signatures. I see how you may
> have been referring to this by saying that “people write C APIs that closely
> match the C++ APIs” - but I just want to emphasize that the problem isn’t
> with matching a C++ API as much as it is that both the C API and the C++ API
> are closely matching the LLVM language. Also, any non-syntactic invariant
> of the language is effectively revealed through the API. If the LLVM
> language changes - for example global_ctors should now be used in a
> different way - then C API clients might be broken because of it. Similar
> things could happen if the stackmap, dwarf, EH frame, or compactunwind
> formats change.
>
> I believe that the latter problem is very real and I don’t believe that a
> solution exists that is both practical and absolute. An absolute solution
> would surely involve inventing a whole new IR that is meant to be stable
> forever, and any C API client that generates IR will use this IR instead of
> the real LLVM IR; internally, whenever you create this IR, it is
> converted into LLVM IR behind the scenes. You could alternatively view this
> dystopian future as being equivalent to forever supporting auto-upgrade from
> all prior versions of IR for clients of the C API. That seems really dumb
> to me, because I believe that such a solution would be more expensive than
> the price we pay right now for the slight instability - bugs like
> global_ctors are not super common, have limited fallout, and can be worked
> around by clients if they are given notice. So, it would be great to come
> up with a middle ground: we don’t want to throw C API stability out the
> window because of a few bugs that sometimes require breaking changes but we
> also don’t want to carve the API out of stone and never leave wiggle room.
> I believe that this “super stable except when it isn’t” philosophy is
> consistent with what most people mean by “stable API” in the sense that
> well-maintained APIs end up deprecating things and eventually removing them.
> I’ve also seen breaking changes get made to stable API on the grounds that
> all major clients were in the loop and none of them objected.
>
> WebKit has already in the past cooperated through C API changes and will
> probably continue to do so in the future. Of course these happened to
> involve C APIs that didn’t yet fall under the stability rule because they
> hadn’t shipped yet - but that doesn’t make much of a difference to us. When
> we cut a WebKit release branch we lock it against some LLVM branch, so C API
> stability is only an issue for active development on trunk, and we already
> know from experience that there exists some amount of wiggle room that we
> can cope with. I don’t yet know how to define what that is other than “if
> you ask us nicely about an API change then we’ll probably say okay”.
>
>
> Secondly, some people really want a stable interface, so we give them an API
> expressed in higher-level tasks they want to achieve, so that we can change
> the underlying workings of how LLVM works without disturbing the API. That
> can be made ABI stable.
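The second, task-oriented level described above is often built as an opaque handle behind plain C functions, so the implementation can change without changing the exported signatures. The following is a rough sketch of that pattern only; every `ExampleJIT*` name is hypothetical and none of it is part of the LLVM C API.

```cpp
#include <cassert>
#include <stddef.h>
#include <string>

// Public surface: what a C header would declare. All names are hypothetical.
extern "C" {
typedef struct ExampleJITOpaque *ExampleJITRef; // opaque handle, layout hidden
ExampleJITRef ExampleJITCreate(void);
int ExampleJITAddSource(ExampleJITRef jit, const char *text); // 0 on success
size_t ExampleJITSourceBytes(ExampleJITRef jit);
void ExampleJITDispose(ExampleJITRef jit);
}

// Private implementation: free to change without touching the C signatures.
struct ExampleJITOpaque {
  std::string sources;
};

extern "C" ExampleJITRef ExampleJITCreate(void) {
  return new ExampleJITOpaque();
}

extern "C" int ExampleJITAddSource(ExampleJITRef jit, const char *text) {
  if (!jit || !text)
    return 1; // report bad arguments instead of crashing the embedder
  jit->sources += text;
  return 0;
}

extern "C" size_t ExampleJITSourceBytes(ExampleJITRef jit) {
  return jit ? jit->sources.size() : 0;
}

extern "C" void ExampleJITDispose(ExampleJITRef jit) { delete jit; }
```

Because clients only ever see `ExampleJITRef` and the four functions, the fields of `ExampleJITOpaque` can be reorganized freely. That is the property that makes an interface of this shape a candidate for ABI stability.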
>
> - For example, WebKit currently sidesteps the C API to pass some command-line
> options to LLVM. We don’t want that.
>
>
> Seconded!
>
> - Add more support for reasoning about targets and triples. WebKit still
> has to hardcode triples in some places even though it only ever does
> in-process JITing where host==target. That’s weird.
>
>
> Sounds good.
>
> - Expose debugging and runtime stuff and make sure that there’s a coherent
> integration story with the MCJIT C API.
> - Currently it’s difficult to round-trip debug info: creating it in C is
> awkward and parsing DWARF sections that MCJIT generates involves lots of
> weirdness. WebKit has its own DWARF parser for this, which shouldn’t be
> necessary.
> - WebKit is about to have its own copies of both a compactunwind and EH
> frame parser. The contributor who “wrote” the EH frame parser actually just
> took it from LLVM. The licenses are compatible, but nonetheless, copy-paste
> from LLVM into WebKit should be discouraged.
>
>
> I am not familiar with the MCJIT C API, but this sounds reasonable. I'll
> trust that you know what you're doing.
>
> - Engage with non-WebKit embedders that currently use the C++ API to figure
> out what it would take to get them to switch to the C API.
>
>
> Engage with our users? That's crazy talk! ;)
>
> Nick
>
> I think that a lot of the time when C API discussions arise, lots of embedders
> give excuses for using the C++ API. WebKit used the C API for generating IR
> and even doing some IR manipulation, and for driving the MCJIT. It’s been a
> positive experience and we enjoy the binary compatibility that it gives us.
> I think it would be great to see if other embedders can do the same.
>
> -Filip
>
>
> We will be sending more specific proposals and patches for each of the
> changes listed above starting this week. If you’re interested in these
> problems and their solutions, please speak up and help us develop a solution
> that will work for your needs and ours.
>
> Thanks,
> -Chris
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev