[LLVMdev] LLVM as a shared library

Filip Pizlo fpizlo at apple.com
Wed Aug 6 10:28:40 PDT 2014


> On Aug 6, 2014, at 12:00 AM, Nick Lewycky <nicholas at mxc.ca> wrote:
> 
> Filip Pizlo wrote:
>> This is exciting!
>> 
>> I would be happy to help.
>> 
>> 
>>> On Aug 5, 2014, at 12:38 PM, Chris Bieneman<beanz at apple.com>  wrote:
>>> 
>>> Hello LLVM community,
>>> 
>>> Over the last few years the LLVM team here at Apple and development teams elsewhere have been busily working on finding new and interesting uses for LLVM. Some of these uses are traditional compilers, but a growing number of them aren’t. Some of LLVM’s new clients, like WebKit, are embedding LLVM into existing applications. These embedded uses of LLVM have their own unique challenges.
>>> 
>>> Over the next few months, a few of us at Apple are going to be working on tackling a few new problems that we would like solved in open source so other projects can benefit from them. Some of these efforts will be non-trivial, so we’d like to start a few discussions over the next few weeks.
>>> 
>>> Our primary goals are to (1) make it easier to embed LLVM into external projects as a shared library, and (2) generally improve the performance of LLVM as a shared library.
>>> 
>>> The list of the problems we’re currently planning to tackle is:
>>> 
>>> (1) Reduce or eliminate static initializers, global constructors, and global destructors
>>> (2) Clean up cross compiling in the CMake build system
>>> (3) Update LLVM debugging mechanisms for being part of a dynamic library
>>> (4) Move overridden sys calls (like abort) into the tools, rather than the libraries
>>> (5) Update TableGen to support stripping unused content (i.e. Intrinsics for backends you’re not building)
>> 
>> Also:
>> 
>> (6) Determine if command line options are the best way of passing configuration settings into LLVM.
> 
> They're already banned, so there isn't anything left to determine here, just code to fix.
> 
>> It’s an awkward abstraction when LLVM is embedded. I suspect (6) will be closely related to (1) since command line option parsing was the hardest impediment to getting rid of static initializers.
> 
> Yes, for all these reasons. Two libraries may be using LLVM under the hood unaware of each other, and they can't both share global state. Command line flags block that. Our command-line tools should be parsing their own flags and setting state through some other mechanism, and that state mustn't be more global than an LLVMContext.
> 
>> My understanding of the shared library proposal is that the library only exposes the C API since the C++ API is not intended to allow for binary compatibility.  So, I think we need to add the following either as an explicit goal of the shared library work, or as a closely related project:
>> 
>> (7) Make the C API truly great.
>> 
>> I think it’s harmful to LLVM in the long run if external embedders use the C++ API.
> 
> The quality with which we maintain the C API today suggests that we collectively think of it as an albatross to be suffered. There is work necessary to change that perception too.
> 
>> I think that one way of ensuring that they don’t have an excuse to do it is to flesh out some things:
>> 
>> - Add more tests of the C API to ensure that people don’t break it accidentally and to give more gravitas to the C API backwards compatibility claims.
> 
> Yes, for well-designed high level APIs like libLTO and libIndex. For other APIs, we should remove the backwards compatibility guarantees ...
> 
>> - Increase C API coverage.
> 
> ... which in turn allows us to do this.
> 
> Designing a good high-level API is hard (even libLTO has very ugly cracks in its API surface), so it rarely gets done. What actually happens is that people write C APIs that closely match the C++ APIs in order to access them through other languages, but there's no way we can guarantee compatibility for those without freezing the C++ API too, which we never will. This isn't a theoretical problem either; look at this case:
> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140804/229354.html
> where we made a straightforward update to the LLVM IR, but in theory a user of the C API would be able to observe the difference, and that could in turn break a C API user that was relying on the way old LLVM worked.
> 
> The solution is to offer two levels of C API. The first is intended for people to use to bind to their own language. It matches the C++ API closely and changes when the C++ API changes. (It could even be partially/wholly auto-generated via a clang tool?) Users of it will be broken with newer versions.
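To make the point about global state concrete, here is a minimal C sketch. Every name in it (`demo_context`, `demo_tool_apply_args`, `set_global_opt_level` and so on) is hypothetical and not part of LLVM's C API; the only idea taken from the thread is that a tool parses its own flags and pushes them into state scoped no more widely than a context, which in LLVM would be an `LLVMContext`.

```c
#include <assert.h>
#include <stdlib.h>

/* All names here are hypothetical; this is not LLVM's C API. */

/* Anti-pattern: process-wide option state. If two libraries in the same
 * process both embed the compiler, whichever sets this last wins for both. */
static int global_opt_level = 2;

void set_global_opt_level(int level) { global_opt_level = level; }
int get_global_opt_level(void) { return global_opt_level; }

/* Alternative: options owned by a context, so each embedder has its own copy. */
typedef struct demo_context {
    int opt_level;
} demo_context;

demo_context *demo_context_create(void) {
    demo_context *ctx = malloc(sizeof *ctx);
    if (ctx != NULL)
        ctx->opt_level = 2; /* defaults are applied per context */
    return ctx;
}

void demo_context_dispose(demo_context *ctx) { free(ctx); }

void demo_context_set_opt_level(demo_context *ctx, int level) {
    assert(ctx != NULL);
    ctx->opt_level = level;
}

int demo_context_get_opt_level(const demo_context *ctx) {
    assert(ctx != NULL);
    return ctx->opt_level;
}

/* A command-line tool parses its own flags (here only -O0 to -O3) and
 * forwards them to its context instead of writing to globals. */
void demo_tool_apply_args(demo_context *ctx, int argc, char **argv) {
    for (int i = 1; i < argc; ++i) {
        const char *a = argv[i];
        if (a[0] == '-' && a[1] == 'O' && a[2] >= '0' && a[2] <= '3' && a[3] == '\0')
            demo_context_set_opt_level(ctx, a[2] - '0');
    }
}
```

Two embedders in the same process can each create and configure their own context without seeing each other's settings, whereas with the global flag the last writer silently changes behavior for both.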

I want to understand what it is specifically that you’re proposing and how it would differ from the current C API.

To me, there are two separate concerns here: the stability of the C API itself and the stability of the IR and other data formats which the C API necessarily exposes.

I agree that C APIs that directly match some underlying C++ APIs are a bad idea.  This is avoidable.  I think the C API usually does a good job of avoiding it.  I don’t think this was the direct problem in the global_ctors case that you just brought up.

The deeper problem is that the C API currently reveals the full power of the LLVM language, and also details of other data formats - for example, MCJIT clients are now encouraged to parse sections and those sections may contain things formatted in tricky ways deep inside LLVM.  Much of the LLVM IR is essentially baked into a bunch of C function signatures.  I see how you may have been referring to this by saying that “people write C APIs that closely match the C++ APIs” - but I just want to emphasize that the problem isn’t with matching a C++ API as much as it is that both the C API and the C++ API are closely matching the LLVM language.  Also, any non-syntactic invariant of the language is effectively revealed through the API.  If the LLVM language changes - for example global_ctors should now be used in a different way - then C API clients might be broken because of it.  Similar things could happen if the stackmap, dwarf, EH frame, or compactunwind formats change.

I believe that the latter problem is very real and I don’t believe that a solution exists that is both practical and absolute.  An absolute solution would surely involve inventing a whole new IR that is meant to be stable forever: any C API client that generates IR would use this IR instead of the real LLVM IR, and LLVM would convert it into LLVM IR behind the scenes.  You could alternatively view this dystopian future as being equivalent to forever supporting auto-upgrade from all prior versions of IR for clients of the C API.  That seems really dumb to me, because I believe that such a solution would be more expensive than the price we pay right now for the slight instability - bugs like global_ctors are not super common, have limited fallout, and can be worked around by clients if they are given notice.  So, it would be great to come up with a middle ground: we don’t want to throw C API stability out the window because of a few bugs that sometimes require breaking changes, but we also don’t want to carve the API in stone and never leave wiggle room.  I believe that this “super stable except when it isn’t” philosophy is consistent with what most people mean by “stable API”, in the sense that well-maintained APIs end up deprecating things and eventually removing them.  I’ve also seen breaking changes get made to stable APIs on the grounds that all major clients were in the loop and none of them objected.
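One common way to get that middle ground in C is to deprecate before removing. A minimal sketch, assuming a GCC- or Clang-compatible compiler for the `deprecated` attribute; `demo_process`, `demo_process_ex` and the flag names are hypothetical and not part of LLVM's C API.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical API, not part of LLVM. The old entry point keeps working but
 * is marked deprecated, and it forwards to its replacement so there is a
 * single implementation until the old name is finally removed. */

#if defined(__GNUC__) || defined(__clang__)
#define DEMO_DEPRECATED(msg) __attribute__((deprecated(msg)))
#else
#define DEMO_DEPRECATED(msg) /* no portable equivalent assumed */
#endif

/* Flags added by the new entry point; DEMO_FLAG_NONE reproduces the old behavior. */
enum { DEMO_FLAG_NONE = 0, DEMO_FLAG_VERIFY = 1 };

/* New entry point: returns the number of bytes "processed", or -1 on error.
 * The body is a stand-in for real work. */
int demo_process_ex(const char *input, unsigned flags) {
    if (input == NULL)
        return -1;
    int n = (int)strlen(input);
    if ((flags & DEMO_FLAG_VERIFY) && n == 0)
        return -1; /* pretend empty input fails verification */
    return n;
}

/* Old entry point: kept for existing clients, implemented on top of the new one.
 * Callers get a compiler warning naming the replacement. */
DEMO_DEPRECATED("use demo_process_ex(input, DEMO_FLAG_NONE)")
int demo_process(const char *input);

int demo_process(const char *input) {
    return demo_process_ex(input, DEMO_FLAG_NONE);
}
```

Existing callers keep compiling and behaving the same but see a warning pointing at the replacement, which is one way of giving clients notice before the old name is dropped in a later release.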

WebKit has already in the past cooperated through C API changes and will probably continue to do so in the future.  Of course these happened to involve C APIs that didn’t yet fall under the stability rule because they hadn’t shipped yet - but that doesn’t make much of a difference to us.  When we cut a WebKit release branch we lock it against some LLVM branch, so C API stability is only an issue for active development on trunk, and we already know from experience that there exists some amount of wiggle room that we can cope with.  I don’t yet know how to define what that is other than “if you ask us nicely about an API change then we’ll probably say okay”.

> 
> Secondly, some people really want a stable interface, so we give them an API expressed in higher-level tasks they want to achieve, so that we can change the underlying workings of how LLVM works without disturbing the API. That can be made ABI stable.
> 
>> 	- For example, WebKit currently sidesteps the C API to pass some commandline options to LLVM.  We don’t want that.
> 
> Seconded!
> 
>> 	- Add more support for reasoning about targets and triples.  WebKit still has to hardcode triples in some places even though it only ever does in-process JITing where host==target.  That’s weird.
> 
> Sounds good.
> 
>> 	- Expose debugging and runtime stuff and make sure that there’s a coherent integration story with the MCJIT C API.
>> 		- Currently it’s difficult to round-trip debug info: creating it in C is awkward and parsing DWARF sections that MCJIT generates involves lots of weirdness.  WebKit has its own DWARF parser for this, which shouldn’t be necessary.
>> 		- WebKit is about to have its own copies of both a compactunwind and EH frame parser.  The contributor who “wrote” the EH frame parser actually just took it from LLVM.  The licenses are compatible, but nonetheless, copy-paste from LLVM into WebKit should be discouraged.
> 
> I am not familiar with the MCJIT C API, but this sounds reasonable. I'll trust that you know what you're doing.
> 
>> - Engage with non-WebKit embedders that currently use the C++ API to figure out what it would take to get them to switch to the C API.
> 
> Engage with our users? That's crazy talk! ;)
> 
> Nick
> 
>> I think that a lot of the time when C API discussions arise, many embedders give excuses for using the C++ API.  WebKit used the C API for generating IR and even doing some IR manipulation, and for driving the MCJIT.  It’s been a positive experience and we enjoy the binary compatibility that it gives us.  I think it would be great to see if other embedders can do the same.
>> 
>> -Filip
>> 
>>> 
>>> We will be sending more specific proposals and patches for each of the changes listed above starting this week. If you’re interested in these problems and their solutions, please speak up and help us develop a solution that will work for your needs and ours.
>>> 
>>> Thanks,
>>> -Chris
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> 
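Nick's second level, an API expressed in terms of the tasks clients want to achieve that "can be made ABI stable", is commonly built from opaque handles (libLTO's `lto_module_t` is one such handle) plus option structs that carry their own size. The following is a minimal C sketch of that general pattern under those assumptions; `demo_jit_ref`, `demo_jit_options` and the functions are hypothetical and not part of LLVM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical task-level API; none of these names exist in LLVM. */

/* ---- What the public header would expose ---- */
typedef struct demo_jit *demo_jit_ref; /* opaque: layout never visible to clients */

typedef struct demo_jit_options {
    size_t struct_size; /* set by the client to sizeof(demo_jit_options) as it saw it */
    int opt_level;
    /* New fields may only be appended, never reordered or removed. */
} demo_jit_options;

demo_jit_ref demo_jit_create(const demo_jit_options *opts);
int demo_jit_get_opt_level(demo_jit_ref jit);
void demo_jit_dispose(demo_jit_ref jit);

/* ---- What stays private to the library ---- */
struct demo_jit {
    int opt_level; /* free to change without breaking clients */
};

demo_jit_ref demo_jit_create(const demo_jit_options *opts) {
    demo_jit_options o = { sizeof(demo_jit_options), 2 }; /* library defaults */
    if (opts != NULL) {
        /* Copy only the prefix the caller's (possibly older) struct actually has;
         * fields it does not know about keep their defaults. */
        size_t n = opts->struct_size < sizeof o ? opts->struct_size : sizeof o;
        memcpy(&o, opts, n);
    }
    struct demo_jit *jit = malloc(sizeof *jit);
    if (jit != NULL)
        jit->opt_level = o.opt_level;
    return jit;
}

int demo_jit_get_opt_level(demo_jit_ref jit) {
    assert(jit != NULL);
    return jit->opt_level;
}

void demo_jit_dispose(demo_jit_ref jit) { free(jit); }
```

A client built against an older header passes a smaller `struct_size`, and the library fills in defaults for fields that client has never heard of; that is what lets the internals, and even the options struct, evolve without an ABI break.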
