[PATCH] D64939: Add a proposal for a libc project under the LLVM umbrella.
Siva Chandra via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Aug 13 11:49:44 PDT 2019
sivachandra marked an inline comment as done.
sivachandra added inline comments.
================
Comment at: llvm/docs/Proposals/LLVMLibC.rst:37
+ testing like fuzz testing and sanitizer-supported testing.
+- ABI independent implementation as far as possible.
+- Use source based implementations as far as possible rather than
----------------
ldionne wrote:
> sivachandra wrote:
> > chandlerc wrote:
> > > sivachandra wrote:
> > > > jfb wrote:
> > > > > jfb wrote:
> > > > > > chandlerc wrote:
> > > > > > > sivachandra wrote:
> > > > > > > > jfb wrote:
> > > > > > > > > dlj wrote:
> > > > > > > > > > jfb wrote:
> > > > > > > > > > > sivachandra wrote:
> > > > > > > > > > > > dlj wrote:
> > > > > > > > > > > > > jfb wrote:
> > > > > > > > > > > > > > sivachandra wrote:
> > > > > > > > > > > > > > > jfb wrote:
> > > > > > > > > > > > > > > > Will it be ABI-stable? Maybe it's worth expanding on how the ABI will evolve, and what will be stable.
> > > > > > > > > > > > > > > I am not sure how exactly to word it here. Do you have any suggestions on what the ABI promise should be and what exactly to say in a proposal like this?
> > > > > > > > > > > > > > Depends on what people want to do with it. I'm just saying it should be given thought. If you want to interop with other libc then you need to match their ABI, which can be a burden. IIUC musl matched glibc almost accidentally, and is moving away from doing so. Then you might consider whether your libc is ABI stable over time, and how you'd manage that. The answer might change between static and dynamic versions.
> > > > > > > > > > > > > (Sorry for chiming in so late... I chatted with Siva, and I volunteered to add my take.)
> > > > > > > > > > > > >
> > > > > > > > > > > > > I don't think we would ever try to maintain ABI-level interop with other libc implementations on purpose. There is some subtlety here, though, so let me step back for a moment.
> > > > > > > > > > > > >
> > > > > > > > > > > > > (The next few paragraphs are largely for the benefit of folks who may be following along the review thread, so I'm surely restating more than necessary. This is a bit long, so you can skip to the end if you want... I won't tell anyone. ;-) )
> > > > > > > > > > > > >
> > > > > > > > > > > > > Fundamentally, our aim is to define a tightly-bounded interface (i.e., provides what it must; no more, no less), and retain as much flexibility as we can for everything behind the interface. The "implementation stuff" behind the interface then has to be structured in a rational way, so that bits and pieces can be replaced. (I'm going to come back to that point in a bit.) The "interface" in this case are effectively just the headers: struct and enum definitions, function declarations, maybe a couple of extern variable declarations, and... really, that's about it.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The "implementation flexibility" goal is where things like DSOs, interceptors, and delegation come into play. The point in the design space we need (for Google production workloads) is, arguably, one of(*) the simplest: a fully statically linked (*)PIE binary, wherein we are willing to build everything from sources using the same compiler.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The question of things like ABI stability seems to me, quite frankly, out of the scope that we would even //want// to define. There is a simple reason why I would like to stay somewhat purposefully blind to these questions: I do not believe we can predict why folks might need guarantees from the //library//, instead of relying on the guarantees that come from the combination of language and compiler implementation. (Such reasons surely exist, but I expect many of them to be novel, surprising, or both.)
> > > > > > > > > > > > >
> > > > > > > > > > > > > In other words: if you can build the .a or .lib, and you have headers that match it, and that archive works like any other library archive, //why would one need still more guarantees?// FWIW, this question is only maybe 75% rhetorical... I find myself routinely ... let's say, "impressed" by new and interesting ways to ... let's say, "meaningfully change program behavior" using just the linker's command line options. (Obviously, I'm toning down my opinions here... but my strong bias is that these uses need to be brought to light before any fundamental design accommodations are made.)
> > > > > > > > > > > > >
> > > > > > > > > > > > > A logical extension of the above is that the implementation will look as much as possible like any other library. We don't want to insert an "upward contract" (or would it be "downward?") that might get in the way of whatever else a packager might want to do. (Providing a DSO is, IMO, more akin to release or distribution packaging than simply building a library. I'm purposefully using the term "packaging" instead of anchoring on, say, the specifics of ELF DSOs.)
> > > > > > > > > > > > >
> > > > > > > > > > > > > I think there is an interesting analog here to the LLVM project in general: similar to how the broader LLVM project's "compiler as a library" approach yields a toolbox of things which can be used to build a compiler (but also other things), the "libc as a library" aphorism(/pun) is meant to point out that the task of "building a libc" isn't monolithic, either (and maybe there are other things people want in this space, and we can make our work useful to them). This comes back to the question about using or replacing parts of the overall libc: there's really no reason that the whole thing //needs// to be monolithic. There are, of course, some internally-cohesive subsystems that would be hard to break apart; but at the macro level, a "libc" isn't terribly cohesive. The monolithic coupling is largely artificial.
> > > > > > > > > > > > >
> > > > > > > > > > > > > ---
> > > > > > > > > > > > >
> > > > > > > > > > > > > Alright, so that's the background.
> > > > > > > > > > > > >
> > > > > > > > > > > > > There is, quite admittedly, some tension between the goals -- at least, as I've stated them.
> > > > > > > > > > > > >
> > > > > > > > > > > > > One good example: we actually //would// like to make reasonable "narrowing" guarantees: if there is some design or implementation space we could leave empty, and doing so would drastically simplify a packaging use case, then that's something we should try to accommodate. (Especially if it doesn't add substantial cost to the implementation.)
> > > > > > > > > > > > >
> > > > > > > > > > > > > However, a guarantee like "our ABI will match <other libc>" is not what I would call narrowing: it would require us to cover more of the design/implementation/ABI space than absolutely necessary, so it doesn't really seem like a guarantee we would want to make.
> > > > > > > > > > > > >
> > > > > > > > > > > > > On a more technical note, I should also point out that there is an important detail that is easy to miss, and I think it's mostly buried somewhere within this bullet point:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > - Provide C symbols as specified by the standards, but take advantage
> > > > > > > > > > > > > > and use C++ language facilities for the core implementation.
> > > > > > > > > > > > >
> > > > > > > > > > > > > So, for example, the entire "implementation" ought to be available without squatting on symbol names used by libc. Our expectation is to use link-time aliasing to satisfy most libc symbols, but I think the important takeaway is that the libc symbols will notionally only exist in an independent layer on top of the "actual" implementation (i.e., separate symbol logic from program logic).
> > > > > > > > > > > > >
> > > > > > > > > > > > > ---
> > > > > > > > > > > > >
> > > > > > > > > > > > > One seemingly-equivocal goal in all of this is delegation to a separate libc, and this seems like it might raise questions about ABI interop. Personally, I would not characterize this as "interoperability;" rather, I would characterize it by saying that a "syscall" might be implemented by, say, a trap instruction; but it doesn't have to be. For instance, it could delegate -- probably through clever linking tricks -- to a vendor-supplied libc. More generally, I would say that this could apply to any libcall (not just syscalls). My expectation is that, since we'll need something like this as a bootstrapping mechanism anyhow, it would be unwise to try to make it an anti-goal. Inhibiting that use case just seems like more work to cover ground that we don't particularly care about. (Implementing such delegation is hard, but actually preventing it would be even harder... and to what gain? Bragging rights?)
> > > > > > > > > > > > >
> > > > > > > > > > > > > I do **strongly** suspect that there are other users who might want to use this libc, but still have to use a vendor library to actually talk to their kernel. Or maybe folks would want to delegate almost everything to their existing libc, except for one or two routines. Or maybe folks want an easier way to rebuild existing programs to run inside their shiny, new sandbox. Or they want easier kernel-bypass networking, or they want to use an in-process virtual filesystem, or any other thing which, today, commonly requires using new, incompatible APIs.
> > > > > > > > > > > > >
> > > > > > > > > > > > > ---
> > > > > > > > > > > > >
> > > > > > > > > > > > > And just as a quick sanity check: I don't actually expect that this libc would be adopted primarily as a system libc... at least, not in the near term, by any existing platform. It should be complete and high enough quality to serve that purpose, though. It might make sense for a new platform that hasn't already cemented its ABI (cf. Alpine Linux and musl), or for a vendor to adopt for an epoch release (I do think an ABI layer could be fashioned to make that feasible). But there is plenty of value in not relying on a system libc, and in being able to replace parts (work around bugs, provide a different implementation, etc.).
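For readers following along, here is a minimal C++ sketch of the layering described in the comment above: the libc-facing C symbol is only a thin layer over an implementation that lives in its own namespace, and the lowest-level operation can either go to the kernel or be delegated to a vendor libc. Everything in it is an assumption for illustration: the names `llvm_libc_sketch`, `InMemoryBackend`, `VendorLibcBackend`, `sketch_puts` and `sketch_puts_host` are hypothetical, a forwarding `extern "C"` wrapper stands in for the link-time aliasing mentioned in the thread, and the in-memory backend plays the role of an in-process virtual filesystem.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>

namespace llvm_libc_sketch {  // hypothetical namespace for the core implementation

// Backend that delegates the lowest-level operation to the host (vendor) libc
// instead of issuing a system call directly.
struct VendorLibcBackend {
  static std::size_t write(const char *buf, std::size_t n) {
    return std::fwrite(buf, 1, n, stdout);
  }
};

// Backend that keeps all output in process, e.g. for a sandbox or an
// in-process virtual filesystem.
struct InMemoryBackend {
  static std::string &sink() {
    static std::string s;
    return s;
  }
  static std::size_t write(const char *buf, std::size_t n) {
    sink().append(buf, n);
    return n;
  }
};

// Core logic: ordinary C++ that does not squat on any libc symbol name.
template <typename Backend>
int puts_impl(const char *s) {
  const std::size_t len = std::strlen(s);
  if (Backend::write(s, len) != len) return EOF;
  if (Backend::write("\n", 1) != 1) return EOF;
  return 1;  // any non-negative value signals success
}

}  // namespace llvm_libc_sketch

// Thin C symbol layer. A real build might alias `puts` to the implementation
// at link time; hypothetical names are used here so the sketch does not
// collide with the host libc.
extern "C" int sketch_puts(const char *s) {
  return llvm_libc_sketch::puts_impl<llvm_libc_sketch::InMemoryBackend>(s);
}

extern "C" int sketch_puts_host(const char *s) {
  return llvm_libc_sketch::puts_impl<llvm_libc_sketch::VendorLibcBackend>(s);
}
```

Swapping one backend for the other changes where the bytes go without touching either the C entry point or the core logic, which is the kind of replaceability argued for above.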
> > > > > > > > > > > > If I understand David correctly, he is essentially trying to say something which I have failed to convey so far: we will keep the ABI compatibility and stability questions open for now and let someone who cares about these issues come and fill in these details in the future.
> > > > > > > > > > > >
> > > > > > > > > > > > Does that make sense? Since we (the team at Google I represent) are not particularly interested in these questions, we do not want to speculate or make promises about them. That said, we are also not preventing anyone from formalizing answers to these questions in the future.
> > > > > > > > > > > I made two points:
> > > > > > > > > > >
> > > > > > > > > > > * Will this library have ABI compatibility with itself over time?
> > > > > > > > > > > * Since there's a desire to allow mixing this libc with another libc implementation, you'll likely need to be ABI compatible to do so. Alternatively, you can define boundaries which don't require this compatibility, and I'd like to hear more.
> > > > > > > > > > >
> > > > > > > > > > > I'd like them answered separately. I think it's critical to the success of this project to have other non-Google-production participants, and I expect that some will care about ABI stability.
> > > > > > > > > > It seems like this line of discussion is quickly starting to go in circles... but maybe I'm misunderstanding your questions.
> > > > > > > > > >
> > > > > > > > > > > Will this library have ABI compatibility with itself over time?
> > > > > > > > > >
> > > > > > > > > > Can you be more specific? "ABI" is a pretty broad topic...
> > > > > > > > > >
> > > > > > > > > > For example, we have no control over whether a user of the library chooses an alternate calling convention, then attempts to link against a prebuilt archive. I strongly doubt this is the intent of your question, but it's an example of why I'm asking for narrowing... we need to be careful not to over-promise. "ABI" is a vague enough notion that simply saying "we will have a stable ABI" would be //vastly// overreaching.
> > > > > > > > > >
> > > > > > > > > > > Since there's a desire to allow mixing this libc with another libc implementation, you'll likely need to be ABI compatible to do so.
> > > > > > > > > >
> > > > > > > > > > (We don't believe this is entirely the case, at least for ELF.)
> > > > > > > > > >
> > > > > > > > > > > Alternatively, you can define boundaries which don't require this compatibility, and I'd like to hear more.
> > > > > > > > > >
> > > > > > > > > > This is the intent, and why the term "layer" is used in this bullet point:
> > > > > > > > > >
> > > > > > > > > > > Ability to layer this libc over the system libc [...]
> > > > > > > > > >
> > > > > > > > > > The specific mechanism, however, is something that ought to be addressed in a standalone design doc. We do have such a plan for ELF, but it is fairly intricate.
> > > > > > > > > >
> > > > > > > > > > In any event, the discussion of exactly how (and why) it would be implemented is something that is ... "nuanced," to put it lightly. It seems unnecessary to try to include such a deep technical specification in this particular, high-level document. In fact, I could even go further: trying to define "layering" will require us to anchor to the mechanism for a specific platform, or at least family of platforms. That seems antithetical to the current goals of this document (and, frankly, feels more like a wedge than an actual question... so perhaps this would be better deferred until Siva sends the specific design, after this doc is committed).
> > > > > > > > > > Can you be more specific? "ABI" is a pretty broad topic...
> > > > > > > > >
> > > > > > > > > libc++ has ABI guarantees; I'd expect you to figure out similar guarantees for a libc. Yes, it's broad, and yes, I'm asking that you figure out exactly what that means. Yes, this includes LLVM libc version X -> Y as well as compatibility with other libc implementations (since interop with other libc is part of this proposal).
> > > > > > > > As you have said, there are two kinds of ABI compatibility that one could discuss here:
> > > > > > > >
> > > > > > > > 1. LLVM libc X versus LLVM libc Y - To begin with at least, we are not interested in such a question. For our use case, we are OK if LLVM libc X is never compatible with LLVM libc Y. However, I do understand that there will be users and developers out there for whom ABI compatibility matters. In this proposal, I want to keep the compatibility question open, as I cannot guess on behalf of somebody else who requires such compatibility guarantees.
> > > > > > > >
> > > > > > > > [You brought up the libc++ ABI guarantees. One cannot have a namespace based ABI management scheme for a libc as we cannot use namespaces in libc public headers.]
> > > > > > > >
> > > > > > > > 2. LLVM libc versus another system libc - In general, we are not interested in this question either at this point. Ideally, LLVM libc should not have to make any guarantees about compatibility with another libc. But I think there is probably a misunderstanding of the bullet point "ability to layer LLVM libc over another libc." We do not mean that we want to be able to "mix" LLVM libc with another libc in general. Users of LLVM libc cannot, for example, open a file using LLVM libc and close it using another libc. The users have to consistently use entry points from LLVM libc alone. However, LLVM libc might call into the system libc for the implementation (and hence act as a layer between the user code and the system libc). The translation from LLVM libc data structures to the system libc data structures happens inside LLVM libc, invisible to the LLVM libc user.
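As a rough illustration of the "layer, don't mix" point above, here is a C++ sketch assuming the host libc's `FILE` is used internally; `sketch_FILE`, `sketch_fopen`, `sketch_tmpfile`, `sketch_fputs` and `sketch_fclose` are hypothetical names, not a proposed API. The user only ever holds the wrapper object, so the translation to host-libc data structures stays inside the library.

```cpp
#include <cassert>
#include <cstdio>
#include <new>

// Hypothetical library-side stream object. Its layout is private to this
// library; the host libc's FILE is an implementation detail.
struct sketch_FILE {
  std::FILE *host;       // stream owned by the host (system) libc
  unsigned long writes;  // library-private bookkeeping the host never sees
};

// Wraps a host stream; releases it again if allocation fails.
static sketch_FILE *wrap_host_stream(std::FILE *host) {
  if (host == nullptr) return nullptr;
  sketch_FILE *f = new (std::nothrow) sketch_FILE{host, 0};
  if (f == nullptr) std::fclose(host);
  return f;
}

extern "C" sketch_FILE *sketch_fopen(const char *path, const char *mode) {
  return wrap_host_stream(std::fopen(path, mode));
}

extern "C" sketch_FILE *sketch_tmpfile(void) {
  return wrap_host_stream(std::tmpfile());
}

extern "C" int sketch_fputs(const char *s, sketch_FILE *f) {
  ++f->writes;
  return std::fputs(s, f->host);  // translated into a host-libc call
}

// Must be paired with sketch_fopen/sketch_tmpfile: calling the host fclose on
// f->host directly would leak the wrapper and bypass the bookkeeping.
extern "C" int sketch_fclose(sketch_FILE *f) {
  const int rc = std::fclose(f->host);
  delete f;
  return rc;
}
```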
> > > > > > > I think there is some miscommunication happening here...
> > > > > > >
> > > > > > > I think what Siva and David are trying to say is that we want people who have specific ABI compatibility needs to drive the ABI compatibility design. It's not any form of opposition to having such a proposal, it just should probably come from the people working in that space.
> > > > > > >
> > > > > > > I'll try to explain this from the perspective from which JF asked the question: libc++ has ABI guarantees. But originally, libc++ *only* had a stable ABI. There was no "unstable" ABI because the libc++ authors didn't need one. Later on, folks (us) started trying to use libc++ and happened to want an "unstable" ABI that could easily track updates and fixes that weren't ABI compatible. At that point we worked with Marshall and others to come up with an approach that would work for our use case but also wouldn't get in the way of the stable libc++ ABI. I think it was a good thing that libc++ didn't try to design this system up-front. It would have been a waste of time given that there weren't any users involved with libc++ at the time to even consume it. But I also think it was really important to figure out a way to support that once there was a concrete use case in mind.
> > > > > > >
> > > > > > > I think we're basically seeing the reverse position here. Our use case is for an unstable ABI. We're totally open to having a stable ABI but as we don't have a concrete use case in mind, it seems like it would be better to wait for folks to have specific requirements for a stable ABI and then design a solution that works for them.
> > > > > > >
> > > > > > >
> > > > > > > Maybe "we are not interested in this question" is easily misinterpreted as "we are not interested in answering this question at all". Sorry if so; I think that's just a slight communication issue. Another way to put it is: "we aren't the right people to figure out the core use cases that will drive any answer to this question, but we're happy for folks to propose or work in that direction".
> > > > > > >
> > > > > > > Hopefully this helps address some of the confusion.
> > > > > > > I think what Siva and David are trying to say is that we want people who have specific ABI compatibility needs to drive the ABI compatibility design. It's not any form of opposition to having such a proposal, it just should probably come from the people working in that space.
> > > > > >
> > > > > > That's fine with me. What I'm asking is simple:
> > > > > >
> > > > > > 1. Try to leave a placeholder for ABI in the initial proposal.
> > > > > > 2. Reach out to people who've expressed interest in the RFC, in a way that would require ABI compatibility of some form. Most folks from the RFC are clearly not reading this review. Once it's in good shape, send a ping to the RFC. It seems like there are folks within Google interested in use cases of that sort. Dynamic linking certainly would entail an ABI. They might not want to work on it *now*, but at a minimum they might have insights to improve the placeholder above.
> > > > > > 3. Clarify that entire "interop with other libc" thing. It's pretty unclear to me; the way I've understood it, it sounds like there are ABI implications. If there aren't, great!
> > > > > >
> > > > > > I am asking a bunch of questions, but you don't need to have perfect answers or to sign up for the work. However, I do think you want to put some thought into it so either interested folks can jump in now, or when interested folks come along everything is at least set up with ABI in mind.
> > > > > >
> > > > > > Simple examples:
> > > > > >
> > > > > > * Will you force inline functions, and leave other non-C API functions as the non-inlined ODR functions? If someone were to link against them, then these functions would now be part of your ABI. You'd need to think about what parts of them can/can't change. You certainly can't remove them... can you change their namespace (if you implement in C++)? Their parameters? Return type, etc...
> > > > > > * How will you handle syscalls? On some platforms you're not supposed to call them directly because they're not part of the platform API.
> > > > > > * How will you manage structs exposed through the C / POSIX API?
> > > > > >
> > > > > > Some of these can stay up in the air for a bit! But some would require moving your code around and setting stuff in stone to enforce into an ABI. What have other libc implementations done? If you think they did something silly, why?
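To make the struct question concrete: one way to "set stuff in stone" for a type exposed through the C API is to pin its layout with compile-time checks. The sketch below assumes a hypothetical public struct `sketch_interval` and entry point `sketch_interval_to_ns`; it is not a claim about what any existing libc does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

extern "C" {
// Hypothetical struct exposed through a public C header. Once shipped in a
// stable-ABI configuration, its size, member order and offsets become part of
// the ABI and cannot change without breaking already-compiled callers.
struct sketch_interval {
  std::int64_t seconds;
  std::int64_t nanoseconds;
};
}

// Compile-time checks that fail the build if the layout is changed by accident.
static_assert(sizeof(sketch_interval) == 16, "sketch_interval size is ABI");
static_assert(offsetof(sketch_interval, seconds) == 0, "seconds offset is ABI");
static_assert(offsetof(sketch_interval, nanoseconds) == 8, "nanoseconds offset is ABI");

// Out-of-line C entry point that consumes the struct by pointer.
extern "C" std::int64_t sketch_interval_to_ns(const sketch_interval *iv) {
  return iv->seconds * 1000000000 + iv->nanoseconds;
}
```

Any later change to member order, types or size then shows up as a build failure rather than as silent breakage in callers compiled against the old header.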
> > > > > To be clear, I'm not asking that you dig into all the bullet points! Just the 3 numbered things. Leaving a placeholder is easy, and so is sending a ping to the RFC.
> > > > >
> > > > > Clarifying the interop thing seems easy too? I guess code might make what you have in mind obvious here... I think it's just been confusing to me and others.
> > > > I understand that everything you are saying and asking is very relevant. One thing that is not clear to me: should this proposal block on getting answers to those questions even if the answers do not end up in this proposal? As David mentioned, we will have to prepare design docs for some of the questions you want to see answered. That is not to say we do not want to write up those design docs. On the contrary, we want to write and share those docs and get feedback from the community. But should this proposal block on reaching agreement on those docs?
> > > >
> > > > There are other kinds of questions in your list which we think should be answered by people who work in and have expertise in those areas. I agree that such folks are probably not following this code review. But again, should this proposal block on having answers to, and agreement on, those questions?
> > > >
> > > > If I have understood the process correctly, I can make requests for administrative aspects like mailing lists, SVN/Git repos, etc. only after this proposal lands. So, I have been of the opinion that we will first land this proposal and then have discussions about the designs, etc., via code reviews in the new lists/repos.
> > > >
> > > > > Try to leave a placeholder for ABI in the initial proposal.
> > > >
> > > > The very first time you brought this up, I asked you how to word it so that it conveys that the ABI questions are still open. As it is written now, all it says is that we want an ABI-independent implementation as far as possible. That ABI refers to things like calling convention, stack layout, etc. The ABI aspects you are asking about refer to a different kind of ABI, about which the proposal is currently silent.
> > > Siva, I think JF's follow-up maybe clarified this somewhat, but having a placeholder around the fact that there remain open ABI issues seems pretty reasonable. So does at least pinging the RFC thread and trying to write down some of the interop issues.
> > >
> > > Regarding your last point -- I think just adding a section that says there are open questions about the best way to build stable ABI versions of the libc (and the use cases associated with it) would be a decent start?
> > Ah, sorry! I missed JF's follow up which came in as I was typing my response in phabricator.
> I've been lurking -- this is an interesting discussion. If I read JF's comments correctly, I think what he's asking is that you add a section for open questions, one of which would be:
>
> - What ABI stability guarantees should this library provide?
>
> You shouldn't try to answer that question here and now, however this should be part of the proposal and it should be clear that the answer is to be determined by the community.
>
> Now, my personal stance is that a C Standard Library that has no ABI stability guarantees will miss out on many potential users (and hence contributors). I think it should be simple to provide ABI stability for those that need it without impeding those that don't require it (such as Google). We do this in libc++, and a C++ library is immensely harder to keep ABI stable than a C library. However, we should probably move this discussion to the list so that others can chime in.
I have added a section below called "ABI stability" which says that the stability question is currently open and will be answered at some point in the future.
So, I ask the same question I did previously: does this proposal have to block on getting answers to these questions? I do not want this question of mine to be misinterpreted as a dismissal of the open questions. I totally understand why the questions are important and relevant. But, at the same time, I want to know the expectation here.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D64939/new/
https://reviews.llvm.org/D64939