[PATCH] D12618: Reserve a vendor reserved block ID for bitcode

Fri Oct 9 10:17:30 PDT 2015

Hi,

Thanks Chandler for the follow-up.

> On Oct 7, 2015, at 6:09 PM, Chandler Carruth <chandlerc at google.com> wrote:
> 
> I'm afraid I'm not exactly sure where we left off here, but I'll try to reboot the discussion a bit.
> 
> I see two problems that we really should solve:
> 
> 1) Bitcode from an old major version has a real chance of actually being parsed by a new LLVM and having different, wrong meaning. This is especially true right around the X.1 releases where we've removed a bunch of auto-upgrade functionality. I think we should have a *strong* defense against this, and that the correct way of doing it is to encode the "epoch" of bitcode, and reject unsupported epochs with a clear diagnostic. We'll usually only accept a single epoch, and for X.0 releases we'll support two.
> 
> 2) Within a bitcode epoch, we might have bitcode from a *future* LLVM with some incompatible change, or we might have temporary bugs in the bitcode (oops, I changed an enum used in the bitcode and it hasn't been reverted yet). I think this was Mehdi's original motivation.

Yes this is our motivation.

> The more I think about it, the more I like a simple and non-prescriptive approach here. I think the more complex ideas (including my own) aren't that likely to work, and are likely to be a lot of effort relative to the value provided. My suggested simple approach is essentially what Alex and others have described: a string field which is used to identify the producer of bitcode, so that *when the reader hits an error*, it can print out this human readable identification of the producer. This could contain the '--version' output or something similar. I think *enforcing* that these match (or even exist) isn't really useful because it makes it hard to write bitcode tests ore otherwise do things that would otherwise work today. I think it should be limited to just a diagnostic aid for humans.

The problem is that it is not (only) about the bitcode reader throwing an error, it can more subtle. 
For instance just last week Vedant had an issue that cost us a few hours of debug: we had a linker error with missing symbols. It turned out that the bitcode was produced by a very recent LLVM trunk, and the libLTO was an internal release forked from trunk three months ago. The reader in libLTO was not throwing an error and the optimizer succeeded and generated the optimized bitcode for the linker, but some functions were just “lost" in the process.

What we want is that if a developer is using a static library that contains bitcode for his project but his using an older toolchain than the library producer, we need to be able to detect this and ask the developer to upgrade his toolchain.

Now what we do with the vendor block id is close to the “optional identification string” you are suggesting, except that our toolchains would parse it and enforce the version.
I can replace the vendor_id proposed in this patch with an optional “identification” block that would only contain an opaque string, the open-source reader can use this identification string when reporting an error and our internal versions could act as we find appropriate.

— 
Mehdi

> 
> 
> Do these two changes make sense?
> 
> 
> If so, I suggest we add support for writing #1 now at 0, and verify in the reader that it either is missing or zero. With 4.0 we can increment it to epoch 1, and with 4.1 we can remove support for either 0 or missing.
> 
> And I suggest adding the optional identification string, and filling it out with the version information when using Clang or the basic tools.
> 
> 
> This seem like a reasonable path?
> 
> On Wed, Sep 16, 2015 at 3:31 PM Alex Rosenberg via llvm-commits <llvm-commits at lists.llvm.org <mailto:llvm-commits at lists.llvm.org>> wrote:
> Yes, something like that. I don't feel a need to bikeshed the shape of it, but it should be documented and vendor IDs of whatever form should be assigned/committed into the source. I'd suggest that each vendor get only one ID and subsequent data be used to differentiate internal cases. We will want to discourage cowboy corporations just picking their own IDs and not registering them with the project.
> 
> Alex
> 
> > On Sep 17, 2015, at 2:39 AM, Mehdi Amini <mehdi.amini at apple.com <mailto:mehdi.amini at apple.com>> wrote:
> >
> >
> >> On Sep 10, 2015, at 7:47 PM, Alex Rosenberg <alexr at leftfield.org <mailto:alexr at leftfield.org>> wrote:
> >>
> >> alexr added a subscriber: alexr.
> >> alexr added a comment.
> >>
> >> So, we went through this last year.... http://lists.llvm.org/pipermail/llvm-dev/2014-November/078498.html <http://lists.llvm.org/pipermail/llvm-dev/2014-November/078498.html>
> >>
> >> If we add a vendor block, we should at a minimum define a way to identify which vendor it is, in case we are handed bitcode from somewhere else.
> >> Bike shed away, but there should be something simple enough at the start that can tell us we're looking at the wrong thing.
> >
> >
> > The way I do it internally right now is by using a “magic” at the beginning of the block. We could ask “by convention” implementer to always write a (32bits?) unique id at the beginning of the block, but it would be up to the vendor to follow this convention. Does this match what you have in mind?
> >
> > ―
> > Mehdi
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at lists.llvm.org <mailto:llvm-commits at lists.llvm.org>
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20151009/0edf1261/attachment.html>