[PATCH] D12618: Reserve a vendor reserved block ID for bitcode

Fri Oct 9 11:04:02 PDT 2015

On Fri, Oct 9, 2015 at 10:17 AM Mehdi Amini <mehdi.amini at apple.com> wrote:

> Hi,
>
> Thanks Chandler for the follow-up.
>
> On Oct 7, 2015, at 6:09 PM, Chandler Carruth <chandlerc at google.com> wrote:
>
> I'm afraid I'm not exactly sure where we left off here, but I'll try to
> reboot the discussion a bit.
>
> I see two problems that we really should solve:
>
> 1) Bitcode from an old major version has a real chance of actually being
> parsed by a new LLVM and having different, wrong meaning. This is
> especially true right around the X.1 releases where we've removed a bunch
> of auto-upgrade functionality. I think we should have a *strong* defense
> against this, and that the correct way of doing it is to encode the "epoch"
> of bitcode, and reject unsupported epochs with a clear diagnostic. We'll
> usually only accept a single epoch, and for X.0 releases we'll support two.
>
> 2) Within a bitcode epoch, we might have bitcode from a *future* LLVM with
> some incompatible change, or we might have temporary bugs in the bitcode
> (oops, I changed an enum used in the bitcode and it hasn't been reverted
> yet). I think this was Mehdi's original motivation.
>
>
> Yes this is our motivation.
>
> The more I think about it, the more I like a simple and non-prescriptive
> approach here. I think the more complex ideas (including my own) aren't
> that likely to work, and are likely to be a lot of effort relative to the
> value provided. My suggested simple approach is essentially what Alex and
> others have described: a string field which is used to identify the
> producer of bitcode, so that *when the reader hits an error*, it can print
> out this human readable identification of the producer. This could contain
> the '--version' output or something similar. I think *enforcing* that these
> match (or even exist) isn't really useful because it makes it hard to write
> bitcode tests ore otherwise do things that would otherwise work today. I
> think it should be limited to just a diagnostic aid for humans.
>
>
> The problem is that it is not (only) about the bitcode reader throwing an
> error, it can more subtle.
> For instance just last week Vedant had an issue that cost us a few hours
> of debug: we had a linker error with missing symbols. It turned out that
> the bitcode was produced by a very recent LLVM trunk, and the libLTO was an
> internal release forked from trunk three months ago. The reader in libLTO
> was not throwing an error and the optimizer succeeded and generated the
> optimized bitcode for the linker, but some functions were just “lost" in
> the process.
>
> What we want is that if a developer is using a static library that
> contains bitcode for his project but his using an older toolchain than the
> library producer, we need to be able to detect this and ask the developer
> to upgrade his toolchain.
>
> Now what we do with the vendor block id is close to the “optional
> identification string” you are suggesting, except that our toolchains would
> parse it and enforce the version.
> I can replace the vendor_id proposed in this patch with an optional
> “identification” block that would only contain an opaque string, the
> open-source reader can use this identification string when reporting an
> error and our internal versions could act as we find appropriate.
>

Ok... At first I thought you meant *require* the identification block to
parse, but from chatting with you on IRC it seems you mean "if* it parses
in a particular way, take special actions.

I think this is fine. I'm not super thrilled about people parsing this and
doing weird stuff, but I guess that's inevitable.

The only additional thing I would like to highlight is that I at least
would be *super* interested in enhancing the bitcode reader and verifier to
prevent scenarios such as you describe. While we certainly can't make
bitcode forward compatible (backwards is hard enough!) I think we could do
a reasonable job of plugging most of the easy-to-hit holes where the
bitcode is read, verified, and believed to be correct. For example, we
should reject intrinsics that we don't recognize, etc.

It would be nice to see work in this area, but if you are parsing version
numbers out of strings, it may never come up going forward. I'm pretty sure
it will come up for others, and so I'd like to get LLVM into a less brittle
state.

-Chandler

> —
> Mehdi
>
>
>
>
> Do these two changes make sense?
>
>
> If so, I suggest we add support for writing #1 now at 0, and verify in the
> reader that it either is missing or zero. With 4.0 we can increment it to
> epoch 1, and with 4.1 we can remove support for either 0 or missing.
>
> And I suggest adding the optional identification string, and filling it
> out with the version information when using Clang or the basic tools.
>
>
> This seem like a reasonable path?
>
> On Wed, Sep 16, 2015 at 3:31 PM Alex Rosenberg via llvm-commits <
> llvm-commits at lists.llvm.org> wrote:
>
>> Yes, something like that. I don't feel a need to bikeshed the shape of
>> it, but it should be documented and vendor IDs of whatever form should be
>> assigned/committed into the source. I'd suggest that each vendor get only
>> one ID and subsequent data be used to differentiate internal cases. We will
>> want to discourage cowboy corporations just picking their own IDs and not
>> registering them with the project.
>>
>> Alex
>>
>> > On Sep 17, 2015, at 2:39 AM, Mehdi Amini <mehdi.amini at apple.com> wrote:
>> >
>> >
>> >> On Sep 10, 2015, at 7:47 PM, Alex Rosenberg <alexr at leftfield.org>
>> wrote:
>> >>
>> >> alexr added a subscriber: alexr.
>> >> alexr added a comment.
>> >>
>> >> So, we went through this last year....
>> http://lists.llvm.org/pipermail/llvm-dev/2014-November/078498.html
>> >>
>> >> If we add a vendor block, we should at a minimum define a way to
>> identify which vendor it is, in case we are handed bitcode from somewhere
>> else.
>> >> Bike shed away, but there should be something simple enough at the
>> start that can tell us we're looking at the wrong thing.
>> >
>> >
>> > The way I do it internally right now is by using a “magic” at the
>> beginning of the block. We could ask “by convention” implementer to always
>> write a (32bits?) unique id at the beginning of the block, but it would be
>> up to the vendor to follow this convention. Does this match what you have
>> in mind?
>> >
>> > ―
>> > Mehdi
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20151009/782765a8/attachment.html>