[LLVMdev] Using the unused "version" field in the bitcode wrapper (redux)

Robinson, Paul Paul_Robinson at playstation.sony.com
Fri Nov 21 13:31:26 PST 2014


> Reading the bitcode reader while working on another issue I found
> that we already have a version in the bitcode itself (not the darwin
> wrapper) and it is used! It is stored with the
> bitc::MODULE_CODE_VERSION. It is used to select relative ids, which
> impacts the entire bitcode, and so it makes sense to be based on a
> version.
> 
> If we ever have a new feature that could not be otherwise detected,
> bumping the number is a reasonable way of making sure old versions of
> llvm will reject new bitcode instead of misinterpreting it.

Right, that version number is used to resolve *ambiguities* in how to
interpret some chunk of bitcode.  It is not a generic bitcode version
scheme, because most bitcode format changes involve things like adding
new operands or opcodes, which are easily identified without needing
an explicit version number.
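
To illustrate what "resolving an ambiguity" means here, a minimal
sketch of how a reader might act on that record.  This is not the
actual reader code; the ReaderState struct and applyModuleVersion
function are hypothetical names, and the assumption that values 0 and 1
select absolute vs. relative ids reflects my understanding of the
reader rather than anything stated above.

```cpp
#include <cstdint>

// Hypothetical reader state; only the flag discussed above is modeled.
struct ReaderState {
  bool useRelativeIDs = false;
};

// Apply a MODULE_CODE_VERSION value to the reader state.
// Returns false for a version this reader does not know, so the
// caller can reject the file instead of misinterpreting it.
bool applyModuleVersion(uint64_t version, ReaderState &state) {
  switch (version) {
  case 0: // assumed: operands are absolute value ids
    state.useRelativeIDs = false;
    return true;
  case 1: // assumed: operands are relative value ids
    state.useRelativeIDs = true;
    return true;
  default: // unknown version: refuse rather than guess
    return false;
  }
}
```

Note the version only changes how existing records are decoded; it
says nothing about which toolchain release produced the file.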

The scenario I am most concerned about is this:

- We as a vendor publish toolchain #12 based on SVN r250000.
- During subsequent LLVM development, changes happen (!).
  For example, a new key letter 'g' in the Data Layout.  This is
  not a bitcode ambiguity so MODULE_CODE_VERSION is unchanged.
- We as a vendor publish toolchain #13 based on SVN r300000.
- Some middleware provider publishes libIncrediblyUseful.bc built
  using spiffy new toolchain #13.
- Some hapless game developer tries to use libIncrediblyUseful.bc
  but is still on toolchain #12. This causes an error during some
  LTO build phase, of course; the question is, what kind of error
  and how does Hapless Game Developer know what to do?

We as compiler developers want to see something along the lines of
"unknown data layout specifier."  That kind of diagnostic is seriously 
helpful to the LLVM community, because it describes the actual problem.

This does *nothing* for Hapless Game Developer.  HGD wants to see
"this bitcode file was generated by a newer version, I don't understand
how to interpret it" because _that's_ the actual problem.

The "actual problem" is context dependent.  How can we account for that?

Proposed solution:

Whether to emit a bitcode wrapper becomes a target-dependent predicate.
Bitcode is written by Module, which already has target info attached,
so it's a matter of picking some convenient place to keep that info.
Initially only Darwin would do this, but it would be a step up from the
current explicit triple check.

The wrapper has a standard header, same as the current header:
- Magic
- Version
- BitcodeOffset
- BitcodeSize
The target can supply additional data to put after the header (and
before the actual bitcode starts). Darwin would supply the CPUType
field like it does now.
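
A rough sketch of that layout, assuming the four header fields are
32-bit little-endian words and the magic is 0x0B17C0DE, as in the
existing wrapper.  The names wrapBitcode, appendLE32 and targetExtra
are made up for illustration; BitcodeOffset is simply the fixed header
size plus however many bytes the target adds.

```cpp
#include <cstdint>
#include <vector>

// Magic value of the existing bitcode wrapper (assumed unchanged).
constexpr uint32_t kWrapperMagic = 0x0B17C0DE;
// Magic, Version, BitcodeOffset, BitcodeSize: four 32-bit fields.
constexpr uint32_t kFixedHeaderBytes = 4 * sizeof(uint32_t);

// Append a 32-bit value in little-endian byte order.
void appendLE32(std::vector<uint8_t> &out, uint32_t v) {
  for (int shift = 0; shift < 32; shift += 8)
    out.push_back(static_cast<uint8_t>(v >> shift));
}

// Build: fixed header, then target-supplied data, then the bitcode.
std::vector<uint8_t> wrapBitcode(uint32_t version,
                                 const std::vector<uint8_t> &targetExtra,
                                 const std::vector<uint8_t> &bitcode) {
  std::vector<uint8_t> out;
  const uint32_t offset =
      kFixedHeaderBytes + static_cast<uint32_t>(targetExtra.size());
  appendLE32(out, kWrapperMagic);
  appendLE32(out, version);
  appendLE32(out, offset); // where the real bitcode starts
  appendLE32(out, static_cast<uint32_t>(bitcode.size()));
  out.insert(out.end(), targetExtra.begin(), targetExtra.end());
  out.insert(out.end(), bitcode.begin(), bitcode.end());
  return out;
}
```

With a 4-byte targetExtra (Darwin's CPUType) this gives a 20-byte
prefix, and a reader that only knows the fixed header can still skip
straight to the bitcode using BitcodeOffset.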

This is 100% compatible with what exists today, but will be easy to
extend for (ahem) other vendors who want wrappers.

Any vendor who supports bitcode as a long-lived on-disk format should
specify that it wants a wrapper.  It is the vendor's responsibility
to provide sensible version numbers for successive toolchain releases.

The LLVM project does not specify how to come up with version numbers.
We default to zero (so Darwin automatically gets its historical value).

NOTE: This solution explicitly does NOT solve the "bitcode must be
understandable to older toolchains" problem.  What it DOES solve is the
"older toolchains must provide an easily understood diagnostic when
presented with newer bitcode files" problem.
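
On the reading side, the check I have in mind is about this simple.
This is only a sketch: checkWrapperVersion is a hypothetical helper,
and the message wording and the "newer means larger" comparison are
assumptions that hold only if the vendor assigns increasing numbers.

```cpp
#include <cstdint>
#include <string>

// Compare the wrapper version found in the file against the newest
// version this toolchain knows.  Returns an empty string when the
// file is acceptable, otherwise a message aimed at the end user.
std::string checkWrapperVersion(uint32_t fileVersion,
                                uint32_t toolchainVersion) {
  if (fileVersion > toolchainVersion)
    return "bitcode file was generated by a newer toolchain (version " +
           std::to_string(fileVersion) + "); this toolchain understands "
           "versions up to " + std::to_string(toolchainVersion);
  return std::string();
}
```

The point is that this check runs before any attempt to parse the
bitcode, so the user sees the version mismatch rather than whatever
low-level failure the parser would otherwise hit first.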

Vendor toolchain release scenarios:

1) Releasing based on arbitrary trunk revisions.
The vendor's toolchain release number, encoded into 32 bits, is
likely to serve well as the bitcode wrapper version number.  If you
release strictly from trunk (not release branches) then the SVN
revision number from the LLVM repo can also serve this purpose.

2) Releasing strictly based on LLVM releases.
Using the LLVM version number, encoded into 32 bits, is a pretty
reasonable alternative.  Even if you release multiple toolchains
from the same LLVM release, the bitcode formats will be the same,
so the bitcode wrapper version number can also be the same.
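
One possible encoding for scenario 2, purely as an illustration: pack
major/minor/patch into a single 32-bit value so that a plain integer
comparison orders releases correctly.  The field widths (16/8/8 bits)
and the name encodeVersion are my own assumptions; nothing in this
proposal mandates a particular packing.

```cpp
#include <cstdint>

// Pack a release number as 0xMMMMmmpp: 16 bits of major version,
// 8 bits of minor, 8 bits of patch.  Out-of-range fields are masked.
constexpr uint32_t encodeVersion(uint32_t major, uint32_t minor,
                                 uint32_t patch) {
  return ((major & 0xFFFF) << 16) | ((minor & 0xFF) << 8) | (patch & 0xFF);
}

// Later releases compare greater, which is all a reader needs.
static_assert(encodeVersion(3, 5, 0) < encodeVersion(3, 6, 0),
              "minor bump must order after the previous release");
```

A vendor in scenario 1 could skip the packing entirely and store the
toolchain release number or SVN revision directly.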

--paulr

P.S. I think the illustrative example of a new DataLayout specifier
would reach an llvm_unreachable, and not emit a proper diagnostic
at all.  This is part of the generic diagnostics-from-LLVM problem.




