[LLVMdev] Bitcode format

Mon Sep 3 14:34:53 PDT 2007

Greetings,

I am working on a project (unrelated to LLVM) that needs a
bytecode-like format.  I found Bitcode, and it seems to fit the bill
really nicely.

I am writing an independent implementation of Bitcode in C (I really 
want to keep my runtime pure C).  In the process of doing this, I've
discovered a few things I want to ask about.

I notice the Bitcode format documentation [0] is somewhat incomplete --
there have been a few questions I had to resolve by looking at LLVM
source code.  This is totally understandable (I'm actually impressed
that it's documented so well for something so new), but I wonder if you
would accept a patch that clarified some things.  For example, the
document doesn't mention endianness at all, but from the source code I
found that bytes are read in order and, within each byte, bits are read
least-significant first.
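
For illustration, the bit order described above can be sketched in C
roughly as follows.  This is a minimal sketch, not LLVM's code: the
bit_reader struct and read_bits function are hypothetical names, and
returning zero bits past the end of the buffer is an assumption made to
keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reader state; none of these names come from LLVM. */
typedef struct {
    const uint8_t *data;  /* input buffer */
    size_t size;          /* buffer length in bytes */
    size_t bitpos;        /* index of the next unread bit */
} bit_reader;

/*
 * Read an n-bit field (n <= 32).  Bytes are consumed in order, and
 * within each byte the least-significant bit comes first, so the first
 * bit read becomes bit 0 of the result.
 * Assumption: bits past the end of the buffer read as zero.
 */
static uint32_t read_bits(bit_reader *r, unsigned n)
{
    uint32_t value = 0;

    assert(n <= 32);
    for (unsigned i = 0; i < n; i++) {
        size_t byte = r->bitpos >> 3;               /* which byte */
        unsigned shift = (unsigned)(r->bitpos & 7); /* which bit in it */

        if (byte < r->size)
            value |= (uint32_t)((r->data[byte] >> shift) & 1u) << i;
        r->bitpos++;
    }
    return value;
}
```

With the two bytes 0xB5 0x01 as input, reading a 3-bit field, then a
5-bit field, then a 4-bit field under this scheme yields 5, 22, and 1.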

I also have a few questions about the format:

- it appears that the only magic number in the file is 
  application-specific.  This seems unfortunate, because it means that
  application-neutral tools cannot be built that process bitcode files,
  since they could not reliably detect that the file is a bitcode file.
  It might seem like there is little room for application-neutral tools
  since almost all the data in the file is application-specific, but off
  the top of my head I can think of a few, like a tool to suggest
  abbreviations that would give a file better compression.

- the LLVM code assumes that several VBR fields can be at most 32 bits 
  (block ids, number of elements in an array, etc).  These assumptions
  seem quite reasonable: can they be considered part of the format and
  added to the document?
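
To make the 32-bit question concrete, here is a rough C sketch of a VBR
reader that treats any value wider than 32 bits as an error.  It is an
illustration under stated assumptions, not LLVM's implementation: the
names bit_reader, read_bits, and read_vbr32 are hypothetical, the -1
error return is my own convention, and bits past the end of the buffer
are assumed to read as zero.  It relies on each VBR chunk carrying a
continuation flag in its high bit and payload in the remaining bits,
with the least-significant chunk first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reader state; none of these names come from LLVM. */
typedef struct {
    const uint8_t *data;  /* input buffer */
    size_t size;          /* buffer length in bytes */
    size_t bitpos;        /* index of the next unread bit */
} bit_reader;

/* Read n bits (n <= 32), least-significant bit of each byte first.
 * Assumption: bits past the end of the buffer read as zero. */
static uint32_t read_bits(bit_reader *r, unsigned n)
{
    uint32_t value = 0;

    assert(n <= 32);
    for (unsigned i = 0; i < n; i++) {
        size_t byte = r->bitpos >> 3;
        unsigned shift = (unsigned)(r->bitpos & 7);

        if (byte < r->size)
            value |= (uint32_t)((r->data[byte] >> shift) & 1u) << i;
        r->bitpos++;
    }
    return value;
}

/*
 * Read a VBR field made of `width`-bit chunks.  The high bit of each
 * chunk says whether another chunk follows; the low width-1 bits are
 * payload, least-significant chunk first.
 * Returns 0 and stores the value in *out, or -1 if the encoded value
 * does not fit in 32 bits.
 */
static int read_vbr32(bit_reader *r, unsigned width, uint32_t *out)
{
    assert(width >= 2 && width <= 32);

    const uint32_t cont = (uint32_t)1 << (width - 1); /* continuation flag */
    uint64_t value = 0;
    unsigned shift = 0;

    for (;;) {
        uint32_t chunk = read_bits(r, width);
        uint32_t payload = chunk & (cont - 1);

        if (shift >= 32) {
            if (payload != 0)
                return -1;              /* payload beyond bit 31 */
        } else {
            value |= (uint64_t)payload << shift;
            if (value > UINT32_MAX)
                return -1;              /* overflowed 32 bits */
            shift += width - 1;
        }
        if (!(chunk & cont))
            break;                      /* last chunk */
    }
    *out = (uint32_t)value;
    return 0;
}
```

Under this sketch the single byte 0x3B read with width 4 decodes to 27
(chunks 1011 and 0011), while five 0xFF bytes read with width 8 are
rejected because the fifth chunk pushes the value past 32 bits.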

Cheers,
Josh

[0] http://llvm.org/releases/2.0/docs/BitCodeFormat.html