[llvm-dev] Experiments with a replacement for the bitcode format
Xinliang David Li via llvm-dev
llvm-dev at lists.llvm.org
Wed Sep 28 14:40:28 PDT 2016
On Wed, Sep 28, 2016 at 2:23 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
> On Sep 28, 2016, at 2:03 PM, Xinliang David Li <xinliangli at gmail.com>
> On Tue, Sep 27, 2016 at 12:56 PM, Mehdi Amini via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>> Disclaimer: I have no plan to break 3.x bitcode compatibility in any way.
>> I’m experimenting with ideas to improve the current bitcode serialization
>> and deserialization in general.
>> I’m interested in improving the speed and the flexibility of the
>> reader/writer, as well as making the code easier to deal with.
>> My current griefs against our bitcode format are mainly the lack of
>> built-in random access (which prevents efficient lazy loading), and an
>> implementation that does not help separate the high-level structure of the
>> serialization from what is actually emitted (including the compression
>> part). On the speed side, the bit-level management of records also has
>> non-negligible overhead right now.
>> The two key issues with the current format that prevent random access
> What are the most common IR entities that need fast random access?
> It really depends on the use case: if you want to load a full module at
> once, we can get away without random access for everything but metadata.
> The only things that have cycles are named types and instructions, but
> these have fast RAUW support anyway, and the writer can lay them out in
> order so that, most of the time, operands are parsed before their users.
> For lazy or selective loading, random access helps a lot for
> everything but instructions (if you materialize a function, then you read
> all of its instructions anyway).
Ok. The reason I asked the question is that by understanding the most
important use cases, it is possible to come up with a design that maximizes
the benefit with very low overhead (the downside being, e.g., a size increase).
>> - Abbrevs are lazily emitted in the stream and can be interleaved with
>> records: you have to scan the whole block to get the context for a record.
>> - Because records have variable size, you need to parse them in sequence
>> to find the “interesting” one.
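The sequential-scan problem described above can be sketched in a few lines (a toy illustration in Python, not LLVM's actual VBR encoding): with variable-length records, reaching record N requires decoding records 0..N-1, whereas a side table of fixed-width offsets restores random access at the cost of some size.

```python
def encode_varint(value):
    """LEB128-style variable-length encoding: compact, but unseekable."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def decode_varint(buf, pos):
    """Decode one varint starting at `pos`; return (value, next_pos)."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos

records = [1, 300, 1234567]
stream = bytearray()
offsets = []  # fixed-width side table: one offset per record
for v in records:
    offsets.append(len(stream))
    stream += encode_varint(v)

# Sequential-only access: must decode records 0 and 1 to reach record 2.
pos = 0
for _ in range(2):
    _, pos = decode_varint(stream, pos)
value, _ = decode_varint(stream, pos)
assert value == 1234567

# With the offset table, record 2 is one decode away (random access).
value, _ = decode_varint(stream, offsets[2])
assert value == 1234567
```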
>> I just finished a quick prototype using Google FlatBuffers (
>> http://google.github.io/flatbuffers/ ). This is a solution that provides
>> “zero-copy” serialization/deserialization by avoiding variable-length
>> integers and aligning every field appropriately. The structure of the
>> serialization is described using a DSL, and the
>> serialization/deserialization code is generated from this description.
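To make the DSL concrete, a FlatBuffers schema for a couple of IR entities might look like the following (a hypothetical sketch, not the schema used in the prototype):

```
// ir.fbs -- hypothetical sketch, not the prototype's actual schema.
table Metadata {
  kind: uint32;
  operands: [uint32];     // indices into the module's metadata table
}

table Function {
  name: string;
  instructions: [ubyte];  // opaque stream; no random access needed here
}

table Module {
  functions: [Function];
  metadata: [Metadata];
}

root_type Module;
```

`flatc` would then generate the reader/writer accessors from this description.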
>> My prototype is a naive description of the IR, but complete enough to
>> handle all the .ll files present in the LLVM validation, and to round-trip
>> an 800 MB file from the LTO link of Clang itself.
>> The result is that the extra flexibility given by the random access
>> helps beyond lazy loading: for instance, loading metadata is usually an
>> issue because of forward references, while random access allows the
>> reader to “recurse” on the operands when loading a metadata node. This
>> simplifies the algorithms on both the reader and the writer.
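The “recurse on the operands” point can be sketched as follows (a hypothetical node table, not the real reader): with random access, an operand can be materialized on demand before its user, so no forward-reference placeholder or RAUW fixup pass is needed.

```python
# Hypothetical metadata table as it might sit in a random-access buffer:
# each entry names its operands by index. Entry 0 refers to entries 1 and
# 2, which appear later in the table -- forward references under a purely
# sequential reader.
table = [
    {"name": "scope", "operands": [1, 2]},
    {"name": "file",  "operands": []},
    {"name": "type",  "operands": [1]},
]

cache = {}

def load(index):
    """Materialize node `index`, recursing into its operands on demand.

    Because the table supports random access, an operand can be loaded
    before its user even when it is serialized later, so no placeholder
    creation or fixup pass is needed (cycles would still require special
    handling).
    """
    if index in cache:
        return cache[index]
    node = {"name": table[index]["name"], "operands": []}
    cache[index] = node  # insert before recursing, for robustness
    for op in table[index]["operands"]:
        node["operands"].append(load(op))
    return node

root = load(0)
assert root["operands"][0]["name"] == "file"
# Node 1 is shared: it is loaded once and referenced from both users.
assert root["operands"][1]["operands"][0] is root["operands"][0]
```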
>> On the drawback side, FlatBuffers seems to target small use cases (not a
>> lot of different record types, nor large buffers). The implementation is
>> not fully optimized; for example, 40% of the writer time is spent on
>> “vtable” (the equivalent of our abbrevs) deduplication on some of my test
>> cases.
>> Also, several obvious size optimizations are not possible with the
>> current FlatBuffers: my prototype currently emits files that are ~2 times
>> larger than our bitcode format. By working on the schema I can probably
>> get this down to ~50% larger, but that may still be too high for us
>> (especially considering that some of these limits are inherent to the
>> FlatBuffers implementation itself).
> Do you have read/write time comparison data (for sequential and random
> access)?
> Keeping in mind that I have a naive implementation (in both size and
> speed): the writer is 10% slower (but spends 40% of its time in the
> not-so-smart implementation of FlatBuffers vtable deduplication).
> The reader can be ~40% faster for loading a full module (the LTO merged
> module of Clang with debug info), and a lot faster still when you want to
> load only one function from the module.
> The current lazy-loading scheme still forces us to create a declaration
> for every global/alias/function and load all the global constants
> expressions. (Duncan did a bunch of work this year to help ThinLTO on this
> issue by storing more records nested inside the function blocks instead of
> at the global level.)
With an untuned prototype, these results sound pretty good to me. Besides,
reader performance is presumably more important than writer performance.
>> So my current thinking on the topic is that we could still learn from
>> other recently developed formats and bring some of their ideas into our
>> format where it makes sense. For example, we don’t need random access into
>> the instruction array, but it would be valuable for metadata. We’d want
>> zero-copy for a symbol table (see
>> https://llvm.org/bugs/show_bug.cgi?id=27551 ) but not for the instruction
>> stream.
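The zero-copy symbol-table idea can be sketched like this (a toy layout invented for illustration, not the format proposed in the bug report): fixed-width, sorted entries let a reader binary-search names directly in the mapped buffer without a deserialization pass.

```python
import struct

# Toy layout: N fixed-width entries of (name: 16 bytes, NUL-padded;
# offset: u64), sorted by name so the raw buffer can be binary-searched
# in place.
ENTRY = struct.Struct("<16sQ")

def build(symbols):
    """Serialize a {name: offset} dict into the toy table layout."""
    out = bytearray()
    for name, off in sorted(symbols.items()):
        out += ENTRY.pack(name.encode().ljust(16, b"\0"), off)
    return bytes(out)

def lookup(buf, name):
    """Binary-search the raw buffer; no parsing pass, no allocation."""
    key = name.encode().ljust(16, b"\0")
    lo, hi = 0, len(buf) // ENTRY.size
    while lo < hi:
        mid = (lo + hi) // 2
        entry_name, off = ENTRY.unpack_from(buf, mid * ENTRY.size)
        if entry_name == key:
            return off
        if entry_name < key:
            lo = mid + 1
        else:
            hi = mid
    return None

buf = build({"main": 0x40, "foo": 0x80, "bar": 0xC0})
assert lookup(buf, "foo") == 0x80
assert lookup(buf, "missing") is None
```

In a real format the name field would be variable-length (e.g. an offset into a string blob), but the fixed-width entry is what makes the in-place search possible.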
>> I’m also interested in other people’s stakes in the bitcode format
>> (other than backward compatibility): what tradeoffs are you ready to make
>> on size vs. speed/flexibility, for example?
> A hybrid format (using fixed length representation or indirection for
> parts that need fast random access, the rest uses compression) seems
> promising. Why can't backward compatibility be kept?
> I don’t plan to lose backward compatibility in any way. Was your question
> intended to be “why can’t backward compatibility be dropped?”, or did you
> misread my disclaimer at the top of the email?
No, I am not proposing to drop backward compatibility --- I simply misread your
disclaimer.