[llvm-dev] RFC: APIs for bitcode files containing multiple modules

Fri Oct 28 14:25:51 PDT 2016

> On Oct 28, 2016, at 2:21 PM, Mehdi Amini via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
>> 
>> On Oct 28, 2016, at 2:16 PM, Will Dietz <willdtz at gmail.com> wrote:
>> 
>> On Fri, Oct 28, 2016 at 2:06 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:
>>> On Fri, Oct 28, 2016 at 6:11 AM, Will Dietz <willdtz at gmail.com> wrote:
>>>> 
>>>> On Wed, Oct 26, 2016 at 2:04 PM, Peter Collingbourne via llvm-dev
>>>> <llvm-dev at lists.llvm.org> wrote:
>>>>> On Tue, Oct 25, 2016 at 8:36 PM, Mehdi Amini <mehdi.amini at apple.com>
>>>>> wrote:
>>>>>> 
>>>>>> 
>>>>>> On Oct 25, 2016, at 6:28 PM, Peter Collingbourne <peter at pcc.me.uk>
>>>>>> wrote:
>>>>>> 
>>>>>> Hi all,
>>>>>> 
>>>>>> As mentioned in my recent RFC entitled "RFC: a more detailed design for
>>>>>> ThinLTO + vcall CFI" I would like to introduce the ability for bitcode
>>>>>> files to contain multiple modules. In https://reviews.llvm.org/D24786 I
>>>>>> took a step towards that by proposing a change to the module format so
>>>>>> that the block info block is stored at the top level. The next step is
>>>>>> to think about what the API would look like for reading and writing
>>>>>> multiple modules.
>>>>>> 
>>>>>> Here's what I have in mind. To create a multi-module bitcode file, you
>>>>>> would create a BitcodeWriter object and add modules to it:
>>>>>> 
>>>>>> BitcodeWriter W(OS);
>>>>>> W.addModule(M1);
>>>>>> W.addModule(M2);
>>>>>> W.write();
>>>>>> 
>>>>>> 
>>>>>> That requires the two modules to live until the bitcode is written;
>>>>>> the API could instead be:
>>>>>> 
>>>>>> BitcodeWriter W(OS);
>>>>>> W.writeModule(M1);
>>>>>> // delete M1
>>>>>> // ...
>>>>>> // create M2
>>>>>> W.writeModule(M2);
>>>>>> 
>>>>>> (Maybe you had this in mind, but the API naming didn’t reflect it so
>>>>>> I’m not sure).
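
To make the lifetime difference concrete, here is a minimal toy sketch of the streaming variant suggested above. Nothing in it is LLVM API: `ToyModule`, `ToyBitcodeWriter`, and the length-prefixed record layout are hypothetical stand-ins chosen only to show that each module can be destroyed as soon as its `writeModule()` call returns.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <sstream>
#include <string>

// Hypothetical stand-in for llvm::Module; it only carries a name and a body.
struct ToyModule {
  std::string Name;
  std::string Body;
};

// Hypothetical streaming writer: each writeModule() call serializes the
// module immediately, so the caller may destroy it before the next call.
class ToyBitcodeWriter {
  std::ostream &OS;
  std::size_t NumModules = 0;

public:
  explicit ToyBitcodeWriter(std::ostream &OS) : OS(OS) {}

  void writeModule(const ToyModule &M) {
    // Toy record layout (an assumption): "<name-size>:<name><body-size>:<body>".
    OS << M.Name.size() << ':' << M.Name << M.Body.size() << ':' << M.Body;
    ++NumModules;
  }

  std::size_t modulesWritten() const { return NumModules; }
};

// Writes two modules one after another; at most one is alive at a time.
inline std::string writeTwoModulesStreaming() {
  std::ostringstream OS;
  ToyBitcodeWriter W(OS);
  {
    auto M1 = std::make_unique<ToyModule>(ToyModule{"m1", "aaa"});
    W.writeModule(*M1);
  } // M1 is destroyed here, before M2 exists.
  {
    auto M2 = std::make_unique<ToyModule>(ToyModule{"m2", "bb"});
    W.writeModule(*M2);
  }
  assert(W.modulesWritten() == 2);
  return OS.str();
}
```

With an addModule()/write() pair, both modules would instead have to stay alive until write() is called.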
>>>>> 
>>>>> 
>>>>> In the API I prototyped, I took the maximum BitsRequiredForTypeIndices
>>>>> value from all the modules, and used it to produce the abbreviations for
>>>>> the top-level block info block (without this I was seeing "Unexpected
>>>>> abbrev ordering!" errors in the bitcode writer as a result of emitting
>>>>> the "same" abbreviation multiple times). That would have required us to
>>>>> keep the modules around until the call to write(). However, let me
>>>>> revisit this, because it does not seem necessary (i.e. we can just
>>>>> continue to emit block info blocks within the module block except with
>>>>> different abbreviation numbers for each module).
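
As a rough illustration of why the prototype needed every module up front, the sketch below computes a single shared type-index width as the maximum over all modules. The function names are hypothetical, and the formula (ceil(log2(NumTypes + 1)) per module) is an assumption about how such a width might be derived, not a statement of what the bitcode writer does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Smallest number of bits B such that 2^B >= Value (0 for Value <= 1).
inline unsigned log2Ceil(std::uint64_t Value) {
  unsigned Bits = 0;
  while ((std::uint64_t{1} << Bits) < Value)
    ++Bits;
  return Bits;
}

// Assumed model: a type index field must distinguish NumTypes + 1 values.
inline unsigned bitsRequiredForTypeIndices(std::uint32_t NumTypes) {
  return log2Ceil(std::uint64_t{NumTypes} + 1);
}

// Width for an abbreviation shared by all modules: the widest requirement of
// any module. Computing it needs every module's type count before writing.
inline unsigned
sharedTypeIndexBits(const std::vector<std::uint32_t> &TypeCounts) {
  unsigned Bits = 0;
  for (std::uint32_t N : TypeCounts)
    Bits = std::max(Bits, bitsRequiredForTypeIndices(N));
  return Bits;
}
```

Per-module block info blocks avoid this dependency, because each module's width can be computed from that module alone.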
>>>>>> 
>>>>>> Reading a multi-module bitcode file would be supported with a
>>>>>> BitcodeReader class. Each of the functional reader APIs in
>>>>>> ReaderWriter.h would have a member function on BitcodeReader. We would
>>>>>> also have a next() member function which would move to the next module
>>>>>> in the file. For example:
>>>>>> 
>>>>>> BitcodeReader R(MBRef);
>>>>>> Expected<bool> B = R.hasGlobalValueSummary();
>>>> 
>>>> What's this used for?
>>> 
>>> 
>>> This would be the equivalent to the existing llvm::hasGlobalValueSummary()
>>> function, which currently controls whether we compile a module with regular
>>> LTO or with ThinLTO.
>>> 
>>>> Would there be a "readGlobalValueSummary()"
>>>> similar to function summaries?
>>> 
>>> 
>>> There would be a getModuleSummaryIndex() which again would be similar to
>>> llvm::getModuleSummaryIndex(). Note that the module summary already covers
>>> all global values, not just functions.
>>> 
>>>>>> // lazily load the first module
>>>>>> std::unique_ptr<Module> M1 = R.getLazyModule(Ctx);
>>>>>> R.next();
>>>>>> // eagerly load the second module
>>>>>> std::unique_ptr<Module> M2 = R.parseBitcodeFile(Ctx);
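
The cursor-style reader proposed above can be pictured with a small toy model. `ToyBitcodeReader` and its record layout are hypothetical and unrelated to the real bitcode format; the sketch only shows one way a reader could answer a cheap query about the current module, fully load it, or step to the next one with next().

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical cursor over a buffer of toy records laid out as
// "<name-size>:<name><body-size>:<body>" (an assumed format).
class ToyBitcodeReader {
  std::string Buffer;
  std::size_t Pos = 0; // start of the current module record

  // Reads "<size>:<bytes>" at Offset, returns the bytes, advances Offset.
  std::string readField(std::size_t &Offset) const {
    std::size_t Colon = Buffer.find(':', Offset);
    assert(Colon != std::string::npos && "malformed record");
    std::size_t Size = std::stoul(Buffer.substr(Offset, Colon - Offset));
    std::string Field = Buffer.substr(Colon + 1, Size);
    Offset = Colon + 1 + Size;
    return Field;
  }

public:
  explicit ToyBitcodeReader(std::string Buf) : Buffer(std::move(Buf)) {}

  // True once the cursor has moved past the last module.
  bool atEnd() const { return Pos >= Buffer.size(); }

  // Cheap query: reads only the name of the current module.
  std::string moduleName() const {
    std::size_t Offset = Pos;
    return readField(Offset);
  }

  // Full load: skips the name and reads the body of the current module.
  std::string parseModuleBody() const {
    std::size_t Offset = Pos;
    readField(Offset);
    return readField(Offset);
  }

  // Moves the cursor to the next module in the buffer.
  void next() {
    std::size_t Offset = Pos;
    readField(Offset);
    readField(Offset);
    Pos = Offset;
  }
};
```

A random-access variant would keep a table of record offsets instead of a single cursor, so a client could jump to module N without walking the earlier ones.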
>>>> 
>>>> I'm very excited about the idea of storing multiple modules in a
>>>> bitcode file, and the (thin)LTO and CFI goodness you're building using
>>>> it.
>>>> 
>>>> I have a few questions about where you're going if you don't mind--and
>>>> it's related to the API in that it's awfully hard to judge an API
>>>> without knowing what it's expected to be used for or what the
>>>> underlying data represents.
>>>> 
>>>> On that-- I'm sorry if I've missed this information, but reading
>>>> through your RFCs and posts I'm not finding the answer.
>>>> Is there a definition/explanation of what it means to have a bitcode
>>>> file containing multiple modules?
>>>> 
>>>> Is this a storage optimization where each module is what today is an
>>>> "llvm::Module" but we're encoding them into a single file for
>>>> efficiency/convenience reasons?
>>> 
>>> 
>>> Yes, each module would be an llvm::Module. This is more for convenience
>>> reasons -- it's the simplest way to split modules that use CFI into a
>>> regular LTO part and a ThinLTO part (as described in the RFC entitled "RFC:
>>> a more detailed design for ThinLTO + vcall CFI") while storing the entire
>>> compiled translation unit in a single file.
>>> 
>> 
>> Hmm, interesting.  Thank you for the explanation.
>> 
>> This seems to be closer to partitioning a single Module than
>> supporting multiple modules (at least not yet).
>> Does that seem accurate?
> 
> The use case is portioning a single module. We should have any other assumption at this level (bitcode).

I think my sentence is not well written, let me retry: “The CFI use case here is partitioning a single module in two. But at this level (bitcode), we should not bake such assumptions."

> If you want to stick multiple versions of the same module for various architectures, that’s fine. You can have your own tooling to load the right module for a given architecture.
> 
> 
>> If so maybe the API should be geared towards that--allow
>> "partition-aware" clients to read the pieces individually while
>> transparently treating the overall file as a single Module for
>> existing clients.
>> Just a thought, perhaps this wouldn't work for your use case?
> 
> While this could work for this use case, this would make it either very complex in the bitcode itself, or very inefficient for loading all as single module.
> 
>> 
>> Anyway I actually am very interested in support for multiple modules,
>> my use case being for use in shipping software in IR form as part of
>> the ALLVM project.  Hence questions about things like linker semantics
>> and such.
> 
> Right, I’m interested in this as well, and my vision in general is to try to build `basic blocks` as neutral as possible, so that it is easier to reuse them for such cases as ALLVM.
> 
> Hope this helps.
> 
> — 
> Mehdi
> 
> 
>> 
>> Don't mean to burden you with accommodating the use-cases of everyone
>> else (like myself),
>> I guess I was just surprised to see the bitcode format extended in
>> this way without an explicit discussion of the bigger picture--
>> what this was intended to be used for or why it was necessary, where
>> it was going... :).  Mostly because as you say it seems rather useful
>> for other parties (heterogeneous, for example) but I suppose we/they
>> can chime in and help refine the details later on once these bits are
>> committed :).
>> 
>> Thank you for your explanation, very much appreciated :).
>> 
>>>> If so, can these modules have different triples?
>>> 
>>> 
>>> That would certainly be possible in principle, but it's not part of my use
>>> case. I'd imagine that another potential use case for this could be to allow
>>> for LTO when targeting heterogeneous architectures (e.g. CUDA/OpenMP), but
>>> I'm not sure about the specifics of how that could work.
>>> 
>>>> 
>>>> Different ("conflicting") definitions for a global?
>>> 
>>> 
>>> In principle such inputs would be rejected by the linker with a duplicate
>>> symbol error. That might not be the appropriate thing to do in the
>>> heterogeneous case though.
>> 
>> Yeah, it seemed unclear what this would "mean" and I suppose for now
>> is simply something folks can interpret/handle however makes sense for
>> their use case :).
>> 
>>> 
>>>> There are also multiple tools that take bitcode as input, and
>>>> currently expect a single module.
>>>> Will these be made to reject multiple-module bitcode, and if not is
>>>> the plan to extend tools to handle multiple-module files?
>>> 
>>> 
>>> For testing purposes I was planning to extend llvm-dis (and possibly opt) to
>>> take a flag specifying a module index, and introduce an llvm-join tool which
>>> could be used to create a bitcode from multiple inputs.
>>> 
>> 
>> Awesome! I'm not sure how important it is but it seems that it should
>> be made an error to ignore part of a bitcode file?
>> (Shouldn't llvm-nm print vtable bits?)
>> 
>>> The other tools probably don't need to know about this and could just read
>>> the first module.
>>> 
>>>> Beyond the random access suggestion (+1) and lifetime comments, it
>>>> seems like there should be a way to reason about the contents of these
>>>> modules--names, identifiers, flags, *something* so that "load the
>>>> first module lazily and the second eagerly" can become "load the
>>>> module containing my CFI information eagerly but the rest lazily" or
>>>> something, or at least to check that this file was created using
>>>> -fsanitize=cfi and not something else.
>>> 
>>> 
>>> Right, this is the sort of functionality that would be provided by functions
>>> such as hasGlobalValueSummary().
>> 
>> Ah, neat.  I'll look into that, since apparently it answers many of my
>> questions :D.  Sorry for the trouble :).
>> 
>> Thanks again, happy LLVM'ing...
>> 
>> ~Will
>> 
>>> 
>>> Thanks,
>>> --
>>> Peter