[llvm-dev] End-to-end -fembed-bitcode .llvmbc and .llvmcmd

Mircea Trofin via llvm-dev llvm-dev at lists.llvm.org
Fri Aug 28 11:57:51 PDT 2020


On Fri, Aug 28, 2020 at 11:22 AM David Blaikie <dblaikie at gmail.com> wrote:

> You should probably pull in some folks who implemented/maintain the
> feature for Darwin.
>
> I guess they aren't linking this info, but only communicating in the
> object file between tools - maybe they flag these sections (either in the
> object, or by the linker) as ignored/dropped during linking. That semantic
> could be implemented in ELF too by marking the sections SHF_EXCLUDE
> (same-file split DWARF uses this technique).
>
> So maybe the goal/desire is to have a different semantic, rather than the
> equivalent semantic being different on ELF compared to MachO.
>
> So if it's a different semantic - yeah, I'd guess a flag that prefixes the
> module metadata with a length would make sense, then it can be linked
> naturally on any platform. (if the "don't link these sections" support on
> Darwin is done by the linker hardcoding the section name - then maybe this
> flag would also put the data in a different section that isn't linker
> stripped on Darwin, so users interested in getting everything linked
> together can do so on any platform)
>
> But if this data is linked, then it'd be hard to know which command line
> goes with which module, yes? So maybe it'd make sense then to have the
> command line as a header before the module, in the same section. So they're
> kept together.
>
This last point was my follow-up :)
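
A minimal sketch of that layout, assuming a hypothetical record format that
does not exist in LLVM today: each object file contributes two little-endian
32-bit lengths, then its command line, then its bitcode. The names
EmbeddedRecord, appendRecord and splitRecords are made up for illustration,
and the sketch assumes the linker concatenates the section contributions with
no alignment padding in between. It needs C++17 for std::optional.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical record layout (an assumption, not an existing LLVM format):
//   u32 little-endian: command line length in bytes
//   u32 little-endian: bitcode length in bytes
//   command line bytes, followed by bitcode bytes
struct EmbeddedRecord {
  std::string CmdLine;
  std::string Bitcode;
};

static void appendU32(std::string &Out, uint32_t V) {
  for (int I = 0; I < 4; ++I)
    Out.push_back(static_cast<char>((V >> (8 * I)) & 0xFF));
}

static uint32_t readU32(const std::string &In, size_t Pos) {
  uint32_t V = 0;
  for (int I = 0; I < 4; ++I)
    V |= static_cast<uint32_t>(static_cast<unsigned char>(In[Pos + I]))
         << (8 * I);
  return V;
}

// What the compiler would emit into the section of a single object file.
void appendRecord(std::string &Section, const EmbeddedRecord &R) {
  appendU32(Section, static_cast<uint32_t>(R.CmdLine.size()));
  appendU32(Section, static_cast<uint32_t>(R.Bitcode.size()));
  Section += R.CmdLine;
  Section += R.Bitcode;
}

// What a specialized reader would do with the linked (concatenated) section.
// Returns std::nullopt if a header or a payload is truncated.
std::optional<std::vector<EmbeddedRecord>>
splitRecords(const std::string &Section) {
  std::vector<EmbeddedRecord> Records;
  size_t Pos = 0;
  while (Pos < Section.size()) {
    if (Section.size() - Pos < 8)
      return std::nullopt; // not enough bytes for the two length fields
    size_t CmdLen = readU32(Section, Pos);
    size_t BcLen = readU32(Section, Pos + 4);
    Pos += 8;
    size_t Remaining = Section.size() - Pos;
    if (CmdLen > Remaining || BcLen > Remaining - CmdLen)
      return std::nullopt; // payload runs past the end of the section
    Records.push_back(
        {Section.substr(Pos, CmdLen), Section.substr(Pos + CmdLen, BcLen)});
    Pos += CmdLen + BcLen;
  }
  return Records;
}
```

Because each command line travels in the same record as its module, the
pairing survives linking regardless of the order in which the linker lays out
the inputs. If the linker did insert padding between contributions, the
reader would need an extra rule (for example, skipping zero bytes between
records), which this sketch does not attempt.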


>
> On Thu, Aug 27, 2020 at 10:26 PM Mircea Trofin via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Thanks, Sean, Steven,
>>
>> to explore this a bit further, are there currently users for non-Darwin
>> cases? I wonder if it would be an issue if we inserted markers in the
>> section (maybe as an opt-in, if there were users), such that, when
>> concatenated, the resulting section would be self-describing, for a
>> specialized reader, of course - basically, achieve what Sean described, but
>> "by design".
>>
>> For instance, each .o file could have a size, followed by the payload
>> (maybe include in the payload the name of the module, too; maybe compress
>> it, too). Same for the .llvmcmd case.
>>
>> On Thu, Aug 27, 2020 at 6:57 PM Sean Bartell <smbarte2 at illinois.edu>
>> wrote:
>>
>>> Hi Mircea,
>>>
>>> If you use an ordinary linker that concatenates .llvmbc sections, you
>>> can use this code to get the size of each bitcode module. As far as I know,
>>> there's no clean way to separate the .llvmcmd sections without making
>>> assumptions about what options were used.
>>>
>>> // Given a bitcode file followed by garbage, get the size of the actual
>>> // bitcode. This only works correctly with some kinds of garbage (in
>>> // particular, it will work if the bitcode file is followed by zeros, or if
>>> // it's followed by another bitcode file).
>>> size_t GetBitcodeSize(MemoryBufferRef Buffer) {
>>>   const unsigned char *BufPtr =
>>>       reinterpret_cast<const unsigned char *>(Buffer.getBufferStart());
>>>   const unsigned char *EndBufPtr =
>>>       reinterpret_cast<const unsigned char *>(Buffer.getBufferEnd());
>>>   if (isBitcodeWrapper(BufPtr, EndBufPtr)) {
>>>     const unsigned char *FixedBufPtr = BufPtr;
>>>     if (SkipBitcodeWrapperHeader(FixedBufPtr, EndBufPtr, true))
>>>       report_fatal_error("Invalid bitcode wrapper");
>>>     return EndBufPtr - BufPtr;
>>>   }
>>>
>>>   if (!isRawBitcode(BufPtr, EndBufPtr))
>>>     report_fatal_error("Invalid magic bytes; not a bitcode file?");
>>>
>>>   BitstreamCursor Reader(Buffer);
>>>   Reader.Read(32); // skip signature
>>>   while (true) {
>>>     size_t EntryStart = Reader.getCurrentByteNo();
>>>     BitstreamEntry Entry =
>>>         Reader.advance(BitstreamCursor::AF_DontAutoprocessAbbrevs);
>>>     if (Entry.Kind == BitstreamEntry::SubBlock) {
>>>       if (Reader.SkipBlock())
>>>         report_fatal_error("Invalid bitcode file");
>>>     } else {
>>>       // We must have reached the end of the module.
>>>       return EntryStart;
>>>     }
>>>   }
>>> }
>>>
>>> Sean
>>>
>>> On Thu, Aug 27, 2020, at 13:17, Steven Wu via llvm-dev wrote:
>>>
>>> Hi Mircea
>>>
>>> From the RFC you mentioned, that is a Darwin specific implementation,
>>> which later got extended to support other targets. The main use case for
>>> the embed bitcode option is to allow the compiler to pass intermediate IR
>>> and command flags in the object file it produces, for later use. For Darwin, it
>>> is used for bitcode recompilation, and some might use it to achieve other
>>> goals.
>>>
>>> In order to use this information properly, you need to have tools that
>>> understand the layout and sections for embedded bitcode. You can't just use
>>> an ordinary linker, because like you said, an ELF linker will just append
>>> the bitcode. Depending on what you are trying to achieve, you need to
>>> implement the downstream tools, like linker, binary analysis tools, etc. to
>>> understand this concept.
>>>
>>> Steven
>>>
>>> On Aug 24, 2020, at 7:10 PM, Mircea Trofin via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>>
>>> Hello,
>>>
>>> I'm trying to understand how .llvmbc and .llvmcmd fit into an end-to-end
>>> story. From the RFC
>>> <http://lists.llvm.org/pipermail/llvm-dev/2016-February/094851.html>,
>>> and reading through the implementation, I'm piecing together that the goal
>>> was to enable capturing IR right after clang and before passing it to
>>> LLVM's optimization passes, as well as the command line options needed for
>>> later compiling that IR to the same native object it was compiled to
>>> originally (with the same compiler).
>>>
>>> Here's what I don't understand: say you have a.o and b.o compiled with
>>> -fembed-bitcode=all. They are linked into a binary called my_binary. How do
>>> you re-create the corresponding IR for modules a and b (let's call them
>>> a.bc and b.bc), and their corresponding command lines? From what I can
>>> tell, the linker just concatenates the IR for a and b in my_binary's
>>> .llvmbc, and the same for the command line in .llvmcmd. Is there a
>>> separator I may have missed? For .llvmcmd, I could see how *maybe* -cc1 could
>>> be that separator, but what about the .llvmbc part? The magic number?
>>>
>>> Thanks!
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev