[llvm-dev] End-to-end -fembed-bitcode .llvmbc and .llvmcmd

Fri Aug 28 14:16:17 PDT 2020

On 2020-08-28, Mircea Trofin via llvm-dev wrote:
>On Fri, Aug 28, 2020 at 11:22 AM David Blaikie <dblaikie at gmail.com> wrote:
>
>> You should probably pull in some folks who implemented/maintain the
>> feature for Darwin.
>>
>> I guess they aren't linking this info, but only communicating in the
>> object file between tools - maybe they flag these sections (either in the
>> object, or by the linker) as ignored/dropped during linking. That semantic
>> could be implemented in ELF too by marking the sections SHF_EXCLUDE
>> (same-file split DWARF uses this technique).

The .llvmbc and .llvmcmd sections do not have the SHF_EXCLUDE flag, so
they are retained in the linked image.

>> So maybe the goal/desire is to have a different semantic, rather than the
>> equivalent semantic being different on ELF compared to MachO.
>>
>> So if it's a different semantic - yeah, I'd guess a flag that prefixes the
>> module metadata with a length would make sense, then it can be linked
>> naturally on any platform. (if the "don't link these sections" support on
>> Darwin is done by the linker hardcoding the section name - then maybe this
>> flag would also put the data in a different section that isn't linker
>> stripped on Darwin, so users interested in getting everything linked
>> together can do so on any platform)
>>
>> But if this data is linked, then it'd be hard to know which command line
>> goes with which module, yes? So maybe it'd make sense then to have the
>> command line as a header before the module, in the same section. So they're
>> kept together.
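Here is a minimal sketch of that idea. It assumes a hypothetical layout that clang does not emit today: each record is a 32-bit little-endian length followed by the command line, then a 32-bit little-endian length followed by the module. Because every record carries its own sizes, a plain linker concatenation can be decoded record by record, and each command line stays paired with its module. EncodeRecord, DecodeRecord and Record are illustrative names, not LLVM APIs.

```cpp
// Hypothetical record layout (not an existing LLVM/clang format):
//   u32 little-endian length of the command line, then its bytes,
//   u32 little-endian length of the bitcode module, then its bytes.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <optional>
#include <string>

// Appends V to Out as four little-endian bytes.
void AppendU32LE(std::string &Out, uint32_t V) {
  for (int I = 0; I < 4; ++I)
    Out.push_back(static_cast<char>((V >> (8 * I)) & 0xFF));
}

// Reads a little-endian u32 at Pos and advances Pos; nullopt if truncated.
std::optional<uint32_t> ReadU32LE(const std::string &In, size_t &Pos) {
  if (In.size() - Pos < 4)
    return std::nullopt;
  uint32_t V = 0;
  for (int I = 0; I < 4; ++I)
    V |= uint32_t(static_cast<unsigned char>(In[Pos + I])) << (8 * I);
  Pos += 4;
  return V;
}

// Builds one record: the command line as a header, then the module.
std::string EncodeRecord(const std::string &CmdLine, const std::string &Module) {
  assert(CmdLine.size() <= UINT32_MAX && Module.size() <= UINT32_MAX);
  std::string Out;
  AppendU32LE(Out, static_cast<uint32_t>(CmdLine.size()));
  Out += CmdLine;
  AppendU32LE(Out, static_cast<uint32_t>(Module.size()));
  Out += Module;
  return Out;
}

struct Record {
  std::string CmdLine;
  std::string Module;
};

// Decodes the record at Pos and advances Pos past it, so calling this in a
// loop walks a linker-concatenated section. Pos is left unchanged on failure.
std::optional<Record> DecodeRecord(const std::string &In, size_t &Pos) {
  const size_t Start = Pos;
  Record R;
  for (std::string *Field : {&R.CmdLine, &R.Module}) {
    std::optional<uint32_t> Len = ReadU32LE(In, Pos);
    if (!Len || In.size() - Pos < *Len) {
      Pos = Start;
      return std::nullopt;
    }
    Field->assign(In, Pos, *Len);
    Pos += *Len;
  }
  return R;
}
```

Calling DecodeRecord in a loop from Pos = 0 over EncodeRecord("-O2", A) + EncodeRecord("-O0 -g", B) yields the two (command line, module) pairs in link order.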
>>
>This last point was my follow-up :)

A module has a source_filename field.

clang -fembed-bitcode=all -c d/a.c
llvm-objcopy --dump-section=.llvmbc=a.bc a.o /dev/null
llvm-dis < a.bc    # output includes: source_filename = "d/a.c"

The missing piece is a mechanism to extract a single module from concatenated
bitcode (llvm-dis supports multi-module bitcode, but not concatenated
bitcode; see https://reviews.llvm.org/D70153). I'll be happy to look into it :)
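One possible shape for such a mechanism, sketched under stated assumptions: walk the concatenated section and ask a per-module size function how long the module at the current offset is. Sean's GetBitcodeSize, quoted later in this thread, could serve once it is wrapped to take a pointer and length. The walk skips zero bytes between modules on the assumption that any linker alignment padding is zero-filled. Neither the raw magic ('B') nor the wrapper magic (first byte 0xDE) starts with zero, so this skip cannot swallow the start of a module. SplitConcatenatedBitcode, ModuleSpan and ModuleSizeFn are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// (offset, size) of one module inside the concatenated section.
using ModuleSpan = std::pair<size_t, size_t>;

// Returns the size of the module starting at P, given the number of bytes
// remaining from P to the end of the section.
using ModuleSizeFn = std::function<size_t(const uint8_t *P, size_t Remaining)>;

std::vector<ModuleSpan> SplitConcatenatedBitcode(const uint8_t *Data,
                                                 size_t Size,
                                                 const ModuleSizeFn &ModuleSize) {
  assert(Data != nullptr || Size == 0);
  std::vector<ModuleSpan> Spans;
  size_t Pos = 0;
  while (true) {
    // Skip zero bytes between modules (assumed alignment padding).
    while (Pos < Size && Data[Pos] == 0)
      ++Pos;
    if (Pos == Size)
      break;
    size_t Len = ModuleSize(Data + Pos, Size - Pos);
    // Stop rather than loop forever or overrun if the callback misbehaves.
    if (Len == 0 || Len > Size - Pos)
      break;
    Spans.emplace_back(Pos, Len);
    Pos += Len;
  }
  return Spans;
}
```

Each returned span can then be dumped to its own .bc file and run through llvm-dis to find the module whose source_filename matches.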

---

.llvmcmd may also need to record the source file to be more useful.

>
>>
>> On Thu, Aug 27, 2020 at 10:26 PM Mircea Trofin via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>>> Thanks, Sean, Steven,
>>>
>>> To explore this a bit further: are there currently users for non-Darwin
>>> cases? I wonder if it would be an issue if we inserted markers in the
>>> section (maybe as an opt-in, if there are users), such that, when
>>> concatenated, the resulting section would be self-describing (for a
>>> specialized reader, of course). Basically, achieve what Sean described, but
>>> "by design".
>>>
>>> For instance, each object file's section could contain a size followed by
>>> the payload (the payload could also include the name of the module, and
>>> could be compressed). Same for the .llvmcmd case.
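Here is a minimal sketch of such an opt-in, self-describing layout. Everything in it is hypothetical: the "EBC1" marker, the 64-bit little-endian size, and storing the module name NUL-terminated at the start of the payload are illustrative choices, not an existing format. Compression is left out. The marker lets a reader reject garbage instead of misreading it.

```cpp
// Hypothetical record: "EBC1" marker, u64 little-endian payload size,
// payload = module name, a NUL byte, then the bitcode bytes.
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

struct EmbeddedModule {
  std::string Name;
  std::string Bitcode;
};

constexpr std::string_view kMarker = "EBC1";

// Produces one record; the name itself must not contain a NUL byte.
std::string WriteRecord(std::string_view Name, std::string_view Bitcode) {
  assert(Name.find('\0') == std::string_view::npos);
  std::string Out(kMarker);
  uint64_t Size = Name.size() + 1 + Bitcode.size();
  for (int I = 0; I < 8; ++I)
    Out.push_back(static_cast<char>((Size >> (8 * I)) & 0xFF));
  Out.append(Name);
  Out.push_back('\0');
  Out.append(Bitcode);
  return Out;
}

// Splits a section made by concatenating WriteRecord outputs.
// Returns nullopt if the section is malformed or truncated.
std::optional<std::vector<EmbeddedModule>> ReadSection(std::string_view S) {
  std::vector<EmbeddedModule> Modules;
  while (!S.empty()) {
    if (S.size() < kMarker.size() + 8 || S.substr(0, kMarker.size()) != kMarker)
      return std::nullopt;
    S.remove_prefix(kMarker.size());
    uint64_t Size = 0;
    for (int I = 0; I < 8; ++I)
      Size |= uint64_t(static_cast<unsigned char>(S[I])) << (8 * I);
    S.remove_prefix(8);
    if (Size > S.size())
      return std::nullopt;
    std::string_view Payload = S.substr(0, Size);
    S.remove_prefix(Size);
    size_t Nul = Payload.find('\0'); // end of the module name
    if (Nul == std::string_view::npos)
      return std::nullopt;
    Modules.push_back({std::string(Payload.substr(0, Nul)),
                       std::string(Payload.substr(Nul + 1))});
  }
  return Modules;
}
```

Concatenating the WriteRecord output of each object (which is all an ELF linker does with the section) produces something ReadSection can split back into (name, bitcode) pairs.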
>>>
>>> On Thu, Aug 27, 2020 at 6:57 PM Sean Bartell <smbarte2 at illinois.edu> wrote:
>>>
>>>> Hi Mircea,
>>>>
>>>> If you use an ordinary linker that concatenates .llvmbc sections, you
>>>> can use this code to get the size of each bitcode module. As far as I know,
>>>> there's no clean way to separate the .llvmcmd sections without making
>>>> assumptions about what options were used.
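The sketch below illustrates the .llvmcmd problem. It assumes each argument in the section is terminated by a NUL byte, which is my reading of how clang writes it, so treat that as an assumption. Under that assumption a concatenated section splits cleanly into arguments, but the result is one flat list. Nothing marks where one object's arguments end and the next object's begin. SplitLlvmCmd is a hypothetical helper.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

// Splits NUL-terminated (or NUL-separated) arguments into a flat list.
// Consecutive NULs yield empty arguments.
std::vector<std::string> SplitLlvmCmd(std::string_view Section) {
  std::vector<std::string> Args;
  size_t Start = 0;
  while (Start < Section.size()) {
    size_t End = Section.find('\0', Start);
    if (End == std::string_view::npos)
      End = Section.size(); // tolerate a missing final terminator
    Args.emplace_back(Section.substr(Start, End - Start));
    Start = End + 1;
  }
  return Args;
}
```

For example, "-O2\0-g\0-O0\0" splits into three arguments. Nothing in the data tells you whether -O0 came from the same compilation as -O2.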
>>>>
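>>>> // Given a bitcode file followed by garbage, get the size of the actual
>>>> // bitcode. This only works correctly with some kinds of garbage (in
>>>> // particular, it will work if the bitcode file is followed by zeros, or if
>>>> // it's followed by another bitcode file).
>>>> //
>>>> // Uses the Expected/Error-returning BitstreamCursor API (LLVM 9 and later).
>>>> #include "llvm/Bitcode/BitcodeReader.h"
>>>> #include "llvm/Bitstream/BitstreamReader.h"
>>>> #include "llvm/Support/Error.h"
>>>> #include "llvm/Support/ErrorHandling.h"
>>>> #include "llvm/Support/MemoryBuffer.h"
>>>>
>>>> using namespace llvm;
>>>>
>>>> size_t GetBitcodeSize(MemoryBufferRef Buffer) {
>>>>   const unsigned char *BufPtr =
>>>>       reinterpret_cast<const unsigned char *>(Buffer.getBufferStart());
>>>>   const unsigned char *EndBufPtr =
>>>>       reinterpret_cast<const unsigned char *>(Buffer.getBufferEnd());
>>>>   if (isBitcodeWrapper(BufPtr, EndBufPtr)) {
>>>>     // SkipBitcodeWrapperHeader moves EndBufPtr to the end of the wrapped
>>>>     // payload, so the result covers the header plus the payload.
>>>>     const unsigned char *FixedBufPtr = BufPtr;
>>>>     if (SkipBitcodeWrapperHeader(FixedBufPtr, EndBufPtr, true))
>>>>       report_fatal_error("Invalid bitcode wrapper");
>>>>     return EndBufPtr - BufPtr;
>>>>   }
>>>>
>>>>   if (!isRawBitcode(BufPtr, EndBufPtr))
>>>>     report_fatal_error("Invalid magic bytes; not a bitcode file?");
>>>>
>>>>   BitstreamCursor Reader(Buffer);
>>>>   // Skip the 32-bit signature ('B', 'C', 0xC0DE).
>>>>   if (Error Err = Reader.Read(32).takeError())
>>>>     report_fatal_error(std::move(Err));
>>>>   while (true) {
>>>>     size_t EntryStart = Reader.getCurrentByteNo();
>>>>     Expected<BitstreamEntry> MaybeEntry =
>>>>         Reader.advance(BitstreamCursor::AF_DontAutoprocessAbbrevs);
>>>>     if (!MaybeEntry) {
>>>>       // A read error after the last top-level block is treated like any
>>>>       // other non-block entry: the module ended at EntryStart.
>>>>       consumeError(MaybeEntry.takeError());
>>>>       return EntryStart;
>>>>     }
>>>>     if (MaybeEntry->Kind == BitstreamEntry::SubBlock) {
>>>>       if (Error Err = Reader.SkipBlock())
>>>>         report_fatal_error(std::move(Err));
>>>>     } else {
>>>>       // We must have reached the end of the module.
>>>>       return EntryStart;
>>>>     }
>>>>   }
>>>> }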
>>>>
>>>> Sean
>>>>
>>>> On Thu, Aug 27, 2020, at 13:17, Steven Wu via llvm-dev wrote:
>>>>
>>>> Hi Mircea
>>>>
>>>> The RFC you mentioned describes a Darwin-specific implementation, which
>>>> was later extended to support other targets. The main use case for the
>>>> embed-bitcode option is to let the compiler pass the intermediate IR and
>>>> the command-line flags, in the object file it produces, for later use. On
>>>> Darwin it is used for bitcode recompilation; others may use it to achieve
>>>> other goals.
>>>>
>>>> To use this information properly, you need tools that understand the
>>>> layout and sections used for embedded bitcode. You can't just use an
>>>> ordinary linker because, as you said, an ELF linker will just concatenate
>>>> the bitcode. Depending on what you are trying to achieve, you need to
>>>> implement downstream tools (linker, binary analysis tools, etc.) that
>>>> understand this concept.
>>>>
>>>> Steven
>>>>
>>>> On Aug 24, 2020, at 7:10 PM, Mircea Trofin via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>>
>>>> Hello,
>>>>
>>>> I'm trying to understand how .llvmbc and .llvmcmd fit into an end-to-end
>>>> story. From the RFC
>>>> <http://lists.llvm.org/pipermail/llvm-dev/2016-February/094851.html>,
>>>> and reading through the implementation, I'm piecing together that the goal
>>>> was to enable capturing IR right after clang and before passing it to
>>>> LLVM's optimization passes, as well as the command line options needed for
>>>> later compiling that IR to the same native object it was compiled to
>>>> originally (with the same compiler).
>>>>
>>>> Here's what I don't understand: say you have a.o and b.o compiled with
>>>> -fembed-bitcode=all. They are linked into a binary called my_binary. How do
>>>> you re-create the corresponding IR for modules a and b (let's call them
>>>> a.bc and b.bc), and their corresponding command lines? From what I can
>>>> tell, the linker just concatenates the IR for a and b in my_binary's
>>>> .llvmbc, and the same for the command line in .llvmcmd. Is there a
>>>> separator I missed? For .llvmcmd, I could see how *maybe* -cc1 could
>>>> be that separator, but what about the .llvmbc part? The magic number?
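The heuristic scan below shows why the magic number alone is a weak separator. It assumes raw (unwrapped) bitcode, and it assumes every module is a whole number of 32-bit words, so concatenated modules start at 4-byte-aligned offsets from the section start. The Darwin bitcode wrapper, whose magic is 0x0B17C0DE, is not handled. FindBitcodeMagicOffsets is a hypothetical helper. The bytes 'B' 'C' 0xC0 0xDE can also occur inside a module (for example in a blob), so a match is only a candidate boundary. Walking the top-level blocks, as in Sean's code earlier in this thread, is more reliable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Returns 4-byte-aligned offsets at which the raw bitcode magic occurs.
// Each offset is only a candidate module start, not a proven one.
std::vector<size_t> FindBitcodeMagicOffsets(const uint8_t *Data, size_t Size) {
  static const uint8_t Magic[4] = {'B', 'C', 0xC0, 0xDE};
  assert(Data != nullptr || Size == 0);
  std::vector<size_t> Offsets;
  for (size_t Pos = 0; Pos + 4 <= Size; Pos += 4)
    if (std::memcmp(Data + Pos, Magic, sizeof(Magic)) == 0)
      Offsets.push_back(Pos);
  return Offsets;
}
```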
>>>>
>>>> Thanks!
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> llvm-dev at lists.llvm.org
>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>
