[llvm-dev] End-to-end -fembed-bitcode .llvmbc and .llvmcmd

Thu Aug 27 22:25:57 PDT 2020

Thanks, Sean, Steven,

To explore this a bit further: are there currently users for non-Darwin
cases? I wonder if it would be an issue to insert markers in the section
(maybe as an opt-in, if there were users) so that, when concatenated, the
resulting section would be self-describing (for a specialized reader, of
course). Basically, achieve what Sean described, but "by design".

For instance, each .o file could contribute a size followed by the payload
(maybe also including the module name in the payload, and maybe compressing
it). Same for the .llvmcmd case.
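A minimal sketch of the size-prefixed layout proposed above, in C++ to match the code elsewhere in this thread. The record format (a 4-byte little-endian name length, the name, an 8-byte little-endian payload size, then the payload) and the names EmbeddedModule, encodeRecord and decodeSection are all hypothetical; this is not the current .llvmbc layout, and compression is left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-object record for a self-describing .llvmbc/.llvmcmd:
//   u32 name length (LE) | name bytes | u64 payload size (LE) | payload
struct EmbeddedModule {
  std::string Name;    // e.g. the module or object name
  std::string Payload; // bitcode bytes (or command-line bytes)
};

// Append V as a little-endian integer of the given width.
static void putLE(std::string &Out, uint64_t V, int Bytes) {
  for (int I = 0; I < Bytes; ++I)
    Out.push_back(static_cast<char>((V >> (8 * I)) & 0xFF));
}

// Read a little-endian integer at Pos and advance Pos past it.
static uint64_t getLE(const std::string &In, size_t &Pos, int Bytes) {
  if (In.size() - Pos < static_cast<size_t>(Bytes))
    throw std::runtime_error("truncated record header");
  uint64_t V = 0;
  for (int I = 0; I < Bytes; ++I)
    V |= static_cast<uint64_t>(static_cast<unsigned char>(In[Pos + I]))
         << (8 * I);
  Pos += Bytes;
  return V;
}

// What each compiler invocation would put in its own object's section.
std::string encodeRecord(const EmbeddedModule &M) {
  std::string Out;
  putLE(Out, M.Name.size(), 4);
  Out += M.Name;
  putLE(Out, M.Payload.size(), 8);
  Out += M.Payload;
  return Out;
}

// What a specialized reader would do with the linker's concatenation.
std::vector<EmbeddedModule> decodeSection(const std::string &Section) {
  std::vector<EmbeddedModule> Result;
  size_t Pos = 0;
  while (Pos < Section.size()) {
    EmbeddedModule M;
    uint64_t NameLen = getLE(Section, Pos, 4);
    if (Section.size() - Pos < NameLen)
      throw std::runtime_error("truncated module name");
    M.Name = Section.substr(Pos, NameLen);
    Pos += NameLen;
    uint64_t Size = getLE(Section, Pos, 8);
    if (Section.size() - Pos < Size)
      throw std::runtime_error("truncated payload");
    M.Payload = Section.substr(Pos, Size);
    Pos += Size;
    Result.push_back(std::move(M));
  }
  return Result;
}
```

Concatenating encodeRecord({"a.o", ...}) and encodeRecord({"b.o", ...}) and passing the result to decodeSection gives back both records in order. One design caveat: if the linker pads input sections for alignment, the reader would also have to skip that padding, so a real format would likely need a fixed record alignment or a per-record magic number.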

On Thu, Aug 27, 2020 at 6:57 PM Sean Bartell <smbarte2 at illinois.edu> wrote:

> Hi Mircea,
>
> If you use an ordinary linker that concatenates .llvmbc sections, you can
> use this code to get the size of each bitcode module. As far as I know,
> there's no clean way to separate the .llvmcmd sections without making
> assumptions about what options were used.
>
> // Given a bitcode file followed by garbage, get the size of the actual
> // bitcode. This only works correctly with some kinds of garbage (in
> // particular, it will work if the bitcode file is followed by zeros, or if
> // it's followed by another bitcode file).
> //
> // Assumes using namespace llvm and the headers llvm/Bitcode/BitcodeReader.h,
> // llvm/Bitstream/BitstreamReader.h and llvm/Support/Error.h. The bitstream
> // calls below return Expected<>/Error, which must be checked.
> size_t GetBitcodeSize(MemoryBufferRef Buffer) {
>   const unsigned char *BufPtr =
>       reinterpret_cast<const unsigned char *>(Buffer.getBufferStart());
>   const unsigned char *EndBufPtr =
>       reinterpret_cast<const unsigned char *>(Buffer.getBufferEnd());
>   if (isBitcodeWrapper(BufPtr, EndBufPtr)) {
>     // SkipBitcodeWrapperHeader moves EndBufPtr to the end of the wrapped
>     // bitcode, so this is the wrapper header plus the bitcode it wraps.
>     const unsigned char *FixedBufPtr = BufPtr;
>     if (SkipBitcodeWrapperHeader(FixedBufPtr, EndBufPtr, true))
>       report_fatal_error("Invalid bitcode wrapper");
>     return EndBufPtr - BufPtr;
>   }
>
>   if (!isRawBitcode(BufPtr, EndBufPtr))
>     report_fatal_error("Invalid magic bytes; not a bitcode file?");
>
>   BitstreamCursor Reader(Buffer);
>   if (Error Err = Reader.Read(32).takeError()) // skip signature
>     report_fatal_error(std::move(Err));
>   while (true) {
>     size_t EntryStart = Reader.getCurrentByteNo();
>     Expected<BitstreamEntry> MaybeEntry =
>         Reader.advance(BitstreamCursor::AF_DontAutoprocessAbbrevs);
>     if (!MaybeEntry) {
>       // Treat an unreadable entry like trailing garbage.
>       consumeError(MaybeEntry.takeError());
>       return EntryStart;
>     }
>     if (MaybeEntry->Kind != BitstreamEntry::SubBlock)
>       return EntryStart; // We must have reached the end of the module.
>     if (Error Err = Reader.SkipBlock())
>       report_fatal_error(std::move(Err));
>   }
> }
>
> Sean
>
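A dependency-free sketch of the same top-level walk that GetBitcodeSize performs, plus a loop that applies it repeatedly to split a concatenated .llvmbc section, for experimenting without linking against LLVM. It hand-decodes only the parts of the bitstream format the walk needs: fields read LSB-first, a 2-bit abbreviation ID at the top level, and ENTER_SUBBLOCK (1) followed by a VBR8 block ID, a VBR4 abbreviation width, 32-bit alignment and a 32-bit length in words; it also reads the 20-byte wrapper header. BitReader, bitcodeSize and splitModules are names made up for this sketch, and treating zero bytes between modules as padding is an assumption about what the linker inserts. Like the LLVM-based version, it only gives the right answer when a module is followed by zeros, another module, or the end of the section.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Reads fixed-width and VBR fields LSB-first, as LLVM's bitstream does.
class BitReader {
public:
  BitReader(const uint8_t *D, size_t N) : Data(D), Size(N) {}
  bool canRead(size_t NumBits) const { return BitPos + NumBits <= Size * 8; }
  size_t byteNo() const { return BitPos / 8; }

  uint64_t readFixed(unsigned NumBits) {
    if (!canRead(NumBits))
      throw std::out_of_range("read past end of bitstream");
    uint64_t V = 0;
    for (unsigned I = 0; I < NumBits; ++I, ++BitPos)
      V |= uint64_t((Data[BitPos / 8] >> (BitPos % 8)) & 1) << I;
    return V;
  }

  // VBR: the top bit of each chunk says whether another chunk follows.
  uint64_t readVBR(unsigned ChunkBits) {
    const uint64_t Cont = uint64_t(1) << (ChunkBits - 1);
    uint64_t V = 0;
    for (unsigned Shift = 0; Shift < 64; Shift += ChunkBits - 1) {
      uint64_t Chunk = readFixed(ChunkBits);
      V |= (Chunk & (Cont - 1)) << Shift;
      if (!(Chunk & Cont))
        return V;
    }
    throw std::runtime_error("VBR value too large");
  }

  void alignTo32() { BitPos = (BitPos + 31) / 32 * 32; }

  void skipBytes(uint64_t N) {
    if (BitPos > Size * 8 || N > (Size * 8 - BitPos) / 8)
      throw std::out_of_range("block extends past end of buffer");
    BitPos += N * 8;
  }

private:
  const uint8_t *Data;
  size_t Size;
  size_t BitPos = 0;
};

static uint32_t readLE32(const uint8_t *P) {
  return uint32_t(P[0]) | uint32_t(P[1]) << 8 | uint32_t(P[2]) << 16 |
         uint32_t(P[3]) << 24;
}

// Hypothetical helper: size of the wrapped or raw module at Data.
size_t bitcodeSize(const uint8_t *Data, size_t Size) {
  // Wrapper header: five LE u32s (magic 0x0B17C0DE, version, offset, size,
  // CPU type); the module ends at offset + size.
  if (Size >= 20 && readLE32(Data) == 0x0B17C0DEu) {
    uint64_t End = uint64_t(readLE32(Data + 8)) + readLE32(Data + 12);
    if (End < 20 || End > Size)
      throw std::runtime_error("invalid bitcode wrapper");
    return size_t(End);
  }
  if (Size < 4 || Data[0] != 'B' || Data[1] != 'C' || Data[2] != 0xC0 ||
      Data[3] != 0xDE)
    throw std::runtime_error("invalid magic bytes; not a bitcode file?");

  BitReader R(Data, Size);
  R.readFixed(32); // skip the signature
  const unsigned TopLevelAbbrevWidth = 2;
  const uint64_t EnterSubblock = 1;
  while (true) {
    size_t EntryStart = R.byteNo();
    // Anything other than ENTER_SUBBLOCK at the top level ends the module.
    if (!R.canRead(TopLevelAbbrevWidth) ||
        R.readFixed(TopLevelAbbrevWidth) != EnterSubblock)
      return EntryStart;
    R.readVBR(8); // block ID (not needed here)
    R.readVBR(4); // abbreviation width inside the block (not needed here)
    R.alignTo32();
    uint64_t NumWords = R.readFixed(32);
    R.skipBytes(NumWords * 4);
  }
}

// Hypothetical helper: split a concatenated .llvmbc into (offset, length).
// Assumption: zero bytes between modules are alignment padding.
std::vector<std::pair<size_t, size_t>> splitModules(const uint8_t *Data,
                                                    size_t Size) {
  std::vector<std::pair<size_t, size_t>> Ranges;
  size_t Pos = 0;
  while (true) {
    while (Pos < Size && Data[Pos] == 0)
      ++Pos;
    if (Pos == Size)
      return Ranges;
    size_t Len = bitcodeSize(Data + Pos, Size - Pos);
    Ranges.emplace_back(Pos, Len);
    Pos += Len;
  }
}
```

Checking only the low two bits of the next entry is enough for the two supported kinds of trailing data: a following raw module starts with 'B' (0x42), whose low bits decode as DEFINE_ABBREV (2), and zero padding decodes as END_BLOCK (0); neither is ENTER_SUBBLOCK.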
> On Thu, Aug 27, 2020, at 13:17, Steven Wu via llvm-dev wrote:
>
> Hi Mircea
>
> The RFC you mentioned describes a Darwin-specific implementation, which
> was later extended to support other targets. The main use case for the
> embed-bitcode option is to let the compiler carry intermediate IR and
> command-line flags in the object file it produces, for later use. On
> Darwin, it is used for bitcode recompilation; others might use it to
> achieve other goals.
>
> To use this information properly, you need tools that understand the
> layout and sections for embedded bitcode. You can't just use an ordinary
> linker because, as you said, an ELF linker will just append the bitcode.
> Depending on what you are trying to achieve, you need downstream tools
> (linker, binary analysis tools, etc.) that understand this concept.
>
> Steven
>
> On Aug 24, 2020, at 7:10 PM, Mircea Trofin via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> Hello,
>
> I'm trying to understand how .llvmbc and .llvmcmd fit into an end-to-end
> story. From the RFC
> <http://lists.llvm.org/pipermail/llvm-dev/2016-February/094851.html>, and
> reading through the implementation, I'm piecing together that the goal was
> to enable capturing IR right after clang and before passing it to
> LLVM's optimization passes, as well as the command line options needed for
> later compiling that IR to the same native object it was compiled to
> originally (with the same compiler).
>
> Here's what I don't understand: say you have a.o and b.o compiled with
> -fembed-bitcode=all. They are linked into a binary called my_binary. How do
> you re-create the corresponding IR for modules a and b (let's call them
> a.bc and b.bc), and their corresponding command lines? From what I can
> tell, the linker just concatenates the IR for a and b in my_binary's
> .llvmbc, and the same for the command line in .llvmcmd. Is there a
> separator I missed? For .llvmcmd, I could see how *maybe* -cc1 could be
> that separator, but what about the .llvmbc part? The magic number?
>
> Thanks!
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev