<!DOCTYPE html><html><head><title></title><style type="text/css">p.MsoNormal,p.MsoNoSpacing{margin:0}</style></head><body><div>Hi Mircea,<br></div><div><br></div><div>If you use an ordinary linker that concatenates .llvmbc sections, you can use this code to get the size of each bitcode module. As far as I know, there's no clean way to separate the .llvmcmd sections without making assumptions about what options were used.<br></div><div><br></div><div><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;">// Given a bitcode file followed by garbage, get the size of the actual</span><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"><br></span></div><div><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;">// bitcode. This only works correctly with some kinds of garbage (in</span><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"><br></span></div><div><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;">// particular, it will work if the bitcode file is followed by zeros, or if</span><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"><br></span></div><div><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;">// it's followed by another bitcode file).</span><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"><br></span></div><div><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;">size_t GetBitcodeSize(MemoryBufferRef Buffer) {</span><br></div><div><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"> const unsigned char *BufPtr =</span><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"><br></span></div><div><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"> reinterpret_cast&lt;const unsigned char *&gt;(Buffer.getBufferStart());</span><span class="font" 
style="font-family:menlo, consolas, monospace, sans-serif;"><br></span></div><div><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"> const unsigned char *EndBufPtr =</span><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"><br></span></div><div><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"> reinterpret_cast&lt;const unsigned char *&gt;(Buffer.getBufferEnd());</span><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"><br></span></div><div><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"> if (isBitcodeWrapper(BufPtr, EndBufPtr)) {</span><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"><br></span></div><div><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"> const unsigned char *FixedBufPtr = BufPtr;</span><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"><br></span></div><div><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"> if (SkipBitcodeWrapperHeader(FixedBufPtr, EndBufPtr, true))</span><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"><br></span></div><div><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"> report_fatal_error("Invalid bitcode wrapper");</span><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"><br></span></div><div><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"> return EndBufPtr - BufPtr;</span><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"><br></span></div><div><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"> }</span><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"><br></span></div><div><span class="font" style="font-family:menlo, consolas, monospace, 
sans-serif;"><br></span></div><div><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"> if (!isRawBitcode(BufPtr, EndBufPtr))</span><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"><br></span></div><div><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"> report_fatal_error("Invalid magic bytes; not a bitcode file?");</span><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"><br></span></div><div><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"><br></span></div><div><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"> BitstreamCursor Reader(Buffer);</span><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"><br></span></div><div><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"> Reader.Read(32); // skip signature</span><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"><br></span></div><div><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"> while (true) {</span><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"><br></span></div><div><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"> size_t EntryStart = Reader.getCurrentByteNo();</span><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"><br></span></div><div><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"> BitstreamEntry Entry =</span><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"><br></span></div><div><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"> Reader.advance(BitstreamCursor::AF_DontAutoprocessAbbrevs);</span><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"><br></span></div><div><span class="font" style="font-family:menlo, 
consolas, monospace, sans-serif;"> if (Entry.Kind == BitstreamEntry::SubBlock) {</span><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"><br></span></div><div><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"> if (Reader.SkipBlock())</span><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"><br></span></div><div><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"> report_fatal_error("Invalid bitcode file");</span><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"><br></span></div><div><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"> } else {</span><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"><br></span></div><div><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"> // We must have reached the end of the module.</span><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"><br></span></div><div><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"> return EntryStart;</span><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"><br></span></div><div><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"> }</span><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"><br></span></div><div><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"> }</span><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;"><br></span></div><div><span class="font" style="font-family:menlo, consolas, monospace, sans-serif;">}</span><br></div><div><br></div><div>Sean<br></div><div><br></div><div>On Thu, Aug 27, 2020, at 13:17, Steven Wu via llvm-dev wrote:<br></div><blockquote type="cite" id="qt" style="overflow-wrap:break-word;"><div>Hi Mircea <br></div><div class="qt-"><br></div><div 
class="qt-">The RFC you mentioned describes a Darwin-specific implementation, which was later extended to support other targets. The main use case for the embed-bitcode option is to let the compiler pass intermediate IR and command flags in the object
file it produces for later use. On Darwin, it is used for bitcode recompilation, and some might use it to achieve other goals.<br></div><div class="qt-"><br></div><div class="qt-">To use this information properly, you need tools that understand the layout and sections for embedded bitcode. You can't just use an ordinary linker because, as you said, an ELF linker will just append the bitcode. Depending
on what you are trying to achieve, you need to implement downstream tools (linker, binary analysis tools, etc.) that understand this concept.<br></div><div class="qt-"><br></div><div class="qt-">Steven<br></div><div class="qt-"><div><div><br></div><blockquote type="cite" class="qt-"><div class="qt-">On Aug 24, 2020, at 7:10 PM, Mircea Trofin via llvm-dev &lt;<a href="mailto:llvm-dev@lists.llvm.org" class="qt-">llvm-dev@lists.llvm.org</a>&gt; wrote:<br></div><div><br></div><div class="qt-"><div dir="ltr" class="qt-"><div>Hello, <br></div><div class="qt-"><br></div><div class="qt-">I'm trying to understand how .llvmbc and .llvmcmd fit into an end-to-end story. From <a href="http://lists.llvm.org/pipermail/llvm-dev/2016-February/094851.html" class="qt-"> the RFC</a>, and reading through the implementation, I'm piecing together that the goal was to enable capturing IR right after clang and before passing it to LLVM's optimization passes, as well as the command line options needed for later compiling that IR
to the same native object it was compiled to originally (with the same compiler).<br></div><div class="qt-"><br></div><div class="qt-">Here's what I don't understand: say you have a.o and b.o compiled with -fembed-bitcode=all. They are linked into a binary called my_binary. How do you re-create the corresponding IR for modules a and b (let's call them a.bc and b.bc), and their
corresponding command lines? From what I can tell, the linker just concatenates the IR for a and b in my_binary's .llvmbc, and the same for the command line in .llvmcmd. Is there a separator maybe I missed? For .llvmcmd, I could see how *maybe* -cc1 could
be that separator, what about the .llvmbc part? The magic number?<br></div><div class="qt-"><br></div><div class="qt-">Thanks!<br></div></div></div></blockquote></div></div></blockquote><div><br></div></body></html>