<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Aug 28, 2020 at 2:16 PM Fangrui Song <<a href="mailto:maskray@google.com">maskray@google.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 2020-08-28, Mircea Trofin via llvm-dev wrote:<br>

>On Fri, Aug 28, 2020 at 11:22 AM David Blaikie <<a href="mailto:dblaikie@gmail.com" target="_blank">dblaikie@gmail.com</a>> wrote:<br>

><br>

>> You should probably pull in some folks who implemented/maintain the<br>

>> feature for Darwin.<br>

>><br>

>> I guess they aren't linking this info, but only communicating in the<br>

>> object file between tools - maybe they flag these sections (either in the<br>

>> object, or by the linker) as ignored/dropped during linking. That semantic<br>

>> could be implemented in ELF too by marking the sections SHF_IGNORED or<br>

>> something (same-file split DWARF uses this technique).<br>

<br>

The .llvmbc / .llvmcmd section does not have the SHF_EXCLUDE flag. It<br>

will be retained in the linked image.<br></blockquote><div><br>Ah, yes, I understand that's the current situation. What I meant was: if dropping these sections during linking is the semantic already implemented for MachO/ld64 (either via some MachO attribute or via hardcoded behavior in ld64), where the feature is fully fledged and working as intended, we could match that semantic on ELF by using SHF_EXCLUDE.<br><br>We could then design a separate but related feature for "I want bitcode and build commands that end up in the final linked binary". That feature might use a different format (with a length prefix), and at that point we could consider putting the build command in a header before the module rather than in a separate section.<br> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

>> So maybe the goal/desire is to have a different semantic, rather than the<br>

>> equivalent semantic being different on ELF compared to MachO.<br>

>><br>

>> So if it's a different semantic - yeah, I'd guess a flag that prefixes the<br>

>> module metadata with a length would make sense, then it can be linked<br>

>> naturally on any platform. (if the "don't link these sections" support on<br>

>> Darwin is done by the linker hardcoding the section name - then maybe this<br>

>> flag would also put the data in a different section that isn't linker<br>

>> stripped on Darwin, so users interested in getting everything linked<br>

>> together can do so on any platform)<br>

>><br>

>> But if this data is linked, then it'd be hard to know which command line<br>

>> goes with which module, yes? So maybe it'd make sense then to have the<br>

>> command line as a header before the module, in the same section. So they're<br>

>> kept together.<br>

>><br>

>This last point was my follow-up :)<br>

<br>

A module has a source_filename field.<br>

<br>

clang -fembed-bitcode=all -c d/a.c<br>

llvm-objcopy --dump-section=.llvmbc=a.bc a.o /dev/null<br>

llvm-dis < a.bc => source_filename = "d/a.c"<br>

<br>

The missing piece is a mechanism to extract a module from concatenated<br>

bitcode (llvm-dis supports multi-module bitcode but not concatenated<br>

bitcode <a href="https://reviews.llvm.org/D70153" rel="noreferrer" target="_blank">https://reviews.llvm.org/D70153</a>). I'll be happy to look into it:)<br>

<br>

---<br>

<br>

.llvmcmd may need to record the source file name to be more useful.<br>

<br>

><br>

>><br>

>> On Thu, Aug 27, 2020 at 10:26 PM Mircea Trofin via llvm-dev <<br>

>> <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>> wrote:<br>

>><br>

>>> Thanks, Sean, Steven,<br>

>>><br>

>>> to explore this a bit further, are there currently users for non-Darwin<br>

>>> cases? I wonder if it would be an issue if we inserted markers in the<br>

>>> section (maybe as an opt-in, if there were users), such that, when<br>

>>> concatenated, the resulting section would be self-describing, for a<br>

>>> specialized reader, of course - basically, achieve what Sean described, but<br>

>>> "by design".<br>

>>><br>

>>> For instance, each .o file could have a size, followed by the payload<br>

>>> (maybe include in the payload the name of the module, too; maybe compress<br>

>>> it, too). Same for the .llvmcmd case.<br>

>>><br>

>>> On Thu, Aug 27, 2020 at 6:57 PM Sean Bartell <<a href="mailto:smbarte2@illinois.edu" target="_blank">smbarte2@illinois.edu</a>><br>

>>> wrote:<br>

>>><br>

>>>> Hi Mircea,<br>

>>>><br>

>>>> If you use an ordinary linker that concatenates .llvmbc sections, you<br>

>>>> can use this code to get the size of each bitcode module. As far as I know,<br>

>>>> there's no clean way to separate the .llvmcmd sections without making<br>

>>>> assumptions about what options were used.<br>

>>>><br>

>>>> // Given a bitcode file followed by garbage, get the size of the actual<br>

>>>> // bitcode. This only works correctly with some kinds of garbage (in<br>

>>>> // particular, it will work if the bitcode file is followed by zeros, or if<br>

>>>> // it's followed by another bitcode file).<br>

>>>> size_t GetBitcodeSize(MemoryBufferRef Buffer) {<br>

>>>>   const unsigned char *BufPtr =<br>

>>>>       reinterpret_cast<const unsigned char *>(Buffer.getBufferStart());<br>

>>>>   const unsigned char *EndBufPtr =<br>

>>>>       reinterpret_cast<const unsigned char *>(Buffer.getBufferEnd());<br>

>>>>   if (isBitcodeWrapper(BufPtr, EndBufPtr)) {<br>

>>>>     const unsigned char *FixedBufPtr = BufPtr;<br>

>>>>     if (SkipBitcodeWrapperHeader(FixedBufPtr, EndBufPtr, true))<br>

>>>>       report_fatal_error("Invalid bitcode wrapper");<br>

>>>>     return EndBufPtr - BufPtr;<br>

>>>>   }<br>

>>>><br>

>>>>   if (!isRawBitcode(BufPtr, EndBufPtr))<br>

>>>>     report_fatal_error("Invalid magic bytes; not a bitcode file?");<br>

>>>><br>

>>>>   BitstreamCursor Reader(Buffer);<br>

>>>>   Reader.Read(32); // skip signature<br>

>>>>   while (true) {<br>

>>>>     size_t EntryStart = Reader.getCurrentByteNo();<br>

>>>>     BitstreamEntry Entry =<br>

>>>>         Reader.advance(BitstreamCursor::AF_DontAutoprocessAbbrevs);<br>

>>>>     if (Entry.Kind == BitstreamEntry::SubBlock) {<br>

>>>>       if (Reader.SkipBlock())<br>

>>>>         report_fatal_error("Invalid bitcode file");<br>

>>>>     } else {<br>

>>>>       // We must have reached the end of the module.<br>

>>>>       return EntryStart;<br>

>>>>     }<br>

>>>>   }<br>

>>>> }<br>

>>>><br>

>>>> Sean<br>

>>>><br>

>>>> On Thu, Aug 27, 2020, at 13:17, Steven Wu via llvm-dev wrote:<br>

>>>><br>

>>>> Hi Mircea<br>

>>>><br>

>>>> From the RFC you mentioned, that is a Darwin specific implementation,<br>

>>>> which later got extended to support other targets. The main use case for<br>

>>>> the embed bitcode option is to allow the compiler to pass intermediate IR and<br>

>>>> command flags in the object file it produces for later use. For Darwin, it<br>

>>>> is used for bitcode recompilation, and some might use it to achieve other<br>

>>>> goals.<br>

>>>><br>

>>>> In order to use this information properly, you need to have tools that<br>

>>>> understand the layout and sections for embedded bitcode. You can't just use<br>

>>>> an ordinary linker, because like you said, an ELF linker will just append<br>

>>>> the bitcode. Depending on what you are trying to achieve, you need to<br>

>>>> implement the downstream tools, like linker, binary analysis tools, etc. to<br>

>>>> understand this concept.<br>

>>>><br>

>>>> Steven<br>

>>>><br>

>>>> On Aug 24, 2020, at 7:10 PM, Mircea Trofin via llvm-dev <<br>

>>>> <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>> wrote:<br>

>>>><br>

>>>> Hello,<br>

>>>><br>

>>>> I'm trying to understand how .llvmbc and .llvmcmd fit into an end-to-end<br>

>>>> story. From the RFC<br>

>>>> <<a href="http://lists.llvm.org/pipermail/llvm-dev/2016-February/094851.html" rel="noreferrer" target="_blank">http://lists.llvm.org/pipermail/llvm-dev/2016-February/094851.html</a>>,<br>

>>>> and reading through the implementation, I'm piecing together that the goal<br>

>>>> was to enable capturing IR right after clang and before passing it to<br>

>>>> LLVM's optimization passes, as well as the command line options needed for<br>

>>>> later compiling that IR to the same native object it was compiled to<br>

>>>> originally (with the same compiler).<br>

>>>><br>

>>>> Here's what I don't understand: say you have a.o and b.o compiled with<br>

>>>> -fembed-bitcode=all. They are linked into a binary called my_binary. How do<br>

>>>> you re-create the corresponding IR for modules a and b (let's call them<br>

>>>> a.bc and b.bc), and their corresponding command lines? From what I can<br>

>>>> tell, the linker just concatenates the IR for a and b in my_binary's<br>

>>>> .llvmbc, and the same for the command line in .llvmcmd. Is there a<br>

>>>> separator maybe I missed? For .llvmcmd, I could see how *maybe* -cc1 could<br>

>>>> be that separator, what about the .llvmbc part? The magic number?<br>

>>>><br>

>>>> Thanks!<br>

>>>> _______________________________________________<br>

>>>> LLVM Developers mailing list<br>

>>>> <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>

>>>> <a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>

>>>><br>

>>>><br>

>>><br>

>><br>

<br>

<br>

</blockquote></div></div>
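<div dir="ltr">The length-prefix idea discussed in this thread (a size followed by the payload, with the command line kept as a header next to its module) can be sketched as follows. This is only an illustration under stated assumptions, not an existing LLVM or clang format: the record layout, the names EmbeddedRecord, appendRecord and parseRecords, and the field widths are all hypothetical. It also assumes the linker concatenates the input sections byte for byte with no alignment padding between them.</div>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record layout (an assumption, not an existing LLVM format):
//   [u32 LE: command-line length][u64 LE: bitcode length][command line][bitcode]
struct EmbeddedRecord {
  std::string CommandLine;
  std::vector<std::uint8_t> Bitcode;
};

// Append Value to Out as NumBytes little-endian bytes.
static void writeLE(std::vector<std::uint8_t> &Out, std::uint64_t Value,
                    unsigned NumBytes) {
  for (unsigned I = 0; I < NumBytes; ++I)
    Out.push_back(static_cast<std::uint8_t>(Value >> (8 * I)));
}

// Read NumBytes little-endian bytes starting at In[Pos].
// The caller is responsible for bounds checking.
static std::uint64_t readLE(const std::vector<std::uint8_t> &In,
                            std::size_t Pos, unsigned NumBytes) {
  std::uint64_t Value = 0;
  for (unsigned I = 0; I < NumBytes; ++I)
    Value |= static_cast<std::uint64_t>(In[Pos + I]) << (8 * I);
  return Value;
}

// What a compiler could emit into one object file's section.
void appendRecord(std::vector<std::uint8_t> &Section, const EmbeddedRecord &R) {
  assert(R.CommandLine.size() <= std::numeric_limits<std::uint32_t>::max());
  writeLE(Section, R.CommandLine.size(), 4);
  writeLE(Section, R.Bitcode.size(), 8);
  Section.insert(Section.end(), R.CommandLine.begin(), R.CommandLine.end());
  Section.insert(Section.end(), R.Bitcode.begin(), R.Bitcode.end());
}

// Split a linked (concatenated) section back into records.
// Returns false if a header or payload runs past the end of the section;
// Out may then hold the records that were parsed before the error.
bool parseRecords(const std::vector<std::uint8_t> &Section,
                  std::vector<EmbeddedRecord> &Out) {
  const std::size_t HeaderSize = 12; // 4-byte + 8-byte length fields
  std::size_t Pos = 0;
  while (Pos < Section.size()) {
    if (Section.size() - Pos < HeaderSize)
      return false;
    std::uint64_t CmdLen = readLE(Section, Pos, 4);
    std::uint64_t BcLen = readLE(Section, Pos + 4, 8);
    Pos += HeaderSize;
    std::uint64_t Remaining = Section.size() - Pos;
    if (CmdLen > Remaining || BcLen > Remaining - CmdLen)
      return false;
    const std::uint8_t *P = Section.data() + Pos;
    EmbeddedRecord R;
    R.CommandLine.assign(reinterpret_cast<const char *>(P),
                         static_cast<std::size_t>(CmdLen));
    R.Bitcode.assign(P + CmdLen, P + CmdLen + BcLen);
    Pos += static_cast<std::size_t>(CmdLen + BcLen);
    Out.push_back(std::move(R));
  }
  return true;
}
```

<div dir="ltr">Calling appendRecord once per object file and then appending the resulting byte vectors models what an ordinary linker does to same-named sections. parseRecords on the combined bytes then recovers each module together with its own command line, which avoids guessing at separators such as -cc1 or the bitcode magic number.</div>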