[llvm-dev] End-to-end -fembed-bitcode .llvmbc and .llvmcmd

Fri Aug 28 14:31:38 PDT 2020

On Fri, Aug 28, 2020 at 2:16 PM Fangrui Song <maskray at google.com> wrote:

> On 2020-08-28, Mircea Trofin via llvm-dev wrote:
> >On Fri, Aug 28, 2020 at 11:22 AM David Blaikie <dblaikie at gmail.com> wrote:
> >
> >> You should probably pull in some folks who implemented/maintain the
> >> feature for Darwin.
> >>
> >> I guess they aren't linking this info, but only communicating in the
> >> object file between tools - maybe they flag these sections (either in the
> >> object, or by the linker) as ignored/dropped during linking. That semantic
> >> could be implemented in ELF too by marking the sections SHF_IGNORED or
> >> something (same-file split DWARF uses this technique).
>
> The .llvmbc / .llvmcmd sections do not have the SHF_EXCLUDE flag. They
> will be retained in the linked image.
>
> >> So maybe the goal/desire is to have a different semantic, rather than the
> >> equivalent semantic being different on ELF compared to MachO.
> >>
> >> So if it's a different semantic - yeah, I'd guess a flag that prefixes the
> >> module metadata with a length would make sense, then it can be linked
> >> naturally on any platform. (if the "don't link these sections" support on
> >> Darwin is done by the linker hardcoding the section name - then maybe this
> >> flag would also put the data in a different section that isn't linker
> >> stripped on Darwin, so users interested in getting everything linked
> >> together can do so on any platform)
> >>
> >> But if this data is linked, then it'd be hard to know which command line
> >> goes with which module, yes? So maybe it'd make sense then to have the
> >> command line as a header before the module, in the same section. So they're
> >> kept together.
> >>
> >This last point was my follow-up :)
>
> A module has a source_filename field.
>
> clang -fembed-bitcode=all -c d/a.c
> llvm-objcopy --dump-section=.llvmbc=a.bc a.o /dev/null
> llvm-dis < a.bc => source_filename = "d/a.c"
>
> The missing piece is a mechanism to extract a module from concatenated
> bitcode (llvm-dis supports multi-module bitcode but not concatenated
> bitcode https://reviews.llvm.org/D70153). I'll be happy to look into it:)
>
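
A rough sketch of what such an extraction loop could look like, assuming some
function is available that reports the size of the module at the start of a
buffer (for example, a wrapper around the GetBitcodeSize helper quoted further
down). The names ModuleSpan, SizeFn, and splitConcatenated are made up for
illustration and are not LLVM APIs. Skipping zero bytes between modules assumes
a module never starts with a zero byte, which holds for raw bitcode because it
begins with the 'B' 'C' magic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <vector>

// Location of one module inside the concatenated section (hypothetical type).
struct ModuleSpan {
  size_t Offset;
  size_t Size;
};

// Hypothetical callback: returns the size in bytes of the module that starts
// at Data, given that Len bytes remain in the section.
using SizeFn = std::function<size_t(const uint8_t *Data, size_t Len)>;

// Walks a section in which a linker appended several modules back to back and
// records where each one starts and how long it is.
std::vector<ModuleSpan> splitConcatenated(const std::vector<uint8_t> &Section,
                                          const SizeFn &SizeOfFirst) {
  assert(SizeOfFirst && "a size callback is required");
  std::vector<ModuleSpan> Spans;
  size_t Pos = 0;
  while (Pos < Section.size()) {
    // Assumption: zero bytes between modules are alignment padding.
    if (Section[Pos] == 0) {
      ++Pos;
      continue;
    }
    size_t Remaining = Section.size() - Pos;
    size_t Size = SizeOfFirst(Section.data() + Pos, Remaining);
    if (Size == 0 || Size > Remaining)
      throw std::runtime_error("module size is out of range");
    Spans.push_back({Pos, Size});
    Pos += Size;
  }
  return Spans;
}
```

Each span could then be handed to a bitcode reader on its own. The hard part
stays in the callback; this loop only shows how little is needed around it.
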
> ---
>
> .llvmcmd may need the source file to be more useful.
>
Right - I think, for the non-Darwin concatenated case, all three of us
(David, you, and I) are thinking along the lines of keeping the module
name, the bitcode, and the command line together - effectively not using
.llvmcmd, and being able to correctly extract, by design, the rest of the
information.
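
To make the idea concrete, here is a minimal sketch of one possible
self-describing layout. It is not an existing LLVM format: the record layout,
the EmbeddedModule type, and the appendRecord / parseRecords names are all
hypothetical. It also assumes the linker concatenates the input sections
without inserting padding (e.g. the section has alignment 1).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record layout, all sizes little-endian:
//   u32 name size | name | u32 cmdline size | cmdline | u32 bitcode size | bitcode
struct EmbeddedModule {
  std::string Name;             // e.g. the module's source_filename
  std::string CmdLine;          // may contain embedded NUL separators
  std::vector<uint8_t> Bitcode;
};

// Writes a 32-bit little-endian size followed by the bytes themselves.
static void putField(std::vector<uint8_t> &Out, const uint8_t *Data,
                     size_t Size) {
  assert(Size <= std::numeric_limits<uint32_t>::max() && "field too large");
  for (int Shift = 0; Shift < 32; Shift += 8)
    Out.push_back(static_cast<uint8_t>((Size >> Shift) & 0xff));
  Out.insert(Out.end(), Data, Data + Size);
}

// Appends one record, as a compiler would for a single object file.
void appendRecord(std::vector<uint8_t> &Out, const EmbeddedModule &M) {
  putField(Out, reinterpret_cast<const uint8_t *>(M.Name.data()), M.Name.size());
  putField(Out, reinterpret_cast<const uint8_t *>(M.CmdLine.data()),
           M.CmdLine.size());
  putField(Out, M.Bitcode.data(), M.Bitcode.size());
}

// Reads one size-prefixed field starting at Pos and advances Pos past it.
static std::vector<uint8_t> takeField(const std::vector<uint8_t> &In,
                                      size_t &Pos) {
  if (In.size() - Pos < 4)
    throw std::runtime_error("truncated size field");
  uint32_t Size = 0;
  for (int I = 0; I < 4; ++I)
    Size |= static_cast<uint32_t>(In[Pos + I]) << (8 * I);
  Pos += 4;
  if (In.size() - Pos < Size)
    throw std::runtime_error("truncated payload");
  std::vector<uint8_t> Field(In.begin() + Pos, In.begin() + Pos + Size);
  Pos += Size;
  return Field;
}

// Recovers every record from a section in which records were concatenated.
std::vector<EmbeddedModule> parseRecords(const std::vector<uint8_t> &Section) {
  std::vector<EmbeddedModule> Modules;
  size_t Pos = 0;
  while (Pos < Section.size()) {
    EmbeddedModule M;
    std::vector<uint8_t> Name = takeField(Section, Pos);
    std::vector<uint8_t> Cmd = takeField(Section, Pos);
    M.Name.assign(Name.begin(), Name.end());
    M.CmdLine.assign(Cmd.begin(), Cmd.end());
    M.Bitcode = takeField(Section, Pos);
    Modules.push_back(std::move(M));
  }
  return Modules;
}
```

With something along these lines, a reader would not need to understand the
bitcode or guess at command-line separators in order to split the section; the
sizes alone are enough. If linkers do insert padding between input sections, the
layout would need a marker or an aligned size to account for it.
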

> >
> >>
> >> On Thu, Aug 27, 2020 at 10:26 PM Mircea Trofin via llvm-dev <
> >> llvm-dev at lists.llvm.org> wrote:
> >>
> >>> Thanks, Sean, Steven,
> >>>
> >>> to explore this a bit further, are there currently users for non-Darwin
> >>> cases? I wonder if it would be an issue if we inserted markers in the
> >>> section (maybe as an opt-in, if there were users), such that, when
> >>> concatenated, the resulting section would be self-describing, for a
> >>> specialized reader, of course - basically, achieve what Sean described, but
> >>> "by design".
> >>>
> >>> For instance, each .o file could have a size, followed by the payload
> >>> (maybe include in the payload the name of the module, too; maybe compress
> >>> it, too). Same for the .llvmcmd case.
> >>>
> >>> On Thu, Aug 27, 2020 at 6:57 PM Sean Bartell <smbarte2 at illinois.edu> wrote:
> >>>
> >>>> Hi Mircea,
> >>>>
> >>>> If you use an ordinary linker that concatenates .llvmbc sections, you
> >>>> can use this code to get the size of each bitcode module. As far as I know,
> >>>> there's no clean way to separate the .llvmcmd sections without making
> >>>> assumptions about what options were used.
> >>>>
> >>>> // Given a bitcode file followed by garbage, get the size of the actual
> >>>> // bitcode. This only works correctly with some kinds of garbage (in
> >>>> // particular, it will work if the bitcode file is followed by zeros, or if
> >>>> // it's followed by another bitcode file).
> >>>> size_t GetBitcodeSize(MemoryBufferRef Buffer) {
> >>>>   const unsigned char *BufPtr =
> >>>>       reinterpret_cast<const unsigned char *>(Buffer.getBufferStart());
> >>>>   const unsigned char *EndBufPtr =
> >>>>       reinterpret_cast<const unsigned char *>(Buffer.getBufferEnd());
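> >>>>   if (isBitcodeWrapper(BufPtr, EndBufPtr)) {
> >>>>     const unsigned char *FixedBufPtr = BufPtr;
> >>>>     if (SkipBitcodeWrapperHeader(FixedBufPtr, EndBufPtr, true))
> >>>>       report_fatal_error("Invalid bitcode wrapper");
> >>>>     // SkipBitcodeWrapperHeader moved EndBufPtr to the end of the wrapped
> >>>>     // bitcode, so this is the wrapper header plus the bitcode itself.
> >>>>     return EndBufPtr - BufPtr;
> >>>>   }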
> >>>>
> >>>>   if (!isRawBitcode(BufPtr, EndBufPtr))
> >>>>     report_fatal_error("Invalid magic bytes; not a bitcode file?");
> >>>>
> >>>>   BitstreamCursor Reader(Buffer);
> >>>>   Reader.Read(32); // skip signature
> >>>>   while (true) {
> >>>>     size_t EntryStart = Reader.getCurrentByteNo();
> >>>>     BitstreamEntry Entry =
> >>>>         Reader.advance(BitstreamCursor::AF_DontAutoprocessAbbrevs);
> >>>>     if (Entry.Kind == BitstreamEntry::SubBlock) {
> >>>>       if (Reader.SkipBlock())
> >>>>         report_fatal_error("Invalid bitcode file");
> >>>>     } else {
> >>>>       // We must have reached the end of the module.
> >>>>       return EntryStart;
> >>>>     }
> >>>>   }
> >>>> }
> >>>>
> >>>> Sean
> >>>>
> >>>> On Thu, Aug 27, 2020, at 13:17, Steven Wu via llvm-dev wrote:
> >>>>
> >>>> Hi Mircea
> >>>>
> >>>> From the RFC you mentioned, that is a Darwin specific implementation,
> >>>> which later got extended to support other targets. The main use case for
> >>>> the embed bitcode option is to allow the compiler to pass intermediate IR
> >>>> and command flags in the object file it produces for later use. For Darwin,
> >>>> it is used for bitcode recompilation, and some might use it to achieve other
> >>>> goals.
> >>>>
> >>>> In order to use this information properly, you need to have tools that
> >>>> understand the layout and sections for embedded bitcode. You can't just use
> >>>> an ordinary linker, because like you said, an ELF linker will just append
> >>>> the bitcode. Depending on what you are trying to achieve, you need to
> >>>> implement the downstream tools, like linker, binary analysis tools, etc. to
> >>>> understand this concept.
> >>>>
> >>>> Steven
> >>>>
> >>>> On Aug 24, 2020, at 7:10 PM, Mircea Trofin via llvm-dev <
> >>>> llvm-dev at lists.llvm.org> wrote:
> >>>>
> >>>> Hello,
> >>>>
> >>>> I'm trying to understand how .llvmbc and .llvmcmd fit into an end-to-end
> >>>> story. From the RFC
> >>>> <http://lists.llvm.org/pipermail/llvm-dev/2016-February/094851.html>,
> >>>> and reading through the implementation, I'm piecing together that the goal
> >>>> was to enable capturing IR right after clang and before passing it to
> >>>> LLVM's optimization passes, as well as the command line options needed for
> >>>> later compiling that IR to the same native object it was compiled to
> >>>> originally (with the same compiler).
> >>>>
> >>>> Here's what I don't understand: say you have a.o and b.o compiled with
> >>>> -fembed-bitcode=all. They are linked into a binary called my_binary. How do
> >>>> you re-create the corresponding IR for modules a and b (let's call them
> >>>> a.bc and b.bc), and their corresponding command lines? From what I can
> >>>> tell, the linker just concatenates the IR for a and b in my_binary's
> >>>> .llvmbc, and the same for the command line in .llvmcmd. Is there a
> >>>> separator maybe I missed? For .llvmcmd, I could see how *maybe* -cc1 could
> >>>> be that separator, what about the .llvmbc part? The magic number?
> >>>>
> >>>> Thanks!
> >>>> _______________________________________________
> >>>> LLVM Developers mailing list
> >>>> llvm-dev at lists.llvm.org
> >>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev