[llvm-dev] End-to-end -fembed-bitcode .llvmbc and .llvmcmd

Fāng-ruì Sòng via llvm-dev llvm-dev at lists.llvm.org
Sat Aug 29 21:48:24 PDT 2020


On Sat, Aug 29, 2020 at 7:22 PM Sean Bartell via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
>
> On Fri, Aug 28, 2020, at 16:31, Mircea Trofin via llvm-dev wrote:
> >
> >
> > On Fri, Aug 28, 2020 at 2:16 PM Fangrui Song <maskray at google.com> wrote:
> >> On 2020-08-28, Mircea Trofin via llvm-dev wrote:
> >> >On Fri, Aug 28, 2020 at 11:22 AM David Blaikie <dblaikie at gmail.com> wrote:
> >> >
> >> >> So maybe the goal/desire is to have a different semantic, rather than the
> >> >> equivalent semantic being different on ELF compared to MachO.
> >> >>
> >> >> So if it's a different semantic - yeah, I'd guess a flag that prefixes the
> >> >> module metadata with a length would make sense, then it can be linked
> >> >> naturally on any platform. (if the "don't link these sections" support on
> >> >> Darwin is done by the linker hardcoding the section name - then maybe this
> >> >> flag would also put the data in a different section that isn't linker
> >> >> stripped on Darwin, so users interested in getting everything linked
> >> >> together can do so on any platform)
> >> >>
> >> >> But if this data is linked, then it'd be hard to know which command line
> >> >> goes with which module, yes? So maybe it'd make sense then to have the
> >> >> command line as a header before the module, in the same section. So they're
> >> >> kept together.
> >> >>
> >> >This last point was my follow-up :)
> >>
> >> A module has a source_filename field.
> >>
> >> clang -fembed-bitcode=all -c d/a.c
> >> llvm-objcopy --dump-section=.llvmbc=a.bc a.o /dev/null
> >> llvm-dis < a.bc => source_filename = "d/a.c"
> >>
> >> The missing piece is a mechanism to extract a module from concatenated
> >> bitcode (llvm-dis supports multi-module bitcode but not concatenated
> >> bitcode https://reviews.llvm.org/D70153). I'll be happy to look into it:)
> >>
> >> ---
> >>
> >> .llvmcmd may need the source file to be more useful.
> > Right - I think, for the non-Darwin concatenated case, all three of us (David, you, and I) are thinking along the lines of keeping together: the module name, the bitcode, and the command line - effectively not using .llvmcmd, and being able to correctly extract, by design, the rest of the information.
>
> Here's the format I would suggest:
>
> 1. Put command-line flags in the module metadata instead of .llvmcmd.
> 2. Put each module in the bitcode wrapper supported by SkipBitcodeWrapperHeader, which includes a length field. I think LLVM only generates the wrapper for Darwin, but it can read the wrapper correctly on all platforms.
> 3. Change the .llvmbc section alignment so that no extra zeros are added between modules.
>
> My use case: I'm using -fembed-bitcode on Linux as an alternative to the wllvm/whole-program-llvm tool. For my purposes, it'd be nice to also keep track of linker flags and other linker input files, but I can get most of what I need from the modules alone.
>
> Sean
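
For illustration, here is a rough Python sketch of how a reader could
split a .llvmbc section laid out as in item 2 of Sean's list. It
assumes the wrapper header is five little-endian 32-bit fields (magic
0x0B17C0DE, version, offset, size, cputype), as described in the LLVM
bitcode format documentation. The function name is made up, and the
zero-padding skip is only there to tolerate alignment padding between
modules; nothing here is existing LLVM tooling.

```python
import struct

WRAPPER_MAGIC = 0x0B17C0DE      # bitcode wrapper magic number
HEADER = struct.Struct("<5I")   # magic, version, offset, size, cputype


def split_wrapped_modules(section):
    """Return the bitcode payload of each wrapped module in `section`.

    Hypothetical helper: assumes every module in the section carries a
    bitcode wrapper header, optionally separated by zero padding.
    """
    modules = []
    pos = 0
    while pos < len(section):
        # Skip alignment padding; the wrapper magic never starts with 0.
        if section[pos] == 0:
            pos += 1
            continue
        if len(section) - pos < HEADER.size:
            raise ValueError("truncated wrapper header at offset %d" % pos)
        magic, _version, offset, size, _cputype = HEADER.unpack_from(section, pos)
        if magic != WRAPPER_MAGIC:
            raise ValueError("no wrapper magic at offset %d" % pos)
        if offset < HEADER.size:
            raise ValueError("payload overlaps header at offset %d" % pos)
        start = pos + offset    # offset is relative to the wrapper start
        end = start + size
        if end > len(section):
            raise ValueError("module at offset %d runs past the section" % pos)
        modules.append(section[start:end])
        pos = end
    return modules
```

Each returned payload starts with the raw bitcode magic and could be
written to its own .bc file for llvm-dis.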

I looked into the bitcode file format a bit today. Bitcode is a
streaming format, and I think an optional size field may be useful.
https://reviews.llvm.org/D86847 proposes adding a
BITCODE_SIZE_BLOCK_ID block. We don't actually need a container,
because the MODULE_CODE_SOURCE_FILENAME record already encodes the
source filename; a lightweight parse can obtain it.
This should be fast because there are typically very few
blocks/records preceding MODULE_CODE_SOURCE_FILENAME.

For .llvmcmd, I am on the fence about moving it into the bitcode.
Downside: retrieving the command line would be more difficult.
I'd also like to mention that the functionality partly duplicates the
existing -frecord-command-line:

% readelf -p .GCC.command.line a.o

String dump of section '.GCC.command.line':
  [     1]  /tmp/clang-12 -c -frecord-command-line a.c

GCC's -frecord-gcc-switches uses a different format. Some folks
consider it inferior to clang's format, and worse, the section is
SHF_MERGE|SHF_STRINGS:
% readelf -p .GCC.command.line a.o

String dump of section '.GCC.command.line':
  [     0]  -imultiarch x86_64-linux-gnu
  [    1d]  a.c
  [    21]  -mtune=generic
  [    30]  -march=x86-64
  [    3e]  -frecord-gcc-switches
  [    54]  -fasynchronous-unwind-tables
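
Both dumps above are just NUL-separated strings, so reading them back
is easy. A minimal Python sketch, assuming the section bytes were
dumped to a file first (e.g. with llvm-objcopy --dump-section as
earlier in the thread); the function name is made up for illustration:

```python
def string_table_entries(section):
    """Return (offset, text) pairs for each non-empty NUL-terminated
    string in the raw section bytes, like `readelf -p` prints them."""
    entries = []
    offset = 0
    for chunk in section.split(b"\0"):
        if chunk:
            entries.append((offset, chunk.decode("utf-8", errors="replace")))
        # Account for the string itself plus its NUL terminator.
        offset += len(chunk) + 1
    return entries
```

With clang's format this yields a single entry holding the whole
command line; with GCC's format it yields one entry per option.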

