[llvm-dev] [RFC] Embedding Bitcode in Object Files

Wed Feb 3 10:25:32 PST 2016

Apple has an internal implementation for embedding bitcode in the object file
that we would like to upstream. It consists of a few changes to the clang
frontend: new clang options, clang driver changes, and utilities to embed
bitcode inside the object file. We believe upstreaming this implementation will
benefit people who want to develop software for Apple platforms using open
source LLVM. It also helps driver compatibility, and it aligns with ongoing
efforts like Thin-LTO, which also has an object wrapper for bitcode.

Embedded Bitcode Design:
Embedded bitcode is designed to enable bitcode distribution without disturbing
the normal development flow. When a program is compiled with bitcode, clang
embeds the optimized bitcode in a special section of the object file, together
with the options used during the compilation. The object file still has the
normal TEXT and DATA sections for normal linking. During linking, the linker
checks that all input object files have embedded bitcode and collects the
bitcode into an archive, which is embedded in the output. The archive also
contains all the information needed to rebuild the linked binary, so every
compilation and linking stage can be replayed to regenerate the final binary.
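The linker-side collection step described above can be sketched roughly as follows. This is a minimal model, not Apple's linker code: object files are represented as hypothetical in-memory maps from section name to bytes, and `ObjectFile`, `ArchiveMember`, `LinkResult` and `collectBitcode` are illustrative names. The sketch assumes that a 1-byte `__LLVM,__bitcode` section is only a marker rather than real bitcode, and that the archive is produced only when every input carries bitcode.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory model of an input object file: section name -> bytes.
// A real linker parses Mach-O load commands; this only models the decision logic.
struct ObjectFile {
  std::string name;
  std::map<std::string, std::string> sections;  // e.g. "__LLVM,__bitcode"
};

// One member of the (hypothetical) bitcode archive embedded in the output.
struct ArchiveMember {
  std::string objectName;
  std::string bitcode;  // optimized bitcode produced by the compile step
  std::string cmdline;  // options needed to replay that compile step
};

struct LinkResult {
  std::vector<ArchiveMember> archive;  // non-empty only if every input had bitcode
  std::vector<std::string> missing;    // inputs without usable bitcode
};

// Collect embedded bitcode from all inputs. A missing __bitcode section, or a
// section of at most 1 byte (a marker), counts as "no bitcode" in this sketch.
LinkResult collectBitcode(const std::vector<ObjectFile>& inputs) {
  LinkResult result;
  for (const ObjectFile& obj : inputs) {
    auto bc = obj.sections.find("__LLVM,__bitcode");
    if (bc == obj.sections.end() || bc->second.size() <= 1) {
      result.missing.push_back(obj.name);
      continue;
    }
    auto cl = obj.sections.find("__LLVM,__cmdline");
    std::string cmdline = (cl == obj.sections.end()) ? std::string() : cl->second;
    result.archive.push_back({obj.name, bc->second, cmdline});
  }
  // All-or-nothing: a partial archive could not rebuild the whole binary.
  if (!result.missing.empty()) result.archive.clear();
  return result;
}
```

Making the archive all-or-nothing reflects the stated goal: the archive is only useful if it contains enough to replay every compilation and the final link.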

There are two main parts we would like to upstream first:
1. Clang Driver:
Add a -fembed-bitcode option. When this option is used, the driver splits the
compilation into two stages. The first stage runs the frontend and all the
optimization passes; the second stage embeds the bitcode from the first stage
and then runs the CodeGen passes. There is also a -fembed-bitcode-marker
option that does not split the compilation and only puts a 1-byte marker into
the object file. It is meant to speed up debug builds, because bitcode
serialization and verification make -fembed-bitcode slower, especially with
-O0 -g. The linker can still check for the presence of the section and report
whether any object file participating in the link is missing bitcode in a full
bitcode build.
2. Bitcode Embedding:
Several special sections are used to mark the presence of bitcode in the MachO
file.
"__LLVM, __bitcode" stores the optimized bitcode in the object file. It can
instead hold a 1-byte marker to provide diagnostics in debug builds.
"__LLVM, __cmdline" stores the clang command-line options. A few options are
not reflected in the bitcode but need to be replayed in the rebuild. For
example, '-O0' makes us run FastISel during the rebuild.
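A tool reading these two sections might interpret them roughly as follows. This is a sketch under stated assumptions, not the proposed implementation: `BitcodeState`, `classifyBitcodeSection`, `reportMissing`, `splitCmdline` and `replayAtO0` are hypothetical helpers. The reporting policy is one plausible reading of the marker behaviour described above. The `__LLVM,__cmdline` payload is assumed to be a sequence of NUL-terminated option strings, since this proposal does not specify the encoding.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical classification of an object's "__LLVM,__bitcode" section.
enum class BitcodeState { Missing, Marker, Embedded };

// `present`: whether the section exists; `size`: its size in bytes.
// A section of at most 1 byte carries no bitcode and is treated as the marker
// written by -fembed-bitcode-marker (an assumption of this sketch).
BitcodeState classifyBitcodeSection(bool present, std::size_t size) {
  if (!present) return BitcodeState::Missing;
  if (size <= 1) return BitcodeState::Marker;
  return BitcodeState::Embedded;
}

// Hypothetical policy: in a full bitcode build only real bitcode is acceptable;
// in a marker (debug) build the presence of the section is enough.
bool reportMissing(BitcodeState state, bool fullBitcodeBuild) {
  if (fullBitcodeBuild) return state != BitcodeState::Embedded;
  return state == BitcodeState::Missing;
}

// Split a "__LLVM,__cmdline" payload into options, ASSUMING the options are
// stored back to back as NUL-terminated strings. Empty entries are skipped.
std::vector<std::string> splitCmdline(const std::string& payload) {
  std::vector<std::string> args;
  std::string current;
  for (char c : payload) {
    if (c == '\0') {
      if (!current.empty()) args.push_back(current);
      current.clear();
    } else {
      current.push_back(c);
    }
  }
  if (!current.empty()) args.push_back(current);  // tolerate a missing final NUL
  return args;
}

// Returns true if the last recorded -O option is -O0 (later -O options override
// earlier ones), i.e. the rebuild should use the -O0 path such as FastISel.
// With no -O option at all, this sketch returns false.
bool replayAtO0(const std::vector<std::string>& args) {
  std::string lastOpt;
  for (const std::string& a : args) {
    if (a.rfind("-O", 0) == 0) lastOpt = a;  // a starts with "-O"
  }
  return lastOpt == "-O0";
}
```

The split between `Marker` and `Embedded` is what lets a debug build skip bitcode serialization while the linker can still flag inputs that were compiled without any embedding option.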

Thanks

Steven