[llvm-dev] [RFC] Embedded bitcode and related upstream (Part II)

Steven Wu via llvm-dev llvm-dev at lists.llvm.org
Mon Jul 25 09:01:38 PDT 2016

> On Jul 25, 2016, at 3:24 AM, Jonas Devlieghere <jonas at devlieghere.com> wrote:
> Hi,
> I hope I'm not breaking any mailing list etiquette by replying to this
> mail, but if I am then please accept my apologies.
> On Fri, Jun 3, 2016 at 8:36 PM, Steven Wu via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> Hi everyone
>> I am still in the process of upstreaming some improvements to the embed
>> bitcode option. If you want more background, you can read the previous RFC
>> (http://lists.llvm.org/pipermail/llvm-dev/2016-February/094851.html). This
>> is part II of the discussion.
>> Current Status:
>> A basic version of the -fembed-bitcode option is upstreamed and functioning.
>> You can use the -fembed-bitcode={off, all, bitcode, marker} option to control
>> what gets embedded in the final object file output:
>> off: default, nothing gets embedded.
>> all: optimized bitcode and command line options get embedded in the object
>> file.
>> bitcode: only optimized bitcode is embedded
>> marker: only put a marker in the object file
>> What needs to be improved:
>> 1. Whitelist for command line options that can be used with bitcode:
>> The current trunk implementation embeds all the cc1 command line options
>> (including header include paths, warning flags and other front-end options)
>> in the command line section. That is a lot of redundant information: most of
>> these options are useless for re-creating the object file from the embedded
>> optimized bitcode, and they can leak information about the source code. One
>> solution would be to keep a list of all the options that can affect code
>> generation but are not encoded in the bitcode. I have internally prototyped
>> this by explicitly disallowing these options and allowing only the remainder
>> of the options to be embedded (http://reviews.llvm.org/D17394). A better
>> solution might be encoding that information in "Options.td" as a specific
>> group.
>> 2. Assembly input handling:
>> This is a workaround to allow source code written in assembly to work with
>> the "-fembed-bitcode" option. When compiling assembly source code with
>> "-fembed-bitcode", clang-as creates an empty section "__LLVM, __asm" in the
>> object file. That is just a way to distinguish object files compiled from
>> assembly source from those compiled from higher level source code without
>> the "-fembed-bitcode" option. The linker can use this section to diagnose
>> whether "-fembed-bitcode" is used consistently on all the object files
>> participating in the link.
>> 3. Bitcode symbol hiding:
>> There were some concerns about leaking source code information when using
>> the bitcode feature. One approach to avoid the leak is to add a pass which
>> renames all the globals and metadata strings. The pass also keeps a reverse
>> map in case the original names need to be recovered. The final bitcode
>> should contain no more symbols or debug info than a stripped binary. To make
>> sure the modified bitcode can still be linked correctly, the renaming needs
>> to be consistent across all the bitcode participating in the link, and
>> everything that is external to the linkage unit needs to be preserved. This
>> means the pass can only be run during linking and requires some LTO API.
> Regarding the symbol map, are you planning to upstream a pass that
> restores the symbols? I have been trying to do this myself in order to
> reverse the "BCSymbolMap". However this turned out to be less
> straightforward than I'd hoped. Any info on this would be greatly
> appreciated!

We have tools to restore symbols in the dSYM bundle (check the dsymutil -symbol-map option in the Apple toolchain).
I don't think we have a pass to restore the symbols in the bitcode right now, but that should be very straightforward, and I am happy to implement one as part of item 3.
Of course, that will only happen if the community thinks this feature is beneficial to them. In the meantime, if you need assistance, please file a radar with Apple at https://bugreport.apple.com.
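
To make the renaming and reverse map from item 3 a bit more concrete, here is a rough sketch in Python. It is only an illustration of the idea, not the actual pass or the BCSymbolMap format: the function names, the "hidden_N" naming scheme and the representation of a module as a plain list of symbol names are all assumptions made for this example.

```python
def build_symbol_map(modules, preserved):
    """Build one shared rename map for every module in the link.

    modules:   list of lists of symbol names (hypothetical stand-in for
               the bitcode modules taking part in the link).
    preserved: names that are external to the linkage unit and must
               keep their original spelling.
    """
    forward = {}
    for symbols in modules:
        for name in symbols:
            if name in preserved or name in forward:
                continue
            # "hidden_N" is an invented scheme for this sketch only.
            forward[name] = f"hidden_{len(forward)}"
    # The reverse map is what allows the original names to be recovered.
    reverse = {hidden: original for original, hidden in forward.items()}
    return forward, reverse


def rename(symbols, forward):
    """Apply the map; names without an entry (preserved ones) pass through."""
    return [forward.get(name, name) for name in symbols]


def restore(symbols, reverse):
    """Undo the renaming with the reverse map."""
    return [reverse.get(name, name) for name in symbols]


modules = [["main", "helper", "printf"], ["helper", "secret_table"]]
forward, reverse = build_symbol_map(modules, preserved={"main", "printf"})
hidden_modules = [rename(m, forward) for m in modules]
restored_modules = [restore(m, reverse) for m in hidden_modules]
```

Because the map is built once over all modules, "helper" gets the same replacement name in both of them, which is the consistency requirement described in item 3. A real pass would also have to avoid collisions with existing names, which this sketch does not attempt.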


>> 4. Debug info strip to line-tables pass:
>> As the name suggests, this pass strips the full debug info down to
>> line-tables only. This is also one of the steps we took to prevent the leak
>> of source code information in bitcode.
>> Please let me know what you think about the pieces above or if you have
>> any concerns about the methodology. I will put up patches for review soon.
>> Thanks
>> Steven
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> Cheers,
> Jonas
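
For item 1 of the quoted RFC, the filtering idea could look roughly like the Python sketch below. This is an illustration under stated assumptions, not what clang or D17394 actually does: the function name is hypothetical, and the choice of which cc1 options end up in the allowed sets is made up for the example.

```python
# Hypothetical allowlist. Which options really belong here is exactly
# the open question in item 1; these entries are only for illustration.
ALLOWED_WITH_VALUE = {"-triple", "-target-cpu", "-mrelocation-model"}
ALLOWED_PREFIXES = ("-O",)


def filter_embedded_options(cc1_args):
    """Return only the options considered safe and useful to embed."""
    kept = []
    i = 0
    while i < len(cc1_args):
        arg = cc1_args[i]
        if arg in ALLOWED_WITH_VALUE and i + 1 < len(cc1_args):
            # Keep the option together with its separate value.
            kept.extend(cc1_args[i:i + 2])
            i += 2
            continue
        if arg.startswith(ALLOWED_PREFIXES):
            kept.append(arg)
        # Anything else (include paths, warning flags, ...) is dropped.
        i += 1
    return kept


args = [
    "-triple", "x86_64-apple-macosx10.11.0",
    "-I", "/Users/me/project/include",
    "-Wall",
    "-O2",
    "-target-cpu", "core2",
]
embedded = filter_embedded_options(args)
```

The sketch inspects tokens one at a time and does not model the option table, so it would misbehave on a dropped option whose value happens to look like an allowed one. Encoding the information in "Options.td", as suggested in the RFC, would avoid that kind of guesswork.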
