[llvm-dev] RFC [ThinLTO]: An embedded summary encoding to support CFI and vtable opt
Peter Collingbourne via llvm-dev
llvm-dev at lists.llvm.org
Wed May 4 15:02:20 PDT 2016
I would like to propose extending ThinLTO to allow a bitcode module
to embed another bitcode module containing summary information. The purpose
of doing so is to support CFI and whole-program devirtualization
optimizations under ThinLTO.
The CFI and whole-program devirtualization optimizations work by
transforming vtables according to the class hierarchy. For example, if a
class A has two derived classes B and C, CFI will lay out the vtables for
A, B and C consecutively, so that clients can check that a vtable refers to
a derived class of A by performing arithmetic on the virtual table
pointer. For more details, see .
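To make the "arithmetic on the pointer" idea concrete, here is a minimal sketch of such a check. It assumes the simplest possible case: every vtable in the group belongs to a valid derived class and the address points are evenly spaced. The names (VTableRange, isValidVTable) are made up for illustration and are not what the passes actually emit; the real lowering has to handle less regular layouts as well.

```cpp
#include <cstdint>

// Hypothetical description of the contiguous group holding the vtables of a
// class and all of its derived classes (illustrative, not LLVM's own type).
struct VTableRange {
  std::uint64_t FirstAddrPoint; // address point of the first vtable in the group
  std::uint64_t Stride;         // distance between address points (nonzero power of two)
  std::uint64_t Count;          // number of vtables in the group
};

// Returns true if VPtr is the address point of one of the vtables in Range.
constexpr bool isValidVTable(const VTableRange &Range, std::uint64_t VPtr) {
  // Unsigned subtraction wraps around for pointers below the group, so the
  // bounds comparison at the end rejects both "too low" and "too high".
  std::uint64_t Offset = VPtr - Range.FirstAddrPoint;
  if (Offset % Range.Stride != 0)
    return false; // does not land on an address point
  return Offset / Range.Stride < Range.Count;
}
```

With A, B and C laid out at 0x1000, 0x1020 and 0x1040, a check against A's group accepts exactly those three addresses and rejects anything else.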
Both CFI and vtable opt rely on bitset metadata in order to know where
the address points for the vtables are located. This is currently encoded
using module-level metadata.
In order to lay out the vtables correctly, all vtables need to be visible
at once. This is the only part of the process that requires full LTO. The
rest of the process can just rely on a set of summary metadata that
contains information about how to perform CFI checks for a particular
class, or how to devirtualize a particular virtual call. This information
could be made part of the ThinLTO summary.
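As a rough illustration of what such summary metadata might carry, here is a sketch of two record types and a lookup. Everything in it is an assumption made for the example: the record layout, the field names and the findDevirtTarget helper are hypothetical and do not describe an existing or proposed summary format.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical per-class record: what a backend needs to emit a CFI check.
struct CfiCheckSummary {
  std::string_view TypeId;      // identifier of the class
  std::string_view RangeSymbol; // symbol marking the start of the vtable group
  std::uint64_t Stride;         // distance between address points
  std::uint64_t Count;          // number of vtables in the group
};

// Hypothetical per-slot record: what a backend needs to devirtualize a call.
struct DevirtSummary {
  std::string_view TypeId;       // static type of the object at the call site
  std::uint64_t SlotOffset;      // byte offset of the slot within the vtable
  std::string_view SingleTarget; // empty if the call cannot be devirtualized
};

// Returns the unique target for a virtual call, or an empty view if the
// summary has no single-implementation entry for that type and slot.
constexpr std::string_view findDevirtTarget(const DevirtSummary *Records,
                                            std::size_t N,
                                            std::string_view TypeId,
                                            std::uint64_t SlotOffset) {
  for (std::size_t I = 0; I != N; ++I)
    if (Records[I].TypeId == TypeId && Records[I].SlotOffset == SlotOffset)
      return Records[I].SingleTarget;
  return {};
}
```

The point of the sketch is only that a backend compiling a single module needs a small, read-only answer per class or per call site, not the vtables themselves.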
The idea is to allow bitcode to contain embedded summary blobs. For
example, in our scenario, the summary bitcode would contain a section with
an embedded blob consisting of a bitcode file containing definitions of the
vtables defined by that translation unit and the bitset metadata for CFI
and vtable opt, and the "top-level" bitcode would contain everything else.
The mechanism for merging summaries would be to link the embedded summary
bitcode files into a single module using the IRMover, with a mechanism very
similar to regular LTO. This would move all the necessary vtables and
metadata into a single module where they can be processed using the
existing LowerBitSets and WholeProgramDevirt passes, which would be
extended to export summary metadata. This summary metadata would be copied
into the regular summary information, where it can be used by individual
In the future, we could also consider representing importing summaries as
metadata. That would also make the summary loading process very
1) We could use a native object file, with one section named ".llvmbc"
containing the summary module with the vtables and CFI metadata, and
another section ".llvmbc.thin" containing "everything else". This would be
my preferred option, as it would make things even simpler. For example, the
linker could handle the top-level sections as it reads them, and it would
allow the individual sections to be extracted (e.g. using objcopy) and
inspected by normal tools, such as llvm-dis and llvm-bcanalyzer. The native object
format could also be the container for native code; see my earlier proposal
The implementation in lld is very simple (about 10 lines in my prototype),
but I can accept that it may be more difficult in other linkers, so those
linkers may want to use bitcode as the top-level format. In that case, we
would probably want to go with what I described in "Implementation".
2) We could emit the vtables and CFI metadata directly into the top-level
bitcode. However, this would create a need for a mechanism to distinguish
vtables from non-vtables for when we link the LTO parts of the module. In
order to do this, we could add a new bitcode record type for bitset
metadata that could also act as an index for vtables in a similar way to
how ThinLTO importing summaries already work. However, this would add even
more complexity to the bitcode format, when I feel that we should really be
going the other way with a simpler bitcode format.
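For the first alternative, the per-section handling in a linker could look roughly like the sketch below. The two section names are the ones given above; the SectionRole enum and classifySection function are hypothetical and are not taken from lld or from my prototype.

```cpp
#include <string_view>

// Hypothetical roles a linker could assign to an input section.
enum class SectionRole {
  SummaryModule, // vtables and CFI metadata, linked into one module up front
  ThinLTOModule, // "everything else", compiled per module by ThinLTO backends
  NativeContent  // ordinary native sections, handled as usual
};

// Decide how to treat a section purely from its name, as the linker reads it.
constexpr SectionRole classifySection(std::string_view Name) {
  if (Name == ".llvmbc")
    return SectionRole::SummaryModule;
  if (Name == ".llvmbc.thin")
    return SectionRole::ThinLTOModule;
  return SectionRole::NativeContent;
}
```

Because the decision depends only on the section name, the same object file can carry bitcode and native code side by side without any change to the bitcode format.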