[llvm-dev] RFC: Safe Whole Program Devirtualization Enablement

Wed Dec 11 06:21:47 PST 2019

Please send any comments. As mentioned at the end, I will follow up with
some patches as soon as they are cleaned up and I have created some test
cases.

RFC: Safe Whole Program Devirtualization Enablement
===================================================

High Level Summary
------------------

The goal of the changes described in this RFC is to support aggressive
Whole Program Devirtualization without requiring -fvisibility=hidden at
compile time, by pre-enabling bitcode for whole program devirtualization,
but delaying the decision on whether to apply devirtualization until LTO
link time. This is needed both because we may not know whether the link
mode is safe for hidden LTO visibility until link time, and also to allow
bitcode objects to be shared between links of targets with differing valid
LTO visibility. This utilizes the !vcall_visibility metadata added for Dead
Virtual Function Elimination.

In summary, the required changes are as follows (each is described in more
detail later):

1) When -fwhole-program-vtables is specified, always insert type test
assumes for virtual calls, and additionally add !vcall_visibility metadata
to vtable definitions (which will be summarized in the ThinLTO index).

2) At LTO link time, apply hidden LTO visibility to vtable definition
vcall_visibility metadata (or summary) when specified by a new link option
(-lto-whole-program-visibility).

3) During the LTO link time Whole Program Devirtualization analysis, only
allow devirtualization when the associated vtable definitions have hidden
LTO visibility, as derived from the !vcall_visibility metadata (summarized
in the index for index-only WPD).

4) Modify the Virtual Function Elimination application in GlobalDCE to
ignore vtables with !vcall_visibility when they are associated with type
tests (and not just type checked loads).

Background
----------

Whole Program Devirtualization is supported for LTO (both regular and Thin)
via the -fwhole-program-vtables option. However, it can only be safely
applied to classes for which LTO can analyze the entire class hierarchy,
and therefore is restricted to those classes with hidden LTO visibility.
See https://clang.llvm.org/docs/LTOVisibility.html for more information.

The LTO visibility of a class is derived at compile time from the class’s
symbol visibility. Generally, only classes that are internal at the source
level (e.g. declared in an anonymous namespace) receive hidden LTO
visibility. Compiling with -fvisibility=hidden tells the compiler that,
unless otherwise marked, symbols are assumed to have hidden visibility,
which also implies that all classes have hidden LTO visibility (unless
decorated with a public visibility attribute). This results in much more
aggressive devirtualization.
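
As an illustration of the visibility rules above, here is a minimal C++
sketch. The class and function names are hypothetical and do not come from
this RFC, and the visibility attribute syntax is the GCC/Clang spelling.

```cpp
// Hypothetical classes for illustration only.

// Not internal and not annotated: public LTO visibility by default, but
// hidden LTO visibility when compiled with -fvisibility=hidden.
struct Shape {
  virtual ~Shape() = default;
  virtual int sides() const = 0;
};

namespace {
// Internal at the source level (anonymous namespace): hidden LTO
// visibility regardless of the -fvisibility setting.
struct Triangle : Shape {
  int sides() const override { return 3; }
};
} // namespace

// Decorated with a public visibility attribute: stays public even when
// the translation unit is compiled with -fvisibility=hidden.
struct __attribute__((visibility("default"))) Square : Shape {
  int sides() const override { return 4; }
};

// A virtual call site. Whether it may be devirtualized depends on the LTO
// visibility of every class that could be the dynamic type of S.
int countSides(const Shape &S) { return S.sides(); }

int triangleSides() {
  Triangle T;
  return countSides(T);
}
```

Only Triangle is guaranteed to have hidden LTO visibility without any
compile-time option, which is why default-visibility builds see so little
devirtualization.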

However, compiling with -fvisibility=hidden is only safe when we know we
are LTO linking with full view of the class hierarchy. Specifically, this
is true when a binary is being LTO linked with either all sources being
bitcode (so that the LTO unit is the same as the linkage unit), or when the
only translation units being linked as native code are known not to derive
from any classes defined in the LTO unit (e.g. system libraries). Additionally,
the binary may not dlopen any libraries at runtime that contain classes
derived from those defined in the main binary.

Assuming we are building and linking a binary that satisfies the above
constraints (we are LTO linking all translation units as bitcode, except
certain (e.g. system) libraries or other native objects known to be safe by
the user or build system, and the binary will not dlopen any libraries
deriving from the binary’s classes), then it should be safe to compile with
-fvisibility=hidden, along with -fwhole-program-vtables.

However, there are cases where it is unknown until link time whether we are
building a target that meets the above constraints. Additionally, we may
want to build additional targets that do not meet the criteria for safe
application of -fvisibility=hidden during the same build invocation
(specifically, because subsets of the code will be linked into shared
libraries instead of linking all code directly into the binary). Even if it
is possible to build two sets of bitcode object files (one with default
visibility for the unsafely linked targets and one with hidden visibility
for the safely linked targets), this causes duplication in both time and
space, which is prohibitive in an environment where it is common to build
targets with tens of thousands of sources, and multiple targets with
different link modes simultaneously.

The goals of the changes described in this RFC are to essentially delay the
application of -fvisibility=hidden until LTO link time, and allow bitcode
objects to be shared between links of targets with differing link modes and
therefore differing valid LTO visibility.

Type Information for Devirtualization
-------------------------------------

LTO whole program devirtualization is driven by type information in the
IR. This includes type metadata (on vtable definitions), as well as type
test intrinsics before virtual calls. The former is safe to emit into the
IR in all cases, but the latter is currently not. The virtual call sites
are decorated with an llvm.assume(llvm.type.test(ptr, typeid)) sequence,
which drives the LTO analysis of virtual calls. This sequence is an
assertion that the given pointer is associated with the given type
identifier (https://llvm.org/docs/LangRef.html#llvm-type-test-intrinsic).
It is currently inserted only for classes with hidden LTO visibility as the
implication of this sequence is that we have full visibility of that type’s
class hierarchy, and may devirtualize the call based on that knowledge.
This assumption is not valid if the class does not have hidden LTO
visibility.

In order to drive later devirtualization, we still need the type
compatibility information provided by the llvm.type.test, but want to delay
a decision on whether it is valid to assume that we have full class
hierarchy visibility, and thus whether devirtualization of that target can
be safely applied.

Specifically, what we want to know at LTO time is whether the vtable has
hidden LTO visibility or not, and use that to guide the application of
devirtualization to the type tested virtual call sites. By default, only
those with statically guaranteed hidden LTO visibility should be marked as
such. And as described later, at LTO link time we can optionally decide to
convert vtables to hidden LTO visibility for more aggressive
devirtualization when appropriate.

There is already a mechanism in the compiler to describe the vtable
visibility, which was recently added for Dead Virtual Function Elimination
(D63932): !vcall_visibility metadata, documented at
https://llvm.org/docs/TypeMetadata.html#vcall-visibility-metadata. This
metadata is attached to vtable definitions, currently only when VFE is
enabled. As described in the documentation, because this is currently only
used for VFE, it also requires that the corresponding function pointer
loads use the llvm.type.checked.load intrinsic. This would not be required
for devirtualization (although the VFE support in GlobalDCE will need
modification to ignore the metadata when type checked loads are not used;
more on that later).

This RFC proposes adding the !vcall_visibility metadata to vtable
definitions when -fwhole-program-vtables is specified. Unlike for VFE, the
function pointer loads can still use normal loads with corresponding type
test assume sequences (better for optimization).

Additional changes to the LTO compilation steps are detailed below.

Pre-Link LTO Compile
--------------------

First, type test assume sequences will be inserted when
-fwhole-program-vtables is specified, and not just for classes with hidden
LTO visibility.

Second, as mentioned earlier, the !vcall_visibility metadata will be
inserted under -fwhole-program-vtables. For the purposes of index-only WPD,
a single-bit flag indicating whether or not the vtable def has hidden LTO
visibility is added to the GVarFlags on the GlobalVarSummary. Note that we
can collapse the 3 enum values of the metadata down to a single bit,
because for the purposes of devirtualization, both
VCallVisibilityLinkageUnit and VCallVisibilityTranslationUnit can be
treated the same (we only need to have at least VCallVisibilityLinkageUnit
to devirtualize). The ModuleSummaryIndex builder will set this new flag
from the !vcall_visibility metadata on vtable definitions.
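
The collapse from three metadata values to one summary bit could look
roughly like the following standalone sketch. The enum mirrors the three
values named above; VTableSummaryFlags and summarizeVisibility are
hypothetical names for illustration, not the actual GVarFlags layout or
ModuleSummaryIndex builder code.

```cpp
#include <cstdint>

// Mirrors the three !vcall_visibility values discussed above.
enum class VCallVisibility : std::uint8_t {
  Public = 0,
  LinkageUnit = 1,
  TranslationUnit = 2,
};

// Hypothetical stand-in for the new single-bit flag on the vtable summary.
struct VTableSummaryFlags {
  bool HiddenLTOVisibility = false;
};

// For devirtualization, LinkageUnit and TranslationUnit are equivalent:
// both mean the vtable has (at least) hidden LTO visibility.
VTableSummaryFlags summarizeVisibility(VCallVisibility V) {
  VTableSummaryFlags Flags;
  Flags.HiddenLTOVisibility = (V != VCallVisibility::Public);
  return Flags;
}
```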

Finally, the VFE support in GlobalDCE (which is enabled by default and
currently triggers automatically in the presence of this metadata) will
need to be modified to ignore !vcall_visibility metadata inserted for
devirtualization only, i.e. when there are any type test assume sequences
for that Type ID. This should be straightforward, as we can scan the type
tests and remove any vtables decorated with compatible type ids from
VFESafeVTables. Note that this change will affect the invocation of
GlobalDCE both here in the pre-link LTO compile as well as later in the LTO
Backend (where it is applied to a broader set of vtables).
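
The filtering step described above can be modeled with standard containers,
as in the sketch below. This is not the GlobalDCE implementation: the
string-keyed maps, the type aliases, and the function name
dropTypeTestedVTables are assumptions made for illustration. Only the name
VFESafeVTables comes from the text above.

```cpp
#include <map>
#include <set>
#include <string>

using TypeId = std::string;     // e.g. "_ZTS1A" (assumed representation)
using VTableName = std::string; // e.g. "_ZTV1A" (assumed representation)

// Remove from VFESafeVTables every vtable that is compatible with a type
// id used by a type test assume sequence, so that VFE leaves vtables
// annotated for devirtualization only untouched.
void dropTypeTestedVTables(
    std::set<VTableName> &VFESafeVTables,
    const std::multimap<TypeId, VTableName> &TypeIdToVTables,
    const std::set<TypeId> &TypeTestedIds) {
  for (const TypeId &Id : TypeTestedIds) {
    auto Range = TypeIdToVTables.equal_range(Id);
    for (auto It = Range.first; It != Range.second; ++It)
      VFESafeVTables.erase(It->second);
  }
}
```

Note that a single type id can be compatible with several vtables (a base
class type id also matches the vtables of derived classes), which is why
the sketch uses a multimap.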

LTO Link Handling
-----------------

During Whole Program Devirtualization analysis, when looking at the vtables
corresponding to the summarized virtual calls during
tryFindVirtualCallTargets, we must consult the vcall_visibility
information. For hybrid (regular+thin) LTO, the vtable definitions are in
the regular LTO partition and so the IR can be consulted directly. For
index-only WPD, we instead consult the flag on the vtable’s
GlobalVarSummary.

If any of the vtable definitions compatible with a given virtual call have
public LTO visibility, the devirtualization must be skipped.
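
A minimal model of this check is sketched below. VTableInfo and
mayDevirtualize are hypothetical names, and the sketch covers only the
visibility gate, not the target resolution that
tryFindVirtualCallTargets also performs.

```cpp
#include <algorithm>
#include <vector>

enum class VCallVisibility { Public = 0, LinkageUnit = 1, TranslationUnit = 2 };

// Hypothetical view of one vtable definition compatible with a call site.
struct VTableInfo {
  VCallVisibility Visibility;
};

// Devirtualization is allowed only if no compatible vtable is public.
// This models the visibility check alone; an empty list is not treated
// specially here.
bool mayDevirtualize(const std::vector<VTableInfo> &CompatibleVTables) {
  return std::all_of(CompatibleVTables.begin(), CompatibleVTables.end(),
                     [](const VTableInfo &VT) {
                       return VT.Visibility != VCallVisibility::Public;
                     });
}
```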

By default, only classes that have statically determined hidden LTO
visibility would be allowed to devirtualize. However, as noted earlier, we
want to enable more aggressive devirtualization at LTO link time when we
know that the linking mode guarantees full LTO visibility of any code that
may derive classes from the bitcode being linked. To do so, we will add a
new linker option:

For lld, the proposed option is: -lto-whole-program-visibility.
For gold, the corresponding plugin option would be
“whole-program-visibility”.

When this option is set, LTO will convert all vtable definitions to have
hidden LTO visibility before invoking Whole Program Devirtualization. In
the hybrid LTO case this would mean changing the metadata on the IR. In the
index-only case this would be done in the summaries.
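
The effect of the option can be sketched as a pass over the vtable
visibility records before WPD runs. The names below are hypothetical, and
upgrading public vtables to linkage-unit visibility specifically is an
assumption of this sketch; the RFC only requires that they end up with
hidden LTO visibility.

```cpp
#include <vector>

enum class VCallVisibility { Public = 0, LinkageUnit = 1, TranslationUnit = 2 };

// Hypothetical record standing in for either the IR metadata (hybrid LTO)
// or the summary flag (index-only WPD).
struct VTableRecord {
  VCallVisibility Visibility;
};

// When the link option is set, give every public vtable hidden LTO
// visibility. Vtables that were already hidden are left unchanged.
void applyWholeProgramVisibility(std::vector<VTableRecord> &VTables,
                                 bool WholeProgramVisibility) {
  if (!WholeProgramVisibility)
    return;
  for (VTableRecord &VT : VTables)
    if (VT.Visibility == VCallVisibility::Public)
      VT.Visibility = VCallVisibility::LinkageUnit;
}
```

Because the upgrade happens at link time, the same bitcode objects can be
linked once with the option and once without it.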

LTO Backend Handling
--------------------

No changes are required in the LTO backend’s invocation of Whole Program
Devirtualization, since any visibility constraints are enforced at LTO link
time, and the loosening of visibility under the new link option only needs
to affect the LTO WPD invocation.

As mentioned earlier when describing the pre-link LTO compile changes,
GlobalDCE will be changed to ignore vtables with !vcall_visibility metadata
corresponding to type tests (and not just type checked loads).

Status
------

These changes have been prototyped and tested with index-only WPD, with the
exception of the proposed changes to GlobalDCE (for now I have been testing
with -enable-vfe=false). I will be cleaning up the changes and sending
patches for review in the coming days.

-- 
Teresa Johnson | Software Engineer | tejohnson at google.com |