<div dir="ltr"><div><font face="monospace">Please send any comments. As mentioned at the end, I will follow up with some patches as soon as they are cleaned up and I have created some test cases.</font></div><div><font face="monospace"><br></font></div><font face="monospace">RFC: Safe Whole Program Devirtualization Enablement<br>===================================================<br><br>High Level Summary<br>------------------<br><br>The goal of the changes described in this RFC is to support aggressive Whole Program Devirtualization without requiring -fvisibility=hidden at compile time, by pre-enabling bitcode for whole program devirtualization but delaying the decision on whether to apply devirtualization until LTO link time. This is needed both because we may not know until link time whether the link mode is safe for hidden LTO visibility, and to allow bitcode objects to be shared between links of targets with differing valid LTO visibility. This utilizes the !vcall_visibility metadata added for Dead Virtual Function Elimination (VFE).<br><br>The required changes are summarized below (each is described in more detail later):<br><br>1) When -fwhole-program-vtables is specified, always insert type test assumes for virtual calls, and additionally add !vcall_visibility metadata to vtable definitions (which will be summarized in the ThinLTO index).<br><br>2) At LTO link time, apply hidden LTO visibility to the !vcall_visibility metadata (or summary) of vtable definitions when requested by a new link option (-lto-whole-program-visibility).<br><br>3) During the LTO link time Whole Program Devirtualization analysis, only allow devirtualization when the associated vtable definitions have hidden LTO visibility, as derived from the !vcall_visibility metadata (summarized in the index for index-only WPD).<br><br>4) Modify the VFE support in GlobalDCE to ignore vtables with !vcall_visibility when they are associated with type tests (and not just type checked 
loads).<br><br>Background<br>----------<br><br>Whole Program Devirtualization is supported for LTO (both regular and Thin) via the -fwhole-program-vtables option. However, it can only be safely applied to classes for which LTO can analyze the entire class hierarchy, and it is therefore restricted to classes with hidden LTO visibility. See <a href="https://clang.llvm.org/docs/LTOVisibility.html">https://clang.llvm.org/docs/LTOVisibility.html</a> for more information.<br><br>The LTO visibility of a class is derived at compile time from the class’s symbol visibility. Generally, only classes that are internal at the source level (e.g. declared in an anonymous namespace) receive hidden LTO visibility. Compiling with -fvisibility=hidden tells the compiler that, unless otherwise marked, symbols have hidden visibility, which also implies that all classes have hidden LTO visibility (unless decorated with a public visibility attribute). This results in much more aggressive devirtualization.<br><br>However, compiling with -fvisibility=hidden is only safe when we know we are LTO linking with a full view of the class hierarchy. Specifically, this is true when a binary is LTO linked with all sources as bitcode (so that the LTO unit is the same as the linkage unit), or when the only translation units linked as native code are known not to derive from any classes defined in the LTO unit (e.g. system libraries). Additionally, the binary must not dlopen any libraries at runtime that contain classes derived from those defined in the main binary.<br><br>Assuming we are building and linking a binary that satisfies the above constraints (we are LTO linking all translation units as bitcode, except certain (e.g. 
system) libraries or other native objects known to be safe by the user or build system, and the binary will not dlopen any libraries deriving from the binary’s classes), then it should be safe to compile with -fvisibility=hidden, along with -fwhole-program-vtables.<br><br>However, there are cases where it is unknown until link time whether we are building a target that meets the above constraints. Additionally, during the same build invocation we may want to build other targets that do not meet the criteria for safe application of -fvisibility=hidden (specifically, because subsets of the code will be linked into shared libraries instead of directly into the binary). Even if it were possible to build two sets of bitcode object files (one with default visibility for the unsafely linked targets and one with hidden visibility for the safely linked targets), this would cause duplication in both build time and storage space, which is prohibitive in an environment where it is common to build targets with tens of thousands of sources, and to build multiple targets with different link modes simultaneously.<br><br>The goal of the changes described in this RFC is essentially to delay the application of -fvisibility=hidden until LTO link time, and to allow bitcode objects to be shared between links of targets with differing link modes, and therefore differing valid LTO visibility.<br><br>Type Information for Devirtualization<br>-------------------------------------<br><br>LTO whole program devirtualization is driven by type information in the IR. This includes type metadata (on vtable definitions), as well as type test intrinsics before virtual calls. The former is safe to emit into the IR in all cases, but the latter currently is not. The virtual call sites are decorated with an llvm.assume(llvm.type.test(ptr, typeid)) sequence, which drives the LTO analysis of virtual calls. 
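<br><br>To make the membership check concrete, here is a toy Python model of the type test. It is an illustrative sketch, not LLVM code: VTable, type_test, compatible_vtables and candidate_targets are hypothetical names, the mangled vtable and type names are only examples, and the address point offset of 16 assumes a typical 64-bit Itanium ABI vtable layout. A type test on a pointer succeeds when the vtable it points into carries type metadata for the tested type id at that offset, and the set of vtables compatible with a type id yields the possible callees that the LTO analysis inspects. The model deliberately ignores LTO visibility: the candidate set is only complete when the whole class hierarchy is visible to LTO.
<pre>
```python
from dataclasses import dataclass, field


@dataclass
class VTable:
    """Toy stand-in for a vtable global carrying !type metadata (hypothetical)."""

    name: str
    # Virtual function pointers laid out from the address point onward.
    slots: list
    # !type metadata entries as (byte offset into the global, type id) pairs.
    type_metadata: list = field(default_factory=list)


def type_test(vtable, offset, type_id):
    """Model of llvm.type.test: is vtable+offset a member of type_id's set?"""
    return (offset, type_id) in vtable.type_metadata


def compatible_vtables(vtables, type_id):
    """Every vtable carrying type metadata for type_id, i.e. every vtable
    that a pointer passing a type test on type_id could point into."""
    return [vt for vt in vtables if any(tid == type_id for _, tid in vt.type_metadata)]


def candidate_targets(vtables, type_id, slot):
    """Possible callees for a call through the given slot, sorted by name."""
    return sorted({vt.slots[slot] for vt in compatible_vtables(vtables, type_id)})


if __name__ == "__main__":
    # Illustrative vtables: Derived overrides f and is compatible with Base.
    base = VTable("_ZTV4Base", ["Base::f"], [(16, "_ZTS4Base")])
    derived = VTable(
        "_ZTV7Derived", ["Derived::f"], [(16, "_ZTS4Base"), (16, "_ZTS7Derived")]
    )
    print(type_test(derived, 16, "_ZTS4Base"))  # True
    print(candidate_targets([base, derived], "_ZTS4Base", 0))  # ['Base::f', 'Derived::f']
    print(candidate_targets([base, derived], "_ZTS7Derived", 0))  # ['Derived::f']
```
</pre>
A call guarded by a type test on _ZTS7Derived has a single candidate in this model, which is the situation where single-implementation devirtualization would apply; a test on _ZTS4Base has two.<br><br>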
The type test assume sequence is an assertion that the given pointer is associated with the given type identifier (<a href="https://llvm.org/docs/LangRef.html#llvm-type-test-intrinsic">https://llvm.org/docs/LangRef.html#llvm-type-test-intrinsic</a>). It is currently inserted only for classes with hidden LTO visibility, because the sequence implies that we have full visibility of that type’s class hierarchy and may devirtualize the call based on that knowledge. This assumption is not valid if the class does not have hidden LTO visibility.<br><br>In order to drive later devirtualization, we still need the type compatibility information provided by llvm.type.test, but we want to delay the decision on whether it is valid to assume full class hierarchy visibility, and thus whether devirtualization of the call can be safely applied.<br><br>Specifically, at LTO time we want to know whether or not the vtable has hidden LTO visibility, and to use that to guide the application of devirtualization to the type tested virtual call sites. By default, only vtables with statically guaranteed hidden LTO visibility should be marked as hidden. As described later, at LTO link time we can optionally decide to convert vtables to hidden LTO visibility for more aggressive devirtualization when appropriate.<br><br>There is already a mechanism in the compiler to describe vtable visibility, which was recently added for Dead Virtual Function Elimination (D63932): the !vcall_visibility metadata, documented at <a href="https://llvm.org/docs/TypeMetadata.html#vcall-visibility-metadata">https://llvm.org/docs/TypeMetadata.html#vcall-visibility-metadata</a>. This metadata is attached to vtable definitions, currently only when VFE is enabled. As described in the documentation, because this metadata is currently only used for VFE, it also requires that the corresponding function pointer loads use the llvm.type.checked.load intrinsic. 
This would not be required for devirtualization (although the VFE support in GlobalDCE will need to be modified to ignore the metadata when type checked loads are not used; more on that later).<br><br>This RFC proposes adding the !vcall_visibility metadata to vtable definitions when -fwhole-program-vtables is specified. Unlike for VFE, the function pointers can still be loaded with normal loads plus the corresponding type test assume sequences, which is better for optimization.<br><br>Additional changes to the LTO compilation steps are detailed below.<br><br>Pre-Link LTO Compile<br>--------------------<br><br>First, type test assume sequences will be inserted whenever -fwhole-program-vtables is specified, not just for classes with hidden LTO visibility.<br><br>Second, as mentioned earlier, the !vcall_visibility metadata will be inserted under -fwhole-program-vtables. For the purposes of index-only WPD, a single-bit flag indicating whether or not the vtable definition has hidden LTO visibility is added to the GVarFlags on the GlobalVarSummary. Note that we can collapse the three enum values of the metadata into a single bit because, for the purposes of devirtualization, VCallVisibilityLinkageUnit and VCallVisibilityTranslationUnit can be treated the same (we only need at least VCallVisibilityLinkageUnit to devirtualize). The ModuleSummaryIndex builder will set this new flag from the !vcall_visibility metadata on vtable definitions.<br><br>Finally, the VFE support in GlobalDCE (which is enabled by default and currently triggers automatically in the presence of this metadata) will need to be modified to ignore !vcall_visibility metadata that was inserted only for devirtualization, i.e. when there are any type test assume sequences for that type ID. This should be straightforward: we can scan the type tests and remove any vtables decorated with compatible type IDs from VFESafeVTables. 
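<br><br>The GlobalDCE filtering step can be modeled as a set computation. This is a minimal Python sketch under stated assumptions: filter_vfe_safe_vtables and its dictionary and set inputs are hypothetical stand-ins rather than GlobalDCE's real data structures, and only the name VFESafeVTables comes from the text. A vtable is excluded as soon as any type id in its type metadata is used by a type test, which also covers derived-class vtables, since they carry their base classes' type ids.
<pre>
```python
def filter_vfe_safe_vtables(vfe_safe_vtables, vtable_type_ids, type_tested_ids):
    """Return the VFE-safe vtables that remain after excluding any vtable
    whose type metadata shares a type id with an llvm.type.test call.

    vfe_safe_vtables: set of vtable names currently eligible for VFE.
    vtable_type_ids:  dict mapping a vtable name to the type ids in its !type metadata.
    type_tested_ids:  set of type ids used by type test assume sequences.
    """
    excluded = {
        vt
        for vt in vfe_safe_vtables
        if any(tid in type_tested_ids for tid in vtable_type_ids.get(vt, ()))
    }
    return vfe_safe_vtables - excluded


if __name__ == "__main__":
    safe = {"_ZTV1A", "_ZTV1B", "_ZTV1C"}
    type_ids = {
        "_ZTV1A": ["_ZTS1A"],
        "_ZTV1B": ["_ZTS1A", "_ZTS1B"],  # B derives from A
        "_ZTV1C": ["_ZTS1C"],  # only reached via type checked loads
    }
    # Calls on A are type tested, so the vtables of A and B are excluded.
    print(filter_vfe_safe_vtables(safe, type_ids, {"_ZTS1A"}))  # {'_ZTV1C'}
```
</pre>
Vtables removed from the set are then handled conservatively, so every virtual function they reference stays live, which is the safe outcome when the !vcall_visibility metadata exists only to support devirtualization.<br><br>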
Note that this GlobalDCE change will affect both its invocation here in the pre-link LTO compile and its later invocation in the LTO backend (where it is applied to a broader set of vtables).<br><br>LTO Link Handling<br>-----------------<br><br>During Whole Program Devirtualization analysis, when tryFindVirtualCallTargets looks at the vtables corresponding to the summarized virtual calls, we must consult the !vcall_visibility information. For hybrid (regular+thin) LTO, the vtable definitions are in the regular LTO partition, so the IR can be consulted directly. For index-only WPD, we instead consult the flag on the vtable’s GlobalVarSummary.<br><br>If any of the vtable definitions compatible with a given virtual call has public LTO visibility, devirtualization of that call must be skipped.<br><br>By default, only classes with statically determined hidden LTO visibility would be allowed to devirtualize. However, as noted earlier, we want to enable more aggressive devirtualization at LTO link time when we know that the link mode guarantees full LTO visibility of any code that may derive classes from the bitcode being linked. To do so, we will add a new linker option:<br><br>For lld, the proposed option is -lto-whole-program-visibility.<br>For gold, the corresponding plugin option would be “whole-program-visibility”.<br><br>When this option is set, LTO will convert all vtable definitions to have hidden LTO visibility before invoking Whole Program Devirtualization. In the hybrid LTO case this would mean changing the metadata in the IR. 
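<br><br>The link-time decision can be sketched in Python as well. The three enum values below follow the !vcall_visibility documentation, but summary_hidden_flag, apply_whole_program_visibility and can_devirtualize are hypothetical names, and a plain dict mapping vtable names to the proposed single-bit hidden flag stands in for the vtable summaries. The sketch shows the collapse of the metadata to one bit, the conversion performed when the new link option is set, and the rule that every vtable compatible with a call must be hidden before that call is devirtualized.
<pre>
```python
from enum import IntEnum


class VCallVisibility(IntEnum):
    """The three !vcall_visibility values from the TypeMetadata documentation."""

    PUBLIC = 0
    LINKAGE_UNIT = 1
    TRANSLATION_UNIT = 2


def summary_hidden_flag(visibility):
    """Collapse the metadata to the single summary bit: LinkageUnit and
    TranslationUnit both count as hidden LTO visibility."""
    return visibility != VCallVisibility.PUBLIC


def apply_whole_program_visibility(hidden_flags):
    """Model of the proposed link option: every vtable definition becomes hidden."""
    return {vtable: True for vtable in hidden_flags}


def can_devirtualize(compatible_vtables, hidden_flags):
    """Devirtualize only if every compatible vtable definition is hidden;
    a vtable with no recorded flag is conservatively treated as public."""
    return all(hidden_flags.get(vt, False) for vt in compatible_vtables)


if __name__ == "__main__":
    flags = {
        "_ZTV4Base": summary_hidden_flag(VCallVisibility.PUBLIC),
        "_ZTV7Derived": summary_hidden_flag(VCallVisibility.LINKAGE_UNIT),
    }
    call = ["_ZTV4Base", "_ZTV7Derived"]  # vtables compatible with one call site
    print(can_devirtualize(call, flags))  # False: Base is public
    flags = apply_whole_program_visibility(flags)
    print(can_devirtualize(call, flags))  # True
```
</pre>
Treating a vtable with no recorded flag as public is an assumption of this sketch, chosen so that missing information never enables devirtualization.<br><br>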
In the index-only case, the conversion would instead be done in the summaries.<br><br>LTO Backend Handling<br>--------------------<br><br>No changes are required in the LTO backend’s invocation of Whole Program Devirtualization, since any visibility constraints are enforced at LTO link time, and the loosening of visibility under the new link option only needs to affect the LTO link time WPD invocation.<br><br>As mentioned earlier when describing the pre-link LTO compile changes, GlobalDCE will be changed to ignore vtables with !vcall_visibility metadata corresponding to type tests (and not just type checked loads).<br><br>Status<br>------<br><br>These changes have been prototyped and tested with index-only WPD, with the exception of the proposed changes to GlobalDCE (for now I have been testing with -enable-vfe=false). I will be cleaning up the changes and sending patches for review in the coming days.</font><br><br>-- <br>Teresa Johnson | Software Engineer | <a href="mailto:tejohnson@google.com">tejohnson@google.com</a> |</div>