[llvm-dev] RFC: dynamic_cast optimization in LTO

Mon Apr 6 09:15:09 PDT 2020

Hi Wael,

Sorry for the slow reply. +Peter, who is a good person to comment as well.

This sounds interesting, and very related to the analysis needed for
WholeProgramDevirt. What kind of gains do you see from the optimization in
practice?

This is essentially the same type of analysis, with some of the same
constraints on type metadata etc., that we need to do for WPD. Is your patch
implementing this for regular LTO, ThinLTO, or both? You could take a
look at WPD to see how it implements this. It handles WPD for pure regular
LTO, for pure ThinLTO (single-implementation devirtualization only), and for
a hybrid mode where modules are split and the vtables are all in a regular
LTO module.

Specifically, on the two issues you mention:

>  1. the !type MD gets removed by some pass which will erase the evidence
that class types, corresponding to the VFTs that were listed in the MD, are
non-leaf.

We also have this constraint for WPD, so it shouldn't be a problem here.

>  2. the supposedly leaf class is actually derived from in a shared
library, and the transformation would become invalid.
>    I'm hoping this problem is not unique to my situation, and there must
be an existing solution to such a scenario. For example, bail out if we
know we're linking any shared libraries or if we're producing a shared
library.

We have this issue as well for WPD. It is handled in a couple of ways. The
first is that by default only vtables with hidden LTO visibility are
considered. See https://clang.llvm.org/docs/LTOVisibility.html for more
info on that. Secondly, I recently added a mechanism that allows refining
the LTO visibility to hidden at link time, if it is known then that the LTO
link is safe from this constraint. It leverages the vcall_visibility
metadata that was added for another whole-program vtable optimization (Dead
Virtual Function Elimination). See the discussion on the RFC:
http://lists.llvm.org/pipermail/llvm-dev/2019-December/137543.html, which
was subsequently implemented upstream with these patches:
  D71907: [WPD/VFE] Always emit vcall_visibility metadata for -fwhole-program-vtables
  D71911: [ThinLTO] Summarize vcall_visibility metadata
  D71913: [LTO/WPD] Enable aggressive WPD under LTO option
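As a rough illustration of how these two mechanisms could gate the proposed dynamic_cast fold, here is a minimal sketch. It is not the WPD code: the function `mayTreatAsLeaf` and its parameters are hypothetical, and the enum only mirrors the three levels (public, linkage unit, translation unit) of LLVM's vcall_visibility metadata.

```cpp
// Hypothetical gate for the dynamic_cast fold, modeled on how WPD guards
// against classes being derived from outside the LTO unit. The enum mirrors
// the three vcall_visibility levels; all names here are illustrative.
enum class VCallVisibility { Public, LinkageUnit, TranslationUnit };

// leafInLTOUnit: no other vtable in the LTO unit carries this class's type id.
// wholeProgramVisibility: the link has asserted that nothing outside the LTO
// unit (e.g. a shared library) can derive from classes inside it.
bool mayTreatAsLeaf(bool leafInLTOUnit, VCallVisibility vis,
                    bool wholeProgramVisibility) {
  if (!leafInLTOUnit)
    return false;
  // By default, only classes with hidden LTO visibility are trusted.
  if (vis != VCallVisibility::Public)
    return true;
  // Public classes are trusted only if visibility is refined at link time.
  return wholeProgramVisibility;
}
```

The design choice here is to stay conservative: a class that is a leaf within the LTO unit is still not folded unless its visibility (or an explicit link-time assertion) rules out derivations in shared libraries.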

Thanks,
Teresa

On Mon, Apr 6, 2020 at 8:54 AM Wael Yehia via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> Quiet Ping (thanks)
>
>
> Hi,
> There was a mention of optimizing away C++ dynamic_casts in LTO in this
> presentation: https://www.youtube.com/watch?v=Fd3afoM3UOE&t=1306
> I couldn't find any discussion on llvm-dev pertaining to this optimization.
>
> What is the optimization (TL;DR version):
> The transformation tries to convert a __dynamic_cast function call into an
> address comparison and VFT-like lookup when the following conditions are
> met:
>   1. the destination type is a leaf type, i.e. it is never derived from
> (similar to C++ final semantics) in the entire program.
>   2. the static type of the expression being cast is a public base
> (potentially multi-base and never private) class of the destination type.
>
> Example:
> Given the C++ expression:
>    NULL != dynamic_cast<A*>(ptr)   // where B* ptr;
> which, coming out of clang, would look like:
>   NULL != __dynamic_cast(ptr,
>                          &_ZTI1B, // typeinfo of B, the static type of ptr.
>                          &_ZTI1A, // typeinfo of A, the destination type.
>                          hint)    // a static hint about the location of the
>                                   // source subobject w.r.t. the complete object.
>
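> If the above conditions can be proven to be true, then for a non-null ptr
> an equivalent expression is:
>     (destType == dynamicType) where:
>         const std::type_info *destType    = &_ZTI1A;
>         const std::type_info *dynamicType = (*(const std::type_info ***)ptr)[-1];
>     // i.e. load the vptr from the object, then read the typeinfo pointer
>     // stored in the slot just before the vptr's address point.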
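The equivalence behind the fold can be checked at the source level with a small, self-contained sketch. The hierarchy is hypothetical, and a standard `typeid` comparison stands in for the raw typeinfo-pointer comparison the pass would emit in IR; marking A `final` lets the compiler enforce the leaf condition that the LTO pass would otherwise have to prove.

```cpp
#include <typeinfo>

// Hypothetical hierarchy mirroring the example: B is the static type of ptr,
// A is the destination type. `final` stands in for the whole-program proof
// that A is a leaf.
struct B { virtual ~B() = default; };
struct A final : public B {};
struct C : public B {};  // a sibling, so some B* do not point into an A

// What the source program writes.
bool viaDynamicCast(B *ptr) { return dynamic_cast<A *>(ptr) != nullptr; }

// The folded form: since A is a leaf and B is a public base of A, the cast
// succeeds exactly when the dynamic type of *ptr is A. The null check is
// needed because typeid(*ptr) throws std::bad_typeid for a null ptr.
bool viaTypeCompare(B *ptr) {
  return ptr != nullptr && typeid(*ptr) == typeid(A);
}
```

For any B* (pointing to an A, a C, a plain B, or null) the two functions agree, which is the property the transformation relies on.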
>
>
> Detailed description:
> A C++ dynamic_cast<A*>(ptr) expression can either
>     (1) be folded by the FE into a static_cast, or
>     (2) be converted to a runtime call to __dynamic_cast if the FE does not
>         have enough information (which is the common case for dynamic_cast).
>
> The crux of the transformation is trying to prove that a type is a leaf.
> We utilize the !type metadata (https://llvm.org/docs/TypeMetadata.html)
> that is attached to the virtual function table (VFT) globals to answer this
> question.
> For each VFT, the !type MD lists the other VFTs that are "compatible" with
> it. In general, the VFT of a class B is considered to be "compatible" with
> the VFT of a class A, iff A derives (publicly or privately) from B.
> This means that the VFT of a leaf class type is never compatible with any
> other VFT, and we use this fact to decide which type is a leaf.
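The leaf test described above can be sketched over a simplified model of the metadata. This is an assumption-laden illustration, not the patch: `TypeMetadata` and `findLeafTypes` are hypothetical, offsets are ignored, and each vtable is assumed to carry its own class's type identifier plus those of its bases (clang uses mangled `_ZTS` names as identifiers).

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified model of !type metadata: vtable global name ->
// type identifiers attached to it (own class plus all bases, offsets dropped).
using TypeMetadata = std::map<std::string, std::vector<std::string>>;

// ownType maps each vtable global to the identifier of its own class.
// A class is treated as a leaf iff its identifier appears on no vtable other
// than its own, i.e. no class in the module derives from it.
std::set<std::string>
findLeafTypes(const TypeMetadata &md,
              const std::map<std::string, std::string> &ownType) {
  std::set<std::string> nonLeaf;
  for (const auto &[vtable, ids] : md) {
    const std::string &self = ownType.at(vtable);
    for (const std::string &id : ids)
      if (id != self)
        nonLeaf.insert(id);  // id names a base of self's class
  }
  std::set<std::string> leaves;
  for (const auto &[vtable, self] : ownType)
    if (nonLeaf.count(self) == 0)
      leaves.insert(self);
  return leaves;
}
```

For the running example, `_ZTV1A` carrying `_ZTS1A` and `_ZTS1B` marks B as non-leaf, while A, whose identifier appears only on its own vtable, is reported as a leaf.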
> The second fact that we need to prove is the accessibility of the base
> type in the derived object.
> Unfortunately we couldn't find a way to compute this information from the
> existing IR, so we introduced a custom attribute that the frontend places
> on the __dynamic_cast call. The presence of the attribute implies that the
> static type (B in our example) is a public base class, and never a private
> base class (in case there are multiple subobjects of the static type inside
> the complete object), of the destination type (A in our example). Hence, if
> the attribute gets deleted by some pass, our transformation simply does
> nothing for that __dynamic_cast call.
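A small hypothetical example shows why the public-base guarantee is needed in addition to the leaf property: when B is a private base of the leaf class A, the dynamic type of the object is still A, yet dynamic_cast must return null, so folding to a plain typeinfo comparison would be wrong.

```cpp
#include <typeinfo>

// Hypothetical hierarchy: A is a leaf, but B is a *private* base of A.
struct B { virtual ~B() = default; };
struct A final : private B {
  // Only A itself may convert A* to B*, because the base is private.
  B *asBase() { return this; }
};

// What dynamic_cast actually computes: it only follows public
// inheritance paths, so it fails for a private base subobject.
bool castSucceeds(B *ptr) { return dynamic_cast<A *>(ptr) != nullptr; }

// What a naive fold would compute (ptr must be non-null here).
bool dynamicTypeIsA(B *ptr) { return typeid(*ptr) == typeid(A); }
```

For `A a; B *p = a.asBase();`, `dynamicTypeIsA(p)` is true while `castSucceeds(p)` is false, which is exactly the case the frontend attribute is meant to exclude.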
>
> There are two issues that I could think of that might cause a problem in
> our approach:
>  1. the !type MD gets removed by some pass which will erase the evidence
> that class types, corresponding to the VFTs that were listed in the MD, are
> non-leaf.
>  2. the supposedly leaf class is actually derived from in a shared
> library, and the transformation would become invalid.
>     I'm hoping this problem is not unique to my situation, and there must
> be an existing solution to such a scenario. For example, bail out if we
> know we're linking any shared libraries or if we're producing a shared
> library.
>
> Questions:
> 1. Is there interest in adding such an optimization pass to the LTO
> pipeline?
> 2. We implemented the optimization locally and are interested in
> upstreaming it. However, from what I read, the community prefers that we
> don't just post a patch and expect it to be reviewed and approved. So this
> RFC is to get comments on the approach we've taken and on whether there's
> room for improvement (if the approach is correct).
> Specifically, I would appreciate comments from the people on the AMD
> compiler team, since they are the ones who presented the optimization.
>
> Thanks.
>
> Wael Yehia
> Compiler Development
> IBM Canada Lab
>

-- 
Teresa Johnson |  Software Engineer |  tejohnson at google.com |