<html><head></head><body><div class="ydp5634c5e4yahoo-style-wrap" style="font-family: Helvetica Neue, Helvetica, Arial, sans-serif; font-size: 13px;"><div></div>

        <div dir="ltr" data-setdir="false">Quiet Ping (thanks)</div><div dir="ltr" data-setdir="false"><br></div><div dir="ltr" data-setdir="false"><div><br>Hi,<br>There was a mention of optimizing away C++ dynamic_casts in LTO in this presentation: https://www.youtube.com/watch?v=Fd3afoM3UOE&t=1306<br>I couldn't find any discussion on llvm-dev pertaining to this optimization.<br><br>What is the optimization (TL;DR version):<br>The transformation tries to convert a __dynamic_cast function call into an address comparison and a VFT-like lookup when the following conditions are met:<br>  1. the destination type is a leaf type, i.e. it is never derived from (similar to C++ final semantics) anywhere in the entire program.<br>  2. the static type of the expression being cast is a public base class of the destination type (it may appear as multiple base subobjects, but never as a private base).<br><br>Example:<br>Given the C++ expression:<br>   NULL != dynamic_cast&lt;A*&gt;(ptr)   // where B* ptr;<br>which, coming out of clang, looks like this:<br>
  NULL != __dynamic_cast(ptr,<br>                          &_ZTI1B, // typeinfo of B, the static type of ptr.<br>                          &_ZTI1A, // typeinfo of A, the destination type.<br>                          hint)    // a static hint about the location of the source subobject w.r.t. the complete object.<br>               <br>If the above conditions can be proven to be true, then an equivalent expression is:<br>    (destType == dynamicType) where: std::type_info *destType = &_ZTI1A;<br>                                     std::type_info *dynamicType = (*(void***)ptr)[-1]; // slot -1 of the VFT that ptr's vptr points to<br> <br><br>Detailed description:<br>A C++ dynamic_cast&lt;A*&gt;(ptr) expression can either<br>    (1) be folded by the FE into a static_cast, or<br>    (2) be converted to a runtime call to __dynamic_cast if the FE does not have enough information (which is the common case for dynamic_cast).<br><br>The crux of the transformation is proving that a type is a leaf.<br>We utilize the !type metadata (https://llvm.org/docs/TypeMetadata.html) that is attached to the virtual function table (VFT) globals to answer this question.<br>For each VFT, the !type MD lists the other VFTs that are "compatible" with it. In general, the VFT of a class B is considered to be "compatible" with the VFT of a class A iff A derives (publicly or privately) from B.<br>This means that the VFT of a leaf class type is never compatible with any other VFT, and we use this fact to decide which type is a leaf.<br>The second fact that we need to prove is the accessibility of the base type in the derived object.<br>Unfortunately, we couldn't find a way to compute this information from the existing IR, and had to introduce a custom attribute that the frontend places on the __dynamic_cast call. 
The presence of the attribute implies that the static type (B in our example) is a public base class of the destination type (A in our example), and never a private one, even when the complete object contains multiple subobjects of the static type. Hence, if the attribute gets deleted by some pass, our transformation will simply do nothing for that __dynamic_cast call.<br><br>I can think of two issues that might cause problems for our approach:<br> 1. The !type MD gets removed by some pass, which erases the evidence that the class types corresponding to the VFTs listed in the MD are non-leaf.<br> 2. The supposedly leaf class is actually derived from in a shared library, which would make the transformation invalid.<br>    I'm hoping this problem is not unique to my situation and that there is an existing solution for such a scenario. For example, bail out if we know we're linking any shared libraries or if we're producing a shared library.<br><br>Questions:<br>1. Is there interest in adding such an optimization pass to the LTO pipeline?<br>2. We implemented the optimization locally and are interested in upstreaming it. However, from what I have read, the community prefers that we don't just post a patch and expect it to be reviewed and approved. So this RFC is to get comments on the approach we've taken and on whether there's room for improvement (assuming the approach is correct).<br>Specifically, I would appreciate comments from people on the AMD compiler team, since they are the ones who presented the optimization.<br><br>Thanks.<br><br>Wael Yehia<br>Compiler Development<br>IBM Canada Lab<br><br></div><div><br></div></div></div></body></html>