<div dir="ltr"><div style>I'm looking into how we can improve devirtualization in clang, and there is a C++ language feature I'd like to take advantage of which would let us eliminate more vptr loads. In this code:</div>
<br> Cls *p = new Cls;<br> p->virtual_method1();<br> p->method_changing_vptr(); // uses placement new to legally change the vptr<br> p->virtual_method2(); // invalid!<br> Cls *q = p;<br> q->virtual_method2(); // this must get a new vptr lookup.<br>
<br>there is no need to reload p's vptr, even if the method did update the vptr.<br><br>C++ [basic.life] gives us a guarantee that a pointer will only update to point to a new object allocated at the same place in memory under certain circumstances. If the C++ code uses the same pointer, reference or name to refer to the object, then we can prove that the vptr and any const members did not change.<br>
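<br>To make the [basic.life] point concrete, here is a minimal, self-contained sketch. The names Base, Derived and run_scenario are hypothetical and are not part of the proposal. It reuses storage for an object of a different dynamic type and only touches the new object through the pointer returned by placement new, which is the case where a fresh vptr load is required. It deliberately never uses the old pointer after the storage is reused, since that is exactly the invalid access shown in the snippet earlier.<br>

```cpp
#include <cassert>
#include <new>
#include <string>

struct Base {
    virtual ~Base() {}
    virtual std::string name() const { return "Base"; }
};

struct Derived : Base {
    std::string name() const override { return "Derived"; }
};

// Hypothetical helper: records which override each virtual call reached.
std::string run_scenario() {
    // Storage large and aligned enough for either type.
    static_assert(sizeof(Derived) >= sizeof(Base), "storage must fit both");
    alignas(Derived) unsigned char storage[sizeof(Derived)];

    Base *p = new (storage) Base;
    std::string seen = p->name();     // vptr load through p
    p->~Base();                       // lifetime of the Base object ends

    // A new object with a different dynamic type now lives at the same
    // address. Only the pointer returned by placement new may be used for it.
    Base *q = new (storage) Derived;
    assert(static_cast<void *>(q) == static_cast<void *>(storage));
    seen += "," + q->name();          // needs a fresh vptr load through q

    q->~Base();                       // virtual destructor destroys the Derived
    return seen;
}
```

Calling run_scenario() yields "Base,Derived": the two calls go through different pointers, so nothing lets the compiler assume they share a vptr.<br>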
<br>I'd like clang to compute whether a load is eligible for this treatment per the rules in C++, and encode that in LLVM IR for further optimization. (Note that this is different from the @llvm.invariant.* intrinsics because method_changing_vptr may be inlined and is required to see the updated vptr.)<br>
<br>To implement this, I propose a new intrinsic in LLVM:<br><br> declare {}* @llvm.load.group()<br><br>and new metadata on loads:<br><br> !load.group %group<br><br>where %group must be the result of a call to llvm.load.group. Any two loads with the same %group value are known to produce the same value. We can then choose to eliminate the later of the two loads as redundant in GVN, or in the event of high register pressure we could choose to reload instead of spilling to the stack.<br>
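<br>As a rough illustration of the rule "same %group means same value", here is a toy model in plain C++. It is not LLVM code: ToyLoad, kNoGroup and forward_grouped_loads are hypothetical names, and the sketch assumes straight-line code, so it ignores the dominance checks a real GVN would need.<br>

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

const int kNoGroup = -1;  // hypothetical marker: load carries no !load.group

// Toy stand-in for a load instruction; not an LLVM class.
struct ToyLoad {
    int group;  // models the %group token attached via !load.group
};

// For each load, returns the index of the load whose value it can reuse,
// or its own index if the load must actually be performed.
std::vector<std::size_t> forward_grouped_loads(const std::vector<ToyLoad> &loads) {
    std::unordered_map<int, std::size_t> first_in_group;
    std::vector<std::size_t> source(loads.size());
    for (std::size_t i = 0; i < loads.size(); ++i) {
        source[i] = i;
        if (loads[i].group == kNoGroup) continue;
        auto r = first_in_group.insert(std::make_pair(loads[i].group, i));
        if (!r.second) {
            // An earlier load with the same group exists; reuse its value.
            assert(r.first->second < i);
            source[i] = r.first->second;
        }
    }
    return source;
}
```

For the snippet at the top of this mail, the two vptr loads through p would share one group and the load through q would get its own, i.e. groups {0, 0, 1}. The model then forwards the second load to the first and keeps the third.<br>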
<br>For clang, let us say that two expressions E1 and E2 of the same type denote the same value if:<br><br> * E1 and E2 both name the same variable, which is either of class or reference-to-class type or const pointer-to-class type, or is of non-const pointer type and is known not to have changed between the evaluations of E1 and E2. (By the introductory text in [basic.life]).<br>
* E1 and E2 are of the form E3.x and E4.x, or E3->x and E4->x, or E3[x] and E4[x], for the same x, and E3 and E4 denote the same value, and the denoted subobject either has a const-qualified type or is a reference. (By bullet 3, ibid).<br>
* (Fudging a bit on E1 and E2 being expressions...) E1 and E2 are both references to the same vptr slot. (By bullet 2 or 4, ibid).<br><div><br></div><div style>Let me know if you think this design can be improved, or if there are cases it doesn't handle (or gets wrong). An explicit non-goal is the "constructor not defined in TU" problem.</div>
<div style><br></div><div style>Nick</div></div>