On 3 October 2012 13:30, David Blaikie <span dir="ltr"><<a href="mailto:dblaikie@gmail.com" target="_blank">dblaikie@gmail.com</a>></span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<div class="HOEnZb"><div class="h5">On Wed, Oct 3, 2012 at 12:01 PM, Matthieu Monrocq<br>

<<a href="mailto:matthieu.monrocq@gmail.com">matthieu.monrocq@gmail.com</a>> wrote:<br>

> On Sun, Sep 30, 2012 at 5:04 PM, David Blaikie <<a href="mailto:dblaikie@gmail.com">dblaikie@gmail.com</a>> wrote:<br>

>> I'm not sure whether this is the exact problem at hand in your example,<br>

>> but one of the major hurdles llvm suffers when trying to devirtualize<br>

>> is the second point you made: it doesn't see the invariance of the<br>

>> table pointer post construction. In your specific example the<br>

>> constructors are trivial and inlinable so it I'm not sure why llvm<br>

>> would be having trouble proving the value of the vptr &<br>

>> devirtualizing. , perhaps due to them being static so the<br>

>> initialization is contingent on it being the first call (& llvm doesn't<br>

>> know that the vptr is constant after construction until destruction<br>

>> begins) & doesn't see the connection across all calls.<br>

>><br>

>> So there are a few issues that need to be addressed to help this.<br>

>><br>

>> One is some kind of constant range where the front end can promise not<br>

>> to modify certain values for a range (this might not be correct though<br>

>> - I remember some argument as to whether it was valid to destroy and<br>

>> then placement new the same type into that memory before the object<br>

>> goes out of scope - if so, then it's not obvious if the vptr is<br>

>> constant in any range) & secondly (at least for the case of non inline<br>

>> constructors) the ability to provide some kind of assert that, post<br>

>> construction, the vptr is equal to some known constant. (this assertion<br>

>> is currently possible with a branch to unreachable, except llvm throws<br>

>> out the unreachable code in SimplyCFG before any optimizations run - so<br>

>> we need to see if we can keep them around longer - there are some PRs<br>

>> filed for this but I haven't made much progress on it<br>

>> From: Matthieu Monrocq<br>

>> Sent: 9/29/2012 4:41 PM<br>

>> To: <a href="mailto:cfe-dev@cs.uiuc.edu">cfe-dev@cs.uiuc.edu</a>; llvmdev<br>

>> Subject: [cfe-dev] Inlining and virtualization in Clang/LLVM<br>

>> Hello,<br>

>><br>

>> at the moment the devirtualization of calls is performed in Clang (as<br>

>> far as I understand) whilst inlining and constant propagation are the<br>

>> optimizer's (LLVM) job.<br>

>><br>

>> It is probably necessary for Clang to perform "some" devirtualization<br>

>> (the meaning of `final` is not known to LLVM), however all the stuff<br>

>> to determine what the dynamic type of a variable is seems redundant<br>

>> with LLVM, and is incomplete (in a way) when it's not perform *after*<br>

>> inlining and constant propagation.<br>

>><br>

>><br>

>> It seems to me therefore that because of this we are missing<br>

>> optimization opportunities. Suppose the following example program:<br>

>><br>

>> #include <cstdio><br>

>><br>

>> struct Base { virtual void foo() = 0; };<br>

>><br>

>> struct NothingDerived: Base { virtual void foo() {} };<br>

>><br>

>> struct PrintDerived: Base { virtual void foo() { printf("Hello World!"); } };<br>

>><br>

>> Base& select(int i) {<br>

>>     static NothingDerived nd;<br>

>>     static PrintDerived pd;<br>

>>     if (i % 2 == 0) { return nd; }<br>

>>     return pd;<br>

>> }<br>

>><br>

>> int main() {<br>

>>     Base& b = select(0);<br>

>>     b.foo();<br>

>> }<br>

>><br>

>><br>

>> Which gives the following main function (using Try out LLVM and Clang):<br>

>><br>

>> define i32 @main() uwtable {<br>

>> [...]<br>

>><br>

>> _Z6selecti.exit:                                  ; preds = %13, %10, %7<br>

>>   %14 = load void (%struct.Base*)*** bitcast (%struct.NothingDerived*<br>

>> @_ZZ6selectiE2nd to void (%struct.Base*)***), align 8<br>

>>   %15 = load void (%struct.Base*)** %14, align 8<br>

>>   tail call void %15(%struct.Base* getelementptr inbounds<br>

>> (%struct.NothingDerived* @_ZZ6selectiE2nd, i64 0, i32 0))<br>

>>   ret i32 0<br>

>> }<br>

>><br>

>><br>

>> LLVM trivially sees through the select call and rightly deduce that we<br>

>> have to do with NothingDerived. However it does not go any step<br>

>> further and directly select NothingDerived::foo's function. Instead it<br>

>> dutifully performs all the bitcasting / pointer arithmetic necessary<br>

>> to access the pointer to function stored in the VTable and calls it<br>

>> through a pointer to function.<br>

>><br>

>> I understand it would be awkward to have LLVM be aware of the virtual<br>

>> table implementation. Especially since even in C++ it varies from one<br>

>> implementation to another. However it seems to me that LLVM could<br>

>> still perform this optimization:<br>

>><br>

>>  - LLVM having deduced the exact value to use (select(int)::nd) should<br>

>> be able to get directly to its v-ptr (the first field of Base)<br>

>><br>

>> %struct.NothingDerived = type { %struct.Base }<br>

>> %struct.Base = type { i32 (...)** }<br>

>><br>

>><br>

>>  - the v-ptr (after construction) always point to the same v-table,<br>

>> which is a constant<br>

>><br>

>> store i32 (...)** bitcast (i8** getelementptr inbounds ([3 x i8*]*<br>

>> @_ZTV14NothingDerived, i64 0, i64 2) to i32 (...)**), i32 (...)***<br>

>> getelementptr inbounds (%struct.NothingDerived* @_ZZ6selectiE2nd, i64<br>

>> 0, i32 0, i32 0), align 8<br>

>><br>

>><br>

>> - the offset in the v-table is "static"<br>

>><br>

>> getelementptr inbounds (%struct.NothingDerived* @_ZZ6selectiE2nd, i64 0, i32 0)<br>

>><br>

>><br>

>> - the v-table being constant, what is stored at that offset is<br>

>> perfectly deducible as well<br>

>><br>

>> @_ZTV14NothingDerived = linkonce_odr unnamed_addr constant [3 x i8*]<br>

>> [i8* null, i8* bitcast ({ i8*, i8*, i8* }* @_ZTI14NothingDerived to<br>

>> i8*), i8* bitcast (void (%struct.NothingDerived*)*<br>

>> @_ZN14NothingDerived3fooEv to i8*)]<br>

>><br>

>><br>

>> So the question is, what is lacking for LLVM to perform this optimization ?<br>

>><br>

>> - Is it because of the loss of information in having the v-table<br>

>> stored as a "blob" of bytes ? (which means that Clang should pass more<br>

>> typed information, without changing the exact layout obviously given<br>

>> the ABI constraints)<br>

>><br>

>> - Or is it something internal to LLVM (the information is somehow<br>

>> irremediably lost) ?<br>

>><br>

>><br>

>> I admit that reducing the virtual call overhead is probably not really<br>

>> worth it (in general), however devirtualizing calls also expose more<br>

>> inlining/context opportunities and it's hard (for me) to quantify what<br>

>> such an optimization could bring here. We should also consider the<br>

>> simplification in Clang (and other frontends) if LLVM could perform<br>

>> the job on its own.<br>

>><br>

>> -- Matthieu<br>

>> _______________________________________________<br>

>> cfe-dev mailing list<br>

>> <a href="mailto:cfe-dev@cs.uiuc.edu">cfe-dev@cs.uiuc.edu</a><br>

>> <a href="http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev</a><br>

><br>

> Hello David,<br>

><br>

> I have been thinking a bit more about this and as I find the problem<br>

> quite interesting.<br>

><br>

> If I get down one level (and revert to implementing objects in C):<br>

><br>

> #include <stdio.h><br>

><br>

> typedef void (*Function)();<br>

><br>

> void print() { printf("Hello, World!"); }<br>

> void nothing() {}<br>

><br>

> Function get(int i) {<br>

>   if (i % 2 == 0) { return &print; }<br>

>   return &nothing;<br>

> }<br>

><br>

> int main() {<br>

>   Function f = get(2);<br>

>   (*f)();<br>

>   return 0;<br>

> }<br>

><br>

> Which is admittedly quite similar, generates the following main:<br>

><br>

> define i32 @main() nounwind uwtable {<br>

>   %1 = tail call i32 (i8*, ...)* @printf(i8* getelementptr inbounds<br>

> ([14 x i8]* @.str, i64 0, i64 0)) nounwind<br>

>   ret i32 0<br>

> }<br>

><br>

> This falls out naturally from SSA form, I think. No "constification"<br>

> is necessary.<br>

><br>

> However in our case, it seems that the SSA form (which only references<br>

> the pointer to the structure itself, not the v-table pointer), is not<br>

> sufficient therefore to allow the optimizer to remember which value<br>

> the pointer should have.<br>

><br>

> Of course, in general I agree that the compiler should be told<br>

> explicitly that the value cannot change (otherwise it would not know<br>

> that opaque calls are not accessing that particular value within the<br>

> structure); at least until the start of the destructor.<br>

><br>

> I remember a discussion a while ago involving the design and<br>

> implementation of those "const-ranges" within the LLVM IR, do you<br>

> happen to know where it's at ?<br>

<br>

</div></div>I'm not sure where that is, but I've +nlewycky because he's been<br>

thinking about this recently & there are some wrinkles as I alluded to<br>

in my reply. Saying that the vtable pointer is const from<br>

post-construction to pre-destruction isn't quite correct & would break<br>

certain obscure but valid programs. That being said, there is some<br>

kind of lifetime marker we could create & use to indicate the<br>

semantics of the vtable pointer without breaking those programs. (key<br>

here is that C++ cares about how you derive the pointer to the object,<br>

not just its value - if you explicitly destroy an object then<br>

placement new another object over the same space, the implementation<br>

is allowed to assume that the old pointers you had to the object still<br>

point to the same kind of object - but the one returned from placement<br>

new isn't, even though those pointers may have the same value...<br>

that's my understanding/vague description of the issue)<br></blockquote><div><br></div><div>Right. I've got a proposal in PR13940 that Chris has asked me to mail out to llvmdev. I should do that soon, but I wanted to wait until I would have the time to reply to the comments that came up and possibly even implement the result.</div>


<div><br></div><div>Nick</div><div><br></div></div>