[cfe-dev] Virtual function call optimization(memoization) questions

Ninu-Ciprian Marginean via cfe-dev cfe-dev at lists.llvm.org
Mon Jun 15 15:23:45 PDT 2020


Hello,

I want to investigate whether virtual function calls can be optimized. As
far as I understand, the overhead of a virtual call is two pointer
dereferences: one to load the vtable pointer from the object, and one to
load the function address from the vtable. I know about alternatives like
CRTP and std::variant, but for this investigation I'm interested only in
traditional, dynamic polymorphism. One of my ideas is to cache the computed
address of the method that actually gets called. Obviously we cannot always
do this, but there are some cases where we could.

One example:

I have a virtual function call in a loop, and every iteration calls the
same method:
https://godbolt.org/z/WFp2rm

The loop is in method work; the virtual call is to method id.

We can see that method work gets inlined, but inside the loop, there are
always two pointer dereferences:
mov     rax, qword ptr [r14]
call    qword ptr [rax]

Since b refers to the same object for the whole loop, the result of the
lookup could (at least in this case) be cached.

My assembly might be rusty, but before the loop, we could have:
mov     r13, qword ptr [r14]
mov     r13, qword ptr [r13]

and inside the loop we would only have:

call    r13
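
As a rough sketch of the setup described above (not the exact code behind
the Godbolt link), the source might look like this. Only the names work,
id and b come from this post; Base, Derived, the free-function form of work
and the returned sum are assumptions made for illustration.

```cpp
#include <cassert>

// Hypothetical reconstruction: only `work`, `id` and `b` are named in the post.
struct Base {
    virtual ~Base() = default;
    virtual int id() const = 0;
};

// Hypothetical concrete type; the value returned by id() is arbitrary.
struct Derived : Base {
    int id() const override { return 7; }
};

// Each iteration makes a virtual call through `b`. When the compiler cannot
// see the dynamic type of `b`, this is the call that shows up as the
// load-then-indirect-call pair inside the loop.
long work(const Base& b, int iterations) {
    assert(iterations >= 0);
    long sum = 0;
    for (int i = 0; i < iterations; ++i) {
        sum += b.id();
    }
    return sum;
}
```

Calling work(d, 3) with a Derived d dispatches to Derived::id on every
iteration, even though the target never changes during the loop.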

*My questions are:*
Does C++ have a mechanism to explicitly store the result of the vtable
lookup without additional overhead? Some sort of cache for this result, so
that we do not repeat the same computation over and over again? I'm
specifically looking for this solution, not for alternatives to dynamic
polymorphism like CRTP or std::variant. I couldn't find one.

For comparison, with a functional-programming-style "pure function" we
would write:
    int res = pure_function();
    while(true) use(res);
instead of
    while(true) use(pure_function());
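
A compilable version of that comparison, as a minimal sketch: the bodies of
pure_function and use are hypothetical stand-ins, the loops are bounded so
they terminate, and the names call_each_time and call_once are made up for
this example.

```cpp
#include <cassert>

// Hypothetical stand-ins for the post's `pure_function` and `use`.
int pure_function() { return 42; }

int use(int value) { return value + 1; }

// Evaluates pure_function() on every iteration.
long call_each_time(int iterations) {
    assert(iterations >= 0);
    long total = 0;
    for (int i = 0; i < iterations; ++i) {
        total += use(pure_function());
    }
    return total;
}

// Evaluates pure_function() once and reuses the stored result.
long call_once(int iterations) {
    assert(iterations >= 0);
    const int res = pure_function();
    long total = 0;
    for (int i = 0; i < iterations; ++i) {
        total += use(res);
    }
    return total;
}
```

Both functions return the same total because pure_function has no side
effects and always returns the same value. The question is whether the
vtable lookup can be treated the same way.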

Is there anything in the C++ standard that prevents such an optimization?

How would we identify the cases in which such an optimization is possible
and the ones in which it is not?

Are there any other reasons for which such an optimization would not be
desired?


N.B.: I realize the last two questions might be difficult to answer, but
could you at least point me in the right direction for investigating this
myself?

Thanks,
Ninu.