[cfe-dev] Virtual function call optimization(memoization) questions

Tue Jun 16 11:47:02 PDT 2020

On Tue, 16 Jun 2020, 11:44 Richard Smith, <richard at metafoo.co.uk> wrote:

> On Mon, 15 Jun 2020, 23:31 Ninu-Ciprian Marginean via cfe-dev, <
> cfe-dev at lists.llvm.org> wrote:
>
>> Hello,
>>
>> I want to investigate if there are any possibilities of optimizing
>> virtual functions calls. From my knowledge and reading I understand that
>> the overhead for this is two pointer dereferences. I know about
>> alternatives like CRTP and std::variant, but for this investigation, I'm
>> interested only in traditional, dynamic polymorphism. One of my ideas is to
>> use some form of caching of the computation of the address of the actual
>> method that gets called. Obviously we cannot always do this, but there are
>> some cases where we could.
>>
>> One example:
>>
>> I have a virtual function call in a loop, and the same method is called
>> on every iteration:
>> https://godbolt.org/z/WFp2rm
>>
>> The loop is in method work; the virtual call is to method id.
>>
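For readers without access to the link, the pattern looks roughly like the sketch below. It is a hypothetical reconstruction, not the code behind the link: the class names Base and Derived, the return value 42 and the loop bound n are assumptions, and only work and id come from the description above.

```cpp
// Hypothetical reconstruction: only the names work() and id() come from the post.
struct Base {
    virtual ~Base() = default;
    virtual int id() const = 0;
};

struct Derived : Base {
    int id() const override { return 42; }
};

// b refers to the same object on every iteration, yet unless the compiler may
// assume its vptr is unchanged, each call reloads the vptr and then the
// vtable slot before the indirect call.
long work(const Base& b, int n) {
    long sum = 0;
    for (int i = 0; i < n; ++i)
        sum += b.id();  // virtual call on every iteration
    return sum;
}
```

When work is compiled without knowing the dynamic type of b, each iteration needs those two loads. If the caller's object is visible to the compiler (for example a local Derived), it may devirtualize the call entirely, so the loads only show up when the dynamic type is opaque.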
>> We can see that method work gets inlined, but inside the loop there are
>> always two pointer dereferences:
>> mov     rax, qword ptr [r14]
>> call    qword ptr [rax]
>>
>> Since the object referred to by b never changes to a different object,
>> this could (at least in this case) be cached.
>>
>> My assembly might be rusty, but before the loop, we could have:
>> mov     r13, qword ptr [r14]
>> mov     r13, qword ptr [r13]
>>
>> and inside the loop we would only have:
>>
>> call    r13
>>
>> *My questions are:*
>> Do we have a mechanism in C++ to explicitly store the result of the
>> lookup in the vtable without additional overhead? Some sort of cache for
>> this result so that we do not do the same computation over and over again?
>> I'm specifically looking for this solution, not alternatives to dynamic
>> polymorphism like CRTP or std::variant. I couldn't find one.
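Standard C++ has no portable way to extract and keep the resolved target of a virtual call. The closest-looking tool, a pointer to member function, does not cache anything: a pointer to a virtual member function still dispatches on the dynamic type every time it is invoked. A minimal sketch, with hypothetical Base and Derived types:

```cpp
struct Base {
    virtual ~Base() = default;
    virtual int id() const { return 1; }
};

struct Derived : Base {
    int id() const override { return 2; }
};

// IdFn names the virtual function Base::id, not a resolved code address.
using IdFn = int (Base::*)() const;

int call_through_pmf(const Base& b) {
    IdFn pmf = &Base::id;
    // Invoking pmf still performs a virtual lookup through b's vptr,
    // so the final overrider for b's dynamic type is called.
    return (b.*pmf)();
}
```

Because call_through_pmf returns 2 for a Derived object, the pointer clearly did not pin the call to Base::id; storing it in a local variable before a loop therefore saves nothing.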
>> For functional programming style "pure functions", we would have:
>>     int res = pure_function();
>>     while(true) use(res);
>> instead of
>>     while(true) use(pure_function());
>>
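The analogy needs one distinction: hoisting the result of a call is only valid for pure functions, whereas caching the target address of a virtual call keeps every call, so it stays valid even when the callee has side effects. A small sketch, assuming a hypothetical Counter type whose id() has an observable side effect:

```cpp
struct Base {
    virtual ~Base() = default;
    virtual int id() = 0;
};

// Hypothetical impure override: every call is observable through `calls`.
struct Counter : Base {
    int calls = 0;
    int id() override { return ++calls; }
};

// Calls id() n times. Caching only the looked-up target (what the post
// proposes) would still make all n calls, so the side effect is preserved.
int sum_calls(Base& b, int n) {
    int sum = 0;
    for (int i = 0; i < n; ++i)
        sum += b.id();
    return sum;
}

// Hoisting the call itself, as one would for a pure function, runs id()
// only once and changes the observable result for an impure callee.
int sum_hoisted(Base& b, int n) {
    const int res = b.id();
    int sum = 0;
    for (int i = 0; i < n; ++i)
        sum += res;
    return sum;
}
```

With n = 4, sum_calls returns 1 + 2 + 3 + 4 = 10 and leaves calls at 4, while sum_hoisted returns 4 and leaves calls at 1.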
>> Is there anything in the C++ standard that prevents such an optimization?
>>
>
> The optimization is valid, and clang performs it under
> -fstrict-vtable-pointers (https://godbolt.org/z/SrlUC8). Unfortunately,
> there are still some cases where this optimization can regress performance
> (the annotations that the frontend inserts to enable the optimization can
> get in the way of other transformations), so it's not enabled by default
> yet.
>
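One way to see why the optimization is valid: a virtual call could, in principle, end the lifetime of the object and create an object of a different dynamic type in the same storage with placement new. Under the C++ object lifetime rules ([basic.life]), a pointer or reference to the old object does not automatically refer to a new object of a different type, so calling through it afterwards is undefined behavior. That is what lets the compiler assume the vptr seen through b does not change. The sketch below uses hypothetical types A and B and a hypothetical replace_and_call helper; the new object has to be reached through the pointer returned by placement new.

```cpp
#include <new>

struct Base {
    virtual ~Base() = default;
    virtual int id() const = 0;
};

struct A : Base { int id() const override { return 1; } };
struct B : Base { int id() const override { return 2; } };

// Storage large enough and suitably aligned for either A or B.
alignas(A) alignas(B) unsigned char storage[sizeof(A) > sizeof(B) ? sizeof(A) : sizeof(B)];

int replace_and_call() {
    Base* p = new (storage) A;   // dynamic type A
    int first = p->id();         // calls A::id

    p->~Base();                  // ends the lifetime of the A object
    Base* q = new (storage) B;   // new object of a different type, same storage
    // Calling p->id() here would be undefined behavior: p still refers to
    // the old A object, which is why a compiler may assume p's vptr is fixed.
    int second = q->id();        // calls B::id through the new pointer

    q->~Base();
    return first * 10 + second;
}
```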
>> How would we identify the cases in which such an optimization is possible
>> and the ones in which it is not?
>>
>> Are there any other reasons for which such an optimization would not be
>> desired?
>>
>>
>> N.B.: I realize the last two questions might be difficult to answer, but
>> could you at least point me in the right direction for investigating this
>> myself?
>>
>
If you want more information, this research paper may be a good place to
start: https://arxiv.org/pdf/2003.04228.pdf

>> Thanks,
>> Ninu.
>