[llvm] [IRPGO][ValueProfile] Instrument virtual table address that could be used to do virtual table address comparison for indirect-call-promotion. (PR #66825)

Hongtao Yu via llvm-commits llvm-commits at lists.llvm.org
Wed Oct 11 16:49:58 PDT 2023


htyu wrote:

> It's interesting that you are exploring a profile-guided direction for this. Chatelet et al. (from Google) published a paper on [automatic generation of memcpy](https://conf.researchr.org/details/ismm-2021/ismm-2021/4/automemcpy-A-framework-for-automatic-generation-of-fundamental-memory-operations) at ISMM'21, which uses PMU-based parameter profiling. The technique does not use Intel DLA; instead, we use precise sampling on call instructions in the process and filter the functions of interest. We inspect the RDX register to collect the parameter value for the size.

Very interesting to know about this work. Thanks for sharing! The profiling technique sounds exactly like what we use, and our memcpy scheme, which is based on `folly::memcpy` (https://github.com/facebook/folly/blob/main/folly/memcpy.S), is also close to what the paper proposes. We did not try adaptively generating a memcpy implementation from the profile; instead, we prioritize inlining the memcpy code for a specific size range based on the hotness of that range in a particular inline context. Context sensitivity matters because it can single out a dominant range from a size distribution that looks flat when aggregated without context.

We don't see as much of a perf win (1%) with our approach as reported in the paper. I guess part of the reason is that `folly::memcpy`, which is widely used in our fleet, already does a better job than glibc.

I haven't read through the paper yet, but I guess the purpose of dynamically generating a memcpy helper is to favor a particular size-range distribution? How much difference do you see across services?

> Extensions to the AutoFDO format to accommodate such hints sounds good. Happy to collaborate on the design which can be leveraged by future work. Perhaps start a separate issue for discussion on Github or a thread on Discourse?

A thread on Discourse sounds good!  


https://github.com/llvm/llvm-project/pull/66825
