[PATCH] D107173: [CSSPGO] Introduce MD5-based context-sensitive profile

Hongtao Yu via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Aug 5 11:47:56 PDT 2021


hoy added a comment.

In D107173#2927490 <https://reviews.llvm.org/D107173#2927490>, @wmi wrote:

> I tried the idea
>
> In D107173#2920842 <https://reviews.llvm.org/D107173#2920842>, @wmi wrote:
>
>> In D107173#2920602 <https://reviews.llvm.org/D107173#2920602>, @hoy wrote:
>>
>>> In D107173#2920521 <https://reviews.llvm.org/D107173#2920521>, @wmi wrote:
>>>
>>>> In D107173#2920321 <https://reviews.llvm.org/D107173#2920321>, @wenlei wrote:
>>>>
>>>>> Thanks for working on this.
>>>>>
>>>>> Decomposing context strings into two levels is a natural choice for deduplication. However, that means we would need to reconstruct the strings in the profile reader, so it's more of a space-vs-speed tradeoff. If compiler speed is the same, then storing decomposed strings in the file and reconstructing them on profile loading might be the better choice, as it's more compact and also more consistent with the non-CS profile. Perhaps it would be good to measure both. Right now with the md5 profile, plus some context trimming through a cold threshold or pre-inliner tuning, we hope to bring e2e build time on par with AutoFDO for large workloads.
>>>>
>>>> Right, D107173 <https://reviews.llvm.org/D107173> on its own already lets me build our search benchmark in a distributed way, which I could not do before because of the OOM issue, and that is really helpful. https://reviews.llvm.org/D107299 will still run into OOM in a distributed build because we will still have the full context strings after profile reading, and that consumes excessive memory. For D107299 <https://reviews.llvm.org/D107299>, I believe that issue can also be solved once I add md5 support to it.
>>>>
>>>> Like wenlei said, decomposing context strings gives us a more compact profile (deduplicating: 418M vs md5: 1.8G for the profile collected from our search test), but reconstructing takes more time (2x). I am wondering whether reconstructing the context string is necessary at all, since the context represented in ContextTrieNode is also decomposed into multiple levels.
>>>>
>>>> By the way, the context reconstructing time may also be reduced if D107299 <https://reviews.llvm.org/D107299> can be coupled with md5.
>>>
>>> @wmi Thanks for sharing your change and the numbers! And thanks for trying my implementation. I am interested: have you seen a build time improvement with it?
>>
>> I didn't compare it with the case without md5 because I cannot build it successfully in our build system without md5. I tried building it locally but it took one week to finish.
>>
>> I tried dumping the profile to text format, and the md5 version took about half the time to finish compared with the case without md5, so that is an indication we should see a build time improvement with it.
>>
>>> I think the name split is a promising idea. It's worth giving it a try by not reconstructing full context strings in the compiler. I talked with Wenlei offline; instead of a `StringRef` field in `SampleContext`, which also serves as a key to uniquely identify the context, we could use something like `vector<StringRef>` instead. There are likely more related issues. Special handling might also be needed in some places when it comes to an md5 profile, where not all md5 codes map to a real function in the current module. Let us know if you want to give it a shot.
>>
>> Thanks for confirming the possibility of the idea to not reconstruct the full context string in the compiler. I will try the idea.
>
> I tried the idea of using `vector<Callsite>` (Callsite is a struct containing a function name and a LineLocation) in SampleContext and changed the ProfileMap to `map<SampleContext, FunctionSamples>` over the last two days. I believe the idea should work, but I found the required change was much larger than I thought, since ProfileMap is used in many interfaces. I am worried that for the moment I cannot devote enough time to finish the whole change, so now I tend to use Hongtao's md5 patch and leave the further improvement for the future, since the current patch has already significantly improved the build process and unblocked our experiment. We can always come back to revisit it. What is your opinion?
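[Editor's note: the decomposed-context idea discussed above might be sketched roughly as below. This is a hypothetical illustration in plain C++, not the actual LLVM types; the names `Callsite`, `SampleContext`, `FunctionSamples`, and `ProfileMap` follow the discussion, while the field names and comparison logic are invented for self-containment.]

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// One frame of the context: a function plus the callsite location
// (line offset and discriminator) within its caller.
struct Callsite {
  std::string FuncName;
  uint32_t LineOffset = 0;
  uint32_t Discriminator = 0;
  bool operator<(const Callsite &O) const {
    return std::tie(FuncName, LineOffset, Discriminator) <
           std::tie(O.FuncName, O.LineOffset, O.Discriminator);
  }
};

// Decomposed context: no flat "main:3 @ foo:5 @ bar" string is ever
// rebuilt; ordering for use as a map key falls out of frame-by-frame
// lexicographic comparison of the vector.
struct SampleContext {
  std::vector<Callsite> Frames;
  bool operator<(const SampleContext &O) const { return Frames < O.Frames; }
};

// Stand-in for the per-function sample payload.
struct FunctionSamples {
  uint64_t TotalSamples = 0;
};

using ProfileMap = std::map<SampleContext, FunctionSamples>;
```

The point of the sketch is that once the key is a vector of frames, the shared prefix of two contexts is naturally deduplicated at read time and nothing forces the reader to materialize full context strings.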

Thanks for the heads-up, Wei. Sounds good to stick with this patch for now if you need more time to complete your idea, which I think will get us a bigger win and be more aligned with the non-CS implementation. Will you continue working on that?
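[Editor's note: the space saving behind the md5 encoding discussed in this thread comes from replacing each function-name string in a context with a fixed-width 64-bit hash, so a frame costs 8 bytes regardless of how long the mangled name is. A minimal sketch of that idea follows; `std::hash` stands in for the 64-bit MD5 digest purely so the example is self-contained without LLVM headers, and the function names are invented.]

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Stand-in for the 64-bit digest a real reader would derive from the
// function name (LLVM uses MD5); std::hash keeps the sketch dependency-free.
uint64_t hashName(const std::string &Name) {
  return static_cast<uint64_t>(std::hash<std::string>{}(Name));
}

// Encode a context as one fixed-width hash per frame instead of a
// concatenated string of (possibly very long) mangled names.
std::vector<uint64_t> encodeContext(const std::vector<std::string> &Frames) {
  std::vector<uint64_t> Out;
  Out.reserve(Frames.size());
  for (const auto &F : Frames)
    Out.push_back(hashName(F));
  return Out;
}
```

The tradeoff the thread describes follows directly: encoding is cheap and compact, but it is one-way, so an md5 profile needs special handling wherever a hash does not map back to a function in the current module.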


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107173/new/

https://reviews.llvm.org/D107173



More information about the llvm-commits mailing list