[PATCH] D107173: [CSSPGO] Introduce MD5-based context-sensitive profile
Wei Mi via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Aug 3 10:02:14 PDT 2021
wmi added a comment.
In D107173#2921423 <https://reviews.llvm.org/D107173#2921423>, @wenlei wrote:
> In D107173#2920842 <https://reviews.llvm.org/D107173#2920842>, @wmi wrote:
>
>> In D107173#2920602 <https://reviews.llvm.org/D107173#2920602>, @hoy wrote:
>>
>>> In D107173#2920521 <https://reviews.llvm.org/D107173#2920521>, @wmi wrote:
>>>
>>>> In D107173#2920321 <https://reviews.llvm.org/D107173#2920321>, @wenlei wrote:
>>>>
>>>>> Thanks for working on this.
>>>>>
>>>>> Decomposing context strings into two levels is natural choice for deduplication. However, that means we would need to reconstruct strings in profile reader. So it's more like space vs speed thing. If compiler speed is the same, then storing decomposed strings in file and reconstruct them on profile loading might be a better choice as it's more compact and also more consistent with non-CS profile. Perhaps would be good to measure both.. Right now with md5 profile, plus some context trimming through cold threshold or pre-inliner tuning, we hope to bring e2e build time on par with AutoFDO for large workloads.
>>>>
>>>> Right, D107173 <https://reviews.llvm.org/D107173> in its own can already let me build our search benchmark in distributed way which I cannot do that before because of OOM issue, and that is really helpful. https://reviews.llvm.org/D107299 will still run into OOM in distributed build because we will still have the full context strings after profile reading, and that consumes excessive memory. For D107299 <https://reviews.llvm.org/D107299>, I believe that issue can also be solved after I add md5 support to it.
>>>>
>>>> Like wenlei said, decomposing context strings can give us more compact profile (deduplicating: 418M vs md5: 1.8G for the profile collected from our search test), but reconstructing takes more time (2x). I am thinking about whether reconstructing the context string is necessary since the context represented in ContextTrieNode is also decomposed into multiple levels?
>>>>
>>>> By the way, the context reconstructing time may also be reduced if D107299 <https://reviews.llvm.org/D107299> can be coupled with md5.
>>>
>>> @wmi Thanks for sharing your change and the numbers! And thanks for trying my implementation. Interested have you seen build time improvement with it?
>>
>> I didn't compare it with the case without md5 because I cannot build it successfully in our build system without md5. I tried building it locally but it took one week to finish.
>>
>> I tried dumping the profile to text format and the md5 version took about a half time to finish compared with the case without md5, so that is an indication we should see build time improvement with it.
>
> Curious were you trying probe-based CS profile or line-based CS profile?
I was trying probe-based CS profile.
>>> I think name split is a promising idea. It's worth giving it a try by not reconstructing full context strings in the compiler. Talked with Wenlei offline, instead of a `stringRef` field in `SampleContext`, which also serves as a key to uniquely identify the context, we could use something like vector<StringRef> instead. There are likely more related issues. Special handling might also be needed in some places when it comes to md5 profile, where not all md5 codes map to a real function in the current module. Let us now if you want to give it a shot.
>>
>> Thanks for confirming the possibility of the idea to not reconstruct the full context string in the compiler. I will try the idea.
>
> Yeah, eliminate round-trip conversion is going to help. One of the reason we choose to store context string as a single string is trying to accommodate how profiles are being kept in profile reader/write using StringMap. If we store context strings in decomposed form in profile file, supporting corresponding decomposed context representation as first-class citizen inside compiler to avoid conversion would be great.
>
> Now looking at the use of SampleProfileReader::getProfiles again, actually the key (name/context) string is redundant since FunctionSamples contains the function name/context too. StringMap is used for AFDO to help point look up by name. But for CSSPGO, look up is served by SampleContextTracker, and the tracker is built by iterating over all profiles, so full context string as key for the StringMap isn't really useful for CSSPGO, but more of a choice out of consistency.
>
> If we want to skip reconstructing full context string from decomposed context input, we could change the string conversion for SampleContext to be a dummy string (e.g. MD5 string of context?) instead of reconstructed full context string, and use that as the key for StringMap profiles. Or we could change the interface for SampleProfileReader::getProfiles to return something like map<SampleContext, FunctionSamples>&., and we always use a SampleContext object for look up - the main content of SampleContext would be a StringRef (AFDO) or vector of StringRef (CSSPGO). We also need to be able to order SampleContext like before - that is used in loading a profile subtree to prepare for importing.
Thanks for the suggestion. That is very helpful.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D107173/new/
https://reviews.llvm.org/D107173
More information about the llvm-commits
mailing list