[PATCH] D108435: [CSSPGO] split context string II - reader/writer changes

Sun Aug 29 19:07:45 PDT 2021

hoy added inline comments.

================
Comment at: llvm/include/llvm/ProfileData/SampleProfReader.h:135
 //
 // Binary format
 // -------------
----------------
wenlei wrote:
> hoy wrote:
> > wenlei wrote:
> > > In this section, we need to add documentation/spec for CSNameTable how context is stored, and how it interacts with NameTable.
> > CSNameTable is a concept of the extensible binary format, not the basic binary format, like other concepts such as func name offset table, func metadata table. We could comment about CSNameTable in the `SampleProfileReaderExtBinary/SampleProfileWriterExtBinary`, but I didn't find a good centralized place to do that. Seems that descriptions about other sections are scattered around the implementation code.
> Ok, didn't realize we don't use header comment for extbin. That's fine. Yeah, it'd be good to have some comment for CSNameTable somewhere to explain the nesting structure of context and raw names. 
Comment added to the definition of `SampleProfileReaderExtBinaryBase::readCSNameTableSec`.

================
Comment at: llvm/lib/ProfileData/SampleProfReader.cpp:731
   for (uint32_t I = 0; I < *Size; ++I) {
-    auto FName(readStringFromTable());
+    auto FName(readNameFromTable());
     if (std::error_code EC = FName.getError())
----------------
wenlei wrote:
> FName to FContext as well? 
Oops, this was fixed in my other unified patch, somehow got dropped when rebasing. Will send an update to the other patch as you suggested.

================
Comment at: llvm/lib/ProfileData/SampleProfReader.cpp:326-327
+        // otherwise it's treated as context-less function name only.
+        auto Context = createContext(FName);
+        FContext.setContext(Context);
         ++CSProfileCount;
----------------
wenlei wrote:
> hoy wrote:
> > wenlei wrote:
> > > Even though we only need to deal with string context in profile reader/writer for text profile, it's probably still cleaner to keep all string context related parsing into SampleContext. 
> > > 
> > > `createContext` is more like a ctor. I'd prefer keep string decoding, createContext in its original place in SampleContext. That way, we can construct a context from string, SampleContext::setContext remain a private helper too, and the logic here can be simpler, just like before.
> > > 
> > > SampleContext does still have getContextString and toString, so it's not really isolated from string representation, might as well keep all string stuff together there for consistency. 
> > Makes sense. Moved back to SampleContext.
> Thanks. Can you move it back in D107299, so we don't see the change back and forth, just for review?
> 
> Can we also fold `FName.startswith("[")` in to SampleContext? 
> 
> Additionally, why we do need a `createContextFromString` instead of using overload ctors? This is also inconsistent between how SampleContext is created from text profile (createContextFromString) vs from binary profile (ctor). What I was thinking is have SampleContext decide how to create the object, so there's no changed needed here, just like before. 
Constructing a CS context will require additional parameter than non-CS profile, especially the underlying context vector. That is causing the inconsistency with the context split work and I'm separating the construction of CS and non-CS contexts. Currently CS context is only constructible from a `SampleContextFrameVector`. 

Another reason is that I was hoping to construct `SampleContext` in a quick manner from `StringRef` to favor non-CS profile.

================
Comment at: llvm/lib/ProfileData/SampleProfReader.cpp:829-846
       for (auto Name : FuncOffsetTable) {
         OrderedNames.insert(Name.first);
       }

       // For each function in current module, load all
       // context profiles for the function.
       for (auto NameOffset : FuncOffsetTable) {
----------------
wenlei wrote:
> hoy wrote:
> > wenlei wrote:
> > > Which operation here is the most costly and visible for e2e build time? The insert/sort/find or the prefix check? What's the % of time here for both prelink and postlink?
> > The insert/sort/find operations are the most expansive. For thinlto postlink, they count to 15% to 20% of the whole backend running time. I haven't measured for prelink but given the similarity of prelink and postlink, they should be expansive there too.
> > 
> > I'm actually working on a sample writer change that emits the func offset table in the order of contexts so that we don't need the set operations here. That turns out very effective. Will send out a separate diff for it.
> > I'm actually working on a sample writer change that emits the func offset table in the order of contexts so that we don't need the set operations here. That turns out very effective. 
> 
> Great. I was thinking about exactly that too. The order of binary profile is opaque to users, so we can order them in the file to save sorting on reading. 
> 
> > For thinlto postlink, they count to 15% to 20% of the whole backend running time.  
> 
> What was the % before this work when we were all using StringRef? 
> What was the % before this work when we were all using StringRef?
It's not noticeable. Actually I didn't see the previous sorting as a hot routine in the profile.  With the presorted func offset stable, we are able to achieve similar performance with the previous approach.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D108435/new/

https://reviews.llvm.org/D108435