[PATCH] D108435: [CSSPGO] split context string II - reader/writer changes

Sun Aug 29 14:42:24 PDT 2021

wenlei added inline comments.

================
Comment at: llvm/include/llvm/ProfileData/SampleProfReader.h:135
 //
 // Binary format
 // -------------
----------------
hoy wrote:
> wenlei wrote:
> > In this section, we need to add documentation/spec for CSNameTable how context is stored, and how it interacts with NameTable.
> CSNameTable is a concept of the extensible binary format, not the basic binary format, like other concepts such as func name offset table, func metadata table. We could comment about CSNameTable in the `SampleProfileReaderExtBinary/SampleProfileWriterExtBinary`, but I didn't find a good centralized place to do that. Seems that descriptions about other sections are scattered around the implementation code.
Ok, didn't realize we don't use header comment for extbin. That's fine. Yeah, it'd be good to have some comment for CSNameTable somewhere to explain the nesting structure of context and raw names. 

================
Comment at: llvm/lib/ProfileData/SampleProfReader.cpp:731
   for (uint32_t I = 0; I < *Size; ++I) {
-    auto FName(readStringFromTable());
+    auto FName(readNameFromTable());
     if (std::error_code EC = FName.getError())
----------------
FName to FContext as well? 

================
Comment at: llvm/lib/ProfileData/SampleProfReader.cpp:326-327
+        // otherwise it's treated as context-less function name only.
+        auto Context = createContext(FName);
+        FContext.setContext(Context);
         ++CSProfileCount;
----------------
hoy wrote:
> wenlei wrote:
> > Even though we only need to deal with string context in profile reader/writer for text profile, it's probably still cleaner to keep all string context related parsing into SampleContext. 
> > 
> > `createContext` is more like a ctor. I'd prefer keep string decoding, createContext in its original place in SampleContext. That way, we can construct a context from string, SampleContext::setContext remain a private helper too, and the logic here can be simpler, just like before.
> > 
> > SampleContext does still have getContextString and toString, so it's not really isolated from string representation, might as well keep all string stuff together there for consistency. 
> Makes sense. Moved back to SampleContext.
Thanks. Can you move it back in D107299, so we don't see the change back and forth, just for review?

Can we also fold `FName.startswith("[")` in to SampleContext? 

Additionally, why we do need a `createContextFromString` instead of using overload ctors? This is also inconsistent between how SampleContext is created from text profile (createContextFromString) vs from binary profile (ctor). What I was thinking is have SampleContext decide how to create the object, so there's no changed needed here, just like before. 

================
Comment at: llvm/lib/ProfileData/SampleProfReader.cpp:638

-  auto FName(readStringFromTable());
-  if (std::error_code EC = FName.getError())
+  auto FContext(readNameFromTable(IsContextName));
+  if (std::error_code EC = FContext.getError())
----------------
hoy wrote:
> wenlei wrote:
> > nit: this can be confusing, readNameFromTable can mislead people to think FContext is string (or vector of strings) type. the auto also isn't helping. Either spell out the type name, or rename `readNameFromTable` to something like `readSampleContextFromTable`
> > 
> > 
> Fixed by using explicit type.
Ok, but on 2nd thought, why do we call it readName while it's actually returning a context? Especially given that we've renamed addName to AddContext, writeNameIdx to writeContextIdx.

================
Comment at: llvm/lib/ProfileData/SampleProfReader.cpp:829-846
       for (auto Name : FuncOffsetTable) {
         OrderedNames.insert(Name.first);
       }

       // For each function in current module, load all
       // context profiles for the function.
       for (auto NameOffset : FuncOffsetTable) {
----------------
hoy wrote:
> wenlei wrote:
> > Which operation here is the most costly and visible for e2e build time? The insert/sort/find or the prefix check? What's the % of time here for both prelink and postlink?
> The insert/sort/find operations are the most expansive. For thinlto postlink, they count to 15% to 20% of the whole backend running time. I haven't measured for prelink but given the similarity of prelink and postlink, they should be expansive there too.
> 
> I'm actually working on a sample writer change that emits the func offset table in the order of contexts so that we don't need the set operations here. That turns out very effective. Will send out a separate diff for it.
> I'm actually working on a sample writer change that emits the func offset table in the order of contexts so that we don't need the set operations here. That turns out very effective. 

Great. I was thinking about exactly that too. The order of binary profile is opaque to users, so we can order them in the file to save sorting on reading. 

> For thinlto postlink, they count to 15% to 20% of the whole backend running time.  

What was the % before this work when we were all using StringRef? 

================
Comment at: llvm/lib/ProfileData/SampleProfReader.cpp:1117

-    SampleContext FContext(*FName);
-    bool ProfileInMap = Profiles.count(FContext);
+    bool ProfileInMap = Profiles.count(FContext.get());

----------------
hoy wrote:
> wenlei wrote:
> > nit: peal/hoist the `get()` so we don't have to call it for every use.
> You mean save `FContext.get()` in a temp and use it in the loop? Thought the compiler would do it.
Yeah, it's less about optimization but for readability to make the code less verbose. 

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D108435/new/

https://reviews.llvm.org/D108435