[PATCH] D108435: [CSSPGO] split context string II - reader/writer changes

Mon Aug 30 13:33:31 PDT 2021

wenlei added inline comments.

================
Comment at: llvm/lib/ProfileData/SampleProfReader.cpp:326-327
+        // otherwise it's treated as context-less function name only.
+        auto Context = createContext(FName);
+        FContext.setContext(Context);
         ++CSProfileCount;
----------------
hoy wrote:
> wenlei wrote:
> > hoy wrote:
> > > wenlei wrote:
> > > > hoy wrote:
> > > > > wenlei wrote:
> > > > > > hoy wrote:
> > > > > > > wenlei wrote:
> > > > > > > > Even though we only need to deal with string context in profile reader/writer for text profile, it's probably still cleaner to keep all string context related parsing into SampleContext. 
> > > > > > > > 
> > > > > > > > `createContext` is more like a ctor. I'd prefer keep string decoding, createContext in its original place in SampleContext. That way, we can construct a context from string, SampleContext::setContext remain a private helper too, and the logic here can be simpler, just like before.
> > > > > > > > 
> > > > > > > > SampleContext does still have getContextString and toString, so it's not really isolated from string representation, might as well keep all string stuff together there for consistency. 
> > > > > > > Makes sense. Moved back to SampleContext.
> > > > > > Thanks. Can you move it back in D107299, so we don't see the change back and forth, just for review?
> > > > > > 
> > > > > > Can we also fold `FName.startswith("[")` in to SampleContext? 
> > > > > > 
> > > > > > Additionally, why we do need a `createContextFromString` instead of using overload ctors? This is also inconsistent between how SampleContext is created from text profile (createContextFromString) vs from binary profile (ctor). What I was thinking is have SampleContext decide how to create the object, so there's no changed needed here, just like before. 
> > > > > Constructing a CS context will require additional parameter than non-CS profile, especially the underlying context vector. That is causing the inconsistency with the context split work and I'm separating the construction of CS and non-CS contexts. Currently CS context is only constructible from a `SampleContextFrameVector`. 
> > > > > 
> > > > > Another reason is that I was hoping to construct `SampleContext` in a quick manner from `StringRef` to favor non-CS profile.
> > > > > 
> > > > > 
> > > > > Can you move it back in D107299, so we don't see the change back and forth, just for review?
> > > > 
> > > > Sorry my bad, I meant to say D108433. 
> > > > 
> > > > I think the key blocker for everything to be taken care from within ctor is that you need reader to own the context created from the string. How about having a ctor with StringRef and CSNameTable as parameter - it puts the context into CSNameTable (owned by reader) for CS profile, and the CSNameTable would be ignored for non-CS profile. 
> > > > 
> > > > The current implementation works, but reader has to be aware of the actual string representation of context. I thought it'd be cleaner if such representation is all dealt with from within SampleContext. Currently, it's indeed mostly handled by SampleContext, except it's bleeding into reader here. 
> > > > 
> > > > 
> > > Yeah, that's why I originally put everything related to context string parsing in the reader. I thought they were related to profile-specific representation that `SampleContext` shouldn't care. 
> > > 
> > > I guess moving the parsing code back is better for extension and code sharing, when we have new module doing the same thing in the future.  Adding a new constructor with StringRef and CSNameTable as parameter can work but feel like it's easier and more clear for the reader to know what it constructs. Hiding that from the reader is fine, but asking the reader to provide an underlying CS name table which may not be needed for non-CS sounds a bit confusing. That also kinds of exposes reader-specific implementation to `SampleContext`.
> > > 
> > > I think having the reader be aware of the specific representation might be reasonable, since context representation is a part of the profile format. What do you think?
> > > 
> > > 
> > > 
> > > 
> > > asking the reader to provide an underlying CS name table which may not be needed for non-CS sounds a bit confusing. That also kinds of exposes reader-specific implementation to SampleContext. 
> > 
> > I don't see this as exposing reader-specific stuff to sample context. It'd just be the expectation of the sample context API that requires a buffer to hold newly created context. 
> > 
> > This is in essence no different from how string buffer is passed in getRepInFormat (and only used for MD5), and I don't see it as coupling between reader and context.  
> > 
> > > I think having the reader be aware of the specific representation might be reasonable, since context representation is a part of the profile format. 
> > 
> > I think the actual string presentation of context is not tied to the format, but rather the format is using the string representation directly. Also if string representation is considered part of format, then all of the string related stuff should be part of reader (like your earlier change). The possible reusing of string functions you mentioned also shows that the string representation is not format-specific.
> > This is in essence no different from how string buffer is passed in getRepInFormat (and only used for MD5), and I don't see it as coupling between reader and context.
> 
> `getRepInFormat` takes a string buffer instead of a string table which is not coupled with the reader. However, `CSNameTable` is an implementation detail of the reader. Having that populated and updated by `SampleContext` sounds like a coupling with the reader. E.g, from `SampleContext` point of view, it is questionable why it has to update a std::list but not a std::vector or std::set.
> 
> We can pass in a SampleContextFrameVector to the new ctor, and similar with the callsites of `getRepInFormat`, the reader should know where to place it, just like the current code. But then the reader has to redo some checks in the ctor. Anyway, feels that the reader will need to check if it is constructing a CS context, if we don't expose `CSNameTable` to `SampleContext`.
> 
> Since the reader is currently the only consumer, maybe just keep it in the reader for now? When we have a new user we can then decide what kind of new ctor to make? 
Ok, I think we've probably spent too much time on this. But bear with me a bit more, still hope we can reach a consensus. :) That said, whatever we choose to do, I don't agree with your reasoning above.

> However, CSNameTable is an implementation detail of the reader. 

Look at it as buffer, and in this case reader happens to own and provide that buffer. 

> Having that populated and updated by SampleContext sounds like a coupling with the reader. E.g, from SampleContext point of view, it is questionable why it has to update a std::list but not a std::vector or std::set.

I don't think this is really a concern, nor is it a coupling tbh. Regardless of what reader does, what makes sense for a buffer type? I think std::list simply makes sense as it avoid reallocating. Yes, reader uses std::list, but this is just a choice that makes sense in general for a buffer. If we go down this level, sure every single API call is a coupling because every param has a type. 

The way I look at this - sample context abstracts away the string representation (again it's independent of format, and text format just uses that string representation directly), and reader resort to sample context for anything related to that. It's probably not so important to have that logical layering enforced, but if the cost of doing that is small, why not? Clear layering makes it easy to maintain and reason about.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D108435/new/

https://reviews.llvm.org/D108435