[PATCH] D123045: [clang][extract-api] Fix small issues with SymbolGraphSerializer

Victoria Mitchell via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Apr 5 07:37:07 PDT 2022

QuietMisdreavus added inline comments.

Comment at: clang/lib/ExtractAPI/Serialization/SymbolGraphSerializer.cpp:510
+  PathComponentContext.pop_back();
dang wrote:
> dang wrote:
> > zixuw wrote:
> > > zixuw wrote:
> > > > QuietMisdreavus wrote:
> > > > > zixuw wrote:
> > > > > > What's the cost/would it worth it to wrap the `emplace_back`s and `pop_back`s of `PathComponentContext` in meaningful APIs like `enterPathComponentContext` and `exitPathComponentContext`?
> > > > > > That way the code is more self-explanatory and easier to read. It took me some time and mental resources to figure out why the `pop_back` is placed here.
> > > > > What's the use of having the `emplace_back` call inside `serializeAPIRecord` but to pop it outside? It seems like it's easier to mess up for new kinds of records.
> > > > The reason to `emplace_back` the path component inside `serializeAPIRecord` is that the `pathComponent` field of the `Record` is serialized in there so you want to include the name of the symbol itself in the path component list.
> > > > The `pop_back` is delayed til all other additional serialization for a specific kind of record, for example `serializeEnumRecord` handles all the enum cases after processing the enum record itself using `serializeAPIRecord`. So in order to correctly include the enum name in the path components of all the enum cases, the enum name has to stay in `PathComponentContext` until all members are serialized.
> > > > 
> > > > This is exactly the reason why I wanted a clearer API to make it easier to see.
> > > Hmm now that I thought about this, it seems that it would be easier to understand and avoid bugs if we lift `PathComponentContext.emplace_back`/`enterPathComponentContext` out of `serializeAPIRecord`, because we have access to the record name before that anyway.
> > > 
> > > So we establish a pre-condition of `serializeAPIRecord` that the correct path components would be set up in `PathComponentContext` before the call so we could still serialize the field inside that method. And in specific `serialize*Record` methods we push the record name, and pop out at the end.
> > > 
> > > This way the push and pop operations would appear in pairs in a single block, saving the confusion and mental work of jumping across functions to see how `PathComponentContext` is evolving.
> > If you think we should have a specialized API I am happy to do this. I figured it was self-explanatory by the name of `PathComponentContext` but it clearly isn't so needs addressing. I put the `emplace_back` call in `serializeAPIRecord` since all the specific serialization methods call it. I thought it would make it impossible to forget to add them. @zixuw is correct in the reason why the pop is outside we don't want to pop before we have serialized the sub records by agree it is ugly and potentially error prone. I can see three ways forward for improving the ergonomics of this since this seems to be problematic:
> > - Provide `serializeAPIRecord` a continuation to serialize the sub records or additional fields, that way we can move the pop inside `serializeAPIRecord` I am not a fan personally because we get into JS style closure hell if we start doing this...
> > - Use a visitor pattern where the visitor would be responsible for managing `PathComponentContext` and do the appropriate push/pops in the `visit` methods. We would need to add a APIRecordVisitor type, and appropriate visit methods for each APIRecord. This would make sense because the records should really know how to visit themselves.
> > - Just add a specialized API although it seems it would really easy to forget to remove path components.
> > 
> > Let me know what you think is the way forward.
> Unless we go with the last option, I think this should be a follow up patch since it would a structural change.
I'm most a fan of the `APIRecordVisitor` option, but you're right in that that would be a rather involved change. Now that you've said that the arrangement is for encoding sub-records, it makes sense to me. As an alternative, i think moving the `emplace_back` outside of `serializeAPIRecord` is actually my other preferred arrangement; without some other way of calculating the path components on-the-fly (e.g. walking up DeclContexts) simply having `serializeAPIRecord` look at what's there and have the wrapper calls deal with setting up its state sounds the most reasonable to me. That way, as @zixuw said, the `emplace_back` and `pop_back` calls are at least lexically close to each other.

  rG LLVM Github Monorepo



More information about the cfe-commits mailing list