[PATCH] D105904: [clangd] Support `#pragma mark` in the outline

Sam McCall via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Jul 15 05:00:58 PDT 2021


sammccall added a comment.

In D105904#2877184 <https://reviews.llvm.org/D105904#2877184>, @kadircet wrote:

> So first of all, it is not clear to me if this is a useful enough improvement to justify all the new code, but looking at your investment I'll assume this is really common in ObjC land and gonna be useful for developers working with ObjC. That's enough of a justification for me, but I'd be more than happy to see some real data if you have any.

+1. I think the key for this kind of feature is isolating the complexity so it doesn't need to be part of the core mental model of how the feature works.
For macros I basically failed to do this, but I think we can do better here.

> Can you verify that I understood the intent correctly and make it more explicit in the comments as well?

Yes, I think the comments should explain how these marks are used without assuming any prior knowledge.
Simplest reason for this is that the impl language for clangd is not objc, and the technique seems basically unknown elsewhere.

> Before diving into the details of the implementation it looks like we are doing some post-processing to incorporate mark pragmas into the tree, why can't we just insert them while building the tree as we do with macros?

OK, this is why I'm jumping in here...
I think the fact that we do macros as-we-go is bad but unavoidable. It forces the core algorithm to think about symbol nesting and macro expansion at the same time, and makes the code much harder to reason about than if they were completely separate.

I didn't manage to separate them because macros are so complicated - we need the detailed SourceLocation info, to account for cases where they don't nest concentrically, etc. But this failure isn't something we should be consistent about - the more features that are part of the central algorithm, the exponentially more interactions there are to worry about.

However those difficulties don't apply here - I believe you can take the current final representation (`vector<DocumentSymbol>`) + the list of marks and produce a markified `vector<DocumentSymbol>` via a fairly natural tree walk. And this does a *really* good job of isolating this feature - we literally run the rest of the algorithm without it.
(This is slightly different to what's done in the patch, I would actually take it out of the `DocumentOutline` class entirely).
I don't think this hurts efficiency much (I think/hope this ends up being a move rather than a copy for most nodes). And I think it actually makes the markifying code easier to reason about in this case, as it just deals with line numbers/positions, without having the distraction of SourceLocation.

@kadircet WDYT of this argument, before we make David go back and forth?



================
Comment at: clang-tools-extra/clangd/ParsedAST.cpp:418
       collectIncludeStructureCallback(Clang->getSourceManager(), &Includes));
   // Copy over the macros in the preamble region of the main file, and combine
   // with non-preamble macros below.
----------------
You'll need to do this with marks as well (collect + merge), or you'll miss marks from the preamble region.
Make sure there's a test for this (it's not necessary to actually have any includes, I think)


================
Comment at: clang-tools-extra/clangd/Protocol.h:198
+
+  /// Expand this range to also contain `Rng`.
+  void mergeWith(Range Rng) {
----------------
This doesn't belong in Protocol.h, rather in SourceCode.h or so - these aren't domain objects and adding logic to them can end up causing layering issues in general.
(There are some exceptions for good and bad reasons, but not many. I'm not sure why Range::contains is here)

Maybe union() to be a little less ambiguous?


================
Comment at: clang-tools-extra/clangd/TextMarks.h:22
+/// Represents a programmer specified mark/note, typically used to easily locate
+/// or differentiate code. e.g. a `pragma mark`, a `TODO`.
+///
----------------
This seems like premature abstraction - you don't handle TODOs here, nor are they appropriate for the single use case we have. Moreover they don't satisfy the definition here: `int foo; // TODO: turn into a float` the TODO doesn't occupy the whole line.

If we need it, the appropriate generalization seems just as likely to be "pragmas" or "directives" as "textual marks".

Similarly, it's not clear that the logic around interpreting the strings e.g. as groups needs to be shared across features, nor is it terribly complicated.
So the interface here seems like it could be much simpler:
```
struct PragmaMark {
  unsigned Line;
  string Text;
}
unique_ptr<PPCallbacks> collectPragmaMarksCallback(const SourceManager&, vector<PragmaMark> &Out);
```

I'd suggest putting this in CollectMacros.h (which would be misnamed but not terribly, or we could always rename) rather than adding a new header.


================
Comment at: clang-tools-extra/clangd/TextMarks.h:45
+      : SM(SM), Out(Out) {}
+  void PragmaMark(SourceLocation Loc, StringRef Trivia) override {
+    if (isInsideMainFile(Loc, SM)) {
----------------
please avoid puttng bodies inline unless trivial or hot


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105904/new/

https://reviews.llvm.org/D105904



More information about the cfe-commits mailing list