[PATCH] D105904: [clangd] Support `#pragma mark` in the outline

Wed Aug 4 09:50:10 PDT 2021

dgoldman marked 2 inline comments as done.
dgoldman added inline comments.

================
Comment at: clang-tools-extra/clangd/FindSymbols.cpp:664
+      }
+      // NextPragma is after us but before the next symbol. Our reign is over.
+      break;
----------------
kadircet wrote:
> I suppose this is not correct in cases like.
> 
> ```
> #pragma mark - Outer
> foo {
>   #pragma mark - Foo
> }
> bar {
>   # pragma mark - Bar
> }
> baz {
>   # pragma mark - Baz
> }
> ```
> 
> We'll first move `foo` under `Outer`, then start processing `bar` we notice that It comes after pragma `Foo`, and then also notice that pragma `Foo` is not contained in `bar` hence break. but all of `foo,bar,baz` should've been moved to be under `Outer` instead.
Yeah that's a good point, we'd have to do something like your suggested solution or maintain more state to consider the children that were moved.

================
Comment at: clang-tools-extra/clangd/FindSymbols.cpp:535
+/// by range.
+std::vector<DocumentSymbol> mergePragmas(std::vector<DocumentSymbol> &Syms,
+                                         std::vector<PragmaMarkSymbol> &Pragmas,
----------------
kadircet wrote:
> dgoldman wrote:
> > kadircet wrote:
> > > dgoldman wrote:
> > > > kadircet wrote:
> > > > > dgoldman wrote:
> > > > > > kadircet wrote:
> > > > > > > dgoldman wrote:
> > > > > > > > sammccall wrote:
> > > > > > > > > FWIW the flow control/how we make progress seem hard to follow here to me.
> > > > > > > > > 
> > > > > > > > > In particular I think I'm struggling with the statefulness of "is there an open mark group".
> > > > > > > > > 
> > > > > > > > > Possible simplifications:
> > > > > > > > >  - define a dummy root symbol, which seems clearer than the vector<symbols> + range
> > > > > > > > >  - avoid reverse-sorting the list of pragma symbols, and just consume from the front of an ArrayRef instead
> > > > > > > > >  - make the outer loop over pragmas, rather than symbols. It would first check if the pragma belongs directly here or not, and if so, loop over symbols to work out which should become children. This seems very likely to be efficient enough in practice (few pragmas, or most children are grouped into pragmas)
> > > > > > > > > define a dummy root symbol, which seems clearer than the vector<symbols> + range
> > > > > > > > 
> > > > > > > > I guess? Then we'd take in a `DocumentSymbol & and a ArrayRef<PragmaMarkSymbol> & (or just by value and then return it as well). The rest would be the same though
> > > > > > > > 
> > > > > > > > > In particular I think I'm struggling with the statefulness of "is there an open mark group".
> > > > > > > > 
> > > > > > > > We need to track the current open group if there is one in order to move children to it.
> > > > > > > > 
> > > > > > > > > make the outer loop over pragmas, rather than symbols. It would first check if the pragma belongs directly here or not, and if so, loop over symbols to work out which should become children. This seems very likely to be efficient enough in practice (few pragmas, or most children are grouped into pragmas)
> > > > > > > > 
> > > > > > > > The important thing here is knowing where the pragma mark ends - if it doesn't, it actually gets all of the children. So we'd have to peak at the next pragma mark, add all symbols before it to us as children, and then potentially recurse to nest it inside of a symbol. I'll try it out and see if it's simpler.
> > > > > > > > 
> > > > > > > > 
> > > > > > > ```
> > > > > > > while(Pragmas) {
> > > > > > > // We'll figure out where the Pragmas.front() should go.
> > > > > > > Pragma P = Pragmas.front();
> > > > > > > DocumentSymbol *Cur = Root;
> > > > > > > while(Cur->contains(P)) {
> > > > > > >   auto *OldCur = Cur;
> > > > > > >   for(auto *C : Cur->children) {
> > > > > > >      // We assume at most 1 child can contain the pragma (as pragmas are on a single line, and children have disjoint ranges)
> > > > > > >      if (C->contains(P)) {
> > > > > > >          Cur = C;
> > > > > > >          break;
> > > > > > >      }
> > > > > > >   }
> > > > > > >   // Cur is immediate parent of P
> > > > > > >   if (OldCur == Cur) {
> > > > > > >     // Just insert P into children if it is not a group and we are done.
> > > > > > >     // Otherwise we need to figure out when current pragma is terminated:
> > > > > > > // if next pragma is not contained in Cur, or is contained in one of the children, It is at the end of Cur, nest all the children that appear after P under the symbol node for P.
> > > > > > > // Otherwise nest all the children that appear after P but before next pragma under the symbol node for P.
> > > > > > > // Pop Pragmas and break
> > > > > > >   }
> > > > > > > }
> > > > > > > }
> > > > > > > ```
> > > > > > > 
> > > > > > > Does that make sense, i hope i am not missing something obvious? Complexity-wise in the worst case we'll go all the way down to a leaf once per pragma, since there will only be a handful of pragmas most of the time it shouldn't be too bad.
> > > > > > I've implemented your suggestion. I don't think it's simpler, but LMK, maybe it can be improved.
> > > > > oops, i was looking into an older revision and missed mergepragmas2, i think it looks quite similar to this one but we can probably get rid of the recursion as well and simplify a couple more cases
> > > > This makes sense,  I think that works for the most part besides dropping the recursion, specifically for
> > > > 
> > > > ```
> > > >       // Next pragma is contained in the Sym, it belongs there and doesn't
> > > >       // affect us at all.
> > > >       if (Sym.range.contains(NextPragma.DocSym.range)) {
> > > >         Sym.children = mergePragmas2(Sym.children, Pragmas, Sym.range);
> > > >         continue;
> > > >       }
> > > > ```
> > > > 
> > > > I guess we could explicitly forbid 3+ layers of nesting and handle it inline there? But I'm not sure it's worth the effort to rewrite all of this - the recursion shouldn't be deep and we avoid needing to shift vector elements over by recreating a new one.
> > > Sorry I don't follow why we can't get rid of the recursion in this case.
> > > 
> > > Two loop solution I described above literally tries to find the document symbol node, such that the current pragma is contained in that node && current pragma isn't contained in any of that node's children. Afterwards it inserts the pragma into that node and starts traversing the tree from root again for the next pragma.
> > > 
> > > Again I don't follow where the `3+ layers of nesting` constraint came from. But I do feel like the iterative version is somewhat easier to reason about (especially keeping track of what's happening with `pragmas.front()` and the way it bails out via `parentrange` check). Shifting of the vector is definitely unfortunate but I think it shouldn't imply big performance hits in practice as we are only shifting the children of a single node.
> > Yeah, I understand the first part,  I think specifically handling the group case after you discover where it needs to be inserted is a bit more complicated, something like the following:
> > 
> > 
> > ```
> >         // Pragma is a group, so we need to figure out where it terminates:
> >         // - If the next Pragma is not contained in Cur, P owns all of its
> >         //   parent's children which occur after P.
> >         // - If the next pragma is contained in Cur but actually belongs to one
> >         //   of the parent's children, we temporarily skip over it and look at
> >         //   the next pragma to decide where we end.
> >         // - Otherwise nest all of its parent's children which occur after P but
> >         //   before the next pragma.
> > ```
> > 
> > And yeah, shifting in the worst case is definitely worst (due to repeat shifting) although it shouldn't be too common in practice (things like a large @implementation block would probably have the most children).
> > we temporarily skip over it and look at the next pragma to decide where we end.
> 
> Ah you are right I was missing this part. As you suggested we'll need to loop over all the remaining pragmas, which will make the complexity `((quadratic in terms of # of pragmas) * (depth of document symbol tree))` I think it is still acceptable in practice. I don't think there'll be more than a dozen pragmas in a file. (moreover i think this problem exists with the current implementation you've provided, see my comment below)
> 
> Are you fine with reshaping the implementation in that way then? Because I still feel like the code will end up looking a lot more straightforward, even if not the most optimal and we can try to optimize the rest if it shows up in the latency graphs.
Implementing this now - there's one other edge case here, in that we don't want to treat `#pragma mark Blah` as a group. I guess I can maintain state similar to the first solution to solve that.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105904/new/

https://reviews.llvm.org/D105904