[PATCH] D99086: [clang][Syntax] Optimize expandedTokens for token ranges.

Sam McCall via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Mar 23 04:47:56 PDT 2021


sammccall added a comment.

Thanks for finding and working on this hotspot. (I'm curious *why* `isBeforeInTranslationUnit` is slow if you have any insight - it has sad worst-case but I thought we'd hit the one-element cache often enough in practice).

I don't see a better way to optimize this, main question is where to do it.
In favor of TokenBuffer initializer (this patch):

- it produces the simplest API
- no code duplicated between callers
- when a TokenBuffer gets reused and it's awkward for its users to share other state, the same cache can be reused

In favor of in the caller (e.g. SelectionTree)

- users that never call expandedTokens(SourceRange) pay nothing
- users that call expandedTokens(SourceRange) once pay `log(N) * isBeforeInTranslationUnit` instead of `N`, which may come out ahead for large files.

It's a bit unfortunate that TokenBuffer is enough of a grab-bag of features that we can't really reason about which structures to build :-(

Can we get the best of both worlds?
Ideally we'd build this cache only when it's profitable. But building on-demand messes with const (and it's unclear whether to do it on the first call).
What about making this optional via a non-const `TokenBuffer::indexExpandedTokens(); // Optimizes future calls to expandedTokens(SourceRange). Idempotent.`?
It's a slightly weird API, but not actually hard to use (I imagine clangd would always call it when building ParsedAST).



================
Comment at: clang/include/clang/Tooling/Syntax/Tokens.h:372
+  // useful while finding expanded tokens in a 'token range'.
+  llvm::DenseMap<SourceLocation, unsigned> ExpandedTokIndex;
   llvm::DenseMap<FileID, MarkedFile> Files;
----------------
kadircet wrote:
> usaxena95 wrote:
> > kadircet wrote:
> > > this definitely makes sense, but most of the users of `TokenBuffers` are currently invoking `expandedTokens(Range)` only once, so i am not sure if this is a huge benefit on their-side, whereas they'll likely end up seeing some short-lived memory consumption (and an extra traversal of all the expanded tokens).
> > > 
> > > Any reasons for not moving this logic to the application? `expandedTokens()` endpoint will return all the tokens, no questions asked. Then the application can build whatever acceleration structure it wants. If you think there are in-tree use cases that can benefit from this caching, maybe even provide that as a separate helper?
> > > this definitely makes sense, but most of the users of TokenBuffers are currently invoking expandedTokens(Range) only once
> > I didn't get that users are invoking it only once. This is not caching the results of `expandedTokens(Range)`. We are precomputing these to completely avoid a binary search which uses `isBeforeInTranslationUnit` (considerably slower). Doing an extra traversal of expandedTokens is not noticeable as compared to `isBeforeInTranslationUnit` latency. Users like SelectionTree (GoToDefinition, Hover,..) can directly benefit from this even though they query this only once per some range.
> > 
> > Umm. If you meant use cases like: build token buffer -> use expandedTokens(Range) -> destroy TokenBuffer.
> > I don't immediately see such use cases. But I think it is still better than using isBeforeInTranslationUnit.
> > My mental model is that TokenBuffer is a long lived object (example in ParsedAST in clangd). And using it only once and then discarding it is expensive in itself.
> > 
> > > Any reasons for not moving this logic to the application?
> > > If you think there are in-tree use cases that can benefit from this caching, maybe even provide that as a separate helper?
> > This can basically benefit all users of `expandedToken(Range)`. Since expandedToken(Range) is provided by TokenBuffer, the optimization must also remain in the TokenBuffer as compared to the application.
> > Although users which only use this for non-token ranges will pay additional cost. We can potentially make this precomputation optional for such cases.
> > 
> > 
> > 
> > I didn't get that users are invoking it only once.
> 
> Sorry that was confusing, I was trying to say that on use cases like Hover/Go-To etc. clangd builds selection tree only once and you are right, that might actually involve multiple calls to `expandedTokens(Range)` but that shouldn't be the bottleneck of these features.
> 
> 
> > This is not caching the results of `expandedTokens(Range)`. We are precomputing these to completely avoid a binary search
> 
> right, i was saying that all that precomputation is also probably unneeded, as normally features are only interested in a single token, which should nest properly in AST under normal circumstances and doesn't require calling expandedTokens for all possible ranges.
> 
> > Doing an extra traversal of expandedTokens is not noticeable as compared to `isBeforeInTranslationUnit` latency. Users like SelectionTree (GoToDefinition, Hover,..) can directly benefit from this even though they query this only once per some range.
> 
> I was suggesting to move this into application because that felt counter-intuitive, as the benchmark you provided is actually running Hover(and other feautres) on every (spelled) token of the file. If you say that it is beneficial even for the normal hover case, sure let's go (but i'd be happier if you proved me wrong first 😅 )
> 
> > Umm. If you meant use cases like: build token buffer -> use expandedTokens(Range) -> destroy TokenBuffer.
> > I don't immediately see such use cases. But I think it is still better than using isBeforeInTranslationUnit.
> > My mental model is that TokenBuffer is a long lived object (example in ParsedAST in clangd). And using it only once and then discarding it is expensive in itself.
> 
> That's what clangd does though (not sure if Tokenbuffers have any other users ATM). It literally builds a new ParsedAST after every edit(It also has a cache to skip some work if contents didn't change between requests). But usually all of those requests are user-initiated, so an extra latency of a couple ms on every request isn't really noticeable.
> Getting rid of that would still be great, but if it means slowing down the initial request for quite some time (due to extra traversal of all the expanded tokens, which might be a huge deal especially on files including some tablegen'd stuff, as they are usually part of the main file and not preamble).
> 
> > This can basically benefit all users of `expandedToken(Range)`.
> 
> As stated above, i am just worried that this might hurt the use cases in which application just builds a tokenbuffer and calls this on a few ranges. (maybe clangd is the odd one out, but i don't see any other users. there's `clang::syntax::computeReplacements` but it is only used in a test)
> 
> > Since expandedToken(Range) is provided by TokenBuffer, the optimization must also remain in the TokenBuffer as compared to the application.
> 
> I don't follow the reasoning here exactly. The application that needs to operate on every token of the expanded file can call the wrapper rather than using TokenBuffers directly.
> 
> But... The most dominant application in clangd's case is SelectionTree and if it is going to perform this computation all the time (because it wants to support both batch and single invocation use-case :/ ). As you suggested this can be optional (or triggered with an explicit endpoint) but that's gonna be ugly, as it needs to be supplied when building a ParsedAST, ugh..
> 
> (Sorry for drawing a little circle here) I suppose it is cleanest the way you propose it then (unless your application is actually building ParsedAST's itself without any clangd abstraction like TUScheduler/ClangdServer).
> It would still be great if you could check for possible regressions on the common path (the easiest would be to just look at `ParsedAST::build` latencies on a couple files, i suppose `clang/lib/Serialization/ASTReader.cpp` is a good example for the pathological case, winning there would probably mean winning in all the other cases too, but it would be good to see a more "sane" file too. As anything after building the `ParsedAST` is guaranteed to be as fast as before.
> 
> > Although users which only use this for non-token ranges will pay additional cost. We can potentially make this precomputation optional for such cases.
> 
> I suppose to hit the slow case users should provide spelled tokens that did not expand to anything, which should be rare, so I don't think that'll be a huge problem.
Long thread here :-)

Kadir: one wrinkle that's maybe missed here is that clangd's SelectionTree calls this method *many* times to establish bounds of many AST nodes. I fully believe this is dominating SelectionTree build time (though it would be nice to see benchmarks).


================
Comment at: clang/lib/Tooling/Syntax/Tokens.cpp:653
 
+    // Index ExpandedTokens for faster lookups by SourceLocation.
+    unsigned ExpandedIndex = 0;
----------------
I think it's worth reserve()ing the map size here


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99086/new/

https://reviews.llvm.org/D99086



More information about the cfe-commits mailing list