[PATCH] D99086: [clang][Syntax] Optimize expandedTokens for token ranges.

Utkarsh Saxena via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Mar 22 12:07:38 PDT 2021


usaxena95 added inline comments.


================
Comment at: clang/include/clang/Tooling/Syntax/Tokens.h:372
+  // useful while finding expanded tokens in a 'token range'.
+  llvm::DenseMap<SourceLocation, unsigned> ExpandedTokIndex;
   llvm::DenseMap<FileID, MarkedFile> Files;
----------------
kadircet wrote:
> this definitely makes sense, but most of the users of `TokenBuffers` are currently invoking `expandedTokens(Range)` only once, so i am not sure if this is a huge benefit on their-side, whereas they'll likely end up seeing some short-lived memory consumption (and an extra traversal of all the expanded tokens).
> 
> Any reasons for not moving this logic to the application? `expandedTokens()` endpoint will return all the tokens, no questions asked. Then the application can build whatever acceleration structure it wants. If you think there are in-tree use cases that can benefit from this caching, maybe even provide that as a separate helper?
> this definitely makes sense, but most of the users of TokenBuffers are currently invoking expandedTokens(Range) only once
I don't follow the point about users invoking it only once. This is not caching the results of `expandedTokens(Range)`: we are precomputing the index to avoid a binary search that relies on `isBeforeInTranslationUnit`, which is considerably slower. The extra traversal of the expanded tokens is not noticeable compared to the latency of `isBeforeInTranslationUnit`. Users like SelectionTree (GoToDefinition, Hover, ...) benefit directly from this even though they query only once per range.
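Roughly, the idea is the following (a sketch, not the exact patch; the member name `ExpandedTokIndex` follows the diff above, the rest of the names are illustrative):

  // Built once, right after the expanded token stream is finalized.
  void TokenBuffer::indexExpandedTokens() {
    for (unsigned I = 0, E = ExpandedTokens.size(); I != E; ++I)
      ExpandedTokIndex[ExpandedTokens[I].location()] = I;
  }

  // A query over a token range then becomes two hash lookups instead of two
  // binary searches whose comparator calls isBeforeInTranslationUnit.
  llvm::ArrayRef<syntax::Token>
  TokenBuffer::expandedTokens(SourceRange R) const {
    auto Front = ExpandedTokIndex.find(R.getBegin());
    auto Back = ExpandedTokIndex.find(R.getEnd());
    if (Front == ExpandedTokIndex.end() || Back == ExpandedTokIndex.end())
      return {}; // Real code would fall back to the old path here.
    // R is a token range, so the token at R.getEnd() is included.
    return llvm::makeArrayRef(ExpandedTokens)
        .slice(Front->second, Back->second - Front->second + 1);
  }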

If you meant use cases like "build the TokenBuffer -> call `expandedTokens(Range)` once -> destroy the TokenBuffer", I don't immediately see such use cases, but even there I think this is still better than using `isBeforeInTranslationUnit`.
My mental model is that TokenBuffer is a long-lived object (for example, in ParsedAST in clangd), and building it only to use it once and then discard it is expensive in itself.

> Any reasons for not moving this logic to the application?
> If you think there are in-tree use cases that can benefit from this caching, maybe even provide that as a separate helper?
This basically benefits all users of `expandedTokens(Range)`. Since `expandedTokens(Range)` is provided by TokenBuffer, the optimization should live in TokenBuffer rather than in each application.
Users that only query non-token ranges would pay the additional cost, though; we could make this precomputation optional for such cases, as sketched below.
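One possible shape for the opt-in (again just a sketch; `indexExpandedTokens`, `expandedTokensViaIndex` and `expandedTokensSlow` are hypothetical names, not part of the patch):

  // Hypothetical opt-in: e.g. clangd's ParsedAST builds the index once.
  Buffer.indexExpandedTokens();

  // Inside expandedTokens(SourceRange R), use the index only when present:
  if (!ExpandedTokIndex.empty())
    return expandedTokensViaIndex(R); // two DenseMap lookups
  return expandedTokensSlow(R);       // binary search via isBeforeInTranslationUnit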

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99086/new/

https://reviews.llvm.org/D99086


