[PATCH] D94424: [clangd] Make AST-based signals available to runWithPreamble.

Sam McCall via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Jan 13 12:55:19 PST 2021


sammccall added a comment.

Thanks, this looks pretty solid!

I want to be pretty careful about APIs/naming/code organization here (even moreso than usual!) because TUScheduler is a big complex beast as it is, and it's easy to get lost.



================
Comment at: clang-tools-extra/clangd/TUScheduler.cpp:421
 
+  std::shared_ptr<const ASTSignals> getASTSignals() const;
+
----------------
In practice, this is always called together with getPossiblyStalePreamble() for runWithPreamble (2 callsites). The two feel fairly related, and they each lock the same mutex.

The other two callsites of getPossiblyStalePreamble() are measuring memory (which in principle should also get the ASTSignals, though in practice it's likely small enough not to matter), and building ASTs, which can ignore it.

I'd suggest combining them into the somewhat awkward signature
```shared_ptr<const PreambleData> getPossiblyStalePreamble(shared_ptr<const ASTSignals> *PossiblyStaleASTSignals = nullptr)```


================
Comment at: clang-tools-extra/clangd/TUScheduler.cpp:423
+
+  void updateASTSignals(ParsedAST &AST);
+
----------------
why is this public?


================
Comment at: clang-tools-extra/clangd/TUScheduler.cpp:841
 
+void ASTWorker::updateASTSignals(ParsedAST &AST) {
+  ASTSignals Signals;
----------------
This implementation doesn't belong in TUScheduler, which is all about managing lifecycles and threading, rather than inspecting ASTs.

Best suggestion I have is to create a separate header ASTSignals.h, which defines the ASTSignals struct and a `static ASTSignals ASTSignals::derive(const ParsedAST&)` function to extract them.

(It's pretty awkward to even directly depend on it, since this feels more of a "feature" like go-to-definition that's independent of to TUScheduler and injected by ClangdServer, rather than core things like ParsedAST that are layered below TUScheduler. But fundamentally we can't avoid this without somehow obfuscating the data structure we're storing).


================
Comment at: clang-tools-extra/clangd/TUScheduler.cpp:851
+            Signals.Symbols[ID] += 1;
+          // FIXME: Do not count the primary namespace as a related namespace.
+          // Eg. clang::clangd:: for this file.
----------------
why not?


================
Comment at: clang-tools-extra/clangd/TUScheduler.cpp:853
+          // Eg. clang::clangd:: for this file.
+          if (const auto *NSD = dyn_cast<NamespaceDecl>(ND->getDeclContext()))
+            if (!ND->isInAnonymousNamespace()) {
----------------
as mentioned above, you may want to do this only if the per-symbol-count was incremented from 0 to 1.
You'd have to ignore namespaces for symbols with no symbolID, but that doesn't matter.


================
Comment at: clang-tools-extra/clangd/TUScheduler.cpp:854
+          if (const auto *NSD = dyn_cast<NamespaceDecl>(ND->getDeclContext()))
+            if (!ND->isInAnonymousNamespace()) {
+              std::string NS;
----------------
why not NSD->isAnonymous()?

(and generally prefer early continue rather than indenting the whole loop body)


================
Comment at: clang-tools-extra/clangd/TUScheduler.cpp:855
+            if (!ND->isInAnonymousNamespace()) {
+              std::string NS;
+              llvm::raw_string_ostream OS(NS);
----------------
rather than checking/printing the namespace every time it's encountered, collect a `DenseMap<NamespaceDecl*, unsigned>` and then run through them once at the end?


================
Comment at: clang-tools-extra/clangd/TUScheduler.cpp:857
+              llvm::raw_string_ostream OS(NS);
+              NSD->printQualifiedName(OS);
+              if (!OS.str().empty()) {
----------------
use printNamespaceScope() from AST.h, which correctly handles inline namespace, adds trailing "::" etc.


================
Comment at: clang-tools-extra/clangd/TUScheduler.cpp:866
+  // read is completed. The last reader deletes the old ASTSignals.
+  std::lock_guard<std::mutex> Lock(Mutex);
+  LatestASTSignals = std::make_shared<ASTSignals>(std::move(Signals));
----------------
nit: to minimize the length of the locked section:

- move the new signals into the shared_ptr first (here I think you can just allocate it with make_shared in the first place). This avoids locking the allocation + move constructor.
- introduce a new scope for the locked section, and std::swap the shared_ptrs rather than moving. Together, this avoids locking the destructor of the old signals (it runs when the pointer you swapped it into goes out of scope, instead).



================
Comment at: clang-tools-extra/clangd/TUScheduler.h:38
 
+/// Signals derived from a valid AST of a file.
+struct ASTSignals {
----------------
Neither the comment or the name really explain what this is or what's unusual about it.
I can't think of a great name, but the comment should mention:
 - the purpose (provide information that can only be extracted from the AST to actions that can't access an AST)
 - the limitations (may be absent, always stale)
 - example usage (information about what declarations are used in a file affects code-completion ranking)


================
Comment at: clang-tools-extra/clangd/TUScheduler.h:41
+  /// Number of occurrences of each Symbol present in the file.
+  llvm::DenseMap<SymbolID, unsigned> Symbols;
+  /// Number of Symbols belonging to each namespace present in the file.
----------------
ReferencedSymbols?


================
Comment at: clang-tools-extra/clangd/TUScheduler.h:42
+  llvm::DenseMap<SymbolID, unsigned> Symbols;
+  /// Number of Symbols belonging to each namespace present in the file.
+  llvm::StringMap<unsigned> RelatedNamespaces;
----------------
I have a hard time parsing this comment (what does it mean for a namespace to be present in the file?) Maybe:

```
/// Namespaces whose symbols are used in the file, and the number of such symbols.
```

Looking at the implementation, it looks like it's rather the number of symbol *occurrences*. I suspect distinct symbols as your comment here suggests would be better though. If one symbol from a namespace is used 100 times, that should boost that symbol a lot but not necessarily the namespace a lot.


================
Comment at: clang-tools-extra/clangd/unittests/TUSchedulerTests.cpp:738
+  Notification TaskRun;
+  S.runWithPreamble(
+      "ASTSignals", Foo, TUScheduler::Stale,
----------------
(untangling ASTSignals from TUScheduler should make this clearer to test directly, though you may still want to keep a simple smoke test here)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D94424/new/

https://reviews.llvm.org/D94424



More information about the cfe-commits mailing list