[PATCH] D48441: [clangd] Incorporate transitive #includes into code complete proximity scoring.

Eric Liu via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Jun 22 03:32:24 PDT 2018


ioeric added a comment.

Looks great! Thanks for doing this!

The algorithm looks pretty efficient. It might still make sense to collect some performance numbers for real files with potentially large #include tree (we have plenty of those in our internal codebase :)



================
Comment at: clangd/CodeComplete.cpp:908
+  llvm::Optional<IncludeInserter> Inserter;  // Available during runWithSema.
+  llvm::Optional<URIDistance> FileProximity; // Initialized once Sema runs.
 
----------------
It might make sense to briefly explain somewhere how computational cost is distributed across the workflow.


================
Comment at: clangd/CodeComplete.cpp:938
+          SemaCCInput.FileName, SemaCCInput.Contents,
+          format::getLLVMStyle(), // XXX
+          SemaCCInput.Command.Directory,
----------------
nit: `Style`?


================
Comment at: clangd/FileDistance.cpp:72
+  // Fill in all ancestors between the query and the known.
+  for (unsigned I = 0; I < Ancestors.size(); ++I) {
+    if (Cost != kUnreachable)
----------------
nit: this loop would probably be easier to understand if iterated reversely.


================
Comment at: clangd/FileDistance.cpp:86
+    return R.first->getSecond();
+  if (auto U = clangd::URI::parse(URI)) {
+    LLVM_DEBUG(dbgs() << "distance(" << URI << ") = distance(" << U->body()
----------------
This is done for every index symbol, so I wonder if we could avoid parsing URIs by assuming some URI structures.


================
Comment at: clangd/FileDistance.h:56
+
+  FileDistance(llvm::StringMap<unsigned> Roots,
+               const FileDistanceOptions &Opts = {});
----------------
nit: It's unclear what the values of Roots are (it can be confused with tree roots or directory roots).


================
Comment at: clangd/Headers.cpp:28
+      : SM(SM), Out(Out) {
+    SM.getMainFileID();
+  }
----------------
What does this do?


================
Comment at: clangd/Headers.h:62
+  std::vector<Inclusion> MainFileIncludes;
+  llvm::StringMap<unsigned> includeDepth(llvm::StringRef MainFileName) const;
+
----------------
Does this return all transitive includes in the entire TU or just those from `MainFileName`? And what's `MainFileName` here? Is it the same as the main file in a TU? 


================
Comment at: clangd/Headers.h:67
+                     llvm::StringRef IncludedName,
+                     llvm::StringRef IncludedRealName);
+
----------------
The real name can be empty. How do we handle empty path?


================
Comment at: clangd/Headers.h:75
+  // The paths we want to expose are the RealPathName, so store those too.
+  unsigned fileNumber(llvm::StringRef Name);
+  llvm::StringMap<unsigned> NameToIndex;  // Values are file numbers.
----------------
Maybe `fileIndex`?


================
Comment at: clangd/Quality.cpp:233
 
+std::pair<float, unsigned> proximityScore(llvm::StringRef SymbolURI,
+                                          URIDistance *D) {
----------------
nit: `static`


================
Comment at: unittests/clangd/FileDistanceTests.cpp:36
+  // Ancestor (up+up+up+up)
+  EXPECT_EQ(D.distance("/"), 22u);
+  // Sibling (up+down)
----------------
I think this is an interesting case. IIUC, 22u is contributed by 4 ups (4*5) and 1 include (1*2, via `llvm/ADT/StringRef.h`). 

Suppose `a/b/c/d/e/f.cc` includes `base/x.h`, then the shortest path into directory `other/` is likely to go via `base/x.h`. This might become a problem if `base/x.h` is very commonly used. One solution is probably to penalize edits that hit the root.


Repository:
  rCTE Clang Tools Extra

https://reviews.llvm.org/D48441





More information about the cfe-commits mailing list