[PATCH] D53433: [clangd] auto-index stores symbols per-file instead of per-TU.

Sam McCall via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Oct 30 03:15:58 PDT 2018


sammccall accepted this revision.
sammccall added a comment.
This revision is now accepted and ready to land.

Great stuff! Just nits (though a few of them; this is a tricky patch!).



================
Comment at: clangd/index/Background.cpp:122
 
+inline BackgroundIndex::FileDigest digest(StringRef Content) {
+  return SHA1::hash({(const uint8_t *)Content.data(), Content.size()});
----------------
static rather than inline?


================
Comment at: clangd/index/Background.cpp:135
+
+// Resolves URI to file paths with cache.
+StringRef BackgroundIndex::uriToPath(StringRef FileURI, StringRef HintPath) {
----------------
Might be cleanest to wrap the cache, the URISchemes&, the hint path, and this function into a tiny class


================
Comment at: clangd/index/Background.cpp:160
+                             const StringMap<FileDigest> &FilesToUpdate) {
+  // Partition symbols/references into files.
+  struct File {
----------------
Maybe add a comment "eventually SymbolCollector may do this for us to avoid all this copying"?


================
Comment at: clangd/index/Background.cpp:311
   SPAN_ATTACH(Tracer, "refs", int(Refs.numRefs()));
-  // FIXME: partition the symbols by file rather than TU, to avoid duplication.
-  IndexedSymbols.update(AbsolutePath,
-                        llvm::make_unique<SymbolSlab>(std::move(Symbols)),
-                        llvm::make_unique<RefSlab>(std::move(Refs)));
-  FileHash[AbsolutePath] = Hash;
+  update(AbsolutePath, std::move(Symbols), std::move(Refs), FilesToUpdate);
+  IndexedFileDigests[AbsolutePath] = Hash;
----------------
It looks like you never write the FilesToUpdate hashes back to the IndexedFileDigests, so we'll always update headers?


================
Comment at: clangd/index/Background.h:55
 
+  using FileDigest = decltype(llvm::SHA1::hash({}));
+  using FileDigests = llvm::StringMap<FileDigest>;
----------------
private.


================
Comment at: clangd/index/Background.h:75
+  FileDigests IndexedFileDigests; // Key is absolute file path.
+  llvm::StringMap<std::string> URIToPathCache;
 
----------------
There's no need for this to be global to the class, it's going to cause some hassle once there are multiple worker threads.

I think can a cache can be scoped to a single call to update() (and both it and uriToPath() can be hidden in the cpp file.)


================
Comment at: clangd/index/FileIndex.cpp:123
+  case DuplicateHandling::Merge: {
+    // Merge slabs into a single slab and only keep alive the merged.
+    DenseMap<SymbolID, Symbol> Merged;
----------------
comment is stale


================
Comment at: clangd/index/FileIndex.cpp:140
+  }
+  case DuplicateHandling::PickOne: {
+    for (const auto &Slab : SymbolSlabs)
----------------
since we're deduplicating above, we could deduplicate here too and remove the deduplicating logic from Dex (MemIndex gets it by mistake). That would avoid duplicated deduplication (!) in the Merge + Dex case, and seems a little cleaner.

Not something for this patch, but maybe add a FIXME.


================
Comment at: clangd/index/SymbolCollector.cpp:212
+  auto I = FilesToIndexCache->try_emplace(FID);
+  if (!I.second)
+    return I.first->second;
----------------
nit: simplify logic by inverting the if here and sharing the return?


================
Comment at: clangd/index/SymbolCollector.cpp:441
   std::string FileURI;
-  if (auto DeclLoc = getTokenLocation(MI->getDefinitionLoc(), SM, Opts,
-                                      PP->getLangOpts(), FileURI))
+  auto DefLoc = MI->getDefinitionLoc();
+  // FIXME: use the result to filter out symbols.
----------------
why is this hoisted above shouldIndexFile? or did you mean to use DefLoc below?


================
Comment at: clangd/index/SymbolCollector.cpp:442
+  auto DefLoc = MI->getDefinitionLoc();
+  // FIXME: use the result to filter out symbols.
+  shouldIndexFile(SM, SM.getFileID(Loc), Opts, &FilesToIndexCache);
----------------
just to check - this is basically trivial to do now but would need tests etc?
(fine to defer from this patch)


================
Comment at: clangd/index/SymbolCollector.h:78
+    /// `FileFilter(SM, FID)` is true. If not set, all files are indexed.
+    std::function<bool(const SourceManager &, FileID)> FileFilter = nullptr;
   };
----------------
I think we should explicitly test this in SymbolCollectorTests - I think it should be straightforward?


================
Comment at: unittests/clangd/BackgroundIndexTests.cpp:28
   // a.h yields different symbols when included by A.cc vs B.cc.
   // Currently we store symbols for each TU, so we get both.
+  FS.Files[testPath("root/A.h")] = R"(
----------------
stale comment?


================
Comment at: unittests/clangd/BackgroundIndexTests.cpp:29
   // Currently we store symbols for each TU, so we get both.
-  FS.Files[testPath("root/A.h")] = "void a_h(); void NAME(){}";
-  FS.Files[testPath("root/A.cc")] = "#include \"A.h\"";
-  FS.Files[testPath("root/B.cc")] = "#define NAME bar\n#include \"A.h\"";
-  BackgroundIndex Idx(Context::empty(), "", FS);
+  FS.Files[testPath("root/A.h")] = R"(
+      void common();
----------------
nit: R"cpp(


================
Comment at: unittests/clangd/BackgroundIndexTests.cpp:33
+      #if A
+        class A_H {};
+      #else
----------------
naming here seems confusing: A_H seems to refer to the header, but then what does B_H refer to?


================
Comment at: unittests/clangd/BackgroundIndexTests.cpp:40
+      "#include \"A.h\"\nvoid g() { (void)common; }";
+  FS.Files[testPath("root/B.cc")] =
+      "#define A 0\n#include \"A.h\"\nvoid f_b() { (void)common; }";
----------------
these should be raw strings now I think (sorry, the old test was already marginal)


Repository:
  rCTE Clang Tools Extra

https://reviews.llvm.org/D53433





More information about the cfe-commits mailing list