[PATCH] D53433: [clangd] auto-index stores symbols per-file instead of per-TU.
Sam McCall via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Tue Oct 30 03:15:58 PDT 2018
sammccall accepted this revision.
sammccall added a comment.
This revision is now accepted and ready to land.
Great stuff! Just nits (though a few of them; this is a tricky patch!).
================
Comment at: clangd/index/Background.cpp:122
+inline BackgroundIndex::FileDigest digest(StringRef Content) {
+ return SHA1::hash({(const uint8_t *)Content.data(), Content.size()});
----------------
static rather than inline?
================
Comment at: clangd/index/Background.cpp:135
+
+// Resolves URI to file paths with cache.
+StringRef BackgroundIndex::uriToPath(StringRef FileURI, StringRef HintPath) {
----------------
Might be cleanest to wrap the cache, the URISchemes&, the hint path, and this function into a tiny class
================
Comment at: clangd/index/Background.cpp:160
+ const StringMap<FileDigest> &FilesToUpdate) {
+ // Partition symbols/references into files.
+ struct File {
----------------
Maybe add a comment "eventually SymbolCollector may do this for us to avoid all this copying"?
================
Comment at: clangd/index/Background.cpp:311
SPAN_ATTACH(Tracer, "refs", int(Refs.numRefs()));
- // FIXME: partition the symbols by file rather than TU, to avoid duplication.
- IndexedSymbols.update(AbsolutePath,
- llvm::make_unique<SymbolSlab>(std::move(Symbols)),
- llvm::make_unique<RefSlab>(std::move(Refs)));
- FileHash[AbsolutePath] = Hash;
+ update(AbsolutePath, std::move(Symbols), std::move(Refs), FilesToUpdate);
+ IndexedFileDigests[AbsolutePath] = Hash;
----------------
It looks like you never write the FilesToUpdate hashes back to the IndexedFileDigests, so we'll always update headers?
================
Comment at: clangd/index/Background.h:55
+ using FileDigest = decltype(llvm::SHA1::hash({}));
+ using FileDigests = llvm::StringMap<FileDigest>;
----------------
private.
================
Comment at: clangd/index/Background.h:75
+ FileDigests IndexedFileDigests; // Key is absolute file path.
+ llvm::StringMap<std::string> URIToPathCache;
----------------
There's no need for this to be global to the class, it's going to cause some hassle once there are multiple worker threads.
I think can a cache can be scoped to a single call to update() (and both it and uriToPath() can be hidden in the cpp file.)
================
Comment at: clangd/index/FileIndex.cpp:123
+ case DuplicateHandling::Merge: {
+ // Merge slabs into a single slab and only keep alive the merged.
+ DenseMap<SymbolID, Symbol> Merged;
----------------
comment is stale
================
Comment at: clangd/index/FileIndex.cpp:140
+ }
+ case DuplicateHandling::PickOne: {
+ for (const auto &Slab : SymbolSlabs)
----------------
since we're deduplicating above, we could deduplicate here too and remove the deduplicating logic from Dex (MemIndex gets it by mistake). That would avoid duplicated deduplication (!) in the Merge + Dex case, and seems a little cleaner.
Not something for this patch, but maybe add a FIXME.
================
Comment at: clangd/index/SymbolCollector.cpp:212
+ auto I = FilesToIndexCache->try_emplace(FID);
+ if (!I.second)
+ return I.first->second;
----------------
nit: simplify logic by inverting the if here and sharing the return?
================
Comment at: clangd/index/SymbolCollector.cpp:441
std::string FileURI;
- if (auto DeclLoc = getTokenLocation(MI->getDefinitionLoc(), SM, Opts,
- PP->getLangOpts(), FileURI))
+ auto DefLoc = MI->getDefinitionLoc();
+ // FIXME: use the result to filter out symbols.
----------------
why is this hoisted above shouldIndexFile? or did you mean to use DefLoc below?
================
Comment at: clangd/index/SymbolCollector.cpp:442
+ auto DefLoc = MI->getDefinitionLoc();
+ // FIXME: use the result to filter out symbols.
+ shouldIndexFile(SM, SM.getFileID(Loc), Opts, &FilesToIndexCache);
----------------
just to check - this is basically trivial to do now but would need tests etc?
(fine to defer from this patch)
================
Comment at: clangd/index/SymbolCollector.h:78
+ /// `FileFilter(SM, FID)` is true. If not set, all files are indexed.
+ std::function<bool(const SourceManager &, FileID)> FileFilter = nullptr;
};
----------------
I think we should explicitly test this in SymbolCollectorTests - I think it should be straightforward?
================
Comment at: unittests/clangd/BackgroundIndexTests.cpp:28
// a.h yields different symbols when included by A.cc vs B.cc.
// Currently we store symbols for each TU, so we get both.
+ FS.Files[testPath("root/A.h")] = R"(
----------------
stale comment?
================
Comment at: unittests/clangd/BackgroundIndexTests.cpp:29
// Currently we store symbols for each TU, so we get both.
- FS.Files[testPath("root/A.h")] = "void a_h(); void NAME(){}";
- FS.Files[testPath("root/A.cc")] = "#include \"A.h\"";
- FS.Files[testPath("root/B.cc")] = "#define NAME bar\n#include \"A.h\"";
- BackgroundIndex Idx(Context::empty(), "", FS);
+ FS.Files[testPath("root/A.h")] = R"(
+ void common();
----------------
nit: R"cpp(
================
Comment at: unittests/clangd/BackgroundIndexTests.cpp:33
+ #if A
+ class A_H {};
+ #else
----------------
naming here seems confusing: A_H seems to refer to the header, but then what does B_H refer to?
================
Comment at: unittests/clangd/BackgroundIndexTests.cpp:40
+ "#include \"A.h\"\nvoid g() { (void)common; }";
+ FS.Files[testPath("root/B.cc")] =
+ "#define A 0\n#include \"A.h\"\nvoid f_b() { (void)common; }";
----------------
these should be raw strings now I think (sorry, the old test was already marginal)
Repository:
rCTE Clang Tools Extra
https://reviews.llvm.org/D53433
More information about the cfe-commits
mailing list