[clang-tools-extra] dffa9df - [clangd] Shard preamble symbols in dynamic index

Mikael Holmén via cfe-commits cfe-commits at lists.llvm.org
Fri Apr 17 05:17:00 PDT 2020


On Fri, 2020-04-17 at 15:02 +0300, Kadir Çetinkaya wrote:
> Thanks Mikael,
> 
> 66b54d586fa73499714e2bfef3cedffeabb08f34 should fix this.
> 

Yep, now it runs without complaints.

Thanks!
/Mikael

> On Fri, Apr 17, 2020 at 2:42 PM Mikael Holmén <
> mikael.holmen at ericsson.com> wrote:
> > Hi Kadir,
> > 
> > I noticed in our buildbots that run with sanitizers that we got a
> > new
> > complaint with this commit. It's still there on current top-of-tree 
> > now
> > so the fixes you pushed after this patch doesn't seem to address
> > this
> > one:
> > 
> > ==89727==ERROR: LeakSanitizer: detected memory leaks
> > 
> > Direct leak of 136 byte(s) in 1 object(s) allocated from:
> >     #0 0x5b9838 in operator new(unsigned long)
> > /repo/uabkaka/tmp/llvm/projects/compiler-
> > rt/lib/asan/asan_new_delete.cc:106
> >     #1 0xc50879 in make_unique<clang::clangd::RefSlab,
> > clang::clangd::RefSlab>
> > /proj/flexasic/app/llvm/8.0/bin/../include/c++/v1/memory:3132:28
> >     #2 0xc50879 in refSlab /repo/bbiswjenk/fem023-
> > eiffel003/workspace/llvm/llvm-master-sanitize-asan/llvm/build-all-
> > asan/../../clang-tools-extra/clangd/unittests/FileIndexTests.cpp:87
> >     #3 0xc50879 in clang::clangd::(anonymous
> > namespace)::FileShardedIndexTest_Sharding_Test::TestBody()
> > /repo/bbiswjenk/fem023-eiffel003/workspace/llvm/llvm-master-
> > sanitize-
> > asan/llvm/build-all-asan/../../clang-tools-
> > extra/clangd/unittests/FileIndexTests.cpp:535
> >     #4 0x1620190 in
> > HandleExceptionsInMethodIfSupported<testing::Test,
> > void> /repo/bbiswjenk/fem023-eiffel003/workspace/llvm/llvm-master-
> > sanitize-asan/llvm/build-all-
> > asan/../utils/unittest/googletest/src/gtest.cc
> >     #5 0x1620190 in testing::Test::Run() /repo/bbiswjenk/fem023-
> > eiffel003/workspace/llvm/llvm-master-sanitize-asan/llvm/build-all-
> > asan/../utils/unittest/googletest/src/gtest.cc:2474
> >     #6 0x1623875 in testing::TestInfo::Run()
> > /repo/bbiswjenk/fem023-
> > eiffel003/workspace/llvm/llvm-master-sanitize-asan/llvm/build-all-
> > asan/../utils/unittest/googletest/src/gtest.cc:2656:11
> >     #7 0x1624cf0 in testing::TestCase::Run()
> > /repo/bbiswjenk/fem023-
> > eiffel003/workspace/llvm/llvm-master-sanitize-asan/llvm/build-all-
> > asan/../utils/unittest/googletest/src/gtest.cc:2774:28
> >     #8 0x1643714 in testing::internal::UnitTestImpl::RunAllTests()
> > /repo/bbiswjenk/fem023-eiffel003/workspace/llvm/llvm-master-
> > sanitize-
> > asan/llvm/build-all-
> > asan/../utils/unittest/googletest/src/gtest.cc:4649:43
> >     #9 0x16428c0 in
> > HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl
> > ,
> > bool> /repo/bbiswjenk/fem023-eiffel003/workspace/llvm/llvm-master-
> > sanitize-asan/llvm/build-all-
> > asan/../utils/unittest/googletest/src/gtest.cc
> >     #10 0x16428c0 in testing::UnitTest::Run()
> > /repo/bbiswjenk/fem023-
> > eiffel003/workspace/llvm/llvm-master-sanitize-asan/llvm/build-all-
> > asan/../utils/unittest/googletest/src/gtest.cc:4257
> >     #11 0x16048f0 in RUN_ALL_TESTS /repo/bbiswjenk/fem023-
> > eiffel003/workspace/llvm/llvm-master-sanitize-asan/llvm/build-all-
> > asan/../utils/unittest/googletest/include/gtest/gtest.h:2233:46
> >     #12 0x16048f0 in main /repo/bbiswjenk/fem023-
> > eiffel003/workspace/llvm/llvm-master-sanitize-asan/llvm/build-all-
> > asan/../utils/unittest/UnitTestMain/TestMain.cpp:50
> >     #13 0x7f558b333544 in __libc_start_main
> > (/lib64/libc.so.6+0x22544)
> > 
> > SUMMARY: AddressSanitizer: 136 byte(s) leaked in 1 allocation(s).
> > 
> > /Mikael
> > 
> > On Wed, 2020-04-15 at 00:15 -0700, Kadir Cetinkaya via cfe-commits
> > wrote:
> > > Author: Kadir Cetinkaya
> > > Date: 2020-04-15T09:10:10+02:00
> > > New Revision: dffa9dfbda56820c02e357ad34c24ce8759b4d26
> > > 
> > > URL: 
> > > 
> > https://protect2.fireeye.com/v1/url?k=1ed901c9-4253d508-1ed94152-863d9bcb726f-37cce004cef08ce5&q=1&e=d51603fa-ef7d-4c66-99fd-b868e84ed659&u=https%3A%2F%2Fgithub.com%2Fllvm%2Fllvm-project%2Fcommit%2Fdffa9dfbda56820c02e357ad34c24ce8759b4d26
> > > DIFF: 
> > > 
> > https://protect2.fireeye.com/v1/url?k=27955bfa-7b1f8f3b-27951b61-863d9bcb726f-307938d8468e38e8&q=1&e=d51603fa-ef7d-4c66-99fd-b868e84ed659&u=https%3A%2F%2Fgithub.com%2Fllvm%2Fllvm-project%2Fcommit%2Fdffa9dfbda56820c02e357ad34c24ce8759b4d26.diff
> > > 
> > > LOG: [clangd] Shard preamble symbols in dynamic index
> > > 
> > > Summary:
> > > This reduces memory usage by dynamic index from more than 400MB
> > to
> > > 32MB
> > > when all files in clang-tools-extra/clangd/*.cpp are active in
> > > clangd.
> > > 
> > > Reviewers: sammccall
> > > 
> > > Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman,
> > usaxena95,
> > > cfe-commits
> > > 
> > > Tags: #clang
> > > 
> > > Differential Revision: https://reviews.llvm.org/D77732
> > > 
> > > Added: 
> > >     
> > > 
> > > Modified: 
> > >     clang-tools-extra/clangd/index/Background.cpp
> > >     clang-tools-extra/clangd/index/FileIndex.cpp
> > >     clang-tools-extra/clangd/index/FileIndex.h
> > >     clang-tools-extra/clangd/index/SymbolCollector.cpp
> > >     clang-tools-extra/clangd/unittests/FileIndexTests.cpp
> > >     clang-tools-extra/clangd/unittests/TestTU.cpp
> > > 
> > > Removed: 
> > >     
> > > 
> > > 
> > >
> > ###################################################################
> > ##
> > > ###########
> > > diff  --git a/clang-tools-extra/clangd/index/Background.cpp
> > b/clang-
> > > tools-extra/clangd/index/Background.cpp
> > > index 1c26dd48093e..4c5719d0526c 100644
> > > --- a/clang-tools-extra/clangd/index/Background.cpp
> > > +++ b/clang-tools-extra/clangd/index/Background.cpp
> > > @@ -61,51 +61,6 @@ namespace clang {
> > >  namespace clangd {
> > >  namespace {
> > >  
> > > -// Resolves URI to file paths with cache.
> > > -class URIToFileCache {
> > > -public:
> > > -  URIToFileCache(llvm::StringRef HintPath) : HintPath(HintPath)
> > {}
> > > -
> > > -  llvm::StringRef resolve(llvm::StringRef FileURI) {
> > > -    auto I = URIToPathCache.try_emplace(FileURI);
> > > -    if (I.second) {
> > > -      auto Path = URI::resolve(FileURI, HintPath);
> > > -      if (!Path) {
> > > -        elog("Failed to resolve URI {0}: {1}", FileURI,
> > > Path.takeError());
> > > -        assert(false && "Failed to resolve URI");
> > > -        return "";
> > > -      }
> > > -      I.first->second = *Path;
> > > -    }
> > > -    return I.first->second;
> > > -  }
> > > -
> > > -private:
> > > -  std::string HintPath;
> > > -  llvm::StringMap<std::string> URIToPathCache;
> > > -};
> > > -
> > > -// We keep only the node "U" and its edges. Any node other than
> > "U"
> > > will be
> > > -// empty in the resultant graph.
> > > -IncludeGraph getSubGraph(const URI &U, const IncludeGraph
> > > &FullGraph) {
> > > -  IncludeGraph IG;
> > > -
> > > -  std::string FileURI = U.toString();
> > > -  auto Entry = IG.try_emplace(FileURI).first;
> > > -  auto &Node = Entry->getValue();
> > > -  Node = FullGraph.lookup(Entry->getKey());
> > > -  Node.URI = Entry->getKey();
> > > -
> > > -  // URIs inside nodes must point into the keys of the same
> > > IncludeGraph.
> > > -  for (auto &Include : Node.DirectIncludes) {
> > > -    auto I = IG.try_emplace(Include).first;
> > > -    I->getValue().URI = I->getKey();
> > > -    Include = I->getKey();
> > > -  }
> > > -
> > > -  return IG;
> > > -}
> > > -
> > >  // We cannot use vfs->makeAbsolute because Cmd.FileName is
> > either
> > > absolute or
> > >  // relative to Cmd.Directory, which might not be the same as
> > current
> > > working
> > >  // directory.
> > > @@ -219,108 +174,44 @@ void BackgroundIndex::update(
> > >      llvm::StringRef MainFile, IndexFileIn Index,
> > >      const llvm::StringMap<ShardVersion> &ShardVersionsSnapshot,
> > >      bool HadErrors) {
> > > -  // Partition symbols/references into files.
> > > -  struct File {
> > > -    llvm::DenseSet<const Symbol *> Symbols;
> > > -    llvm::DenseSet<const Ref *> Refs;
> > > -    llvm::DenseSet<const Relation *> Relations;
> > > -    FileDigest Digest;
> > > -  };
> > > -  llvm::StringMap<File> Files;
> > > -  URIToFileCache URICache(MainFile);
> > > +  llvm::StringMap<FileDigest> FilesToUpdate;
> > >    for (const auto &IndexIt : *Index.Sources) {
> > >      const auto &IGN = IndexIt.getValue();
> > >      // Note that sources do not contain any information
> > regarding
> > > missing
> > >      // headers, since we don't even know what absolute path they
> > > should fall in.
> > > -    const auto AbsPath = URICache.resolve(IGN.URI);
> > > +    auto AbsPath = llvm::cantFail(URI::resolve(IGN.URI,
> > MainFile),
> > > +                                  "Failed to resovle URI");
> > >      const auto DigestIt = ShardVersionsSnapshot.find(AbsPath);
> > >      // File has 
> > > diff erent contents, or indexing was successful this time.
> > >      if (DigestIt == ShardVersionsSnapshot.end() ||
> > >          DigestIt->getValue().Digest != IGN.Digest ||
> > >          (DigestIt->getValue().HadErrors && !HadErrors))
> > > -      Files.try_emplace(AbsPath).first->getValue().Digest =
> > > IGN.Digest;
> > > -  }
> > > -  // This map is used to figure out where to store relations.
> > > -  llvm::DenseMap<SymbolID, File *> SymbolIDToFile;
> > > -  for (const auto &Sym : *Index.Symbols) {
> > > -    if (Sym.CanonicalDeclaration) {
> > > -      auto DeclPath =
> > > URICache.resolve(Sym.CanonicalDeclaration.FileURI);
> > > -      const auto FileIt = Files.find(DeclPath);
> > > -      if (FileIt != Files.end()) {
> > > -        FileIt->second.Symbols.insert(&Sym);
> > > -        SymbolIDToFile[Sym.ID] = &FileIt->second;
> > > -      }
> > > -    }
> > > -    // For symbols with 
> > > diff erent declaration and definition locations, we store
> > > -    // the full symbol in both the header file and the
> > > implementation file, so
> > > -    // that merging can tell the preferred symbols (from
> > canonical
> > > headers) from
> > > -    // other symbols (e.g. forward declarations).
> > > -    if (Sym.Definition &&
> > > -        Sym.Definition.FileURI !=
> > Sym.CanonicalDeclaration.FileURI)
> > > {
> > > -      auto DefPath = URICache.resolve(Sym.Definition.FileURI);
> > > -      const auto FileIt = Files.find(DefPath);
> > > -      if (FileIt != Files.end())
> > > -        FileIt->second.Symbols.insert(&Sym);
> > > -    }
> > > -  }
> > > -  llvm::DenseMap<const Ref *, SymbolID> RefToIDs;
> > > -  for (const auto &SymRefs : *Index.Refs) {
> > > -    for (const auto &R : SymRefs.second) {
> > > -      auto Path = URICache.resolve(R.Location.FileURI);
> > > -      const auto FileIt = Files.find(Path);
> > > -      if (FileIt != Files.end()) {
> > > -        auto &F = FileIt->getValue();
> > > -        RefToIDs[&R] = SymRefs.first;
> > > -        F.Refs.insert(&R);
> > > -      }
> > > -    }
> > > -  }
> > > -  for (const auto &Rel : *Index.Relations) {
> > > -    const auto FileIt = SymbolIDToFile.find(Rel.Subject);
> > > -    if (FileIt != SymbolIDToFile.end())
> > > -      FileIt->second->Relations.insert(&Rel);
> > > +      FilesToUpdate[AbsPath] = IGN.Digest;
> > >    }
> > >  
> > > -  // Build and store new slabs for each updated file.
> > > -  for (const auto &FileIt : Files) {
> > > -    llvm::StringRef Path = FileIt.getKey();
> > > -    SymbolSlab::Builder Syms;
> > > -    RefSlab::Builder Refs;
> > > -    RelationSlab::Builder Relations;
> > > -    for (const auto *S : FileIt.second.Symbols)
> > > -      Syms.insert(*S);
> > > -    for (const auto *R : FileIt.second.Refs)
> > > -      Refs.insert(RefToIDs[R], *R);
> > > -    for (const auto *Rel : FileIt.second.Relations)
> > > -      Relations.insert(*Rel);
> > > -    auto SS =
> > std::make_unique<SymbolSlab>(std::move(Syms).build());
> > > -    auto RS =
> > std::make_unique<RefSlab>(std::move(Refs).build());
> > > -    auto RelS =
> > > std::make_unique<RelationSlab>(std::move(Relations).build());
> > > -    auto IG = std::make_unique<IncludeGraph>(
> > > -        getSubGraph(URI::create(Path),
> > Index.Sources.getValue()));
> > > +  // Shard slabs into files.
> > > +  FileShardedIndex ShardedIndex(std::move(Index), MainFile);
> > >  
> > > -    // We need to store shards before updating the index, since
> > the
> > > latter
> > > -    // consumes slabs.
> > > -    // FIXME: Also skip serializing the shard if it is already
> > up-
> > > to-date.
> > > -    BackgroundIndexStorage *IndexStorage =
> > > IndexStorageFactory(Path);
> > > -    IndexFileOut Shard;
> > > -    Shard.Symbols = SS.get();
> > > -    Shard.Refs = RS.get();
> > > -    Shard.Relations = RelS.get();
> > > -    Shard.Sources = IG.get();
> > > +  // Build and store new slabs for each updated file.
> > > +  for (const auto &FileIt : FilesToUpdate) {
> > > +    PathRef Path = FileIt.first();
> > > +    auto IF = ShardedIndex.getShard(Path);
> > >  
> > >      // Only store command line hash for main files of the TU,
> > since
> > > our
> > >      // current model keeps only one version of a header file.
> > > -    if (Path == MainFile)
> > > -      Shard.Cmd = Index.Cmd.getPointer();
> > > +    if (Path != MainFile)
> > > +      IF.Cmd.reset();
> > >  
> > > -    if (auto Error = IndexStorage->storeShard(Path, Shard))
> > > +    // We need to store shards before updating the index, since
> > the
> > > latter
> > > +    // consumes slabs.
> > > +    // FIXME: Also skip serializing the shard if it is already
> > up-
> > > to-date.
> > > +    if (auto Error = IndexStorageFactory(Path)->storeShard(Path,
> > > IF))
> > >        elog("Failed to write background-index shard for file {0}:
> > > {1}", Path,
> > >             std::move(Error));
> > >  
> > >      {
> > >        std::lock_guard<std::mutex> Lock(ShardVersionsMu);
> > > -      auto Hash = FileIt.second.Digest;
> > > +      const auto &Hash = FileIt.getValue();
> > >        auto DigestIt = ShardVersions.try_emplace(Path);
> > >        ShardVersion &SV = DigestIt.first->second;
> > >        // Skip if file is already up to date, unless previous
> > index
> > > was broken
> > > @@ -333,8 +224,11 @@ void BackgroundIndex::update(
> > >        // This can override a newer version that is added in
> > another
> > > thread, if
> > >        // this thread sees the older version but finishes later.
> > This
> > > should be
> > >        // rare in practice.
> > > -      IndexedSymbols.update(Path, std::move(SS), std::move(RS),
> > > std::move(RelS),
> > > -                            Path == MainFile);
> > > +      IndexedSymbols.update(
> > > +          Path,
> > > std::make_unique<SymbolSlab>(std::move(*IF.Symbols)),
> > > +          std::make_unique<RefSlab>(std::move(*IF.Refs)),
> > > +         
> > std::make_unique<RelationSlab>(std::move(*IF.Relations)),
> > > +          Path == MainFile);
> > >      }
> > >    }
> > >  }
> > > 
> > > diff  --git a/clang-tools-extra/clangd/index/FileIndex.cpp
> > b/clang-
> > > tools-extra/clangd/index/FileIndex.cpp
> > > index 92e7316105c7..d644268b8da9 100644
> > > --- a/clang-tools-extra/clangd/index/FileIndex.cpp
> > > +++ b/clang-tools-extra/clangd/index/FileIndex.cpp
> > > @@ -10,11 +10,17 @@
> > >  #include "CollectMacros.h"
> > >  #include "Logger.h"
> > >  #include "ParsedAST.h"
> > > +#include "Path.h"
> > >  #include "SymbolCollector.h"
> > >  #include "index/CanonicalIncludes.h"
> > >  #include "index/Index.h"
> > >  #include "index/MemIndex.h"
> > >  #include "index/Merge.h"
> > > +#include "index/Ref.h"
> > > +#include "index/Relation.h"
> > > +#include "index/Serialization.h"
> > > +#include "index/Symbol.h"
> > > +#include "index/SymbolID.h"
> > >  #include "index/SymbolOrigin.h"
> > >  #include "index/dex/Dex.h"
> > >  #include "clang/AST/ASTContext.h"
> > > @@ -23,19 +29,24 @@
> > >  #include "clang/Lex/MacroInfo.h"
> > >  #include "clang/Lex/Preprocessor.h"
> > >  #include "llvm/ADT/DenseMap.h"
> > > -#include "llvm/ADT/DenseSet.h"
> > >  #include "llvm/ADT/STLExtras.h"
> > > +#include "llvm/ADT/StringMap.h"
> > >  #include "llvm/ADT/StringRef.h"
> > > +#include "llvm/Support/Error.h"
> > >  #include <memory>
> > > +#include <tuple>
> > > +#include <utility>
> > > +#include <vector>
> > >  
> > >  namespace clang {
> > >  namespace clangd {
> > > +namespace {
> > >  
> > > -static SlabTuple indexSymbols(ASTContext &AST,
> > > std::shared_ptr<Preprocessor> PP,
> > > -                              llvm::ArrayRef<Decl *>
> > DeclsToIndex,
> > > -                              const MainFileMacros
> > > *MacroRefsToIndex,
> > > -                              const CanonicalIncludes &Includes,
> > > -                              bool IsIndexMainAST,
> > llvm::StringRef
> > > Version) {
> > > +SlabTuple indexSymbols(ASTContext &AST,
> > > std::shared_ptr<Preprocessor> PP,
> > > +                       llvm::ArrayRef<Decl *> DeclsToIndex,
> > > +                       const MainFileMacros *MacroRefsToIndex,
> > > +                       const CanonicalIncludes &Includes, bool
> > > IsIndexMainAST,
> > > +                       llvm::StringRef Version) {
> > >    SymbolCollector::Options CollectorOpts;
> > >    CollectorOpts.CollectIncludePath = true;
> > >    CollectorOpts.Includes = &Includes;
> > > @@ -85,6 +96,131 @@ static SlabTuple indexSymbols(ASTContext
> > &AST,
> > > std::shared_ptr<Preprocessor> PP,
> > >                           std::move(Relations));
> > >  }
> > >  
> > > +// Resolves URI to file paths with cache.
> > > +class URIToFileCache {
> > > +public:
> > > +  URIToFileCache(PathRef HintPath) : HintPath(HintPath) {}
> > > +
> > > +  llvm::StringRef operator[](llvm::StringRef FileURI) {
> > > +    if (FileURI.empty())
> > > +      return "";
> > > +    auto I = URIToPathCache.try_emplace(FileURI);
> > > +    if (I.second) {
> > > +      I.first->second = llvm::cantFail(URI::resolve(FileURI,
> > > HintPath),
> > > +                                       "Failed to resolve URI");
> > > +    }
> > > +    return I.first->second;
> > > +  }
> > > +
> > > +private:
> > > +  PathRef HintPath;
> > > +  llvm::StringMap<std::string> URIToPathCache;
> > > +};
> > > +
> > > +// We keep only the node "U" and its edges. Any node other than
> > "U"
> > > will be
> > > +// empty in the resultant graph.
> > > +IncludeGraph getSubGraph(llvm::StringRef URI, const IncludeGraph
> > > &FullGraph) {
> > > +  IncludeGraph IG;
> > > +
> > > +  auto Entry = IG.try_emplace(URI).first;
> > > +  auto &Node = Entry->getValue();
> > > +  Node = FullGraph.lookup(Entry->getKey());
> > > +  Node.URI = Entry->getKey();
> > > +
> > > +  // URIs inside nodes must point into the keys of the same
> > > IncludeGraph.
> > > +  for (auto &Include : Node.DirectIncludes) {
> > > +    auto I = IG.try_emplace(Include).first;
> > > +    I->getValue().URI = I->getKey();
> > > +    Include = I->getKey();
> > > +  }
> > > +  return IG;
> > > +}
> > > +} // namespace
> > > +
> > > +FileShardedIndex::FileShardedIndex(IndexFileIn Input, PathRef
> > > HintPath)
> > > +    : Index(std::move(Input)) {
> > > +  URIToFileCache UriToFile(HintPath);
> > > +  // Used to build RelationSlabs.
> > > +  llvm::DenseMap<SymbolID, FileShard *> SymbolIDToFile;
> > > +
> > > +  // Attribute each Symbol to both their declaration and
> > definition
> > > locations.
> > > +  if (Index.Symbols) {
> > > +    for (const auto &S : *Index.Symbols) {
> > > +      auto File = UriToFile[S.CanonicalDeclaration.FileURI];
> > > +      auto It = Shards.try_emplace(File);
> > > +      It.first->getValue().Symbols.insert(&S);
> > > +      SymbolIDToFile[S.ID] = &It.first->getValue();
> > > +      // Only bother if definition file is 
> > > diff erent than declaration file.
> > > +      if (S.Definition &&
> > > +          S.Definition.FileURI !=
> > S.CanonicalDeclaration.FileURI) {
> > > +        auto File = UriToFile[S.Definition.FileURI];
> > > +        auto It = Shards.try_emplace(File);
> > > +        It.first->getValue().Symbols.insert(&S);
> > > +      }
> > > +    }
> > > +  }
> > > +  // Attribute references into each file they occured in.
> > > +  if (Index.Refs) {
> > > +    for (const auto &SymRefs : *Index.Refs) {
> > > +      for (const auto &R : SymRefs.second) {
> > > +        auto File = UriToFile[R.Location.FileURI];
> > > +        const auto It = Shards.try_emplace(File);
> > > +        It.first->getValue().Refs.insert(&R);
> > > +        RefToSymID[&R] = SymRefs.first;
> > > +      }
> > > +    }
> > > +  }
> > > +  // Attribute relations to the file declaraing their Subject as
> > > Object might
> > > +  // not have been indexed, see
> > SymbolCollector::processRelations
> > > for details.
> > > +  if (Index.Relations) {
> > > +    for (const auto &R : *Index.Relations) {
> > > +      auto *File = SymbolIDToFile.lookup(R.Subject);
> > > +      assert(File && "unknown subject in relation");
> > > +      File->Relations.insert(&R);
> > > +    }
> > > +  }
> > > +  // Store only the direct includes of a file in a shard.
> > > +  if (Index.Sources) {
> > > +    const auto &FullGraph = *Index.Sources;
> > > +    for (const auto &It : FullGraph) {
> > > +      auto File = UriToFile[It.first()];
> > > +      auto ShardIt = Shards.try_emplace(File);
> > > +      ShardIt.first->getValue().IG = getSubGraph(It.first(),
> > > FullGraph);
> > > +    }
> > > +  }
> > > +}
> > > +std::vector<PathRef> FileShardedIndex::getAllFiles() const {
> > > +  return {Shards.keys().begin(), Shards.keys().end()};
> > > +}
> > > +
> > > +IndexFileIn FileShardedIndex::getShard(PathRef File) const {
> > > +  auto It = Shards.find(File);
> > > +  assert(It != Shards.end() && "received unknown file");
> > > +
> > > +  IndexFileIn IF;
> > > +  IF.Sources = It->getValue().IG;
> > > +  IF.Cmd = Index.Cmd;
> > > +
> > > +  SymbolSlab::Builder SymB;
> > > +  for (const auto *S : It->getValue().Symbols)
> > > +    SymB.insert(*S);
> > > +  IF.Symbols = std::move(SymB).build();
> > > +
> > > +  RefSlab::Builder RefB;
> > > +  for (const auto *Ref : It->getValue().Refs) {
> > > +    auto SID = RefToSymID.lookup(Ref);
> > > +    RefB.insert(SID, *Ref);
> > > +  }
> > > +  IF.Refs = std::move(RefB).build();
> > > +
> > > +  RelationSlab::Builder RelB;
> > > +  for (const auto *Rel : It->getValue().Relations) {
> > > +    RelB.insert(*Rel);
> > > +  }
> > > +  IF.Relations = std::move(RelB).build();
> > > +  return IF;
> > > +}
> > > +
> > >  SlabTuple indexMainDecls(ParsedAST &AST) {
> > >    return indexSymbols(AST.getASTContext(),
> > AST.getPreprocessorPtr(),
> > >                        AST.getLocalTopLevelDecls(),
> > &AST.getMacros(),
> > > @@ -254,15 +390,24 @@ void FileIndex::updatePreamble(PathRef
> > Path,
> > > llvm::StringRef Version,
> > >                                 ASTContext &AST,
> > >                                 std::shared_ptr<Preprocessor> PP,
> > >                                 const CanonicalIncludes
> > &Includes) {
> > > -  auto Slabs = indexHeaderSymbols(Version, AST, std::move(PP),
> > > Includes);
> > > -  PreambleSymbols.update(
> > > -      Path,
> > > std::make_unique<SymbolSlab>(std::move(std::get<0>(Slabs))),
> > > -      std::make_unique<RefSlab>(),
> > > -     
> > std::make_unique<RelationSlab>(std::move(std::get<2>(Slabs))),
> > > -      /*CountReferences=*/false);
> > > +  IndexFileIn IF;
> > > +  std::tie(IF.Symbols, std::ignore, IF.Relations) =
> > > +      indexHeaderSymbols(Version, AST, std::move(PP), Includes);
> > > +  FileShardedIndex ShardedIndex(std::move(IF), Path);
> > > +  for (PathRef File : ShardedIndex.getAllFiles()) {
> > > +    auto IF = ShardedIndex.getShard(File);
> > > +    PreambleSymbols.update(
> > > +        File,
> > std::make_unique<SymbolSlab>(std::move(*IF.Symbols)),
> > > +        std::make_unique<RefSlab>(),
> > > +       
> > std::make_unique<RelationSlab>(std::move(*IF.Relations)),
> > > +        /*CountReferences=*/false);
> > > +  }
> > >    PreambleIndex.reset(
> > >        PreambleSymbols.buildIndex(UseDex ? IndexType::Heavy :
> > > IndexType::Light,
> > >                                   DuplicateHandling::PickOne));
> > > +  vlog("Build dynamic index for header symbols with estimated
> > memory
> > > usage of "
> > > +       "{0} bytes",
> > > +       PreambleIndex.estimateMemoryUsage());
> > >  }
> > >  
> > >  void FileIndex::updateMain(PathRef Path, ParsedAST &AST) {
> > > @@ -274,6 +419,9 @@ void FileIndex::updateMain(PathRef Path,
> > > ParsedAST &AST) {
> > >        /*CountReferences=*/true);
> > >    MainFileIndex.reset(
> > >        MainFileSymbols.buildIndex(IndexType::Light,
> > > DuplicateHandling::Merge));
> > > +  vlog("Build dynamic index for main-file symbols with estimated
> > > memory usage "
> > > +       "of {0} bytes",
> > > +       MainFileIndex.estimateMemoryUsage());
> > >  }
> > >  
> > >  } // namespace clangd
> > > 
> > > diff  --git a/clang-tools-extra/clangd/index/FileIndex.h b/clang-
> > > tools-extra/clangd/index/FileIndex.h
> > > index ec44be9b89c2..539f232a7523 100644
> > > --- a/clang-tools-extra/clangd/index/FileIndex.h
> > > +++ b/clang-tools-extra/clangd/index/FileIndex.h
> > > @@ -15,14 +15,24 @@
> > >  #ifndef LLVM_CLANG_TOOLS_EXTRA_CLANGD_INDEX_FILEINDEX_H
> > >  #define LLVM_CLANG_TOOLS_EXTRA_CLANGD_INDEX_FILEINDEX_H
> > >  
> > > +#include "Headers.h"
> > >  #include "Index.h"
> > >  #include "MemIndex.h"
> > >  #include "Merge.h"
> > >  #include "Path.h"
> > >  #include "index/CanonicalIncludes.h"
> > > +#include "index/Ref.h"
> > > +#include "index/Relation.h"
> > > +#include "index/Serialization.h"
> > >  #include "index/Symbol.h"
> > >  #include "clang/Lex/Preprocessor.h"
> > > +#include "clang/Tooling/CompilationDatabase.h"
> > > +#include "llvm/ADT/DenseSet.h"
> > > +#include "llvm/ADT/Optional.h"
> > > +#include "llvm/ADT/StringMap.h"
> > > +#include "llvm/ADT/StringRef.h"
> > >  #include <memory>
> > > +#include <vector>
> > >  
> > >  namespace clang {
> > >  class ASTContext;
> > > @@ -109,17 +119,15 @@ class FileIndex : public MergedIndex {
> > >  private:
> > >    bool UseDex; // FIXME: this should be always on.
> > >  
> > > -  // Contains information from each file's preamble only.
> > > -  // These are large, but update fairly infrequently (preambles
> > are
> > > stable).
> > > +  // Contains information from each file's preamble only.
> > Symbols
> > > and relations
> > > +  // are sharded per declaration file to deduplicate multiple
> > > symbols and reduce
> > > +  // memory usage.
> > >    // Missing information:
> > >    //  - symbol refs (these are always "from the main file")
> > >    //  - definition locations in the main file
> > >    //
> > > -  // FIXME: Because the preambles for 
> > > diff erent TUs have large overlap and
> > > -  // FileIndex doesn't deduplicate, this uses lots of extra RAM.
> > > -  // The biggest obstacle in fixing this: the obvious approach
> > of
> > > partitioning
> > > -  // by declaring file (rather than main file) fails if headers
> > > provide
> > > -  // 
> > > diff erent symbols based on preprocessor state.
> > > +  // Note that we store only one version of a header, hence
> > symbols
> > > appearing in
> > > +  // 
> > > diff erent PP states will be missing.
> > >    FileSymbols PreambleSymbols;
> > >    SwapIndex PreambleIndex;
> > >  
> > > @@ -146,6 +154,43 @@ SlabTuple indexHeaderSymbols(llvm::StringRef
> > > Version, ASTContext &AST,
> > >                               std::shared_ptr<Preprocessor> PP,
> > >                               const CanonicalIncludes &Includes);
> > >  
> > > +/// Takes slabs coming from a TU (multiple files) and shards
> > them
> > > per
> > > +/// declaration location.
> > > +struct FileShardedIndex {
> > > +  /// \p HintPath is used to convert file URIs stored in symbols
> > > into absolute
> > > +  /// paths.
> > > +  explicit FileShardedIndex(IndexFileIn Input, PathRef
> > HintPath);
> > > +
> > > +  /// Returns absolute paths for all files that has a shard.
> > > +  std::vector<PathRef> getAllFiles() const;
> > > +
> > > +  /// Generates index shard for the \p File. Note that this
> > function
> > > results in
> > > +  /// a copy of all the relevant data.
> > > +  /// Returned index will always have Symbol/Refs/Relation Slabs
> > > set, even if
> > > +  /// they are empty.
> > > +  IndexFileIn getShard(PathRef File) const;
> > > +
> > > +private:
> > > +  // Contains all the information that belongs to a single file.
> > > +  struct FileShard {
> > > +    // Either declared or defined in the file.
> > > +    llvm::DenseSet<const Symbol *> Symbols;
> > > +    // Reference occurs in the file.
> > > +    llvm::DenseSet<const Ref *> Refs;
> > > +    // Subject is declared in the file.
> > > +    llvm::DenseSet<const Relation *> Relations;
> > > +    // Contains edges for only the direct includes.
> > > +    IncludeGraph IG;
> > > +  };
> > > +
> > > +  // Keeps all the information alive.
> > > +  const IndexFileIn Index;
> > > +  // Mapping from absolute paths to slab information.
> > > +  llvm::StringMap<FileShard> Shards;
> > > +  // Used to build RefSlabs.
> > > +  llvm::DenseMap<const Ref *, SymbolID> RefToSymID;
> > > +};
> > > +
> > >  } // namespace clangd
> > >  } // namespace clang
> > >  
> > > 
> > > diff  --git a/clang-tools-extra/clangd/index/SymbolCollector.cpp
> > > b/clang-tools-extra/clangd/index/SymbolCollector.cpp
> > > index 06190914ee8d..471061672107 100644
> > > --- a/clang-tools-extra/clangd/index/SymbolCollector.cpp
> > > +++ b/clang-tools-extra/clangd/index/SymbolCollector.cpp
> > > @@ -283,6 +283,18 @@ bool SymbolCollector::handleDeclOccurrence(
> > >    if (!ID)
> > >      return true;
> > >  
> > > +  // ND is the canonical (i.e. first) declaration. If it's in
> > the
> > > main file
> > > +  // (which is not a header), then no public declaration was
> > > visible, so assume
> > > +  // it's main-file only.
> > > +  bool IsMainFileOnly =
> > > +      SM.isWrittenInMainFile(SM.getExpansionLoc(ND-
> > >getBeginLoc()))
> > > &&
> > > +      !isHeaderFile(SM.getFileEntryForID(SM.getMainFileID())-
> > > >getName(),
> > > +                    ASTCtx->getLangOpts());
> > > +  // In C, printf is a redecl of an implicit builtin! So check
> > OrigD
> > > instead.
> > > +  if (ASTNode.OrigD->isImplicit() ||
> > > +      !shouldCollectSymbol(*ND, *ASTCtx, Opts, IsMainFileOnly))
> > > +    return true;
> > > +
> > >    // Note: we need to process relations for all decl
> > occurrences,
> > > including
> > >    // refs, because the indexing code only populates relations
> > for
> > > specific
> > >    // occurrences. For example, RelationBaseOf is only populated
> > for
> > > the
> > > @@ -297,17 +309,6 @@ bool SymbolCollector::handleDeclOccurrence(
> > >    if (IsOnlyRef && !CollectRef)
> > >      return true;
> > >  
> > > -  // ND is the canonical (i.e. first) declaration. If it's in
> > the
> > > main file
> > > -  // (which is not a header), then no public declaration was
> > > visible, so assume
> > > -  // it's main-file only.
> > > -  bool IsMainFileOnly =
> > > -      SM.isWrittenInMainFile(SM.getExpansionLoc(ND-
> > >getBeginLoc()))
> > > &&
> > > -      !isHeaderFile(SM.getFileEntryForID(SM.getMainFileID())-
> > > >getName(),
> > > -                    ASTCtx->getLangOpts());
> > > -  // In C, printf is a redecl of an implicit builtin! So check
> > OrigD
> > > instead.
> > > -  if (ASTNode.OrigD->isImplicit() ||
> > > -      !shouldCollectSymbol(*ND, *ASTCtx, Opts, IsMainFileOnly))
> > > -    return true;
> > >    // Do not store references to main-file symbols.
> > >    // Unlike other fields, e.g. Symbols (which use spelling
> > > locations), we use
> > >    // file locations for references (as it aligns the behavior of
> > > clangd's
> > > 
> > > diff  --git a/clang-tools-
> > extra/clangd/unittests/FileIndexTests.cpp
> > > b/clang-tools-extra/clangd/unittests/FileIndexTests.cpp
> > > index 5203fb67307c..dc39ad2acf25 100644
> > > --- a/clang-tools-extra/clangd/unittests/FileIndexTests.cpp
> > > +++ b/clang-tools-extra/clangd/unittests/FileIndexTests.cpp
> > > @@ -9,13 +9,19 @@
> > >  #include "AST.h"
> > >  #include "Annotations.h"
> > >  #include "Compiler.h"
> > > +#include "Headers.h"
> > >  #include "ParsedAST.h"
> > >  #include "SyncAPI.h"
> > >  #include "TestFS.h"
> > >  #include "TestTU.h"
> > > +#include "URI.h"
> > >  #include "index/CanonicalIncludes.h"
> > >  #include "index/FileIndex.h"
> > >  #include "index/Index.h"
> > > +#include "index/Ref.h"
> > > +#include "index/Relation.h"
> > > +#include "index/Serialization.h"
> > > +#include "index/Symbol.h"
> > >  #include "clang/Frontend/CompilerInvocation.h"
> > >  #include "clang/Frontend/Utils.h"
> > >  #include "clang/Index/IndexSymbol.h"
> > > @@ -23,6 +29,7 @@
> > >  #include "clang/Tooling/CompilationDatabase.h"
> > >  #include "gmock/gmock.h"
> > >  #include "gtest/gtest.h"
> > > +#include <utility>
> > >  
> > >  using ::testing::_;
> > >  using ::testing::AllOf;
> > > @@ -151,8 +158,9 @@ void update(FileIndex &M, llvm::StringRef
> > > Basename, llvm::StringRef Code) {
> > >    File.HeaderFilename = (Basename + ".h").str();
> > >    File.HeaderCode = std::string(Code);
> > >    auto AST = File.build();
> > > -  M.updatePreamble(File.Filename, /*Version=*/"null",
> > > AST.getASTContext(),
> > > -                   AST.getPreprocessorPtr(),
> > > AST.getCanonicalIncludes());
> > > +  M.updatePreamble(testPath(File.Filename), /*Version=*/"null",
> > > +                   AST.getASTContext(),
> > AST.getPreprocessorPtr(),
> > > +                   AST.getCanonicalIncludes());
> > >  }
> > >  
> > >  TEST(FileIndexTest, CustomizedURIScheme) {
> > > @@ -393,8 +401,9 @@ TEST(FileIndexTest, Relations) {
> > >    TU.HeaderCode = "class A {}; class B : public A {};";
> > >    auto AST = TU.build();
> > >    FileIndex Index;
> > > -  Index.updatePreamble(TU.Filename, /*Version=*/"null",
> > > AST.getASTContext(),
> > > -                       AST.getPreprocessorPtr(),
> > > AST.getCanonicalIncludes());
> > > +  Index.updatePreamble(testPath(TU.Filename),
> > /*Version=*/"null",
> > > +                       AST.getASTContext(),
> > > AST.getPreprocessorPtr(),
> > > +                       AST.getCanonicalIncludes());
> > >    SymbolID A = findSymbol(TU.headerSymbols(), "A").ID;
> > >    uint32_t Results = 0;
> > >    RelationsRequest Req;
> > > @@ -477,6 +486,128 @@ TEST(FileSymbolsTest,
> > > CountReferencesWithRefSlabs) {
> > >                             AllOf(QName("2"), NumReferences(1u)),
> > >                             AllOf(QName("3"),
> > NumReferences(1u))));
> > >  }
> > > +
> > > +TEST(FileIndexTest, StalePreambleSymbolsDeleted) {
> > > +  FileIndex M;
> > > +  TestTU File;
> > > +  File.HeaderFilename = "a.h";
> > > +
> > > +  File.Filename = "f1.cpp";
> > > +  File.HeaderCode = "int a;";
> > > +  auto AST = File.build();
> > > +  M.updatePreamble(testPath(File.Filename), /*Version=*/"null",
> > > +                   AST.getASTContext(),
> > AST.getPreprocessorPtr(),
> > > +                   AST.getCanonicalIncludes());
> > > +  EXPECT_THAT(runFuzzyFind(M, ""),
> > > UnorderedElementsAre(QName("a")));
> > > +
> > > +  File.Filename = "f2.cpp";
> > > +  File.HeaderCode = "int b;";
> > > +  AST = File.build();
> > > +  M.updatePreamble(testPath(File.Filename), /*Version=*/"null",
> > > +                   AST.getASTContext(),
> > AST.getPreprocessorPtr(),
> > > +                   AST.getCanonicalIncludes());
> > > +  EXPECT_THAT(runFuzzyFind(M, ""),
> > > UnorderedElementsAre(QName("b")));
> > > +}
> > > +
> > > +TEST(FileShardedIndexTest, Sharding) {
> > > +  auto AHeaderUri = URI::create(testPath("a.h")).toString();
> > > +  auto BHeaderUri = URI::create(testPath("b.h")).toString();
> > > +  auto BSourceUri = URI::create(testPath("b.cc")).toString();
> > > +
> > > +  auto Sym1 = symbol("1");
> > > +  Sym1.CanonicalDeclaration.FileURI = AHeaderUri.c_str();
> > > +
> > > +  auto Sym2 = symbol("2");
> > > +  Sym2.CanonicalDeclaration.FileURI = BHeaderUri.c_str();
> > > +  Sym2.Definition.FileURI = BSourceUri.c_str();
> > > +
> > > +  IndexFileIn IF;
> > > +  {
> > > +    SymbolSlab::Builder B;
> > > +    // Should be stored in only a.h
> > > +    B.insert(Sym1);
> > > +    // Should be stored in both b.h and b.cc
> > > +    B.insert(Sym2);
> > > +    IF.Symbols = std::move(B).build();
> > > +  }
> > > +  {
> > > +    // Should be stored in b.cc
> > > +    IF.Refs = std::move(*refSlab(Sym1.ID,
> > > BSourceUri.c_str()).release());
> > > +  }
> > > +  {
> > > +    RelationSlab::Builder B;
> > > +    // Should be stored in a.h
> > > +    B.insert(Relation{Sym1.ID, RelationKind::BaseOf, Sym2.ID});
> > > +    // Should be stored in b.h
> > > +    B.insert(Relation{Sym2.ID, RelationKind::BaseOf, Sym1.ID});
> > > +    IF.Relations = std::move(B).build();
> > > +  }
> > > +
> > > +  IF.Sources.emplace();
> > > +  IncludeGraph &IG = *IF.Sources;
> > > +  {
> > > +    // b.cc includes b.h
> > > +    auto &Node = IG[BSourceUri];
> > > +    Node.DirectIncludes = {BHeaderUri};
> > > +    Node.URI = BSourceUri;
> > > +  }
> > > +  {
> > > +    // b.h includes a.h
> > > +    auto &Node = IG[BHeaderUri];
> > > +    Node.DirectIncludes = {AHeaderUri};
> > > +    Node.URI = BHeaderUri;
> > > +  }
> > > +  {
> > > +    // a.h includes nothing.
> > > +    auto &Node = IG[AHeaderUri];
> > > +    Node.DirectIncludes = {};
> > > +    Node.URI = AHeaderUri;
> > > +  }
> > > +
> > > +  IF.Cmd = tooling::CompileCommand(testRoot(), "b.cc",
> > {"clang"},
> > > "out");
> > > +
> > > +  FileShardedIndex ShardedIndex(std::move(IF),
> > testPath("b.cc"));
> > > +  ASSERT_THAT(
> > > +      ShardedIndex.getAllFiles(),
> > > +      UnorderedElementsAre(testPath("a.h"), testPath("b.h"),
> > > testPath("b.cc")));
> > > +
> > > +  {
> > > +    auto Shard = ShardedIndex.getShard(testPath("a.h"));
> > > +    EXPECT_THAT(Shard.Symbols.getValue(),
> > > UnorderedElementsAre(QName("1")));
> > > +    EXPECT_THAT(Shard.Refs.getValue(), IsEmpty());
> > > +    EXPECT_THAT(
> > > +        Shard.Relations.getValue(),
> > > +        UnorderedElementsAre(Relation{Sym1.ID,
> > RelationKind::BaseOf,
> > > Sym2.ID}));
> > > +    ASSERT_THAT(Shard.Sources.getValue().keys(),
> > > +                UnorderedElementsAre(AHeaderUri));
> > > +    EXPECT_THAT(Shard.Sources-
> > >lookup(AHeaderUri).DirectIncludes,
> > > IsEmpty());
> > > +    EXPECT_TRUE(Shard.Cmd.hasValue());
> > > +  }
> > > +  {
> > > +    auto Shard = ShardedIndex.getShard(testPath("b.h"));
> > > +    EXPECT_THAT(Shard.Symbols.getValue(),
> > > UnorderedElementsAre(QName("2")));
> > > +    EXPECT_THAT(Shard.Refs.getValue(), IsEmpty());
> > > +    EXPECT_THAT(
> > > +        Shard.Relations.getValue(),
> > > +        UnorderedElementsAre(Relation{Sym2.ID,
> > RelationKind::BaseOf,
> > > Sym1.ID}));
> > > +    ASSERT_THAT(Shard.Sources.getValue().keys(),
> > > +                UnorderedElementsAre(BHeaderUri, AHeaderUri));
> > > +    EXPECT_THAT(Shard.Sources-
> > >lookup(BHeaderUri).DirectIncludes,
> > > +                UnorderedElementsAre(AHeaderUri));
> > > +    EXPECT_TRUE(Shard.Cmd.hasValue());
> > > +  }
> > > +  {
> > > +    auto Shard = ShardedIndex.getShard(testPath("b.cc"));
> > > +    EXPECT_THAT(Shard.Symbols.getValue(),
> > > UnorderedElementsAre(QName("2")));
> > > +    EXPECT_THAT(Shard.Refs.getValue(),
> > > UnorderedElementsAre(Pair(Sym1.ID, _)));
> > > +    EXPECT_THAT(Shard.Relations.getValue(), IsEmpty());
> > > +    ASSERT_THAT(Shard.Sources.getValue().keys(),
> > > +                UnorderedElementsAre(BSourceUri, BHeaderUri));
> > > +    EXPECT_THAT(Shard.Sources-
> > >lookup(BSourceUri).DirectIncludes,
> > > +                UnorderedElementsAre(BHeaderUri));
> > > +    EXPECT_TRUE(Shard.Cmd.hasValue());
> > > +  }
> > > +}
> > >  } // namespace
> > >  } // namespace clangd
> > >  } // namespace clang
> > > 
> > > diff  --git a/clang-tools-extra/clangd/unittests/TestTU.cpp
> > b/clang-
> > > tools-extra/clangd/unittests/TestTU.cpp
> > > index fe3e8f01cca8..640d067261bd 100644
> > > --- a/clang-tools-extra/clangd/unittests/TestTU.cpp
> > > +++ b/clang-tools-extra/clangd/unittests/TestTU.cpp
> > > @@ -116,9 +116,10 @@ SymbolSlab TestTU::headerSymbols() const {
> > >  std::unique_ptr<SymbolIndex> TestTU::index() const {
> > >    auto AST = build();
> > >    auto Idx = std::make_unique<FileIndex>(/*UseDex=*/true);
> > > -  Idx->updatePreamble(Filename, /*Version=*/"null",
> > > AST.getASTContext(),
> > > -                      AST.getPreprocessorPtr(),
> > > AST.getCanonicalIncludes());
> > > -  Idx->updateMain(Filename, AST);
> > > +  Idx->updatePreamble(testPath(Filename), /*Version=*/"null",
> > > +                      AST.getASTContext(),
> > AST.getPreprocessorPtr(),
> > > +                      AST.getCanonicalIncludes());
> > > +  Idx->updateMain(testPath(Filename), AST);
> > >    return std::move(Idx);
> > >  }
> > >  
> > > 
> > > 
> > >         
> > > _______________________________________________
> > > cfe-commits mailing list
> > > cfe-commits at lists.llvm.org
> > > https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


More information about the cfe-commits mailing list