[clang-tools-extra] [clangd] [C++20] [Modules] Add scanning cache (PR #125988)

kadir çetinkaya via cfe-commits cfe-commits at lists.llvm.org
Thu Feb 13 04:37:19 PST 2025


================
@@ -380,30 +381,114 @@ llvm::SmallVector<StringRef> getAllRequiredModules(ProjectModules &MDB,
   return ModuleNames;
 }
 
+class CachingProjectModules : public ProjectModules {
+public:
+  CachingProjectModules(const GlobalCompilationDatabase &CDB) : CDB(CDB) {}
+
+  std::vector<std::string> getRequiredModules(PathRef File) override {
+    std::unique_ptr<ProjectModules> MDB = CDB.getProjectModules(File);
+    if (!MDB) {
+      elog("Failed to get Project Modules information for {0}", File);
+      return {};
+    }
+    return MDB->getRequiredModules(File);
+  }
+
+  std::string getModuleNameForSource(PathRef File) override {
+    std::unique_ptr<ProjectModules> MDB = CDB.getProjectModules(File);
+    if (!MDB) {
+      elog("Failed to get Project Modules information for {0}", File);
+      return {};
+    }
+    return MDB->getModuleNameForSource(File);
+  }
+
+  void setCommandMangler(CommandMangler M) override {
+    // GlobalCompilationDatabase::getProjectModules() will set mangler
+    // for the underlying ProjectModules.
+  }
+
+  std::string getSourceForModuleName(llvm::StringRef ModuleName,
+                                     PathRef RequiredSrcFile) override {
+    std::string CachedResult;
+    {
+      std::lock_guard<std::mutex> Lock(CacheMutex);
+      auto Iter = ModuleNameToSourceCache.find(ModuleName);
+      if (Iter != ModuleNameToSourceCache.end())
+        CachedResult = Iter->second;
+    }
+
+    std::unique_ptr<ProjectModules> MDB =
+        CDB.getProjectModules(RequiredSrcFile);
+    if (!MDB) {
+      elog("Failed to get Project Modules information for {0}",
+           RequiredSrcFile);
+      return {};
+    }
+
+    // Verify Cached Result by seeing if the source declaring the same module
+    // as we query.
+    if (!CachedResult.empty()) {
+      std::string ModuleNameOfCachedSource =
+          MDB->getModuleNameForSource(CachedResult);
+      if (ModuleNameOfCachedSource == ModuleName)
+        return CachedResult;
+      else {
+        // Cached Result is invalid. Clear it.
+
+        std::lock_guard<std::mutex> Lock(CacheMutex);
+        ModuleNameToSourceCache.erase(ModuleName);
+      }
+    }
+
+    auto Result = MDB->getSourceForModuleName(ModuleName, RequiredSrcFile);
----------------
kadircet wrote:

With this naive caching strategy, we'll end up scanning the project multiple times for each preamble build until the cache is fully warmed up, whereas the previous approach performed only one project scan per build.

I think for this to be useful we need to improve the caching strategy a little. What about adding a `CachingProjectModules::populateCacheForModules(llvm::ArrayRef<StringRef> Modules, PathRef ActiveFile);` and calling it from `ModulesBuilder::ModulesBuilderImpl::getOrBuildModuleFile` with `ReqModuleNames`?

This way, in `populateCacheForModules` we can instantiate a `ProjectModules` from `CDB` once and make sure all the required modules are cached and up to date while performing at most one global scan of the project. Afterwards, subsequent `getSourceForModuleName` calls will almost always hit the cache without any extra scans (modulo race conditions such as a user adding new deps while we're building a module, or changing the primary-interface file for a module).
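To make the suggestion concrete, here is a rough, self-contained sketch of the batching idea. It does not use the real clangd types: `FakeProject`, `FakeProjectModules`, and `CachingProjectModulesSketch` are hypothetical stand-ins, plain `std::string` replaces `PathRef`/`StringRef`, and the assumption that a module-name lookup triggers one whole-project scan per `ProjectModules` instance is only a model of the cost being discussed, not a description of the actual scanner.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for the project on disk: source file -> declared module.
struct FakeProject {
  std::map<std::string, std::string> SourceToModule;
  int GlobalScans = 0; // counts whole-project scans, for illustration only
};

// Hypothetical stand-in for a freshly created ProjectModules instance.
class FakeProjectModules {
public:
  explicit FakeProjectModules(FakeProject &P) : P(P) {}

  // Assumed cheap: only looks at a single file.
  std::string getModuleNameForSource(const std::string &File) const {
    auto It = P.SourceToModule.find(File);
    return It == P.SourceToModule.end() ? std::string() : It->second;
  }

  // Assumed expensive: the first call on each instance scans the whole project.
  std::string getSourceForModuleName(const std::string &Module) {
    if (!Scanned) {
      ++P.GlobalScans;
      for (const auto &Entry : P.SourceToModule)
        ModuleToSource[Entry.second] = Entry.first;
      Scanned = true;
    }
    auto It = ModuleToSource.find(Module);
    return It == ModuleToSource.end() ? std::string() : It->second;
  }

private:
  FakeProject &P;
  bool Scanned = false;
  std::map<std::string, std::string> ModuleToSource;
};

class CachingProjectModulesSketch {
public:
  explicit CachingProjectModulesSketch(FakeProject &P) : P(P) {}

  // Warm or refresh the cache for all required modules using one instance,
  // so at most one global scan happens for the whole batch.
  void populateCacheForModules(const std::vector<std::string> &Modules,
                               const std::string &ActiveFile) {
    (void)ActiveFile; // in clangd this would pick the ProjectModules from CDB
    FakeProjectModules MDB(P);
    // Held across the scan for simplicity; a real version may not want that.
    std::lock_guard<std::mutex> Lock(CacheMutex);
    for (const std::string &Module : Modules) {
      auto It = Cache.find(Module);
      // Cached entry is still valid if that file still declares the module.
      if (It != Cache.end() &&
          MDB.getModuleNameForSource(It->second) == Module)
        continue;
      std::string Source = MDB.getSourceForModuleName(Module);
      if (Source.empty())
        Cache.erase(Module);
      else
        Cache[Module] = Source;
    }
  }

  std::string getSourceForModuleName(const std::string &Module) {
    {
      std::lock_guard<std::mutex> Lock(CacheMutex);
      auto It = Cache.find(Module);
      if (It != Cache.end())
        return It->second; // expected path once the cache is populated
    }
    // Cache miss (e.g. a dependency added mid-build): pay for a fresh scan.
    FakeProjectModules MDB(P);
    std::string Source = MDB.getSourceForModuleName(Module);
    if (!Source.empty()) {
      std::lock_guard<std::mutex> Lock(CacheMutex);
      Cache[Module] = Source;
    }
    return Source;
  }

private:
  FakeProject &P;
  std::mutex CacheMutex;
  std::map<std::string, std::string> Cache;
};
```

In this model, calling `populateCacheForModules` once with all of `ReqModuleNames` costs a single scan, and the per-module lookups that follow are served from the cache. Without the batch call, each cold lookup would create its own instance and trigger its own scan.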

https://github.com/llvm/llvm-project/pull/125988

