[clang] [Modules] Add -cc1 -flate-module-map-file to load module maps after PCMs (PR #88893)

via cfe-commits cfe-commits at lists.llvm.org
Tue Apr 16 06:43:32 PDT 2024


llvmbot wrote:


<!--LLVM PR SUMMARY COMMENT-->

@llvm/pr-subscribers-clang-modules

Author: Ilya Biryukov (ilya-biryukov)

<details>
<summary>Changes</summary>

If the same module map is passed to multiple compilation actions that build PCMs and later load them, we currently create a new FileID for it every time a PCM gets built.

This is not very problematic in terms of performance, but it causes us to waste source location space when those PCMs get loaded together: the module map will take source location space in each of the PCMs, effectively multiplying the space it takes by the number of PCMs that we load.

PR #<!-- -->83660 has inadvertenly caused more PCMs to be loaded in our internal builds and some actions tipped over the source location limit.

To provide a workaround, introduce an option to load module maps after PCMs and ensure we reuse existing FileID if it's available. This allows to reduce the source location space taken by module maps because we will use an existing FileID from some loaded PCM instead of creating a new one each time.

There are probably some holes in the approach for more complicated uses. In particular:
- It is unclear how the module map shadowing rules are affected.
- Code replaying AST files is not aware of the new option and will keep using `fmodule-map-file`, it is unclear whether this acutally causes any issues
- ModuleDepCollector does not distinguish betwee late and non-late module maps.

The new flag is `-cc1` for now to allow providing a quick workaround for our builds in the wake of #<!-- -->83660 without figuring out all the details for a more sophisticated solution. I plan to start a follow up discussion on Discourse for a broader problem
of source location space exhaustion in modular builds.

---
Full diff: https://github.com/llvm/llvm-project/pull/88893.diff


7 Files Affected:

- (modified) clang/include/clang/Driver/Options.td (+6) 
- (modified) clang/include/clang/Frontend/FrontendOptions.h (+3) 
- (modified) clang/lib/Frontend/FrontendAction.cpp (+11-3) 
- (modified) clang/lib/Frontend/Rewrite/FrontendActions.cpp (+1) 
- (modified) clang/lib/Lex/ModuleMap.cpp (+4-1) 
- (modified) clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp (+2) 
- (added) clang/test/Modules/late-module-maps.cpp (+93) 


``````````diff
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index e24626913add76..d3fdc054076751 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -3179,6 +3179,12 @@ def fmodule_map_file : Joined<["-"], "fmodule-map-file=">,
   MetaVarName<"<file>">,
   HelpText<"Load this module map file">,
   MarshallingInfoStringVector<FrontendOpts<"ModuleMapFiles">>;
+def flate_module_map_file : Joined<["-"], "flate-module-map-file=">,
+  Group<f_Group>,
+  Visibility<[CC1Option]>,
+  MetaVarName<"<file>">,
+  HelpText<"Load this module map file after all modules">,
+  MarshallingInfoStringVector<FrontendOpts<"LateModuleMapFiles">>;
 def fmodule_file : Joined<["-"], "fmodule-file=">,
   Group<i_Group>,
   Visibility<[ClangOption, CC1Option, CLOption]>,
diff --git a/clang/include/clang/Frontend/FrontendOptions.h b/clang/include/clang/Frontend/FrontendOptions.h
index a738c1f3757682..cc45e68899c7e8 100644
--- a/clang/include/clang/Frontend/FrontendOptions.h
+++ b/clang/include/clang/Frontend/FrontendOptions.h
@@ -536,6 +536,9 @@ class FrontendOptions {
   /// The list of module map files to load before processing the input.
   std::vector<std::string> ModuleMapFiles;
 
+  /// \brief The list of module map files to load after ModuleFile.
+  std::vector<std::string> LateModuleMapFiles;
+
   /// The list of additional prebuilt module files to load before
   /// processing the input.
   std::vector<std::string> ModuleFiles;
diff --git a/clang/lib/Frontend/FrontendAction.cpp b/clang/lib/Frontend/FrontendAction.cpp
index b7c9967316f0b8..14745584c6ccaa 100644
--- a/clang/lib/Frontend/FrontendAction.cpp
+++ b/clang/lib/Frontend/FrontendAction.cpp
@@ -919,14 +919,17 @@ bool FrontendAction::BeginSourceFile(CompilerInstance &CI,
   if (!BeginSourceFileAction(CI))
     return false;
 
-  // If we were asked to load any module map files, do so now.
-  for (const auto &Filename : CI.getFrontendOpts().ModuleMapFiles) {
+  auto LoadModuleMap = [&](StringRef Filename) {
     if (auto File = CI.getFileManager().getOptionalFileRef(Filename))
       CI.getPreprocessor().getHeaderSearchInfo().loadModuleMapFile(
           *File, /*IsSystem*/false);
     else
       CI.getDiagnostics().Report(diag::err_module_map_not_found) << Filename;
-  }
+  };
+  // If we were asked to load any module map files, do so now.
+  for (StringRef Filename : CI.getFrontendOpts().ModuleMapFiles)
+    LoadModuleMap(Filename);
+  // Note that late module maps will be loaded after modules.
 
   // If compiling implementation of a module, load its module map file now.
   (void)CI.getPreprocessor().getCurrentModuleImplementation();
@@ -1038,6 +1041,11 @@ bool FrontendAction::BeginSourceFile(CompilerInstance &CI,
           diag::warn_eagerly_load_for_standard_cplusplus_modules);
   }
 
+  // Some module maps are loaded after modules to allow avoiding redundant
+  // processing if they were processed by modules.
+  for (StringRef Filename : CI.getFrontendOpts().LateModuleMapFiles)
+    LoadModuleMap(Filename);
+
   // If there is a layout overrides file, attach an external AST source that
   // provides the layouts from that file.
   if (!CI.getFrontendOpts().OverrideRecordLayoutsFile.empty() &&
diff --git a/clang/lib/Frontend/Rewrite/FrontendActions.cpp b/clang/lib/Frontend/Rewrite/FrontendActions.cpp
index cf5a9437e89e6c..f6e7aea06535de 100644
--- a/clang/lib/Frontend/Rewrite/FrontendActions.cpp
+++ b/clang/lib/Frontend/Rewrite/FrontendActions.cpp
@@ -255,6 +255,7 @@ class RewriteIncludesAction::RewriteImportsListener : public ASTReaderListener {
         Filename, InputKind(Language::Unknown, InputKind::Precompiled));
     Instance.getFrontendOpts().ModuleFiles.clear();
     Instance.getFrontendOpts().ModuleMapFiles.clear();
+    Instance.getFrontendOpts().LateModuleMapFiles.clear();
     // Don't recursively rewrite imports. We handle them all at the top level.
     Instance.getPreprocessorOutputOpts().RewriteImports = false;
 
diff --git a/clang/lib/Lex/ModuleMap.cpp b/clang/lib/Lex/ModuleMap.cpp
index eed7eca2e73562..232420b58f6004 100644
--- a/clang/lib/Lex/ModuleMap.cpp
+++ b/clang/lib/Lex/ModuleMap.cpp
@@ -3130,7 +3130,10 @@ bool ModuleMap::parseModuleMapFile(FileEntryRef File, bool IsSystem,
   if (ID.isInvalid()) {
     auto FileCharacter =
         IsSystem ? SrcMgr::C_System_ModuleMap : SrcMgr::C_User_ModuleMap;
-    ID = SourceMgr.createFileID(File, ExternModuleLoc, FileCharacter);
+    if (ExternModuleLoc.isValid())
+      ID = SourceMgr.createFileID(File, ExternModuleLoc, FileCharacter);
+    else // avoid creating redundant FileIDs, they take sloc space in PCMs.
+      ID = SourceMgr.getOrCreateFileID(File, FileCharacter);
   }
 
   assert(Target && "Missing target information");
diff --git a/clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp b/clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp
index e19f19b2528c15..a45a8e5267389c 100644
--- a/clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp
+++ b/clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp
@@ -242,6 +242,8 @@ ModuleDepCollector::getInvocationAdjustedForModuleBuildWithoutOutputs(
   // Remove directly passed modulemap files. They will get added back if they
   // were actually used.
   CI.getMutFrontendOpts().ModuleMapFiles.clear();
+  // Late module maps can only be readded into ModuleMapFiles.
+  CI.getMutFrontendOpts().LateModuleMapFiles.clear();
 
   auto DepModuleMapFiles = collectModuleMapFiles(Deps.ClangModuleDeps);
   for (StringRef ModuleMapFile : Deps.ModuleMapFileDeps) {
diff --git a/clang/test/Modules/late-module-maps.cpp b/clang/test/Modules/late-module-maps.cpp
new file mode 100644
index 00000000000000..2ca4af3ae203a0
--- /dev/null
+++ b/clang/test/Modules/late-module-maps.cpp
@@ -0,0 +1,93 @@
+// RUN: rm -rf %t
+// RUN: split-file %s %t
+
+// We need to check we don't waste source location space for the same file
+// (i.e. base.modulemap) when it's passed to multiple PCM file.
+//
+// *** First, try to use normal module map files.
+// RUN: %clang_cc1 -fmodules -fno-implicit-modules \
+// RUN:   -fmodule-map-file=%t/base.modulemap -fmodule-map-file=%t/a.modulemap \
+// RUN:   -fmodule-name=a -xc++ -emit-module -o %t/a.pcm %t/a.modulemap
+
+// RUN: %clang_cc1 -fmodules -fno-implicit-modules \
+// RUN:   -fmodule-map-file=%t/base.modulemap -fmodule-map-file=%t/a.modulemap -fmodule-map-file=%t/b.modulemap \
+// RUN:   -fmodule-file=%t/a.pcm \
+// RUN:   -fmodule-name=b -xc++ -emit-module -o %t/b.pcm %t/b.modulemap
+
+// RUN: %clang_cc1 -fmodules -fno-implicit-modules \
+// RUN:   -fmodule-map-file=%t/base.modulemap -fmodule-map-file=%t/a.modulemap -fmodule-map-file=%t/b.modulemap \
+// RUN:   -fmodule-file=%t/a.pcm -fmodule-file=%t/b.pcm \
+// RUN:   -fsyntax-only -print-stats %t/use.cpp 2>&1 \
+// RUN:      | FileCheck %t/use.cpp
+
+// *** Switch to -flate-module-map-file and check it produces less loaded SLO entries.
+// RUN: rm %t/*.pcm
+
+// RUN: %clang_cc1 -fmodules -fno-implicit-modules \
+// RUN:   -flate-module-map-file=%t/base.modulemap -flate-module-map-file=%t/a.modulemap \
+// RUN:   -fmodule-name=a -xc++ -emit-module -o %t/a.pcm %t/a.modulemap
+
+// RUN: %clang_cc1 -fmodules -fno-implicit-modules \
+// RUN:   -flate-module-map-file=%t/base.modulemap -flate-module-map-file=%t/a.modulemap -flate-module-map-file=%t/b.modulemap \
+// RUN:   -fmodule-file=%t/a.pcm \
+// RUN:   -fmodule-name=b -xc++ -emit-module -o %t/b.pcm %t/b.modulemap
+
+// RUN: %clang_cc1 -fmodules -fno-implicit-modules \
+// RUN:   -flate-module-map-file=%t/base.modulemap -flate-module-map-file=%t/a.modulemap -flate-module-map-file=%t/b.modulemap \
+// RUN:   -fmodule-file=%t/a.pcm -fmodule-file=%t/b.pcm \
+// RUN:   -fsyntax-only -print-stats %t/use.cpp 2>&1 \
+// RUN:      | FileCheck --check-prefix=CHECK-LATE %t/use.cpp
+
+//--- use.cpp
+// This is a very shaky check, it would be nice if it was more directly looking at which files were loaded.
+// We load 2 SLocEntries less with flate-module-map-file, they correspond to savings from base.modulemap and a.modulemap
+// reusing the FileID in a.pcm and b.pcm.
+// We also have 3 less local SLocEntries, because we get to reuse FileIDs all module maps from PCMs.
+//
+// CHECK: Source Manager Stats:
+// CHECK: 7 local SLocEntries
+// CHECK: 13 loaded SLocEntries
+
+// CHECK-LATE: Source Manager Stats:
+// CHECK-LATE: 4 local SLocEntries
+// CHECK-LATE: 11 loaded SLocEntries
+#include "a.h"
+#include "b.h"
+#include "assert.h"
+
+int main() {
+    return a() + b();
+}
+
+//--- base.modulemap
+module "base" {
+    textual header "assert.h"
+}
+
+//--- a.modulemap
+module "a" {
+    header "a.h"
+    use "base"
+}
+
+//--- b.modulemap
+module "b" {
+    header "b.h"
+    use "a"
+    use "base"
+}
+
+
+//--- assert.h
+#define ASSERT
+
+//--- a.h
+#include "assert.h"
+inline int a() { return 1; }
+
+//--- b.h
+#include "a.h"
+#include "assert.h"
+
+inline int b() { return a() + 1; }
+

``````````

</details>


https://github.com/llvm/llvm-project/pull/88893


More information about the cfe-commits mailing list