[clang] [clang][modules] Avoid allocations when reading blob paths (PR #113984)

via cfe-commits cfe-commits at lists.llvm.org
Mon Oct 28 17:11:18 PDT 2024


llvmbot wrote:


<!--LLVM PR SUMMARY COMMENT-->

@llvm/pr-subscribers-clang-modules

Author: Jan Svoboda (jansvoboda11)

<details>
<summary>Changes</summary>

When reading a path from a bitstream blob, `ASTReader` performs up to three allocations:
 1. Conversion of the `StringRef` blob into `std::string` to conform to the `ASTReader::ResolveImportedPath()` API that takes `std::string &`.
 2. Concatenation of the module file prefix directory and the relative path into a fresh `SmallString<128>` buffer in `ASTReader::ResolveImportedPath()`.
 3. Propagating the result out of `ASTReader::ResolveImportedPath()` by calling `std::string::assign()` on the out-parameter.

 This patch makes is so that we avoid allocations altogether (amortized) by:
 1. Avoiding conversion of the `StringRef` blob into `std::string` and changing the `ASTReader::ResolveImportedPath()` API.
 2. Using one "global" buffer to hold the concatenation.
 3. Returning `StringRef` pointing into the buffer and being careful to not store this anywhere where it can outlive the underlying global buffer contents.

Note that in some places of the bitstream we don't store paths as blobs, but rather as records that get VBR-encoded. This makes the allocation in (1) unavoidable. I plan to fix this in a follow-up PR by changing the PCM format.

Moreover, there are some data structures (e.g. `serialization::InputFileInfo`) that store deserialized and resolved paths as `std::string`. If we don't access them frequently, it would be more efficient to store just the unresolved `StringRef` and resolve them on demand (within some kind of shared buffer to prevent allocations).

 This PR alone improves `clang-scan-deps` performance on my workload by _TBD_%.

---
Full diff: https://github.com/llvm/llvm-project/pull/113984.diff


2 Files Affected:

- (modified) clang/include/clang/Serialization/ASTReader.h (+15-2) 
- (modified) clang/lib/Serialization/ASTReader.cpp (+29-41) 


``````````diff
diff --git a/clang/include/clang/Serialization/ASTReader.h b/clang/include/clang/Serialization/ASTReader.h
index 070c1c9a54f48c..3c8bef484d0133 100644
--- a/clang/include/clang/Serialization/ASTReader.h
+++ b/clang/include/clang/Serialization/ASTReader.h
@@ -1341,9 +1341,22 @@ class ASTReader
   serialization::InputFile getInputFile(ModuleFile &F, unsigned ID,
                                         bool Complain = true);
 
+  /// Buffer we use as temporary storage backing resolved paths.
+  SmallString<256> PathBuf;
+
 public:
-  void ResolveImportedPath(ModuleFile &M, std::string &Filename);
-  static void ResolveImportedPath(std::string &Filename, StringRef Prefix);
+  StringRef ResolveImportedPath(StringRef Path, ModuleFile &M) {
+    return ResolveImportedPath(PathBuf, Path, M);
+  }
+  StringRef ResolveImportedPath(StringRef Path, StringRef Prefix) {
+    return ResolveImportedPath(PathBuf, Path, Prefix);
+  }
+  static StringRef ResolveImportedPath(SmallVectorImpl<char> &Buffer,
+                                       StringRef Path, ModuleFile &M) {
+    return ResolveImportedPath(Buffer, Path, M.BaseDirectory);
+  }
+  static StringRef ResolveImportedPath(SmallVectorImpl<char> &Buffer,
+                                       StringRef Path, StringRef Prefix);
 
   /// Returns the first key declaration for the given declaration. This
   /// is one that is formerly-canonical (or still canonical) and whose module
diff --git a/clang/lib/Serialization/ASTReader.cpp b/clang/lib/Serialization/ASTReader.cpp
index 8d8f9378cfeabe..19f7f2be7f9011 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -2050,8 +2050,7 @@ const FileEntry *HeaderFileInfoTrait::getFile(const internal_key_type &Key) {
     return nullptr;
   }
 
-  std::string Resolved = std::string(Key.Filename);
-  Reader.ResolveImportedPath(M, Resolved);
+  StringRef Resolved = Reader.ResolveImportedPath(Key.Filename, M);
   if (auto File = FileMgr.getOptionalFileRef(Resolved))
     return *File;
   return nullptr;
@@ -2150,9 +2149,9 @@ HeaderFileInfoTrait::ReadData(internal_key_ref key, const unsigned char *d,
     ModuleMap &ModMap =
         Reader.getPreprocessor().getHeaderSearchInfo().getModuleMap();
 
-    std::string Filename = std::string(key.Filename);
+    StringRef Filename = key.Filename;
     if (key.Imported)
-      Reader.ResolveImportedPath(M, Filename);
+      Filename = Reader.ResolveImportedPath(Filename, M);
     if (auto FE = FileMgr.getOptionalFileRef(Filename)) {
       // FIXME: NameAsWritten
       Module::Header H = {std::string(key.Filename), "", *FE};
@@ -2520,11 +2519,10 @@ InputFileInfo ASTReader::getInputFileInfo(ModuleFile &F, unsigned ID) {
   std::tie(R.FilenameAsRequested, R.Filename) = [&]() {
     uint16_t AsRequestedLength = Record[7];
 
-    std::string NameAsRequested = Blob.substr(0, AsRequestedLength).str();
-    std::string Name = Blob.substr(AsRequestedLength).str();
-
-    ResolveImportedPath(F, NameAsRequested);
-    ResolveImportedPath(F, Name);
+    std::string NameAsRequested =
+        ResolveImportedPath(Blob.substr(0, AsRequestedLength), F).str();
+    std::string Name =
+        ResolveImportedPath(Blob.substr(AsRequestedLength), F).str();
 
     if (Name.empty())
       Name = NameAsRequested;
@@ -2753,20 +2751,15 @@ InputFile ASTReader::getInputFile(ModuleFile &F, unsigned ID, bool Complain) {
 /// If we are loading a relocatable PCH or module file, and the filename
 /// is not an absolute path, add the system or module root to the beginning of
 /// the file name.
-void ASTReader::ResolveImportedPath(ModuleFile &M, std::string &Filename) {
-  // Resolve relative to the base directory, if we have one.
-  if (!M.BaseDirectory.empty())
-    return ResolveImportedPath(Filename, M.BaseDirectory);
-}
-
-void ASTReader::ResolveImportedPath(std::string &Filename, StringRef Prefix) {
-  if (Filename.empty() || llvm::sys::path::is_absolute(Filename) ||
-      Filename == "<built-in>" || Filename == "<command line>")
-    return;
+StringRef ASTReader::ResolveImportedPath(SmallVectorImpl<char> &Buffer,
+                                         StringRef Path, StringRef Prefix) {
+  if (Prefix.empty() || Path.empty() || llvm::sys::path::is_absolute(Path) ||
+      Path == "<built-in>" || Path == "<command line>")
+    return Path;
 
-  SmallString<128> Buffer;
-  llvm::sys::path::append(Buffer, Prefix, Filename);
-  Filename.assign(Buffer.begin(), Buffer.end());
+  Buffer.clear();
+  llvm::sys::path::append(Buffer, Prefix, Path);
+  return {Buffer.data(), Buffer.size()};
 }
 
 static bool isDiagnosedResult(ASTReader::ASTReadResult ARR, unsigned Caps) {
@@ -3194,8 +3187,8 @@ ASTReader::ReadControlBlock(ModuleFile &F,
     case ORIGINAL_FILE:
       F.OriginalSourceFileID = FileID::get(Record[0]);
       F.ActualOriginalSourceFileName = std::string(Blob);
-      F.OriginalSourceFileName = F.ActualOriginalSourceFileName;
-      ResolveImportedPath(F, F.OriginalSourceFileName);
+      F.OriginalSourceFileName =
+          ResolveImportedPath(F.ActualOriginalSourceFileName, F).str();
       break;
 
     case ORIGINAL_FILE_ID:
@@ -5484,6 +5477,7 @@ bool ASTReader::readASTFileControlBlock(
   RecordData Record;
   std::string ModuleDir;
   bool DoneWithControlBlock = false;
+  SmallString<256> PathBuf;
   while (!DoneWithControlBlock) {
     Expected<llvm::BitstreamEntry> MaybeEntry = Stream.advance();
     if (!MaybeEntry) {
@@ -5566,8 +5560,8 @@ bool ASTReader::readASTFileControlBlock(
       break;
     case MODULE_MAP_FILE: {
       unsigned Idx = 0;
-      auto Path = ReadString(Record, Idx);
-      ResolveImportedPath(Path, ModuleDir);
+      std::string PathStr = ReadString(Record, Idx);
+      StringRef Path = ResolveImportedPath(PathBuf, PathStr, ModuleDir);
       Listener.ReadModuleMapFile(Path);
       break;
     }
@@ -5615,8 +5609,7 @@ bool ASTReader::readASTFileControlBlock(
           break;
         case INPUT_FILE:
           bool Overridden = static_cast<bool>(Record[3]);
-          std::string Filename = std::string(Blob);
-          ResolveImportedPath(Filename, ModuleDir);
+          StringRef Filename = ResolveImportedPath(PathBuf, Blob, ModuleDir);
           shouldContinue = Listener.visitInputFile(
               Filename, isSystemFile, Overridden, /*IsExplicitModule*/false);
           break;
@@ -5653,8 +5646,9 @@ bool ASTReader::readASTFileControlBlock(
         // Skip Size, ModTime and Signature
         Idx += 1 + 1 + ASTFileSignature::size;
         std::string ModuleName = ReadString(Record, Idx);
-        std::string Filename = ReadString(Record, Idx);
-        ResolveImportedPath(Filename, ModuleDir);
+        std::string FilenameStr = ReadString(Record, Idx);
+        StringRef Filename =
+            ResolveImportedPath(PathBuf, FilenameStr, ModuleDir);
         Listener.visitImport(ModuleName, Filename);
       }
       break;
@@ -5908,8 +5902,7 @@ llvm::Error ASTReader::ReadSubmoduleBlock(ModuleFile &F,
       // FIXME: This doesn't work for framework modules as `Filename` is the
       //        name as written in the module file and does not include
       //        `Headers/`, so this path will never exist.
-      std::string Filename = std::string(Blob);
-      ResolveImportedPath(F, Filename);
+      StringRef Filename = ResolveImportedPath(Blob, F);
       if (auto Umbrella = PP.getFileManager().getOptionalFileRef(Filename)) {
         if (!CurrentModule->getUmbrellaHeaderAsWritten()) {
           // FIXME: NameAsWritten
@@ -5938,16 +5931,14 @@ llvm::Error ASTReader::ReadSubmoduleBlock(ModuleFile &F,
       break;
 
     case SUBMODULE_TOPHEADER: {
-      std::string HeaderName(Blob);
-      ResolveImportedPath(F, HeaderName);
+      StringRef HeaderName = ResolveImportedPath(Blob, F);
       CurrentModule->addTopHeaderFilename(HeaderName);
       break;
     }
 
     case SUBMODULE_UMBRELLA_DIR: {
       // See comments in SUBMODULE_UMBRELLA_HEADER
-      std::string Dirname = std::string(Blob);
-      ResolveImportedPath(F, Dirname);
+      StringRef Dirname = ResolveImportedPath(Blob, F);
       if (auto Umbrella =
               PP.getFileManager().getOptionalDirectoryRef(Dirname)) {
         if (!CurrentModule->getUmbrellaDirAsWritten()) {
@@ -9605,16 +9596,13 @@ std::string ASTReader::ReadString(const RecordDataImpl &Record, unsigned &Idx) {
 std::string ASTReader::ReadPath(ModuleFile &F, const RecordData &Record,
                                 unsigned &Idx) {
   std::string Filename = ReadString(Record, Idx);
-  ResolveImportedPath(F, Filename);
-  return Filename;
+  return ResolveImportedPath(Filename, F).str();
 }
 
 std::string ASTReader::ReadPath(StringRef BaseDirectory,
                                 const RecordData &Record, unsigned &Idx) {
   std::string Filename = ReadString(Record, Idx);
-  if (!BaseDirectory.empty())
-    ResolveImportedPath(Filename, BaseDirectory);
-  return Filename;
+  return ResolveImportedPath(Filename, BaseDirectory).str();
 }
 
 VersionTuple ASTReader::ReadVersionTuple(const RecordData &Record,

``````````

</details>


https://github.com/llvm/llvm-project/pull/113984


More information about the cfe-commits mailing list