[clang] [clang][modules] Avoid allocations when reading blob paths (PR #113984)

Jan Svoboda via cfe-commits cfe-commits at lists.llvm.org
Thu Oct 31 06:28:45 PDT 2024


https://github.com/jansvoboda11 updated https://github.com/llvm/llvm-project/pull/113984

>From c11ea47908e93fedf83021377f904d296802e627 Mon Sep 17 00:00:00 2001
From: Jan Svoboda <jan_svoboda at apple.com>
Date: Wed, 30 Oct 2024 16:52:42 -0700
Subject: [PATCH 1/6] [clang][modules] De-duplicate some logic in
 `HeaderFileInfoTrait`

---
 clang/lib/Serialization/ASTReader.cpp        | 29 ++++++++------------
 clang/lib/Serialization/ASTReaderInternals.h |  2 +-
 2 files changed, 12 insertions(+), 19 deletions(-)

diff --git a/clang/lib/Serialization/ASTReader.cpp b/clang/lib/Serialization/ASTReader.cpp
index 8d8f9378cfeabe..692fb8a91840c0 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -2042,19 +2042,15 @@ ASTReader::getGlobalPreprocessedEntityID(ModuleFile &M,
   return LocalID + I->second;
 }
 
-const FileEntry *HeaderFileInfoTrait::getFile(const internal_key_type &Key) {
+OptionalFileEntryRef
+HeaderFileInfoTrait::getFile(const internal_key_type &Key) {
   FileManager &FileMgr = Reader.getFileManager();
-  if (!Key.Imported) {
-    if (auto File = FileMgr.getOptionalFileRef(Key.Filename))
-      return *File;
-    return nullptr;
-  }
+  if (!Key.Imported)
+    return FileMgr.getOptionalFileRef(Key.Filename);
 
   std::string Resolved = std::string(Key.Filename);
   Reader.ResolveImportedPath(M, Resolved);
-  if (auto File = FileMgr.getOptionalFileRef(Resolved))
-    return *File;
-  return nullptr;
+  return FileMgr.getOptionalFileRef(Resolved);
 }
 
 unsigned HeaderFileInfoTrait::ComputeHash(internal_key_ref ikey) {
@@ -2080,8 +2076,8 @@ bool HeaderFileInfoTrait::EqualKey(internal_key_ref a, internal_key_ref b) {
     return true;
 
   // Determine whether the actual files are equivalent.
-  const FileEntry *FEA = getFile(a);
-  const FileEntry *FEB = getFile(b);
+  OptionalFileEntryRef FEA = getFile(a);
+  OptionalFileEntryRef FEB = getFile(b);
   return FEA && FEA == FEB;
 }
 
@@ -2112,12 +2108,13 @@ HeaderFileInfoTrait::ReadData(internal_key_ref key, const unsigned char *d,
   HeaderFileInfo HFI;
   unsigned Flags = *d++;
 
+  OptionalFileEntryRef FE;
   bool Included = (Flags >> 6) & 0x01;
   if (Included)
-    if (const FileEntry *FE = getFile(key))
+    if ((FE = getFile(key)))
       // Not using \c Preprocessor::markIncluded(), since that would attempt to
       // deserialize this header file info again.
-      Reader.getPreprocessor().getIncludedFiles().insert(FE);
+      Reader.getPreprocessor().getIncludedFiles().insert(*FE);
 
   // FIXME: Refactor with mergeHeaderFileInfo in HeaderSearch.cpp.
   HFI.isImport |= (Flags >> 5) & 0x01;
@@ -2146,14 +2143,10 @@ HeaderFileInfoTrait::ReadData(internal_key_ref key, const unsigned char *d,
     // implicit module import.
     SubmoduleID GlobalSMID = Reader.getGlobalSubmoduleID(M, LocalSMID);
     Module *Mod = Reader.getSubmodule(GlobalSMID);
-    FileManager &FileMgr = Reader.getFileManager();
     ModuleMap &ModMap =
         Reader.getPreprocessor().getHeaderSearchInfo().getModuleMap();
 
-    std::string Filename = std::string(key.Filename);
-    if (key.Imported)
-      Reader.ResolveImportedPath(M, Filename);
-    if (auto FE = FileMgr.getOptionalFileRef(Filename)) {
+    if (FE || (FE = getFile(key))) {
       // FIXME: NameAsWritten
       Module::Header H = {std::string(key.Filename), "", *FE};
       ModMap.addHeader(Mod, H, HeaderRole, /*Imported=*/true);
diff --git a/clang/lib/Serialization/ASTReaderInternals.h b/clang/lib/Serialization/ASTReaderInternals.h
index 536b19f91691eb..4ece1cacc91414 100644
--- a/clang/lib/Serialization/ASTReaderInternals.h
+++ b/clang/lib/Serialization/ASTReaderInternals.h
@@ -278,7 +278,7 @@ class HeaderFileInfoTrait {
   data_type ReadData(internal_key_ref,const unsigned char *d, unsigned DataLen);
 
 private:
-  const FileEntry *getFile(const internal_key_type &Key);
+  OptionalFileEntryRef getFile(const internal_key_type &Key);
 };
 
 /// The on-disk hash table used for known header files.

>From 037ca2561a325449a933f848bb79b69dbf59f710 Mon Sep 17 00:00:00 2001
From: Jan Svoboda <jan_svoboda at apple.com>
Date: Mon, 28 Oct 2024 16:59:13 -0700
Subject: [PATCH 2/6] [clang][modules] Avoid allocations when reading blob
 paths

When reading paths from bitstream blobs, `ASTReader` always performs up to three allocations:
 1. Convert the `StringRef` blob to `std::string` right away.
 2. Concatenate the module file prefix directory and the relative path within `ASTReader::ResolveImportedPath()`.
 3. Call `std::string::assign()` on the allocated blob with the concatenation result.

 This patch makes is so that we avoid allocations altogether (amortized). This works by:
 1. Avoiding conversion of the `StringRef` blob into `std::string`.
 2. Keeping one "global" buffer to perform the concatenation with.
 3. Returning `StringRef` pointing into the global buffer and being careful to not store this anywhere where it can outlive the underlying global buffer allocation.

 Note that in some places, we don't store paths as blobs, but rather as records that get VBR-encoded. This makes the allocation in (1) unavoidable. I plan to fix this in a follow-up PR by changing the PCM format to store those as offsets into the blob part of the record.

 Similarly, there are some data structures that store the deserialized paths as `std::string`. If we don't access them frequently, it might be better to store just the unresolved `StringRef` and resolve it on demand, again using some kind of shared buffer to prevent allocations.

 This improves `clang-scan-deps` performance on my workload by <TBD>%.
---
 clang/include/clang/Serialization/ASTReader.h | 17 ++++-
 clang/lib/Serialization/ASTReader.cpp         | 66 ++++++++-----------
 2 files changed, 42 insertions(+), 41 deletions(-)

diff --git a/clang/include/clang/Serialization/ASTReader.h b/clang/include/clang/Serialization/ASTReader.h
index 070c1c9a54f48c..3c8bef484d0133 100644
--- a/clang/include/clang/Serialization/ASTReader.h
+++ b/clang/include/clang/Serialization/ASTReader.h
@@ -1341,9 +1341,22 @@ class ASTReader
   serialization::InputFile getInputFile(ModuleFile &F, unsigned ID,
                                         bool Complain = true);
 
+  /// Buffer we use as temporary storage backing resolved paths.
+  SmallString<256> PathBuf;
+
 public:
-  void ResolveImportedPath(ModuleFile &M, std::string &Filename);
-  static void ResolveImportedPath(std::string &Filename, StringRef Prefix);
+  StringRef ResolveImportedPath(StringRef Path, ModuleFile &M) {
+    return ResolveImportedPath(PathBuf, Path, M);
+  }
+  StringRef ResolveImportedPath(StringRef Path, StringRef Prefix) {
+    return ResolveImportedPath(PathBuf, Path, Prefix);
+  }
+  static StringRef ResolveImportedPath(SmallVectorImpl<char> &Buffer,
+                                       StringRef Path, ModuleFile &M) {
+    return ResolveImportedPath(Buffer, Path, M.BaseDirectory);
+  }
+  static StringRef ResolveImportedPath(SmallVectorImpl<char> &Buffer,
+                                       StringRef Path, StringRef Prefix);
 
   /// Returns the first key declaration for the given declaration. This
   /// is one that is formerly-canonical (or still canonical) and whose module
diff --git a/clang/lib/Serialization/ASTReader.cpp b/clang/lib/Serialization/ASTReader.cpp
index 692fb8a91840c0..e64eabc82ec63e 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -2048,8 +2048,7 @@ HeaderFileInfoTrait::getFile(const internal_key_type &Key) {
   if (!Key.Imported)
     return FileMgr.getOptionalFileRef(Key.Filename);
 
-  std::string Resolved = std::string(Key.Filename);
-  Reader.ResolveImportedPath(M, Resolved);
+  StringRef Resolved = Reader.ResolveImportedPath(Key.Filename, M);
   return FileMgr.getOptionalFileRef(Resolved);
 }
 
@@ -2513,11 +2512,10 @@ InputFileInfo ASTReader::getInputFileInfo(ModuleFile &F, unsigned ID) {
   std::tie(R.FilenameAsRequested, R.Filename) = [&]() {
     uint16_t AsRequestedLength = Record[7];
 
-    std::string NameAsRequested = Blob.substr(0, AsRequestedLength).str();
-    std::string Name = Blob.substr(AsRequestedLength).str();
-
-    ResolveImportedPath(F, NameAsRequested);
-    ResolveImportedPath(F, Name);
+    std::string NameAsRequested =
+        ResolveImportedPath(Blob.substr(0, AsRequestedLength), F).str();
+    std::string Name =
+        ResolveImportedPath(Blob.substr(AsRequestedLength), F).str();
 
     if (Name.empty())
       Name = NameAsRequested;
@@ -2746,20 +2744,15 @@ InputFile ASTReader::getInputFile(ModuleFile &F, unsigned ID, bool Complain) {
 /// If we are loading a relocatable PCH or module file, and the filename
 /// is not an absolute path, add the system or module root to the beginning of
 /// the file name.
-void ASTReader::ResolveImportedPath(ModuleFile &M, std::string &Filename) {
-  // Resolve relative to the base directory, if we have one.
-  if (!M.BaseDirectory.empty())
-    return ResolveImportedPath(Filename, M.BaseDirectory);
-}
-
-void ASTReader::ResolveImportedPath(std::string &Filename, StringRef Prefix) {
-  if (Filename.empty() || llvm::sys::path::is_absolute(Filename) ||
-      Filename == "<built-in>" || Filename == "<command line>")
-    return;
+StringRef ASTReader::ResolveImportedPath(SmallVectorImpl<char> &Buffer,
+                                         StringRef Path, StringRef Prefix) {
+  if (Prefix.empty() || Path.empty() || llvm::sys::path::is_absolute(Path) ||
+      Path == "<built-in>" || Path == "<command line>")
+    return Path;
 
-  SmallString<128> Buffer;
-  llvm::sys::path::append(Buffer, Prefix, Filename);
-  Filename.assign(Buffer.begin(), Buffer.end());
+  Buffer.clear();
+  llvm::sys::path::append(Buffer, Prefix, Path);
+  return {Buffer.data(), Buffer.size()};
 }
 
 static bool isDiagnosedResult(ASTReader::ASTReadResult ARR, unsigned Caps) {
@@ -3187,8 +3180,8 @@ ASTReader::ReadControlBlock(ModuleFile &F,
     case ORIGINAL_FILE:
       F.OriginalSourceFileID = FileID::get(Record[0]);
       F.ActualOriginalSourceFileName = std::string(Blob);
-      F.OriginalSourceFileName = F.ActualOriginalSourceFileName;
-      ResolveImportedPath(F, F.OriginalSourceFileName);
+      F.OriginalSourceFileName =
+          ResolveImportedPath(F.ActualOriginalSourceFileName, F).str();
       break;
 
     case ORIGINAL_FILE_ID:
@@ -5477,6 +5470,7 @@ bool ASTReader::readASTFileControlBlock(
   RecordData Record;
   std::string ModuleDir;
   bool DoneWithControlBlock = false;
+  SmallString<256> PathBuf;
   while (!DoneWithControlBlock) {
     Expected<llvm::BitstreamEntry> MaybeEntry = Stream.advance();
     if (!MaybeEntry) {
@@ -5559,8 +5553,8 @@ bool ASTReader::readASTFileControlBlock(
       break;
     case MODULE_MAP_FILE: {
       unsigned Idx = 0;
-      auto Path = ReadString(Record, Idx);
-      ResolveImportedPath(Path, ModuleDir);
+      std::string PathStr = ReadString(Record, Idx);
+      StringRef Path = ResolveImportedPath(PathBuf, PathStr, ModuleDir);
       Listener.ReadModuleMapFile(Path);
       break;
     }
@@ -5608,8 +5602,7 @@ bool ASTReader::readASTFileControlBlock(
           break;
         case INPUT_FILE:
           bool Overridden = static_cast<bool>(Record[3]);
-          std::string Filename = std::string(Blob);
-          ResolveImportedPath(Filename, ModuleDir);
+          StringRef Filename = ResolveImportedPath(PathBuf, Blob, ModuleDir);
           shouldContinue = Listener.visitInputFile(
               Filename, isSystemFile, Overridden, /*IsExplicitModule*/false);
           break;
@@ -5646,8 +5639,9 @@ bool ASTReader::readASTFileControlBlock(
         // Skip Size, ModTime and Signature
         Idx += 1 + 1 + ASTFileSignature::size;
         std::string ModuleName = ReadString(Record, Idx);
-        std::string Filename = ReadString(Record, Idx);
-        ResolveImportedPath(Filename, ModuleDir);
+        std::string FilenameStr = ReadString(Record, Idx);
+        StringRef Filename =
+            ResolveImportedPath(PathBuf, FilenameStr, ModuleDir);
         Listener.visitImport(ModuleName, Filename);
       }
       break;
@@ -5901,8 +5895,7 @@ llvm::Error ASTReader::ReadSubmoduleBlock(ModuleFile &F,
       // FIXME: This doesn't work for framework modules as `Filename` is the
       //        name as written in the module file and does not include
       //        `Headers/`, so this path will never exist.
-      std::string Filename = std::string(Blob);
-      ResolveImportedPath(F, Filename);
+      StringRef Filename = ResolveImportedPath(Blob, F);
       if (auto Umbrella = PP.getFileManager().getOptionalFileRef(Filename)) {
         if (!CurrentModule->getUmbrellaHeaderAsWritten()) {
           // FIXME: NameAsWritten
@@ -5931,16 +5924,14 @@ llvm::Error ASTReader::ReadSubmoduleBlock(ModuleFile &F,
       break;
 
     case SUBMODULE_TOPHEADER: {
-      std::string HeaderName(Blob);
-      ResolveImportedPath(F, HeaderName);
+      StringRef HeaderName = ResolveImportedPath(Blob, F);
       CurrentModule->addTopHeaderFilename(HeaderName);
       break;
     }
 
     case SUBMODULE_UMBRELLA_DIR: {
       // See comments in SUBMODULE_UMBRELLA_HEADER
-      std::string Dirname = std::string(Blob);
-      ResolveImportedPath(F, Dirname);
+      StringRef Dirname = ResolveImportedPath(Blob, F);
       if (auto Umbrella =
               PP.getFileManager().getOptionalDirectoryRef(Dirname)) {
         if (!CurrentModule->getUmbrellaDirAsWritten()) {
@@ -9598,16 +9589,13 @@ std::string ASTReader::ReadString(const RecordDataImpl &Record, unsigned &Idx) {
 std::string ASTReader::ReadPath(ModuleFile &F, const RecordData &Record,
                                 unsigned &Idx) {
   std::string Filename = ReadString(Record, Idx);
-  ResolveImportedPath(F, Filename);
-  return Filename;
+  return ResolveImportedPath(Filename, F).str();
 }
 
 std::string ASTReader::ReadPath(StringRef BaseDirectory,
                                 const RecordData &Record, unsigned &Idx) {
   std::string Filename = ReadString(Record, Idx);
-  if (!BaseDirectory.empty())
-    ResolveImportedPath(Filename, BaseDirectory);
-  return Filename;
+  return ResolveImportedPath(Filename, BaseDirectory).str();
 }
 
 VersionTuple ASTReader::ReadVersionTuple(const RecordData &Record,

>From f1aa4d006ff8a5ff707cedad841acc7058ffab67 Mon Sep 17 00:00:00 2001
From: Jan Svoboda <jan_svoboda at apple.com>
Date: Tue, 29 Oct 2024 13:26:34 -0700
Subject: [PATCH 3/6] Manage ownership and assert on overlapping lifetimes

---
 clang/include/clang/Serialization/ASTReader.h | 39 ++++++++++++++++---
 clang/lib/Serialization/ASTReader.cpp         | 26 ++++++-------
 2 files changed, 47 insertions(+), 18 deletions(-)

diff --git a/clang/include/clang/Serialization/ASTReader.h b/clang/include/clang/Serialization/ASTReader.h
index 3c8bef484d0133..6da1a56d1321d9 100644
--- a/clang/include/clang/Serialization/ASTReader.h
+++ b/clang/include/clang/Serialization/ASTReader.h
@@ -50,6 +50,7 @@
 #include "llvm/ADT/iterator_range.h"
 #include "llvm/Bitstream/BitstreamReader.h"
 #include "llvm/Support/MemoryBuffer.h"
+#include "llvm/Support/SaveAndRestore.h"
 #include "llvm/Support/Timer.h"
 #include "llvm/Support/VersionTuple.h"
 #include <cassert>
@@ -1342,19 +1343,47 @@ class ASTReader
                                         bool Complain = true);
 
   /// Buffer we use as temporary storage backing resolved paths.
-  SmallString<256> PathBuf;
+  std::optional<SmallString<256>> PathBuf{{}};
+
+  /// A RAII wrapper around \c StringRef that temporarily takes ownership of the
+  /// underlying buffer and gives it back on destruction.
+  class TemporarilyOwnedStringRef {
+    StringRef String;
+    llvm::SaveAndRestore<std::optional<SmallString<256>>> TemporaryLoan;
+
+  public:
+    TemporarilyOwnedStringRef(StringRef S, std::optional<SmallString<256>> &Buf)
+        : String(S), TemporaryLoan(Buf, {}) {}
+
+    /// Returns the wrapped \c StringRef that must be outlived by \c this.
+    const StringRef *operator->() const { return &String; }
+    /// Returns the wrapped \c StringRef that must be outlived by \c this.
+    const StringRef &operator*() const { return String; }
+  };
 
 public:
-  StringRef ResolveImportedPath(StringRef Path, ModuleFile &M) {
-    return ResolveImportedPath(PathBuf, Path, M);
+  /// Resolve \c Path in the context of module file \c M. The return value must
+  /// be destroyed before another call to \c ResolveImportPath.
+  TemporarilyOwnedStringRef ResolveImportedPath(StringRef Path, ModuleFile &M) {
+    return ResolveImportedPath(Path, M.BaseDirectory);
   }
-  StringRef ResolveImportedPath(StringRef Path, StringRef Prefix) {
-    return ResolveImportedPath(PathBuf, Path, Prefix);
+  /// Resolve \c Path in the context of the \c Prefix directory. The return
+  /// value must be destroyed before another call to \c ResolveImportPath.
+  TemporarilyOwnedStringRef ResolveImportedPath(StringRef Path,
+                                                StringRef Prefix) {
+    assert(PathBuf && "Multiple overlapping calls to ResolveImportedPath");
+    StringRef ResolvedPath = ResolveImportedPath(*PathBuf, Path, Prefix);
+    return {ResolvedPath, PathBuf};
   }
+
+  /// Resolve \c Path in the context of module file \c M. The \c Buffer must
+  /// outlive the returned \c StringRef.
   static StringRef ResolveImportedPath(SmallVectorImpl<char> &Buffer,
                                        StringRef Path, ModuleFile &M) {
     return ResolveImportedPath(Buffer, Path, M.BaseDirectory);
   }
+  /// Resolve \c Path in the context of the \c Prefix directory. The \c Buffer
+  /// must outlive the returned \c StringRef.
   static StringRef ResolveImportedPath(SmallVectorImpl<char> &Buffer,
                                        StringRef Path, StringRef Prefix);
 
diff --git a/clang/lib/Serialization/ASTReader.cpp b/clang/lib/Serialization/ASTReader.cpp
index e64eabc82ec63e..68680ece503164 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -2048,8 +2048,8 @@ HeaderFileInfoTrait::getFile(const internal_key_type &Key) {
   if (!Key.Imported)
     return FileMgr.getOptionalFileRef(Key.Filename);
 
-  StringRef Resolved = Reader.ResolveImportedPath(Key.Filename, M);
-  return FileMgr.getOptionalFileRef(Resolved);
+  auto Resolved = Reader.ResolveImportedPath(Key.Filename, M);
+  return FileMgr.getOptionalFileRef(*Resolved);
 }
 
 unsigned HeaderFileInfoTrait::ComputeHash(internal_key_ref ikey) {
@@ -2513,9 +2513,9 @@ InputFileInfo ASTReader::getInputFileInfo(ModuleFile &F, unsigned ID) {
     uint16_t AsRequestedLength = Record[7];
 
     std::string NameAsRequested =
-        ResolveImportedPath(Blob.substr(0, AsRequestedLength), F).str();
+        ResolveImportedPath(Blob.substr(0, AsRequestedLength), F)->str();
     std::string Name =
-        ResolveImportedPath(Blob.substr(AsRequestedLength), F).str();
+        ResolveImportedPath(Blob.substr(AsRequestedLength), F)->str();
 
     if (Name.empty())
       Name = NameAsRequested;
@@ -3181,7 +3181,7 @@ ASTReader::ReadControlBlock(ModuleFile &F,
       F.OriginalSourceFileID = FileID::get(Record[0]);
       F.ActualOriginalSourceFileName = std::string(Blob);
       F.OriginalSourceFileName =
-          ResolveImportedPath(F.ActualOriginalSourceFileName, F).str();
+          ResolveImportedPath(F.ActualOriginalSourceFileName, F)->str();
       break;
 
     case ORIGINAL_FILE_ID:
@@ -5895,8 +5895,8 @@ llvm::Error ASTReader::ReadSubmoduleBlock(ModuleFile &F,
       // FIXME: This doesn't work for framework modules as `Filename` is the
       //        name as written in the module file and does not include
       //        `Headers/`, so this path will never exist.
-      StringRef Filename = ResolveImportedPath(Blob, F);
-      if (auto Umbrella = PP.getFileManager().getOptionalFileRef(Filename)) {
+      auto Filename = ResolveImportedPath(Blob, F);
+      if (auto Umbrella = PP.getFileManager().getOptionalFileRef(*Filename)) {
         if (!CurrentModule->getUmbrellaHeaderAsWritten()) {
           // FIXME: NameAsWritten
           ModMap.setUmbrellaHeaderAsWritten(CurrentModule, *Umbrella, Blob, "");
@@ -5924,16 +5924,16 @@ llvm::Error ASTReader::ReadSubmoduleBlock(ModuleFile &F,
       break;
 
     case SUBMODULE_TOPHEADER: {
-      StringRef HeaderName = ResolveImportedPath(Blob, F);
-      CurrentModule->addTopHeaderFilename(HeaderName);
+      auto HeaderName = ResolveImportedPath(Blob, F);
+      CurrentModule->addTopHeaderFilename(*HeaderName);
       break;
     }
 
     case SUBMODULE_UMBRELLA_DIR: {
       // See comments in SUBMODULE_UMBRELLA_HEADER
-      StringRef Dirname = ResolveImportedPath(Blob, F);
+      auto Dirname = ResolveImportedPath(Blob, F);
       if (auto Umbrella =
-              PP.getFileManager().getOptionalDirectoryRef(Dirname)) {
+              PP.getFileManager().getOptionalDirectoryRef(*Dirname)) {
         if (!CurrentModule->getUmbrellaDirAsWritten()) {
           // FIXME: NameAsWritten
           ModMap.setUmbrellaDirAsWritten(CurrentModule, *Umbrella, Blob, "");
@@ -9589,13 +9589,13 @@ std::string ASTReader::ReadString(const RecordDataImpl &Record, unsigned &Idx) {
 std::string ASTReader::ReadPath(ModuleFile &F, const RecordData &Record,
                                 unsigned &Idx) {
   std::string Filename = ReadString(Record, Idx);
-  return ResolveImportedPath(Filename, F).str();
+  return ResolveImportedPath(Filename, F)->str();
 }
 
 std::string ASTReader::ReadPath(StringRef BaseDirectory,
                                 const RecordData &Record, unsigned &Idx) {
   std::string Filename = ReadString(Record, Idx);
-  return ResolveImportedPath(Filename, BaseDirectory).str();
+  return ResolveImportedPath(Filename, BaseDirectory)->str();
 }
 
 VersionTuple ASTReader::ReadVersionTuple(const RecordData &Record,

>From cc83e1dbcc4dcf7db2dafae50b4983f0b505fdf6 Mon Sep 17 00:00:00 2001
From: Jan Svoboda <jan_svoboda at apple.com>
Date: Wed, 30 Oct 2024 08:45:40 -0700
Subject: [PATCH 4/6] 256 -> 0

---
 clang/include/clang/Serialization/ASTReader.h | 6 +++---
 clang/lib/Serialization/ASTReader.cpp         | 3 +++
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/clang/include/clang/Serialization/ASTReader.h b/clang/include/clang/Serialization/ASTReader.h
index 6da1a56d1321d9..b67beadf2e7318 100644
--- a/clang/include/clang/Serialization/ASTReader.h
+++ b/clang/include/clang/Serialization/ASTReader.h
@@ -1343,16 +1343,16 @@ class ASTReader
                                         bool Complain = true);
 
   /// Buffer we use as temporary storage backing resolved paths.
-  std::optional<SmallString<256>> PathBuf{{}};
+  std::optional<SmallString<0>> PathBuf;
 
   /// A RAII wrapper around \c StringRef that temporarily takes ownership of the
   /// underlying buffer and gives it back on destruction.
   class TemporarilyOwnedStringRef {
     StringRef String;
-    llvm::SaveAndRestore<std::optional<SmallString<256>>> TemporaryLoan;
+    llvm::SaveAndRestore<std::optional<SmallString<0>>> TemporaryLoan;
 
   public:
-    TemporarilyOwnedStringRef(StringRef S, std::optional<SmallString<256>> &Buf)
+    TemporarilyOwnedStringRef(StringRef S, std::optional<SmallString<0>> &Buf)
         : String(S), TemporaryLoan(Buf, {}) {}
 
     /// Returns the wrapped \c StringRef that must be outlived by \c this.
diff --git a/clang/lib/Serialization/ASTReader.cpp b/clang/lib/Serialization/ASTReader.cpp
index 68680ece503164..75aa554bf88152 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -10500,6 +10500,9 @@ ASTReader::ASTReader(Preprocessor &PP, InMemoryModuleCache &ModuleCache,
       UseGlobalIndex(UseGlobalIndex), CurrSwitchCaseStmts(&SwitchCaseStmts) {
   SourceMgr.setExternalSLocEntrySource(this);
 
+  PathBuf.emplace();
+  PathBuf->reserve(256);
+
   for (const auto &Ext : Extensions) {
     auto BlockName = Ext->getExtensionMetadata().BlockName;
     auto Known = ModuleFileExtensions.find(BlockName);

>From dab2bca74b71f832a532fedc7850cc72182dd86e Mon Sep 17 00:00:00 2001
From: Jan Svoboda <jan_svoboda at apple.com>
Date: Wed, 30 Oct 2024 16:25:16 -0700
Subject: [PATCH 5/6] Use capacity to express optionality, check lifetime even
 in static functions, prevent using rvalue-ref

---
 clang/include/clang/Serialization/ASTReader.h | 66 +++++++--------
 clang/lib/Serialization/ASTReader.cpp         | 83 ++++++++++++-------
 2 files changed, 84 insertions(+), 65 deletions(-)

diff --git a/clang/include/clang/Serialization/ASTReader.h b/clang/include/clang/Serialization/ASTReader.h
index b67beadf2e7318..9c274adc59a207 100644
--- a/clang/include/clang/Serialization/ASTReader.h
+++ b/clang/include/clang/Serialization/ASTReader.h
@@ -1342,50 +1342,48 @@ class ASTReader
   serialization::InputFile getInputFile(ModuleFile &F, unsigned ID,
                                         bool Complain = true);
 
-  /// Buffer we use as temporary storage backing resolved paths.
-  std::optional<SmallString<0>> PathBuf;
+  /// The buffer used as the temporary backing storage for resolved paths.
+  SmallString<0> PathBuf;
 
-  /// A RAII wrapper around \c StringRef that temporarily takes ownership of the
-  /// underlying buffer and gives it back on destruction.
+  /// A wrapper around StringRef that temporarily borrows the underlying buffer.
   class TemporarilyOwnedStringRef {
     StringRef String;
-    llvm::SaveAndRestore<std::optional<SmallString<0>>> TemporaryLoan;
+    llvm::SaveAndRestore<SmallString<0>> UnderlyingBuffer;
 
   public:
-    TemporarilyOwnedStringRef(StringRef S, std::optional<SmallString<0>> &Buf)
-        : String(S), TemporaryLoan(Buf, {}) {}
+    TemporarilyOwnedStringRef(StringRef S, SmallString<0> &UnderlyingBuffer)
+        : String(S), UnderlyingBuffer(UnderlyingBuffer, {}) {}
 
-    /// Returns the wrapped \c StringRef that must be outlived by \c this.
-    const StringRef *operator->() const { return &String; }
-    /// Returns the wrapped \c StringRef that must be outlived by \c this.
-    const StringRef &operator*() const { return String; }
+    /// Return the wrapped \c StringRef that must be outlived by \c this.
+    const StringRef *operator->() const & { return &String; }
+    const StringRef &operator*() const & { return String; }
+
+    /// Make it harder to get a \c StringRef that outlives \c this.
+    const StringRef *operator->() && = delete;
+    const StringRef &operator*() && = delete;
   };
 
 public:
-  /// Resolve \c Path in the context of module file \c M. The return value must
-  /// be destroyed before another call to \c ResolveImportPath.
-  TemporarilyOwnedStringRef ResolveImportedPath(StringRef Path, ModuleFile &M) {
-    return ResolveImportedPath(Path, M.BaseDirectory);
-  }
-  /// Resolve \c Path in the context of the \c Prefix directory. The return
-  /// value must be destroyed before another call to \c ResolveImportPath.
-  TemporarilyOwnedStringRef ResolveImportedPath(StringRef Path,
-                                                StringRef Prefix) {
-    assert(PathBuf && "Multiple overlapping calls to ResolveImportedPath");
-    StringRef ResolvedPath = ResolveImportedPath(*PathBuf, Path, Prefix);
-    return {ResolvedPath, PathBuf};
-  }
+  /// Get the buffer for resolving paths.
+  SmallString<0> &getPathBuf() { return PathBuf; }
 
-  /// Resolve \c Path in the context of module file \c M. The \c Buffer must
-  /// outlive the returned \c StringRef.
-  static StringRef ResolveImportedPath(SmallVectorImpl<char> &Buffer,
-                                       StringRef Path, ModuleFile &M) {
-    return ResolveImportedPath(Buffer, Path, M.BaseDirectory);
-  }
-  /// Resolve \c Path in the context of the \c Prefix directory. The \c Buffer
-  /// must outlive the returned \c StringRef.
-  static StringRef ResolveImportedPath(SmallVectorImpl<char> &Buffer,
-                                       StringRef Path, StringRef Prefix);
+  /// Resolve \c Path in the context of module file \c M. The return value
+  /// must go out of scope before the next call to \c ResolveImportedPath.
+  static TemporarilyOwnedStringRef
+  ResolveImportedPath(SmallString<0> &Buf, StringRef Path, ModuleFile &ModF);
+  /// Resolve \c Path in the context of the \c Prefix directory. The return
+  /// value must go out of scope before the next call to \c ResolveImportedPath.
+  static TemporarilyOwnedStringRef
+  ResolveImportedPath(SmallString<0> &Buf, StringRef Path, StringRef Prefix);
+
+  /// Resolve \c Path in the context of module file \c M.
+  static std::string ResolveImportedPathAndAllocate(SmallString<0> &Buf,
+                                                    StringRef Path,
+                                                    ModuleFile &ModF);
+  /// Resolve \c Path in the context of the \c Prefix directory.
+  static std::string ResolveImportedPathAndAllocate(SmallString<0> &Buf,
+                                                    StringRef Path,
+                                                    StringRef Prefix);
 
   /// Returns the first key declaration for the given declaration. This
   /// is one that is formerly-canonical (or still canonical) and whose module
diff --git a/clang/lib/Serialization/ASTReader.cpp b/clang/lib/Serialization/ASTReader.cpp
index 75aa554bf88152..cc95a48cc57045 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -2048,7 +2048,8 @@ HeaderFileInfoTrait::getFile(const internal_key_type &Key) {
   if (!Key.Imported)
     return FileMgr.getOptionalFileRef(Key.Filename);
 
-  auto Resolved = Reader.ResolveImportedPath(Key.Filename, M);
+  auto Resolved =
+      ASTReader::ResolveImportedPath(Reader.getPathBuf(), Key.Filename, M);
   return FileMgr.getOptionalFileRef(*Resolved);
 }
 
@@ -2512,10 +2513,12 @@ InputFileInfo ASTReader::getInputFileInfo(ModuleFile &F, unsigned ID) {
   std::tie(R.FilenameAsRequested, R.Filename) = [&]() {
     uint16_t AsRequestedLength = Record[7];
 
+    StringRef NameAsRequestedRef = Blob.substr(0, AsRequestedLength);
+    StringRef NameRef = Blob.substr(AsRequestedLength);
+
     std::string NameAsRequested =
-        ResolveImportedPath(Blob.substr(0, AsRequestedLength), F)->str();
-    std::string Name =
-        ResolveImportedPath(Blob.substr(AsRequestedLength), F)->str();
+        ResolveImportedPathAndAllocate(PathBuf, NameAsRequestedRef, F);
+    std::string Name = ResolveImportedPathAndAllocate(PathBuf, NameRef, F);
 
     if (Name.empty())
       Name = NameAsRequested;
@@ -2741,18 +2744,38 @@ InputFile ASTReader::getInputFile(ModuleFile &F, unsigned ID, bool Complain) {
   return IF;
 }
 
-/// If we are loading a relocatable PCH or module file, and the filename
-/// is not an absolute path, add the system or module root to the beginning of
-/// the file name.
-StringRef ASTReader::ResolveImportedPath(SmallVectorImpl<char> &Buffer,
-                                         StringRef Path, StringRef Prefix) {
+ASTReader::TemporarilyOwnedStringRef
+ASTReader::ResolveImportedPath(SmallString<0> &Buf, StringRef Path,
+                               ModuleFile &ModF) {
+  return ResolveImportedPath(Buf, Path, ModF.BaseDirectory);
+}
+
+ASTReader::TemporarilyOwnedStringRef
+ASTReader::ResolveImportedPath(SmallString<0> &Buf, StringRef Path,
+                               StringRef Prefix) {
+  assert(Buf.capacity() != 0 && "Overlapping ResolveImportedPath calls");
+
   if (Prefix.empty() || Path.empty() || llvm::sys::path::is_absolute(Path) ||
       Path == "<built-in>" || Path == "<command line>")
-    return Path;
+    return {Path, Buf};
 
-  Buffer.clear();
-  llvm::sys::path::append(Buffer, Prefix, Path);
-  return {Buffer.data(), Buffer.size()};
+  Buf.clear();
+  llvm::sys::path::append(Buf, Prefix, Path);
+  StringRef ResolvedPath{Buf.data(), Buf.size()};
+  return {ResolvedPath, Buf};
+}
+
+std::string ASTReader::ResolveImportedPathAndAllocate(SmallString<0> &Buf,
+                                                      StringRef P,
+                                                      ModuleFile &ModF) {
+  return ResolveImportedPathAndAllocate(Buf, P, ModF.BaseDirectory);
+}
+
+std::string ASTReader::ResolveImportedPathAndAllocate(SmallString<0> &Buf,
+                                                      StringRef P,
+                                                      StringRef Prefix) {
+  auto ResolvedPath = ResolveImportedPath(Buf, P, Prefix);
+  return ResolvedPath->str();
 }
 
 static bool isDiagnosedResult(ASTReader::ASTReadResult ARR, unsigned Caps) {
@@ -3180,8 +3203,8 @@ ASTReader::ReadControlBlock(ModuleFile &F,
     case ORIGINAL_FILE:
       F.OriginalSourceFileID = FileID::get(Record[0]);
       F.ActualOriginalSourceFileName = std::string(Blob);
-      F.OriginalSourceFileName =
-          ResolveImportedPath(F.ActualOriginalSourceFileName, F)->str();
+      F.OriginalSourceFileName = ResolveImportedPathAndAllocate(
+          PathBuf, F.ActualOriginalSourceFileName, F);
       break;
 
     case ORIGINAL_FILE_ID:
@@ -5470,7 +5493,8 @@ bool ASTReader::readASTFileControlBlock(
   RecordData Record;
   std::string ModuleDir;
   bool DoneWithControlBlock = false;
-  SmallString<256> PathBuf;
+  SmallString<0> PathBuf;
+  PathBuf.reserve(256);
   while (!DoneWithControlBlock) {
     Expected<llvm::BitstreamEntry> MaybeEntry = Stream.advance();
     if (!MaybeEntry) {
@@ -5554,8 +5578,8 @@ bool ASTReader::readASTFileControlBlock(
     case MODULE_MAP_FILE: {
       unsigned Idx = 0;
       std::string PathStr = ReadString(Record, Idx);
-      StringRef Path = ResolveImportedPath(PathBuf, PathStr, ModuleDir);
-      Listener.ReadModuleMapFile(Path);
+      auto Path = ResolveImportedPath(PathBuf, PathStr, ModuleDir);
+      Listener.ReadModuleMapFile(*Path);
       break;
     }
     case INPUT_FILE_OFFSETS: {
@@ -5602,9 +5626,9 @@ bool ASTReader::readASTFileControlBlock(
           break;
         case INPUT_FILE:
           bool Overridden = static_cast<bool>(Record[3]);
-          StringRef Filename = ResolveImportedPath(PathBuf, Blob, ModuleDir);
+          auto Filename = ResolveImportedPath(PathBuf, Blob, ModuleDir);
           shouldContinue = Listener.visitInputFile(
-              Filename, isSystemFile, Overridden, /*IsExplicitModule*/false);
+              *Filename, isSystemFile, Overridden, /*IsExplicitModule*/false);
           break;
         }
         if (!shouldContinue)
@@ -5640,9 +5664,8 @@ bool ASTReader::readASTFileControlBlock(
         Idx += 1 + 1 + ASTFileSignature::size;
         std::string ModuleName = ReadString(Record, Idx);
         std::string FilenameStr = ReadString(Record, Idx);
-        StringRef Filename =
-            ResolveImportedPath(PathBuf, FilenameStr, ModuleDir);
-        Listener.visitImport(ModuleName, Filename);
+        auto Filename = ResolveImportedPath(PathBuf, FilenameStr, ModuleDir);
+        Listener.visitImport(ModuleName, *Filename);
       }
       break;
     }
@@ -5895,7 +5918,7 @@ llvm::Error ASTReader::ReadSubmoduleBlock(ModuleFile &F,
       // FIXME: This doesn't work for framework modules as `Filename` is the
       //        name as written in the module file and does not include
       //        `Headers/`, so this path will never exist.
-      auto Filename = ResolveImportedPath(Blob, F);
+      auto Filename = ResolveImportedPath(PathBuf, Blob, F);
       if (auto Umbrella = PP.getFileManager().getOptionalFileRef(*Filename)) {
         if (!CurrentModule->getUmbrellaHeaderAsWritten()) {
           // FIXME: NameAsWritten
@@ -5924,14 +5947,14 @@ llvm::Error ASTReader::ReadSubmoduleBlock(ModuleFile &F,
       break;
 
     case SUBMODULE_TOPHEADER: {
-      auto HeaderName = ResolveImportedPath(Blob, F);
+      auto HeaderName = ResolveImportedPath(PathBuf, Blob, F);
       CurrentModule->addTopHeaderFilename(*HeaderName);
       break;
     }
 
     case SUBMODULE_UMBRELLA_DIR: {
       // See comments in SUBMODULE_UMBRELLA_HEADER
-      auto Dirname = ResolveImportedPath(Blob, F);
+      auto Dirname = ResolveImportedPath(PathBuf, Blob, F);
       if (auto Umbrella =
               PP.getFileManager().getOptionalDirectoryRef(*Dirname)) {
         if (!CurrentModule->getUmbrellaDirAsWritten()) {
@@ -9588,14 +9611,13 @@ std::string ASTReader::ReadString(const RecordDataImpl &Record, unsigned &Idx) {
 
 std::string ASTReader::ReadPath(ModuleFile &F, const RecordData &Record,
                                 unsigned &Idx) {
-  std::string Filename = ReadString(Record, Idx);
-  return ResolveImportedPath(Filename, F)->str();
+  return ReadPath(F.BaseDirectory, Record, Idx);
 }
 
 std::string ASTReader::ReadPath(StringRef BaseDirectory,
                                 const RecordData &Record, unsigned &Idx) {
   std::string Filename = ReadString(Record, Idx);
-  return ResolveImportedPath(Filename, BaseDirectory)->str();
+  return ResolveImportedPathAndAllocate(PathBuf, Filename, BaseDirectory);
 }
 
 VersionTuple ASTReader::ReadVersionTuple(const RecordData &Record,
@@ -10500,8 +10522,7 @@ ASTReader::ASTReader(Preprocessor &PP, InMemoryModuleCache &ModuleCache,
       UseGlobalIndex(UseGlobalIndex), CurrSwitchCaseStmts(&SwitchCaseStmts) {
   SourceMgr.setExternalSLocEntrySource(this);
 
-  PathBuf.emplace();
-  PathBuf->reserve(256);
+  PathBuf.reserve(256);
 
   for (const auto &Ext : Extensions) {
     auto BlockName = Ext->getExtensionMetadata().BlockName;

>From 0939461508815c1791851def260f700c28989054 Mon Sep 17 00:00:00 2001
From: Jan Svoboda <jan at svoboda.ai>
Date: Thu, 31 Oct 2024 06:28:25 -0700
Subject: [PATCH 6/6] clang-format

---
 clang/lib/Serialization/ASTReader.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/clang/lib/Serialization/ASTReader.cpp b/clang/lib/Serialization/ASTReader.cpp
index cc95a48cc57045..d383b4f29a5264 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -5628,7 +5628,7 @@ bool ASTReader::readASTFileControlBlock(
           bool Overridden = static_cast<bool>(Record[3]);
           auto Filename = ResolveImportedPath(PathBuf, Blob, ModuleDir);
           shouldContinue = Listener.visitInputFile(
-              *Filename, isSystemFile, Overridden, /*IsExplicitModule*/false);
+              *Filename, isSystemFile, Overridden, /*IsExplicitModule=*/false);
           break;
         }
         if (!shouldContinue)



More information about the cfe-commits mailing list