[clang] [serialization] no transitive decl change (PR #91914)

Chuanqi Xu via cfe-commits cfe-commits at lists.llvm.org
Sun May 12 22:39:22 PDT 2024


https://github.com/ChuanqiXu9 created https://github.com/llvm/llvm-project/pull/91914

Following of https://github.com/llvm/llvm-project/pull/86912

#### Motivation Example

The motivation of the patch series is that, for a module interface unit `X`, when the dependent modules of `X` changes, if the changes is not relevant with `X`, we hope the BMI of `X` won't change. For the specific patch, we hope if the changes was about irrelevant declaration changes, we hope the BMI of `X` won't change. **However**, I found the patch itself is not very useful in practice, since the adding or removing declarations, will change the state of identifiers and types in most cases.

That said, for the most simple example,

```
// partA.cppm
export module m:partA;

// partA.v1.cppm
export module m:partA;
export void a() {}

// partB.cppm
export module m:partB;
export void b() {}

// m.cppm
export module m;
export import :partA;
export import :partB;

// onlyUseB;
export module onlyUseB;
import m;
export inline void onluUseB() {
    b();
}
```

the BMI of `onlyUseB` will change after we change the implementation of `partA.cppm` to `partA.v1.cppm`. Since `partA.v1.cppm` introduces new identifiers and types (the function prototype).

So in this patch, we have to write the tests as:

```
// partA.cppm
export module m:partA;
export int getA() { ... }
export int getA2(int) { ... }

// partA.v1.cppm
export module m:partA;
export int getA() { ... }
export int getA(int) { ... }
export int getA2(int) { ... }

// partB.cppm
export module m:partB;
export void b() {}

// m.cppm
export module m;
export import :partA;
export import :partB;

// onlyUseB;
export module onlyUseB;
import m;
export inline void onluUseB() {
    b();
}
```

so that the new introduced declaration `int getA(int)` doesn't introduce new identifiers and types, then the BMI of `onlyUseB` can keep unchanged.

While it looks not so great, the patch should be the base of the patch to erase the transitive change for identifiers and types since I don't know how can we introduce new types and identifiers without introducing new declarations.

#### Design details

The design of the patch is similar to https://github.com/llvm/llvm-project/pull/86912, which extends the 32-bit DeclID to 64-bit and use the higher bits to store the module file index and the lower bits to store the Local Decl ID. 

A slight difference is that we only use 48 bits to store the new DeclID since we try to use the higher 16 bits to store the module ID in the prefix of Decl class. Previously, we use 32 bits to store the module ID and 32 bits to store the DeclID. I don't want to allocate additional space so I tried to make the additional space the same as 64 bits. An potential interesting thing here is about the relationship between the module ID and the module file index. I feel we can get the module file index by the module ID. But I didn't prove it or implement it. Since I want to make the patch itself as small as possible. We can make it in the future if we want.

Another change in the patch is the new concept Decl Index, which means the index of the very big array `DeclsLoaded` in ASTReader. Previously, the index of a loaded declaration is simply the Decl ID minus PREDEFINED_DECL_NUMs. So there are some places they got used ambiguously. But this patch tried to split these two concepts.

#### Overhead

As https://github.com/llvm/llvm-project/pull/86912 did, the change will increase the on-disk PCM file sizes. As the declaration ID may be the most IDs in the PCM file, this can have the biggest impact on the size. In my experiments, this change will bring 6.6% increase of the on-disk PCM size. No compile-time performance regression observed. Given the benefits in the motivation example, I think the cost is worthwhile.

>From ea53cb5687dd5f3597457fb4d2d62c52c2cb2771 Mon Sep 17 00:00:00 2001
From: Chuanqi Xu <yedeng.yd at linux.alibaba.com>
Date: Fri, 10 May 2024 15:36:31 +0800
Subject: [PATCH] [serialization] no transitive decl change

---
 clang/include/clang/AST/DeclBase.h            |  17 +-
 clang/include/clang/AST/DeclID.h              |  23 ++-
 .../include/clang/Serialization/ASTBitCodes.h |   6 +
 clang/include/clang/Serialization/ASTReader.h |  36 ++--
 .../include/clang/Serialization/ModuleFile.h  |  18 +-
 .../clang/Serialization/ModuleManager.h       |   2 +-
 clang/lib/AST/DeclBase.cpp                    |  34 +++-
 clang/lib/Serialization/ASTReader.cpp         | 159 ++++++++++--------
 clang/lib/Serialization/ASTReaderDecl.cpp     |  12 +-
 clang/lib/Serialization/ASTWriter.cpp         |   7 +-
 clang/lib/Serialization/ModuleFile.cpp        |   3 +-
 .../Modules/no-transitive-decls-change.cppm   | 112 ++++++++++++
 12 files changed, 282 insertions(+), 147 deletions(-)
 create mode 100644 clang/test/Modules/no-transitive-decls-change.cppm

diff --git a/clang/include/clang/AST/DeclBase.h b/clang/include/clang/AST/DeclBase.h
index e43e812cd9455..4bdf27aa99405 100644
--- a/clang/include/clang/AST/DeclBase.h
+++ b/clang/include/clang/AST/DeclBase.h
@@ -701,10 +701,7 @@ class alignas(8) Decl {
 
   /// Set the owning module ID.  This may only be called for
   /// deserialized Decls.
-  void setOwningModuleID(unsigned ID) {
-    assert(isFromASTFile() && "Only works on a deserialized declaration");
-    *((unsigned*)this - 2) = ID;
-  }
+  void setOwningModuleID(unsigned ID);
 
 public:
   /// Determine the availability of the given declaration.
@@ -777,19 +774,11 @@ class alignas(8) Decl {
 
   /// Retrieve the global declaration ID associated with this
   /// declaration, which specifies where this Decl was loaded from.
-  GlobalDeclID getGlobalID() const {
-    if (isFromASTFile())
-      return (*((const GlobalDeclID *)this - 1));
-    return GlobalDeclID();
-  }
+  GlobalDeclID getGlobalID() const;
 
   /// Retrieve the global ID of the module that owns this particular
   /// declaration.
-  unsigned getOwningModuleID() const {
-    if (isFromASTFile())
-      return *((const unsigned*)this - 2);
-    return 0;
-  }
+  unsigned getOwningModuleID() const;
 
 private:
   Module *getOwningModuleSlow() const;
diff --git a/clang/include/clang/AST/DeclID.h b/clang/include/clang/AST/DeclID.h
index 614ba06b63860..a6e4b31f3a6fb 100644
--- a/clang/include/clang/AST/DeclID.h
+++ b/clang/include/clang/AST/DeclID.h
@@ -19,6 +19,8 @@
 #include "llvm/ADT/DenseMapInfo.h"
 #include "llvm/ADT/iterator.h"
 
+#include <climits>
+
 namespace clang {
 
 /// Predefined declaration IDs.
@@ -107,12 +109,16 @@ class DeclIDBase {
   ///
   /// DeclID should only be used directly in serialization. All other users
   /// should use LocalDeclID or GlobalDeclID.
-  using DeclID = uint32_t;
+  using DeclID = uint64_t;
 
 protected:
   DeclIDBase() : ID(PREDEF_DECL_NULL_ID) {}
   explicit DeclIDBase(DeclID ID) : ID(ID) {}
 
+  explicit DeclIDBase(unsigned LocalID, unsigned ModuleFileIndex) {
+    ID = (DeclID)LocalID | ((DeclID)ModuleFileIndex << 32);
+  }
+
 public:
   DeclID get() const { return ID; }
 
@@ -124,6 +130,15 @@ class DeclIDBase {
 
   bool isInvalid() const { return ID == PREDEF_DECL_NULL_ID; }
 
+  unsigned getModuleFileIndex() const { return ID >> 32; }
+
+  unsigned getLocalDeclIndex() const {
+    // Implement it directly instead of calling `llvm::maskTrailingOnes` since
+    // we don't want `MathExtras.h` to be inclued here.
+    const unsigned Bits = CHAR_BIT * sizeof(DeclID);
+    return ID & (DeclID(-1) >> (Bits - 32));
+  }
+
   friend bool operator==(const DeclIDBase &LHS, const DeclIDBase &RHS) {
     return LHS.ID == RHS.ID;
   }
@@ -156,6 +171,9 @@ class LocalDeclID : public DeclIDBase {
   LocalDeclID(PredefinedDeclIDs ID) : Base(ID) {}
   explicit LocalDeclID(DeclID ID) : Base(ID) {}
 
+  explicit LocalDeclID(unsigned LocalID, unsigned ModuleFileIndex)
+      : Base(LocalID, ModuleFileIndex) {}
+
   LocalDeclID &operator++() {
     ++ID;
     return *this;
@@ -175,6 +193,9 @@ class GlobalDeclID : public DeclIDBase {
   GlobalDeclID() : Base() {}
   explicit GlobalDeclID(DeclID ID) : Base(ID) {}
 
+  explicit GlobalDeclID(unsigned LocalID, unsigned ModuleFileIndex)
+      : Base(LocalID, ModuleFileIndex) {}
+
   // For DeclIDIterator<GlobalDeclID> to be able to convert a GlobalDeclID
   // to a LocalDeclID.
   explicit operator LocalDeclID() const { return LocalDeclID(this->ID); }
diff --git a/clang/include/clang/Serialization/ASTBitCodes.h b/clang/include/clang/Serialization/ASTBitCodes.h
index ae9521e427099..cbc3e2cae71be 100644
--- a/clang/include/clang/Serialization/ASTBitCodes.h
+++ b/clang/include/clang/Serialization/ASTBitCodes.h
@@ -258,6 +258,12 @@ class DeclOffset {
   }
 };
 
+// The unaligned decl ID used in the Blobs of bistreams.
+using unalighed_decl_id_t =
+    llvm::support::detail::packed_endian_specific_integral<
+        serialization::DeclID, llvm::endianness::native,
+        llvm::support::unaligned>;
+
 /// The number of predefined preprocessed entity IDs.
 const unsigned int NUM_PREDEF_PP_ENTITY_IDS = 1;
 
diff --git a/clang/include/clang/Serialization/ASTReader.h b/clang/include/clang/Serialization/ASTReader.h
index e24fa121528f3..1f05c1b3dfe24 100644
--- a/clang/include/clang/Serialization/ASTReader.h
+++ b/clang/include/clang/Serialization/ASTReader.h
@@ -501,12 +501,6 @@ class ASTReader
   /// = I + 1 has already been loaded.
   llvm::PagedVector<Decl *> DeclsLoaded;
 
-  using GlobalDeclMapType = ContinuousRangeMap<GlobalDeclID, ModuleFile *, 4>;
-
-  /// Mapping from global declaration IDs to the module in which the
-  /// declaration resides.
-  GlobalDeclMapType GlobalDeclMap;
-
   using FileOffset = std::pair<ModuleFile *, uint64_t>;
   using FileOffsetsTy = SmallVector<FileOffset, 2>;
   using DeclUpdateOffsetsMap = llvm::DenseMap<GlobalDeclID, FileOffsetsTy>;
@@ -589,10 +583,11 @@ class ASTReader
 
   struct FileDeclsInfo {
     ModuleFile *Mod = nullptr;
-    ArrayRef<LocalDeclID> Decls;
+    ArrayRef<serialization::unalighed_decl_id_t> Decls;
 
     FileDeclsInfo() = default;
-    FileDeclsInfo(ModuleFile *Mod, ArrayRef<LocalDeclID> Decls)
+    FileDeclsInfo(ModuleFile *Mod,
+                  ArrayRef<serialization::unalighed_decl_id_t> Decls)
         : Mod(Mod), Decls(Decls) {}
   };
 
@@ -601,11 +596,7 @@ class ASTReader
 
   /// An array of lexical contents of a declaration context, as a sequence of
   /// Decl::Kind, DeclID pairs.
-  using unalighed_decl_id_t =
-      llvm::support::detail::packed_endian_specific_integral<
-          serialization::DeclID, llvm::endianness::native,
-          llvm::support::unaligned>;
-  using LexicalContents = ArrayRef<unalighed_decl_id_t>;
+  using LexicalContents = ArrayRef<serialization::unalighed_decl_id_t>;
 
   /// Map from a DeclContext to its lexical contents.
   llvm::DenseMap<const DeclContext*, std::pair<ModuleFile*, LexicalContents>>
@@ -1486,10 +1477,11 @@ class ASTReader
                                unsigned ClientLoadCapabilities);
 
 public:
-  class ModuleDeclIterator : public llvm::iterator_adaptor_base<
-                                 ModuleDeclIterator, const LocalDeclID *,
-                                 std::random_access_iterator_tag, const Decl *,
-                                 ptrdiff_t, const Decl *, const Decl *> {
+  class ModuleDeclIterator
+      : public llvm::iterator_adaptor_base<
+            ModuleDeclIterator, const serialization::unalighed_decl_id_t *,
+            std::random_access_iterator_tag, const Decl *, ptrdiff_t,
+            const Decl *, const Decl *> {
     ASTReader *Reader = nullptr;
     ModuleFile *Mod = nullptr;
 
@@ -1497,11 +1489,11 @@ class ASTReader
     ModuleDeclIterator() : iterator_adaptor_base(nullptr) {}
 
     ModuleDeclIterator(ASTReader *Reader, ModuleFile *Mod,
-                       const LocalDeclID *Pos)
+                       const serialization::unalighed_decl_id_t *Pos)
         : iterator_adaptor_base(Pos), Reader(Reader), Mod(Mod) {}
 
     value_type operator*() const {
-      return Reader->GetDecl(Reader->getGlobalDeclID(*Mod, *I));
+      return Reader->GetDecl(Reader->getGlobalDeclID(*Mod, (LocalDeclID)*I));
     }
 
     value_type operator->() const { return **this; }
@@ -1541,6 +1533,9 @@ class ASTReader
              StringRef Arg2 = StringRef(), StringRef Arg3 = StringRef()) const;
   void Error(llvm::Error &&Err) const;
 
+  /// Translate a \param GlobalDeclID to the index of DeclsLoaded array.
+  unsigned translateGlobalDeclIDToIndex(GlobalDeclID ID) const;
+
 public:
   /// Load the AST file and validate its contents against the given
   /// Preprocessor.
@@ -1912,7 +1907,8 @@ class ASTReader
 
   /// Retrieve the module file that owns the given declaration, or NULL
   /// if the declaration is not from a module file.
-  ModuleFile *getOwningModuleFile(const Decl *D);
+  ModuleFile *getOwningModuleFile(const Decl *D) const;
+  ModuleFile *getOwningModuleFile(GlobalDeclID ID) const;
 
   /// Returns the source location for the decl \p ID.
   SourceLocation getSourceLocationForDeclID(GlobalDeclID ID);
diff --git a/clang/include/clang/Serialization/ModuleFile.h b/clang/include/clang/Serialization/ModuleFile.h
index 8e0fce1fd48ce..414db34522307 100644
--- a/clang/include/clang/Serialization/ModuleFile.h
+++ b/clang/include/clang/Serialization/ModuleFile.h
@@ -454,23 +454,11 @@ class ModuleFile {
   /// by the declaration ID (-1).
   const DeclOffset *DeclOffsets = nullptr;
 
-  /// Base declaration ID for declarations local to this module.
-  serialization::DeclID BaseDeclID = 0;
-
-  /// Remapping table for declaration IDs in this module.
-  ContinuousRangeMap<serialization::DeclID, int, 2> DeclRemap;
-
-  /// Mapping from the module files that this module file depends on
-  /// to the base declaration ID for that module as it is understood within this
-  /// module.
-  ///
-  /// This is effectively a reverse global-to-local mapping for declaration
-  /// IDs, so that we can interpret a true global ID (for this translation unit)
-  /// as a local ID (for this module file).
-  llvm::DenseMap<ModuleFile *, serialization::DeclID> GlobalToLocalDeclIDs;
+  /// Base declaration index in ASTReader for declarations local to this module.
+  unsigned BaseDeclIndex = 0;
 
   /// Array of file-level DeclIDs sorted by file.
-  const LocalDeclID *FileSortedDecls = nullptr;
+  const serialization::unalighed_decl_id_t *FileSortedDecls = nullptr;
   unsigned NumFileSortedDecls = 0;
 
   /// Array of category list location information within this
diff --git a/clang/include/clang/Serialization/ModuleManager.h b/clang/include/clang/Serialization/ModuleManager.h
index 3bd379acf7eda..097a1c16a0573 100644
--- a/clang/include/clang/Serialization/ModuleManager.h
+++ b/clang/include/clang/Serialization/ModuleManager.h
@@ -46,7 +46,7 @@ namespace serialization {
 /// Manages the set of modules loaded by an AST reader.
 class ModuleManager {
   /// The chain of AST files, in the order in which we started to load
-  /// them (this order isn't really useful for anything).
+  /// them.
   SmallVector<std::unique_ptr<ModuleFile>, 2> Chain;
 
   /// The chain of non-module PCH files. The first entry is the one named
diff --git a/clang/lib/AST/DeclBase.cpp b/clang/lib/AST/DeclBase.cpp
index 03e1055251c24..225a0015f26ae 100644
--- a/clang/lib/AST/DeclBase.cpp
+++ b/clang/lib/AST/DeclBase.cpp
@@ -74,18 +74,16 @@ void *Decl::operator new(std::size_t Size, const ASTContext &Context,
                          GlobalDeclID ID, std::size_t Extra) {
   // Allocate an extra 8 bytes worth of storage, which ensures that the
   // resulting pointer will still be 8-byte aligned.
-  static_assert(sizeof(unsigned) * 2 >= alignof(Decl),
-                "Decl won't be misaligned");
+  static_assert(sizeof(uint64_t) >= alignof(Decl), "Decl won't be misaligned");
   void *Start = Context.Allocate(Size + Extra + 8);
   void *Result = (char*)Start + 8;
 
-  unsigned *PrefixPtr = (unsigned *)Result - 2;
+  uint64_t *PrefixPtr = (uint64_t *)Result - 1;
 
-  // Zero out the first 4 bytes; this is used to store the owning module ID.
-  PrefixPtr[0] = 0;
+  *PrefixPtr = ID.get();
 
-  // Store the global declaration ID in the second 4 bytes.
-  PrefixPtr[1] = ID.get();
+  // We leave the upper 16 bits to store the module IDs.
+  assert(*PrefixPtr < llvm::maskTrailingOnes<uint64_t>(48));
 
   return Result;
 }
@@ -111,6 +109,28 @@ void *Decl::operator new(std::size_t Size, const ASTContext &Ctx,
   return ::operator new(Size + Extra, Ctx);
 }
 
+GlobalDeclID Decl::getGlobalID() const {
+  if (!isFromASTFile())
+    return GlobalDeclID();
+  uint64_t ID = *((const uint64_t *)this - 1);
+  return GlobalDeclID(ID & llvm::maskTrailingOnes<uint64_t>(48));
+}
+
+unsigned Decl::getOwningModuleID() const {
+  if (!isFromASTFile())
+    return 0;
+
+  uint64_t ID = *((const uint64_t *)this - 1);
+  return ID >> 48;
+}
+
+void Decl::setOwningModuleID(unsigned ID) {
+  assert(isFromASTFile() && "Only works on a deserialized declaration");
+  uint64_t *IDAddress = (uint64_t *)this - 1;
+  assert(!((*IDAddress) >> 48) && "We should only set the module ID once");
+  *IDAddress |= (uint64_t)ID << 48;
+}
+
 Module *Decl::getOwningModuleSlow() const {
   assert(isFromASTFile() && "Not from AST file?");
   return getASTContext().getExternalSource()->getModule(getOwningModuleID());
diff --git a/clang/lib/Serialization/ASTReader.cpp b/clang/lib/Serialization/ASTReader.cpp
index 78e4df440641e..b093a32ae5596 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -1656,7 +1656,7 @@ bool ASTReader::ReadSLocEntry(int ID) {
 
     unsigned NumFileDecls = Record[7];
     if (NumFileDecls && ContextObj) {
-      const LocalDeclID *FirstDecl = F->FileSortedDecls + Record[6];
+      const unalighed_decl_id_t *FirstDecl = F->FileSortedDecls + Record[6];
       assert(F->FileSortedDecls && "FILE_SORTED_DECLS not encountered yet ?");
       FileDeclIDs[FID] =
           FileDeclsInfo(F, llvm::ArrayRef(FirstDecl, NumFileDecls));
@@ -3376,26 +3376,11 @@ llvm::Error ASTReader::ReadASTBlock(ModuleFile &F,
             "duplicate DECL_OFFSET record in AST file");
       F.DeclOffsets = (const DeclOffset *)Blob.data();
       F.LocalNumDecls = Record[0];
-      unsigned LocalBaseDeclID = Record[1];
-      F.BaseDeclID = getTotalNumDecls();
-
-      if (F.LocalNumDecls > 0) {
-        // Introduce the global -> local mapping for declarations within this
-        // module.
-        GlobalDeclMap.insert(std::make_pair(
-            GlobalDeclID(getTotalNumDecls() + NUM_PREDEF_DECL_IDS), &F));
-
-        // Introduce the local -> global mapping for declarations within this
-        // module.
-        F.DeclRemap.insertOrReplace(
-          std::make_pair(LocalBaseDeclID, F.BaseDeclID - LocalBaseDeclID));
-
-        // Introduce the global -> local mapping for declarations within this
-        // module.
-        F.GlobalToLocalDeclIDs[&F] = LocalBaseDeclID;
+      F.BaseDeclIndex = getTotalNumDecls();
 
+      if (F.LocalNumDecls > 0)
         DeclsLoaded.resize(DeclsLoaded.size() + F.LocalNumDecls);
-      }
+
       break;
     }
 
@@ -3630,7 +3615,7 @@ llvm::Error ASTReader::ReadASTBlock(ModuleFile &F,
       break;
 
     case FILE_SORTED_DECLS:
-      F.FileSortedDecls = (const LocalDeclID *)Blob.data();
+      F.FileSortedDecls = (const unalighed_decl_id_t *)Blob.data();
       F.NumFileSortedDecls = Record[0];
       break;
 
@@ -4057,7 +4042,6 @@ void ASTReader::ReadModuleOffsetMap(ModuleFile &F) const {
   RemapBuilder PreprocessedEntityRemap(F.PreprocessedEntityRemap);
   RemapBuilder SubmoduleRemap(F.SubmoduleRemap);
   RemapBuilder SelectorRemap(F.SelectorRemap);
-  RemapBuilder DeclRemap(F.DeclRemap);
   RemapBuilder TypeRemap(F.TypeRemap);
 
   auto &ImportedModuleVector = F.DependentModules;
@@ -4096,8 +4080,6 @@ void ASTReader::ReadModuleOffsetMap(ModuleFile &F) const {
         endian::readNext<uint32_t, llvm::endianness::little>(Data);
     uint32_t SelectorIDOffset =
         endian::readNext<uint32_t, llvm::endianness::little>(Data);
-    uint32_t DeclIDOffset =
-        endian::readNext<uint32_t, llvm::endianness::little>(Data);
     uint32_t TypeIndexOffset =
         endian::readNext<uint32_t, llvm::endianness::little>(Data);
 
@@ -4115,11 +4097,7 @@ void ASTReader::ReadModuleOffsetMap(ModuleFile &F) const {
               PreprocessedEntityRemap);
     mapOffset(SubmoduleIDOffset, OM->BaseSubmoduleID, SubmoduleRemap);
     mapOffset(SelectorIDOffset, OM->BaseSelectorID, SelectorRemap);
-    mapOffset(DeclIDOffset, OM->BaseDeclID, DeclRemap);
     mapOffset(TypeIndexOffset, OM->BaseTypeIndex, TypeRemap);
-
-    // Global -> local mappings.
-    F.GlobalToLocalDeclIDs[OM] = DeclIDOffset;
   }
 }
 
@@ -7643,18 +7621,25 @@ CXXBaseSpecifier *ASTReader::GetExternalCXXBaseSpecifiers(uint64_t Offset) {
 
 GlobalDeclID ASTReader::getGlobalDeclID(ModuleFile &F,
                                         LocalDeclID LocalID) const {
-  DeclID ID = LocalID.get();
-  if (ID < NUM_PREDEF_DECL_IDS)
-    return GlobalDeclID(ID);
+  if (LocalID.get() < NUM_PREDEF_DECL_IDS)
+    return GlobalDeclID(LocalID.get());
+
+  unsigned OwningModuleFileIndex = LocalID.getModuleFileIndex();
+  DeclID ID = LocalID.getLocalDeclIndex();
 
   if (!F.ModuleOffsetMap.empty())
     ReadModuleOffsetMap(F);
 
-  ContinuousRangeMap<DeclID, int, 2>::iterator I =
-      F.DeclRemap.find(ID - NUM_PREDEF_DECL_IDS);
-  assert(I != F.DeclRemap.end() && "Invalid index into decl index remap");
+  ModuleFile *OwningModuleFile =
+      OwningModuleFileIndex == 0
+          ? &F
+          : F.DependentModules[OwningModuleFileIndex - 1];
+
+  if (OwningModuleFileIndex == 0)
+    ID -= NUM_PREDEF_DECL_IDS;
 
-  return GlobalDeclID(ID + I->second);
+  uint64_t NewModuleFileIndex = OwningModuleFile->Index + 1;
+  return GlobalDeclID(ID, NewModuleFileIndex);
 }
 
 bool ASTReader::isDeclIDFromModule(GlobalDeclID ID, ModuleFile &M) const {
@@ -7662,31 +7647,33 @@ bool ASTReader::isDeclIDFromModule(GlobalDeclID ID, ModuleFile &M) const {
   if (ID.get() < NUM_PREDEF_DECL_IDS)
     return false;
 
-  return ID.get() - NUM_PREDEF_DECL_IDS >= M.BaseDeclID &&
-         ID.get() - NUM_PREDEF_DECL_IDS < M.BaseDeclID + M.LocalNumDecls;
+  unsigned ModuleFileIndex = ID.getModuleFileIndex();
+  return M.Index == ModuleFileIndex - 1;
+}
+
+ModuleFile *ASTReader::getOwningModuleFile(GlobalDeclID ID) const {
+  // Predefined decls aren't from any module.
+  if (ID.get() < NUM_PREDEF_DECL_IDS)
+    return nullptr;
+
+  uint64_t ModuleFileIndex = ID.getModuleFileIndex();
+  assert(ModuleFileIndex && "Untranslated Local Decl?");
+
+  return &getModuleManager()[ModuleFileIndex - 1];
 }
 
-ModuleFile *ASTReader::getOwningModuleFile(const Decl *D) {
+ModuleFile *ASTReader::getOwningModuleFile(const Decl *D) const {
   if (!D->isFromASTFile())
     return nullptr;
-  GlobalDeclMapType::const_iterator I =
-      GlobalDeclMap.find(GlobalDeclID(D->getGlobalID()));
-  assert(I != GlobalDeclMap.end() && "Corrupted global declaration map");
-  return I->second;
+
+  return getOwningModuleFile(GlobalDeclID(D->getGlobalID()));
 }
 
 SourceLocation ASTReader::getSourceLocationForDeclID(GlobalDeclID ID) {
   if (ID.get() < NUM_PREDEF_DECL_IDS)
     return SourceLocation();
 
-  unsigned Index = ID.get() - NUM_PREDEF_DECL_IDS;
-
-  if (Index > DeclsLoaded.size()) {
-    Error("declaration ID out-of-range for AST file");
-    return SourceLocation();
-  }
-
-  if (Decl *D = DeclsLoaded[Index])
+  if (Decl *D = GetExistingDecl(ID))
     return D->getLocation();
 
   SourceLocation Loc;
@@ -7753,8 +7740,19 @@ static Decl *getPredefinedDecl(ASTContext &Context, PredefinedDeclIDs ID) {
   llvm_unreachable("PredefinedDeclIDs unknown enum value");
 }
 
+unsigned ASTReader::translateGlobalDeclIDToIndex(GlobalDeclID GlobalID) const {
+  ModuleFile *OwningModuleFile = getOwningModuleFile(GlobalID);
+  if (!OwningModuleFile) {
+    assert(GlobalID.get() < NUM_PREDEF_DECL_IDS && "Untransalted Global ID?");
+    return GlobalID.get();
+  }
+
+  return OwningModuleFile->BaseDeclIndex + GlobalID.getLocalDeclIndex();
+}
+
 Decl *ASTReader::GetExistingDecl(GlobalDeclID ID) {
   assert(ContextObj && "reading decl with no AST context");
+
   if (ID.get() < NUM_PREDEF_DECL_IDS) {
     Decl *D = getPredefinedDecl(*ContextObj, (PredefinedDeclIDs)ID);
     if (D) {
@@ -7767,7 +7765,7 @@ Decl *ASTReader::GetExistingDecl(GlobalDeclID ID) {
     return D;
   }
 
-  unsigned Index = ID.get() - NUM_PREDEF_DECL_IDS;
+  unsigned Index = translateGlobalDeclIDToIndex(ID);
 
   if (Index >= DeclsLoaded.size()) {
     assert(0 && "declaration ID out-of-range for AST file");
@@ -7782,7 +7780,7 @@ Decl *ASTReader::GetDecl(GlobalDeclID ID) {
   if (ID.get() < NUM_PREDEF_DECL_IDS)
     return GetExistingDecl(ID);
 
-  unsigned Index = ID.get() - NUM_PREDEF_DECL_IDS;
+  unsigned Index = translateGlobalDeclIDToIndex(ID);
 
   if (Index >= DeclsLoaded.size()) {
     assert(0 && "declaration ID out-of-range for AST file");
@@ -7801,20 +7799,31 @@ Decl *ASTReader::GetDecl(GlobalDeclID ID) {
 
 LocalDeclID ASTReader::mapGlobalIDToModuleFileGlobalID(ModuleFile &M,
                                                        GlobalDeclID GlobalID) {
-  DeclID ID = GlobalID.get();
-  if (ID < NUM_PREDEF_DECL_IDS)
+  if (GlobalID.get() < NUM_PREDEF_DECL_IDS)
+    return LocalDeclID(GlobalID.get());
+
+  if (!M.ModuleOffsetMap.empty())
+    ReadModuleOffsetMap(M);
+
+  ModuleFile *Owner = getOwningModuleFile(GlobalID);
+  DeclID ID = GlobalID.getLocalDeclIndex();
+
+  if (Owner == &M) {
+    ID += NUM_PREDEF_DECL_IDS;
     return LocalDeclID(ID);
+  }
 
-  GlobalDeclMapType::const_iterator I = GlobalDeclMap.find(GlobalID);
-  assert(I != GlobalDeclMap.end() && "Corrupted global declaration map");
-  ModuleFile *Owner = I->second;
+  uint64_t OrignalModuleFileIndex = 0;
+  for (unsigned I = 0; I < M.DependentModules.size(); I++)
+    if (M.DependentModules[I] == Owner) {
+      OrignalModuleFileIndex = I + 1;
+      break;
+    }
 
-  llvm::DenseMap<ModuleFile *, DeclID>::iterator Pos =
-      M.GlobalToLocalDeclIDs.find(Owner);
-  if (Pos == M.GlobalToLocalDeclIDs.end())
+  if (!OrignalModuleFileIndex)
     return LocalDeclID();
 
-  return LocalDeclID(ID - Owner->BaseDeclID + Pos->second);
+  return LocalDeclID(ID, OrignalModuleFileIndex);
 }
 
 GlobalDeclID ASTReader::ReadDeclID(ModuleFile &F, const RecordData &Record,
@@ -7893,32 +7902,34 @@ void ASTReader::FindExternalLexicalDecls(
 
 namespace {
 
-class DeclIDComp {
+class UnalignedDeclIDComp {
   ASTReader &Reader;
   ModuleFile &Mod;
 
 public:
-  DeclIDComp(ASTReader &Reader, ModuleFile &M) : Reader(Reader), Mod(M) {}
+  UnalignedDeclIDComp(ASTReader &Reader, ModuleFile &M)
+      : Reader(Reader), Mod(M) {}
 
-  bool operator()(LocalDeclID L, LocalDeclID R) const {
+  bool operator()(unalighed_decl_id_t L, unalighed_decl_id_t R) const {
     SourceLocation LHS = getLocation(L);
     SourceLocation RHS = getLocation(R);
     return Reader.getSourceManager().isBeforeInTranslationUnit(LHS, RHS);
   }
 
-  bool operator()(SourceLocation LHS, LocalDeclID R) const {
+  bool operator()(SourceLocation LHS, unalighed_decl_id_t R) const {
     SourceLocation RHS = getLocation(R);
     return Reader.getSourceManager().isBeforeInTranslationUnit(LHS, RHS);
   }
 
-  bool operator()(LocalDeclID L, SourceLocation RHS) const {
+  bool operator()(unalighed_decl_id_t L, SourceLocation RHS) const {
     SourceLocation LHS = getLocation(L);
     return Reader.getSourceManager().isBeforeInTranslationUnit(LHS, RHS);
   }
 
-  SourceLocation getLocation(LocalDeclID ID) const {
+  SourceLocation getLocation(unalighed_decl_id_t ID) const {
     return Reader.getSourceManager().getFileLoc(
-            Reader.getSourceLocationForDeclID(Reader.getGlobalDeclID(Mod, ID)));
+        Reader.getSourceLocationForDeclID(
+            Reader.getGlobalDeclID(Mod, (LocalDeclID)ID)));
   }
 };
 
@@ -7941,8 +7952,8 @@ void ASTReader::FindFileRegionDecls(FileID File,
     BeginLoc = SM.getLocForStartOfFile(File).getLocWithOffset(Offset);
   SourceLocation EndLoc = BeginLoc.getLocWithOffset(Length);
 
-  DeclIDComp DIDComp(*this, *DInfo.Mod);
-  ArrayRef<LocalDeclID>::iterator BeginIt =
+  UnalignedDeclIDComp DIDComp(*this, *DInfo.Mod);
+  ArrayRef<unalighed_decl_id_t>::iterator BeginIt =
       llvm::lower_bound(DInfo.Decls, BeginLoc, DIDComp);
   if (BeginIt != DInfo.Decls.begin())
     --BeginIt;
@@ -7951,17 +7962,18 @@ void ASTReader::FindFileRegionDecls(FileID File,
   // to backtrack until we find it otherwise we will fail to report that the
   // region overlaps with an objc container.
   while (BeginIt != DInfo.Decls.begin() &&
-         GetDecl(getGlobalDeclID(*DInfo.Mod, *BeginIt))
+         GetDecl(getGlobalDeclID(*DInfo.Mod, (LocalDeclID)(*BeginIt)))
              ->isTopLevelDeclInObjCContainer())
     --BeginIt;
 
-  ArrayRef<LocalDeclID>::iterator EndIt =
+  ArrayRef<unalighed_decl_id_t>::iterator EndIt =
       llvm::upper_bound(DInfo.Decls, EndLoc, DIDComp);
   if (EndIt != DInfo.Decls.end())
     ++EndIt;
 
-  for (ArrayRef<LocalDeclID>::iterator DIt = BeginIt; DIt != EndIt; ++DIt)
-    Decls.push_back(GetDecl(getGlobalDeclID(*DInfo.Mod, *DIt)));
+  for (ArrayRef<unalighed_decl_id_t>::iterator DIt = BeginIt; DIt != EndIt;
+       ++DIt)
+    Decls.push_back(GetDecl(getGlobalDeclID(*DInfo.Mod, (LocalDeclID)(*DIt))));
 }
 
 bool
@@ -8168,7 +8180,6 @@ LLVM_DUMP_METHOD void ASTReader::dump() {
   dumpModuleIDMap("Global bit offset map", GlobalBitOffsetsMap);
   dumpModuleIDMap("Global source location entry map", GlobalSLocEntryMap);
   dumpModuleIDMap("Global type map", GlobalTypeMap);
-  dumpModuleIDMap("Global declaration map", GlobalDeclMap);
   dumpModuleIDMap("Global identifier map", GlobalIdentifierMap);
   dumpModuleIDMap("Global macro map", GlobalMacroMap);
   dumpModuleIDMap("Global submodule map", GlobalSubmoduleMap);
diff --git a/clang/lib/Serialization/ASTReaderDecl.cpp b/clang/lib/Serialization/ASTReaderDecl.cpp
index 089ede4f49260..7682345f5a804 100644
--- a/clang/lib/Serialization/ASTReaderDecl.cpp
+++ b/clang/lib/Serialization/ASTReaderDecl.cpp
@@ -3242,11 +3242,10 @@ bool ASTReader::isConsumerInterestedIn(Decl *D) {
 /// Get the correct cursor and offset for loading a declaration.
 ASTReader::RecordLocation ASTReader::DeclCursorForID(GlobalDeclID ID,
                                                      SourceLocation &Loc) {
-  GlobalDeclMapType::iterator I = GlobalDeclMap.find(ID);
-  assert(I != GlobalDeclMap.end() && "Corrupted global declaration map");
-  ModuleFile *M = I->second;
-  const DeclOffset &DOffs =
-      M->DeclOffsets[ID.get() - M->BaseDeclID - NUM_PREDEF_DECL_IDS];
+  ModuleFile *M = getOwningModuleFile(ID);
+  assert(M);
+  unsigned LocalDeclIndex = ID.getLocalDeclIndex();
+  const DeclOffset &DOffs = M->DeclOffsets[LocalDeclIndex];
   Loc = ReadSourceLocation(*M, DOffs.getRawLoc());
   return RecordLocation(M, DOffs.getBitOffset(M->DeclsBlockStartOffset));
 }
@@ -3789,7 +3788,6 @@ void ASTReader::markIncompleteDeclChain(Decl *D) {
 
 /// Read the declaration at the given offset from the AST file.
 Decl *ASTReader::ReadDeclRecord(GlobalDeclID ID) {
-  unsigned Index = ID.get() - NUM_PREDEF_DECL_IDS;
   SourceLocation DeclLoc;
   RecordLocation Loc = DeclCursorForID(ID, DeclLoc);
   llvm::BitstreamCursor &DeclsCursor = Loc.F->DeclsCursor;
@@ -4120,7 +4118,7 @@ Decl *ASTReader::ReadDeclRecord(GlobalDeclID ID) {
   }
 
   assert(D && "Unknown declaration reading AST file");
-  LoadedDecl(Index, D);
+  LoadedDecl(translateGlobalDeclIDToIndex(ID), D);
   // Set the DeclContext before doing any deserialization, to make sure internal
   // calls to Decl::getASTContext() by Decl's methods will find the
   // TranslationUnitDecl without crashing.
diff --git a/clang/lib/Serialization/ASTWriter.cpp b/clang/lib/Serialization/ASTWriter.cpp
index e7b9050165bb7..9f5ff2c781c2d 100644
--- a/clang/lib/Serialization/ASTWriter.cpp
+++ b/clang/lib/Serialization/ASTWriter.cpp
@@ -3357,12 +3357,10 @@ void ASTWriter::WriteTypeDeclOffsets() {
   Abbrev = std::make_shared<BitCodeAbbrev>();
   Abbrev->Add(BitCodeAbbrevOp(DECL_OFFSET));
   Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 32)); // # of declarations
-  Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 32)); // base decl ID
   Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Blob)); // declarations block
   unsigned DeclOffsetAbbrev = Stream.EmitAbbrev(std::move(Abbrev));
   {
-    RecordData::value_type Record[] = {DECL_OFFSET, DeclOffsets.size(),
-                                       FirstDeclID.get() - NUM_PREDEF_DECL_IDS};
+    RecordData::value_type Record[] = {DECL_OFFSET, DeclOffsets.size()};
     Stream.EmitRecordWithBlob(DeclOffsetAbbrev, Record, bytes(DeclOffsets));
   }
 }
@@ -5415,7 +5413,6 @@ ASTFileSignature ASTWriter::WriteASTCore(Sema &SemaRef, StringRef isysroot,
                           M.NumPreprocessedEntities);
         writeBaseIDOrNone(M.BaseSubmoduleID, M.LocalNumSubmodules);
         writeBaseIDOrNone(M.BaseSelectorID, M.LocalNumSelectors);
-        writeBaseIDOrNone(M.BaseDeclID, M.LocalNumDecls);
         writeBaseIDOrNone(M.BaseTypeIndex, M.LocalNumTypes);
       }
     }
@@ -6596,13 +6593,11 @@ void ASTWriter::ReaderInitialized(ASTReader *Reader) {
 
   // Note, this will get called multiple times, once one the reader starts up
   // and again each time it's done reading a PCH or module.
-  FirstDeclID = LocalDeclID(NUM_PREDEF_DECL_IDS + Chain->getTotalNumDecls());
   FirstTypeID = NUM_PREDEF_TYPE_IDS + Chain->getTotalNumTypes();
   FirstIdentID = NUM_PREDEF_IDENT_IDS + Chain->getTotalNumIdentifiers();
   FirstMacroID = NUM_PREDEF_MACRO_IDS + Chain->getTotalNumMacros();
   FirstSubmoduleID = NUM_PREDEF_SUBMODULE_IDS + Chain->getTotalNumSubmodules();
   FirstSelectorID = NUM_PREDEF_SELECTOR_IDS + Chain->getTotalNumSelectors();
-  NextDeclID = FirstDeclID;
   NextTypeID = FirstTypeID;
   NextIdentID = FirstIdentID;
   NextMacroID = FirstMacroID;
diff --git a/clang/lib/Serialization/ModuleFile.cpp b/clang/lib/Serialization/ModuleFile.cpp
index 2c42d33a8f5dd..f64a59bd94891 100644
--- a/clang/lib/Serialization/ModuleFile.cpp
+++ b/clang/lib/Serialization/ModuleFile.cpp
@@ -87,7 +87,6 @@ LLVM_DUMP_METHOD void ModuleFile::dump() {
                << "  Number of types: " << LocalNumTypes << '\n';
   dumpLocalRemap("Type index local -> global map", TypeRemap);
 
-  llvm::errs() << "  Base decl ID: " << BaseDeclID << '\n'
+  llvm::errs() << "  Base decl index: " << BaseDeclIndex << '\n'
                << "  Number of decls: " << LocalNumDecls << '\n';
-  dumpLocalRemap("Decl ID local -> global map", DeclRemap);
 }
diff --git a/clang/test/Modules/no-transitive-decls-change.cppm b/clang/test/Modules/no-transitive-decls-change.cppm
new file mode 100644
index 0000000000000..42ac061bc90b3
--- /dev/null
+++ b/clang/test/Modules/no-transitive-decls-change.cppm
@@ -0,0 +1,112 @@
+// Testing that changing a declaration in an unused module file won't change 
+// the BMI of the current module file.
+//
+// RUN: rm -rf %t
+// RUN: split-file %s %t
+//
+// RUN: %clang_cc1 -std=c++20 %t/m-partA.cppm -emit-reduced-module-interface -o %t/m-partA.pcm
+// RUN: %clang_cc1 -std=c++20 %t/m-partA.v1.cppm -emit-reduced-module-interface -o \
+// RUN:     %t/m-partA.v1.pcm
+// RUN: %clang_cc1 -std=c++20 %t/m-partB.cppm -emit-reduced-module-interface -o %t/m-partB.pcm
+// RUN: %clang_cc1 -std=c++20 %t/m.cppm -emit-reduced-module-interface -o %t/m.pcm \
+// RUN:     -fmodule-file=m:partA=%t/m-partA.pcm -fmodule-file=m:partB=%t/m-partB.pcm
+// RUN: %clang_cc1 -std=c++20 %t/m.cppm -emit-reduced-module-interface -o %t/m.v1.pcm \
+// RUN:     -fmodule-file=m:partA=%t/m-partA.v1.pcm -fmodule-file=m:partB=%t/m-partB.pcm
+//
+// RUN: %clang_cc1 -std=c++20 %t/useBOnly.cppm -emit-reduced-module-interface -o %t/useBOnly.pcm \
+// RUN:     -fmodule-file=m=%t/m.pcm -fmodule-file=m:partA=%t/m-partA.pcm \
+// RUN:     -fmodule-file=m:partB=%t/m-partB.pcm
+// RUN: %clang_cc1 -std=c++20 %t/useBOnly.cppm -emit-reduced-module-interface -o %t/useBOnly.v1.pcm \
+// RUN:     -fmodule-file=m=%t/m.v1.pcm -fmodule-file=m:partA=%t/m-partA.v1.pcm \
+// RUN:     -fmodule-file=m:partB=%t/m-partB.pcm
+// Since useBOnly only uses partB from module M, the change in partA shouldn't affect
+// useBOnly.
+// RUN: diff %t/useBOnly.pcm %t/useBOnly.v1.pcm &> /dev/null
+
+//--- m-partA.cppm
+export module m:partA;
+
+namespace A_Impl {
+    inline int getAImpl() {
+        return 43;
+    }
+
+    inline int getA2Impl() {
+        return 43;
+    }
+}
+
+namespace A {
+    using A_Impl::getAImpl;
+}
+
+export inline int getA() {
+    return 43;
+}
+
+export inline int getA2(int) {
+    return 88;
+}
+
+//--- m-partA.v1.cppm
+export module m:partA;
+
+namespace A_Impl {
+    inline int getAImpl() {
+        return 43;
+    }
+
+    inline int getA2Impl() {
+        return 43;
+    }
+}
+
+namespace A {
+    using A_Impl::getAImpl;
+    // Adding a new declaration without introducing a new declaration name.
+    using A_Impl::getA2Impl;
+}
+
+inline int getA() {
+    return 43;
+}
+
+inline int getA2(int) {
+    return 88;
+}
+
+// Now we add a new declaration without introducing new identifier and new types.
+// The consuming module which didn't use m:partA completely is expected to be
+// not changed.
+inline int getA(int) {
+    return 88;
+}
+
+//--- m-partB.cppm
+export module m:partB;
+
+export inline int getB() {
+    return 430;
+}
+
+//--- m.cppm
+export module m;
+export import :partA;
+export import :partB;
+
+//--- useBOnly.cppm
+export module useBOnly;
+import m;
+
+export inline int get() {
+    return getB();
+}
+
+//--- useAOnly.cppm
+export module useAOnly;
+import m;
+
+export inline int get() {
+    A<int> a;
+    return a.getValue();
+}



More information about the cfe-commits mailing list