[clang] [Modules] No transitive source location change (PR #86912)

via cfe-commits cfe-commits at lists.llvm.org
Thu Mar 28 01:02:18 PDT 2024


llvmbot wrote:


<!--LLVM PR SUMMARY COMMENT-->

@llvm/pr-subscribers-clang

Author: Chuanqi Xu (ChuanqiXu9)

<details>
<summary>Changes</summary>

This is part of "no transitive change" patch series, "no transitive source location change". I talked this with @<!-- -->Bigcheese in the tokyo's WG21 meeting.

The idea comes from @<!-- -->jyknight posted on LLVM discourse. That for:

```
// A.cppm
export module A;
...

// B.cppm
export module B;
import A;
...

//--- C.cppm
export module C;
import C;
```

Almost every time A.cppm changes, we need to recompile `C`. Due to we think the source location is significant to the semantics.  But it may be good if we can avoid recompiling `C` if the change from `A` wouldn't change the BMI of B.

# Motivation Example

This patch only cares source locations. So let's focus on source location's example. We can see the full example from the attached test.

```
//--- A.cppm
export module A;
export template <class T>
struct C {
    T func() {
        return T(43);
    }
};
export int funcA() {
    return 43;
}

//--- A.v1.cppm
export module A;

export template <class T>
struct C {
    T func() {
        return T(43);
    }
};
export int funcA() {
    return 43;
}

//--- B.cppm
export module B;
import A;

export int funcB() {
    return funcA();
}

//--- C.cppm
export module C;
import A;
export void testD() {
    C<int> c;
    c.func();
}
```

Here the only difference between `A.cppm` and `A.v1.cppm` is that `A.v1.cppm` has an additional blank line. Then the test shows that two BMI of `B.cppm`, one specified `-fmodule-file=A=A.pcm` and the other specified `-fmodule-file=A=A.v1.pcm`, should have the bit-wise same contents. 

However, it is a different story for C, since C instantiates templates from A, and the instantiation records the source information from module A, which is different from `A` and `A.v1`, so it is expected that the BMI `C.pcm` and `C.v1.pcm` can and should differ.

# Internal perspective of status quo

To fully understand the patch, we need to understand how we encodes source locations and how we serialize and deserialize them.

For source locations, we encoded them as:

```
|
|
| _____ base offset of an imported module
|
|
|
|_____ base offset of another imported module
|
|
|
|
| ___ 0
```

As the diagram shows, we encode the local (unloaded) source location from 0 to higher bits. And we allocate the space for source locations from the loaded modules from high bits to 0. Then the source locations from the loaded modules will be mapped to our source location space according to the allocated offset.

For example, for,

```
// a.cppm
export module a;
...

// b.cppm
export module b;
import a;
...
```

Assuming the offset of a source location (let's name the location as `S`) in a.cppm is 45 and we will record the value `45` into the BMI `a.pcm`. Then in b.cppm, when we import a, the source manager will allocate a space for module 'a' (according to the recorded number of source locations) as the base offset of module 'a' in the current source location spaces. Let's assume the allocated base offset as 90 in this example.  Then when we want to get the location in the current source location space for `S`, we can get it simply by adding `45` to `90` to `135`. Finally we can get the source location for `S` in module B as `135`.

And when we want to write module `b`, we would also write the source location of `S` as `135` directly in the BMI. And to clarify the location `S` comes from module `a`, we also need to record the base offset of  module `a`, 90 in the BMI of `b`.

Then the problem comes. Since the base offset of module 'a' is computed by the number source locations in module 'a'. In module 'b', the recorded base offset of module 'a' will change every time the number of source locations in module 'a' increase or decrease. In other words, the contents of BMI of B will change every time the number of locations in module 'a' changes. This is pretty sensitive. Almost every change will change the number of locations. So this is the problem this patch want to solve.

Let's continue with the existing design to understand what's going on. Another interesting case is:

```
// c.cppm
export module c;
import a;
import b;
...
```

In `c.cppm`, when we import `a`, we still need to allocate a base location offset for it, let's say the value becomes to `200` somehow. Then when we reach the location `S` recorded in module `b`, we need to translate it into the current source location space. The solution is quite simple, we can get it by `135 + (200 - 90) = 245`. In another word, the offset of a source location in current module can be computed as `Recorded Offset + Base Offset of the its module file - Recorded Base Offset`.

Then we're almost done about how we handle the offset of source locations in serializers.

# The high level design of current patch

>From the abstract level, what we want to do  is to remove the hardcoded base offset of imported modules and remain the ability to calculate the source location  in a new module unit. To achieve this, we need to be able to find the module file owning a source location from the encoding of the source location.

So in this patch, for each source location, we will store the local offset of the location and the module file index. For  the above example, in `b.pcm`, the source location of `S` will be recorded as `135` directly. And in the new design, the source location of `S` will be recorded as `<1, 45>`. Here `1` stands for the module file index of `a` in module `b`. And `45` means the offset of `S` to the base offset of module `a`.

So the trade-off here is that, to make the BMI more independent, we need to record more abstract information. And I feel it is worthy. The recompilation problem of modules is really annoying and there are still people complaining this. But if we can make this (including stopping other changes transitively), I think this may be a killer feature for modules. And from @<!-- -->Bigcheese , this should be helpful for clang explicit modules too.

And the benchmarking side, I tested this patch against https://github.com/alibaba/async_simple/tree/CXX20Modules. No significant change on compilation time. The size of .pcm files becomes to 208M from 200M. I think the trade-off is pretty fair.

# Some low level details

I didn't use another slot to record the module file index. I tried to use the higher bits of the existing source location encodings to store that information. This design may be safe. Since we use `unsigned` to store source locations but we use uint64_t in serialization. And generally `unsigned` is 32 bit width in most platforms. So it might not be a safe problem.

Another reason to reuse the same slot of the source location is to reduce the impact of the patch. Since there are a lot of places assuming we can store and get a source location from a slot. And if I tried to add another slot, a lot of codes breaks. I don't feel it is worhty.

Another impact of this decision is that, the existing small optimizations for encoding source location may be invalided. The key of the optimization is that we can turn large values into small values then we can use VBR6 format to reduce the size. But if we decided to put the module file index into the higher bits, then maybe it simply doesn't work. An example may be the `SourceLocationSequence` optimization. 

This will only affect the size of on-disk .pcm files. I don't expect this impact the speed and memory use of compilations. And seeing my small experiments above, I feel this trade off is worthy. I don't remove optimizations as `SourceLocationSequence` in the current patch to avoid increasing the size of the current patch. I'll try to remove that as a NFC patch after this landed.

# Correctness

The mental model for handling source location offsets is not so complex and I believe we can solve it by adding module file index to each stored source location. For the practical side, since the source location is pretty sensitive, and the patch can pass all the in-tree tests and a small scale projects, I feel it may be correct.

# Future Plans

I'll continue to work on no transitive decl change and no transitive identifier change (if matters) to achieve the goal to stop the propagation of unnecessary changes. But all of this depends on this patch. Since, clearly, the source locations are the most sensitive thing.


---

Patch is 42.25 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/86912.diff


15 Files Affected:

- (modified) clang/include/clang/Basic/SourceLocation.h (+1) 
- (modified) clang/include/clang/Serialization/ASTBitCodes.h (+25-31) 
- (modified) clang/include/clang/Serialization/ASTReader.h (+34-20) 
- (modified) clang/include/clang/Serialization/ASTWriter.h (+4) 
- (modified) clang/include/clang/Serialization/ModuleFile.h (-4) 
- (modified) clang/include/clang/Serialization/SourceLocationEncoding.h (+53-25) 
- (modified) clang/lib/Frontend/ASTUnit.cpp (-2) 
- (modified) clang/lib/Serialization/ASTReader.cpp (+33-53) 
- (modified) clang/lib/Serialization/ASTReaderDecl.cpp (+1-1) 
- (modified) clang/lib/Serialization/ASTWriter.cpp (+34-7) 
- (modified) clang/lib/Serialization/ASTWriterDecl.cpp (+5-3) 
- (modified) clang/lib/Serialization/ModuleFile.cpp (-1) 
- (added) clang/test/Modules/no-transitive-source-location-change.cppm (+69) 
- (modified) clang/test/Modules/pr61067.cppm (-25) 
- (modified) clang/unittests/Serialization/SourceLocationEncodingTest.cpp (+5-61) 


``````````diff
diff --git a/clang/include/clang/Basic/SourceLocation.h b/clang/include/clang/Basic/SourceLocation.h
index 00b1e0fa855b7a..7a0f5ba8d1270b 100644
--- a/clang/include/clang/Basic/SourceLocation.h
+++ b/clang/include/clang/Basic/SourceLocation.h
@@ -90,6 +90,7 @@ class SourceLocation {
   friend class ASTWriter;
   friend class SourceManager;
   friend struct llvm::FoldingSetTrait<SourceLocation, void>;
+  friend class SourceLocationEncoding;
 
 public:
   using UIntTy = uint32_t;
diff --git a/clang/include/clang/Serialization/ASTBitCodes.h b/clang/include/clang/Serialization/ASTBitCodes.h
index f31efa5117f0d1..628ce03572fea6 100644
--- a/clang/include/clang/Serialization/ASTBitCodes.h
+++ b/clang/include/clang/Serialization/ASTBitCodes.h
@@ -22,6 +22,7 @@
 #include "clang/Basic/IdentifierTable.h"
 #include "clang/Basic/OperatorKinds.h"
 #include "clang/Basic/SourceLocation.h"
+#include "clang/Serialization/SourceLocationEncoding.h"
 #include "llvm/ADT/DenseMapInfo.h"
 #include "llvm/Bitstream/BitCodes.h"
 #include <cassert>
@@ -175,45 +176,38 @@ const unsigned int NUM_PREDEF_SUBMODULE_IDS = 1;
 
 /// Source range/offset of a preprocessed entity.
 struct PPEntityOffset {
+  using RawLocEncoding = SourceLocationEncoding::RawLocEncoding;
+
   /// Raw source location of beginning of range.
-  SourceLocation::UIntTy Begin;
+  RawLocEncoding Begin;
 
   /// Raw source location of end of range.
-  SourceLocation::UIntTy End;
+  RawLocEncoding End;
 
   /// Offset in the AST file relative to ModuleFile::MacroOffsetsBase.
   uint32_t BitOffset;
 
-  PPEntityOffset(SourceRange R, uint32_t BitOffset)
-      : Begin(R.getBegin().getRawEncoding()), End(R.getEnd().getRawEncoding()),
-        BitOffset(BitOffset) {}
-
-  SourceLocation getBegin() const {
-    return SourceLocation::getFromRawEncoding(Begin);
-  }
+  PPEntityOffset(RawLocEncoding Begin, RawLocEncoding End, uint32_t BitOffset)
+      : Begin(Begin), End(End), BitOffset(BitOffset) {}
 
-  SourceLocation getEnd() const {
-    return SourceLocation::getFromRawEncoding(End);
-  }
+  RawLocEncoding getBegin() const { return Begin; }
+  RawLocEncoding getEnd() const { return End; }
 };
 
 /// Source range of a skipped preprocessor region
 struct PPSkippedRange {
+  using RawLocEncoding = SourceLocationEncoding::RawLocEncoding;
+
   /// Raw source location of beginning of range.
-  SourceLocation::UIntTy Begin;
+  RawLocEncoding Begin;
   /// Raw source location of end of range.
-  SourceLocation::UIntTy End;
+  RawLocEncoding End;
 
-  PPSkippedRange(SourceRange R)
-      : Begin(R.getBegin().getRawEncoding()), End(R.getEnd().getRawEncoding()) {
-  }
+  PPSkippedRange(RawLocEncoding Begin, RawLocEncoding End)
+      : Begin(Begin), End(End) {}
 
-  SourceLocation getBegin() const {
-    return SourceLocation::getFromRawEncoding(Begin);
-  }
-  SourceLocation getEnd() const {
-    return SourceLocation::getFromRawEncoding(End);
-  }
+  RawLocEncoding getBegin() const { return Begin; }
+  RawLocEncoding getEnd() const { return End; }
 };
 
 /// Offset in the AST file. Use splitted 64-bit integer into low/high
@@ -239,8 +233,10 @@ struct UnderalignedInt64 {
 
 /// Source location and bit offset of a declaration.
 struct DeclOffset {
+  using RawLocEncoding = SourceLocationEncoding::RawLocEncoding;
+
   /// Raw source location.
-  SourceLocation::UIntTy Loc = 0;
+  RawLocEncoding RawLoc = 0;
 
   /// Offset relative to the start of the DECLTYPES_BLOCK block. Keep
   /// structure alignment 32-bit and avoid padding gap because undefined
@@ -248,17 +244,15 @@ struct DeclOffset {
   UnderalignedInt64 BitOffset;
 
   DeclOffset() = default;
-  DeclOffset(SourceLocation Loc, uint64_t BitOffset,
-             uint64_t DeclTypesBlockStartOffset) {
-    setLocation(Loc);
+  DeclOffset(RawLocEncoding RawLoc, uint64_t BitOffset,
+             uint64_t DeclTypesBlockStartOffset)
+      : RawLoc(RawLoc) {
     setBitOffset(BitOffset, DeclTypesBlockStartOffset);
   }
 
-  void setLocation(SourceLocation L) { Loc = L.getRawEncoding(); }
+  void setRawLoc(RawLocEncoding Loc) { RawLoc = Loc; }
 
-  SourceLocation getLocation() const {
-    return SourceLocation::getFromRawEncoding(Loc);
-  }
+  RawLocEncoding getRawLoc() const { return RawLoc; }
 
   void setBitOffset(uint64_t Offset, const uint64_t DeclTypesBlockStartOffset) {
     BitOffset.setBitOffset(Offset - DeclTypesBlockStartOffset);
diff --git a/clang/include/clang/Serialization/ASTReader.h b/clang/include/clang/Serialization/ASTReader.h
index 370d8037a4da17..017c6b76a91495 100644
--- a/clang/include/clang/Serialization/ASTReader.h
+++ b/clang/include/clang/Serialization/ASTReader.h
@@ -696,7 +696,7 @@ class ASTReader
   /// Mapping from global submodule IDs to the module file in which the
   /// submodule resides along with the offset that should be added to the
   /// global submodule ID to produce a local ID.
-  GlobalSubmoduleMapType GlobalSubmoduleMap;
+  mutable GlobalSubmoduleMapType GlobalSubmoduleMap;
 
   /// A set of hidden declarations.
   using HiddenNames = SmallVector<Decl *, 2>;
@@ -942,6 +942,12 @@ class ASTReader
   /// Sema tracks these to emit deferred diags.
   llvm::SmallSetVector<serialization::DeclID, 4> DeclsToCheckForDeferredDiags;
 
+  /// The module files imported by different module files. Indirectly imported
+  /// module files are included too. The information comes from
+  /// ReadModuleOffsetMap(ModuleFile&).
+  mutable llvm::DenseMap<ModuleFile *, llvm::SmallVector<ModuleFile *>>
+      ImportedModuleFiles;
+
 private:
   struct ImportedSubmodule {
     serialization::SubmoduleID ID;
@@ -1761,6 +1767,7 @@ class ASTReader
 
   /// Retrieve the module manager.
   ModuleManager &getModuleManager() { return ModuleMgr; }
+  const ModuleManager &getModuleManager() const { return ModuleMgr; }
 
   /// Retrieve the preprocessor.
   Preprocessor &getPreprocessor() const { return PP; }
@@ -2170,8 +2177,8 @@ class ASTReader
 
   /// Retrieve the global submodule ID given a module and its local ID
   /// number.
-  serialization::SubmoduleID
-  getGlobalSubmoduleID(ModuleFile &M, unsigned LocalID);
+  serialization::SubmoduleID getGlobalSubmoduleID(ModuleFile &M,
+                                                  unsigned LocalID) const;
 
   /// Retrieve the submodule that corresponds to a global submodule ID.
   ///
@@ -2184,7 +2191,7 @@ class ASTReader
 
   /// Retrieve the module file with a given local ID within the specified
   /// ModuleFile.
-  ModuleFile *getLocalModuleFile(ModuleFile &M, unsigned ID);
+  ModuleFile *getLocalModuleFile(ModuleFile &M, unsigned ID) const;
 
   /// Get an ID for the given module file.
   unsigned getModuleFileID(ModuleFile *M);
@@ -2220,40 +2227,47 @@ class ASTReader
     return Sema::AlignPackInfo::getFromRawEncoding(Raw);
   }
 
+  using RawLocEncoding = SourceLocationEncoding::RawLocEncoding;
+
   /// Read a source location from raw form and return it in its
   /// originating module file's source location space.
-  SourceLocation ReadUntranslatedSourceLocation(SourceLocation::UIntTy Raw,
-                                                LocSeq *Seq = nullptr) const {
+  std::pair<SourceLocation, unsigned>
+  ReadUntranslatedSourceLocation(RawLocEncoding Raw,
+                                 LocSeq *Seq = nullptr) const {
     return SourceLocationEncoding::decode(Raw, Seq);
   }
 
   /// Read a source location from raw form.
-  SourceLocation ReadSourceLocation(ModuleFile &ModuleFile,
-                                    SourceLocation::UIntTy Raw,
-                                    LocSeq *Seq = nullptr) const {
-    SourceLocation Loc = ReadUntranslatedSourceLocation(Raw, Seq);
-    return TranslateSourceLocation(ModuleFile, Loc);
+  SourceLocation ReadRawSourceLocation(ModuleFile &MF, RawLocEncoding Raw,
+                                       LocSeq *Seq = nullptr) const {
+    if (!MF.ModuleOffsetMap.empty())
+      ReadModuleOffsetMap(MF);
+
+    auto [Loc, ModuleFileIndex] = ReadUntranslatedSourceLocation(Raw, Seq);
+    ModuleFile *ModuleFileHomingLoc =
+        ModuleFileIndex ? ImportedModuleFiles[&MF][ModuleFileIndex - 1] : &MF;
+    return TranslateSourceLocation(*ModuleFileHomingLoc, Loc);
   }
 
   /// Translate a source location from another module file's source
   /// location space into ours.
   SourceLocation TranslateSourceLocation(ModuleFile &ModuleFile,
                                          SourceLocation Loc) const {
-    if (!ModuleFile.ModuleOffsetMap.empty())
-      ReadModuleOffsetMap(ModuleFile);
-    assert(ModuleFile.SLocRemap.find(Loc.getOffset()) !=
-               ModuleFile.SLocRemap.end() &&
-           "Cannot find offset to remap.");
-    SourceLocation::IntTy Remap =
-        ModuleFile.SLocRemap.find(Loc.getOffset())->second;
-    return Loc.getLocWithOffset(Remap);
+    if (Loc.isInvalid())
+      return Loc;
+
+    // It implies that the Loc is already translated.
+    if (SourceMgr.isLoadedSourceLocation(Loc))
+      return Loc;
+
+    return Loc.getLocWithOffset(ModuleFile.SLocEntryBaseOffset - 2);
   }
 
   /// Read a source location.
   SourceLocation ReadSourceLocation(ModuleFile &ModuleFile,
                                     const RecordDataImpl &Record, unsigned &Idx,
                                     LocSeq *Seq = nullptr) {
-    return ReadSourceLocation(ModuleFile, Record[Idx++], Seq);
+    return ReadRawSourceLocation(ModuleFile, Record[Idx++], Seq);
   }
 
   /// Read a FileID.
diff --git a/clang/include/clang/Serialization/ASTWriter.h b/clang/include/clang/Serialization/ASTWriter.h
index 3ed9803fa3745b..70bf204ed598ef 100644
--- a/clang/include/clang/Serialization/ASTWriter.h
+++ b/clang/include/clang/Serialization/ASTWriter.h
@@ -648,6 +648,10 @@ class ASTWriter : public ASTDeserializationListener,
   void AddSourceLocation(SourceLocation Loc, RecordDataImpl &Record,
                          LocSeq *Seq = nullptr);
 
+  /// Return the raw encodings for source locations.
+  SourceLocationEncoding::RawLocEncoding
+  getRawSourceLocationEncoding(SourceLocation Loc, LocSeq *Seq = nullptr);
+
   /// Emit a source range.
   void AddSourceRange(SourceRange Range, RecordDataImpl &Record,
                       LocSeq *Seq = nullptr);
diff --git a/clang/include/clang/Serialization/ModuleFile.h b/clang/include/clang/Serialization/ModuleFile.h
index bc0aa89966c2b4..ea24b44e5e411b 100644
--- a/clang/include/clang/Serialization/ModuleFile.h
+++ b/clang/include/clang/Serialization/ModuleFile.h
@@ -295,10 +295,6 @@ class ModuleFile {
   /// AST file.
   const uint32_t *SLocEntryOffsets = nullptr;
 
-  /// Remapping table for source locations in this module.
-  ContinuousRangeMap<SourceLocation::UIntTy, SourceLocation::IntTy, 2>
-      SLocRemap;
-
   // === Identifiers ===
 
   /// The number of identifiers in this AST file.
diff --git a/clang/include/clang/Serialization/SourceLocationEncoding.h b/clang/include/clang/Serialization/SourceLocationEncoding.h
index 9bb0dbe2e4d6f6..3925b08c1837ed 100644
--- a/clang/include/clang/Serialization/SourceLocationEncoding.h
+++ b/clang/include/clang/Serialization/SourceLocationEncoding.h
@@ -6,23 +6,26 @@
 //
 //===----------------------------------------------------------------------===//
 //
-// Source locations are stored pervasively in the AST, making up a third of
-// the size of typical serialized files. Storing them efficiently is important.
+// We wish to encode the SourceLocation from other module file not dependent
+// on the other module file. So that the source location changes from other
+// module file may not affect the contents of the current module file. Then the
+// users don't need to recompile the whole project due to a new line in a module
+// unit in the root of the dependency graph.
 //
-// We use integers optimized by VBR-encoding, because:
-//  - when abbreviations cannot be used, VBR6 encoding is our only choice
-//  - in the worst case a SourceLocation can be ~any 32-bit number, but in
-//    practice they are highly predictable
+// To achieve this, we need to encode the index of the module file into the
+// encoding of the source location. The encoding of the source location may be:
 //
-// We encode the integer so that likely values encode as small numbers that
-// turn into few VBR chunks:
-//  - the invalid sentinel location is a very common value: it encodes as 0
-//  - the "macro or not" bit is stored at the bottom of the integer
-//    (rather than at the top, as in memory), so macro locations can have
-//    small representations.
-//  - related locations (e.g. of a left and right paren pair) are usually
-//    similar, so when encoding a sequence of locations we store only
-//    differences between successive elements.
+//      |-----------------------|-----------------------|
+//      |          A            |         B         | C |
+//
+//  * A: 32 bit. The index of the module file in the module manager + 1. The +1
+//  here
+//       is necessary since we wish 0 stands for the current module file.
+//  * B: 31 bit. The offset of the source location to the module file containing
+//  it.
+//  * C: The macro bit. We rotate it to the lowest bit so that we can save some
+//  space
+//       in case the index of the module file is 0.
 //
 //===----------------------------------------------------------------------===//
 
@@ -52,11 +55,20 @@ class SourceLocationEncoding {
   friend SourceLocationSequence;
 
 public:
-  static uint64_t encode(SourceLocation Loc,
-                         SourceLocationSequence * = nullptr);
-  static SourceLocation decode(uint64_t, SourceLocationSequence * = nullptr);
+  using RawLocEncoding = uint64_t;
+
+  static RawLocEncoding encode(SourceLocation Loc, UIntTy BaseOffset,
+                               unsigned BaseModuleFileIndex,
+                               SourceLocationSequence * = nullptr);
+  static std::pair<SourceLocation, unsigned>
+  decode(RawLocEncoding, SourceLocationSequence * = nullptr);
 };
 
+/// TODO: Remove SourceLocationSequence since it is not used now.
+/// Since we will put the index for ModuleFile in the high bits in the encodings
+/// for source locations, it is meaningless to reduce the size of source
+/// locations.
+///
 /// Serialized encoding of a sequence of SourceLocations.
 ///
 /// Optimized to produce small values when locations with the sequence are
@@ -149,14 +161,30 @@ class SourceLocationSequence::State {
   operator SourceLocationSequence *() { return &Seq; }
 };
 
-inline uint64_t SourceLocationEncoding::encode(SourceLocation Loc,
-                                               SourceLocationSequence *Seq) {
-  return Seq ? Seq->encode(Loc) : encodeRaw(Loc.getRawEncoding());
+inline SourceLocationEncoding::RawLocEncoding
+SourceLocationEncoding::encode(SourceLocation Loc, UIntTy BaseOffset,
+                               unsigned BaseModuleFileIndex,
+                               SourceLocationSequence *Seq) {
+  if (Loc.isInvalid())
+    return 0;
+
+  assert(Loc.getOffset() >= BaseOffset);
+  Loc = Loc.getLocWithOffset(-BaseOffset);
+  RawLocEncoding Encoded = encodeRaw(Loc.getRawEncoding());
+  assert(Encoded < ((RawLocEncoding)1 << 32));
+
+  assert(BaseModuleFileIndex < ((RawLocEncoding)1 << 32));
+  Encoded |= (RawLocEncoding)BaseModuleFileIndex << 32;
+  return Encoded;
 }
-inline SourceLocation
-SourceLocationEncoding::decode(uint64_t Encoded, SourceLocationSequence *Seq) {
-  return Seq ? Seq->decode(Encoded)
-             : SourceLocation::getFromRawEncoding(decodeRaw(Encoded));
+inline std::pair<SourceLocation, unsigned>
+SourceLocationEncoding::decode(RawLocEncoding Encoded,
+                               SourceLocationSequence *Seq) {
+  unsigned ModuleFileIndex = Encoded >> 32;
+  Encoded &= ((RawLocEncoding)1 << 33) - 1;
+  SourceLocation Loc = SourceLocation::getFromRawEncoding(decodeRaw(Encoded));
+
+  return {Loc, ModuleFileIndex};
 }
 
 } // namespace clang
diff --git a/clang/lib/Frontend/ASTUnit.cpp b/clang/lib/Frontend/ASTUnit.cpp
index 3610a08831e79a..1c655260b09eb5 100644
--- a/clang/lib/Frontend/ASTUnit.cpp
+++ b/clang/lib/Frontend/ASTUnit.cpp
@@ -2373,8 +2373,6 @@ bool ASTUnit::serialize(raw_ostream &OS) {
   return serializeUnit(Writer, Buffer, getSema(), OS);
 }
 
-using SLocRemap = ContinuousRangeMap<unsigned, int, 2>;
-
 void ASTUnit::TranslateStoredDiagnostics(
                           FileManager &FileMgr,
                           SourceManager &SrcMgr,
diff --git a/clang/lib/Serialization/ASTReader.cpp b/clang/lib/Serialization/ASTReader.cpp
index 28e8d27fef08c6..625666ead0fb49 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -1645,7 +1645,7 @@ bool ASTReader::ReadSLocEntry(int ID) {
     if (!File)
       return true;
 
-    SourceLocation IncludeLoc = ReadSourceLocation(*F, Record[1]);
+    SourceLocation IncludeLoc = ReadRawSourceLocation(*F, Record[1]);
     if (IncludeLoc.isInvalid() && F->Kind != MK_MainFile) {
       // This is the module's main file.
       IncludeLoc = getImportLocation(F);
@@ -1687,7 +1687,7 @@ bool ASTReader::ReadSLocEntry(int ID) {
     unsigned Offset = Record[0];
     SrcMgr::CharacteristicKind
       FileCharacter = (SrcMgr::CharacteristicKind)Record[2];
-    SourceLocation IncludeLoc = ReadSourceLocation(*F, Record[1]);
+    SourceLocation IncludeLoc = ReadRawSourceLocation(*F, Record[1]);
     if (IncludeLoc.isInvalid() && F->isModule()) {
       IncludeLoc = getImportLocation(F);
     }
@@ -1707,9 +1707,9 @@ bool ASTReader::ReadSLocEntry(int ID) {
 
   case SM_SLOC_EXPANSION_ENTRY: {
     LocSeq::State Seq;
-    SourceLocation SpellingLoc = ReadSourceLocation(*F, Record[1], Seq);
-    SourceLocation ExpansionBegin = ReadSourceLocation(*F, Record[2], Seq);
-    SourceLocation ExpansionEnd = ReadSourceLocation(*F, Record[3], Seq);
+    SourceLocation SpellingLoc = ReadRawSourceLocation(*F, Record[1], Seq);
+    SourceLocation ExpansionBegin = ReadRawSourceLocation(*F, Record[2], Seq);
+    SourceLocation ExpansionEnd = ReadRawSourceLocation(*F, Record[3], Seq);
     SourceMgr.createExpansionLoc(SpellingLoc, ExpansionBegin, ExpansionEnd,
                                  Record[5], Record[4], ID,
                                  BaseOffset + Record[0]);
@@ -3038,8 +3038,10 @@ ASTReader::ReadControlBlock(ModuleFile &F,
         // The import location will be the local one for now; we will adjust
         // all import locations of module imports after the global source
         // location info are setup, in ReadAST.
-        SourceLocation ImportLoc =
+        auto [ImportLoc, ImportModuleFileIndex] =
             ReadUntranslatedSourceLocation(Record[Idx++]);
+        // The import location must belong to the current module file itself.
+        assert(ImportModuleFileIndex == 0);
         off_t StoredSize = !IsImportingStdCXXModule ? (off_t)Record[Idx++] : 0;
         time_t StoredModTime =
             !IsImportingStdCXXModule ? (time_t)Record[Idx++] : 0;
@@ -3658,13 +3660,6 @@ llvm::Error ASTReader::ReadASTBlock(ModuleFile &F,
           std::make_pair(SourceManager::MaxLoadedOffset - F.SLocEntryBaseOffset
                            - SLocSpaceSize,&F));
 
-      // Initialize the remapping table.
-      // Invalid stays invalid.
-      F.SLocRemap.insertOrReplace(std::make_pair(0U, 0));
-      // This module. Base was 2 when being compiled.
-      F.SLocRemap.insertOrReplace(std::make_pair(
-          2U, static_cast<SourceLocation::IntTy>(F.SLocEntryBaseOffset - 2)));
-
       TotalNumSLocEntries += F.LocalNumSLocEntries;
       break;
     }
@@ -3941,7 +3936,7 @@ llvm::Error ASTReader::ReadASTBlock(ModuleFile &F,
       if (Record.size() != 1)
         return llvm::createStringError(std::errc::illegal_byte_sequence,
                                        "invalid pragma optimize record");
-      OptimizeOffPragmaLocation = ReadSourceLocation(F, Record[0]);
+      OptimizeOffPragmaLocation = ReadRawSourceLocation(F, Record[0]);
       break;
 
     case MSSTRUCT_PRAGMA_OPTIONS:
@@ -3957,7 +3952,7 @@ llvm::Error ASTReader::ReadASTBlock(ModuleFile &F,
             std::errc::illegal_byte_sequence,
             "invalid pragma pointers to members record");
       PragmaMSPointersToMembersState = Record[0];
-      PointersToMembersPragmaLocation = ReadSourceLocation(F, Record[1]);
+      PointersToMembersPragmaLocation = ReadRawSourceLocation(F, Record[1]);
       break;
 
     case UNUSED_LOCAL_TYPEDEF_NAME_CA...
[truncated]

``````````

</details>


https://github.com/llvm/llvm-project/pull/86912


More information about the cfe-commits mailing list