[PATCH] D143414: [clang] refactor FileManager::GetUniqueIDMapping

Jan Svoboda via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Feb 6 11:02:35 PST 2023


jansvoboda11 added inline comments.


================
Comment at: clang/lib/Basic/FileManager.cpp:625-627
+        Entries.push_back(
+            FileEntryRef(*reinterpret_cast<const FileEntryRef::MapEntry *>(
+                Value->V.get<const void *>())));
----------------
rmaz wrote:
> jansvoboda11 wrote:
> > Why is this necessary? IIUC, `FileEntryRef` has this logic in `getBaseMapEntry()`. I would expect `FileEntryRef(Entry)` to be correct regardless of what's in `Value->V`.
> I borrowed this from: https://github.com/llvm/llvm-project/blob/main/clang/lib/Basic/FileManager.cpp#L399-L408
> 
> I assumed this was necessary from @benlangmuir 's comment on D142780:
> 
> > Also, it will never give you a vfs mapped path since it's skipping those (V.dyn_cast<FileEntry *>()).
> 
> Are you saying we should just do something like:
> ```
> for (const auto &Entry : SeenFileEntries) {
>     if (llvm::ErrorOr<FileEntryRef::MapValue> Value = Entry.getValue())
>         Entries.push_back(FileEntryRef(Entry));
> }
> ```
> ?
Oh, okay, so the goal is to skip VFS-mapped file entries and only serialize the on-disk ones into PCMs. That makes sense given we do actual `FileEntry` comparison in `HeaderFileInfoTrait::EqualKey()` (instead of plain comparison of filenames that I assumed). I think that for VFS-mapped entries, we always have a separate element in the `SeenFileEntries` map that represents the on-disk file entry. So we should be skipping the VFS-mapped entries entirely instead of considering their remapping target entries again.

If my understanding is correct, we should do something like this:

```
    if (llvm::ErrorOr<FileEntryRef::MapValue> Value = Entry.getValue())
      // Ignore VFS-mapped entries.
      if (Value->V.dyn_cast<FileEntry *>())
        Entries.push_back(FileEntryRef(Entry));
```


================
Comment at: clang/lib/Serialization/ASTWriter.cpp:176
 
+  for (auto &File : FileEntries) {
     const HeaderFileInfo *HFI =
----------------
rmaz wrote:
> jansvoboda11 wrote:
> > I'm afraid that iterating over file entries in "non-deterministic" order can cause non-determinism down the line. For example, calling `HeaderSearch::findAllModulesForHeader()` in the loop can trigger deserialization of `HeaderFileInfo` from loaded PCMs. That code fills the `Module::Headers` vectors, which we serialize without sorting first.
> Do you think we should move the sorting into the `getSeenFileEntries()` method?
I think so, yes.


================
Comment at: clang/lib/Serialization/ASTWriter.cpp:1939
     // Massage the file path into an appropriate form.
     StringRef Filename = File->getName();
     SmallString<128> FilenameTmp(Filename);
----------------
rmaz wrote:
> jansvoboda11 wrote:
> > This will insert duplicate entries (with the same key) into the on-disk hash table. I don't know if that's problematic, just thought I'd call it out.
> Not sure I follow, why would we have `FileEntry`s with duplicate names?
Sorry, I had assumed we only do lookups in the on-disk hash table based on filenames (instead of full `FileEntry` comparison) and that for that reason, we should serialize all `FileEntryRef` into the PCM, even the VFS-mapped ones. But I now see that you're trying to only serialize the on-disk `FileEntryRef`s, which will not cause duplication. Ignore my comment here.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D143414/new/

https://reviews.llvm.org/D143414



More information about the cfe-commits mailing list