[PATCH] D114966: [clang][deps] Split stat and file content caches
    Duncan P. N. Exon Smith via Phabricator via llvm-commits 
    llvm-commits at lists.llvm.org
       
    Wed Jan 12 18:03:46 PST 2022
    
    
  
dexonsmith requested changes to this revision.
dexonsmith added a comment.
This revision now requires changes to proceed.
Okay, I think this is the last round. Everything looks correct, except a few out-of-date comments.
Two things, one small, one bigger (but not too big I think).
Smaller one is that there's an unnecessary string copy when getting the status.
Bigger one is that I think `CacheShard::ContentsCache` can be deleted, since the map is fully redundant with `CacheShard::EntriesByUID` (most of the inline comments are about that). This is because they are both indexed by UID and created at the same time (two immediately adjacent locks of the same shard mutex with no computation in between). Relatedly, `CachedFileSystemEntry::ContentsAccess` is never modified after construction. Seems reasonable to keep a separate allocation for the `CachedFileContent` since it's fairly big and directories (and cached errors) don't need them, but the `CachedFileSystemEntry` can just point at a raw pointer whose value is never expected to change and we don't need the separate map.
================
Comment at: clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h:45-52
 /// An in-memory representation of a file system entity that is of interest to
 /// the dependency scanning filesystem.
 ///
 /// It represents one of the following:
 /// - opened file with original contents and a stat value,
 /// - opened file with original contents, minimized contents and a stat value,
 /// - directory entry with its stat value,
----------------
This should mention that it's shared across different filenames, only one per UID now, and the filename is empty in the stat.
================
Comment at: clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h:132-134
+  /// Non-owning pointer to the file contents. This can be shared between
+  /// multiple entries (e.g. between a symlink and its target).
+  std::atomic<CachedFileContents *> ContentsAccess;
----------------
This comment is out of date, since there's a 1:1 correspondence between CachedFileSystemEntry and CachedFileContents since they're both looked up by UID.
Also, `ContentsAccess` is never modified after construction so it can just be a raw pointer.
Outlining the allocation of `CachedFileContents` might still make sense, since it's big and stat failures (with no content) are probably the common case due to header search patterns... only if we actually create these for stat failures though. Might be worth a comment saying why it's outlined. Maybe it should even be delayed to a follow-up commit to simplify this patch, since now that CachedFileSystemEntry is per-UID it doesn't seem to be required here... but the fields probably need to be made `mutable` regardless somehow so it doesn't seem like a ton of churn.
================
Comment at: clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h:160-161
+
+    /// Map from unique IDs to cached contents.
+    llvm::DenseMap<llvm::sys::fs::UniqueID, CachedFileContents *> ContentsCache;
+
----------------
I think this map can be deleted, since it's not actually used to deduplicate/share anything that `EntriesByUID` doesn't handle.
================
Comment at: clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h:193-198
+    /// Returns contents associated with the unique ID if there is some.
+    /// Otherwise, constructs new one with the given buffer, associates it with
+    /// the unique ID and returns the result.
+    CachedFileContents &getOrEmplaceContentsForUID(
+        llvm::sys::fs::UniqueID UID,
+        std::unique_ptr<llvm::MemoryBuffer> OriginalContents);
----------------
I think this can be removed / merged with `getOrEmplaceEntryForUID()`, based on how it's used.
================
Comment at: clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h:256-259
+    auto Stat = llvm::vfs::Status::copyWithNewName(Entry.getStatus(), Filename);
     if (Stat.isDirectory())
       return Stat;
     return llvm::vfs::Status::copyWithNewSize(Stat, getContents().size());
----------------
I think there's a new/unnecessary std::string copy in the case where `copyWithNewSize()` happens. If the size were fixed first that could be avoided:
```
lang=c++
auto Stat = Entry.getStatus();
if (!Stat.isDirectory())
  Stat = copyWithNewSize(...);
return copyWithNewName(Stat, Filename);
```
================
Comment at: clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp:161-171
+const CachedFileSystemEntry &
+DependencyScanningFilesystemSharedCache::CacheShard::getOrEmplaceEntryForUID(
+    llvm::sys::fs::UniqueID UID, llvm::vfs::Status Stat,
+    CachedFileContents *Contents) {
+  std::lock_guard<std::mutex> LockGuard(CacheLock);
+  auto Insertion = EntriesByUID.insert({UID, nullptr});
+  if (Insertion.second)
----------------
Looks like the two UID maps are always filled directly after each other. Seems like we can reduce lookups like this.
================
Comment at: clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp:181-184
+CachedFileContents &
+DependencyScanningFilesystemSharedCache::CacheShard::getOrEmplaceContentsForUID(
+    llvm::sys::fs::UniqueID UID,
+    std::unique_ptr<llvm::MemoryBuffer> OriginalContents) {
----------------
I think this can be merged into getOrEmplaceEntryForUID.
================
Comment at: clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp:235-245
+const CachedFileSystemEntry &
+DependencyScanningWorkerFilesystem::getOrEmplaceSharedEntryForUID(
+    TentativeEntry TEntry) {
+  auto &Shard = SharedCache.getShardForUID(TEntry.Status.getUniqueID());
+  CachedFileContents *Contents = nullptr;
+  if (TEntry.Contents)
+    Contents = &Shard.getOrEmplaceContentsForUID(TEntry.Status.getUniqueID(),
----------------
I think this can be one call since they're taking the same lock and always done one after the other.
Repository:
  rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114966/new/
https://reviews.llvm.org/D114966
    
    
More information about the llvm-commits
mailing list