[PATCH] D114966: [clang][deps] Split filesystem caches

Duncan P. N. Exon Smith via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Dec 2 13:55:11 PST 2021


dexonsmith added a comment.

Thanks for working on this; seems like a great start. At a high level:

- We should check overhead. It'd be good to benchmark scanning LLVM with `clang-scan-deps` before and after this change.
- The locking is getting harder to track, since the acquisition and release are disconnected. I'd rather use a pattern that keeps this simple.
- I've identified a few pre-existing issues below that might be exacerbated by (and/or harder to fix after) this refactor.



================
Comment at: clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h:38
+  /// The (potentially minimized) file contents.
+  const llvm::SmallString<1> *Contents;
+  /// The skipped range mappings produced during minimization.
----------------
I think this should be a `StringRef` (or a `MemoryBufferRef`, which you can construct from two `StringRef`s, but since the name is already in the `Status` object that's probably not useful).
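
For illustration, a minimal sketch of the field as a `StringRef` (the surrounding entry layout is assumed, not from the patch):
```
lang=c++
/// The (potentially minimized) file contents; references storage owned by the
/// shared cache rather than pointing at a specific SmallString<1>.
llvm::StringRef Contents;
```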


================
Comment at: clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h:68-69
+
+// Small size of 1 allows us to store an empty string with an implicit null
+// terminator without any allocations.
+using Contents = llvm::SmallString<1>;
----------------
dexonsmith wrote:
> I find these two typedefs a bit obfuscating. I see that they might provide some benefit in the patch as-is because of the imposed requirement that the returned result use a pointer to a `SmallString<1>`; as such, it's important that the type be identical.
> - Instead, it should use a `StringRef` to avoid depending on storage (already commented above).
> - Even if not for that, it could/should use a `SmallVectorImpl<char>` to avoid imposing a specific requirement on the small size.
> 
> Then `Contents` and `OriginalContents` can be skipped (the latter becoming `std::unique_ptr<llvm::MemoryBuffer>`, but without the obfuscation of a typedef).
There doesn't seem to be a good reason to store an empty null-terminated string. Just use a `StringRef()`.


================
Comment at: clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h:68-70
+// Small size of 1 allows us to store an empty string with an implicit null
+// terminator without any allocations.
+using Contents = llvm::SmallString<1>;
----------------
I find these two typedefs a bit obfuscating. I see that they might provide some benefit in the patch as-is because of the imposed requirement that the returned result use a pointer to a `SmallString<1>`; as such, it's important that the type be identical.
- Instead, it should use a `StringRef` to avoid depending on storage (already commented above).
- Even if not for that, it could/should use a `SmallVectorImpl<char>` to avoid imposing a specific requirement on the small size.

Then `Contents` and `OriginalContents` can be skipped (the latter becoming `std::unique_ptr<llvm::MemoryBuffer>`, but without the obfuscation of a typedef).
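
To make that concrete, a sketch of the cached types without the typedefs (struct and field names here are illustrative, not from the patch):
```
lang=c++
/// Original contents: the buffer read from disk, owned directly.
std::unique_ptr<llvm::MemoryBuffer> OriginalContents;

/// Minimized contents plus the skipped-range mappings from minimization.
struct MinimizedContents {
  llvm::SmallString<0> Contents;
  PreprocessorSkippedRangeMapping PPSkippedRangeMapping;
};
```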


================
Comment at: clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h:73
+/// The value type for original contents cache.
+using OriginalContents = Contents;
+
----------------
This should be the `std::unique_ptr<MemoryBuffer>` from disk. There's no reason to `memcpy` it into a new allocation.


================
Comment at: clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h:78
+  /// The minimized contents itself.
+  Contents Contents;
   PreprocessorSkippedRangeMapping PPSkippedRangeMapping;
----------------
The field name matching the type name is a bit confusing. I'm not sure the typedef is buying much here.


================
Comment at: clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h:131-132
+    llvm::StringMap<SharedStat, llvm::BumpPtrAllocator> StatCache;
+    /// Cache of original file contents. Using \c std::map to ensure references
+    /// are not invalidated on insertion. 
+    std::map<llvm::sys::fs::UniqueID, OriginalContents> OriginalContentsCache;
----------------
You don't need the heavyweight `std::map` for reference stability. You can just use a `DenseMap<KeyT, std::unique_ptr<ValueT>>`. That's pretty expensive due to allocation traffic, but it's still cheaper than a `std::map`.

But you can also avoid the allocation traffic by using a BumpPtrAllocator, the same pattern as the StringMap above. E.g.:
```
lang=c++
llvm::SpecificBumpPtrAllocator<OriginalContents> OriginalContentsAlloc;
llvm::DenseMap<llvm::sys::fs::UniqueID, OriginalContents *> OriginalContentsCache;

// insert into shard:
OriginalContents &getOriginalContentContainer(...) {
  std::scoped_lock<std::mutex> L(CacheMutex);
  OriginalContents *&OC = OriginalContentsCache[UID];
  if (!OC)
    OC = new (OriginalContentsAlloc.Allocate()) OriginalContents;
  return *OC;
}

// get original content:
StringRef getOriginalContentBuffer(...) {
  OriginalContents &OC = getOriginalContentContainer(...);
  if (OC.IsInitialized)
    return OC.Content->getBuffer();

  // Could put this after the lock I guess...
  std::unique_ptr<MemoryBuffer> Content = readFile(...);

  // Check IsInitialized again after locking in case there's a race.
  std::scoped_lock<std::mutex> L(SharedStat.Mutex);
  if (OC.IsInitialized)
    return OC.Content->getBuffer();

  OC.Content = std::move(Content);
  OC.IsInitialized = true;
  return OC.Content->getBuffer();
}
```
The same pattern works for the minimized-contents cache. Since the locks are only held briefly, there's no need to pass them around and lose clarity about how long they're held. Also, IIRC, `std::unique_lock` is more expensive than `std::scoped_lock` (but my memory could be faulty).




================
Comment at: clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h:133-135
+    std::map<llvm::sys::fs::UniqueID, OriginalContents> OriginalContentsCache;
+    /// Cache of minimized file contents.
+    std::map<llvm::sys::fs::UniqueID, MinimizedContents> MinimizedContentsCache;
----------------
I wonder if these should really be separate.

Seems better to have something like:
```
lang=c++
struct SharedContent {
  // Even better: unique_atomic_ptr<MemoryBuffer>, to enable lock-free access/updates.
  std::atomic<bool> HasOriginal;
  std::unique_ptr<MemoryBuffer> Original;

  // Even better: std::atomic<MinimizedContent *>, with the latter bumpptrallocated, to
  // enable lock-free access/updates.
  std::atomic<bool> HasMinimized;
  SmallString<0> Minimized; // would be nice to bumpptrallocate this string...
  PreprocessorSkippedRangeMapping PPSkippedRangeMapping;
};
SpecificBumpPtrAllocator<SharedContent> ContentAlloc;
DenseMap<llvm::sys::fs::UniqueID, SharedContent *> ContentCache;
```
With that in place, it seems like `SharedStat` could have a `std::atomic<SharedContent *>` that caches the result of the UID lookup. That way the UID `DenseMap` lookup happens only once per stat name, reducing contention on the per-shard lock.

Then in the local cache, the only map storage would be:
```
llvm::StringMap<SharedStat *, llvm::BumpPtrAllocator> LocalStatCache;
```
No need to duplicate the UID-keyed caches, since the lookup in the shared cache would set the `SharedContent` pointer.
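
A rough sketch of that pointer caching, assuming a per-shard `CacheShard` with `ContentAlloc`/`ContentCache` as above (all names here are illustrative):
```
lang=c++
struct SharedStat {
  // ... existing stat fields ...
  /// Caches the result of the UID-keyed lookup; null until some thread has
  /// done the lookup once for this name.
  std::atomic<SharedContent *> Content{nullptr};
};

SharedContent &getOrCreateContent(SharedStat &Stat, llvm::sys::fs::UniqueID UID,
                                  CacheShard &Shard) {
  if (SharedContent *C = Stat.Content.load())
    return *C; // Fast path: no shard lock, no DenseMap lookup.

  std::scoped_lock<std::mutex> L(Shard.CacheMutex);
  SharedContent *&C = Shard.ContentCache[UID];
  if (!C)
    C = new (Shard.ContentAlloc.Allocate()) SharedContent;
  Stat.Content.store(C);
  return *C;
}
```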


================
Comment at: clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp:25-31
   llvm::ErrorOr<llvm::vfs::Status> Stat = (*MaybeFile)->status();
   if (!Stat)
     return Stat.getError();
 
   llvm::vfs::File &F = **MaybeFile;
   llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>> MaybeBuffer =
       F.getBuffer(Stat->getName());
----------------
Is there a potential (already existing) race condition here? Can't the file change between the stat and opening the buffer?

Seems like one of the following is needed:
- The `Stat` should be updated to have the observed size of the buffer.
- An error should be returned if the size doesn't match.
- The stat and/or read should be retried until they match.
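
For the first option, a minimal sketch of what the fix might look like in `readFile` (assuming `llvm::vfs::Status::copyWithNewSize` is the right helper; error handling as in the existing code):
```
lang=c++
llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>> MaybeBuffer =
    F.getBuffer(Stat->getName());
if (!MaybeBuffer)
  return MaybeBuffer.getError();

// The file may have changed between the stat and the read; make the cached
// status reflect the size that was actually read.
if ((*MaybeBuffer)->getBufferSize() != Stat->getSize())
  *Stat = llvm::vfs::Status::copyWithNewSize(*Stat, (*MaybeBuffer)->getBufferSize());
```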


================
Comment at: clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp:36-38
+  SharedOriginalContents.reserve(Buffer->getBufferSize() + 1);
+  SharedOriginalContents.append(Buffer->getBufferStart(),
+                                Buffer->getBufferEnd());
----------------
This will introduce a memory regression in the common case where there are no PCHs.
- Previously, only minimized files were saved in memory. These are relatively small, so probably no big deal.
- Now, the original file is being saved as well. These are not small.

Instead, the MemoryBuffer should be saved directly.
- For large files whose size isn't on a page boundary, this will be an `mmap`. This doesn't count against process memory because the kernel can optimize this easily, such as by sharing between processes (e.g., with actual compilation).
- For large files on page boundaries, there was already a memcpy done in order to make this null-terminated. No reason to do that again here.
- For small files, this is already a buffer on the heap... the extra memcpy and allocation probably don't matter all that much, but the large-file case is worth optimizing for.

This waste already existed for files that weren't being minimized, but it's going to cost a lot more memory now that original files are stored even when they're going to be minimized.
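
A sketch of keeping the buffer instead of copying it (this assumes the shared entry gains a `std::unique_ptr<llvm::MemoryBuffer>` member, named `OriginalBuffer` here for illustration, and that `Buffer` is the `unique_ptr` returned by `getBuffer` and can be moved from):
```
lang=c++
// Keep the MemoryBuffer itself: for large files this remains an mmap that the
// kernel can share, and no second copy of the bytes is made.
SharedOriginalContents.OriginalBuffer = std::move(Buffer);
```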


================
Comment at: clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp:249
+
+  ensureLocked(Filename, SharedStat, Lock);
 
----------------
I don't love how passing the lock as an out parameter obscures when it's taken and when it's released. I don't have a specific suggestion, but maybe there's another way to factor the code overall?


================
Comment at: clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp:253-255
+  if (SharedOriginalContents.WasInserted)
+    SharedStat->Stat =
+        readFile(Filename, getUnderlyingFS(), SharedOriginalContents.Value);
----------------
Calling `readFile()` behind a lock doesn't seem great. I did confirm that the original code seems to do the same thing (lock outside of `createFilesystemEntry`), but this refactor seems to bake the pattern into a few more places.

When races aren't very likely it's usually cheaper to:
- lock to check cache, returning cached result if so
- without a lock, compute result
- lock to set cache, but if the cache has been filled in the meantime by another thread, return that and throw out the just-computed one

Maybe it'd be useful to add:
```
std::atomic<bool> IsInitialized;
```
to the MinimizedContents and OriginalContents structures stored in the shared cache. This could make it easier to decouple insertion in the shared cache from initialization. I.e., it'd be safe to release the lock while doing work; another thread won't think the default-constructed contents are correct.
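
For example, the shared structure could look roughly like this (a sketch; the exact fields are up to the patch):
```
lang=c++
struct OriginalContents {
  /// Set (under the lock) only after Content is fully populated, so another
  /// thread never mistakes a default-constructed entry for a valid one.
  std::atomic<bool> IsInitialized{false};
  std::unique_ptr<llvm::MemoryBuffer> Content;
};
```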


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114966/new/

https://reviews.llvm.org/D114966


