[PATCH] D114966: [clang][deps] Split filesystem caches

Mon Dec 6 09:21:08 PST 2021

jansvoboda11 marked 2 inline comments as done.
jansvoboda11 added a comment.

In D114966#3168108 <https://reviews.llvm.org/D114966#3168108>, @dexonsmith wrote:

> Thanks for working on this; seems like a great start.

Thanks a lot for the extensive feedback! I'll work through it and create prep patches where sensible. It seems like things can be simplified quite a bit.

> At a high-level:
>
> - We should check overhead. It'd be good to benchmark scanning LLVM with `clang-scan-deps` before and after this change.

In my testing, this patch causes ~20% increase in memory usage.

> - The locking is getting harder to track, since the acquisition and release are disconnected. I'd rather use a pattern that kept this simple.

I agree. I think I'll explore the direction you suggested in one of your inline comments.

> - Identified a few pre-existing issues that might be exacerbated by (and/or harder to fix after) this refactor.

That makes sense.

================
Comment at: clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h:73
+/// The value type for original contents cache.
+using OriginalContents = Contents;
+
----------------
dexonsmith wrote:
> This should be the `std::unique_ptr<MemoryBuffer>` from disk. There's no reason to `memcpy` it into a new allocation.
Fixed in new prep-patch: D115043.

================
Comment at: clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h:131-132
+    llvm::StringMap<SharedStat, llvm::BumpPtrAllocator> StatCache;
+    /// Cache of original file contents. Using \c std::map to ensure references
+    /// are not invalidated on insertion. 
+    std::map<llvm::sys::fs::UniqueID, OriginalContents> OriginalContentsCache;
----------------
dexonsmith wrote:
> You don't need the heavyweight std::map for reference validation. You can just use a `DenseMap<KeyT, std::unique_ptr<ValueT>>`. That's pretty expensive due to allocation traffic, but it's still cheaper than a `std::map`.
> 
> But you can also avoid the allocation traffic by using a BumpPtrAllocator, the same pattern as the StringMap above. E.g.:
> ```
> lang=c++
> llvm::SpecificBumpPtrAllocator<OriginalContents> OriginalContentsAlloc;
> llvm::DenseMap<llvm::sys::fs::UniqueID, OriginalContents *> OriginalContentsCache;
> 
> // insert into shard:
> OriginalContents &getOriginalContentContainer(...) {
>   std::scoped_lock<std::mutex> L(CacheMutex);
>   OriginalContents *&OC = OriginalContents[UID];
>   if (!OC)
>     OC = new (OriginalContentsAlloc) OriginalContents;
>   return *OC;
> }
> 
> // get original content:
> StringRef getOriginalContentBuffer(...) {
>   OriginalContents &OC = getOriginalContentContainer(...);
>   if (OC.IsInitialized)
>     return OC->Content->getBuffer();
> 
>   // Could put this after the lock I guess...
>   std::unique_ptr<MemoryBuffer> Content = readFile(...);
> 
>   // check IsInitialized again after locking in case there's a race
>   std::scoped_lock<std::mutex> L(SharedStat.Mutex);
>   if (OC->IsInitialized)
>     return OC->Content->getBuffer();
> 
>   OC->Content = std::move(Content);
>   OC->IsInitialized = true;
>   return OC->Content->getBuffer();
> }
> ```
> Same pattern for minimized content cache. Since the locks are only held briefly there's no need to pass them around and lose clarity about how long it's open. Also, IIRC, `std::unique_lock` is more expensive than `std::scoped_lock` (but my memory could be faulty).
> 
> 
I didn't think of using `SpecificBumpPtrAllocator` this way, seems really neat, thanks for the suggestion!

================
Comment at: clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h:131-132
+    llvm::StringMap<SharedStat, llvm::BumpPtrAllocator> StatCache;
+    /// Cache of original file contents. Using \c std::map to ensure references
+    /// are not invalidated on insertion. 
+    std::map<llvm::sys::fs::UniqueID, OriginalContents> OriginalContentsCache;
----------------
jansvoboda11 wrote:
> dexonsmith wrote:
> > You don't need the heavyweight std::map for reference validation. You can just use a `DenseMap<KeyT, std::unique_ptr<ValueT>>`. That's pretty expensive due to allocation traffic, but it's still cheaper than a `std::map`.
> > 
> > But you can also avoid the allocation traffic by using a BumpPtrAllocator, the same pattern as the StringMap above. E.g.:
> > ```
> > lang=c++
> > llvm::SpecificBumpPtrAllocator<OriginalContents> OriginalContentsAlloc;
> > llvm::DenseMap<llvm::sys::fs::UniqueID, OriginalContents *> OriginalContentsCache;
> > 
> > // insert into shard:
> > OriginalContents &getOriginalContentContainer(...) {
> >   std::scoped_lock<std::mutex> L(CacheMutex);
> >   OriginalContents *&OC = OriginalContents[UID];
> >   if (!OC)
> >     OC = new (OriginalContentsAlloc) OriginalContents;
> >   return *OC;
> > }
> > 
> > // get original content:
> > StringRef getOriginalContentBuffer(...) {
> >   OriginalContents &OC = getOriginalContentContainer(...);
> >   if (OC.IsInitialized)
> >     return OC->Content->getBuffer();
> > 
> >   // Could put this after the lock I guess...
> >   std::unique_ptr<MemoryBuffer> Content = readFile(...);
> > 
> >   // check IsInitialized again after locking in case there's a race
> >   std::scoped_lock<std::mutex> L(SharedStat.Mutex);
> >   if (OC->IsInitialized)
> >     return OC->Content->getBuffer();
> > 
> >   OC->Content = std::move(Content);
> >   OC->IsInitialized = true;
> >   return OC->Content->getBuffer();
> > }
> > ```
> > Same pattern for minimized content cache. Since the locks are only held briefly there's no need to pass them around and lose clarity about how long it's open. Also, IIRC, `std::unique_lock` is more expensive than `std::scoped_lock` (but my memory could be faulty).
> > 
> > 
> I didn't think of using `SpecificBumpPtrAllocator` this way, seems really neat, thanks for the suggestion!
Yeah, `scoped_lock` should be cheaper, I'll create a prep-patch for that.

================
Comment at: clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h:133-135
+    std::map<llvm::sys::fs::UniqueID, OriginalContents> OriginalContentsCache;
+    /// Cache of minimized file contents.
+    std::map<llvm::sys::fs::UniqueID, MinimizedContents> MinimizedContentsCache;
----------------
dexonsmith wrote:
> I wonder if these should really be separate.
> 
> Seems better to have something like:
> ```
> lang=c++
> struct SharedContent {
>   // Even better: unique_atomic_ptr<MemoryBuffer>, to enable lock-free access/updates.
>   atomic<bool> HasOriginal;
>   std::unique_ptr<MemoryBuffer> Original; 
> 
>   // Even better: std::atomic<MinimizedContent *>, with the latter bumpptrallocated, to
>   // enable lock-free access/updates.
>   atomic<bool> HasMinimized;
>   SmallString<0> Minimized; // would be nice to bumpptrallocate this string...
>   PreprocessorSkippedRangeMapping PPSkippedRangeMapping;
> };
> SpecificBumpPtrAllocator<SharedCachedContent> ContentAlloc;
> DenseMap<llvm::sys::fs::UniqueID, SharedCachedContent *> ContentCache;
> ```
> With that in place, seems like the `SharedStat` can have `std::atomic<SharedContent *>`, which caches the result of the UID lookup. This way the UID `DenseMap` lookup is once per stat name, saving reducing contention on the per-shard lock.
> 
> Then in the local cache, the only map storage would be:
> ```
> llvm::StringMap<SharedStat *, llvm::BumpPtrAllocator> LocalStatCache;
> ```
> No need to duplicate the UID-keyed caches, since the lookups there would set the pointer for the SharedContent.
I really like idea of keeping a pointer to `SharedContent` in `SharedStat` and avoiding locking & lookup in the content caches. Merging original and minimized contents would probably simplify things quite a bit as well.

================
Comment at: clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp:249
+
+  ensureLocked(Filename, SharedStat, Lock);

----------------
dexonsmith wrote:
> I don't love the lack of clarity between when the lock is taken and when it's released caused by this being an out parameter. I don't have a specific suggestion, but maybe there's another way to factor the code overall?
Fair point, I'll try to simplify this.

================
Comment at: clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp:253-255
+  if (SharedOriginalContents.WasInserted)
+    SharedStat->Stat =
+        readFile(Filename, getUnderlyingFS(), SharedOriginalContents.Value);
----------------
dexonsmith wrote:
> Calling `readFile()` behind a lock doesn't seem great. I did confirm that the original code seems to do the same thing (lock outside of `createFilesystemEntry`), but this refactor seems to bake the pattern into a few more places.
> 
> When races aren't very likely it's usually cheaper to:
> - lock to check cache, returning cached result if so
> - without a lock, compute result
> - lock to set cache, but if the cache has been filled in the meantime by another thread, return that and throw out the just-computed one
> 
> Maybe it'd be useful to add:
> ```
> std::atomic<bool> IsInitialized;
> ```
> to the MinimizedContents and OriginalContents structures stored in the shared cache. This could make it easier to decouple insertion in the shared cache from initialization. I.e., it'd be safe to release the lock while doing work; another thread won't think the default-constructed contents are correct.
Could you expand on this a bit more? If we have a lock for each file, how is locking, reading, unlocking slower than locking, unlocking, reading, locking, unlocking?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114966/new/

https://reviews.llvm.org/D114966