[PATCH] D115346: [clang][deps] Squash caches for original and minimized files

Duncan P. N. Exon Smith via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Dec 8 14:51:00 PST 2021


dexonsmith added a comment.

This looks really nice.

One problem is that it's subtle to confirm whether it's thread-safe. The local cache access parts of CachedFileSystemEntry lock-free, but the CachedFileSystemEntry might get changed later (i.e., filled in with minimized content). Are accessed fields guaranteed by context not to be the ones that mutate later?



================
Comment at: clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h:50-51
+
+  /// Minimize contents of the file.
+  static void minimize(CachedFileSystemEntry &Entry);
 
----------------
Might be more natural as a non-static:
```
lang=c++
void minimize();
```



================
Comment at: clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h:69-75
+  llvm::ErrorOr<StringRef> getContents(bool Minimized) const {
+    assert(isValid() && "not initialized");
     if (!MaybeStat)
       return MaybeStat.getError();
     assert(!MaybeStat->isDirectory() && "not a file");
-    assert(isValid() && "not initialized");
-    return Contents->getBuffer();
+    return (Minimized ? MinimizedContents : OriginalContents)->getBuffer();
   }
----------------
I *think* this is thread-safe -- since `Minimized` should be the same as when the local cache entry was created -- but it's a bit subtle.

The problematic case I am worried about:
- First use in local cache is non-minimized.
    - Creates shared cache entry that's not minimized.
- Some other local cache wants it to be minimized.
- Later use in local cache is minimized.
    - Accessing `Minimized` pointer races with the other thread setting it.

If the local cache is guaranteed to always pass the same value for `Minimized` as when it fist accessed it, then there shouldn't be a problem...

I wonder if there's a way to make it less subtle?


================
Comment at: clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h:106-108
+  std::unique_ptr<llvm::MemoryBuffer> OriginalContents;
+  std::unique_ptr<llvm::MemoryBuffer> MinimizedContents;
   PreprocessorSkippedRangeMapping PPSkippedRangeMapping;
----------------
I'm finding it a bit subtle detecting if there are races on access / setting of these, but I think it's correct.
- I think I verified that they are "set once".
- All the setters are guarded by locks.
- The first getter per "local" cache is guarded by a lock.
- Subsequent getters are not.

The key question: are the subsequent getters ONLY used when the first getter was successful?

One way to make it more obvious:
```
lang=c++
  struct ContentWithPPRanges {
    std::unique_ptr<llvm::MemoryBuffer> Content;
    PreprocessorSkippedRangeMapping PPSkippedRangeMapping;
  };

private:
  // Always accessed,mutated under a lock. Not mutated after they escape.
  std::unique_ptr<llvm::MemoryBuffer> Original;
  std::unique_ptr<llvm::MemoryBuffer> Minimized;
  PreprocessorSkippedRangeMapping PPSkippedRangeMapping;

  // Used in getters. Pointed-at memory immutable after these are set.
  std::atomic<const llvm::MemoryBuffer *> OriginalAccess;
  std::atomic<const llvm::MemoryBuffer *> MinimizedAccess;
  std::atomic<const PreprocessorSkippedRangeMapping *> PPRangesAccess;
```
I don't think this is the only approach though.



================
Comment at: clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp:189-191
+    if (ShouldBeMinimized && CacheEntry.hasOriginalContents() &&
+        !CacheEntry.hasMinimizedContents())
+      CachedFileSystemEntry::minimize(CacheEntry);
----------------
Is the DependencyScanningWorkerFilesystem guaranteed to always want either minimized or not minimized?

IOW, if the same filesystem is reused, and the first time only the original file was needed and later the minimized was needed, I don't see the code path for minimizing the file later. But maybe reusing one of these FSs is not supported?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D115346/new/

https://reviews.llvm.org/D115346



More information about the llvm-commits mailing list