[PATCH] D115346: [clang][deps] Squash caches for original and minimized files

Jan Svoboda via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Dec 9 07:53:05 PST 2021


jansvoboda11 marked 3 inline comments as done.
jansvoboda11 added inline comments.


================
Comment at: clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h:50-51
+
+  /// Minimize contents of the file.
+  static void minimize(CachedFileSystemEntry &Entry);
 
----------------
dexonsmith wrote:
> Might be more natural as a non-static:
> ```
> lang=c++
> void minimize();
> ```
> 
Agreed. I originally wanted to minimize diff of the implementation (by being able to refer to `Result` as before), but I'm happy making this an instance method.


================
Comment at: clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h:69-75
+  llvm::ErrorOr<StringRef> getContents(bool Minimized) const {
+    assert(isValid() && "not initialized");
     if (!MaybeStat)
       return MaybeStat.getError();
     assert(!MaybeStat->isDirectory() && "not a file");
-    assert(isValid() && "not initialized");
-    return Contents->getBuffer();
+    return (Minimized ? MinimizedContents : OriginalContents)->getBuffer();
   }
----------------
dexonsmith wrote:
> I *think* this is thread-safe -- since `Minimized` should be the same as when the local cache entry was created -- but it's a bit subtle.
> 
> The problematic case I am worried about:
> - First use in local cache is non-minimized.
>     - Creates shared cache entry that's not minimized.
> - Some other local cache wants it to be minimized.
> - Later use in local cache is minimized.
>     - Accessing `Minimized` pointer races with the other thread setting it.
> 
> If the local cache is guaranteed to always pass the same value for `Minimized` as when it fist accessed it, then there shouldn't be a problem...
> 
> I wonder if there's a way to make it less subtle?
I extracted the logic into `EntryRef` which now gets returned from the main function. It carries the `bool Minimized` bit and makes reasoning about this more local. WDYT?


================
Comment at: clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h:106-108
+  std::unique_ptr<llvm::MemoryBuffer> OriginalContents;
+  std::unique_ptr<llvm::MemoryBuffer> MinimizedContents;
   PreprocessorSkippedRangeMapping PPSkippedRangeMapping;
----------------
dexonsmith wrote:
> I'm finding it a bit subtle detecting if there are races on access / setting of these, but I think it's correct.
> - I think I verified that they are "set once".
> - All the setters are guarded by locks.
> - The first getter per "local" cache is guarded by a lock.
> - Subsequent getters are not.
> 
> The key question: are the subsequent getters ONLY used when the first getter was successful?
> 
> One way to make it more obvious:
> ```
> lang=c++
>   struct ContentWithPPRanges {
>     std::unique_ptr<llvm::MemoryBuffer> Content;
>     PreprocessorSkippedRangeMapping PPSkippedRangeMapping;
>   };
> 
> private:
>   // Always accessed,mutated under a lock. Not mutated after they escape.
>   std::unique_ptr<llvm::MemoryBuffer> Original;
>   std::unique_ptr<llvm::MemoryBuffer> Minimized;
>   PreprocessorSkippedRangeMapping PPSkippedRangeMapping;
> 
>   // Used in getters. Pointed-at memory immutable after these are set.
>   std::atomic<const llvm::MemoryBuffer *> OriginalAccess;
>   std::atomic<const llvm::MemoryBuffer *> MinimizedAccess;
>   std::atomic<const PreprocessorSkippedRangeMapping *> PPRangesAccess;
> ```
> I don't think this is the only approach though.
> 
I think there are no races on the original contents. The pointer is unconditionally set on creation of `CachedFileSystemEntry` under a lock that no threads get past without having set the pointer (or failing and never accessing the pointer).

For minimized contents, the latest revision adds check at the beginning of the main function (`needsMinimization`) outside the critical section. There are three paths I can think of:
* The check returns `true` in thread A (pointer is `null`), thread A enters critical section, minimizes the contents and initializes the pointer.
* The check returns `true` in thread A, but thread B entered the critical section, minimized contents and initialized the pointer. When thread A enters the critical section, it performs the check again, figures that out and skips minimization.
* The check returns `false` and the local cache entry is returned.

So there definitely is a race here, but I believe it's benign. Do you agree? Do you think it's worth addressing?


================
Comment at: clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp:253
+  if (Entry.isMinimized()) {
+    if (!Entry.getPPSkippedRangeMapping().empty() && PPSkipMappings)
+      (*PPSkipMappings)[Result->Buffer->getBufferStart()] =
----------------
Maybe the `isMinimized()` check should be performed internally in `EntryRef` when accessing `getPPSkippedRangeMapping()` to make the race avoidance more local...


================
Comment at: clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp:189-191
+    if (ShouldBeMinimized && CacheEntry.hasOriginalContents() &&
+        !CacheEntry.hasMinimizedContents())
+      CachedFileSystemEntry::minimize(CacheEntry);
----------------
dexonsmith wrote:
> Is the DependencyScanningWorkerFilesystem guaranteed to always want either minimized or not minimized?
> 
> IOW, if the same filesystem is reused, and the first time only the original file was needed and later the minimized was needed, I don't see the code path for minimizing the file later. But maybe reusing one of these FSs is not supported?
You're right, minimization of a particular file can be turned on/off on the fly. I already have a test for this in D114966 but somehow dropped it when extracting this patch. Fixed in the latest revision.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D115346/new/

https://reviews.llvm.org/D115346



More information about the llvm-commits mailing list