[PATCH] D115346: [clang][deps] Squash caches for original and minimized files
Duncan P. N. Exon Smith via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Dec 9 14:57:40 PST 2021
dexonsmith added inline comments.
================
Comment at: clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h:106-108
+ std::unique_ptr<llvm::MemoryBuffer> OriginalContents;
+ std::unique_ptr<llvm::MemoryBuffer> MinimizedContents;
PreprocessorSkippedRangeMapping PPSkippedRangeMapping;
----------------
jansvoboda11 wrote:
> dexonsmith wrote:
> > I'm finding it a bit subtle detecting if there are races on access / setting of these, but I think it's correct.
> > - I think I verified that they are "set once".
> > - All the setters are guarded by locks.
> > - The first getter per "local" cache is guarded by a lock.
> > - Subsequent getters are not.
> >
> > The key question: are the subsequent getters ONLY used when the first getter was successful?
> >
> > One way to make it more obvious:
> > ```
> > lang=c++
> > struct ContentWithPPRanges {
> > std::unique_ptr<llvm::MemoryBuffer> Content;
> > PreprocessorSkippedRangeMapping PPSkippedRangeMapping;
> > };
> >
> > private:
> > // Always accessed,mutated under a lock. Not mutated after they escape.
> > std::unique_ptr<llvm::MemoryBuffer> Original;
> > std::unique_ptr<llvm::MemoryBuffer> Minimized;
> > PreprocessorSkippedRangeMapping PPSkippedRangeMapping;
> >
> > // Used in getters. Pointed-at memory immutable after these are set.
> > std::atomic<const llvm::MemoryBuffer *> OriginalAccess;
> > std::atomic<const llvm::MemoryBuffer *> MinimizedAccess;
> > std::atomic<const PreprocessorSkippedRangeMapping *> PPRangesAccess;
> > ```
> > I don't think this is the only approach though.
> >
> I think there are no races on the original contents. The pointer is unconditionally set on creation of `CachedFileSystemEntry` under a lock that no threads get past without having set the pointer (or failing and never accessing the pointer).
>
> For minimized contents, the latest revision adds check at the beginning of the main function (`needsMinimization`) outside the critical section. There are three paths I can think of:
> * The check returns `true` in thread A (pointer is `null`), thread A enters critical section, minimizes the contents and initializes the pointer.
> * The check returns `true` in thread A, but thread B entered the critical section, minimized contents and initialized the pointer. When thread A enters the critical section, it performs the check again, figures that out and skips minimization.
> * The check returns `false` and the local cache entry is returned.
>
> So there definitely is a race here, but I believe it's benign. Do you agree? Do you think it's worth addressing?
I don't trust myself to evaluate whether it's benign, but if there's no atomic mutation, then I think it's possible that when the setter changes a value from "x" to "y" then the racing reader can see a third value "z". You might gain some confidence by using `-fsanitize=thread` on a workload that's going to include this sort of thing -- probably hard to exercise: PCH already exists, try minimizing something that uses the PCH, and then try minimizing something that doesn't.
I'd rather just avoid the race entirely so we don't need to guess though.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D115346/new/
https://reviews.llvm.org/D115346
More information about the llvm-commits
mailing list