[PATCH] D72860: [modules] Do not cache invalid state for modules that we attempted to load.

Richard Smith - zygoloid via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Mar 5 21:47:28 PST 2020


rsmith added inline comments.


================
Comment at: clang/lib/Serialization/ModuleManager.cpp:183
       // Get a buffer of the file and close the file descriptor when done.
-      Buf = FileMgr.getBufferForFile(NewModule->File, /*isVolatile=*/false);
+      Buf = FileMgr.getBufferForFile(NewModule->File, /*isVolatile=*/true);
     }
----------------
vsapsai wrote:
> dblaikie wrote:
> > vsapsai wrote:
> > > dexonsmith wrote:
> > > > rsmith wrote:
> > > > > dexonsmith wrote:
> > > > > > vsapsai wrote:
> > > > > > > Made this change because if we don't have a valid module but opened a corresponding .pcm file earlier, there is a high chance that .pcm file was rebuilt.
> > > > > > Please add a comment in the code explaining that.
> > > > > This change is proving really bad for us. This prevents using `mmap` for loading module files, and instead forces the entire file to be loaded every time. Please revert.
> > > > Can we limit the revert to explicit vs. implicit module builds?  The scenario Volodymyr was defending against is implicit-only.
> > > Richard, can you please tell what is your modules configuration and how you are invoking clang? For implicit builds loading a module instead of `mmap` is still better than building a module. But for explicit modules I see there is no need to use `isVolatile` as modules aren't changing all the time.
> > Richard/Google use explicit modules, if that's the particular parameter you're asking about.
> > 
> > So, yes, for Google's needs a solution that allows mmap to continue to be used for explicit modules would suffice, I believe.
> > 
> > (not to in any way derail that from being addressed - I thought implicit modules weren't race-y - they're hashed, etc, so they shouldn't collide/be overwritten? Seems like a loss in performance to have to copy the whole thing into memory rather than using mmap, if that could be avoided by more precise hashing/collision avoidance?)
> I need a specific command-line invocation to test with. If tests in `clang/test/Modules/explicit*` are relevant to Google's setup, I can use those as an example. It's just they were touched last time in 2016.
> 
> To discuss racy-ness, that is kinda inherent problem for incremental builds. If a module becomes invalid, all of its users will try to get an up-to-date version of it. And they can try to build that module themselves or check and load already rebuilt module. Unfortunately, I don't see how different hashing can help with it, after all multiple compiler invocations do want the same .pcm file.
Preventing the use of `mmap` seems unacceptable in general -- for either implicit or explicit modules. Anything that forces compile time to be linear in the total size of transitive modules used is not reasonable. To really see the problems here you'll need a real world testcase with thousands to tens of thousands of modules, which I can't reasonably give you.

There should be ways we can avoid problems with racyness. Perhaps we could switch to a content-addressed module file store?
In any case, this patch has introduced a regression, so we should revert this change while we work out what to do.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72860/new/

https://reviews.llvm.org/D72860





More information about the cfe-commits mailing list