[PATCH] D158469: [clang][deps] Compute command lines and file deps on-demand

Wed Aug 23 16:45:21 PDT 2023

jansvoboda11 added a comment.

In D158469#4612012 <https://reviews.llvm.org/D158469#4612012>, @benlangmuir wrote:

>> I tried that approach, but found it way too easy to keep `ModuleDeps` around, which keep scanning instances alive, and use tons of memory.
>
> It seems like the problem (too easy to keep around MD) is the same either way, it's just instead of wasting memory it's leaving dangling pointers that could cause UAF if they're actually used.

You're right. I have a version of this patch that stores `std::weak_ptr<ModuleDepCollector> MDC` in `ModuleDeps` and asserts when you're about to dereference dangling pointer. That might solve that part of the issue.

> Where were you seeing MD held too long? I wonder if there's another way to fix that.

For example in `clang-scan-deps`, we accumulate all `ModuleDeps` into `FullDeps::Modules`. This means that at the end of the scan, that can be keeping up to `NumTUs` worth of `CompilerInvocations` alive.

>> Hmm, maybe we could avoid holding on to the whole CompilerInstance.
>
> This seems promising!

OK, I'll try to explore this a little further. I'm a bit concerned that `FileManager` (and its caches) might still be too heavy to keep around, though.

---

I guess what makes this harder to get right is the fact that `ModuleDeps` and `ModuleDepsGraph` live on different levels. I think that maybe restructuring the API a little bit could simplify things. What I have in mind is replacing the `DependencyConsumer` function `handleModuleDependency(ModuleDeps)` with `handleModuleDepsGraph(ModuleDepsGraph)`. We could then turn `ModuleDepsGraph` into an iterator over proxy objects representing what's currently called `ModuleDeps`. (Lifetime of these proxies would be clearly tied to that of `ModuleDepsGraph`.) I think we can assume that scanner clients are less tempted to "permanently" store the full `ModuleDepsGraph` (since that's wasteful due to cross-TU sharing), compared to `ModuleDeps` (which you probably want to store in some form). Clients would therefore be nudged into creating own storage-friendly version of `ModuleDeps` before disposing of `ModuleDepsGraph`, which would force them to call `getBuildArguments()` and `getFileDeps()` on our `ModuleDeps` proxy before disposing of the heavy-weight graph. WDYT about this direction?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158469/new/

https://reviews.llvm.org/D158469