[PATCH] D140176: [clang][deps] NFC: Split out the module-based API from the TU-based API

Fri Dec 16 13:56:11 PST 2022

artemcm accepted this revision.
artemcm added inline comments.
This revision is now accepted and ready to land.

================
Comment at: clang/include/clang/Tooling/DependencyScanning/DependencyScanningTool.h:111
+  /// information needed for explicit build.
+  llvm::Expected<ModuleDepsGraph>
+  getModuleDependencies(StringRef ModuleName,
----------------
jansvoboda11 wrote:
> artemcm wrote:
> > The existing [[ https://github.com/apple/swift/blob/main/lib/ClangImporter/ClangModuleDependencyScanner.cpp#L247 | client ]] of the `ModuleName` API queries a single module and it's nice to get a structured output of a `FullDependenciesResult` for the module being queried itself, and get its dependencies from that. Whereas now we will just be getting a sequence of `ModuleDeps`s. So just to confirm, if a client queries `getModuleDependencies("Foo", ...)`, the resulting `ModuleDepsGraph` will contain a `ModuleDeps` for `Foo` itself also, and not just `Foo`'s dependencies? 
> > 
> > Still, the client will now need to traverse this graph to find `Foo` in particular, and it'd be nice to not have to do that. `TranslationUnitDeps` is not the right thing here, but perhaps something simpler could work, along the lines of:
> > ```
> > struct SingleModuleDeps {
> >   ModuleDeps mainQueriedModule
> > 
> >   /// The graph of direct and transitive modular dependencies of mainQueriedModule
> >   ModuleDepsGraph ModuleGraph;
> > };
> > ```
> > What do you think?
> Yes, `ModuleDepsGraph` will contain both the dependencies and `Foo` itself.
> 
> Looking at the client code, it doesn't look like it's making any use of the structured nature of `FullDependenciesResult`, right? The unused `FullDependenciesResult::FullDeps` field doesn't hold anything useful ATM and `FullDependenciesResult::DiscoveredModules` is equivalent to the new `ModuleDepsGraph`. Your client iterates over the whole (flat) dependency graph, creates a cache (map from the module name to its information), and then looks up the queried module in the map.
> 
> I'm not sure the suggested return type is buying you much. You probably still need to iterate over the whole graph to create the associative cache. (I guess you might save one `insert` and `find`, since we'd give you the root explicitly.) If knowing information on the queried module enables something more interesting, I propose this:
> 
> ```struct SingleModuleDeps {
>   ModuleID RootModuleID;
>   std::unordered_map<ModuleID, ModuleDeps> ModuleGraph;
> };
> ```
> 
> This gives you the option to look up information on the root module in constant time, but has the advantage of not "splitting" the dependency graph into two parts (the queried module and the rest). I think this might be useful in the future. Let's say your client learns it needs dependencies of a bunch of Clang modules. It could send them all to the Clang scanner at once, which might have some advantages: better parallelism and returning single dependency graph (less allocations and data copies if parts are shared). If we stored `ModuleDeps` for the queried modules outside of the rest of the `ModuleGraph`, it becomes hard to model dependencies between the queried modules.
> 
> Let me know what you think. I'm happy to do this, but maybe in a follow up patch?
I like your proposed `SingleModuleDeps` design, and think that it'd be useful to both be able to identify the root in constant time and the potential followup use-cases that could benefit from it sound nice too. Doing it in a followup patch sounds just fine. Thanks! 

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140176/new/

https://reviews.llvm.org/D140176