[PATCH] D153114: [clangd] [C++20] [Modules] Support C++20 modules for clangd

Fri Jun 16 22:50:30 PDT 2023

ChuanqiXu added a comment.

> That said, there are two large issues that I think should be addressed in the design (though not necessarily *implemented* now).

Yeah, totally agreed. Design is pretty important especially in open source softwares. I'm pretty open to the suggestions.

---

> support for clang header modules
>
> Google has a large deployment of clangd, serving a codebase that builds significant parts as clang header modules for performance reasons. There is no near-term plan of adopting C++20 modules (multiple reasons, which I probably can't represent well).
> We've been disabling header modules for clangd & other tools for a long time. But it's come to a head, and we expect to get someone working on enabling modules in clangd in ~2 months.
> As I understand it, Meta are in a similar situation (though their solution is to version-lock clangd with the toolchain, keep modules enabled, and accept that some things are broken).
>
> I suspect the answer for header modules is that we can study this patch and understand what the equivalents of graph nodes/deps/names/scanning look like for explicit header modules, and understand that we'll be able to abstractify some names and add some levels of indirection and it'll all work out.

Yeah. On the one hand, I always think that named modules are far different from header modules (including clang header modules and header units) for tools. Since for tools, it makes sense to fallback to traditional inclusion when they saw header units or header modules. For example, 
for clang header modules, IIUC, clangd can work if we rewrite the compile commands to strip a lot of clang header modules related options. And we reuse the rewrite process in the patch. (I know there are some exceptions due to macros). On the other hand, I'm curious about how do you handle/model clang explicit header modules with compilation database? I didn't look about this. Is the headers an entry in the compilation database? And if is it possible for one header to have multiple BMIs? Such problems look not easy to handle so I prefer to fallback it to inclusions.

> Our specific build system setup is that all module-build actions and inputs explicit (-fmodule-file=, -fmodule-map-file=). The build system will not produce usable PCMs due to version skew.
> I know that Meta folks *would* like to make use of available PCMs from the build system.

Also it looks like to me the Meta's method are not  a solution we want. This looks basically what I want for named modules half a year ago. Basically I did nothing and just tried to let clangd fallback to clang to handle everything.

---

> support for large projects
>
> Apart from the concrete scanning of deps, keeping the full project module graph in memory won't always be possible. (It's a perfectly reasonable default implementation though).
>
>> We'll need some abstraction layer (like CompilationDatabase is to compile_commands.json) that exposes enough data to run the algorithms we need, without exposing so much that you have to hold the whole graph in memory. It could be backed by in-memory depscan results, or a build-system artifact, or a live query of the build system.

Yeah, I agree the performance is a key point. In the current design, the ModulesDependencyGraph is only visible to ModulesManager so I feel we left the space for further optimization while I don't have concrete ideas. The current patch only left 5 interfaces for outer users  (currently only TUScheduler) (the interfaces: UpdateNode(PathRef), isReadyToCompile(PathRef), HasInvalidDependencies(PathRef), addCallbackAfterReady(PathRef, std::function<void()>) and GenerateModuleInterfacesFor(PathRef)), which didn't leak any special data to outer users. So I feel while it is OK to adopt ideas to enhance the scaling ability, it is OK to forward now since we've left the space for further optimization. (premature optimization is the root of all evil :) )

> For "every time we update a file, all the affected files (e.g., we changed a header file) will be re-scanned" - we need a way to express this more abstractly than "re-scanned", and more narrowly than "all the affected files" - at least it needs to be limited to all the files that could themselves affect any open file.

Yeah, actually now this patch doesn't make it **explicitly**. Now we achieve this **implicitly** by making `TUScheduler::update` (which will be called every time the TU changes) to call `ModulesManager::UpdateNode(PathRef)`. What I want to do is to let `ModulesManager::UpdateNode(PathRef Node)` to call `TUScheduler::update` for the users of Node.

So here "re-scanned" means to call `TUScheduler::update`. And "all the affected files" is not reflected in the patch actually. Do you have suggestions to improve this or do you feel it is OK to make it implicitly as now?

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153114/new/

https://reviews.llvm.org/D153114