[PATCH] D153114: [clangd] [C++20] [Modules] Support C++20 modules for clangd

Sam McCall via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Aug 24 04:47:00 PDT 2023


sammccall added a comment.

Aside: I've been doing some investigation into how modules+clangd could work in our huge monorepo (specifically bazel + distributed build cluster).
It looks feasible (with some serious effort) to get all BMI/index/etc data we need for transitive modules to be generated by a copy of clangd running in the distributed build system itself, and likely this will perform better than anything involving building in-process.
So it seems plausible to add a coarse-grained extension point for this, and assume that the "find-and-build-modules" part we're adding now doesn't have to scale to that size. (We'd still like it to scale to projects the size of Chromium, which is roughly 10x LLVM and our usual proxy for "large local project"). I think we can make some simplifying assumptions even beyond experimental stage:

- the module graph will fit in memory
- the compilation database will fit in memory
- the set of files in the project is enumerable
- reading all the source files in the project is possible: takes minutes but not hours

In D153114#4612843 <https://reviews.llvm.org/D153114#4612843>, @ChuanqiXu wrote:

> @sammccall here is a question (or double check) about the intended initial version.
>
> This is the requirement for the initial version:
>
>> Don't attempt any cross-file or cross-version coordination: i.e. don't try to reuse BMIs between different files, don't try to reuse BMIs between (preamble) reparses of the same file, don't try to persist the module graph. Instead, when building a preamble, synchronously scan for the module graph, build the required PCMs on the single preamble thread with filenames private to that preamble, and then proceed to build the preamble.
>
> And all of us agree that it will be the job of the preamble to manage the module files in the end of the day. I just want to ask or double check that it might be OK to not related the module files with preamble in the initial version, right?
>
> Since the definition (or description) of preamble is:
>
>> A preamble can be reused between multiple versions of the file until invalidated by a modification to a header, compile commands or modification to relevant part of the current file.
>
> And it looks like it beyonds the scope of the our requirement of the initial patch.

On the one hand, if building modules in the AST works and is simpler, sure we can delay the "preamble optimization" for them.

But I don't really expect this is the case:

- as we've established, you can reach `import` statements inside the preamble region (`#include` a textual header that itself contains an `import`). So you need to build at least some of the required BMIs before building the preamble, doing it when parsing the main AST is not enough. (I believe the current version of the patch will just fail to parse some code in the preamble in this case).
- rebuilding BMIs with every preamble (not trying to reuse/cache/version them yet) will be pretty slow. But if we rebuild them every main-file reparse (i.e. every keystroke!) I think this is going to be intolerably slow to the point of not even being useful as a prototype.
- I only see two things that are really different between building during preamble vs building during AST, and neither are a lot of work:
  - invalidation: we need the preamble to be invalid if imports outside the preamble region have changed, or if the inputs of modules themselves have changed. But I think we can punt on all of this for now, and just not rebuild the preamble sometimes when we should. (Workaround: manually touch the preamble region after adding imports)
  - we need to attach the modules to PreambleData, rather than having them local to `ParsedAST::build`. But this really does not seem like much work at all - as we're not (yet) sharing modules through the filesystem, encapsulating the lifetime of a generated module in an object is something we should be doing right from the start anyway.
- given this, adding the code to ASTWorker and then moving it to PreambleWorker later seems like it doesn't save anything much over doing it right in the first place


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153114/new/

https://reviews.llvm.org/D153114



More information about the cfe-commits mailing list