[PATCH] D153114: [clangd] [C++20] [Modules] Support C++20 modules for clangd

Chuanqi Xu via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Aug 22 00:31:50 PDT 2023


ChuanqiXu added a comment.

In D153114#4603579 <https://reviews.llvm.org/D153114#4603579>, @sammccall wrote:

> In D153114#4602414 <https://reviews.llvm.org/D153114#4602414>, @ChuanqiXu wrote:
>
>>> Don't attempt any cross-file or cross-version coordination: i.e. don't try to reuse BMIs between different files, don't try to reuse BMIs between (preamble) reparses of the same file, don't try to persist the module graph. Instead, when building a preamble, synchronously scan for the module graph, build the required PCMs on the single preamble thread with filenames private to that preamble, and then proceed to build the preamble.
>>
>> Do I understand right? If I understand correctly, I fully agree with the direction. We can go slowly, as long as we keep moving forward.
>>
>> Then I'd like to leave the patch as-is for referencing and create new patches following the suggestion.
>
> Yes, that's the suggestion, and that plan makes sense to me, thanks!
>
> I did some more thinking about this (having a concrete implementation helps a lot!) and had a couple more thoughts.
> At some point we should write down a design somewhere, need to strike a balance between doing it early enough to be useful but late enough that we've understood!

Yeah, then let's make the page into some design ideas discussing page.

> Dep scanning - roles
> --------------------
>
> IIUC we do this for two reasons:
>
> - to identify what module names we must have PCMs for in order to build a given TU (either an open file, or a module we're building as PCM)
> - to build a database mapping module name => filename, which we compose with the CDB to know how to build a PCM for a given module name
>
> I think it would be good to clearly separate these. The latter is simpler, more performance-critical, async, and is probably not used at all if the build system can tell us this mapping.
> The latter is more complex, and will always be needed synchronously for the mainfile regardless of the build system.

Yes, agreed.

> Dep scanning - implementation
> -----------------------------
>
> The dep scanner seems to work by getting the compile command and running the preprocessor. This is fairly heavyweight, and I can't see anywhere it's going into single-file mode - is it really reading all `#included` headers? This is certainly not workable for reparses of the mainfile (when no headers have changed).
>
> It seems unneccesary: the standard seems to go to some lengths to ensure that we (almost) only need to lex the top-level file:
>
> - module and import decls can't be produced by macros (seems to be the effect of the `pp-module` directive etc)
> - module and import decls can't be `#include`d (definition of `module-file` and `[cpp.import]` rules)
>
> The wrinkle I see is that some PP usage is possible: the module name can be produced by a macro, and imports can be `#ifdef`d. I think the former is very unlikely (like `#include MACRO_NAME`) and we can not support it, and the latter will just result in us overestimating the deps, which seems OK.
> You have more context here though, and maybe I'm misreading the restrictions placed by the standard. Today clang doesn't seem to enforce most of these sort of restrictions, which is probably worth fixing if they're real.
>
> (This doesn't apply to header modules: it's perfectly possible to include a textual header which includes a modular header, and it's impossible to know without actually preprocessing. This divergence is nasty, but I don't think we should pessimize standard modules for it).

There are some problems due to the complexities of the standard...

The take away may be:

- module and import decls can't be produced by macros (seems to be the effect of the `pp-module` directive etc)
  - yes
- module and import decls can't be `#include`d (definition of `module-file` and `[cpp.import]` rules)
  - the module declarations can't be `#include`d.
  - but the import decls can be `#include`d partially. See the discussion of https://github.com/llvm/llvm-project/issues/59688 for detail. The explanation is:
    - the wording(http://eel.is/c++draft/cpp.import#3) is "If a pp-import is produced by source file inclusion (including by the rewrite produced when a #include directive names an importable header) while processing the group of a module-file, the program is ill-formed."
    - and the definition of `module-file` (http://eel.is/c++draft/cpp.pre#nt:module-file) is `pp-global-module-fragment pp-module group pp-private-module-fragment`.
    - so the phrase `the group of a module-file` only refers to the `group` in the definition of `module-file` literally. We can't expand the grammar.
- the module name can be produced by a macro
  - yes
- imports can be #ifdefd.
  - yes. And this is a pretty important use-case for using modules in practice.

So possibly we have to look into the `#include`s when scanning.

> Interaction with preamble
> -------------------------
>
> At a high level, `import` decls should be processed with the preamble: they should change infrequently, rebuilding modules is expensive, coarse-grained work, we want to make the same policy decisions on whether to use stale PCMs or block on fresh ones etc.
> However they don't appear in a prefix of the file, and this is pretty important to how the preamble action works, so exactly in what sense are they part of the preamble?
>
> I believe the best answer is:
>
> - "preamble" is really a set of required artifacts + conditions to check for validity
> - `import foo` in a file means `foo.pcm` is a required artifact, not that `preamble.pcm` contains an `ImportDecl`
>
> So given this code:
>
>   module;
>   #include <x>
>   module foo;
>   import dep1;
>   module :private;
>   import dep2;
>
> The "preamble region" should end after `#include <x>` and preamble.pcm should contain the AST & PP state up to that point.
> Meanwhile `dep1.pcm` and `dep2.pcm` are separate PCM files that will be loaded on each parse.
> For a preamble to be usable at all, we need to have built `preamble.pcm`, `dep1.pcm`, `dep2.pcm`.
> For a preamble to be up-to-date, the preamble region + set of imported modules must be unchanged, the preamble.pcm must be up to date with respect to its sources, and the module PCMs must be up-to-date with respect to their sources. (unclear exactly how to implement the latter, may need to add extra tracking)
>
> (When building this module as a PCM we may not want to treat dep2 as a dependency or parse the private fragment... but this is not relevant to preambles as we won't be using a preamble for this anyway)

Agreed basically. To make it clear, I think what you proposed may be to make `clang::clangd::PreambleData` contains the paths (or wrapped data structures) to `dep1.pcm` and `dep2.pcm`. And other parts of clangd should interact with the `dep1.pcm` or `dep2.pcm` with `clang::clangd::PreambleData`?

> For a preamble to be up-to-date, the preamble region + set of imported modules must be unchanged, the preamble.pcm must be up to date with respect to its sources, and the module PCMs must be up-to-date with respect to their sources. (unclear exactly how to implement the latter, may need to add extra tracking)

For this issue, (and the related ABA issue you gave inline comments), my thought **was** that the code intelligence doesn't have to have super strict real time requirement with the compilation systems. I mean, in my real experience, it is not so bad to see out-of-date results from code intelligence than the results from compilation systems. (I know this may not be accepted.)

> inside a module unit, imports must appear directly underneath the module declaration, above non-trivial declarations

Then this is not so correct. With the explanation in the same link above (https://github.com/llvm/llvm-project/issues/59688),  the following example may be valid:

  cpp
  // a.cpp
  export module a;



  hpp
  // b.hpp
  import a;



  cpp
  // c.cpp
  module;
  #include <b.hpp>
  export module c;


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153114/new/

https://reviews.llvm.org/D153114



More information about the cfe-commits mailing list