[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

Mon Oct 10 23:55:04 PDT 2022

iains added a comment.

In D134267#3848539 <https://reviews.llvm.org/D134267#3848539>, @ChuanqiXu wrote:

> In D134267#3847399 <https://reviews.llvm.org/D134267#3847399>, @iains wrote:
>
>> 

>> I am concerned that with the current scheme (with or without the change in this review) we could end up with three process launches per modular file:
>>
>> - one to determine the deps
>> - one to produce the BMI
>> - one to produce the object
>>
>> At least with GCC (and a patch series like the one I drafted) we can reduce that to 2
>>
>> With the interfaces mentioned in P1184 <https://reviews.llvm.org/P1184>, it is plausible to make that 1 (at the expense of potentially many stalled builds - but stalled early in compilation, so hopefully with the smallest memory footprint we could achieve).
>
> I pretty believe the current scheme is better for the following reasons:
> (1) The current scheme has higher paralellism potentially. See the `Why two-phase compilation?` section in https://reviews.llvm.org/D134269 for examples. I believe this should be the future direction to speedup the compilation.

That is a possibility, at the expense of more process launches - and I do not think that the corresponding model for the server-interactive approach has yet been fully explored.

We go to some trouble (with the driver invoking the compiler in-process) to avoid unnecessary process launches .. with the current state of hardware, it could well be more efficient to have many processes waiting rather than many processes launched.

(I am not making a claim here, just observing that the data are incomplete, since we do not have a corresponding model offered for the client-server case, and we do not have any kind of measurements - these are all "thought experiments").

> (2) Your scheme saves the time for deserialization. However, clang implement many ODR checks in the deserialization part. Skipping the deserialization will omit these ODR checks, which will break the semantics. And this is the reason why that scheme may not be good.

What you seem to be saying here is that:

`clang++ -std=c++20 foo.cxxm -c -o foo,o`

Does different ODR analysis from:

`clang++ -std=c++20 foo.cxxm --precompile -o foo.pcm`
`clang++ -std=c++20 foo.pcm -o foo.o`

If so, then that is of some concern since those are both valid compilations in the current compiler.

In the first case, the back end is connected as an AST consumer to the front - there is no serialise/deserialise.  The model I implemented simply streams the AST instead of throwing it away ...

(repeating here, it is not the deserialisation that I think is key [although that is also useful] but minimising process launches).

> (3) It makes implementation more complex.

Agreed, but it's reasonably self-contained.

> For the first 2 reasons, I believe clang is in the right directions. Also for the filename suffixes, it is helpful for static analyzing tools to know whether or not if a file is module unit. So the static analyzing tools can scan every `.cppm` file in the working directory automatically in the preparation time. For example, if the build system are ready enough, we can omit the `module files` in the build script like: https://github.com/ChuanqiXu9/llvm-project/blob/94a71a7046a8672a7e06311b0c05c0a8c9ae4619/P1689Examples/HelloWorld/CMakeLists.txt#L18-L19.
> I believe this is the reason why MSVC made a similar choice. (clang is easy to support MSVC's 'ixx' extension).

There seem to be two schools of thought.  (1) It is good to use extensions to determine the file contents (2) we should not impose a requirement for magic extension names if the implementation can determine the content without.

For my part, I do prefer the second .. but of course will be happy to go with consensus,

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267