[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

Tue Oct 11 00:49:52 PDT 2022

ChuanqiXu added a comment.

> FAOD: I think this should not be about "who's patches" - but about the compilation model and command lines that we want to end up with.

BTW, in the current context, when I say "my" and "your", I just want to refer the corresponding things. There is no means to offend.

In D134267#3848672 <https://reviews.llvm.org/D134267#3848672>, @iains wrote:

>> (2) Your scheme saves the time for deserialization. However, clang implement many ODR checks in the deserialization part. Skipping the deserialization will omit these ODR checks, which will break the semantics. And this is the reason why that scheme may not be good.
>
> What you seem to be saying here is that:
>
> `clang++ -std=c++20 foo.cxxm -c -o foo,o`
>
> edit;  maybe I should have written -xc++ foo.cxxm
>
> Does different ODR analysis from:
>
> `clang++ -std=c++20 foo.cxxm --precompile -o foo.pcm`
> `clang++ -std=c++20 foo.pcm -o foo.o`
>
> If so, then that is of some concern since those are both valid compilations in the current compiler.
>
> In the first case, the back end is connected as an AST consumer to the front - there is no serialise/deserialise.

No. **Currently**, in clang, the following two is the same:

`clang++ -std=c++20 foo.cxxm -c -o foo.o`

with

`clang++ -std=c++20 foo.cxxm --precompile -o foo.pcm`
`clang++ -std=c++20 foo.pcm -o foo.o`

The first compilation will generate `.pcm` files in the `/tmp` directories and compile the `.pcm` files in the `/tmp` directories.

We could verify it by inserting debugging informations. Or writing a ODR-mismatch example, then when the compiler emits an error, we'll find the compiler emits the error in the deserialization part. I met these 2 situations for many times : )

> The model I implemented simply streams the AST instead of throwing it away ...

So that model misses the ODR checks, which is not good. Since it matters with the semantics.

> (repeating here, it is not the deserialisation that I think is key [although that is also useful] but minimising process launches).

So from my point of view, it is much  more important to preserve the semantics than minimizing the process launches.

> That is a possibility, at the expense of more process launches - and I do not think that the corresponding model for the server-interactive approach has yet been fully explored.
>
> We go to some trouble (with the driver invoking the compiler in-process) to avoid unnecessary process launches .. with the current state of hardware, it could well be more efficient to have many processes waiting rather than many processes launched.
>
> (I am not making a claim here, just observing that the data are incomplete, since we do not have a corresponding model offered for the client-server case, and we do not have any kind of measurements - these are all "thought experiments").

Yeah, I agree that there more space to explore in the client-server model. But for the current building model, the 2 phase compilation is obviously better than the 1 phase compilation. And my point is, if one day we proved that there are many good points in the client/server model, it is not too late to add it in clang. And **currently** it is not a good reason to block the current patch, at least it is pretty simple and innocent.

> There seem to be two schools of thought. (1) It is good to use extensions to determine the file contents (2) we should not impose a requirement for magic extension names if the implementation can determine the content without.
>
> For my part, I do prefer the second .. but of course will be happy to go with consensus,

I'm confused the second. I mean what if the other static analyzing tools (including build system but not limited to) want to analyze the modular project, if they can only invoke the compiler to query the result. Then things look complex enough in the state. But assume we can implement great compiler interfaces. There are still problems. The tools need to pass the potential module unit files to the compiler. With the first choice, the tools can pass the module unit files precisely while with the second one, the input number will be much larger. So it looks really worse to me...

----

Since we discussed many things. Let me try to summarize to avoid missing the topic.

(1) For the client-server model, I think it is too far now. I am not against it. Also I don't feel the current simple patch would is not good for it. 
(2) Your implementation skips the deserialization part, which misses many ODR checks. I think this is not good.
(3) I suddenly realize that your **design** and my **design** are not overlapped in some perspective. I mean the patch is working for `*.cppm` files, and your design is intended for `*.cpp` files. Then a question is: "what would the patch do for `*.cppm` files"? If we don't handle it specially, in your implementation, when we compile a `.cppm` file, the `.pcm` file would be generated twice. Once in the `/tmp` and once in the place you specified. I think this is not acceptable.

So in another shorter word, for the current states of clang, the current patch is more natural and more appropriate with the current states of clang. And for your patch, I feel like it is not so fit for the current clang. I'm not saying not fit the current states is not good. Otherwise we don't have any inventions. I just want to say it requires more work.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267