[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

Tue Oct 11 01:56:49 PDT 2022

iains added a comment.

In D134267#3848862 <https://reviews.llvm.org/D134267#3848862>, @ChuanqiXu wrote:

> In D134267#3848793 <https://reviews.llvm.org/D134267#3848793>, @iains wrote:
>
>> In D134267#3848745 <https://reviews.llvm.org/D134267#3848745>, @ChuanqiXu wrote:
>>
>>>> FAOD: I think this should not be about "who's patches" - but about the compilation model and command lines that we want to end up with.
>>>
>>> BTW, in the current context, when I say "my" and "your", I just want to refer the corresponding things. There is no means to offend.
>>
>> sure - no problem.
>>
>> I guess we ought to be a bit more clear:
>> There is implementation divergence between GCC and clang and from the point to view of users and build systems having two different command line sets is not helpful.
>>
>> The patches I drafted were aimed at removing that divergence.
>> If that is not possible (because the ODR analysis requires steps that make it very difficult or inefficient to do so) then some more thought and/or restructuring is needed.
>
> Yeah, now this is more clear.
>
>>> In D134267#3848672 <https://reviews.llvm.org/D134267#3848672>, @iains wrote:
>>>
>>>>> (2) Your scheme saves the time for deserialization. However, clang implement many ODR checks in the deserialization part. Skipping the deserialization will omit these ODR checks, which will break the semantics. And this is the reason why that scheme may not be good.
>>>>
>>>> What you seem to be saying here is that:
>>>>
>>>> `clang++ -std=c++20 foo.cxxm -c -o foo,o`
>>>>
>>>> edit;  maybe I should have written -xc++ foo.cxxm
>>>>
>>>> Does different ODR analysis from:
>>>>
>>>> `clang++ -std=c++20 foo.cxxm --precompile -o foo.pcm`
>>>> `clang++ -std=c++20 foo.pcm -o foo.o`
>>>>
>>>> If so, then that is of some concern since those are both valid compilations in the current compiler.
>>>>
>>>> In the first case, the back end is connected as an AST consumer to the front - there is no serialise/deserialise.
>>>
>>> No. **Currently**, in clang, the following two is the same:
>>>
>>> `clang++ -std=c++20 foo.cxxm -c -o foo.o`
>>
>> That's why I added the edit ...
>>
>> `clang++ -std=c++20 -xc++ foo.cxxm -c -o foo.o`
>>
>> AFAIU that will not produce any error for the user?
>>
>> suppose I have foo.cxx which includes modules and has local declarations?
>> `clang++ -std=c++20  foo.cxx -c -o foo.o`
>>
>>> with
>>>
>>> `clang++ -std=c++20 foo.cxxm --precompile -o foo.pcm`
>>> `clang++ -std=c++20 foo.pcm -o foo.o`
>>>
>>> The first compilation will generate `.pcm` files in the `/tmp` directories and compile the `.pcm` files in the `/tmp` directories.
>>
>> yes, I should have added the -xc++ in the first case ;)
>
> Oh, I got it. Now it makes sense. I misses the edit : (.
>
> The answer is:
> (1) Now, the behavior is different. And I once sent a bug report for it.
> (2) **Now** there won't be direct error message. And I **was** planned to add it. This is the reason why I closed the above bug report : )
> (3) If we're going to make it an error, this is not decided yet.

If we are going to **//require//** the serialisation/deserialisation for correctness then IMO we should have an error for an incorrect compilation.

>>> We could verify it by inserting debugging informations. Or writing a ODR-mismatch example, then when the compiler emits an error, we'll find the compiler emits the error in the deserialization part. I met these 2 situations for many times : )
>>>
>>>> The model I implemented simply streams the AST instead of throwing it away ...
>>>
>>> So that model misses the ODR checks, which is not good. Since it matters with the semantics.
>>
>> Understood - I wonder if we have layering right here - surely Sema should be responsible for ODR checks?
>> Is there any reason that relevant hashes cannot be generated on the AST //**without**// streaming it?
>> Was this model intended from the start ?
>> (the fact that one can connect a direct AST consumer to the back end, suggests that enforcing serialisation/deserialisation was not expected to be a requirement at some stage)
>
> In fact, we do the ODR checks in Sema too. (Would you feel it amazing? I felt amazing when I realize this at the first time)
>
> For the following example, the ODR check is made in Deserializer:
>
>   module;
>   #include "something"
>   export module any_module_name;
>   import m;
>
> or
>
>   // decls
>   import m;
>
> or
>
>   import m1;
>   import m2;
>
> In the above cases, the ODR check is made in deserialization part. And in the following examples, the ODR check is made in Sema part:
>
>   import m;
>   // decls
>
> What is the principle? I think it is caused by the on-the-fly compilation principle (The name is constructed by myself. I think you know what I am saying.)
>
> I mean, when we (the compiler) parse a declaration and we want to know if it violates the ODR, we made it in the Sema part. And when we import something (precisely, read something from a module, since clang did lazy loading), we did the ODR check in deserialization.
>
> Although I felt odd at first, but now I feel it reasonable in some levels. How do you feel now?

I still feel that deserialisation should do deserialisation, and Sema should be doing the diagnoses when it tries to use a decl (in fact, I would expect that to be required for correctness in the C++20 scheme, since the point of use is significant).  I will defer to folks who know the history better - but this seems like a layering bug to me still.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267