[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

Mon Oct 17 18:09:25 PDT 2022

dblaikie added a comment.

In D134267#3851479 <https://reviews.llvm.org/D134267#3851479>, @ChuanqiXu wrote:

> In D134267#3849629 <https://reviews.llvm.org/D134267#3849629>, @dblaikie wrote:
>
>> To the original point I was making about implicit modules (which might've been my own confusion due to the term being brought up in D134269 <https://reviews.llvm.org/D134269>):
>>
>> Implicit modules is the situation where the compiler, finding that it needs a module while compiling some usage, knows how to go out and spawn a new process to create that module and then cache it. The build system never knows about modules, their builds or their dependencies.
>>
>> This patch had some superficial similarities - specifically around having an on-disk cache of modules that the user/build system/etc invoking the compiler isn't necessarily aware of/managing/invalidating/observing/etc. But it does differ in an important way from implicit modules in that the compiler won't implicitly build modules - you still have to specify the modules and their usage as separate compilations in-order (ie: modules need to be explicitly built before their usage). I think that makes a big difference to this being feasible, at least in the small scale.
>
> Oh, now I got your point. It is caused by the imprecise name. My bad.
>
>> The remaining concern is that this feature should likely not be used by a build system - because it won't know the dependencies (or, if it does know the dependencies then the build system, not the compiler, should be managing the BMIs) & so won't know how to schedule things for maximum parallelism without incorrect ordering, and correct rebuilding of dependencies when necessary.
>
> I agree it won't reach the maximum parallelism. But I think it should be able to rebuild correctly if the build system understands the dependencies between module unit. For example, if B.cpp imports module A, and A is defined in A.cppm. And when A.cppm changes, it will be fine if the build system will compile A.cppm first and compile B.cpp then. I think this is achievable by the build system. (For example, the P1689 <https://reviews.llvm.org/P1689> proposal I'm working on). So the problem becomes a performance problem instead of a correctness problem. So it looks not bad to me. I still feel it is not good to make perfect as the enemy of better.

The build system still needs to know that B.cppm depends on A.cppm - and once it knows that, it's not a huge cost for it to know the name of the file that represents that dependency and is produced by A.cppm and passed to B.cppm, I think?
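
A minimal sketch of that point, not taken from the patch: once a build system has the import graph, deriving the BMI file names and passing them explicitly is a small extra step. The file names, the `A.cppm` -> `A.pcm` naming convention, the assumption that the module name equals the file stem, and the helpers `bmi_name`, `module_name` and `build_plan` are all hypothetical. The commands are only built as strings, using clang's `--precompile` and `-fmodule-file=<name>=<path>` options.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical project: each module unit maps to the module units it imports.
imports = {
    "A.cppm": [],
    "B.cppm": ["A.cppm"],
}


def bmi_name(source):
    # Assumed naming convention for this sketch: A.cppm -> A.pcm
    return source.rsplit(".", 1)[0] + ".pcm"


def module_name(source):
    # Assumption for this sketch: the module name equals the file stem.
    return source.rsplit(".", 1)[0]


def build_plan(imports):
    """Return precompile commands ordered so that every BMI is built before its users."""
    plan = []
    # static_order() yields each node after all of its predecessors (its imports).
    for src in TopologicalSorter(imports).static_order():
        deps = [
            f"-fmodule-file={module_name(dep)}={bmi_name(dep)}"
            for dep in imports[src]
        ]
        plan.append(" ".join(
            ["clang++", "-std=c++20", "--precompile", src, *deps, "-o", bmi_name(src)]
        ))
    return plan


for cmd in build_plan(imports):
    print(cmd)
```

The same graph that gives the correct ordering also gives the build system the output name of each dependency, so the compiler does not need a hidden cache to find it.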

In short - seems like we should separate the cache discussion from "one phase" compilation, meaning a single build action that takes a .cppm and generates both a .o and a .pcm in one compiler/driver invocation. (Maybe something like this is what @iains has already sent out in another review?)

To @iains's point about "it'd be good if we didn't have to invoke two underlying commands from the one driver invocation" - yeah, agreed. Though I wouldn't mind one step being "add the driver interface" and another being "fix whatever serialization issues etc. might stand in the way of doing .cppm->{.o,.pcm} in a single action without serialization, so we can then start stripping stuff out of the .pcm, since it'll only need to contain the interface and won't have to carry enough info for .o generation anymore".

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267