[PATCH] D118586: [C++20][Modules][3/8] Initial handling for module partitions.

Fri Feb 18 01:47:46 PST 2022

iains added inline comments.

================
Comment at: clang/lib/Sema/SemaModule.cpp:177
+  if (IsPartition) {
+    ModuleName += ":";
+    ModuleName += stringFromPath(Partition);
----------------
ChuanqiXu wrote:
> iains wrote:
> > urnathan wrote:
> > > iains wrote:
> > > > ChuanqiXu wrote:
> > > > > I chose '-' in my implementation since here ModuleName refers to the name in filesystem, right? And I remember '-' is the choice of GCC. So let's keep consistency.
> > > > This is not my understanding.
> > > > 
> > > > ModuleName is the "user-facing" name of the module that is described in the source, and we use this in diagnostics etc.
> > > > 
> > > > The translation of that name to an on-disk name can be arbitrary and there are several mechanisms in clang already (e.g. -fmodule-file=A:B=foo.pcm) the module loader uses these to find the module on the basis of its user-facing name where required.
> > > > 
> > > > When we have P1184, it is even more important that the interface is supplied with the name that the user will put into the source code.
> > > > 
> > > > 
> > > I agree with Iain, we should use ':' is module names here.  When mapping a named module to the file system some translation is needed because ':' is not permitted in file names on some filesystems (you know the ones!)
> > (just to expand a little more)
> > 
> > the on-disk name needs to be chosen suitably for the platform and by the user and/or the build system.
> > 
> > When the FE chooses a default filename (which is done in building jobs, not in the Parser of course) it chooses one based on the source filename.  It would be most unfortunate if the Parser/Sema needed to understand platform filename rules.
> > 
> > When you do  'clang -module-file-info' (with the existing or updated version) you should see the module name as per the source code, the translation would only apply to the filename itself.
> > 
> > - to answer a later comment:
> > 
> > when we do -fmodule-file=A_B.pcm  and A_B.pcm contains a module named A:B the loader notices this pairing when it pre-loads the module, so that when we ask for "A:B" the loader already knows which on-disk file contains. it.
> > 
> > if we use the HeaderSearch mechanisms (when we want to figure out a module-name<=> filename pairing without loading the module) we can use -fmodule-name=A:B=A_B.pcm,
> > 
> > These mechanisms work today, but P1184 is a more flexible mechanism and avoids having  massive command lines with many -fmodule-file= cases.
> > 
> But what if we need to import multiple modules? In our current workflow, we would like to compile importing module in following style:
> ```
> clang++ -std=c++20 -fprebuilt-module-path=path1 -fprebuilt-module-path=path2 ... unit.cpp(m) ...
> ```
> 
> And unit.cpp(m) may look like:
> ```
> export module M;
> import :parta;
> import :partb;
> import :partc;
> ```
> 
> And our compiler would lookup `M-parta.pcm`, `M-partb.pcm` and `M-partc.pcm` in the path given in the command line. However, in current implementation, it would lookup `M:parta.pcm`, `M:partb.pcm` and `M:partc.pcm` in the path but it might fail. So my point here is that the current implementation doesn't work well with `fprebuilt-module-path`. And I don't think we should give up `fprebuilt-module-path` to support multiple `fmodule-file`. The `fprebuilt-module-path` is easier to understand and use for real projects. Even if we support multiple `-fmodule-file`, people might be frustrating to add many `fmodule-file` if they need to import many units.
> 
> So I really insist on this. Or at least let's add one option, something like `-fmodule-partition-separator` here.
The semantics of clang hierarchical module names are different from C++20 module names.

C++20 does not specify any relationship between the module name and the filename - that task is for the user, build system (or in the case we have with clang where it can integrate some of that functionality, the lookup mechanism).

out of interest if the user puts:

```
export module A.B:C.D;
```

how do you represent that on disk in your implementation?

----

In my understanding, if the user does:

```
export module A.B:C.D;

clang -cc1 -std=c++20 -emit-module-interface foo.cpp -o xyzzy.pcm
```
then we should see :

```
clang -module-file-info xyzzy.pcm

Information for module file 'xyzzy.pcm':
  Module format: raw
  Generated by this Clang: (/repos/llvm-git.git 97fc6544d6a09f161d90417db05674a2b3d2a5fe)
  Module name: A.B:C.D

```

I think it would be wrong if that printed:
` Module name: A.B-C.D`

The primary module name is A.B (_not_ A) and there is no implication that we would see
A/B/...

What would happen if we ported clang to some (unknown, unlikely) platform where '-' was not a legal pathname character - we should not expect to have to change Sema for this, right?

-----

In summary my understanding is:

* The compiler (Parser/Sema) should know the C++20 module name as it is written by the user.

* You **can** achieve your objective of using the prebuilt-module-path, but the translation from C++20 module name to on-disk name needs to happen in the lookup, not in artificially changing the module name internally to the compiler.

-----

As a general point, there are already (too) many modules command line flags, and we will probably need to add a few more for C++20.  I have tried in my implementation to keep separate those for C++20 and those for implicit / hierarchical and modules ts.

IMHO it would be better for the user if modules were modal in the driver - and the driver rejected options that do not apply to the mode in action - but that is not for this patch!

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D118586/new/

https://reviews.llvm.org/D118586