[PATCH] D118586: [C++20][Modules][3/8] Initial handling for module partitions.

Thu Feb 17 22:52:28 PST 2022

ChuanqiXu added inline comments.

================
Comment at: clang/lib/Sema/SemaModule.cpp:177
+  if (IsPartition) {
+    ModuleName += ":";
+    ModuleName += stringFromPath(Partition);
----------------
iains wrote:
> urnathan wrote:
> > iains wrote:
> > > ChuanqiXu wrote:
> > > > I chose '-' in my implementation since here ModuleName refers to the name in filesystem, right? And I remember '-' is the choice of GCC. So let's keep consistency.
> > > This is not my understanding.
> > > 
> > > ModuleName is the "user-facing" name of the module that is described in the source, and we use this in diagnostics etc.
> > > 
> > > The translation of that name to an on-disk name can be arbitrary and there are several mechanisms in clang already (e.g. -fmodule-file=A:B=foo.pcm) the module loader uses these to find the module on the basis of its user-facing name where required.
> > > 
> > > When we have P1184, it is even more important that the interface is supplied with the name that the user will put into the source code.
> > > 
> > > 
> > I agree with Iain, we should use ':' is module names here.  When mapping a named module to the file system some translation is needed because ':' is not permitted in file names on some filesystems (you know the ones!)
> (just to expand a little more)
> 
> the on-disk name needs to be chosen suitably for the platform and by the user and/or the build system.
> 
> When the FE chooses a default filename (which is done in building jobs, not in the Parser of course) it chooses one based on the source filename.  It would be most unfortunate if the Parser/Sema needed to understand platform filename rules.
> 
> When you do  'clang -module-file-info' (with the existing or updated version) you should see the module name as per the source code, the translation would only apply to the filename itself.
> 
> - to answer a later comment:
> 
> when we do -fmodule-file=A_B.pcm  and A_B.pcm contains a module named A:B the loader notices this pairing when it pre-loads the module, so that when we ask for "A:B" the loader already knows which on-disk file contains. it.
> 
> if we use the HeaderSearch mechanisms (when we want to figure out a module-name<=> filename pairing without loading the module) we can use -fmodule-name=A:B=A_B.pcm,
> 
> These mechanisms work today, but P1184 is a more flexible mechanism and avoids having  massive command lines with many -fmodule-file= cases.
> 
But what if we need to import multiple modules? In our current workflow, we would like to compile importing module in following style:
```
clang++ -std=c++20 -fprebuilt-module-path=path1 -fprebuilt-module-path=path2 ... unit.cpp(m) ...
```

And unit.cpp(m) may look like:
```
export module M;
import :parta;
import :partb;
import :partc;
```

And our compiler would lookup `M-parta.pcm`, `M-partb.pcm` and `M-partc.pcm` in the path given in the command line. However, in current implementation, it would lookup `M:parta.pcm`, `M:partb.pcm` and `M:partc.pcm` in the path but it might fail. So my point here is that the current implementation doesn't work well with `fprebuilt-module-path`. And I don't think we should give up `fprebuilt-module-path` to support multiple `fmodule-file`. The `fprebuilt-module-path` is easier to understand and use for real projects. Even if we support multiple `-fmodule-file`, people might be frustrating to add many `fmodule-file` if they need to import many units.

So I really insist on this. Or at least let's add one option, something like `-fmodule-partition-separator` here.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D118586/new/

https://reviews.llvm.org/D118586