[PATCH] D118586: [C++20][Modules][3/8] Initial handling for module partitions.

Chuanqi Xu via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Sun Feb 20 18:09:46 PST 2022


ChuanqiXu accepted this revision.
ChuanqiXu added inline comments.


================
Comment at: clang/lib/Sema/SemaModule.cpp:177
+  if (IsPartition) {
+    ModuleName += ":";
+    ModuleName += stringFromPath(Partition);
----------------
urnathan wrote:
> iains wrote:
> > ChuanqiXu wrote:
> > > iains wrote:
> > > > urnathan wrote:
> > > > > iains wrote:
> > > > > > ChuanqiXu wrote:
> > > > > > > I chose '-' in my implementation since here ModuleName refers to the name in filesystem, right? And I remember '-' is the choice of GCC. So let's keep consistency.
> > > > > > This is not my understanding.
> > > > > > 
> > > > > > ModuleName is the "user-facing" name of the module that is described in the source, and we use this in diagnostics etc.
> > > > > > 
> > > > > > The translation of that name to an on-disk name can be arbitrary and there are several mechanisms in clang already (e.g. -fmodule-file=A:B=foo.pcm) the module loader uses these to find the module on the basis of its user-facing name where required.
> > > > > > 
> > > > > > When we have P1184, it is even more important that the interface is supplied with the name that the user will put into the source code.
> > > > > > 
> > > > > > 
> > > > > I agree with Iain, we should use ':' is module names here.  When mapping a named module to the file system some translation is needed because ':' is not permitted in file names on some filesystems (you know the ones!)
> > > > (just to expand a little more)
> > > > 
> > > > the on-disk name needs to be chosen suitably for the platform and by the user and/or the build system.
> > > > 
> > > > When the FE chooses a default filename (which is done in building jobs, not in the Parser of course) it chooses one based on the source filename.  It would be most unfortunate if the Parser/Sema needed to understand platform filename rules.
> > > > 
> > > > When you do  'clang -module-file-info' (with the existing or updated version) you should see the module name as per the source code, the translation would only apply to the filename itself.
> > > > 
> > > > - to answer a later comment:
> > > > 
> > > > when we do -fmodule-file=A_B.pcm  and A_B.pcm contains a module named A:B the loader notices this pairing when it pre-loads the module, so that when we ask for "A:B" the loader already knows which on-disk file contains. it.
> > > > 
> > > > if we use the HeaderSearch mechanisms (when we want to figure out a module-name<=> filename pairing without loading the module) we can use -fmodule-name=A:B=A_B.pcm,
> > > > 
> > > > These mechanisms work today, but P1184 is a more flexible mechanism and avoids having  massive command lines with many -fmodule-file= cases.
> > > > 
> > > But what if we need to import multiple modules? In our current workflow, we would like to compile importing module in following style:
> > > ```
> > > clang++ -std=c++20 -fprebuilt-module-path=path1 -fprebuilt-module-path=path2 ... unit.cpp(m) ...
> > > ```
> > > 
> > > And unit.cpp(m) may look like:
> > > ```
> > > export module M;
> > > import :parta;
> > > import :partb;
> > > import :partc;
> > > ```
> > > 
> > > And our compiler would lookup `M-parta.pcm`, `M-partb.pcm` and `M-partc.pcm` in the path given in the command line. However, in current implementation, it would lookup `M:parta.pcm`, `M:partb.pcm` and `M:partc.pcm` in the path but it might fail. So my point here is that the current implementation doesn't work well with `fprebuilt-module-path`. And I don't think we should give up `fprebuilt-module-path` to support multiple `fmodule-file`. The `fprebuilt-module-path` is easier to understand and use for real projects. Even if we support multiple `-fmodule-file`, people might be frustrating to add many `fmodule-file` if they need to import many units.
> > > 
> > > So I really insist on this. Or at least let's add one option, something like `-fmodule-partition-separator` here.
> > The semantics of clang hierarchical module names are different from C++20 module names.
> > 
> > C++20 does not specify any relationship between the module name and the filename - that task is for the user, build system (or in the case we have with clang where it can integrate some of that functionality, the lookup mechanism).
> > 
> > out of interest if the user puts:
> > 
> > ```
> > export module A.B:C.D;
> > ```
> > 
> > how do you represent that on disk in your implementation?
> > 
> > ----
> > 
> > In my understanding, if the user does:
> > 
> > ```
> > export module A.B:C.D;
> > 
> > clang -cc1 -std=c++20 -emit-module-interface foo.cpp -o xyzzy.pcm
> > ```
> > then we should see :
> > 
> > ```
> > clang -module-file-info xyzzy.pcm
> > 
> > Information for module file 'xyzzy.pcm':
> >   Module format: raw
> >   Generated by this Clang: (/repos/llvm-git.git 97fc6544d6a09f161d90417db05674a2b3d2a5fe)
> >   Module name: A.B:C.D
> > 
> > ```
> > 
> > I think it would be wrong if that printed:
> > ` Module name: A.B-C.D`
> > 
> > The primary module name is A.B (_not_ A) and there is no implication that we would see
> > A/B/...
> > 
> > What would happen if we ported clang to some (unknown, unlikely) platform where '-' was not a legal pathname character - we should not expect to have to change Sema for this, right?
> > 
> > -----
> > 
> > In summary my understanding is:
> > 
> > * The compiler (Parser/Sema) should know the C++20 module name as it is written by the user.
> > 
> > * You **can** achieve your objective of using the prebuilt-module-path, but the translation from C++20 module name to on-disk name needs to happen in the lookup, not in artificially changing the module name internally to the compiler.
> > 
> > -----
> > 
> > As a general point, there are already (too) many modules command line flags, and we will probably need to add a few more for C++20.  I have tried in my implementation to keep separate those for C++20 and those for implicit / hierarchical and modules ts.
> > 
> > IMHO it would be better for the user if modules were modal in the driver - and the driver rejected options that do not apply to the mode in action - but that is not for this patch!
> > 
> > 
> Information should be presented to the user in the form the user understands -- M-P is not such a format.  As Iain says, the association from module names to file names is arbitrary, and baking (a default?) into Sema is not the right thing.
> 
> In some cases the compiler may need to tell the user both.  For instance
>   cannot find module 'M:P' (random/path/M-P.pcm)
> 
> Let's not derail the initial pieces of partitions with this complex mapping problem.
I still think the compiler should have a `-fmodule-partition-separator` option to tell the partition seperator in filesystem to ease the use of C++20 modules. It looks like that the current implementation couldn't handle the case that a module unit imports multiple  partitions. 

But I agree with there already too many options about clang modules. And I agree with we could handle the problem in later patches. I have no objection to stop the patch landing.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D118586/new/

https://reviews.llvm.org/D118586



More information about the cfe-commits mailing list