[PATCH] D123837: [C++20] [Modules] Judge isInCurrentModule currently

Tue Apr 19 03:15:57 PDT 2022

ChuanqiXu added inline comments.

================
Comment at: clang/include/clang/Basic/Module.h:547-548
+    // <global> is the default name showed in module map.
+    if (isGlobalModule())
+      return "<global>";
+
----------------
I thought to add an assertion here. But I feel like it is not so necessary and friendly.

================
Comment at: clang/include/clang/Basic/Module.h:537-543
+  static StringRef getPrimaryModuleInterfaceName(StringRef Name) {
+    return Name.split(':').first;
+  }
+
   /// Get the primary module interface name from a partition.
   StringRef getPrimaryModuleInterfaceName() const {
+    return getPrimaryModuleInterfaceName(getTopLevelModuleName());
----------------
iains wrote:
> ChuanqiXu wrote:
> > iains wrote:
> > > I do not understand the purpose of this change, we already iterated over making the function more efficient.
> > > 
> > > For C++20 modules there cannot be more than one parent - so that the nested call to getTopLevelModule(), via getTopLevelModuleName() is redundant.
> > > 
> > > Is there some other case that needs special handling?
> > > (If so then I think we should guard this at a higher level)
> > > 
> > The primary purpose of the change is that we need to get the primary module name for getLangOpts().CurrentModule. So we need a static member function which takes a StringRef.
> > 
> > Another point might be the correctness. For example, for the private module fragment, the current implementation would return the right result while the original wouldn't. (Although we didn't use private module fragment much...)
> > 
> > For the most case, I admit the current implementation would call more function a little more. But I think it is negligible.
> (Maybe I am worrying too much - but it does seem that we have quite some complexity in an operation that we expect to repeat many times per TU).
> 
> ----
> 
> > The primary purpose of the change is that we need to get the primary module name for getLangOpts().CurrentModule. So we need a static member function which takes a StringRef.
> 
> That is, of course,  fine (and we are stuck in the case that we are building a partition, with the fact that the primary module is not available).  It seems that there's not a lot we can do there. 
> 
> When we have module,  we could cache the result of the split, tho - so have a "PrimaryModuleName" entry in each module and populate it on the first use.  We are still stuck with the string comparison - but at least we could avoid repeating the split?
> As I noted, the reason we did not bother to do this in the first use was that that use was only once per TU).
> 
> When we are building a consumer of a module, on the other hand, we *do* have the complete module import graph.
> 
> > Another point might be the correctness. For example, for the private module fragment, the current implementation would return the right result while the original wouldn't. (Although we didn't use private module fragment much...)
> 
> Of course, we must fix correctness issues ..
> Since there is only GMF and PMF, perhaps we could reasonably treat those specially `if (isGlobalModule())` .. and/or `if (isPrivateModule())`?
> 
> > For the most case, I admit the current implementation would call more function a little more. But I think it is negligible.
> 
> It is hard to judge at the moment - perhaps we should carry on with what we have and then instrument some reasonable body of code?
> (Maybe I am worrying too much - but it does seem that we have quite some complexity in an operation that we expect to repeat many times per TU).

Never mind! This is the meaning of reviewing.

> Of course, we must fix correctness issues ..
Since there is only GMF and PMF, perhaps we could reasonably treat those specially if (isGlobalModule()) .. and/or if (isPrivateModule())?

Done.

After consideration, I think what you here makes sense. Especially now Module::getPrimaryModuleInterfaceName is only used for Sema.getLangOpts().CurrentModule. So it might be better to handle this in Sema or LangOptions.

================
Comment at: clang/lib/Sema/SemaLookup.cpp:1567-1568
+
+  return M->getPrimaryModuleInterfaceName() ==
+         Module::getPrimaryModuleInterfaceName(LangOpts.CurrentModule);
 }
----------------
iains wrote:
> ChuanqiXu wrote:
> > iains wrote:
> > > ... of course, "correctness before optimisation", but this test function seems like it would benefit from some refactoring as suggested below.
> > > 
> > > it seems a bit unfortunate that we are placing a string comparison in the "acceptable decl" lookup path, which might be made many times (the previous use of the string compare was in making the module decl - which only happens once in a TU).
> > > 
> > > * Sema has visibility of the import graph of modules (via DirectModuleImports and Exports).
> > > * We also know what the module parse state is (FirstDecl, GMF etc)
> > > * If were are not parsing a module (i.e. LangOpts.CurrentModule.empty()) the we could return false early?
> > > * if ModuleScopes is empty we know we are not (yet) parsing a module
> > > * Once we are parsing a module then we can test ModuleScopes.back() to find the current module 
> > > 
> > > there are only two users of isInCurrentModule() - so maybe we refactor could make more use of the higher level state information - and / or cache information on the parents of partitions  to  avoid the complexity of parsing strings each time.
> > > 
> > > what do you think?
> > Yeah, string comparison is relatively expensive and we should avoid that. I added a search cache here and I thought it should handle the most cases.
> > 
> > For the solution you described, I am not sure if I understood correctly. It looks like a quick return path if we found we are not parsing a module? If yes, I think the current check could do similar works. 
> That looks good.
> 
> On the actual test itself:
> 
> 1/
> we have the check of  `(M->isGlobalModule() && !M->Parent)`
> 
> So that answers "this decl is part of the global module", since that fires when we are parsing the GMF.  What will we answer,
>  if `M->isGlobalModule() && M->Parent` ?
>  it seems that with the changes above we will now answer that the decl is part of the Parent - not part of the GM?
> 
> 2/
> When we are actually  parsing the PMF we have a similar case, no?
>  (so that there is no Parent pointer set yet, and I think that means that we would need to examine ModuleScopes to find the Parent module?)
> 
> 
> 1/
> we have the check of  (M->isGlobalModule() && !M->Parent)
>
> So that answers "this decl is part of the global module", since that fires when we are parsing the GMF. What will we answer,
> if M->isGlobalModule() && M->Parent ?
> it seems that with the changes above we will now answer that the decl is part of the Parent - not part of the GM?

Good Catcha! Now in this revision, the function would return false for the case of `M->isGlobalModule() && M->Parent`. Another point I found is that the name of the function is not quite right. Since global module fragment belongs to global module, it wouldn't be within the current module. So I chose a new name `isUsableModule` for it from the wording  of `[module.global.frag]p1`. I tried to call it `isVisibleModule` but I found there is already one... I feel `isUsableModule` is not perfectly good... Any suggestion?

> 2/
> When we are actually parsing the PMF we have a similar case, no?

There might not be a problem. Since PMF belongs to a named module technically. So the current implementation would handle it correctly.

================
Comment at: clang/test/Modules/cxx20-10-1-ex2.cpp:62-69
+// According to [module.import]p7:
+//   Additionally, when a module-import-declaration in a module unit of some
+//   module M imports another module unit U of M, it also imports all 
+//   translation units imported by non-exported module-import-declarations in
+//   the module unit purview of U.
+//
+// So B is imported in B:X3 due to B:X2 imported B. So n is visible here.
----------------
iains wrote:
> ChuanqiXu wrote:
> > iains wrote:
> > > @rsmith 
> > > 
> > > so, I wonder if this is intended, it seems to defeat the mechanisms to avoid circular dependencies.
> > > 
> > > TU1:
> > > module M : Impl; // this is said not to generate a circular dependency because M : Impl does not implicitly import M
> > > 
> > > import M;  // but now we've done it manually ... and if that module is implicitly exported to importers of the Impl. module interface....
> > > ....
> > > 
> > > TU2:
> > > 
> > > export module M;
> > > import : Impl; 
> > > 
> > > // ...  then boom! we have a circular dep.
> > > 
> > > 
> > @rsmith should be the right one to answer this.
> > 
> > Personally, I feel the case you described should be avoided. And I found it is hard to right cyclic module dependent code in current fashion. Since when I compile TU1, the compiler suggests that it failed to found M. And when I compile TU2, the compiler suggests that it failed to found M:impl... So it looks like we could avoid worrying this now. Maybe the build system could give better diagnostic message in the future. I think this might not be a big problem.
> Right, one cannot actually build the faulty module (actually, in the same way as one cannot build example 3 from section 10.3 of the std).
> 
> So probably my concern is unfounded - it's just something the user will have to figure out - we are not required to diagnose the situation (and that would be hard to do for the compiler anyway).
> 
Agreed. This should be the work for build systems.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123837/new/

https://reviews.llvm.org/D123837