[PATCH] D126694: [C++20][Modules] Implementation of GMF decl elision.

Iain Sandoe via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Jul 7 03:49:09 PDT 2022


iains added a comment.

@rsmith, @ChuanqiXu apologies for the multiple revisions, this has turned out to be much more involved than I imagined from the standard's text.

In D126694#3629254 <https://reviews.llvm.org/D126694#3629254>, @ChuanqiXu wrote:

> In D126694#3629251 <https://reviews.llvm.org/D126694#3629251>, @iains wrote:
>
>> In D126694#3629094 <https://reviews.llvm.org/D126694#3629094>, @ChuanqiXu wrote:
>>
>>> BTW, after I applied the patch, the compiler crashes at https://github.com/ChuanqiXu9/stdmodules.
>>
>> That link points to a project - is there (say) a gist of the crash information?
>
> Here is the crash log:

this code now compiles without error,



================
Comment at: clang/include/clang/AST/DeclBase.h:624
   bool isModulePrivate() const {
     return getModuleOwnershipKind() == ModuleOwnershipKind::ModulePrivate;
   }
----------------
iains wrote:
> ChuanqiXu wrote:
> > According to the opinion from @rsmith, the discarded declaration is private too.
> I guess you mean `>=`  ... however Discardable is a stronger constraint than Private 
> 
> If a decl remains marked Discardable (after the processing to determine reachable ones) that means it is both unreachable and invisible.
> So it must not participate in any processing (with the one exception of diagnostic output).  I would be concerned that the change you suggest above could cause a  Discardable decl to be considered in merging  or lookup and we would then need (maybe a lot) of logic like:
> 
> ```
>  if (D->isModulePrivate() && !D->isModuleDiscardable())
>     ...
> ```
> 
> I will take a look on the next iteration.
> 
I did try this and there are a number of regressions - when I looked into these there is some interaction with the changes made in D113545, so I think we should make these changes in a follow-one patch to avoid having two purposes in this one.


================
Comment at: clang/include/clang/Sema/Sema.h:2275-2276
 
+  llvm::SmallPtrSet<const Decl *, 32> SeenDecls;
+  llvm::SmallPtrSet<const clang::Type *, 32> SeenTypes;
+
----------------
rsmith wrote:
> These names seem too general to live directly in `Sema`.
(revised we only need to track the type pointers)

On the basis that the name is poor, for its intended purpose, I revised. 


================
Comment at: clang/include/clang/Sema/Sema.h:2273
+  void HandleGMFReachability(Decl *D) {
+    if (D->isModuleUnreachable() && isCurrentModulePurview()) {
+      D->setModuleOwnershipKind(Decl::ModuleOwnershipKind::ReachableWhenImported);
----------------
ChuanqiXu wrote:
> iains wrote:
> > ChuanqiXu wrote:
> > > I feel better if we would check if D lives in GMF. (We need to insert a check in isDiscardedInGlobalModuleFragment)
> > If the consensus is to add an extra test, OK.
> > 
> > However, as above, specifically to avoid making more and more tests in code that is executed very frequently - as the design currently stands, the only place that  `ModuleUnreachable` is set is in the GMF.
> Yeah, my opinion is the same as above. Although it is the design, it is more semantically clear and robust to add a additional check. I am just afraid it would confuse and block other readers or contributors.
this has now been revised and a check is applied before any change is made to discardable decls.


================
Comment at: clang/lib/AST/TextNodeDumper.cpp:1622
+  if (D->isModuleUnreachable())
+    OS << " ModuleUnreachable";
 }
----------------
ChuanqiXu wrote:
> iains wrote:
> > ChuanqiXu wrote:
> > > It may be better to keep the consistent style.
> > I don't think that is a matter of style `__module_private__` is a keyword used elsewhere?
> > 
> > If you look though the file you will see mostly that the printed output does not prepend or append underscores.
> > 
> > BTW, similar changes are probably needed in other node printers, this was done early to add debug.
> Oh, I found `__module_private__ ` is a keyword in clang modules. I didn't recognize it. Even in this case, I still prefer to keep the style consistently. I think users would be more comfortable to read consistent symbols. Also I think it is acceptable to keep `ModuleUnreachable` since it doesn't matter a lot to me.
OK we now have more fine-grained output for the module ownership in the dumps which allows specific tests to be constructed.  At present, the naming is as per decl.h (with the single exception of __module_private__ which has a user-facing representation).


================
Comment at: clang/lib/Sema/Sema.cpp:1130
     DiagnoseUseOfUnimplementedSelectors();
 
+    // For C++20 modules, we are permitted to elide decls in the Global
----------------
ChuanqiXu wrote:
> I prefer to wrap this logic to a function to make it easier to read.
I think the comment refers to an older version of the changes.


================
Comment at: clang/lib/Sema/Sema.cpp:1131-1133
+    // For C++20 modules, we are permitted to elide decls in the Global
+    // Module Fragment of a module interface if they are unused in the body
+    // of the interface.
----------------
ChuanqiXu wrote:
> I prefer to cite the standard. And the original comment looks not so right (since we don't and couldn't remove a declaration in GMF due to it is not used by the body of the interface)
this is now implemented differently, 


================
Comment at: clang/lib/Sema/Sema.cpp:1138-1141
+      for (DeclContext::decl_iterator DI = DC->decls_begin(),
+                                      DEnd = DC->decls_end();
+           DI != DEnd; ++DI) {
+        Decl *D = *DI;
----------------
ChuanqiXu wrote:
> Prefer range based loop.
this is now implemented differently


================
Comment at: clang/lib/Sema/Sema.cpp:1146
+        M = D->getOwningModule();
+        if (!M || !D->getOwningModule()->isGlobalModule())
+          continue;
----------------
ChuanqiXu wrote:
> `M == GlobalModuleFragment` says that M is a global module fragment in the current TU explicitly. The original implementation doesn't check if M is in the current TU or not.
this is now implemented differently


================
Comment at: clang/lib/Sema/Sema.cpp:1150
+          continue;
+        if (!D->isUsed(false) && !D->isReferenced())
+          WorkList.push_back(D);
----------------
rsmith wrote:
> rsmith wrote:
> > ChuanqiXu wrote:
> > > Should we check for `D->isUsed()` simply? It looks more straight forward to me.
> > Does this work properly if the declaration is referenced within the header unit? I think we track whether we've seen any use of the declaration anywhere, not only if we've seen a use in the current translation unit, and we'll need a different mechanism here.
> s/header unit/global module fragment/
this is now implemented differently


================
Comment at: clang/lib/Sema/SemaModule.cpp:916-920
+            markGMFDeclsReachableFrom(Target, /*MakeVisible*/ true);
+          }
+        } else {
+          markGMFDeclsReachableFrom(Child, /*MakeVisible*/ true);
+        }
----------------
rsmith wrote:
> Do we need the special `MakeVisible` handling here? I would have thought that it would be sufficient for `Child` itself to be visible. Making the target of a using declaration visible seems like it would do the wrong thing for a case like:
> 
> ```
> module;
> int f();
> export module M;
> namespace N { export using ::f; }
> ```
> 
> ```
> import M;
> int a = f(); // should be an error, ::f should not be visible
> int b = N::f(); // should be OK, UsingShadowDecl is visible
> ```
> 
> I wonder if we need any special handling here at all -- if the GMF decls are referenced by something in an `export` block, then I'd have expected they'll already be marked as used and that marking would have marked them as reachable.
> Do we need the special `MakeVisible` handling here? I would have thought that it would be sufficient for `Child` itself to be visible. Making the target of a using declaration visible seems like it would do the wrong thing for a case like:
> 
> ```
> module;
> int f();
> export module M;
> namespace N { export using ::f; }
> ```
> 
> ```
> import M;
> int a = f(); // should be an error, ::f should not be visible
> int b = N::f(); // should be OK, UsingShadowDecl is visible
> ```

fixed, and added a test case to cover this.

> I wonder if we need any special handling here at all -- if the GMF decls are referenced by something in an `export` block, then I'd have expected they'll already be marked as used and that marking would have marked them as reachable.

Indeed. this was over cautious,  However,  the one case that we have to cover is the target of a using decl - those are neither marked `used` nor `referenced`. 


================
Comment at: clang/lib/Sema/SemaModule.cpp:1024-1036
+  // Only visit each Decl once.
+  if (!SeenDecls.insert(Orig).second)
+    return;
+
+  // Do not alter the ownership unless it is ModuleDiscardable.
+  if (Orig->isModuleDiscardable()) {
+    assert(Orig->getOwningModule() &&
----------------
rsmith wrote:
> Instead of storing a `SeenDecls` set and checking it here, is it feasible to check whether we're transitioning this declaration from discardable to retained, and only doing the work below if so?
I had that as my original implementation, which I revised to do the same kind of thing as the type pointers.  However, the check for isModuleDiscardable is cheaper, indeed.

(JFTR, I also tried an implementation with a decl visitor, but it did not seem to be any less complex or really any shorter code-wise)

I cannot see how we can avoid tracking the types.


================
Comment at: clang/lib/Sema/SemaModule.cpp:978
+void Sema::markReachableGMFDecls(Decl *Orig) {
+
+  if (isa<FunctionDecl>(*Orig)) {
----------------
ChuanqiXu wrote:
> iains wrote:
> > ChuanqiXu wrote:
> > > It looks better to add assumption as an assertion .
> > what is `D` here?
> > `markReachableGMFDecls` is only called in the case that `Orig` is **not** discarded (we are marking decls as reachable because they are `used` in the interface..
> > 
> Oh, `D` should be `Orig`, my bad. Yeah, I found `markReachableGMFDecls ` is only called when `Orig` is not discarded. So I suggest to add the assertion to make it explicit and clear and it could avoid misuse in the future.
in the revised code, we have an assert.


================
Comment at: clang/test/CXX/basic/basic.scope/basic.scope.namespace/p2.cpp:32
 void test_early() {
-  in_header = 1; // expected-error {{missing '#include "foo.h"'; 'in_header' must be declared before it is used}}
-  // expected-note@*{{not visible}}
+  in_header = 1; // expected-error {{use of undeclared identifier 'in_header'}}
 
----------------
ChuanqiXu wrote:
> iains wrote:
> > ChuanqiXu wrote:
> > > iains wrote:
> > > > ChuanqiXu wrote:
> > > > > This error message looks worse. I image I could hear users complaining this. (I doesn't say it is your fault since this is specified in standard and the standard cases are harder to understand). I want to know if this is consistent with GCC?
> > > > > This error message looks worse. I image I could hear users complaining this. (I doesn't say it is your fault since this is specified in standard and the standard cases are harder to understand). I want to know if this is consistent with GCC?
> > > > 
> > > > As you say, the standard says "neither reachable nor visible" 
> > > > if it's not reachable then. we would not have the information that it was from header foo.h so we cannot form the nicer diagnostic.
> > > > 
> > > > Perhaps we need to invent "reachable for diagnostics" ... which I'd rather not divert effort to right now.
> > > > 
> > > Maybe we could make a new diagnostic message.
> > I do not think it is a matter of a diagnostic message.
> > 
> > If we omit the decl from the BMI (which we are permitted to do if it is ModuleUnreachable), then it cannot be found and therefore the information on which header it came from would not be available.
> > 
> Your word makes sense.
in the current implementation, we mark the decls but do not yet actually remove them - so that the better diagnostic should still be available.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D126694/new/

https://reviews.llvm.org/D126694



More information about the cfe-commits mailing list