[PATCH] D136953: [C++20] Diagnose invalid and reserved module names
Aaron Ballman via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Tue Nov 1 11:40:59 PDT 2022
aaron.ballman added a comment.
I believe we now have consensus on the patch. I'll land it in a few days in case any reviewers want to weigh in now that the discussion has stabilized.
================
Comment at: clang/lib/Sema/SemaModule.cpp:282
+ StringRef FirstComponentName = Path[0].first->getName();
+ if (!getSourceManager().isInSystemHeader(Path[0].second) &&
+ (FirstComponentName == "std" ||
----------------
philnik wrote:
> aaron.ballman wrote:
> > philnik wrote:
> > > aaron.ballman wrote:
> > > > philnik wrote:
> > > > > aaron.ballman wrote:
> > > > > > philnik wrote:
> > > > > > > aaron.ballman wrote:
> > > > > > > > philnik wrote:
> > > > > > > > > aaron.ballman wrote:
> > > > > > > > > > philnik wrote:
> > > > > > > > > > > ChuanqiXu wrote:
> > > > > > > > > > > > aaron.ballman wrote:
> > > > > > > > > > > > > aaron.ballman wrote:
> > > > > > > > > > > > > > ChuanqiXu wrote:
> > > > > > > > > > > > > > > erichkeane wrote:
> > > > > > > > > > > > > > > > ChuanqiXu wrote:
> > > > > > > > > > > > > > > > > aaron.ballman wrote:
> > > > > > > > > > > > > > > > > > ChuanqiXu wrote:
> > > > > > > > > > > > > > > > > > > cor3ntin wrote:
> > > > > > > > > > > > > > > > > > > > ChuanqiXu wrote:
> > > > > > > > > > > > > > > > > > > > > std modules should be irreverent with system headers; The intuition of the wording should be that the users can't declare modules like `std` or `std.compat` to avoid possible conflicting. The approach I imaged may be add a new compilation flags (call it `-fstd-modules`) now. And if the compiler found a `std` module declaration without `-fstd-modules`, emit an error.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > For now, I think we can skip the check for `-fstd-modules` and add it back when we starts to support std modules actually.
> > > > > > > > > > > > > > > > > > > > The idea is that standard modules are built from system directories... it seems a better heuristic than adding a flag for the purpose of 1 diagnostics ( maybe some other system library could in theory export std with no warning, but I'm not super worried about that being a concern in practice)
> > > > > > > > > > > > > > > > > > > > The idea is that standard modules are built from system directories...
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > This is not true. For example, if someday libc++ supports std modules, then we need to build the std modules in our working directory, which is not system directories. And **ideally**, we would distribute and install module file in the system directories. But it is irreverent with the path of the source file.
> > > > > > > > > > > > > > > > > > > then we need to build the std modules in our working directory, which is not system directories.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > `-isystem`, pragmas, and linemarkers are all ways around that -- I don't think we need a feature flag for this, unless I'm misunderstanding something.
> > > > > > > > > > > > > > > > > Although it may be a little bit nit picker, the module unit which declares the std modules won't be header. It is a module unit. So it requires we implement std modules by wrapping linemarkers around the std modules declaration, which looks a little bit overkill.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > And another point is that maybe we need to introduce another feature flags to implement std modules. Although I tried to implement std modules within the current features, I can't prove we can implement std modules in that way in the end of the day.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Let me add some more words. The standards require us to implement std modules without deprecating the system headers. This strategy leads the direction to "implement the components in the original headers and control their visibility in the std module unit".
> > > > > > > > > > > > > > > > > It may look like:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > ```
> > > > > > > > > > > > > > > > > //--- std.cppm
> > > > > > > > > > > > > > > > > module;
> > > > > > > > > > > > > > > > > #include <algorithm>
> > > > > > > > > > > > > > > > > ...
> > > > > > > > > > > > > > > > > export module std;
> > > > > > > > > > > > > > > > > ```
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Then how can control the visibility? In my original experiments, I use the style:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > ```
> > > > > > > > > > > > > > > > > //--- std.cppm
> > > > > > > > > > > > > > > > > module;
> > > > > > > > > > > > > > > > > #include <algorithm>
> > > > > > > > > > > > > > > > > ...
> > > > > > > > > > > > > > > > > export module std;
> > > > > > > > > > > > > > > > > export namespace std {
> > > > > > > > > > > > > > > > > using std::sort;
> > > > > > > > > > > > > > > > > }
> > > > > > > > > > > > > > > > > ```
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > but this doesn't always work. For example, we can't `using` a in-class friend definition. And there are more reasons, the unreachable declarations in the global module fragment (the section from `module;` to `export module [module_name]`) can be discarded to reduce the size of the module file. And the reachable rules are complex. But the simple story is that it is highly possible the a lot of necessary declarations in global module fragment in the above snippet would be discarded so that the user can't use std modules correctly. I mean, we **may** need a special feature flag for it. And the method with `system headers` looks not good and semantics are not so consistency.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > IMO, any such additional flag (say `-isystem-module`) should ALSO use the `isInSystemHeader` infrastructure. I suspect nearly every place we use `isInSystemHeader` we also mean to exclude a system-module as well.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I think that any such flag can/should be added later as you figure out how it should be specified/work. That said, when you do so, it should either also feed `isInSystemHeader`, or basically every use of `isInSystemHeader` should ALSO changed to use the new flag as well
> > > > > > > > > > > > > > > The main confusion part to me is that why we connect `std modules` with system paths? I know implementors can work around the check like the tests did. But what's the point? I know every header of libcxx contains:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > ```
> > > > > > > > > > > > > > > #ifndef _LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER
> > > > > > > > > > > > > > > # pragma GCC system_header
> > > > > > > > > > > > > > > #endif
> > > > > > > > > > > > > > > ```
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > but it is for the compatibility with GCC. And it looks not so meaningful to force the implementation of modules to keep such contraints.
> > > > > > > > > > > > > > > I think that any such flag can/should be added later as you figure out how it should be specified/work. That said, when you do so, it should either also feed isInSystemHeader, or basically every use of isInSystemHeader should ALSO changed to use the new flag as well
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > +1, that's my thinking as well.
> > > > > > > > > > > > > > The main confusion part to me is that why we connect std modules with system paths?
> > > > > > > > > > > > >
> > > > > > > > > > > > > We consider the system paths to be "special" in that they can do things "user" paths cannot do. I think we want to keep that model for modules because of how successful it has been for includes. (e.g., don't suggest fixits in a system module but do suggest them for user modules).
> > > > > > > > > > > > OK, I got it and it won't be a problem we can't workaround.
> > > > > > > > > > > IIUC this would prevent the library from handling the `std` module the same as a user module, right? AFAIK the actual use of `_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER` is to enable warnings in the headers for development, which would not work with the modules with this patch, or am I misunderstanding something? Is there a reason this isn't a warning that's an error by default? That would allow the library to disable it and still serve the same purpose.
> > > > > > > > > > > AFAIK the actual use of _LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER is to enable warnings in the headers for development, which would not work with the modules with this patch, or am I misunderstanding something?
> > > > > > > > > >
> > > > > > > > > > Why would the library want a diagnostic telling them they're using a reserved identifier as a module name?
> > > > > > > > > >
> > > > > > > > > > > Is there a reason this isn't a warning that's an error by default? That would allow the library to disable it and still serve the same purpose.
> > > > > > > > > >
> > > > > > > > > > It also allows users to produce modules with reserved identifiers. It's an error that can't be downgraded specifically because I don't think we want our implementation to give arbitrary users that ability.
> > > > > > > > > > > AFAIK the actual use of _LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER is to enable warnings in the headers for development, which would not work with the modules with this patch, or am I misunderstanding something?
> > > > > > > > > >
> > > > > > > > > > Why would the library want a diagnostic telling them they're using a reserved identifier as a module name?
> > > > > > > > >
> > > > > > > > > I don't mean specifically this error, I mean more generally that other warnings should be generated from std modules. Treating the headers as system headers disables most warnings, which is the reason libc++ treat them as normal headers in the tests.
> > > > > > > > >
> > > > > > > > > > > Is there a reason this isn't a warning that's an error by default? That would allow the library to disable it and still serve the same purpose.
> > > > > > > > > >
> > > > > > > > > > It also allows users to produce modules with reserved identifiers. It's an error that can't be downgraded specifically because I don't think we want our implementation to give arbitrary users that ability.
> > > > > > > > >
> > > > > > > > > I think there should be some way to enable normal warnings in the special modules, since it makes the life of library developers a lot easier. I don't care whether that's through disabling a warning or some special sauce to enable warnings from the std module, but there should be some way.
> > > > > > > > >
> > > > > > > > > I don't mean specifically this error, I mean more generally that other warnings should be generated from std modules. Treating the headers as system headers disables most warnings, which is the reason libc++ treat them as normal headers in the tests.
> > > > > > > >
> > > > > > > > Ahhh, thank you, that makes a lot more sense to me. :-D
> > > > > > > >
> > > > > > > > > I think there should be some way to enable normal warnings in the special modules, since it makes the life of library developers a lot easier. I don't care whether that's through disabling a warning or some special sauce to enable warnings from the std module, but there should be some way.
> > > > > > > >
> > > > > > > > Just like we have `-Wsystem-headers`, I would expect we'd have something similar for modules (or reuse it, perhaps with a different name, for both headers and modules).
> > > > > > > `-Wsystem-headers` doesn't work because that enables warnings in all system headers, but we only want the warnings from the system library that we write, i.e. libc++. Or can you somehow control in which system headers warnings are emitted?
> > > > > > > -Wsystem-headers doesn't work because that enables warnings in all system headers, but we only want the warnings from the system library that we write, i.e. libc++
> > > > > >
> > > > > > You're correct that it enabled warnings in all system headers if used from the command line, but you can use pragmas to control which warnings are or are not enabled on a file by file basis.
> > > > > >
> > > > > > However, maybe I'm not understanding your concerns properly, so let me back up a step. With this patch, trying to create a module with a reserved name fails with an error unless the source file with the module declaration is in a file considered to be a "system header" (whether through -isystem, a pragma, linemarkers, etc). For libc++, I was thinking this should not be an issue because libc++ has `_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER` to control whether the file is considered a system header or not, so when exporting modules you have to ensure that macro is not defined. If you want to test the behavior as though the user was the one declaring that exported module... well, they get an error because they're not allowed to define that module, so I'm not certain what value there is in testing that specific situation. Am I misunderstanding something?
> > > > > >
> > > > > My understanding right now is that libc++ would have to build the module as a system header. You want this, so users have to try really hard to produce a module with a reserved identifier, which makes sense. Because of that, warnings generated inside the module would be suppressed though.
> > > > > Essentially, my concern is that we don't get all the warnings we want when testing anything inside the `std` module. To get warnings from the `std` module we could use `-Wsystem-headers`. The problem there is that it would also produce warnings in headers we have no control over, which means we can't use it.
> > > > Ah, thank you for clarifying! I think we're considering the libc++ implementation strategy from different angles. One way to do it is as you said -- build the module as a system header. But I was thinking libc++ would do something a bit more novel (and maybe that's a bad idea!):
> > > > ```
> > > > # __LINE__ __FILE__ 1 3 // Enter system header
> > > > export module std;
> > > > # __LINE__ __FILE__ 2 3 // Leave system header
> > > > ```
> > > > and leave the rest of the module code "outside" of the faked up system header block of linemarker directives.
> > > I guess it's `# <line> <file> <mode?> <???>`? That seems to me quite hacky. Maybe it would be nicer if we add an attribute like `[[clang::system_module]]` that would allow reserved identifiers, and maybe even ensure that it's actually one. Though that might be overkill, and would again allow people to more easily create modules with reserved identifiers. It would make the code a lot nicer to read though.
> > > I guess it's # <line> <file> <mode?> <???>? That seems to me quite hacky.
> >
> > Yeah, this is a pretty arcane feature but it's been around for ages. GCC documents this feature (Clang does not and we absolutely should): https://gcc.gnu.org/onlinedocs/gcc-12.2.0/cpp/Preprocessor-Output.html#Preprocessor-Output
> >
> > > It would make the code a lot nicer to read though.
> >
> > It would, but we have to balance protecting this reserved space from the compiler while we can against readability of (already generally unreadable, IMHO) STL headers. But then again, linemarkers allow the user to do this themselves in the same manner as an attribute would. Between the two, I think linemarkers are far less likely to be used by users just trying to get their code to compile without worrying about the details of what they're doing.
> > > I guess it's # <line> <file> <mode?> <???>? That seems to me quite hacky.
> >
> > Yeah, this is a pretty arcane feature but it's been around for ages. GCC documents this feature (Clang does not and we absolutely should): https://gcc.gnu.org/onlinedocs/gcc-12.2.0/cpp/Preprocessor-Output.html#Preprocessor-Output
> Thanks for the link!
> > > It would make the code a lot nicer to read though.
> >
> > It would, but we have to balance protecting this reserved space from the compiler while we can against readability of (already generally unreadable, IMHO) STL headers. But then again, linemarkers allow the user to do this themselves in the same manner as an attribute would. Between the two, I think linemarkers are far less likely to be used by users just trying to get their code to compile without worrying about the details of what they're doing.
> Yeah, I also thought that it might be a bit too easy to add an attribute.
> It would also be possible to just have an internal header like
> ```lang=c++, name=__export_module_std
> #pragma GCC system_header
> export module std;
> ```
> right? That wouldn't be great, but it would be a lot easier to understand than linemarkers.
> As a side note: Anything that we only have to make available through modules could be a lot more readable, since the preprocessor wouldn't be a problem anymore. That won't happen soon, but I can dream.
>
> Anyways, since we can enable warnings inside modules I don't have any concerns here anymore. Thanks for being so patient with me!
> It would also be possible to just have an internal header like ... right?
Yes, that should work I believe. And yes, that would be more direct than using linemarkers.
> Anyways, since we can enable warnings inside modules I don't have any concerns here anymore. Thanks for being so patient with me!
Thank you for the great discussion! If we find we need an attribute/something more here in the future. we can certainly revisit once we have a concrete problem to solve.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D136953/new/
https://reviews.llvm.org/D136953
More information about the cfe-commits
mailing list