[clang] 1da4039 - Reapply "[C++20][Modules] Implement P1857R3 Modules Dependency Discovery" (#173130)" (#173789)
via cfe-commits
cfe-commits at lists.llvm.org
Tue Jan 20 01:42:51 PST 2026
Author: yronglin
Date: 2026-01-20T17:42:46+08:00
New Revision: 1da403937eb9f48b2de9c27ba1aa0eba50bfdf5f
URL: https://github.com/llvm/llvm-project/commit/1da403937eb9f48b2de9c27ba1aa0eba50bfdf5f
DIFF: https://github.com/llvm/llvm-project/commit/1da403937eb9f48b2de9c27ba1aa0eba50bfdf5f.diff
LOG: Reapply "[C++20][Modules] Implement P1857R3 Modules Dependency Discovery" (#173130)" (#173789)
This patch reapplies https://github.com/llvm/llvm-project/pull/173130.
It implements the following papers and core issue:
[P1857R3 Modules Dependency Discovery](https://wg21.link/p1857r3).
[P3034R1 Module Declarations Shouldn’t be Macros](https://wg21.link/P3034R1).
[CWG2947](https://cplusplus.github.io/CWG/issues/2947.html).
At the start of phase 4, an `import` or `module` token is treated as
starting a directive and is converted to its respective keyword iff it:
- After skipping horizontal whitespace, is
  - at the start of a logical line, or
  - preceded by an `export` at the start of the logical line.
- Is followed by an identifier pp-token (before macro expansion), or
  - `<`, `"`, or `:` (but not `::`) pp-tokens for `import`, or
  - `;` for `module`.
Otherwise the token is treated as an identifier.
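The rules above can be sketched as a small standalone classifier. This is
only an illustration under stated assumptions, not the code in this patch:
`PPToken`, `Introducer` and `classifyLine` are hypothetical names, the input
is assumed to be one logical line already split into pp-tokens (horizontal
whitespace skipped, no macro expansion), and a header-name in quotes is
assumed to be a single token whose spelling starts with `"`. It mirrors the
list in this message rather than the full standard wording.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified pp-token: its spelling and whether it is an
// identifier. Real Clang tokens carry far more state.
struct PPToken {
  std::string Spelling;
  bool IsIdentifier;
};

enum class Introducer { None, Module, Import };

// Classify one logical line of unexpanded pp-tokens according to the
// rules listed in the commit message.
Introducer classifyLine(const std::vector<PPToken> &Line) {
  std::size_t I = 0;
  // An 'export' at the start of the logical line may precede the introducer.
  if (I < Line.size() && Line[I].Spelling == "export")
    ++I;
  if (I >= Line.size())
    return Introducer::None;

  const std::string &Kw = Line[I].Spelling;
  if (Kw != "module" && Kw != "import")
    return Introducer::None;

  // The introducer must be followed by another pp-token on the same line.
  if (I + 1 >= Line.size())
    return Introducer::None;
  const PPToken &Next = Line[I + 1];
  assert(!Next.Spelling.empty() && "pp-tokens have a non-empty spelling");

  // Both forms accept an identifier (checked before macro expansion).
  if (Next.IsIdentifier)
    return Kw == "module" ? Introducer::Module : Introducer::Import;

  // 'import' also accepts '<', a quoted header-name, or ':' (but not '::',
  // which is a distinct token with a different spelling).
  if (Kw == "import" && (Next.Spelling == "<" || Next.Spelling == ":" ||
                         Next.Spelling.front() == '"'))
    return Introducer::Import;

  // 'module' also accepts ';' (the global module fragment introducer).
  if (Kw == "module" && Next.Spelling == ";")
    return Introducer::Module;

  return Introducer::None;
}
```

With this sketch, `import foo;` and `export module m;` classify as
directives, while `struct import {};` and `EMPTY import foo;` do not, because
in both of the latter the first token on the line is neither the introducer
nor `export`.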
Additionally:
- The entire import or module directive (including the closing `;`) must
  be on a single logical line and, for `module`, must not come from an
  `#include`.
- The expansion of macros must not result in an import or module
  directive introducer that was not there prior to macro expansion.
- A module directive may only appear as the first preprocessing tokens
  in a file (excluding the global module fragment).
- Preprocessor conditionals shall not span a module declaration.
After this patch, the preprocessor handles the C++ module-import and
module-declaration as real pp-directives. Additionally, this patch
refactors module name lexing: it removes the complex state machine and
reads the full module name during module/import directive handling. In
the future we could introduce a `tok::annot_module_name` token to avoid
parsing the module name twice, in both the preprocessor and the parser,
but that would make error recovery much more difficult (e.g. for
`import a; import b;` on the same line).
This patch also introduces two new keywords, `__preprocessed_module` and
`__preprocessed_import`. These two keywords are generated in `-E` mode.
This is useful to avoid confusion with the `module` and `import`
keywords in preprocessed output, for example:
```cpp
export module m;
struct import {};
#define EMPTY
EMPTY import foo;
```
Fixes https://github.com/llvm/llvm-project/issues/54047
The previous patch had a use-after-free issue in the
`Lexer::LexTokenInternal` function. Since C++20, the `export`, `import`
and `module` identifiers may introduce a C++ module declaration or
import directive, and the directive is handled in
`LexIdentifierContinue`. Unfortunately, EOF may be encountered in
`LexIdentifierContinue`, and `CurLexer` might be destroyed in
`HandleEndOfFile`. If the code after `LexIdentifierContinue` then tries
to access `LangOpts` or other class members of this Lexer, it hits
undefined behavior.
This patch also fixes the use-after-free issue in Lexer by introducing a
mechanism that delays the destruction of `CurLexer` in the
`Preprocessor` class.
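
The general shape of such a deferred-destruction mechanism can be sketched
in isolation. This is a toy model, not the Clang implementation:
`ToyLexer`, `ToyPreprocessor`, `enter`, `popLexer`, `beginLex` and `endLex`
are hypothetical names. It assumes, as the comment on `PendingDestroyLexers`
in the diff below describes, that a popped lexer is parked in a pending list
and only released once the lexing depth returns to 0.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a lexer; it records its own destruction so the
// timing can be observed from outside.
struct ToyLexer {
  bool &Destroyed;
  explicit ToyLexer(bool &Flag) : Destroyed(Flag) {}
  ~ToyLexer() { Destroyed = true; }
};

// Toy owner of the current lexer and the include stack.
class ToyPreprocessor {
  std::unique_ptr<ToyLexer> Cur;
  std::vector<std::unique_ptr<ToyLexer>> IncludeStack;
  // Lexers popped while a lex call is still active; destroyed later.
  std::vector<std::unique_ptr<ToyLexer>> PendingDestroy;
  unsigned LexLevel = 0;

public:
  void enter(std::unique_ptr<ToyLexer> L) {
    if (Cur)
      IncludeStack.push_back(std::move(Cur));
    Cur = std::move(L);
  }

  // May be reached from inside a member function of *Cur (e.g. at EOF), so
  // the current lexer is parked instead of being destroyed immediately.
  void popLexer() {
    if (Cur)
      PendingDestroy.push_back(std::move(Cur));
    if (!IncludeStack.empty()) {
      Cur = std::move(IncludeStack.back());
      IncludeStack.pop_back();
    }
  }

  void beginLex() { ++LexLevel; }

  // Once the outermost lex call has unwound, no lexer member function can
  // still be on the call stack, so the parked lexers are safe to release.
  void endLex() {
    assert(LexLevel > 0 && "endLex without matching beginLex");
    if (--LexLevel == 0)
      PendingDestroy.clear();
  }
};
```

In this model, calling `popLexer()` between `beginLex()` and `endLex()`
leaves the popped lexer alive until `endLex()` brings the depth back to 0,
which is the property the fix relies on.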
---------
Signed-off-by: yronglin <yronglin777 at gmail.com>
Added:
clang/test/CXX/drs/cwg2947.cpp
clang/test/CXX/module/cpp.pre/p1.cpp
clang/test/Lexer/cxx20-module-directive.cpp
Modified:
clang/docs/ReleaseNotes.rst
clang/docs/StandardCPlusPlusModules.rst
clang/include/clang/Basic/DiagnosticLexKinds.td
clang/include/clang/Basic/DiagnosticParseKinds.td
clang/include/clang/Basic/IdentifierTable.h
clang/include/clang/Basic/TokenKinds.def
clang/include/clang/Basic/TokenKinds.h
clang/include/clang/Frontend/CompilerInstance.h
clang/include/clang/Lex/CodeCompletionHandler.h
clang/include/clang/Lex/DependencyDirectivesScanner.h
clang/include/clang/Lex/ModuleLoader.h
clang/include/clang/Lex/Preprocessor.h
clang/include/clang/Lex/Token.h
clang/include/clang/Lex/TokenLexer.h
clang/include/clang/Parse/Parser.h
clang/lib/Basic/IdentifierTable.cpp
clang/lib/Basic/TokenKinds.cpp
clang/lib/DependencyScanning/ModuleDepCollector.cpp
clang/lib/Frontend/CompilerInstance.cpp
clang/lib/Frontend/InitPreprocessor.cpp
clang/lib/Frontend/PrintPreprocessedOutput.cpp
clang/lib/Lex/DependencyDirectivesScanner.cpp
clang/lib/Lex/Lexer.cpp
clang/lib/Lex/PPDirectives.cpp
clang/lib/Lex/PPLexerChange.cpp
clang/lib/Lex/Preprocessor.cpp
clang/lib/Lex/TokenConcatenation.cpp
clang/lib/Lex/TokenLexer.cpp
clang/lib/Parse/Parser.cpp
clang/lib/Sema/SemaModule.cpp
clang/test/CXX/basic/basic.link/p3.cpp
clang/test/CXX/basic/basic.scope/basic.scope.namespace/p2.cpp
clang/test/CXX/lex/lex.pptoken/p3-2a.cpp
clang/test/CXX/module/basic/basic.link/module-declaration.cpp
clang/test/CXX/module/dcl.dcl/dcl.module/dcl.module.import/p1.cppm
clang/test/Modules/pr121066.cpp
clang/test/Modules/preprocess-named-modules.cppm
clang/unittests/ASTMatchers/ASTMatchersNodeTest.cpp
clang/unittests/Lex/DependencyDirectivesScannerTest.cpp
clang/unittests/Lex/ModuleDeclStateTest.cpp
clang/www/cxx_dr_status.html
clang/www/cxx_status.html
Removed:
################################################################################
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 100474a5a1777..bde3bb1e81210 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -81,6 +81,8 @@ C++23 Feature Support
C++20 Feature Support
^^^^^^^^^^^^^^^^^^^^^
+- Clang now supports `P1857R3 <https://wg21.link/p1857r3>`_ Modules Dependency Discovery. (#GH54047)
+
C++17 Feature Support
^^^^^^^^^^^^^^^^^^^^^
diff --git a/clang/docs/StandardCPlusPlusModules.rst b/clang/docs/StandardCPlusPlusModules.rst
index 71988d0fced98..f6ab17ede46fa 100644
--- a/clang/docs/StandardCPlusPlusModules.rst
+++ b/clang/docs/StandardCPlusPlusModules.rst
@@ -1384,33 +1384,6 @@ declarations which use it. Thus, the preferred name will not be displayed in
the debugger as expected. This is tracked by
`#56490 <https://github.com/llvm/llvm-project/issues/56490>`_.
-Don't emit macros about module declaration
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-This is covered by `P1857R3 <https://wg21.link/P1857R3>`_. It is mentioned here
-because we want users to be aware that we don't yet implement it.
-
-A direct approach to write code that can be compiled by both modules and
-non-module builds may look like:
-
-.. code-block:: c++
-
- MODULE
- IMPORT header_name
- EXPORT_MODULE MODULE_NAME;
- IMPORT header_name
- EXPORT ...
-
-The intent of this is that this file can be compiled like a module unit or a
-non-module unit depending on the definition of some macros. However, this usage
-is forbidden by P1857R3 which is not yet implemented in Clang. This means that
-is possible to write invalid modules which will no longer be accepted once
-P1857R3 is implemented. This is tracked by
-`#54047 <https://github.com/llvm/llvm-project/issues/54047>`_.
-
-Until then, it is recommended not to mix macros with module declarations.
-
-
Inconsistent filename suffix requirement for importable module units
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
diff --git a/clang/include/clang/Basic/DiagnosticLexKinds.td b/clang/include/clang/Basic/DiagnosticLexKinds.td
index a72d3f37b1b72..77feea9f869e9 100644
--- a/clang/include/clang/Basic/DiagnosticLexKinds.td
+++ b/clang/include/clang/Basic/DiagnosticLexKinds.td
@@ -503,8 +503,8 @@ def warn_cxx98_compat_variadic_macro : Warning<
InGroup<CXX98CompatPedantic>, DefaultIgnore;
def ext_named_variadic_macro : Extension<
"named variadic macros are a GNU extension">, InGroup<VariadicMacros>;
-def err_embedded_directive : Error<
- "embedding a #%0 directive within macro arguments is not supported">;
+def err_embedded_directive : Error<"embedding a %select{#|C++ }0%1 directive "
+ "within macro arguments is not supported">;
def ext_embedded_directive : Extension<
"embedding a directive within macro arguments has undefined behavior">,
InGroup<DiagGroup<"embedded-directive">>;
@@ -998,6 +998,21 @@ def warn_module_conflict : Warning<
InGroup<ModuleConflict>;
// C++20 modules
+def err_pp_module_name_is_macro : Error<
+ "%select{module|partition}0 name component %1 cannot be a object-like macro">;
+def err_pp_module_expected_ident : Error<
+ "expected %select{identifier after '.' in |}0module name">;
+def err_pp_unexpected_tok_after_module_name : Error<
+ "unexpected preprocessing token '%0' after module name, "
+ "only ';' and '[' (start of attribute specifier sequence) are allowed">;
+def warn_pp_extra_tokens_at_module_directive_eol
+ : Warning<"extra tokens after semicolon in '%0' directive">,
+ InGroup<ExtraTokens>;
+def err_pp_module_decl_in_header
+ : Error<"module declaration must not come from an #include directive">;
+def err_pp_cond_span_module_decl
+ : Error<"module directive lines are not allowed on lines controlled "
+ "by preprocessor conditionals">;
def err_header_import_semi_in_macro : Error<
"semicolon terminating header import declaration cannot be produced "
"by a macro">;
diff --git a/clang/include/clang/Basic/DiagnosticParseKinds.td b/clang/include/clang/Basic/DiagnosticParseKinds.td
index 63fc4bf2e1505..457d3644de35a 100644
--- a/clang/include/clang/Basic/DiagnosticParseKinds.td
+++ b/clang/include/clang/Basic/DiagnosticParseKinds.td
@@ -1779,10 +1779,8 @@ def ext_bit_int : Extension<
} // end of Parse Issue category.
let CategoryName = "Modules Issue" in {
-def err_unexpected_module_decl : Error<
- "module declaration can only appear at the top level">;
-def err_module_expected_ident : Error<
- "expected a module name after '%select{module|import}0'">;
+def err_unexpected_module_or_import_decl : Error<
+ "%select{module|import}0 declaration can only appear at the top level">;
def err_attribute_not_module_attr : Error<
"%0 attribute cannot be applied to a module">;
def err_keyword_not_module_attr : Error<
@@ -1793,6 +1791,10 @@ def err_keyword_not_import_attr : Error<
"%0 cannot be applied to a module import">;
def err_module_expected_semi : Error<
"expected ';' after module name">;
+def err_expected_semi_after_module_or_import
+ : Error<"%0 directive must end with a ';'">;
+def note_module_declared_here : Note<
+ "%select{module|import}0 directive defined here">;
def err_global_module_introducer_not_at_start : Error<
"'module;' introducing a global module fragment can appear only "
"at the start of the translation unit">;
diff --git a/clang/include/clang/Basic/IdentifierTable.h b/clang/include/clang/Basic/IdentifierTable.h
index 043c184323876..1131727ed23ee 100644
--- a/clang/include/clang/Basic/IdentifierTable.h
+++ b/clang/include/clang/Basic/IdentifierTable.h
@@ -231,6 +231,10 @@ class alignas(IdentifierInfoAlignment) IdentifierInfo {
LLVM_PREFERRED_TYPE(bool)
unsigned IsModulesImport : 1;
+ // True if this is the 'module' contextual keyword.
+ LLVM_PREFERRED_TYPE(bool)
+ unsigned IsModulesDecl : 1;
+
// True if this is a mangled OpenMP variant name.
LLVM_PREFERRED_TYPE(bool)
unsigned IsMangledOpenMPVariantName : 1;
@@ -267,8 +271,9 @@ class alignas(IdentifierInfoAlignment) IdentifierInfo {
IsCPPOperatorKeyword(false), NeedsHandleIdentifier(false),
IsFromAST(false), ChangedAfterLoad(false), FEChangedAfterLoad(false),
RevertedTokenID(false), OutOfDate(false), IsModulesImport(false),
- IsMangledOpenMPVariantName(false), IsDeprecatedMacro(false),
- IsRestrictExpansion(false), IsFinal(false), IsKeywordInCpp(false) {}
+ IsModulesDecl(false), IsMangledOpenMPVariantName(false),
+ IsDeprecatedMacro(false), IsRestrictExpansion(false), IsFinal(false),
+ IsKeywordInCpp(false) {}
public:
IdentifierInfo(const IdentifierInfo &) = delete;
@@ -569,12 +574,24 @@ class alignas(IdentifierInfoAlignment) IdentifierInfo {
}
/// Determine whether this is the contextual keyword \c import.
- bool isModulesImport() const { return IsModulesImport; }
+ bool isImportKeyword() const { return IsModulesImport; }
/// Set whether this identifier is the contextual keyword \c import.
- void setModulesImport(bool I) {
- IsModulesImport = I;
- if (I)
+ void setKeywordImport(bool Val) {
+ IsModulesImport = Val;
+ if (Val)
+ NeedsHandleIdentifier = true;
+ else
+ RecomputeNeedsHandleIdentifier();
+ }
+
+ /// Determine whether this is the contextual keyword \c module.
+ bool isModuleKeyword() const { return IsModulesDecl; }
+
+ /// Set whether this identifier is the contextual keyword \c module.
+ void setModuleKeyword(bool Val) {
+ IsModulesDecl = Val;
+ if (Val)
NeedsHandleIdentifier = true;
else
RecomputeNeedsHandleIdentifier();
@@ -629,7 +646,7 @@ class alignas(IdentifierInfoAlignment) IdentifierInfo {
void RecomputeNeedsHandleIdentifier() {
NeedsHandleIdentifier = isPoisoned() || hasMacroDefinition() ||
isExtensionToken() || isFutureCompatKeyword() ||
- isOutOfDate() || isModulesImport();
+ isOutOfDate() || isImportKeyword();
}
};
@@ -797,10 +814,11 @@ class IdentifierTable {
// contents.
II->Entry = &Entry;
- // If this is the 'import' contextual keyword, mark it as such.
+ // If this is the 'import' or 'module' contextual keyword, mark it as such.
if (Name == "import")
- II->setModulesImport(true);
-
+ II->setKeywordImport(true);
+ else if (Name == "module")
+ II->setModuleKeyword(true);
return *II;
}
diff --git a/clang/include/clang/Basic/TokenKinds.def b/clang/include/clang/Basic/TokenKinds.def
index 3d955095b07a8..a3d286fdb81a7 100644
--- a/clang/include/clang/Basic/TokenKinds.def
+++ b/clang/include/clang/Basic/TokenKinds.def
@@ -133,6 +133,11 @@ PPKEYWORD(pragma)
// C23 & C++26 #embed
PPKEYWORD(embed)
+// C++20 Module Directive
+PPKEYWORD(module)
+PPKEYWORD(__preprocessed_module)
+PPKEYWORD(__preprocessed_import)
+
// GNU Extensions.
PPKEYWORD(import)
PPKEYWORD(include_next)
@@ -1030,6 +1035,9 @@ ANNOTATION(module_include)
ANNOTATION(module_begin)
ANNOTATION(module_end)
+// Annotations for C++, Clang and Objective-C named modules.
+ANNOTATION(module_name)
+
// Annotation for a header_name token that has been looked up and transformed
// into the name of a header unit.
ANNOTATION(header_unit)
diff --git a/clang/include/clang/Basic/TokenKinds.h b/clang/include/clang/Basic/TokenKinds.h
index a801113c57715..c0316257d9d97 100644
--- a/clang/include/clang/Basic/TokenKinds.h
+++ b/clang/include/clang/Basic/TokenKinds.h
@@ -76,6 +76,10 @@ const char *getPunctuatorSpelling(TokenKind Kind) LLVM_READNONE;
/// tokens like 'int' and 'dynamic_cast'. Returns NULL for other token kinds.
const char *getKeywordSpelling(TokenKind Kind) LLVM_READNONE;
+/// Determines the spelling of simple Objective-C keyword tokens like '@import'.
+/// Returns NULL for other token kinds.
+const char *getObjCKeywordSpelling(ObjCKeywordKind Kind) LLVM_READNONE;
+
/// Returns the spelling of preprocessor keywords, such as "else".
const char *getPPKeywordSpelling(PPKeywordKind Kind) LLVM_READNONE;
diff --git a/clang/include/clang/Frontend/CompilerInstance.h b/clang/include/clang/Frontend/CompilerInstance.h
index f56da69a05caf..a3a4c7e55b72b 100644
--- a/clang/include/clang/Frontend/CompilerInstance.h
+++ b/clang/include/clang/Frontend/CompilerInstance.h
@@ -905,7 +905,7 @@ class CompilerInstance : public ModuleLoader {
/// load it.
ModuleLoadResult findOrCompileModuleAndReadAST(StringRef ModuleName,
SourceLocation ImportLoc,
- SourceLocation ModuleNameLoc,
+ SourceRange ModuleNameRange,
bool IsInclusionDirective);
/// Creates a \c CompilerInstance for compiling a module.
diff --git a/clang/include/clang/Lex/CodeCompletionHandler.h b/clang/include/clang/Lex/CodeCompletionHandler.h
index bd3e05a36bb33..2ef29743415ae 100644
--- a/clang/include/clang/Lex/CodeCompletionHandler.h
+++ b/clang/include/clang/Lex/CodeCompletionHandler.h
@@ -13,12 +13,15 @@
#ifndef LLVM_CLANG_LEX_CODECOMPLETIONHANDLER_H
#define LLVM_CLANG_LEX_CODECOMPLETIONHANDLER_H
+#include "clang/Basic/IdentifierTable.h"
+#include "clang/Basic/SourceLocation.h"
#include "llvm/ADT/StringRef.h"
namespace clang {
class IdentifierInfo;
class MacroInfo;
+using ModuleIdPath = ArrayRef<IdentifierLoc>;
/// Callback handler that receives notifications when performing code
/// completion within the preprocessor.
@@ -70,6 +73,11 @@ class CodeCompletionHandler {
/// file where we expect natural language, e.g., a comment, string, or
/// \#error directive.
virtual void CodeCompleteNaturalLanguage() { }
+
+ /// Callback invoked when performing code completion inside the module name
+ /// part of an import directive.
+ virtual void CodeCompleteModuleImport(SourceLocation ImportLoc,
+ ModuleIdPath Path) {}
};
}
diff --git a/clang/include/clang/Lex/DependencyDirectivesScanner.h b/clang/include/clang/Lex/DependencyDirectivesScanner.h
index f9fec3998ca53..b21da166a96e5 100644
--- a/clang/include/clang/Lex/DependencyDirectivesScanner.h
+++ b/clang/include/clang/Lex/DependencyDirectivesScanner.h
@@ -135,6 +135,22 @@ void printDependencyDirectivesAsSource(
ArrayRef<dependency_directives_scan::Directive> Directives,
llvm::raw_ostream &OS);
+/// Scan an input source buffer for C++20 named module usage.
+///
+/// \param Source The input source buffer.
+///
+/// \returns true if any C++20 named modules related directive was found.
+bool scanInputForCXX20ModulesUsage(StringRef Source);
+
+/// Scan an input source buffer, and check whether the input source is a
+/// preprocessed output.
+///
+/// \param Source The input source buffer.
+///
+/// \returns true if any '__preprocessed_module' or '__preprocessed_import'
+/// directive was found.
+bool isPreprocessedModuleFile(StringRef Source);
+
/// Functor that returns the dependency directives for a given file.
class DependencyDirectivesGetter {
public:
diff --git a/clang/include/clang/Lex/ModuleLoader.h b/clang/include/clang/Lex/ModuleLoader.h
index a58407200c41c..042a5ab1f4a57 100644
--- a/clang/include/clang/Lex/ModuleLoader.h
+++ b/clang/include/clang/Lex/ModuleLoader.h
@@ -159,6 +159,7 @@ class ModuleLoader {
/// \returns Returns true if any modules with that symbol found.
virtual bool lookupMissingImports(StringRef Name,
SourceLocation TriggerLoc) = 0;
+ static std::string getFlatNameFromPath(ModuleIdPath Path);
bool HadFatalFailure = false;
};
diff --git a/clang/include/clang/Lex/Preprocessor.h b/clang/include/clang/Lex/Preprocessor.h
index b1c648e647f41..5adc45a19ca79 100644
--- a/clang/include/clang/Lex/Preprocessor.h
+++ b/clang/include/clang/Lex/Preprocessor.h
@@ -48,6 +48,7 @@
#include "llvm/Support/Allocator.h"
#include "llvm/Support/Casting.h"
#include "llvm/Support/Registry.h"
+#include "llvm/Support/TrailingObjects.h"
#include <cassert>
#include <cstddef>
#include <cstdint>
@@ -136,6 +137,64 @@ struct CXXStandardLibraryVersionInfo {
std::uint64_t Version;
};
+/// Record the previous 'export' keyword info.
+///
+/// Since P1857R3, the standard introduced several rules to determine whether
+/// the 'module', 'export module', 'import', 'export import' is a valid
+/// directive introducer. This class is used to record the previous 'export'
+/// keyword token, and then handle 'export module' and 'export import'.
+class ExportContextualKeywordInfo {
+ Token ExportTok;
+ bool AtPhysicalStartOfLine = false;
+
+public:
+ ExportContextualKeywordInfo() = default;
+ ExportContextualKeywordInfo(const Token &Tok, bool AtPhysicalStartOfLine)
+ : ExportTok(Tok), AtPhysicalStartOfLine(AtPhysicalStartOfLine) {}
+
+ bool isValid() const { return ExportTok.is(tok::kw_export); }
+ bool isAtPhysicalStartOfLine() const { return AtPhysicalStartOfLine; }
+ Token getExportTok() const { return ExportTok; }
+ void reset() {
+ ExportTok.startToken();
+ AtPhysicalStartOfLine = false;
+ }
+};
+
+class ModuleNameLoc final
+ : llvm::TrailingObjects<ModuleNameLoc, IdentifierLoc> {
+ friend TrailingObjects;
+ unsigned NumIdentifierLocs;
+ unsigned numTrailingObjects(OverloadToken<IdentifierLoc>) const {
+ return getNumIdentifierLocs();
+ }
+
+ ModuleNameLoc(ModuleIdPath Path) : NumIdentifierLocs(Path.size()) {
+ (void)llvm::copy(Path, getTrailingObjectsNonStrict<IdentifierLoc>());
+ }
+
+public:
+ static ModuleNameLoc *Create(Preprocessor &PP, ModuleIdPath Path);
+ unsigned getNumIdentifierLocs() const { return NumIdentifierLocs; }
+ ModuleIdPath getModuleIdPath() const {
+ return {getTrailingObjectsNonStrict<IdentifierLoc>(),
+ getNumIdentifierLocs()};
+ }
+
+ SourceLocation getBeginLoc() const {
+ return getModuleIdPath().front().getLoc();
+ }
+ SourceLocation getEndLoc() const {
+ auto &Last = getModuleIdPath().back();
+ return Last.getLoc().getLocWithOffset(
+ Last.getIdentifierInfo()->getLength());
+ }
+ SourceRange getRange() const { return {getBeginLoc(), getEndLoc()}; }
+ std::string str() const {
+ return ModuleLoader::getFlatNameFromPath(getModuleIdPath());
+ }
+};
+
/// Engages in a tight little dance with the lexer to efficiently
/// preprocess tokens.
///
@@ -339,8 +398,9 @@ class Preprocessor {
/// lexed, if any.
SourceLocation ModuleImportLoc;
- /// The import path for named module that we're currently processing.
- SmallVector<IdentifierLoc, 2> NamedModuleImportPath;
+ /// The source location of the \c module contextual keyword we just
+ /// lexed, if any.
+ SourceLocation ModuleDeclLoc;
llvm::DenseMap<FileID, SmallVector<const char *>> CheckPoints;
unsigned CheckPointCounter = 0;
@@ -351,6 +411,12 @@ class Preprocessor {
/// Whether the last token we lexed was an '@'.
bool LastTokenWasAt = false;
+ /// Whether we're importing a standard C++20 named Modules.
+ bool ImportingCXXNamedModules = false;
+
+ /// Whether the last token we lexed was an 'export' keyword.
+ ExportContextualKeywordInfo LastTokenWasExportKeyword;
+
/// First pp-token source location in current translation unit.
SourceLocation FirstPPTokenLoc;
@@ -562,9 +628,9 @@ class Preprocessor {
reset();
}
- void handleIdentifier(IdentifierInfo *Identifier) {
- if (isModuleCandidate() && Identifier)
- Name += Identifier->getName().str();
+ void handleModuleName(ModuleNameLoc *NameLoc) {
+ if (isModuleCandidate() && NameLoc)
+ Name += NameLoc->str();
else if (!isNamedModule())
reset();
}
@@ -576,13 +642,6 @@ class Preprocessor {
reset();
}
- void handlePeriod() {
- if (isModuleCandidate())
- Name += ".";
- else if (!isNamedModule())
- reset();
- }
-
void handleSemi() {
if (!Name.empty() && isModuleCandidate()) {
if (State == InterfaceCandidate)
@@ -639,10 +698,6 @@ class Preprocessor {
ModuleDeclSeq ModuleDeclState;
- /// Whether the module import expects an identifier next. Otherwise,
- /// it expects a '.' or ';'.
- bool ModuleImportExpectsIdentifier = false;
-
/// The identifier and source location of the currently-active
/// \#pragma clang arc_cf_code_audited begin.
IdentifierLoc PragmaARCCFCodeAuditedInfo;
@@ -776,6 +831,12 @@ class Preprocessor {
/// Only one of CurLexer, or CurTokenLexer will be non-null.
std::unique_ptr<Lexer> CurLexer;
+ /// Lexers that are pending destruction, deferred until the current
+ /// Stack of Lexer unwinds completely (LexLevel returns to 0).
+ /// This avoids use-after-free when HandleEndOfFile is called from
+ /// within a Lexer method that still needs to access its members.
+ SmallVector<std::unique_ptr<Lexer>, 2> PendingDestroyLexers;
+
/// The current top of the stack that we're lexing from
/// if not expanding a macro.
///
@@ -1125,6 +1186,9 @@ class Preprocessor {
/// Whether tokens are being skipped until the through header is seen.
bool SkippingUntilPCHThroughHeader = false;
+ /// Whether the main file is preprocessed module file.
+ bool MainFileIsPreprocessedModuleFile = false;
+
/// \{
/// Cache of macro expanders to reduce malloc traffic.
enum { TokenLexerCacheSize = 8 };
@@ -1778,6 +1842,36 @@ class Preprocessor {
std::optional<LexEmbedParametersResult> LexEmbedParameters(Token &Current,
bool ForHasEmbed);
+ /// Whether the main file is preprocessed module file.
+ bool isPreprocessedModuleFile() const {
+ return MainFileIsPreprocessedModuleFile;
+ }
+
+ /// Mark the main file as a preprocessed module file, then the 'module' and
+ /// 'import' directive recognition will be suppressed. Only
+ /// '__preprocessed_moduke' and '__preprocessed_import' are allowed.
+ void markMainFileAsPreprocessedModuleFile() {
+ MainFileIsPreprocessedModuleFile = true;
+ }
+
+ bool LexModuleNameContinue(Token &Tok, SourceLocation UseLoc,
+ SmallVectorImpl<Token> &Suffix,
+ SmallVectorImpl<IdentifierLoc> &Path,
+ bool AllowMacroExpansion = true,
+ bool IsPartition = false);
+ void EnterModuleSuffixTokenStream(ArrayRef<Token> Toks);
+ void HandleCXXImportDirective(Token Import);
+ void HandleCXXModuleDirective(Token Module);
+
+ /// Callback invoked when the lexer sees one of export, import or module token
+ /// at the start of a line.
+ ///
+ /// This consumes the import/module directive, modifies the
+ /// lexer/preprocessor state, and advances the lexer(s) so that the next token
+ /// read is the correct one.
+ bool HandleModuleContextualKeyword(Token &Result,
+ bool TokAtPhysicalStartOfLine);
+
/// Get the start location of the first pp-token in main file.
SourceLocation getMainFileFirstPPTokenLoc() const {
assert(FirstPPTokenLoc.isValid() &&
@@ -1786,7 +1880,10 @@ class Preprocessor {
}
bool LexAfterModuleImport(Token &Result);
- void CollectPpImportSuffix(SmallVectorImpl<Token> &Toks);
+ void CollectPPImportSuffix(SmallVectorImpl<Token> &Toks,
+ bool StopUntilEOD = false);
+ bool CollectPPImportSuffixAndEnterStream(SmallVectorImpl<Token> &Toks,
+ bool StopUntilEOD = false);
void makeModuleVisible(Module *M, SourceLocation Loc,
bool IncludeExports = true);
@@ -2308,45 +2405,22 @@ class Preprocessor {
}
}
- /// Check whether the next pp-token is one of the specificed token kind. this
- /// method should have no observable side-effect on the lexed tokens.
- template <typename... Ts> bool isNextPPTokenOneOf(Ts... Ks) {
+ /// isNextPPTokenOneOf - Check whether the next pp-token is one of the
+ /// specificed token kind. this method should have no observable side-effect
+ /// on the lexed tokens.
+ template <typename... Ts> bool isNextPPTokenOneOf(Ts... Ks) const {
static_assert(sizeof...(Ts) > 0,
"requires at least one tok::TokenKind specified");
- // Do some quick tests for rejection cases.
- std::optional<Token> Val;
- if (CurLexer)
- Val = CurLexer->peekNextPPToken();
- else
- Val = CurTokenLexer->peekNextPPToken();
-
- if (!Val) {
- // We have run off the end. If it's a source file we don't
- // examine enclosing ones (C99 5.1.1.2p4). Otherwise walk up the
- // macro stack.
- if (CurPPLexer)
- return false;
- for (const IncludeStackInfo &Entry : llvm::reverse(IncludeMacroStack)) {
- if (Entry.TheLexer)
- Val = Entry.TheLexer->peekNextPPToken();
- else
- Val = Entry.TheTokenLexer->peekNextPPToken();
-
- if (Val)
- break;
-
- // Ran off the end of a source file?
- if (Entry.ThePPLexer)
- return false;
- }
- }
-
- // Okay, we found the token and return. Otherwise we found the end of the
- // translation unit.
- return Val->isOneOf(Ks...);
+ auto NextTokOpt = peekNextPPToken();
+ return NextTokOpt.has_value() ? NextTokOpt->is(Ks...) : false;
}
private:
+ /// peekNextPPToken - Return std::nullopt if there are no more tokens in the
+ /// buffer controlled by this lexer, otherwise return the next unexpanded
+ /// token.
+ std::optional<Token> peekNextPPToken() const;
+
/// Identifiers used for SEH handling in Borland. These are only
/// allowed in particular circumstances
// __except block
@@ -2402,20 +2476,27 @@ class Preprocessor {
/// If \p EnableMacros is true, then we consider macros that expand to zero
/// tokens as being ok.
///
+ /// If \p ExtraToks not null, the extra tokens will be saved in this
+ /// container.
+ ///
/// \return The location of the end of the directive (the terminating
/// newline).
- SourceLocation CheckEndOfDirective(const char *DirType,
- bool EnableMacros = false);
+ SourceLocation
+ CheckEndOfDirective(StringRef DirType, bool EnableMacros = false,
+ SmallVectorImpl<Token> *ExtraToks = nullptr);
/// Read and discard all tokens remaining on the current line until
/// the tok::eod token is found. Returns the range of the skipped tokens.
- SourceRange DiscardUntilEndOfDirective() {
+ SourceRange
+ DiscardUntilEndOfDirective(SmallVectorImpl<Token> *DiscardedToks = nullptr) {
Token Tmp;
- return DiscardUntilEndOfDirective(Tmp);
+ return DiscardUntilEndOfDirective(Tmp, DiscardedToks);
}
/// Same as above except retains the token that was found.
- SourceRange DiscardUntilEndOfDirective(Token &Tok);
+ SourceRange
+ DiscardUntilEndOfDirective(Token &Tok,
+ SmallVectorImpl<Token> *DiscardedToks = nullptr);
/// Returns true if the preprocessor has seen a use of
/// __DATE__ or __TIME__ in the file so far.
@@ -2486,11 +2567,10 @@ class Preprocessor {
}
/// If we're importing a standard C++20 Named Modules.
- bool isInImportingCXXNamedModules() const {
- // NamedModuleImportPath will be non-empty only if we're importing
- // Standard C++ named modules.
- return !NamedModuleImportPath.empty() && getLangOpts().CPlusPlusModules &&
- !IsAtImport;
+ bool isImportingCXXNamedModules() const {
+ assert(getLangOpts().CPlusPlusModules &&
+ "Import C++ named modules are only valid for C++20 modules");
+ return ImportingCXXNamedModules;
}
/// Allocate a new MacroInfo object with the provided SourceLocation.
@@ -2558,6 +2638,8 @@ class Preprocessor {
}
void PopIncludeMacroStack() {
+ if (CurLexer)
+ PendingDestroyLexers.push_back(std::move(CurLexer));
CurLexer = std::move(IncludeMacroStack.back().TheLexer);
CurPPLexer = IncludeMacroStack.back().ThePPLexer;
CurTokenLexer = std::move(IncludeMacroStack.back().TheTokenLexer);
diff --git a/clang/include/clang/Lex/Token.h b/clang/include/clang/Lex/Token.h
index 43091a6f3a8c6..d09e951908129 100644
--- a/clang/include/clang/Lex/Token.h
+++ b/clang/include/clang/Lex/Token.h
@@ -297,6 +297,10 @@ class Token {
/// Return the ObjC keyword kind.
tok::ObjCKeywordKind getObjCKeywordID() const;
+ /// Return true if we have a C++20 modules contextual keyword(export, import
+ /// or module).
+ bool isModuleContextualKeyword(bool AllowExport = true) const;
+
bool isSimpleTypeSpecifier(const LangOptions &LangOpts) const;
/// Return true if this token has trigraphs or escaped newlines in it.
diff --git a/clang/include/clang/Lex/TokenLexer.h b/clang/include/clang/Lex/TokenLexer.h
index 0456dd961fc30..0c0c574267364 100644
--- a/clang/include/clang/Lex/TokenLexer.h
+++ b/clang/include/clang/Lex/TokenLexer.h
@@ -100,6 +100,10 @@ class TokenLexer {
/// See the flag documentation for details.
bool IsReinject : 1;
+ /// This is true if this TokenLexer is created when handling a C++ module
+ /// directive.
+ bool LexingCXXModuleDirective : 1;
+
public:
/// Create a TokenLexer for the specified macro with the specified actual
/// arguments. Note that this ctor takes ownership of the ActualArgs pointer.
@@ -151,6 +155,14 @@ class TokenLexer {
/// preprocessor directive.
bool isParsingPreprocessorDirective() const;
+ /// setLexingCXXModuleDirective - This is set to true if this TokenLexer is
+ /// created when handling a C++ module directive.
+ void setLexingCXXModuleDirective(bool Val = true);
+
+ /// isLexingCXXModuleDirective - Return true if we are lexing a C++ module or
+ /// import directive.
+ bool isLexingCXXModuleDirective() const;
+
private:
void destroy();
diff --git a/clang/include/clang/Parse/Parser.h b/clang/include/clang/Parse/Parser.h
index f7e7b0ec51d80..cd7dc14701914 100644
--- a/clang/include/clang/Parse/Parser.h
+++ b/clang/include/clang/Parse/Parser.h
@@ -566,10 +566,6 @@ class Parser : public CodeCompletionHandler {
/// Contextual keywords for Microsoft extensions.
IdentifierInfo *Ident__except;
- // C++2a contextual keywords.
- mutable IdentifierInfo *Ident_import;
- mutable IdentifierInfo *Ident_module;
-
std::unique_ptr<CommentHandler> CommentSemaHandler;
/// Gets set to true after calling ProduceSignatureHelp, it is for a
@@ -1081,6 +1077,9 @@ class Parser : public CodeCompletionHandler {
bool ParseModuleName(SourceLocation UseLoc,
SmallVectorImpl<IdentifierLoc> &Path, bool IsImport);
+ void DiagnoseInvalidCXXModuleDecl(const Sema::ModuleImportState &ImportState);
+ void DiagnoseInvalidCXXModuleImport();
+
//===--------------------------------------------------------------------===//
// Preprocessor code-completion pass-through
void CodeCompleteDirective(bool InConditional) override;
@@ -1091,6 +1090,8 @@ class Parser : public CodeCompletionHandler {
unsigned ArgumentIndex) override;
void CodeCompleteIncludedFile(llvm::StringRef Dir, bool IsAngled) override;
void CodeCompleteNaturalLanguage() override;
+ void CodeCompleteModuleImport(SourceLocation ImportLoc,
+ ModuleIdPath Path) override;
///@}
diff --git a/clang/lib/Basic/IdentifierTable.cpp b/clang/lib/Basic/IdentifierTable.cpp
index 9b4019834c4be..7f96777fbd4cb 100644
--- a/clang/lib/Basic/IdentifierTable.cpp
+++ b/clang/lib/Basic/IdentifierTable.cpp
@@ -298,8 +298,11 @@ void IdentifierTable::AddKeywords(const LangOptions &LangOpts) {
if (LangOpts.IEEE128)
AddKeyword("__ieee128", tok::kw___float128, KEYALL, LangOpts, *this);
- // Add the 'import' contextual keyword.
- get("import").setModulesImport(true);
+ // Add the 'import' and 'module' contextual keywords.
+ get("import").setKeywordImport(true);
+ get("module").setModuleKeyword(true);
+ get("__preprocessed_import").setKeywordImport(true);
+ get("__preprocessed_module").setModuleKeyword(true);
}
/// Checks if the specified token kind represents a keyword in the
@@ -413,6 +416,13 @@ tok::PPKeywordKind IdentifierInfo::getPPKeywordID() const {
unsigned Len = getLength();
if (Len < 2) return tok::pp_not_keyword;
const char *Name = getNameStart();
+
+ if (Name[0] == '_' && isImportKeyword())
+ return tok::pp___preprocessed_import;
+ if (Name[0] == '_' && isModuleKeyword())
+ return tok::pp___preprocessed_module;
+
+ // clang-format off
switch (HASH(Len, Name[0], Name[2])) {
default: return tok::pp_not_keyword;
CASE( 2, 'i', '\0', if);
@@ -431,6 +441,7 @@ tok::PPKeywordKind IdentifierInfo::getPPKeywordID() const {
CASE( 6, 'd', 'f', define);
CASE( 6, 'i', 'n', ifndef);
CASE( 6, 'i', 'p', import);
+ CASE( 6, 'm', 'd', module);
CASE( 6, 'p', 'a', pragma);
CASE( 7, 'd', 'f', defined);
@@ -450,6 +461,7 @@ tok::PPKeywordKind IdentifierInfo::getPPKeywordID() const {
#undef CASE
#undef HASH
}
+ // clang-format on
}
//===----------------------------------------------------------------------===//
diff --git a/clang/lib/Basic/TokenKinds.cpp b/clang/lib/Basic/TokenKinds.cpp
index c300175ce90ba..a5b8c998d9b8e 100644
--- a/clang/lib/Basic/TokenKinds.cpp
+++ b/clang/lib/Basic/TokenKinds.cpp
@@ -46,6 +46,18 @@ const char *tok::getKeywordSpelling(TokenKind Kind) {
return nullptr;
}
+const char *tok::getObjCKeywordSpelling(ObjCKeywordKind Kind) {
+ switch (Kind) {
+#define OBJC_AT_KEYWORD(X) \
+ case objc_##X: \
+ return "@" #X;
+#include "clang/Basic/TokenKinds.def"
+ default:
+ break;
+ }
+ return nullptr;
+}
+
const char *tok::getPPKeywordSpelling(tok::PPKeywordKind Kind) {
switch (Kind) {
#define PPKEYWORD(x) case tok::pp_##x: return #x;
diff --git a/clang/lib/DependencyScanning/ModuleDepCollector.cpp b/clang/lib/DependencyScanning/ModuleDepCollector.cpp
index 70c94bca10275..bbdcd7d8e2b44 100644
--- a/clang/lib/DependencyScanning/ModuleDepCollector.cpp
+++ b/clang/lib/DependencyScanning/ModuleDepCollector.cpp
@@ -565,7 +565,8 @@ void ModuleDepCollectorPP::InclusionDirective(
void ModuleDepCollectorPP::moduleImport(SourceLocation ImportLoc,
ModuleIdPath Path,
const Module *Imported) {
- if (MDC.ScanInstance.getPreprocessor().isInImportingCXXNamedModules()) {
+ auto &PP = MDC.ScanInstance.getPreprocessor();
+ if (PP.getLangOpts().CPlusPlusModules && PP.isImportingCXXNamedModules()) {
P1689ModuleInfo RequiredModule;
RequiredModule.ModuleName = Path[0].getIdentifierInfo()->getName().str();
RequiredModule.Type = P1689ModuleInfo::ModuleType::NamedCXXModule;
diff --git a/clang/lib/Frontend/CompilerInstance.cpp b/clang/lib/Frontend/CompilerInstance.cpp
index e52f237fc5df2..bb5230b4d22f6 100644
--- a/clang/lib/Frontend/CompilerInstance.cpp
+++ b/clang/lib/Frontend/CompilerInstance.cpp
@@ -1762,8 +1762,8 @@ static ModuleSource selectModuleSource(
}
ModuleLoadResult CompilerInstance::findOrCompileModuleAndReadAST(
- StringRef ModuleName, SourceLocation ImportLoc,
- SourceLocation ModuleNameLoc, bool IsInclusionDirective) {
+ StringRef ModuleName, SourceLocation ImportLoc, SourceRange ModuleNameRange,
+ bool IsInclusionDirective) {
// Search for a module with the given name.
HeaderSearch &HS = PP->getHeaderSearchInfo();
Module *M =
@@ -1780,10 +1780,11 @@ ModuleLoadResult CompilerInstance::findOrCompileModuleAndReadAST(
std::string ModuleFilename;
ModuleSource Source =
selectModuleSource(M, ModuleName, ModuleFilename, BuiltModules, HS);
+ SourceLocation ModuleNameLoc = ModuleNameRange.getBegin();
if (Source == MS_ModuleNotFound) {
// We can't find a module, error out here.
getDiagnostics().Report(ModuleNameLoc, diag::err_module_not_found)
- << ModuleName << SourceRange(ImportLoc, ModuleNameLoc);
+ << ModuleName << ModuleNameRange;
return nullptr;
}
if (ModuleFilename.empty()) {
@@ -1969,8 +1970,11 @@ CompilerInstance::loadModule(SourceLocation ImportLoc,
MM.cacheModuleLoad(*Path[0].getIdentifierInfo(), Module);
} else {
+ SourceLocation ModuleNameEndLoc = Path.back().getLoc().getLocWithOffset(
+ Path.back().getIdentifierInfo()->getLength());
ModuleLoadResult Result = findOrCompileModuleAndReadAST(
- ModuleName, ImportLoc, ModuleNameLoc, IsInclusionDirective);
+ ModuleName, ImportLoc, SourceRange{ModuleNameLoc, ModuleNameEndLoc},
+ IsInclusionDirective);
if (!Result.isNormal())
return Result;
if (!Result)
diff --git a/clang/lib/Frontend/InitPreprocessor.cpp b/clang/lib/Frontend/InitPreprocessor.cpp
index 8253fad9e5503..18c694579abdf 100644
--- a/clang/lib/Frontend/InitPreprocessor.cpp
+++ b/clang/lib/Frontend/InitPreprocessor.cpp
@@ -1641,5 +1641,12 @@ void clang::InitializePreprocessor(Preprocessor &PP,
if (FEOpts.DashX.isPreprocessed()) {
PP.getDiagnostics().setSeverity(diag::ext_pp_gnu_line_directive,
diag::Severity::Ignored, SourceLocation());
+
+ // Compiling with -xc++-cpp-output should suppress module directive
+ // recognition. __preprocessed_module can either get the directive treatment
+ // or be accepted directly by phase 7 in a module declaration. In the latter
+ // case, __preprocessed_module will work even if there are preprocessing
+ // tokens on the same line that precede it.
+ PP.markMainFileAsPreprocessedModuleFile();
}
}
diff --git a/clang/lib/Frontend/PrintPreprocessedOutput.cpp b/clang/lib/Frontend/PrintPreprocessedOutput.cpp
index 9e046633328d7..0dc8a86e604d3 100644
--- a/clang/lib/Frontend/PrintPreprocessedOutput.cpp
+++ b/clang/lib/Frontend/PrintPreprocessedOutput.cpp
@@ -245,6 +245,8 @@ class PrintPPOutputPPCallbacks : public PPCallbacks {
unsigned GetNumToksToSkip() const { return NumToksToSkip; }
void ResetSkipToks() { NumToksToSkip = 0; }
+
+ const Token &GetPrevToken() const { return PrevTok; }
};
} // end anonymous namespace
@@ -758,7 +760,8 @@ void PrintPPOutputPPCallbacks::HandleWhitespaceBeforeTok(const Token &Tok,
if (Tok.is(tok::eof) ||
(Tok.isAnnotation() && !Tok.is(tok::annot_header_unit) &&
!Tok.is(tok::annot_module_begin) && !Tok.is(tok::annot_module_end) &&
- !Tok.is(tok::annot_repl_input_end) && !Tok.is(tok::annot_embed)))
+ !Tok.is(tok::annot_repl_input_end) && !Tok.is(tok::annot_embed) &&
+ !Tok.is(tok::annot_module_name)))
return;
// EmittedDirectiveOnThisLine takes priority over RequireSameLine.
@@ -893,6 +896,7 @@ static void PrintPreprocessedTokens(Preprocessor &PP, Token &Tok,
!PP.getCommentRetentionState();
bool IsStartOfLine = false;
+ bool IsCXXModuleDirective = false;
char Buffer[256];
while (true) {
// Two lines joined with line continuation ('\' as last character on the
@@ -978,11 +982,38 @@ static void PrintPreprocessedTokens(Preprocessor &PP, Token &Tok,
*Callbacks->OS << static_cast<int>(Byte);
PrintComma = true;
}
+ } else if (Tok.is(tok::annot_module_name)) {
+ auto *NameLoc = static_cast<ModuleNameLoc *>(Tok.getAnnotationValue());
+ *Callbacks->OS << NameLoc->str();
} else if (Tok.isAnnotation()) {
// Ignore annotation tokens created by pragmas - the pragmas themselves
// will be reproduced in the preprocessed output.
PP.Lex(Tok);
continue;
+ } else if (PP.getLangOpts().CPlusPlusModules && Tok.is(tok::kw_import) &&
+ !Callbacks->GetPrevToken().is(tok::at)) {
+ assert(!IsCXXModuleDirective && "Is an import directive being printed?");
+ IsCXXModuleDirective = true;
+ IsStartOfLine = false;
+ *Callbacks->OS << tok::getPPKeywordSpelling(
+ tok::pp___preprocessed_import);
+ PP.Lex(Tok);
+ continue;
+ } else if (PP.getLangOpts().CPlusPlusModules && Tok.is(tok::kw_module)) {
+ assert(!IsCXXModuleDirective && "Is a module directive being printed?");
+ IsCXXModuleDirective = true;
+ IsStartOfLine = false;
+ *Callbacks->OS << tok::getPPKeywordSpelling(
+ tok::pp___preprocessed_module);
+ PP.Lex(Tok);
+ continue;
+ } else if (PP.getLangOpts().CPlusPlusModules && IsCXXModuleDirective &&
+ Tok.is(tok::semi)) {
+ IsCXXModuleDirective = false;
+ IsStartOfLine = true;
+ *Callbacks->OS << ';';
+ PP.Lex(Tok);
+ continue;
} else if (IdentifierInfo *II = Tok.getIdentifierInfo()) {
*Callbacks->OS << II->getName();
} else if (Tok.isLiteral() && !Tok.needsCleaning() &&
diff --git a/clang/lib/Lex/DependencyDirectivesScanner.cpp b/clang/lib/Lex/DependencyDirectivesScanner.cpp
index fb0c183261474..8320b3ddbca31 100644
--- a/clang/lib/Lex/DependencyDirectivesScanner.cpp
+++ b/clang/lib/Lex/DependencyDirectivesScanner.cpp
@@ -83,6 +83,9 @@ struct Scanner {
/// \returns True on error.
bool scan(SmallVectorImpl<Directive> &Directives);
+ friend bool clang::scanInputForCXX20ModulesUsage(StringRef Source);
+ friend bool clang::isPreprocessedModuleFile(StringRef Source);
+
private:
/// Lexes next token and advances \p First and the \p Lexer.
[[nodiscard]] dependency_directives_scan::Token &
@@ -172,6 +175,7 @@ struct Scanner {
/// true at the end.
bool reportError(const char *CurPtr, unsigned Err);
+ bool ScanningPreprocessedModuleFile = false;
StringMap<char> SplitIds;
StringRef Input;
SmallVectorImpl<dependency_directives_scan::Token> &Tokens;
@@ -542,6 +546,12 @@ static void skipWhitespace(const char *&First, const char *const End) {
bool Scanner::lexModuleDirectiveBody(DirectiveKind Kind, const char *&First,
const char *const End) {
+ assert(Kind == DirectiveKind::cxx_export_import_decl ||
+ Kind == DirectiveKind::cxx_export_module_decl ||
+ Kind == DirectiveKind::cxx_import_decl ||
+ Kind == DirectiveKind::cxx_module_decl ||
+ Kind == DirectiveKind::decl_at_import);
+
const char *DirectiveLoc = Input.data() + CurDirToks.front().Offset;
for (;;) {
// Keep a copy of the First char incase it needs to be reset.
@@ -553,7 +563,7 @@ bool Scanner::lexModuleDirectiveBody(DirectiveKind Kind, const char *&First,
First = Previous;
return false;
}
- if (Tok.is(tok::eof))
+ if (Tok.isOneOf(tok::eof, tok::eod))
return reportError(
DirectiveLoc,
diag::err_dep_source_scanner_missing_semi_after_at_import);
@@ -561,12 +571,25 @@ bool Scanner::lexModuleDirectiveBody(DirectiveKind Kind, const char *&First,
break;
}
- const auto &Tok = lexToken(First, End);
+ bool IsCXXModules = Kind == DirectiveKind::cxx_export_import_decl ||
+ Kind == DirectiveKind::cxx_export_module_decl ||
+ Kind == DirectiveKind::cxx_import_decl ||
+ Kind == DirectiveKind::cxx_module_decl;
+ if (IsCXXModules) {
+ lexPPDirectiveBody(First, End);
+ pushDirective(Kind);
+ return false;
+ }
+
pushDirective(Kind);
- if (Tok.is(tok::eof) || Tok.is(tok::eod))
+ skipWhitespace(First, End);
+ if (First == End)
return false;
- return reportError(DirectiveLoc,
- diag::err_dep_source_scanner_unexpected_tokens_at_import);
+ if (!isVerticalWhitespace(*First))
+ return reportError(
+ DirectiveLoc, diag::err_dep_source_scanner_unexpected_tokens_at_import);
+ skipNewline(First, End);
+ return false;
}
dependency_directives_scan::Token &Scanner::lexToken(const char *&First,
@@ -703,7 +726,12 @@ bool Scanner::lexModule(const char *&First, const char *const End) {
Id = *NextId;
}
- if (Id != "module" && Id != "import") {
+ StringRef Module =
+ ScanningPreprocessedModuleFile ? "__preprocessed_module" : "module";
+ StringRef Import =
+ ScanningPreprocessedModuleFile ? "__preprocessed_import" : "import";
+
+ if (Id != Module && Id != Import) {
skipLine(First, End);
return false;
}
@@ -716,7 +744,7 @@ bool Scanner::lexModule(const char *&First, const char *const End) {
switch (*First) {
case ':': {
// `module :` is never the start of a valid module declaration.
- if (Id == "module") {
+ if (Id == Module) {
skipLine(First, End);
return false;
}
@@ -735,7 +763,7 @@ bool Scanner::lexModule(const char *&First, const char *const End) {
}
case ';': {
// Handle the global module fragment `module;`.
- if (Id == "module" && !Export)
+ if (Id == Module && !Export)
break;
skipLine(First, End);
return false;
@@ -753,7 +781,7 @@ bool Scanner::lexModule(const char *&First, const char *const End) {
TheLexer.seek(getOffsetAt(First), /*IsAtStartOfLine*/ false);
DirectiveKind Kind;
- if (Id == "module")
+ if (Id == Module)
Kind = Export ? cxx_export_module_decl : cxx_module_decl;
else
Kind = Export ? cxx_export_import_decl : cxx_import_decl;
@@ -886,6 +914,19 @@ static bool isStartOfRelevantLine(char First) {
return false;
}
+static inline bool isStartWithPreprocessedModuleDirective(const char *First,
+ const char *End) {
+ assert(First <= End);
+ if (*First == '_') {
+ StringRef Str(First, End - First);
+ return Str.starts_with(
+ tok::getPPKeywordSpelling(tok::pp___preprocessed_module)) ||
+ Str.starts_with(
+ tok::getPPKeywordSpelling(tok::pp___preprocessed_import));
+ }
+ return false;
+}
+
bool Scanner::lexPPLine(const char *&First, const char *const End) {
assert(First != End);
@@ -910,7 +951,13 @@ bool Scanner::lexPPLine(const char *&First, const char *const End) {
CurDirToks.clear();
});
- if (*First == '_') {
+ // FIXME: Should we handle @import as a preprocessing directive?
+ if (*First == '@')
+ return lexAt(First, End);
+
+ bool IsPreprocessedModule =
+ isStartWithPreprocessedModuleDirective(First, End);
+ if (*First == '_' && !IsPreprocessedModule) {
if (isNextIdentifierOrSkipLine("_Pragma", First, End))
return lex_Pragma(First, End);
return false;
@@ -922,12 +969,8 @@ bool Scanner::lexPPLine(const char *&First, const char *const End) {
llvm::scope_exit ScEx2(
[&]() { TheLexer.setParsingPreprocessorDirective(false); });
- // Handle "@import".
- if (*First == '@')
- return lexAt(First, End);
-
// Handle module directives for C++20 modules.
- if (*First == 'i' || *First == 'e' || *First == 'm')
+ if (*First == 'i' || *First == 'e' || *First == 'm' || IsPreprocessedModule)
return lexModule(First, End);
// Lex '#'.
@@ -1009,6 +1052,7 @@ bool Scanner::scanImpl(const char *First, const char *const End) {
}
bool Scanner::scan(SmallVectorImpl<Directive> &Directives) {
+ ScanningPreprocessedModuleFile = clang::isPreprocessedModuleFile(Input);
bool Error = scanImpl(Input.begin(), Input.end());
if (!Error) {
@@ -1075,3 +1119,93 @@ void clang::printDependencyDirectivesAsSource(
}
}
}
+
+static void skipUntilMaybeCXX20ModuleDirective(const char *&First,
+ const char *const End) {
+ assert(First <= End);
+ while (First != End) {
+ if (*First == '#') {
+ ++First;
+ skipToNewlineRaw(First, End);
+ }
+ skipWhitespace(First, End);
+ if (const auto Len = isEOL(First, End)) {
+ First += Len;
+ continue;
+ }
+ break;
+ }
+}
+
+bool clang::scanInputForCXX20ModulesUsage(StringRef Source) {
+ const char *First = Source.begin();
+ const char *const End = Source.end();
+ skipUntilMaybeCXX20ModuleDirective(First, End);
+ if (First == End)
+ return false;
+
+ // Check if the next token can even be a module directive before creating a
+ // full lexer.
+ if (!(*First == 'i' || *First == 'e' || *First == 'm'))
+ return false;
+
+ llvm::SmallVector<dependency_directives_scan::Token> Tokens;
+ Scanner S(StringRef(First, End - First), Tokens, nullptr, SourceLocation());
+ S.TheLexer.setParsingPreprocessorDirective(true);
+ if (S.lexModule(First, End))
+ return false;
+ auto IsCXXNamedModuleDirective = [](const DirectiveWithTokens &D) {
+ switch (D.Kind) {
+ case dependency_directives_scan::cxx_module_decl:
+ case dependency_directives_scan::cxx_import_decl:
+ case dependency_directives_scan::cxx_export_module_decl:
+ case dependency_directives_scan::cxx_export_import_decl:
+ return true;
+ default:
+ return false;
+ }
+ };
+ return llvm::any_of(S.DirsWithToks, IsCXXNamedModuleDirective);
+}
+
+bool clang::isPreprocessedModuleFile(StringRef Source) {
+ const char *First = Source.begin();
+ const char *const End = Source.end();
+
+ skipUntilMaybeCXX20ModuleDirective(First, End);
+ if (First == End)
+ return false;
+
+ llvm::SmallVector<dependency_directives_scan::Token> Tokens;
+ Scanner S(StringRef(First, End - First), Tokens, nullptr, SourceLocation());
+ while (First != End) {
+ if (*First == '#') {
+ ++First;
+ skipToNewlineRaw(First, End);
+ } else if (*First == 'e') {
+ S.TheLexer.seek(S.getOffsetAt(First), /*IsAtStartOfLine=*/true);
+ StringRef Id = S.lexIdentifier(First, End);
+ if (Id == "export") {
+ std::optional<StringRef> NextId =
+ S.tryLexIdentifierOrSkipLine(First, End);
+ if (!NextId)
+ return false;
+ Id = *NextId;
+ }
+ if (Id == "__preprocessed_module" || Id == "__preprocessed_import")
+ return true;
+ skipToNewlineRaw(First, End);
+ } else if (isStartWithPreprocessedModuleDirective(First, End))
+ return true;
+ else
+ skipToNewlineRaw(First, End);
+
+ skipWhitespace(First, End);
+ if (const auto Len = isEOL(First, End)) {
+ First += Len;
+ continue;
+ }
+ break;
+ }
+ return false;
+}
diff --git a/clang/lib/Lex/Lexer.cpp b/clang/lib/Lex/Lexer.cpp
index afebef0974016..5e9d2743ba53f 100644
--- a/clang/lib/Lex/Lexer.cpp
+++ b/clang/lib/Lex/Lexer.cpp
@@ -72,6 +72,17 @@ tok::ObjCKeywordKind Token::getObjCKeywordID() const {
return specId ? specId->getObjCKeywordID() : tok::objc_not_keyword;
}
+bool Token::isModuleContextualKeyword(bool AllowExport) const {
+ if (AllowExport && is(tok::kw_export))
+ return true;
+ if (isOneOf(tok::kw_import, tok::kw_module))
+ return true;
+ if (isNot(tok::identifier))
+ return false;
+ const auto *II = getIdentifierInfo();
+ return II->isImportKeyword() || II->isModuleKeyword();
+}
+
/// Determine whether the token kind starts a simple-type-specifier.
bool Token::isSimpleTypeSpecifier(const LangOptions &LangOpts) const {
switch (getKind()) {
@@ -4019,11 +4030,23 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
case 'h': case 'i': case 'j': case 'k': case 'l': case 'm': case 'n':
case 'o': case 'p': case 'q': case 'r': case 's': case 't': /*'u'*/
case 'v': case 'w': case 'x': case 'y': case 'z':
- case '_':
+ case '_': {
// Notify MIOpt that we read a non-whitespace/non-comment token.
MIOpt.ReadToken();
- return LexIdentifierContinue(Result, CurPtr);
+ // LexIdentifierContinue may trigger HandleEndOfFile which would
+ // normally destroy this Lexer. However, the Preprocessor now defers
+ // lexer destruction until the stack of Lexer unwinds (LexLevel == 0),
+ // so it's safe to access member variables after this call returns.
+ bool returnedToken = LexIdentifierContinue(Result, CurPtr);
+
+ if (returnedToken && !LexingRawMode && !Is_PragmaLexer &&
+ !ParsingPreprocessorDirective && LangOpts.CPlusPlusModules &&
+ Result.isModuleContextualKeyword() &&
+ PP->HandleModuleContextualKeyword(Result, TokAtPhysicalStartOfLine))
+ goto HandleDirective;
+ return returnedToken;
+ }
case '$': // $ in identifiers.
if (LangOpts.DollarIdents) {
if (!isLexingRawMode())
@@ -4226,8 +4249,12 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
// it's actually the start of a preprocessing directive. Callback to
// the preprocessor to handle it.
// TODO: -fpreprocessed mode??
- if (TokAtPhysicalStartOfLine && !LexingRawMode && !Is_PragmaLexer)
+ if (TokAtPhysicalStartOfLine && !LexingRawMode && !Is_PragmaLexer) {
+ // We parsed a # character and it's the start of a preprocessing
+ // directive.
+ FormTokenWithChars(Result, CurPtr, tok::hash);
goto HandleDirective;
+ }
Kind = tok::hash;
}
@@ -4414,8 +4441,12 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
// it's actually the start of a preprocessing directive. Callback to
// the preprocessor to handle it.
// TODO: -fpreprocessed mode??
- if (TokAtPhysicalStartOfLine && !LexingRawMode && !Is_PragmaLexer)
+ if (TokAtPhysicalStartOfLine && !LexingRawMode && !Is_PragmaLexer) {
+ // We parsed a # character and it's the start of a preprocessing
+ // directive.
+ FormTokenWithChars(Result, CurPtr, tok::hash);
goto HandleDirective;
+ }
Kind = tok::hash;
}
@@ -4505,9 +4536,6 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
return true;
HandleDirective:
- // We parsed a # character and it's the start of a preprocessing directive.
-
- FormTokenWithChars(Result, CurPtr, tok::hash);
PP->HandleDirective(Result);
if (PP->hadModuleLoaderFatalFailure())
@@ -4530,6 +4558,10 @@ const char *Lexer::convertDependencyDirectiveToken(
Result.setKind(DDTok.Kind);
Result.setFlag((Token::TokenFlags)DDTok.Flags);
Result.setLength(DDTok.Length);
+ if (Result.is(tok::raw_identifier))
+ Result.setRawIdentifierData(TokPtr);
+ else if (Result.isLiteral())
+ Result.setLiteralData(TokPtr);
BufferPtr = TokPtr + DDTok.Length;
return TokPtr;
}
@@ -4587,15 +4619,18 @@ bool Lexer::LexDependencyDirectiveToken(Token &Result) {
Result.setRawIdentifierData(TokPtr);
if (!isLexingRawMode()) {
const IdentifierInfo *II = PP->LookUpIdentifierInfo(Result);
+ if (LangOpts.CPlusPlusModules && Result.isModuleContextualKeyword() &&
+ PP->HandleModuleContextualKeyword(Result, Result.isAtStartOfLine())) {
+ PP->HandleDirective(Result);
+ return false;
+ }
if (II->isHandleIdentifierCase())
return PP->HandleIdentifier(Result);
}
return true;
}
- if (Result.isLiteral()) {
- Result.setLiteralData(TokPtr);
+ if (Result.isLiteral())
return true;
- }
if (Result.is(tok::colon)) {
// Convert consecutive colons to 'tok::coloncolon'.
if (*BufferPtr == ':') {
diff --git a/clang/lib/Lex/PPDirectives.cpp b/clang/lib/Lex/PPDirectives.cpp
index d17e253556697..3d9e7f62757c5 100644
--- a/clang/lib/Lex/PPDirectives.cpp
+++ b/clang/lib/Lex/PPDirectives.cpp
@@ -48,6 +48,7 @@
#include "llvm/Support/SaveAndRestore.h"
#include <algorithm>
#include <cassert>
+#include <cstddef>
#include <cstring>
#include <optional>
#include <string>
@@ -82,14 +83,19 @@ Preprocessor::AllocateVisibilityMacroDirective(SourceLocation Loc,
/// Read and discard all tokens remaining on the current line until
/// the tok::eod token is found.
-SourceRange Preprocessor::DiscardUntilEndOfDirective(Token &Tmp) {
+SourceRange Preprocessor::DiscardUntilEndOfDirective(
+ Token &Tmp, SmallVectorImpl<Token> *DiscardedToks) {
SourceRange Res;
-
- LexUnexpandedToken(Tmp);
+ auto ReadNextTok = [&]() {
+ LexUnexpandedToken(Tmp);
+ if (DiscardedToks && Tmp.isNot(tok::eod))
+ DiscardedToks->push_back(Tmp);
+ };
+ ReadNextTok();
Res.setBegin(Tmp.getLocation());
while (Tmp.isNot(tok::eod)) {
assert(Tmp.isNot(tok::eof) && "EOF seen while discarding directive tokens");
- LexUnexpandedToken(Tmp);
+ ReadNextTok();
}
Res.setEnd(Tmp.getLocation());
return Res;
@@ -456,21 +462,27 @@ void Preprocessor::ReadMacroName(Token &MacroNameTok, MacroUse isDefineUndef,
/// true, then we consider macros that expand to zero tokens as being ok.
///
/// Returns the location of the end of the directive.
-SourceLocation Preprocessor::CheckEndOfDirective(const char *DirType,
- bool EnableMacros) {
+SourceLocation
+Preprocessor::CheckEndOfDirective(StringRef DirType, bool EnableMacros,
+ SmallVectorImpl<Token> *ExtraToks) {
Token Tmp;
+ auto ReadNextTok = [this, ExtraToks, &Tmp](auto &&LexFn) {
+ std::invoke(LexFn, this, Tmp);
+ if (ExtraToks && Tmp.isNot(tok::eod))
+ ExtraToks->push_back(Tmp);
+ };
// Lex unexpanded tokens for most directives: macros might expand to zero
// tokens, causing us to miss diagnosing invalid lines. Some directives (like
// #line) allow empty macros.
if (EnableMacros)
- Lex(Tmp);
+ ReadNextTok(&Preprocessor::Lex);
else
- LexUnexpandedToken(Tmp);
+ ReadNextTok(&Preprocessor::LexUnexpandedToken);
// There should be no tokens after the directive, but we allow them as an
// extension.
while (Tmp.is(tok::comment)) // Skip comments in -C mode.
- LexUnexpandedToken(Tmp);
+ ReadNextTok(&Preprocessor::LexUnexpandedToken);
if (Tmp.is(tok::eod))
return Tmp.getLocation();
@@ -483,8 +495,15 @@ SourceLocation Preprocessor::CheckEndOfDirective(const char *DirType,
if ((LangOpts.GNUMode || LangOpts.C99 || LangOpts.CPlusPlus) &&
!CurTokenLexer)
Hint = FixItHint::CreateInsertion(Tmp.getLocation(),"//");
- Diag(Tmp, diag::ext_pp_extra_tokens_at_eol) << DirType << Hint;
- return DiscardUntilEndOfDirective().getEnd();
+
+ unsigned DiagID = diag::ext_pp_extra_tokens_at_eol;
+ // A C++20 import or module directive has no '#' prefix.
+ if (getLangOpts().CPlusPlusModules &&
+ (DirType == "import" || DirType == "module"))
+ DiagID = diag::warn_pp_extra_tokens_at_module_directive_eol;
+
+ Diag(Tmp, DiagID) << DirType << Hint;
+ return DiscardUntilEndOfDirective(ExtraToks).getEnd();
}
void Preprocessor::SuggestTypoedDirective(const Token &Tok,
@@ -610,6 +629,57 @@ void Preprocessor::SkipExcludedConditionalBlock(SourceLocation HashTokenLoc,
continue;
}
+ // There is actually no "skipped block" in the above because the module
+ // directive is not a text-line (https://wg21.link/cpp.pre#2) nor
+ // anything else that is allowed in a group
+ // (https://eel.is/c++draft/cpp.pre#nt:group-part).
+ //
+ // A preprocessor diagnostic (effective with -E) that triggers whenever
+ // a module directive is encountered where a control-line or a text-line
+ // is required.
+ if (getLangOpts().CPlusPlusModules && Tok.isAtStartOfLine() &&
+ Tok.is(tok::raw_identifier) &&
+ (Tok.getRawIdentifier() == "export" ||
+ Tok.getRawIdentifier() == "module")) {
+ llvm::SaveAndRestore ModuleDirectiveSkipping(
+ LastTokenWasExportKeyword);
+ LastTokenWasExportKeyword.reset();
+ LookUpIdentifierInfo(Tok);
+ IdentifierInfo *II = Tok.getIdentifierInfo();
+
+ if (II->getName()[0] == 'e') { // export
+ HandleModuleContextualKeyword(Tok, Tok.isAtStartOfLine());
+ CurLexer->Lex(Tok);
+ if (Tok.is(tok::raw_identifier)) {
+ LookUpIdentifierInfo(Tok);
+ II = Tok.getIdentifierInfo();
+ }
+ }
+
+ if (II->getName()[0] == 'm') { // module
+ // HandleModuleContextualKeyword changes the lexer state, so we need
+ // to save and restore LexingRawMode.
+ llvm::SaveAndRestore RestoreLexingRawMode(CurPPLexer->LexingRawMode,
+ false);
+ if (HandleModuleContextualKeyword(Tok, Tok.isAtStartOfLine())) {
+ // We just recognized a module directive at the start of a line, so
+ // we're in directive mode. Tell the lexer this so any newlines we see
+ // will be converted into an EOD token (this terminates the
+ // directive).
+ CurPPLexer->ParsingPreprocessorDirective = true;
+ SourceLocation StartLoc = Tok.getLocation();
+ SourceLocation End = DiscardUntilEndOfDirective().getEnd();
+ Diag(StartLoc, diag::err_pp_cond_span_module_decl)
+ << SourceRange(StartLoc, End);
+ CurPPLexer->ParsingPreprocessorDirective = false;
+ // Restore comment saving mode.
+ if (CurLexer)
+ CurLexer->resetExtendedTokenMode();
+ continue;
+ }
+ }
+ }
+
// If this is the end of the buffer, we have an error.
if (Tok.is(tok::eof)) {
// We don't emit errors for unterminated conditionals here,
@@ -1259,12 +1329,14 @@ void Preprocessor::HandleDirective(Token &Result) {
// pp-directive.
bool ReadAnyTokensBeforeDirective =CurPPLexer->MIOpt.getHasReadAnyTokensVal();
- // Save the '#' token in case we need to return it later.
- Token SavedHash = Result;
+ // Save the directive-introducing token ('#', or import/module in C++20) in
+ // case we need to return it later.
+ Token Introducer = Result;
// Read the next token, the directive flavor. This isn't expanded due to
// C99 6.10.3p8.
- LexUnexpandedToken(Result);
+ if (Introducer.is(tok::hash))
+ LexUnexpandedToken(Result);
// C99 6.10.3p11: Is this preprocessor directive in macro invocation? e.g.:
// #define A(x) #x
@@ -1283,7 +1355,14 @@ void Preprocessor::HandleDirective(Token &Result) {
case tok::pp___include_macros:
case tok::pp_pragma:
case tok::pp_embed:
- Diag(Result, diag::err_embedded_directive) << II->getName();
+ case tok::pp_module:
+ case tok::pp___preprocessed_module:
+ case tok::pp___preprocessed_import:
+ Diag(Result, diag::err_embedded_directive)
+ << (getLangOpts().CPlusPlusModules &&
+ Introducer.isModuleContextualKeyword(
+ /*AllowExport=*/false))
+ << II->getName();
Diag(*ArgMacro, diag::note_macro_expansion_here)
<< ArgMacro->getIdentifierInfo();
DiscardUntilEndOfDirective();
@@ -1300,7 +1379,8 @@ void Preprocessor::HandleDirective(Token &Result) {
ResetMacroExpansionHelper helper(this);
if (SkippingUntilPCHThroughHeader || SkippingUntilPragmaHdrStop)
- return HandleSkippedDirectiveWhileUsingPCH(Result, SavedHash.getLocation());
+ return HandleSkippedDirectiveWhileUsingPCH(Result,
+ Introducer.getLocation());
switch (Result.getKind()) {
case tok::eod:
@@ -1320,7 +1400,7 @@ void Preprocessor::HandleDirective(Token &Result) {
// directive. However do permit it in the predefines file, as we use line
// markers to mark the builtin macros as being in a system header.
if (getLangOpts().AsmPreprocessor &&
- SourceMgr.getFileID(SavedHash.getLocation()) != getPredefinesFileID())
+ SourceMgr.getFileID(Introducer.getLocation()) != getPredefinesFileID())
break;
return HandleDigitDirective(Result);
default:
@@ -1332,30 +1412,32 @@ void Preprocessor::HandleDirective(Token &Result) {
default: break;
// C99 6.10.1 - Conditional Inclusion.
case tok::pp_if:
- return HandleIfDirective(Result, SavedHash, ReadAnyTokensBeforeDirective);
+ return HandleIfDirective(Result, Introducer,
+ ReadAnyTokensBeforeDirective);
case tok::pp_ifdef:
- return HandleIfdefDirective(Result, SavedHash, false,
+ return HandleIfdefDirective(Result, Introducer, false,
true /*not valid for miopt*/);
case tok::pp_ifndef:
- return HandleIfdefDirective(Result, SavedHash, true,
+ return HandleIfdefDirective(Result, Introducer, true,
ReadAnyTokensBeforeDirective);
case tok::pp_elif:
case tok::pp_elifdef:
case tok::pp_elifndef:
- return HandleElifFamilyDirective(Result, SavedHash, II->getPPKeywordID());
+ return HandleElifFamilyDirective(Result, Introducer,
+ II->getPPKeywordID());
case tok::pp_else:
- return HandleElseDirective(Result, SavedHash);
+ return HandleElseDirective(Result, Introducer);
case tok::pp_endif:
return HandleEndifDirective(Result);
// C99 6.10.2 - Source File Inclusion.
case tok::pp_include:
// Handle #include.
- return HandleIncludeDirective(SavedHash.getLocation(), Result);
+ return HandleIncludeDirective(Introducer.getLocation(), Result);
case tok::pp___include_macros:
// Handle -imacros.
- return HandleIncludeMacrosDirective(SavedHash.getLocation(), Result);
+ return HandleIncludeMacrosDirective(Introducer.getLocation(), Result);
// C99 6.10.3 - Macro Replacement.
case tok::pp_define:
@@ -1373,13 +1455,21 @@ void Preprocessor::HandleDirective(Token &Result) {
// C99 6.10.6 - Pragma Directive.
case tok::pp_pragma:
- return HandlePragmaDirective({PIK_HashPragma, SavedHash.getLocation()});
-
+ return HandlePragmaDirective({PIK_HashPragma, Introducer.getLocation()});
+ case tok::pp_module:
+ case tok::pp___preprocessed_module:
+ return HandleCXXModuleDirective(Result);
+ case tok::pp___preprocessed_import:
+ return HandleCXXImportDirective(Result);
// GNU Extensions.
case tok::pp_import:
- return HandleImportDirective(SavedHash.getLocation(), Result);
+ if (getLangOpts().CPlusPlusModules &&
+ Introducer.isModuleContextualKeyword(
+ /*AllowExport=*/false))
+ return HandleCXXImportDirective(Result);
+ return HandleImportDirective(Introducer.getLocation(), Result);
case tok::pp_include_next:
- return HandleIncludeNextDirective(SavedHash.getLocation(), Result);
+ return HandleIncludeNextDirective(Introducer.getLocation(), Result);
case tok::pp_warning:
if (LangOpts.CPlusPlus)
@@ -1400,8 +1490,8 @@ void Preprocessor::HandleDirective(Token &Result) {
case tok::pp_embed: {
if (PreprocessorLexer *CurrentFileLexer = getCurrentFileLexer())
if (OptionalFileEntryRef FERef = CurrentFileLexer->getFileEntry())
- return HandleEmbedDirective(SavedHash.getLocation(), Result, *FERef);
- return HandleEmbedDirective(SavedHash.getLocation(), Result, nullptr);
+ return HandleEmbedDirective(Introducer.getLocation(), Result, *FERef);
+ return HandleEmbedDirective(Introducer.getLocation(), Result, nullptr);
}
case tok::pp_assert:
//isExtension = true; // FIXME: implement #assert
@@ -1430,7 +1520,7 @@ void Preprocessor::HandleDirective(Token &Result) {
if (getLangOpts().AsmPreprocessor) {
auto Toks = std::make_unique<Token[]>(2);
// Return the # and the token after it.
- Toks[0] = SavedHash;
+ Toks[0] = Introducer;
Toks[1] = Result;
// If the second token is a hashhash token, then we need to translate it to
@@ -4095,3 +4185,323 @@ void Preprocessor::HandleEmbedDirective(SourceLocation HashLoc, Token &EmbedTok,
StringRef(static_cast<char *>(Mem), OriginalFilename.size());
HandleEmbedDirectiveImpl(HashLoc, *Params, BinaryContents, FilenameToGo);
}
+
+/// HandleCXXImportDirective - Handle the C++ modules import directives
+///
+/// pp-import:
+/// export[opt] import header-name pp-tokens[opt] ; new-line
+/// export[opt] import header-name-tokens pp-tokens[opt] ; new-line
+/// export[opt] import pp-tokens ; new-line
+///
+/// A header-name import is replaced by an annot_header_unit token, and the
+/// lexed module name is replaced by an annot_module_name token.
+void Preprocessor::HandleCXXImportDirective(Token ImportTok) {
+ assert(getLangOpts().CPlusPlusModules && ImportTok.is(tok::kw_import));
+ llvm::SaveAndRestore<bool> SaveImportingCXXModules(
+ this->ImportingCXXNamedModules, true);
+
+ if (LastTokenWasExportKeyword.isValid())
+ LastTokenWasExportKeyword.reset();
+
+ Token Tok;
+ if (LexHeaderName(Tok)) {
+ if (Tok.isNot(tok::eod))
+ CheckEndOfDirective(ImportTok.getIdentifierInfo()->getName());
+ return;
+ }
+
+ SourceLocation UseLoc = ImportTok.getLocation();
+ SmallVector<Token, 4> DirToks{ImportTok};
+ SmallVector<IdentifierLoc, 2> Path;
+ bool ImportingHeader = false;
+ bool IsPartition = false;
+ std::string FlatName;
+ switch (Tok.getKind()) {
+ case tok::header_name:
+ ImportingHeader = true;
+ DirToks.push_back(Tok);
+ Lex(DirToks.emplace_back());
+ break;
+ case tok::colon:
+ IsPartition = true;
+ DirToks.push_back(Tok);
+ UseLoc = Tok.getLocation();
+ Lex(Tok);
+ [[fallthrough]];
+ case tok::identifier: {
+ bool LeadingSpace = Tok.hasLeadingSpace();
+ unsigned NumToksInDirective = DirToks.size();
+ if (LexModuleNameContinue(Tok, UseLoc, DirToks, Path)) {
+ if (Tok.isNot(tok::eod))
+ CheckEndOfDirective(ImportTok.getIdentifierInfo()->getName(),
+ /*EnableMacros=*/false, &DirToks);
+ EnterModuleSuffixTokenStream(DirToks);
+ return;
+ }
+
+ // Clean the module-name tokens and replace these tokens with
+ // annot_module_name.
+ DirToks.resize(NumToksInDirective);
+ ModuleNameLoc *NameLoc = ModuleNameLoc::Create(*this, Path);
+ DirToks.emplace_back();
+ DirToks.back().setKind(tok::annot_module_name);
+ DirToks.back().setAnnotationRange(NameLoc->getRange());
+ DirToks.back().setAnnotationValue(static_cast<void *>(NameLoc));
+ DirToks.back().setFlagValue(Token::LeadingSpace, LeadingSpace);
+ DirToks.push_back(Tok);
+
+ bool IsValid =
+ (IsPartition && ModuleDeclState.isNamedModule()) || !IsPartition;
+ if (Callbacks && IsValid) {
+ if (IsPartition && ModuleDeclState.isNamedModule()) {
+ FlatName += ModuleDeclState.getPrimaryName();
+ FlatName += ":";
+ }
+
+ FlatName += ModuleLoader::getFlatNameFromPath(Path);
+ SourceLocation StartLoc = IsPartition ? UseLoc : Path[0].getLoc();
+ IdentifierLoc FlatNameLoc(StartLoc, getIdentifierInfo(FlatName));
+
+ // We don't/shouldn't load the standard C++20 modules when preprocessing,
+ // so the imported module is nullptr.
+ Callbacks->moduleImport(ImportTok.getLocation(),
+ ModuleIdPath(FlatNameLoc),
+ /*Imported=*/nullptr);
+ }
+ break;
+ }
+ default:
+ DirToks.push_back(Tok);
+ break;
+ }
+
+ // Consume the pp-import-suffix and expand any macros in it now, if we're not
+ // at the semicolon already.
+ if (!DirToks.back().isOneOf(tok::semi, tok::eod))
+ CollectPPImportSuffix(DirToks);
+
+ if (DirToks.back().isNot(tok::eod))
+ CheckEndOfDirective(ImportTok.getIdentifierInfo()->getName());
+ else
+ DirToks.pop_back();
+
+ // This is not a pp-import after all.
+ if (DirToks.back().isNot(tok::semi)) {
+ EnterModuleSuffixTokenStream(DirToks);
+ return;
+ }
+
+ if (ImportingHeader) {
+ // C++2a [cpp.module]p1:
+ // The ';' preprocessing-token terminating a pp-import shall not have
+ // been produced by macro replacement.
+ SourceLocation SemiLoc = DirToks.back().getLocation();
+ if (SemiLoc.isMacroID())
+ Diag(SemiLoc, diag::err_header_import_semi_in_macro);
+
+ auto Action = HandleHeaderIncludeOrImport(
+ /*HashLoc*/ SourceLocation(), ImportTok, Tok, SemiLoc);
+ switch (Action.Kind) {
+ case ImportAction::None:
+ break;
+
+ case ImportAction::ModuleBegin:
+ // Let the parser know we're textually entering the module.
+ DirToks.emplace_back();
+ DirToks.back().startToken();
+ DirToks.back().setKind(tok::annot_module_begin);
+ DirToks.back().setLocation(SemiLoc);
+ DirToks.back().setAnnotationEndLoc(SemiLoc);
+ DirToks.back().setAnnotationValue(Action.ModuleForHeader);
+ [[fallthrough]];
+
+ case ImportAction::ModuleImport:
+ case ImportAction::HeaderUnitImport:
+ case ImportAction::SkippedModuleImport:
+ // We chose to import (or textually enter) the file. Convert the
+ // header-name token into a header unit annotation token.
+ DirToks[1].setKind(tok::annot_header_unit);
+ DirToks[1].setAnnotationEndLoc(DirToks[0].getLocation());
+ DirToks[1].setAnnotationValue(Action.ModuleForHeader);
+ // FIXME: Call the moduleImport callback?
+ break;
+ case ImportAction::Failure:
+ assert(TheModuleLoader.HadFatalFailure &&
+ "This should be an early exit only to a fatal error");
+ CurLexer->cutOffLexing();
+ return;
+ }
+ }
+
+ EnterModuleSuffixTokenStream(DirToks);
+}
+
+/// HandleCXXModuleDirective - Handle C++ module declaration directives.
+///
+/// pp-module:
+/// export[opt] module pp-tokens[opt] ; new-line
+///
+/// pp-module-name:
+/// pp-module-name-qualifier[opt] identifier
+/// pp-module-partition:
+/// : pp-module-name-qualifier[opt] identifier
+/// pp-module-name-qualifier:
+/// identifier .
+/// pp-module-name-qualifier identifier .
+///
+/// global-module-fragment:
+/// module-keyword ; declaration-seq[opt]
+///
+/// private-module-fragment:
+/// module-keyword : private ; declaration-seq[opt]
+///
+/// The lexed module name is replaced by an annot_module_name token.
+void Preprocessor::HandleCXXModuleDirective(Token ModuleTok) {
+ assert(getLangOpts().CPlusPlusModules && ModuleTok.is(tok::kw_module));
+ Token Introducer = ModuleTok;
+ if (LastTokenWasExportKeyword.isValid()) {
+ Introducer = LastTokenWasExportKeyword.getExportTok();
+ LastTokenWasExportKeyword.reset();
+ }
+
+ SourceLocation StartLoc = Introducer.getLocation();
+
+ Token Tok;
+ SourceLocation UseLoc = ModuleTok.getLocation();
+ SmallVector<Token, 4> DirToks{ModuleTok};
+ SmallVector<IdentifierLoc, 2> Path, Partition;
+ LexUnexpandedToken(Tok);
+
+ switch (Tok.getKind()) {
+ // Global Module Fragment.
+ case tok::semi:
+ DirToks.push_back(Tok);
+ break;
+ case tok::colon:
+ DirToks.push_back(Tok);
+ LexUnexpandedToken(Tok);
+ if (Tok.isNot(tok::kw_private)) {
+ if (Tok.isNot(tok::eod))
+ CheckEndOfDirective(ModuleTok.getIdentifierInfo()->getName(),
+ /*EnableMacros=*/false, &DirToks);
+ EnterModuleSuffixTokenStream(DirToks);
+ return;
+ }
+ DirToks.push_back(Tok);
+ break;
+ case tok::identifier: {
+ bool LeadingSpace = Tok.hasLeadingSpace();
+ unsigned NumToksInDirective = DirToks.size();
+
+ // C++ [cpp.module]p3: Any preprocessing tokens after the module
+ // preprocessing token in the module directive are processed just as in
+ // normal text.
+ //
+ // P3034R1 Module Declarations Shouldn’t be Macros.
+ if (LexModuleNameContinue(Tok, UseLoc, DirToks, Path,
+ /*AllowMacroExpansion=*/false)) {
+ if (Tok.isNot(tok::eod))
+ CheckEndOfDirective(ModuleTok.getIdentifierInfo()->getName(),
+ /*EnableMacros=*/false, &DirToks);
+ EnterModuleSuffixTokenStream(DirToks);
+ return;
+ }
+
+ ModuleNameLoc *NameLoc = ModuleNameLoc::Create(*this, Path);
+ DirToks.resize(NumToksInDirective);
+ DirToks.emplace_back();
+ DirToks.back().setKind(tok::annot_module_name);
+ DirToks.back().setAnnotationRange(NameLoc->getRange());
+ DirToks.back().setAnnotationValue(static_cast<void *>(NameLoc));
+ DirToks.back().setFlagValue(Token::LeadingSpace, LeadingSpace);
+ DirToks.push_back(Tok);
+
+ // C++20 [cpp.module]p
+ // The pp-tokens, if any, of a pp-module shall be of the form:
+ // pp-module-name pp-module-partition[opt] pp-tokens[opt]
+ if (Tok.is(tok::colon)) {
+ NumToksInDirective = DirToks.size();
+ LexUnexpandedToken(Tok);
+ LeadingSpace = Tok.hasLeadingSpace();
+ if (LexModuleNameContinue(Tok, UseLoc, DirToks, Partition,
+ /*AllowMacroExpansion=*/false,
+ /*IsPartition=*/true)) {
+ if (Tok.isNot(tok::eod))
+ CheckEndOfDirective(ModuleTok.getIdentifierInfo()->getName(),
+ /*EnableMacros=*/false, &DirToks);
+ EnterModuleSuffixTokenStream(DirToks);
+ return;
+ }
+
+ ModuleNameLoc *PartitionLoc = ModuleNameLoc::Create(*this, Partition);
+ DirToks.resize(NumToksInDirective);
+ DirToks.emplace_back();
+ DirToks.back().setKind(tok::annot_module_name);
+ DirToks.back().setAnnotationRange(PartitionLoc->getRange());
+ DirToks.back().setAnnotationValue(static_cast<void *>(PartitionLoc));
+ DirToks.back().setFlagValue(Token::LeadingSpace, LeadingSpace);
+ DirToks.push_back(Tok);
+ }
+
+ // If the current token names a defined macro, put it back into the token
+ // stream so that it is macro-expanded when it is lexed again.
+ //
+ // export module M ATTR(some_attr); // -D'ATTR(x)=[[x]]'
+ //
+ // Current token is `ATTR`.
+ if (Tok.is(tok::identifier) &&
+ getMacroDefinition(Tok.getIdentifierInfo())) {
+ std::unique_ptr<Token[]> TokCopy = std::make_unique<Token[]>(1);
+ TokCopy[0] = Tok;
+ EnterTokenStream(std::move(TokCopy), /*NumToks=*/1,
+ /*DisableMacroExpansion=*/false, /*IsReinject=*/false);
+ Lex(Tok);
+ DirToks.back() = Tok;
+ }
+ break;
+ }
+ default:
+ DirToks.push_back(Tok);
+ break;
+ }
+
+ // Consume the pp-import-suffix and expand any macros in it now, if we're not
+ // at the semicolon already.
+ SourceLocation End = DirToks.back().getLocation();
+ std::optional<Token> NextPPTok = DirToks.back();
+ if (DirToks.back().is(tok::eod)) {
+ NextPPTok = peekNextPPToken();
+ if (NextPPTok && NextPPTok->is(tok::raw_identifier))
+ LookUpIdentifierInfo(*NextPPTok);
+ }
+
+ // Only ';' and '[' are allowed after a module name.
+ // We also accept 'private' because in that case the previous token is not
+ // a module name.
+ if (!NextPPTok->isOneOf(tok::semi, tok::eod, tok::l_square, tok::kw_private))
+ Diag(*NextPPTok, diag::err_pp_unexpected_tok_after_module_name)
+ << getSpelling(*NextPPTok);
+
+ if (!DirToks.back().isOneOf(tok::semi, tok::eod)) {
+ // Consume the pp-import-suffix and expand any macros in it now. We'll add
+ // it back into the token stream later.
+ CollectPPImportSuffix(DirToks);
+ End = DirToks.back().getLocation();
+ }
+
+ if (DirToks.back().isNot(tok::eod))
+ End = CheckEndOfDirective(ModuleTok.getIdentifierInfo()->getName(),
+ /*EnableMacros=*/false, &DirToks);
+ else
+ End = DirToks.pop_back_val().getLocation();
+
+ if (!IncludeMacroStack.empty()) {
+ Diag(StartLoc, diag::err_pp_module_decl_in_header)
+ << SourceRange(StartLoc, End);
+ }
+
+ if (CurPPLexer->getConditionalStackDepth() != 0) {
+ Diag(StartLoc, diag::err_pp_cond_span_module_decl)
+ << SourceRange(StartLoc, End);
+ }
+ EnterModuleSuffixTokenStream(DirToks);
+}
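
Editor's note, not part of the commit: the flat name that HandleCXXImportDirective passes to the moduleImport callback can be sketched in isolation. The helper names below (flattenPath, flatImportName) are hypothetical and use plain strings rather than IdentifierLoc; the sketch only mirrors the IsValid / FlatName logic in the hunk above, under the assumption that an empty primary name stands for "not in a named module".

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for ModuleLoader::getFlatNameFromPath: joins the
// identifiers of a module name with '.'.
std::string flattenPath(const std::vector<std::string> &Path) {
  std::string Name;
  for (const std::string &Piece : Path) {
    if (!Name.empty())
      Name += ".";
    Name += Piece;
  }
  return Name;
}

// Hypothetical helper mirroring the callback logic of the import handler.
// An empty PrimaryName is this sketch's way of saying "no named module".
// A partition import outside a named module is not reported, so the sketch
// returns std::nullopt in that case.
std::optional<std::string>
flatImportName(const std::string &PrimaryName, bool IsPartition,
               const std::vector<std::string> &Path) {
  bool InNamedModule = !PrimaryName.empty();
  if (IsPartition && !InNamedModule)
    return std::nullopt;

  std::string FlatName;
  if (IsPartition) {
    // Partition imports are prefixed with the primary module name and ':'.
    FlatName += PrimaryName;
    FlatName += ":";
  }
  FlatName += flattenPath(Path);
  return FlatName;
}
```

Under these assumptions, `import a.b.c;` would be reported as `a.b.c`, and `import :part;` inside module `M` as `M:part`.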
diff --git a/clang/lib/Lex/PPLexerChange.cpp b/clang/lib/Lex/PPLexerChange.cpp
index b014124153c83..05affedd48a86 100644
--- a/clang/lib/Lex/PPLexerChange.cpp
+++ b/clang/lib/Lex/PPLexerChange.cpp
@@ -441,7 +441,7 @@ bool Preprocessor::HandleEndOfFile(Token &Result, bool isEndOfMacro) {
assert(CurLexer && "Got EOF but no current lexer set!");
Result.startToken();
CurLexer->FormTokenWithChars(Result, CurLexer->BufferEnd, tok::eof);
- CurLexer.reset();
+ PendingDestroyLexers.push_back(std::move(CurLexer));
CurPPLexer = nullptr;
recomputeCurLexerKind();
@@ -558,9 +558,17 @@ bool Preprocessor::HandleEndOfFile(Token &Result, bool isEndOfMacro) {
<< PPOpts.PCHThroughHeader << 0;
}
- if (!isIncrementalProcessingEnabled())
- // We're done with lexing.
- CurLexer.reset();
+ if (!isIncrementalProcessingEnabled()) {
+ // We're done with lexing. If we're inside a nested Lex call (LexLevel > 0),
+ // defer destruction of the lexer until Lex returns to avoid use-after-free
+ // when HandleEndOfFile is called from within Lexer methods that still need
+ // to access their members after this function returns.
+ if (LexLevel > 0 && CurLexer) {
+ PendingDestroyLexers.push_back(std::move(CurLexer));
+ } else {
+ CurLexer.reset();
+ }
+ }
if (!isIncrementalProcessingEnabled())
CurPPLexer = nullptr;
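
Editor's note, not part of the commit: the deferred-destruction pattern used for PendingDestroyLexers can be illustrated with a small toy model. ToyLexer and ToyPreprocessor are hypothetical types, not Clang classes; the sketch only assumes what the hunks show, namely that a lexer reaching end of file inside a nested Lex call is parked in a pending list and destroyed once the nesting level returns to zero.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical lexer that counts destructions through a shared counter.
struct ToyLexer {
  explicit ToyLexer(int &Counter) : Destroyed(Counter) {}
  ~ToyLexer() { ++Destroyed; }
  int &Destroyed;
};

// Hypothetical, heavily simplified model of the preprocessor state involved.
struct ToyPreprocessor {
  std::unique_ptr<ToyLexer> CurLexer;
  std::vector<std::unique_ptr<ToyLexer>> PendingDestroyLexers;
  unsigned LexLevel = 0;

  // Mirrors the shape of the HandleEndOfFile change: while inside a Lex call,
  // park the lexer instead of destroying it immediately.
  void handleEndOfFile() {
    if (LexLevel > 0 && CurLexer)
      PendingDestroyLexers.push_back(std::move(CurLexer));
    else
      CurLexer.reset();
  }

  // Simulates a Lex call that reaches end of file. Returns true if the lexer
  // was still alive right after handleEndOfFile returned, i.e. while code
  // further up the call stack might still touch its members.
  bool lex(const int &Destroyed) {
    ++LexLevel;
    handleEndOfFile();
    bool StillAlive = (Destroyed == 0);
    --LexLevel;
    // Parked lexers are released only at the outermost level.
    if (LexLevel == 0 && !PendingDestroyLexers.empty())
      PendingDestroyLexers.clear();
    return StillAlive;
  }
};
```

In this model the lexer outlives the end-of-file handling and is destroyed exactly once, after the outermost call has finished with it.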
diff --git a/clang/lib/Lex/Preprocessor.cpp b/clang/lib/Lex/Preprocessor.cpp
index 0a25dc19548ec..791a9644b6e85 100644
--- a/clang/lib/Lex/Preprocessor.cpp
+++ b/clang/lib/Lex/Preprocessor.cpp
@@ -35,6 +35,7 @@
#include "clang/Basic/SourceManager.h"
#include "clang/Basic/TargetInfo.h"
#include "clang/Lex/CodeCompletionHandler.h"
+#include "clang/Lex/DependencyDirectivesScanner.h"
#include "clang/Lex/ExternalPreprocessorSource.h"
#include "clang/Lex/HeaderSearch.h"
#include "clang/Lex/LexDiagnostic.h"
@@ -55,11 +56,14 @@
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/STLExtras.h"
+#include "llvm/ADT/ScopeExit.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/Support/Capacity.h"
#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/MemoryBuffer.h"
+#include "llvm/Support/MemoryBufferRef.h"
+#include "llvm/Support/SaveAndRestore.h"
#include "llvm/Support/raw_ostream.h"
#include <algorithm>
#include <cassert>
@@ -115,6 +119,8 @@ Preprocessor::Preprocessor(const PreprocessorOptions &PPOpts,
// We haven't read anything from the external source.
ReadMacrosFromExternalSource = false;
+ LastTokenWasExportKeyword.reset();
+
BuiltinInfo = std::make_unique<Builtin::Context>();
// "Poison" __VA_ARGS__, __VA_OPT__ which can only appear in the expansion of
@@ -576,6 +582,11 @@ void Preprocessor::EnterMainSourceFile() {
// export module M; // error: module declaration must occur
// // at the start of the translation unit.
if (getLangOpts().CPlusPlusModules) {
+ std::optional<StringRef> Input =
+ getSourceManager().getBufferDataOrNone(MainFileID);
+ if (!isPreprocessedModuleFile() && Input)
+ MainFileIsPreprocessedModuleFile =
+ clang::isPreprocessedModuleFile(*Input);
auto Tracer = std::make_unique<NoTrivialPPDirectiveTracer>(*this);
DirTracer = Tracer.get();
addPPCallbacks(std::move(Tracer));
@@ -875,15 +886,13 @@ bool Preprocessor::HandleIdentifier(Token &Identifier) {
// used in contexts where import declarations are disallowed.
//
// Likewise if this is the standard C++ import keyword.
- if (((LastTokenWasAt && II.isModulesImport()) ||
+ if (((LastTokenWasAt && II.isImportKeyword()) ||
Identifier.is(tok::kw_import)) &&
- !InMacroArgs && !DisableMacroExpansion &&
- (getLangOpts().Modules || getLangOpts().DebuggerSupport) &&
+ !InMacroArgs &&
+ (!DisableMacroExpansion || MacroExpansionInDirectivesOverride) &&
CurLexerCallback != CLK_CachingLexer) {
ModuleImportLoc = Identifier.getLocation();
- NamedModuleImportPath.clear();
IsAtImport = true;
- ModuleImportExpectsIdentifier = true;
CurLexerCallback = CLK_LexAfterModuleImport;
}
return true;
@@ -932,6 +941,7 @@ void Preprocessor::Lex(Token &Result) {
// This token is injected to represent the translation of '#include "a.h"'
// into "import a.h;". Mimic the notional ';'.
case tok::annot_module_include:
+ case tok::annot_repl_input_end:
case tok::semi:
TrackGMFState.handleSemi();
StdCXXImportSeqState.handleSemi();
@@ -951,35 +961,23 @@ void Preprocessor::Lex(Token &Result) {
case tok::colon:
ModuleDeclState.handleColon();
break;
- case tok::period:
- ModuleDeclState.handlePeriod();
- break;
- case tok::eod:
+ case tok::kw_import:
+ if (StdCXXImportSeqState.atTopLevel()) {
+ TrackGMFState.handleImport(StdCXXImportSeqState.afterTopLevelSeq());
+ StdCXXImportSeqState.handleImport();
+ }
break;
- case tok::identifier:
- // Check "import" and "module" when there is no open bracket. The two
- // identifiers are not meaningful with open brackets.
+ case tok::kw_module:
if (StdCXXImportSeqState.atTopLevel()) {
- if (Result.getIdentifierInfo()->isModulesImport()) {
- TrackGMFState.handleImport(StdCXXImportSeqState.afterTopLevelSeq());
- StdCXXImportSeqState.handleImport();
- if (StdCXXImportSeqState.afterImportSeq()) {
- ModuleImportLoc = Result.getLocation();
- NamedModuleImportPath.clear();
- IsAtImport = false;
- ModuleImportExpectsIdentifier = true;
- CurLexerCallback = CLK_LexAfterModuleImport;
- }
- break;
- } else if (Result.getIdentifierInfo() == getIdentifierInfo("module")) {
- if (hasSeenNoTrivialPPDirective())
- Result.setFlag(Token::HasSeenNoTrivialPPDirective);
- TrackGMFState.handleModule(StdCXXImportSeqState.afterTopLevelSeq());
- ModuleDeclState.handleModule();
- break;
- }
+ if (hasSeenNoTrivialPPDirective())
+ Result.setFlag(Token::HasSeenNoTrivialPPDirective);
+ TrackGMFState.handleModule(StdCXXImportSeqState.afterTopLevelSeq());
+ ModuleDeclState.handleModule();
}
- ModuleDeclState.handleIdentifier(Result.getIdentifierInfo());
+ break;
+ case tok::annot_module_name:
+ ModuleDeclState.handleModuleName(
+ static_cast<ModuleNameLoc *>(Result.getAnnotationValue()));
if (ModuleDeclState.isModuleCandidate())
break;
[[fallthrough]];
@@ -997,8 +995,17 @@ void Preprocessor::Lex(Token &Result) {
}
LastTokenWasAt = Result.is(tok::at);
+ if (Result.isNot(tok::kw_export))
+ LastTokenWasExportKeyword.reset();
+
--LexLevel;
+ // Destroy any lexers that were deferred while we were in nested Lex calls.
+ // This must happen after decrementing LexLevel but before any other
+ // processing that might re-enter Lex.
+ if (LexLevel == 0 && !PendingDestroyLexers.empty())
+ PendingDestroyLexers.clear();
+
if ((LexLevel == 0 || PreprocessToken) &&
!Result.getFlag(Token::IsReinjected)) {
if (LexLevel == 0)
@@ -1119,41 +1126,247 @@ bool Preprocessor::LexHeaderName(Token &FilenameTok, bool AllowMacroExpansion) {
return false;
}
+std::optional<Token> Preprocessor::peekNextPPToken() const {
+ // Do some quick tests for rejection cases.
+ std::optional<Token> Val;
+ if (CurLexer)
+ Val = CurLexer->peekNextPPToken();
+ else
+ Val = CurTokenLexer->peekNextPPToken();
+
+ if (!Val) {
+ // We have run off the end. If it's a source file we don't
+ // examine enclosing ones (C99 5.1.1.2p4). Otherwise walk up the
+ // macro stack.
+ if (CurPPLexer)
+ return std::nullopt;
+ for (const IncludeStackInfo &Entry : llvm::reverse(IncludeMacroStack)) {
+ if (Entry.TheLexer)
+ Val = Entry.TheLexer->peekNextPPToken();
+ else
+ Val = Entry.TheTokenLexer->peekNextPPToken();
+
+ if (Val)
+ break;
+
+ // Ran off the end of a source file?
+ if (Entry.ThePPLexer)
+ return std::nullopt;
+ }
+ }
+
+ // Either we found a token, or we reached the end of the translation unit
+ // and Val is std::nullopt.
+ return Val;
+}
+
+// We represent the primary and partition names as 'Paths' which are sections
+// of the hierarchical access path for a clang module. However for C++20
+// the periods in a name are just another character, and we will need to
+// flatten them into a string.
+std::string ModuleLoader::getFlatNameFromPath(ModuleIdPath Path) {
+ std::string Name;
+ if (Path.empty())
+ return Name;
+
+ for (auto &Piece : Path) {
+ assert(Piece.getIdentifierInfo() && Piece.getLoc().isValid());
+ if (!Name.empty())
+ Name += ".";
+ Name += Piece.getIdentifierInfo()->getName();
+ }
+ return Name;
+}
+
+ModuleNameLoc *ModuleNameLoc::Create(Preprocessor &PP, ModuleIdPath Path) {
+ assert(!Path.empty() && "expect at least one identifier in a module name");
+ void *Mem = PP.getPreprocessorAllocator().Allocate(
+ totalSizeToAlloc<IdentifierLoc>(Path.size()), alignof(ModuleNameLoc));
+ return new (Mem) ModuleNameLoc(Path);
+}
+
+bool Preprocessor::LexModuleNameContinue(Token &Tok, SourceLocation UseLoc,
+ SmallVectorImpl<Token> &Suffix,
+ SmallVectorImpl<IdentifierLoc> &Path,
+ bool AllowMacroExpansion,
+ bool IsPartition) {
+ auto ConsumeToken = [&]() {
+ if (AllowMacroExpansion)
+ Lex(Tok);
+ else
+ LexUnexpandedToken(Tok);
+ Suffix.push_back(Tok);
+ };
+
+ while (true) {
+ if (Tok.isNot(tok::identifier)) {
+ if (Tok.is(tok::code_completion)) {
+ CurLexer->cutOffLexing();
+ CodeComplete->CodeCompleteModuleImport(UseLoc, Path);
+ return true;
+ }
+
+ Diag(Tok, diag::err_pp_module_expected_ident) << Path.empty();
+ return true;
+ }
+
+ // [cpp.pre]/p2:
+ // No identifier in the pp-module-name or pp-module-partition shall
+ // currently be defined as an object-like macro.
+ if (MacroInfo *MI = getMacroInfo(Tok.getIdentifierInfo());
+ MI && MI->isObjectLike() && getLangOpts().CPlusPlus20 &&
+ !AllowMacroExpansion) {
+ Diag(Tok, diag::err_pp_module_name_is_macro)
+ << IsPartition << Tok.getIdentifierInfo();
+ Diag(MI->getDefinitionLoc(), diag::note_macro_here)
+ << Tok.getIdentifierInfo();
+ }
+
+ // Record this part of the module path.
+ Path.emplace_back(Tok.getLocation(), Tok.getIdentifierInfo());
+ ConsumeToken();
+
+ if (Tok.isNot(tok::period))
+ return false;
+
+ ConsumeToken();
+ }
+}
+
+/// [cpp.pre]/p2:
+/// A preprocessing directive consists of a sequence of preprocessing tokens
+/// that satisfies the following constraints: At the start of translation phase
+/// 4, the first preprocessing token in the sequence, referred to as a
+/// directive-introducing token, begins with the first character in the source
+/// file (optionally after whitespace containing no new-line characters) or
+/// follows whitespace containing at least one new-line character, and is:
+/// - a # preprocessing token, or
+/// - an import preprocessing token immediately followed on the same logical
+/// source line by a header-name, <, identifier, or : preprocessing token, or
+/// - a module preprocessing token immediately followed on the same logical
+/// source line by an identifier, :, or ; preprocessing token, or
+/// - an export preprocessing token immediately followed on the same logical
+/// source line by one of the two preceding forms.
+///
+///
+/// At the start of phase 4 an import or module token is treated as starting a
+/// directive and is converted to its respective keyword iff:
+/// - After skipping horizontal whitespace it is
+/// - at the start of a logical line, or
+/// - preceded by an 'export' at the start of the logical line.
+/// - It is followed by an identifier pp token (before macro expansion), or
+/// - <, ", or : (but not ::) pp tokens for 'import', or
+/// - ; for 'module'
+/// Otherwise the token is treated as an identifier.
+bool Preprocessor::HandleModuleContextualKeyword(
+ Token &Result, bool TokAtPhysicalStartOfLine) {
+ if (!getLangOpts().CPlusPlusModules || !Result.isModuleContextualKeyword())
+ return false;
+
+ if (Result.is(tok::kw_export)) {
+ LastTokenWasExportKeyword = {Result, TokAtPhysicalStartOfLine};
+ return false;
+ }
+
+ /// Treat 'module' and 'import' as identifiers when the main file is a
+ /// preprocessed module file. We only allow '__preprocessed_module' and
+ /// '__preprocessed_import' in this context.
+ IdentifierInfo *II = Result.getIdentifierInfo();
+ if (isPreprocessedModuleFile() &&
+ (II->isStr(tok::getKeywordSpelling(tok::kw_import)) ||
+ II->isStr(tok::getKeywordSpelling(tok::kw_module))))
+ return false;
+
+ if (LastTokenWasExportKeyword.isValid()) {
+ // The export keyword was not at the start of line, it's not a
+ // directive-introducing token.
+ if (!LastTokenWasExportKeyword.isAtPhysicalStartOfLine())
+ return false;
+ // [cpp.pre]/1.4
+ // export // not a preprocessing directive
+ // import foo; // preprocessing directive (ill-formed at phase7)
+ if (TokAtPhysicalStartOfLine)
+ return false;
+ } else if (!TokAtPhysicalStartOfLine)
+ return false;
+
+ llvm::SaveAndRestore<bool> SavedParsingPreprocessorDirective(
+ CurPPLexer->ParsingPreprocessorDirective, true);
+
+ // The next token may be an angled string literal after import keyword.
+ llvm::SaveAndRestore<bool> SavedParsingFilemame(
+ CurPPLexer->ParsingFilename,
+ Result.getIdentifierInfo()->isImportKeyword());
+
+ std::optional<Token> NextTok =
+ CurLexer ? CurLexer->peekNextPPToken() : CurTokenLexer->peekNextPPToken();
+ if (!NextTok)
+ return false;
+
+ if (NextTok->is(tok::raw_identifier))
+ LookUpIdentifierInfo(*NextTok);
+
+ if (Result.getIdentifierInfo()->isImportKeyword()) {
+ if (NextTok->isOneOf(tok::identifier, tok::less, tok::colon,
+ tok::header_name)) {
+ Result.setKind(tok::kw_import);
+ ModuleImportLoc = Result.getLocation();
+ IsAtImport = false;
+ return true;
+ }
+ }
+
+ if (Result.getIdentifierInfo()->isModuleKeyword() &&
+ NextTok->isOneOf(tok::identifier, tok::colon, tok::semi)) {
+ Result.setKind(tok::kw_module);
+ ModuleDeclLoc = Result.getLocation();
+ return true;
+ }
+
+ // Ok, it's an identifier.
+ return false;
+}
+
+bool Preprocessor::CollectPPImportSuffixAndEnterStream(
+ SmallVectorImpl<Token> &Toks, bool StopUntilEOD) {
+ CollectPPImportSuffix(Toks);
+ EnterModuleSuffixTokenStream(Toks);
+ return false;
+}
+
/// Collect the tokens of a C++20 pp-import-suffix.
-void Preprocessor::CollectPpImportSuffix(SmallVectorImpl<Token> &Toks) {
- // FIXME: For error recovery, consider recognizing attribute syntax here
- // and terminating / diagnosing a missing semicolon if we find anything
- // else? (Can we leave that to the parser?)
- unsigned BracketDepth = 0;
+void Preprocessor::CollectPPImportSuffix(SmallVectorImpl<Token> &Toks,
+ bool StopUntilEOD) {
while (true) {
Toks.emplace_back();
Lex(Toks.back());
switch (Toks.back().getKind()) {
- case tok::l_paren: case tok::l_square: case tok::l_brace:
- ++BracketDepth;
- break;
-
- case tok::r_paren: case tok::r_square: case tok::r_brace:
- if (BracketDepth == 0)
- return;
- --BracketDepth;
- break;
-
case tok::semi:
- if (BracketDepth == 0)
+ if (!StopUntilEOD)
return;
- break;
-
+ [[fallthrough]];
+ case tok::eod:
case tok::eof:
return;
-
default:
break;
}
}
}
+// Allocate a holding buffer for a sequence of tokens and introduce it into
+// the token stream.
+void Preprocessor::EnterModuleSuffixTokenStream(ArrayRef<Token> Toks) {
+ if (Toks.empty())
+ return;
+ auto ToksCopy = std::make_unique<Token[]>(Toks.size());
+ std::copy(Toks.begin(), Toks.end(), ToksCopy.get());
+ EnterTokenStream(std::move(ToksCopy), Toks.size(),
+ /*DisableMacroExpansion*/ false, /*IsReinject*/ false);
+ assert(CurTokenLexer && "Must have a TokenLexer");
+ CurTokenLexer->setLexingCXXModuleDirective();
+}
/// Lex a token following the 'import' contextual keyword.
///
@@ -1178,186 +1391,47 @@ bool Preprocessor::LexAfterModuleImport(Token &Result) {
// Figure out what kind of lexer we actually have.
recomputeCurLexerKind();
- // Lex the next token. The header-name lexing rules are used at the start of
- // a pp-import.
- //
- // For now, we only support header-name imports in C++20 mode.
- // FIXME: Should we allow this in all language modes that support an import
- // declaration as an extension?
- if (NamedModuleImportPath.empty() && getLangOpts().CPlusPlusModules) {
- if (LexHeaderName(Result))
- return true;
-
- if (Result.is(tok::colon) && ModuleDeclState.isNamedModule()) {
- std::string Name = ModuleDeclState.getPrimaryName().str();
- Name += ":";
- NamedModuleImportPath.emplace_back(Result.getLocation(),
- getIdentifierInfo(Name));
- CurLexerCallback = CLK_LexAfterModuleImport;
- return true;
- }
- } else {
- Lex(Result);
- }
-
- // Allocate a holding buffer for a sequence of tokens and introduce it into
- // the token stream.
- auto EnterTokens = [this](ArrayRef<Token> Toks) {
- auto ToksCopy = std::make_unique<Token[]>(Toks.size());
- std::copy(Toks.begin(), Toks.end(), ToksCopy.get());
- EnterTokenStream(std::move(ToksCopy), Toks.size(),
- /*DisableMacroExpansion*/ true, /*IsReinject*/ false);
- };
-
- bool ImportingHeader = Result.is(tok::header_name);
- // Check for a header-name.
SmallVector<Token, 32> Suffix;
- if (ImportingHeader) {
- // Enter the header-name token into the token stream; a Lex action cannot
- // both return a token and cache tokens (doing so would corrupt the token
- // cache if the call to Lex comes from CachingLex / PeekAhead).
- Suffix.push_back(Result);
-
- // Consume the pp-import-suffix and expand any macros in it now. We'll add
- // it back into the token stream later.
- CollectPpImportSuffix(Suffix);
- if (Suffix.back().isNot(tok::semi)) {
- // This is not a pp-import after all.
- EnterTokens(Suffix);
- return false;
- }
-
- // C++2a [cpp.module]p1:
- // The ';' preprocessing-token terminating a pp-import shall not have
- // been produced by macro replacement.
- SourceLocation SemiLoc = Suffix.back().getLocation();
- if (SemiLoc.isMacroID())
- Diag(SemiLoc, diag::err_header_import_semi_in_macro);
-
- // Reconstitute the import token.
- Token ImportTok;
- ImportTok.startToken();
- ImportTok.setKind(tok::kw_import);
- ImportTok.setLocation(ModuleImportLoc);
- ImportTok.setIdentifierInfo(getIdentifierInfo("import"));
- ImportTok.setLength(6);
-
- auto Action = HandleHeaderIncludeOrImport(
- /*HashLoc*/ SourceLocation(), ImportTok, Suffix.front(), SemiLoc);
- switch (Action.Kind) {
- case ImportAction::None:
- break;
-
- case ImportAction::ModuleBegin:
- // Let the parser know we're textually entering the module.
- Suffix.emplace_back();
- Suffix.back().startToken();
- Suffix.back().setKind(tok::annot_module_begin);
- Suffix.back().setLocation(SemiLoc);
- Suffix.back().setAnnotationEndLoc(SemiLoc);
- Suffix.back().setAnnotationValue(Action.ModuleForHeader);
- [[fallthrough]];
-
- case ImportAction::ModuleImport:
- case ImportAction::HeaderUnitImport:
- case ImportAction::SkippedModuleImport:
- // We chose to import (or textually enter) the file. Convert the
- // header-name token into a header unit annotation token.
- Suffix[0].setKind(tok::annot_header_unit);
- Suffix[0].setAnnotationEndLoc(Suffix[0].getLocation());
- Suffix[0].setAnnotationValue(Action.ModuleForHeader);
- // FIXME: Call the moduleImport callback?
- break;
- case ImportAction::Failure:
- assert(TheModuleLoader.HadFatalFailure &&
- "This should be an early exit only to a fatal error");
- Result.setKind(tok::eof);
- CurLexer->cutOffLexing();
- EnterTokens(Suffix);
- return true;
- }
-
- EnterTokens(Suffix);
- return false;
- }
-
- // The token sequence
- //
- // import identifier (. identifier)*
- //
- // indicates a module import directive. We already saw the 'import'
- // contextual keyword, so now we're looking for the identifiers.
- if (ModuleImportExpectsIdentifier && Result.getKind() == tok::identifier) {
- // We expected to see an identifier here, and we did; continue handling
- // identifiers.
- NamedModuleImportPath.emplace_back(Result.getLocation(),
- Result.getIdentifierInfo());
- ModuleImportExpectsIdentifier = false;
- CurLexerCallback = CLK_LexAfterModuleImport;
- return true;
- }
-
- // If we're expecting a '.' or a ';', and we got a '.', then wait until we
- // see the next identifier. (We can also see a '[[' that begins an
- // attribute-specifier-seq here under the Standard C++ Modules.)
- if (!ModuleImportExpectsIdentifier && Result.getKind() == tok::period) {
- ModuleImportExpectsIdentifier = true;
- CurLexerCallback = CLK_LexAfterModuleImport;
- return true;
- }
-
- // If we didn't recognize a module name at all, this is not a (valid) import.
- if (NamedModuleImportPath.empty() || Result.is(tok::eof))
- return true;
+ SmallVector<IdentifierLoc, 3> Path;
+ Lex(Result);
+ if (LexModuleNameContinue(Result, ModuleImportLoc, Suffix, Path))
+ return CollectPPImportSuffixAndEnterStream(Suffix);
+
+ ModuleNameLoc *NameLoc = ModuleNameLoc::Create(*this, Path);
+ Suffix.clear();
+ Suffix.emplace_back();
+ Suffix.back().setKind(tok::annot_module_name);
+ Suffix.back().setAnnotationRange(NameLoc->getRange());
+ Suffix.back().setAnnotationValue(static_cast<void *>(NameLoc));
+ Suffix.push_back(Result);
// Consume the pp-import-suffix and expand any macros in it now, if we're not
// at the semicolon already.
SourceLocation SemiLoc = Result.getLocation();
- if (Result.isNot(tok::semi)) {
- Suffix.push_back(Result);
- CollectPpImportSuffix(Suffix);
+ if (Suffix.back().isNot(tok::semi)) {
+ if (Suffix.back().isNot(tok::eof))
+ CollectPPImportSuffix(Suffix);
if (Suffix.back().isNot(tok::semi)) {
// This is not an import after all.
- EnterTokens(Suffix);
+ EnterModuleSuffixTokenStream(Suffix);
return false;
}
SemiLoc = Suffix.back().getLocation();
}
- // Under the standard C++ Modules, the dot is just part of the module name,
- // and not a real hierarchy separator. Flatten such module names now.
- //
- // FIXME: Is this the right level to be performing this transformation?
- std::string FlatModuleName;
- if (getLangOpts().CPlusPlusModules) {
- for (auto &Piece : NamedModuleImportPath) {
- // If the FlatModuleName ends with colon, it implies it is a partition.
- if (!FlatModuleName.empty() && FlatModuleName.back() != ':')
- FlatModuleName += ".";
- FlatModuleName += Piece.getIdentifierInfo()->getName();
- }
- SourceLocation FirstPathLoc = NamedModuleImportPath[0].getLoc();
- NamedModuleImportPath.clear();
- NamedModuleImportPath.emplace_back(FirstPathLoc,
- getIdentifierInfo(FlatModuleName));
- }
-
Module *Imported = nullptr;
- // We don't/shouldn't load the standard c++20 modules when preprocessing.
- if (getLangOpts().Modules && !isInImportingCXXNamedModules()) {
- Imported = TheModuleLoader.loadModule(ModuleImportLoc,
- NamedModuleImportPath,
- Module::Hidden,
+ if (getLangOpts().Modules) {
+ Imported = TheModuleLoader.loadModule(ModuleImportLoc, Path, Module::Hidden,
/*IsInclusionDirective=*/false);
if (Imported)
makeModuleVisible(Imported, SemiLoc);
}
if (Callbacks)
- Callbacks->moduleImport(ModuleImportLoc, NamedModuleImportPath, Imported);
+ Callbacks->moduleImport(ModuleImportLoc, Path, Imported);
if (!Suffix.empty()) {
- EnterTokens(Suffix);
+ EnterModuleSuffixTokenStream(Suffix);
return false;
}
return true;
diff --git a/clang/lib/Lex/TokenConcatenation.cpp b/clang/lib/Lex/TokenConcatenation.cpp
index 05f4203bd722b..f94caee24dc11 100644
--- a/clang/lib/Lex/TokenConcatenation.cpp
+++ b/clang/lib/Lex/TokenConcatenation.cpp
@@ -161,7 +161,8 @@ bool TokenConcatenation::AvoidConcat(const Token &PrevPrevTok,
const Token &PrevTok,
const Token &Tok) const {
// No space is required between header unit name in quote and semi.
- if (PrevTok.is(tok::annot_header_unit) && Tok.is(tok::semi))
+ if (PrevTok.isOneOf(tok::annot_header_unit, tok::annot_module_name) &&
+ Tok.is(tok::semi))
return false;
// Conservatively assume that every annotation token that has a printable
@@ -197,11 +198,12 @@ bool TokenConcatenation::AvoidConcat(const Token &PrevPrevTok,
if (Tok.isAnnotation()) {
// Modules annotation can show up when generated automatically for includes.
assert(Tok.isOneOf(tok::annot_module_include, tok::annot_module_begin,
- tok::annot_module_end, tok::annot_embed) &&
+ tok::annot_module_end, tok::annot_embed,
+ tok::annot_module_name) &&
"unexpected annotation in AvoidConcat");
ConcatInfo = 0;
- if (Tok.is(tok::annot_embed))
+ if (Tok.isOneOf(tok::annot_embed, tok::annot_module_name))
return true;
}
diff --git a/clang/lib/Lex/TokenLexer.cpp b/clang/lib/Lex/TokenLexer.cpp
index 47f4134fb1465..db4313f766812 100644
--- a/clang/lib/Lex/TokenLexer.cpp
+++ b/clang/lib/Lex/TokenLexer.cpp
@@ -57,6 +57,7 @@ void TokenLexer::Init(Token &Tok, SourceLocation ELEnd, MacroInfo *MI,
IsReinject = false;
NumTokens = Macro->tokens_end()-Macro->tokens_begin();
MacroExpansionStart = SourceLocation();
+ LexingCXXModuleDirective = false;
SourceManager &SM = PP.getSourceManager();
MacroStartSLocOffset = SM.getNextLocalOffset();
@@ -113,6 +114,7 @@ void TokenLexer::Init(const Token *TokArray, unsigned NumToks,
HasLeadingSpace = false;
NextTokGetsSpace = false;
MacroExpansionStart = SourceLocation();
+ LexingCXXModuleDirective = false;
// Set HasLeadingSpace/AtStartOfLine so that the first token will be
// returned unmodified.
@@ -625,6 +627,18 @@ bool TokenLexer::Lex(Token &Tok) {
// that it is no longer being expanded.
if (Macro) Macro->EnableMacro();
+ // CWG2947: Allow the following code:
+ //
+ // export module m; int x;
+ // extern "C++" int *y = &x;
+ //
+ // The 'extern' token should have the 'StartOfLine' flag when the current
+ // TokenLexer exits and propagates line start/leading space info.
+ if (!Macro && isLexingCXXModuleDirective()) {
+ AtStartOfLine = true;
+ setLexingCXXModuleDirective(false);
+ }
+
Tok.startToken();
Tok.setFlagValue(Token::StartOfLine , AtStartOfLine);
Tok.setFlagValue(Token::LeadingSpace, HasLeadingSpace || NextTokGetsSpace);
@@ -699,7 +713,9 @@ bool TokenLexer::Lex(Token &Tok) {
HasLeadingSpace = false;
// Handle recursive expansion!
- if (!Tok.isAnnotation() && Tok.getIdentifierInfo() != nullptr) {
+ if (!Tok.isAnnotation() && Tok.getIdentifierInfo() != nullptr &&
+ (!PP.getLangOpts().CPlusPlusModules ||
+ !Tok.isModuleContextualKeyword())) {
// Change the kind of this identifier to the appropriate token kind, e.g.
// turning "for" into a keyword.
IdentifierInfo *II = Tok.getIdentifierInfo();
@@ -947,6 +963,18 @@ bool TokenLexer::isParsingPreprocessorDirective() const {
return Tokens[NumTokens-1].is(tok::eod) && !isAtEnd();
}
+/// setLexingCXXModuleDirective - This is set to true if this TokenLexer is
+/// created when handling C++ module directive.
+void TokenLexer::setLexingCXXModuleDirective(bool Val) {
+ LexingCXXModuleDirective = Val;
+}
+
+/// isLexingCXXModuleDirective - Return true if we are lexing a C++ module or
+/// import directive.
+bool TokenLexer::isLexingCXXModuleDirective() const {
+ return LexingCXXModuleDirective;
+}
+
/// HandleMicrosoftCommentPaste - In microsoft compatibility mode, /##/ pastes
/// together to form a comment that comments out everything in the current
/// macro, other active macros, and anything left on the current physical
diff --git a/clang/lib/Parse/Parser.cpp b/clang/lib/Parse/Parser.cpp
index 8f6f023dd79d0..af3ba7853820f 100644
--- a/clang/lib/Parse/Parser.cpp
+++ b/clang/lib/Parse/Parser.cpp
@@ -17,6 +17,9 @@
#include "clang/AST/DeclTemplate.h"
#include "clang/Basic/DiagnosticParse.h"
#include "clang/Basic/StackExhaustionHandler.h"
+#include "clang/Basic/TokenKinds.h"
+#include "clang/Lex/ModuleLoader.h"
+#include "clang/Lex/Preprocessor.h"
#include "clang/Parse/RAIIObjectsForParser.h"
#include "clang/Sema/DeclSpec.h"
#include "clang/Sema/EnterExpressionEvaluationContext.h"
@@ -515,8 +518,6 @@ void Parser::Initialize() {
Ident_abstract = nullptr;
Ident_override = nullptr;
Ident_GNU_final = nullptr;
- Ident_import = nullptr;
- Ident_module = nullptr;
Ident_super = &PP.getIdentifierTable().get("super");
@@ -572,11 +573,6 @@ void Parser::Initialize() {
PP.SetPoisonReason(Ident_AbnormalTermination,diag::err_seh___finally_block);
}
- if (getLangOpts().CPlusPlusModules) {
- Ident_import = PP.getIdentifierInfo("import");
- Ident_module = PP.getIdentifierInfo("module");
- }
-
Actions.Initialize();
// Prime the lexer look-ahead.
@@ -624,25 +620,8 @@ bool Parser::ParseTopLevelDecl(DeclGroupPtrTy &Result,
switch (NextToken().getKind()) {
case tok::kw_module:
goto module_decl;
-
- // Note: no need to handle kw_import here. We only form kw_import under
- // the Standard C++ Modules, and in that case 'export import' is parsed as
- // an export-declaration containing an import-declaration.
-
- // Recognize context-sensitive C++20 'export module' and 'export import'
- // declarations.
- case tok::identifier: {
- IdentifierInfo *II = NextToken().getIdentifierInfo();
- if ((II == Ident_module || II == Ident_import) &&
- GetLookAheadToken(2).isNot(tok::coloncolon)) {
- if (II == Ident_module)
- goto module_decl;
- else
- goto import_decl;
- }
- break;
- }
-
+ case tok::kw_import:
+ goto import_decl;
default:
break;
}
@@ -710,22 +689,6 @@ bool Parser::ParseTopLevelDecl(DeclGroupPtrTy &Result,
Actions.ActOnEndOfTranslationUnit();
//else don't tell Sema that we ended parsing: more input might come.
return true;
-
- case tok::identifier:
- // C++2a [basic.link]p3:
- // A token sequence beginning with 'export[opt] module' or
- // 'export[opt] import' and not immediately followed by '::'
- // is never interpreted as the declaration of a top-level-declaration.
- if ((Tok.getIdentifierInfo() == Ident_module ||
- Tok.getIdentifierInfo() == Ident_import) &&
- NextToken().isNot(tok::coloncolon)) {
- if (Tok.getIdentifierInfo() == Ident_module)
- goto module_decl;
- else
- goto import_decl;
- }
- break;
-
default:
break;
}
@@ -918,8 +881,10 @@ Parser::ParseExternalDeclaration(ParsedAttributes &Attrs,
case tok::kw_import: {
Sema::ModuleImportState IS = Sema::ModuleImportState::NotACXX20Module;
if (getLangOpts().CPlusPlusModules) {
- llvm_unreachable("not expecting a c++20 import here");
- ProhibitAttributes(Attrs);
+ Diag(Tok, diag::err_unexpected_module_or_import_decl)
+ << /*IsImport*/ true;
+ SkipUntil(tok::semi);
+ return nullptr;
}
SingleDecl = ParseModuleImport(SourceLocation(), IS);
} break;
@@ -1011,7 +976,7 @@ Parser::ParseExternalDeclaration(ParsedAttributes &Attrs,
return nullptr;
case tok::kw_module:
- Diag(Tok, diag::err_unexpected_module_decl);
+ Diag(Tok, diag::err_unexpected_module_or_import_decl) << /*IsImport*/ false;
SkipUntil(tok::semi);
return nullptr;
@@ -2231,6 +2196,11 @@ void Parser::CodeCompleteNaturalLanguage() {
Actions.CodeCompletion().CodeCompleteNaturalLanguage();
}
+void Parser::CodeCompleteModuleImport(SourceLocation ImportLoc,
+ ModuleIdPath Path) {
+ Actions.CodeCompletion().CodeCompleteModuleImport(ImportLoc, Path);
+}
+
bool Parser::ParseMicrosoftIfExistsCondition(IfExistsCondition& Result) {
assert((Tok.is(tok::kw___if_exists) || Tok.is(tok::kw___if_not_exists)) &&
"Expected '__if_exists' or '__if_not_exists'");
@@ -2342,10 +2312,8 @@ Parser::ParseModuleDecl(Sema::ModuleImportState &ImportState) {
? Sema::ModuleDeclKind::Interface
: Sema::ModuleDeclKind::Implementation;
- assert(
- (Tok.is(tok::kw_module) ||
- (Tok.is(tok::identifier) && Tok.getIdentifierInfo() == Ident_module)) &&
- "not a module declaration");
+ assert(Tok.is(tok::kw_module) && "not a module declaration");
+
SourceLocation ModuleLoc = ConsumeToken();
// Attributes appear after the module name, not before.
@@ -2402,6 +2370,10 @@ Parser::ParseModuleDecl(Sema::ModuleImportState &ImportState) {
return nullptr;
}
+ // This should already be diagnosed in phase 4; just skip until the semicolon.
+ if (!Tok.isOneOf(tok::semi, tok::l_square))
+ SkipUntil(tok::semi, SkipUntilFlags::StopBeforeMatch);
+
// We don't support any module attributes yet; just parse them and diagnose.
ParsedAttributes Attrs(AttrFactory);
MaybeParseCXX11Attributes(Attrs);
@@ -2410,7 +2382,9 @@ Parser::ParseModuleDecl(Sema::ModuleImportState &ImportState) {
/*DiagnoseEmptyAttrs=*/false,
/*WarnOnUnknownAttrs=*/true);
- ExpectAndConsumeSemi(diag::err_module_expected_semi);
+ if (ExpectAndConsumeSemi(diag::err_expected_semi_after_module_or_import,
+ tok::getKeywordSpelling(tok::kw_module)))
+ SkipUntil(tok::semi);
return Actions.ActOnModuleDecl(StartLoc, ModuleLoc, MDK, Path, Partition,
ImportState,
@@ -2424,7 +2398,7 @@ Decl *Parser::ParseModuleImport(SourceLocation AtLoc,
SourceLocation ExportLoc;
TryConsumeToken(tok::kw_export, ExportLoc);
- assert((AtLoc.isInvalid() ? Tok.isOneOf(tok::kw_import, tok::identifier)
+ assert((AtLoc.isInvalid() ? Tok.is(tok::kw_import)
: Tok.isObjCAtKeyword(tok::objc_import)) &&
"Improper start to module import");
bool IsObjCAtImport = Tok.isObjCAtKeyword(tok::objc_import);
@@ -2449,12 +2423,12 @@ Decl *Parser::ParseModuleImport(SourceLocation AtLoc,
Diag(ColonLoc, diag::err_unsupported_module_partition)
<< SourceRange(ColonLoc, Path.back().getLoc());
// Recover by leaving partition empty.
- else if (ParseModuleName(ColonLoc, Path, /*IsImport*/ true))
+ else if (ParseModuleName(ColonLoc, Path, /*IsImport=*/true))
return nullptr;
else
IsPartition = true;
} else {
- if (ParseModuleName(ImportLoc, Path, /*IsImport*/ true))
+ if (ParseModuleName(ImportLoc, Path, /*IsImport=*/true))
return nullptr;
}
@@ -2514,8 +2488,17 @@ Decl *Parser::ParseModuleImport(SourceLocation AtLoc,
SeenError = false;
break;
}
- ExpectAndConsumeSemi(diag::err_module_expected_semi);
- TryConsumeToken(tok::eod);
+
+ bool LexedSemi = false;
+ if (getLangOpts().CPlusPlusModules)
+ LexedSemi =
+ !ExpectAndConsumeSemi(diag::err_expected_semi_after_module_or_import,
+ tok::getKeywordSpelling(tok::kw_import));
+ else
+ LexedSemi = !ExpectAndConsumeSemi(diag::err_module_expected_semi);
+
+ if (!LexedSemi)
+ SkipUntil(tok::semi);
if (SeenError)
return nullptr;
@@ -2546,29 +2529,16 @@ Decl *Parser::ParseModuleImport(SourceLocation AtLoc,
bool Parser::ParseModuleName(SourceLocation UseLoc,
SmallVectorImpl<IdentifierLoc> &Path,
bool IsImport) {
- // Parse the module path.
- while (true) {
- if (!Tok.is(tok::identifier)) {
- if (Tok.is(tok::code_completion)) {
- cutOffParsing();
- Actions.CodeCompletion().CodeCompleteModuleImport(UseLoc, Path);
- return true;
- }
-
- Diag(Tok, diag::err_module_expected_ident) << IsImport;
- SkipUntil(tok::semi);
- return true;
- }
-
- // Record this part of the module path.
- Path.emplace_back(Tok.getLocation(), Tok.getIdentifierInfo());
- ConsumeToken();
-
- if (Tok.isNot(tok::period))
- return false;
-
- ConsumeToken();
+ if (Tok.isNot(tok::annot_module_name)) {
+ SkipUntil(tok::semi);
+ return true;
}
+ ModuleNameLoc *NameLoc =
+ static_cast<ModuleNameLoc *>(Tok.getAnnotationValue());
+ Path.assign(NameLoc->getModuleIdPath().begin(),
+ NameLoc->getModuleIdPath().end());
+ ConsumeAnnotationToken();
+ return false;
}
bool Parser::parseMisplacedModuleImport() {
diff --git a/clang/lib/Sema/SemaModule.cpp b/clang/lib/Sema/SemaModule.cpp
index 24275b97b7462..8cb684fd5ae3b 100644
--- a/clang/lib/Sema/SemaModule.cpp
+++ b/clang/lib/Sema/SemaModule.cpp
@@ -59,23 +59,6 @@ static void checkModuleImportContext(Sema &S, Module *M,
}
}
-// We represent the primary and partition names as 'Paths' which are sections
-// of the hierarchical access path for a clang module. However for C++20
-// the periods in a name are just another character, and we will need to
-// flatten them into a string.
-static std::string stringFromPath(ModuleIdPath Path) {
- std::string Name;
- if (Path.empty())
- return Name;
-
- for (auto &Piece : Path) {
- if (!Name.empty())
- Name += ".";
- Name += Piece.getIdentifierInfo()->getName();
- }
- return Name;
-}
-
/// Helper function for makeTransitiveImportsVisible to decide whether
/// the \param Imported module unit is in the same module with the \param
/// CurrentModule.
@@ -306,7 +289,7 @@ Sema::ActOnModuleDecl(SourceLocation StartLoc, SourceLocation ModuleLoc,
// We were asked to compile a module interface unit but this is a module
// implementation unit.
Diag(ModuleLoc, diag::err_module_interface_implementation_mismatch)
- << FixItHint::CreateInsertion(ModuleLoc, "export ");
+ << FixItHint::CreateInsertion(ModuleLoc, "export ");
MDK = ModuleDeclKind::Interface;
break;
@@ -373,10 +356,10 @@ Sema::ActOnModuleDecl(SourceLocation StartLoc, SourceLocation ModuleLoc,
// Flatten the dots in a module name. Unlike Clang's hierarchical module map
// modules, the dots here are just another character that can appear in a
// module name.
- std::string ModuleName = stringFromPath(Path);
+ std::string ModuleName = ModuleLoader::getFlatNameFromPath(Path);
if (IsPartition) {
ModuleName += ":";
- ModuleName += stringFromPath(Partition);
+ ModuleName += ModuleLoader::getFlatNameFromPath(Partition);
}
// If a module name was explicitly specified on the command line, it must be
// correct.
@@ -389,7 +372,7 @@ Sema::ActOnModuleDecl(SourceLocation StartLoc, SourceLocation ModuleLoc,
<< getLangOpts().CurrentModule;
return nullptr;
}
- const_cast<LangOptions&>(getLangOpts()).CurrentModule = ModuleName;
+ const_cast<LangOptions &>(getLangOpts()).CurrentModule = ModuleName;
auto &Map = PP.getHeaderSearchInfo().getModuleMap();
Module *Mod; // The module we are creating.
@@ -434,7 +417,7 @@ Sema::ActOnModuleDecl(SourceLocation StartLoc, SourceLocation ModuleLoc,
Interface = getModuleLoader().loadModule(ModuleLoc, {ModuleNameLoc},
Module::AllVisible,
/*IsInclusionDirective=*/false);
- const_cast<LangOptions&>(getLangOpts()).CurrentModule = ModuleName;
+ const_cast<LangOptions &>(getLangOpts()).CurrentModule = ModuleName;
if (!Interface) {
Diag(ModuleLoc, diag::err_module_not_defined) << ModuleName;
@@ -597,12 +580,12 @@ DeclResult Sema::ActOnModuleImport(SourceLocation StartLoc,
// otherwise, the name of the importing named module.
ModuleName = NamedMod->getPrimaryModuleInterfaceName().str();
ModuleName += ":";
- ModuleName += stringFromPath(Path);
+ ModuleName += ModuleLoader::getFlatNameFromPath(Path);
ModuleNameLoc =
IdentifierLoc(Path[0].getLoc(), PP.getIdentifierInfo(ModuleName));
Path = ModuleIdPath(ModuleNameLoc);
} else if (getLangOpts().CPlusPlusModules) {
- ModuleName = stringFromPath(Path);
+ ModuleName = ModuleLoader::getFlatNameFromPath(Path);
ModuleNameLoc =
IdentifierLoc(Path[0].getLoc(), PP.getIdentifierInfo(ModuleName));
Path = ModuleIdPath(ModuleNameLoc);
diff --git a/clang/test/CXX/basic/basic.link/p3.cpp b/clang/test/CXX/basic/basic.link/p3.cpp
index e6633a777ddef..bc3622c7bbd64 100644
--- a/clang/test/CXX/basic/basic.link/p3.cpp
+++ b/clang/test/CXX/basic/basic.link/p3.cpp
@@ -13,7 +13,8 @@ struct module { struct inner {}; };
constexpr int n = 123;
export module m; // #1
-module y = {}; // expected-error {{multiple module declarations}} expected-error 2{{}}
+module y = {}; // expected-error {{multiple module declarations}}
+// expected-error@-1 {{unexpected preprocessing token '=' after module name, only ';' and '[' (start of attribute specifier sequence) are allowed}}
// expected-note@#1 {{previous module declaration}}
::import x = {};
@@ -23,8 +24,8 @@ import::inner xi = {};
module::inner yi = {};
namespace N {
- module a;
- import b;
+ module a; // expected-error {{module declaration can only appear at the top level}}
+ import b; // expected-error {{import declaration can only appear at the top level}}
}
extern "C++" module cxxm;
@@ -45,10 +46,11 @@ constexpr int n = 123;
export module m; // #1
-import x = {}; // expected-error {{expected ';' after module name}}
+import x = {}; // expected-error {{import directive must end with a ';'}}
// expected-error@-1 {{module 'x' not found}}
//--- ImportError2.cpp
+// expected-no-diagnostics
module;
struct module { struct inner {}; };
@@ -63,7 +65,4 @@ template<> struct import<n> {
static X y;
};
-// This is not valid because the 'import <n>' is a pp-import, even though it
-// grammatically can't possibly be an import declaration.
-struct X {} import<n>::y; // expected-error {{'n' file not found}}
-
+struct X {} import<n>::y;
diff --git a/clang/test/CXX/basic/basic.scope/basic.scope.namespace/p2.cpp b/clang/test/CXX/basic/basic.scope/basic.scope.namespace/p2.cpp
index fd0038b3f7745..a57919f48afdd 100644
--- a/clang/test/CXX/basic/basic.scope/basic.scope.namespace/p2.cpp
+++ b/clang/test/CXX/basic/basic.scope/basic.scope.namespace/p2.cpp
@@ -107,4 +107,4 @@ void test_late() {
// expected-error@-2 {{undeclared identifier}}
internal_private = 1; // expected-error {{use of undeclared identifier 'internal_private'}}
-}
\ No newline at end of file
+}
diff --git a/clang/test/CXX/drs/cwg2947.cpp b/clang/test/CXX/drs/cwg2947.cpp
new file mode 100644
index 0000000000000..d6fba84c0ff3d
--- /dev/null
+++ b/clang/test/CXX/drs/cwg2947.cpp
@@ -0,0 +1,81 @@
+// RUN: rm -rf %t
+// RUN: mkdir %t
+// RUN: split-file %s %t
+
+// RUN: %clang_cc1 -std=c++20 %t/cwg2947_example1.cpp -D'DOT_BAR=.bar' -fsyntax-only -verify
+// RUN: %clang_cc1 -std=c++20 %t/cwg2947_example2.cpp -D'MOD_ATTR=[[vendor::shiny_module]]' -fsyntax-only -verify
+// RUN: %clang_cc1 -std=c++20 %t/cwg2947_example3.cpp -fsyntax-only -verify
+// RUN: %clang_cc1 -std=c++20 %t/cwg2947_example4.cpp -fsyntax-only -verify
+// RUN: %clang_cc1 -std=c++20 %t/cwg2947_example5.cpp -fsyntax-only -verify
+// RUN: %clang_cc1 -std=c++20 %t/cwg2947_example6.cpp -fsyntax-only -verify
+// RUN: %clang_cc1 -std=c++20 %t/cwg2947_ext1.cpp -verify -E | FileCheck %t/cwg2947_ext1.cpp
+// RUN: %clang_cc1 -std=c++20 %t/cwg2947_ext2.cpp -fsyntax-only -verify
+// RUN: %clang_cc1 -std=c++20 %t/cwg2947_ext3.cpp -fsyntax-only -verify
+
+// RUN: %clang_cc1 -std=c++23 %t/cwg2947_example1.cpp -D'DOT_BAR=.bar' -fsyntax-only -verify
+// RUN: %clang_cc1 -std=c++23 %t/cwg2947_example2.cpp -D'MOD_ATTR=[[vendor::shiny_module]]' -fsyntax-only -verify
+// RUN: %clang_cc1 -std=c++23 %t/cwg2947_example3.cpp -fsyntax-only -verify
+// RUN: %clang_cc1 -std=c++23 %t/cwg2947_example4.cpp -fsyntax-only -verify
+// RUN: %clang_cc1 -std=c++23 %t/cwg2947_example5.cpp -fsyntax-only -verify
+// RUN: %clang_cc1 -std=c++23 %t/cwg2947_example6.cpp -fsyntax-only -verify
+// RUN: %clang_cc1 -std=c++23 %t/cwg2947_ext1.cpp -verify -E | FileCheck %t/cwg2947_ext1.cpp
+// RUN: %clang_cc1 -std=c++23 %t/cwg2947_ext2.cpp -fsyntax-only -verify
+// RUN: %clang_cc1 -std=c++23 %t/cwg2947_ext3.cpp -fsyntax-only -verify
+
+// RUN: %clang_cc1 -std=c++26 %t/cwg2947_example1.cpp -D'DOT_BAR=.bar' -fsyntax-only -verify
+// RUN: %clang_cc1 -std=c++26 %t/cwg2947_example2.cpp -D'MOD_ATTR=[[vendor::shiny_module]]' -fsyntax-only -verify
+// RUN: %clang_cc1 -std=c++26 %t/cwg2947_example3.cpp -fsyntax-only -verify
+// RUN: %clang_cc1 -std=c++26 %t/cwg2947_example4.cpp -fsyntax-only -verify
+// RUN: %clang_cc1 -std=c++26 %t/cwg2947_example5.cpp -fsyntax-only -verify
+// RUN: %clang_cc1 -std=c++26 %t/cwg2947_example6.cpp -fsyntax-only -verify
+// RUN: %clang_cc1 -std=c++26 %t/cwg2947_ext1.cpp -verify -E | FileCheck %t/cwg2947_ext1.cpp
+// RUN: %clang_cc1 -std=c++26 %t/cwg2947_ext2.cpp -fsyntax-only -verify
+// RUN: %clang_cc1 -std=c++26 %t/cwg2947_ext3.cpp -fsyntax-only -verify
+
+//--- cwg2947_example1.cpp
+// #define DOT_BAR .bar
+export module foo DOT_BAR; // error: expansion of DOT_BAR; does not begin with ; or [
+// expected-error@-1 {{unexpected preprocessing token '.' after module name, only ';' and '[' (start of attribute specifier sequence) are allowed}}
+
+//--- cwg2947_example2.cpp
+export module M MOD_ATTR; // OK
+// expected-warning@-1 {{unknown attribute 'vendor::shiny_module' ignored}}
+
+//--- cwg2947_example3.cpp
+export module a
+ .b; // error: preprocessing token after pp-module-name is not ; or [
+// expected-error@-1 {{unexpected preprocessing token '.' after module name, only ';' and '[' (start of attribute specifier sequence) are allowed}}
+
+//--- cwg2947_example4.cpp
+export module M [[
+ attr1,
+// expected-warning@-1 {{unknown attribute 'attr1' ignored}}
+ attr2 ]] ; // OK
+// expected-warning@-1 {{unknown attribute 'attr2' ignored}}
+
+//--- cwg2947_example5.cpp
+export module M
+ [[ attr1,
+// expected-warning@-1 {{unknown attribute 'attr1' ignored}}
+ attr2 ]] ; // OK
+// expected-warning@-1 {{unknown attribute 'attr2' ignored}}
+
+//--- cwg2947_example6.cpp
+export module M; int
+// expected-warning@-1 {{extra tokens after semicolon in 'module' directive}}
+ n; // OK
+
+//--- cwg2947_ext1.cpp
+// CHECK: export __preprocessed_module m; int x;
+// CHECK: extern "C++" int *y = &x;
+export module m; int x;
+// expected-warning@-1 {{extra tokens after semicolon in 'module' directive}}
+extern "C++" int *y = &x;
+
+//--- cwg2947_ext2.cpp
+export module x _Pragma("GCC warning \"Hi\"");
+// expected-warning@-1 {{Hi}}
+
+//--- cwg2947_ext3.cpp
+export module x; _Pragma("GCC warning \"hi\""); // expected-warning {{hi}}
+// expected-warning@-1 {{extra tokens after semicolon in 'module' directive}}
diff --git a/clang/test/CXX/lex/lex.pptoken/p3-2a.cpp b/clang/test/CXX/lex/lex.pptoken/p3-2a.cpp
index 0e0e5fec6e9d8..81af65481dc22 100644
--- a/clang/test/CXX/lex/lex.pptoken/p3-2a.cpp
+++ b/clang/test/CXX/lex/lex.pptoken/p3-2a.cpp
@@ -1,7 +1,7 @@
// RUN: not %clang_cc1 -std=c++2a -E -I%S/Inputs %s -o - | FileCheck %s --strict-whitespace --implicit-check-not=ERROR
// Check for context-sensitive header-name token formation.
-// CHECK: import <foo bar>;
+// CHECK: __preprocessed_import <foo bar>;
import <foo bar>;
// Not at the top level: these are each 8 tokens rather than 5.
@@ -12,59 +12,64 @@ import <foo bar>;
// CHECK: [ import <foo bar>; %>
[ import <foo bar>; %>
-// CHECK: import <foo bar>;
+// CHECK: __preprocessed_import <foo bar>;
import <foo bar>;
-// CHECK: foo; import <foo bar>;
+// CHECK: foo; import <foo bar>;
foo; import <foo bar>;
// CHECK: foo import <foo bar>;
foo import <foo bar>;
-// CHECK: import <foo bar> {{\[\[ ]]}};
+// CHECK: __preprocessed_import <foo bar> {{\[\[ ]]}};
import <foo bar> [[ ]];
-// CHECK: import <foo bar> import <foo bar>;
+// CHECK: __preprocessed_import <foo bar> import <foo bar>;
import <foo bar> import <foo bar>;
// FIXME: We do not form header-name tokens in the pp-import-suffix of a
// pp-import. Conforming programs can't tell the difference.
-// CHECK: import <foo bar> {} import <foo bar>;
+// CHECK: __preprocessed_import <foo bar> {} import <foo bar>;
// FIXME: import <foo bar> {} import <foo bar>;
import <foo bar> {} import <foo bar>;
-// CHECK: export import <foo bar>;
+// CHECK: export __preprocessed_import <foo bar>;
export import <foo bar>;
// CHECK: export export import <foo bar>;
export export import <foo bar>;
#define UNBALANCED_PAREN (
-// CHECK: import <foo bar>;
+// CHECK: __preprocessed_import <foo bar>;
import <foo bar>;
UNBALANCED_PAREN
-// CHECK: import <foo bar>;
+// CHECK: __preprocessed_import <foo bar>;
import <foo bar>;
)
_Pragma("clang no_such_pragma (");
-// CHECK: import <foo bar>;
+// CHECK: __preprocessed_import <foo bar>;
import <foo bar>;
#define HEADER <foo bar>
-// CHECK: import <foo bar>;
+// CHECK: __preprocessed_import <foo bar>;
import HEADER;
-// CHECK: import <foo bar>;
+// CHECK: {{^}}foo{{$}}
+// CHECK-NEXT: {{^}} bar{{$}}
+// CHECK-NEXT: {{^}}>;{{$}}
import <
foo
bar
>;
// CHECK: import{{$}}
-// CHECK: {{^}}<foo bar>;
+// CHECK-NEXT: {{^}}<{{$}}
+// CHECK-NEXT: {{^}}foo{{$}}
+// CHECK-NEXT: {{^}} bar{{$}}
+// CHECK-NEXT: {{^}}>;{{$}}
import
<
foo
@@ -72,7 +77,7 @@ foo
>;
// CHECK: import{{$}}
-// CHECK: {{^}}<foo bar>;
+// CHECK: {{^}}<foo bar>;
import
<foo bar>;
diff --git a/clang/test/CXX/module/basic/basic.link/module-declaration.cpp b/clang/test/CXX/module/basic/basic.link/module-declaration.cpp
index 4bdcc9e5f278e..52ba1d9f82f2f 100644
--- a/clang/test/CXX/module/basic/basic.link/module-declaration.cpp
+++ b/clang/test/CXX/module/basic/basic.link/module-declaration.cpp
@@ -46,8 +46,8 @@ export module z;
export module x;
//--- invalid_module_name.cppm
-export module z elderberry; // expected-error {{expected ';'}} \
- // expected-error {{a type specifier is required}}
+export module z elderberry;
+// expected-error@-1 {{unexpected preprocessing token 'elderberry' after module name, only ';' and '[' (start of attribute specifier sequence) are allowed}}
//--- empty_attribute.cppm
// expected-no-diagnostics
diff --git a/clang/test/CXX/module/cpp.pre/p1.cpp b/clang/test/CXX/module/cpp.pre/p1.cpp
new file mode 100644
index 0000000000000..989915004ff57
--- /dev/null
+++ b/clang/test/CXX/module/cpp.pre/p1.cpp
@@ -0,0 +1,207 @@
+// RUN: rm -rf %t
+// RUN: mkdir %t
+// RUN: split-file %s %t
+
+// RUN: %clang_cc1 -std=c++20 %t/hash.cpp -fsyntax-only -verify
+// RUN: %clang_cc1 -std=c++20 %t/module.cpp -fsyntax-only -verify
+
+// RUN: %clang_cc1 -std=c++20 %t/rightpad.cppm -emit-module-interface -o %t/rightpad.pcm
+// RUN: %clang_cc1 -std=c++20 %t/M_part.cppm -emit-module-interface -o %t/M_part.pcm
+// RUN: %clang_cc1 -std=c++20 -xc++-system-header %t/string -emit-header-unit -o %t/string.pcm
+// RUN: %clang_cc1 -std=c++20 -xc++-user-header %t/squee -emit-header-unit -o %t/squee.pcm
+// RUN: %clang_cc1 -std=c++20 %t/import.cpp -isystem %t \
+// RUN: -fmodule-file=rightpad=%t/rightpad.pcm \
+// RUN: -fmodule-file=M:part=%t/M_part.pcm \
+// RUN: -fmodule-file=%t/string.pcm \
+// RUN: -fmodule-file=%t/squee.pcm \
+// RUN: -fsyntax-only -verify
+
+// RUN: %clang_cc1 -std=c++20 %t/module_decl_not_in_same_line.cpp -fsyntax-only -verify
+// RUN: %clang_cc1 -std=c++20 %t/foo.cppm -emit-module-interface -o %t/foo.pcm
+// RUN: %clang_cc1 -std=c++20 %t/import_decl_not_in_same_line.cpp -fmodule-file=foo=%t/foo.pcm -fsyntax-only -verify
+// RUN: %clang_cc1 -std=c++20 %t/not_import.cpp -fsyntax-only -verify
+// RUN: %clang_cc1 -std=c++20 %t/import_spaceship.cpp -fsyntax-only -verify
+// RUN: %clang_cc1 -std=c++20 %t/leading_empty_macro.cpp -fsyntax-only -verify
+// RUN: %clang_cc1 -std=c++20 %t/operator_keyword_and.cpp -fsyntax-only -verify
+// RUN: %clang_cc1 -std=c++20 %t/operator_keyword_and2.cpp -fsyntax-only -verify
+// RUN: %clang_cc1 -std=c++20 %t/macro_in_module_decl_suffix.cpp -D'ATTR(X)=[[X]]' -fsyntax-only -verify
+// RUN: %clang_cc1 -std=c++20 %t/macro_in_module_decl_suffix2.cpp -D'ATTR(X)=[[X]]' -fsyntax-only -verify
+// RUN: %clang_cc1 -std=c++20 %t/extra_tokens_after_module_decl1.cpp -fsyntax-only -verify
+// RUN: %clang_cc1 -std=c++20 %t/extra_tokens_after_module_decl2.cpp -fsyntax-only -verify
+// RUN: %clang_cc1 -std=c++20 %t/object_like_macro_in_module_name.cpp -Dm=x -Dn=y -fsyntax-only -verify
+// RUN: %clang_cc1 -std=c++20 %t/object_like_macro_in_partition_name.cpp -Dm=x -Dn=y -fsyntax-only -verify
+// RUN: %clang_cc1 -std=c++20 %t/unexpected_character_in_pp_module_suffix.cpp -D'm(x)=x' -fsyntax-only -verify
+// RUN: %clang_cc1 -std=c++20 %t/semi_in_same_line.cpp -fsyntax-only -verify
+// RUN: %clang_cc1 -std=c++20 %t/preprocessed_module_file.cpp -E | FileCheck %t/preprocessed_module_file.cpp
+// RUN: %clang_cc1 -std=c++20 %t/pedantic-errors.cpp -pedantic-errors -fsyntax-only -verify
+// RUN: %clang_cc1 -std=c++20 %t/xcpp-output.cpp -fsyntax-only -verify -xc++-cpp-output
+// RUN: %clang_cc1 -std=c++20 %t/func_like_macro.cpp -D'm(x)=x' -fsyntax-only -verify
+// RUN: %clang_cc1 -std=c++20 %t/lparen.cpp -D'm(x)=x' -D'LPAREN=(' -fsyntax-only -verify
+// RUN: %clang_cc1 -std=c++20 %t/control_line.cpp -fsyntax-only -verify
+
+
+//--- hash.cpp
+// expected-no-diagnostics
+# // preprocessing directive
+
+//--- module.cpp
+// expected-no-diagnostics
+module ; // preprocessing directive
+export module leftpad; // preprocessing directive
+
+//--- string
+#ifndef STRING_H
+#define STRING_H
+#endif // STRING_H
+
+//--- squee
+#ifndef SQUEE_H
+#define SQUEE_H
+#endif
+
+//--- rightpad.cppm
+export module rightpad;
+
+//--- M_part.cppm
+export module M:part;
+
+//--- import.cpp
+export module M;
+import <string>; // expected-warning {{the implementation of header units is in an experimental phase}}
+export import "squee"; // expected-warning {{the implementation of header units is in an experimental phase}}
+import rightpad; // preprocessing directive
+import :part; // preprocessing directive
+
+//--- module_decl_not_in_same_line.cpp
+module // expected-error {{a type specifier is required for all declarations}}
+;export module M; // expected-error {{export declaration can only be used within a module interface}} \
+ // expected-error {{unknown type name 'module'}}
+
+//--- foo.cppm
+export module foo;
+
+//--- import_decl_not_in_same_line.cpp
+export module M;
+export
+import // expected-error {{unknown type name 'import'}}
+foo;
+
+export
+import foo; // expected-error {{unknown type name 'import'}}
+
+//--- not_import.cpp
+export module M;
+import :: // expected-error {{use of undeclared identifier 'import'}}
+import -> // expected-error {{cannot use arrow operator on a type}}
+
+//--- import_spaceship.cpp
+export module M;
+import <=>; // expected-error {{'=' file not found}}
+
+//--- leading_empty_macro.cpp
+// expected-no-diagnostics
+export module M;
+typedef int import;
+#define EMP
+EMP import m; // The phase 7 grammar should see import as a typedef-name.
+
+//--- operator_keyword_and.cpp
+// expected-no-diagnostics
+typedef int import;
+extern
+import and x;
+
+//--- operator_keyword_and2.cpp
+// expected-no-diagnostics
+typedef int module;
+extern
+module and x;
+
+//--- macro_in_module_decl_suffix.cpp
+export module m ATTR(x); // expected-warning {{unknown attribute 'x' ignored}}
+
+//--- macro_in_module_decl_suffix2.cpp
+export module m [[y]] ATTR(x); // expected-warning {{unknown attribute 'y' ignored}} \
+ // expected-warning {{unknown attribute 'x' ignored}}
+
+//--- extra_tokens_after_module_decl1.cpp
+module; int n; // expected-warning {{extra tokens after semicolon in 'module' directive}}
+import foo; int n1; // expected-warning {{extra tokens after semicolon in 'import' directive}}
+ // expected-error@-1 {{module 'foo' not found}}
+const int *p1 = &n1;
+
+
+//--- extra_tokens_after_module_decl2.cpp
+export module m; int n2 // expected-warning {{extra tokens after semicolon in 'module' directive}}
+;
+const int *p2 = &n2;
+
+
+//--- object_like_macro_in_module_name.cpp
+export module m.n;
+// expected-error@-1 {{module name component 'm' cannot be a object-like macro}}
+// expected-note@* {{macro 'm' defined here}}
+// expected-error@-3 {{module name component 'n' cannot be a object-like macro}}
+// expected-note@* {{macro 'n' defined here}}
+
+//--- object_like_macro_in_partition_name.cpp
+export module m:n;
+// expected-error@-1 {{module name component 'm' cannot be a object-like macro}}
+// expected-note@* {{macro 'm' defined here}}
+// expected-error@-3 {{partition name component 'n' cannot be a object-like macro}}
+// expected-note@* {{macro 'n' defined here}}
+
+//--- unexpected_character_in_pp_module_suffix.cpp
+export module m();
+// expected-error@-1 {{unexpected preprocessing token '(' after module name, only ';' and '[' (start of attribute specifier sequence) are allowed}}
+
+//--- semi_in_same_line.cpp
+export module m // OK
+[[]];
+
+import foo // expected-error {{module 'foo' not found}}
+;
+
+//--- preprocessed_module_file.cpp
+// CHECK: __preprocessed_module;
+// CHECK-NEXT: export __preprocessed_module M;
+// CHECK-NEXT: __preprocessed_import std;
+// CHECK-NEXT: export __preprocessed_import bar;
+// CHECK-NEXT: struct import {};
+// CHECK-EMPTY:
+// CHECK-NEXT: import foo;
+module;
+export module M;
+import std;
+export import bar;
+struct import {};
+#define EMPTY
+EMPTY import foo;
+
+//--- pedantic-errors.cpp
+export module m; int n; // expected-warning {{extra tokens after semicolon in 'module' directive}}
+
+//--- xcpp-output.cpp
+// expected-no-diagnostics
+typedef int module;
+module x;
+
+//--- func_like_macro.cpp
+// #define m(x) x
+export module m
+ (foo); // expected-error {{unexpected preprocessing token '(' after module name, only ';' and '[' (start of attribute specifier sequence) are allowed}}
+
+//--- lparen.cpp
+// #define m(x) x
+// #define LPAREN (
+export module m
+ LPAREN foo); // expected-error {{unexpected preprocessing token 'LPAREN' after module name, only ';' and '[' (start of attribute specifier sequence) are allowed}}
+
+//--- control_line.cpp
+#if 0 // #1
+export module m; // expected-error {{module directive lines are not allowed on lines controlled by preprocessor conditionals}}
+#else
+export module m; // expected-error {{module directive lines are not allowed on lines controlled by preprocessor conditionals}} \
+ // expected-error {{module declaration must occur at the start of the translation unit}} \
+ // expected-note@#1 {{add 'module;'}}
+#endif
diff --git a/clang/test/CXX/module/dcl.dcl/dcl.module/dcl.module.import/p1.cppm b/clang/test/CXX/module/dcl.dcl/dcl.module/dcl.module.import/p1.cppm
index f65f050a3c7bd..28fb1827eed3b 100644
--- a/clang/test/CXX/module/dcl.dcl/dcl.module/dcl.module.import/p1.cppm
+++ b/clang/test/CXX/module/dcl.dcl/dcl.module/dcl.module.import/p1.cppm
@@ -44,8 +44,8 @@ import x [[noreturn]]; // expected-error {{'noreturn' attribute cannot be applie
import x [[blarg::noreturn]]; // expected-warning-re {{unknown attribute 'blarg::noreturn' ignored{{.*}}}}
import x.y;
-import x.; // expected-error {{expected a module name after 'import'}}
-import .x; // expected-error {{expected a module name after 'import'}}
+import x.; // expected-error {{expected identifier after '.' in module name}}
+import .x; // expected-error {{unknown type name 'import'}} expected-error {{cannot use dot operator on a type}}
import blarg; // expected-error {{module 'blarg' not found}}
@@ -62,8 +62,8 @@ import x [[noreturn]]; // expected-error {{'noreturn' attribute cannot be applie
import x [[blarg::noreturn]]; // expected-warning-re {{unknown attribute 'blarg::noreturn' ignored{{.*}}}}
import x.y;
-import x.; // expected-error {{expected a module name after 'import'}}
-import .x; // expected-error {{expected a module name after 'import'}}
+import x.; // expected-error {{expected identifier after '.' in module name}}
+import .x; // expected-error {{unknown type name 'import'}} expected-error {{cannot use dot operator on a type}}
import blarg; // expected-error {{module 'blarg' not found}}
diff --git a/clang/test/Lexer/cxx20-module-directive.cpp b/clang/test/Lexer/cxx20-module-directive.cpp
new file mode 100644
index 0000000000000..e420ff4b11407
--- /dev/null
+++ b/clang/test/Lexer/cxx20-module-directive.cpp
@@ -0,0 +1,11 @@
+// RUN: %clang_cc1 -E -std=c++20 %s
+
+// CHECK: export __preprocessed_module M;
+// CHECK-NEXT: export __preprocessed_import K;
+// CHECK-NEXT: typedef int import;
+// CHECK: import m;
+export module M;
+export import K;
+typedef int import;
+#define EMP
+EMP import m;
diff --git a/clang/test/Modules/pr121066.cpp b/clang/test/Modules/pr121066.cpp
index e92a81c53d683..676c5225f2090 100644
--- a/clang/test/Modules/pr121066.cpp
+++ b/clang/test/Modules/pr121066.cpp
@@ -1,4 +1,6 @@
// RUN: %clang_cc1 -std=c++20 -fsyntax-only %s -verify
-import mod // expected-error {{expected ';' after module name}}
+// This import directive is ill-formed, it's missing an ';' after
+// module name, but we try to recovery from error and import the module.
+import mod // expected-error {{import directive must end with a ';'}}
// expected-error@-1 {{module 'mod' not found}}
diff --git a/clang/test/Modules/preprocess-named-modules.cppm b/clang/test/Modules/preprocess-named-modules.cppm
index 67a6cc384a1c7..5feb1772c145b 100644
--- a/clang/test/Modules/preprocess-named-modules.cppm
+++ b/clang/test/Modules/preprocess-named-modules.cppm
@@ -4,4 +4,4 @@
// RUN: %clang_cc1 -std=c++20 -E %s -o - | FileCheck %s
import non_exist_modules;
-// CHECK: import non_exist_modules;
+// CHECK: __preprocessed_import non_exist_modules;
diff --git a/clang/unittests/ASTMatchers/ASTMatchersNodeTest.cpp b/clang/unittests/ASTMatchers/ASTMatchersNodeTest.cpp
index ac3981590fd11..00bac4e96e74d 100644
--- a/clang/unittests/ASTMatchers/ASTMatchersNodeTest.cpp
+++ b/clang/unittests/ASTMatchers/ASTMatchersNodeTest.cpp
@@ -193,7 +193,8 @@ TEST_P(ASTMatchersTest, ExportDecl) {
if (!GetParam().isCXX20OrLater()) {
return;
}
- const std::string moduleHeader = "module;export module ast_matcher_test;";
+ const std::string moduleHeader =
+ "module;\n export module ast_matcher_test;\n";
EXPECT_TRUE(matches(moduleHeader + "export void foo();",
exportDecl(has(functionDecl()))));
EXPECT_TRUE(matches(moduleHeader + "export { void foo(); int v; }",
diff --git a/clang/unittests/Lex/DependencyDirectivesScannerTest.cpp b/clang/unittests/Lex/DependencyDirectivesScannerTest.cpp
index ddc87921ea084..79e2832798917 100644
--- a/clang/unittests/Lex/DependencyDirectivesScannerTest.cpp
+++ b/clang/unittests/Lex/DependencyDirectivesScannerTest.cpp
@@ -640,7 +640,7 @@ TEST(MinimizeSourceToDependencyDirectivesTest, AtImport) {
EXPECT_STREQ("@import A;\n", Out.data());
ASSERT_FALSE(minimizeSourceToDependencyDirectives("@import A\n;", Out));
- EXPECT_STREQ("@import A\n;\n", Out.data());
+ EXPECT_STREQ("@import A;\n", Out.data());
ASSERT_FALSE(minimizeSourceToDependencyDirectives("@import A.B;\n", Out));
EXPECT_STREQ("@import A.B;\n", Out.data());
@@ -685,18 +685,19 @@ TEST(MinimizeSourceToDependencyDirectivesTest, ImportFailures) {
minimizeSourceToDependencyDirectives("@import MACRO(A);\n", Out));
ASSERT_FALSE(minimizeSourceToDependencyDirectives("@import \" \";\n", Out));
- ASSERT_FALSE(minimizeSourceToDependencyDirectives("import <Foo.h>\n"
+ ASSERT_FALSE(minimizeSourceToDependencyDirectives("import <Foo.h>;\n"
"@import Foo;",
Out));
- EXPECT_STREQ("@import Foo;\n", Out.data());
+ EXPECT_STREQ("import<Foo.h>;\n@import Foo;\n", Out.data());
ASSERT_FALSE(
- minimizeSourceToDependencyDirectives("import <Foo.h>\n"
+ minimizeSourceToDependencyDirectives("import <Foo.h>;\n"
"#import <Foo.h>\n"
"@;\n"
"#pragma clang module import Foo",
Out));
- EXPECT_STREQ("#import <Foo.h>\n"
+ EXPECT_STREQ("import<Foo.h>;\n"
+ "#import <Foo.h>\n"
"#pragma clang module import Foo\n",
Out.data());
}
@@ -1215,4 +1216,41 @@ TEST(MinimizeSourceToDependencyDirectivesTest, TokensBeforeEOF) {
EXPECT_STREQ("#ifndef A\n#define A\n#endif\n<TokBeforeEOF>\n", Out.data());
}
+TEST(MinimizeSourceToDependencyDirectivesTest, PreprocessedModule) {
+ SmallVector<char, 128> Out;
+
+ ASSERT_FALSE(
+ minimizeSourceToDependencyDirectives("export __preprocessed_module M;\n"
+ "struct import {};\n"
+ "import foo;\n"
+ "__preprocessed_import bar;\n",
+ Out));
+ EXPECT_STREQ("export __preprocessed_module M;\n"
+ "__preprocessed_import bar;\n",
+ Out.data());
+}
+
+TEST(MinimizeSourceToDependencyDirectivesTest, ScanningPreprocessedModuleFile) {
+ StringRef Source = R"(
+ export __preprocessed_module M;
+ struct import {};
+ import foo;
+ )";
+
+ ASSERT_TRUE(clang::isPreprocessedModuleFile(Source));
+
+ Source = R"(
+ export module M;
+ struct import {};
+ import foo;
+ )";
+
+ ASSERT_FALSE(clang::isPreprocessedModuleFile(Source));
+
+ Source = R"(
+ __preprocessed_import foo;
+ )";
+ ASSERT_TRUE(clang::isPreprocessedModuleFile(Source));
+}
+
} // end anonymous namespace
diff --git a/clang/unittests/Lex/ModuleDeclStateTest.cpp b/clang/unittests/Lex/ModuleDeclStateTest.cpp
index ac2ddfaf52cd0..3117c4f2f1af0 100644
--- a/clang/unittests/Lex/ModuleDeclStateTest.cpp
+++ b/clang/unittests/Lex/ModuleDeclStateTest.cpp
@@ -40,7 +40,7 @@ class CheckNamedModuleImportingCB : public PPCallbacks {
void moduleImport(SourceLocation ImportLoc, ModuleIdPath Path,
const Module *Imported) override {
ASSERT_TRUE(NextCheckingIndex < IsImportingNamedModulesAssertions.size());
- EXPECT_EQ(PP.isInImportingCXXNamedModules(),
+ EXPECT_EQ(PP.isImportingCXXNamedModules(),
IsImportingNamedModulesAssertions[NextCheckingIndex]);
NextCheckingIndex++;
diff --git a/clang/www/cxx_dr_status.html b/clang/www/cxx_dr_status.html
index e9fadb2dbd4ac..cbab53a46340a 100755
--- a/clang/www/cxx_dr_status.html
+++ b/clang/www/cxx_dr_status.html
@@ -20450,7 +20450,7 @@ <h2 id="cxxdr">C++ defect report implementation status</h2>
<td>[<a href="https://wg21.link/cpp.module">cpp.module</a>]</td>
<td>open</td>
<td>Limiting macro expansion in <I>pp-module</I></td>
- <td align="center">Not resolved</td>
+ <td class="unreleased" align="center">Clang 23</td>
</tr>
<tr class="open" id="2948">
<td><a href="https://cplusplus.github.io/CWG/issues/2948.html">2948</a></td>
diff --git a/clang/www/cxx_status.html b/clang/www/cxx_status.html
index 2618ff930a0e4..06e65ecb7c04c 100755
--- a/clang/www/cxx_status.html
+++ b/clang/www/cxx_status.html
@@ -910,7 +910,7 @@ <h2 id="cxx20">C++20 implementation status</h2>
</tr>
<tr>
<td><a href="https://wg21.link/p1703r1">P1703R1</a></td>
- <td class="none" align="center">Subsumed by P1857</td>
+ <td class="unreleased" align="center">Subsumed by P1857</td>
</tr>
<tr> <!-- from Belfast -->
<td><a href="https://wg21.link/p1874r1">P1874R1</a></td>
@@ -926,14 +926,7 @@ <h2 id="cxx20">C++20 implementation status</h2>
</tr>
<tr>
<td><a href="https://wg21.link/p1857r3">P1857R3</a></td>
- <td class="partial" align="center">
- <details>
- <summary>Clang 21 (Partial)</summary>
- The restriction that "[a] module directive may only appear
- as the first preprocessing tokens in a file" is enforced
- starting in Clang 21.
- </details>
- </td>
+ <td class="unreleased" align="center">Clang 23</td>
</tr>
<tr>
<td><a href="https://wg21.link/p2115r0">P2115R0</a></td>