[PATCH] D158372: [Clang] Treat invalid UDL as two tokens

Richard Smith - zygoloid via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Aug 21 12:06:40 PDT 2023


rsmith added inline comments.


================
Comment at: clang/docs/ReleaseNotes.rst:97
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-- Implemented `CWG1473 <https://wg21.link/CWG1473>`_ which allows spaces after ``operator""``.
-  Clang used to err on the lack of space when the literal suffix identifier was invalid in
-  all the language modes, which contradicted the deprecation of the whitespaces.
-  Also turn on ``-Wdeprecated-literal-operator`` by default in all the language modes.
+- Implemented `CWG1473 <https://wg21.link/CWG1473>`_ allowing lack of space after ``operator""``.
+  Clang used to err on the lack of space when the literal suffix identifier was invalid,
----------------
I don't think this is accurate. Clang supported CWG1473 before these changes, as far as I can see: all valid code under CWG1473 was accepted, and invalid code was diagnosed (by default). Rather, what has changed is the behavior for invalid code: instead of treating an invalid `""blah` as two tokens always, in order to accept as much old code as possible, we now treat it as two tokens only when `blah` is defined as a macro name.

This is still a breaking change in some cases, for users of `-Wno-deprecated-literal-operator`, eg:

```
#define FOO(xyz) ""xyz
```

...  now will be lexed as a single invalid token rather than two tokens.

I'm not sure what the motivation for making changes here was, and D153156's description doesn't really help me understand it. Is the goal to improve the diagnostic quality for these kinds of errors on invalid code? Is there some example for which Clang's behavior with regard to CWG1473 was non-conforming? Something else?


================
Comment at: clang/lib/Lex/Lexer.cpp:2020
+  bool IsLiteralOperator =
+      StringRef(BufferPtr, 2).equals("\"\"") && BufferPtr + 2 == TokStart;
+  if (unsigned TokLen = CurPtr - TokStart;
----------------
Reverse the order of these checks to do the cheaper check first and to avoid the possibility of reading off the end of the input.


================
Comment at: clang/lib/Lex/Lexer.cpp:2027-2029
+    // However, don't diagnose operator""E(...) even if E is a macro as it
+    // results in confusing error messages. Hence, ""E would not be treated as
+    // string concat; instead it's a single PP token (as it should be).
----------------
That's still a breaking change compared to what we designed `-Wno-reserved-literal-operator` to do. How often does it happen in practice that someone has both an ill-formed literal operator *and* a macro defined to the same name as the suffix?


================
Comment at: clang/lib/Lex/Lexer.cpp:2035
+    IdentifierInfo *II = PP->LookUpIdentifierInfo(Result);
+    if (II->hasMacroDefinition()) {
+      Diag(TokStart, LangOpts.MSVCCompat
----------------
This doesn't check whether the identifier is currently defined as a macro, in the presence of modules. It also won't be correct if the lexer has got substantially ahead of the preprocessor, and the `#define` has been lexed but not yet preprocessed.

Overall, it's not really possible to tell from here whether an identifier is defined as a macro. To do this properly, you should instead produce a single token here and teach the preprocessor to consider splitting it into two tokens if the suffix is a reserved ud-suffix naming a defined macro. In principle you could also check from the preprocessor whether the previous produced token was `operator` and use that as part of the decision...


================
Comment at: clang/test/CXX/drs/dr14xx.cpp:487
 
 namespace dr1473 { // dr1473: 18
                    // NB: sup 1762, test reused there
----------------
I don't think this is correct. As far as I can tell, Clang has correctly implemented CWG1473 since version 3.2.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158372/new/

https://reviews.llvm.org/D158372



More information about the cfe-commits mailing list