[PATCH] D105759: [WIP] Implement P2361 Unevaluated string literals

Mon Aug 2 07:48:20 PDT 2021

aaron.ballman added inline comments.

================
Comment at: clang/include/clang/AST/Expr.h:1839-1840
   StringRef getString() const {
-    assert(getCharByteWidth() == 1 &&
-           "This function is used in places that assume strings use char");
+    assert(isUnevaluated() ||
+           getCharByteWidth() == 1 &&
+               "This function is used in places that assume strings use char");
----------------
This should silence some diagnostics about mixed && and || in the same expression.

================
Comment at: clang/include/clang/Basic/DiagnosticLexKinds.td:248
+def err_unevaluated_string_udl : Error<
+  "an unevaluated string literal cannot be a user defined literal">;
+def err_unevaluated_string_invalid_escape_sequence : Error<
----------------

================
Comment at: clang/include/clang/Basic/DiagnosticLexKinds.td:250
+def err_unevaluated_string_invalid_escape_sequence : Error<
+  "Invalid escape sequence '%0' in an unevaluated string literal">;
 def err_string_concat_mixed_suffix : Error<
----------------

================
Comment at: clang/include/clang/Basic/DiagnosticParseKinds.td:59-60
+
+def err_expected_unevaluated_string : Error<
+  "expected a string literal!">;
+
----------------
No need for this diagnostic,  you can use `err_expected_string_literal` from DiagnosticCommonKinds.td instead.

================
Comment at: clang/include/clang/Lex/LiteralSupport.h:220
   tok::TokenKind Kind;
+  SmallString<512> UnevaluatedBuf;
   SmallString<512> ResultBuf;
----------------
This seems to be unused.

================
Comment at: clang/include/clang/Lex/LiteralSupport.h:235
       MaxTokenLength(0), SizeBound(0), CharByteWidth(0), Kind(tok::unknown),
       ResultPtr(ResultBuf.data()), hadError(false), Pascal(false) {
     init(StringToks);
----------------
Alternatively, you could use an in-class initializer and drop the changes to both ctor init lists.

================
Comment at: clang/lib/AST/Expr.cpp:1076
+
   unsigned ByteLength = Str.size();
+  unsigned Length = ByteLength;
----------------
Basically unused and is shadowed by a declaration below (on line 1087).

================
Comment at: clang/lib/AST/Expr.cpp:1157-1158
   switch (getKind()) {
+  case Unevaluated:
+    break;           // no prefic
   case Ascii: break; // no prefix.
----------------

================
Comment at: clang/lib/Lex/LiteralSupport.cpp:95-96
+  case '?':
+  case 'n':
+  case 't':
+    return true;
----------------
Do you intend to miss a bunch of escapes like `\'` and `\r` (etc)?

================
Comment at: clang/lib/Lex/LiteralSupport.cpp:242
+         diag::err_unevaluated_string_invalid_escape_sequence)
+        << std::string(1, EscapeBegin[1]);
+  }
----------------

================
Comment at: clang/lib/Lex/LiteralSupport.cpp:1539

-  // Implement Translation Phase #6: concatenation of string literals
+  // Determines the kind of string from the prefix
+  Kind = tok::string_literal;
----------------

================
Comment at: clang/lib/Lex/LiteralSupport.cpp:1541
+  Kind = tok::string_literal;
+  for (const auto &Tok : StringToks) {
+    // Unevaluated string literals can never have a prefix
----------------
This means we're looping over (almost) all the string tokens three times -- once here, once below on line 1562, and again on 1605.

================
Comment at: clang/lib/Lex/LiteralSupport.cpp:1542
+  for (const auto &Tok : StringToks) {
+    // Unevaluated string literals can never have a prefix
+    if (Unevaluated && Tok.getKind() != tok::string_literal) {
----------------

================
Comment at: clang/lib/Lex/LiteralSupport.cpp:1545
+      if (Diags)
+        Diags->Report(Tok.getLocation(), diag::err_unevaluated_string_prefix);
+      hadError = true;
----------------
This diagnostic might be somewhat odd for Pascal strings because those sort of have a prefix but it's not really the kind of prefix we're talking about. I don't know of a better way to word the diagnostic though. If you think of a way to improve it, then yay, but otherwise, I think it's fine as-is.

================
Comment at: clang/lib/Lex/LiteralSupport.cpp:1963
+                        FullSourceLoc(Tok.getLocation(), SM), CharByteWidth * 8,
+                        Diags, Features, false);
       --ByteNo;
----------------

================
Comment at: clang/lib/Parse/ParseDecl.cpp:422-423

-        ExprResult ArgExpr(
-            Actions.CorrectDelayedTyposInExpr(ParseAssignmentExpression()));
+        ExprResult ArgExpr(Actions.CorrectDelayedTyposInExpr(
+            ParseAttributeArgAsUnevaluatedLiteralOrExpression(AttrKind)));
         if (ArgExpr.isInvalid()) {
----------------
Hmmm, I'm not certain about these changes.

For some attributes, the standard currently requires accepting any kind of string literal (like `[[deprecated]]` https://eel.is/c++draft/dcl.attr.deprecated#1). P2361 is proposing to change that, but it's not yet accepted by WG21 (let alone WG14). So giving errors in those cases is a bit of a hard sell -- I think warnings would be a bit more reasonable.

But for other attributes (like `annotate`), it's a bit less clear whether we should *prevent* literal prefixes because the attribute can be used to have runtime impacts (for example, I can imagine someone using `annotate` to emit the string literal bytes into the resulting binary). In some cases, I think it's very reasonable (e.g., `diagnose_if` should behave the same as `deprecated` and `nodiscard` because those are purely about generating diagnostics at compile time).

I kind of wonder whether we're going to want to tablegenerate whether the argument needs to be parsed as unevaluated or not on an attribute-by-attribute basis.

================
Comment at: clang/lib/Parse/ParseDecl.cpp:455-456
+    ParsedAttr::Kind Kind) {
+  if (isTokenStringLiteral() && (Kind == ParsedAttr::AT_Deprecated ||
+                                 Kind == ParsedAttr::AT_WarnUnusedResult)) {
+    ExprResult Result = ParseUnevaluatedStringLiteralExpression();
----------------
I don't think this is the right way to go about this, but the comments were left above.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105759/new/

https://reviews.llvm.org/D105759