[PATCH] D153914: [clang-cl] Enable concatenation of predefined identifiers

Mon Jul 24 15:18:10 PDT 2023

cor3ntin added inline comments.

================
Comment at: clang/lib/Sema/SemaExpr.cpp:2010-2013
+    OS << "R\"EFLPM("
+       << PredefinedExpr::ComputeName(getPredefinedExprKind(Tok.getKind()),
+                                      currentDecl)
+       << ")EFLPM\"";
----------------
tahonermann wrote:
> RIscRIpt wrote:
> > tahonermann wrote:
> > > "EFLPM"? I'm assuming this is an attempt to avoid an accidental clash with the computed name. Does this suffice to ensure a clash can't happen? If not, can we do better? Per http://eel.is/c++draft/lex.string#nt:r-char and http://eel.is/c++draft/lex.charset#1, we have a lot of flexibility regarding which characters to use.
> > At first I thought you were hinting me to use non-ascii characters for d-char-sequence. However, when I looked closely d-char-sequence is not as flexible as r-chars that you referenced: http://eel.is/c++draft/lex.string#nt:d-char and http://eel.is/c++draft/lex.charset#2
> > 
> > Here're my thoughts:
> > 1. I was afraid that there could be a possibility of a function name contain either quote (`"`) or a backslash (`\`) despite it seemed impossible.
> > 2. At first I used `Lexer::Stringify` to make it bullet-proof. I didn't like it, neither did you (reviewers).
> > 3. I replaced `Lexer::Stringify` with raw-string-literal (d-char-sequence was chosen as acronym of current function name).
> > 4. Current `EFLPM` looks weird and cryptic. And we are limited to [lex.charset.basic] with exceptions (space and earlier chars are not allowed). I think, using a raw-string-literal without d-char-sequence would be enough, because problem could be caused only by two consecutive characters `)"`, neither of which can appear in a function name.
> Sorry about that; you are absolutely right that `d-char-sequence` is (much) more constrained than `r-char-sequence`.
> 
> For `__FUNCSIG__`, the generated text can include arbitrary text. For example, given this declaration:
>   void f(decltype(sizeof("()"))*)
> the macro value is:
>   void __cdecl f(decltype(sizeof ("()")) *)
> which does contain `)"`.
> 
> I think we should do more to avoid any possibility of the resulting string being unparseable because the resulting diagnostics will be completely inscrutable. The best way to do that would be to avoid trying to produce a parseable string literal in the first place. Based on that and given that `Sema::ActOnStringLiteral()` immediately uses `StringLiteralParser` to produce an expanded string literal, I think we would be better off moving this expansion there such that these macros can be expanded to already cooked string literals that don't need to be parsed at all. This will mean having to modify the `StringLiteralParser` constructor to accept a `Decl*` pointer that will be passed `getCurLocalScopeDecl()` in `Sema::ActOnStringLiteral()`. `StringLiteralParser` can then assert if it ever encounters one of these function local predefined tokens when it hasn't been provided a corresponding `Decl*` to use with them.
I didn't think about that scenario but in that case the original idea to use Lexer::Stringify seems like a cleaner solution to me.
Adding that kind of logic to `LiteralSupport` creates a lot of coupling for not much gain that I can see.

In that light, `Lexer::Stringify` seems like the least worse option.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153914/new/

https://reviews.llvm.org/D153914