[PATCH] D153914: [clang-cl] Enable concatenation of predefined identifiers

Richard Dzenis via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Jul 25 13:32:19 PDT 2023


RIscRIpt added inline comments.


================
Comment at: clang/include/clang/Basic/TokenKinds.h:87-93
+/// Return true if this token is a predefined macro
+/// unexpandable by MSVC preprocessor.
+inline bool isUnexpandableMsMacro(TokenKind K) {
+  return K == tok::kw___FUNCTION__ || K == tok::kw___FUNCSIG__ ||
+         K == tok::kw_L__FUNCTION__ || K == tok::kw_L__FUNCSIG__ ||
+         K == tok::kw___FUNCDNAME__;
+}
----------------
cor3ntin wrote:
> RIscRIpt wrote:
> > tahonermann wrote:
> > > cor3ntin wrote:
> > > > tahonermann wrote:
> > > > > RIscRIpt wrote:
> > > > > > tahonermann wrote:
> > > > > > > 
> > > > > > Thanks, I like the name. But shouldn't we reflect that we are referring to only Microsoft (unexpandable) macros? How about `isFunctionLocalPredefinedMsMacro`?
> > > > > I don't think the Microsoft association is meaningful. Sure, some of the predefined macros are only treated differently when compiling in Microsoft mode, but some of those macros aren't Microsoft specific. Also, there might be macros provided by other implementations that we'll want to emulate some day.
> > > > I think it is, there is currently no way to link `isFunctionLocalPredefinedMacro` to the MS feature. "MSPredefinedMacro" is pretty self explanatory, same reason we most often - but not always - use GNU in the name of function related to GNU extensions.
> > > > There are enough similar-sounding features and extensions that we should try to make it clear which feature we are talking about.
> > > Perhaps we still haven't found the right name then. With the name that I've been advocating, this function should also return true for `tok::kw___PRETTY_FUNCTION__` even though that macro should not expand to something that can be concatenated with other string literals (whether compiling in Microsoft mode or not).
> > > 
> > > The Microsoft extension is the localized expansion to a string literal that can be concatenated with other string literals. We probably should isolate the conditions for that behavior to one place and if we do that, then I guess naming this function as specific to Microsoft mode is ok; we can always rename it later if it gets used for non-Microsoft extensions.
> > > 
> > > Here is my suggestion then:
> > >   // Return true if the token corresponds to a function local predefined
> > >   // macro that, in Microsoft modes, expands to a string literal (that can
> > >   // then be concatenated with other string literals).
> > >   inline bool isMsFunctionLocalStringLiteralMacro(TokenKind K, const LangOptions &langOpts) {
> > >     return langOpts.MicrosoftExt && (
> > >         K == tok::kw___FUNCTION__ || K == tok::kw_L__FUNCTION__ ||
> > >         K == tok::kw___FUNCSIG__ || K == tok::kw_L__FUNCSIG__ ||
> > >         K == tok::kw___FUNCDNAME__);
> > >   }  
> > > 
> > > This will avoid the need for callers to check for `MicrosoftExt`; that is what I really want to avoid since it is easy to forget to do that check.
> > 1. By requiring user to pass `LangOptions` I think we can remove MS association (implying that there could be other non-msft cases in the future)
> > 2. We would have to include a `LangOptions.h` in `TokenKinds.h`, are we ok with this? Alternatively while this function is for msft cases only, we could pass `bool MicrosoftExt`
> > 
> > Personally, I like version with `LangOptions` and removal of `MS`.
> > By requiring user to pass LangOptions I think we can remove MS association
> 
> I don't think there is any motivation to future proof against hypotheticals, we can always refactor later 
> 
> > We would have to include a LangOptions.h in TokenKinds.h
> 
> This makes me uneasy, but i think we can move the function to `LiteralSupport.h` and include that in `ParseExpr.cpp`.
Sounds good to me. Due to other changes I also moved `TokenIsLikeStringLiteral` there.

> I don't think there is any motivation to future proof against hypotheticals, we can always refactor later
Yes, not a strong argument, but I'd like to keep current name if you are not against.


================
Comment at: clang/lib/Sema/SemaExpr.cpp:2010-2013
+    OS << "R\"EFLPM("
+       << PredefinedExpr::ComputeName(getPredefinedExprKind(Tok.getKind()),
+                                      currentDecl)
+       << ")EFLPM\"";
----------------
tahonermann wrote:
> cor3ntin wrote:
> > tahonermann wrote:
> > > RIscRIpt wrote:
> > > > tahonermann wrote:
> > > > > "EFLPM"? I'm assuming this is an attempt to avoid an accidental clash with the computed name. Does this suffice to ensure a clash can't happen? If not, can we do better? Per http://eel.is/c++draft/lex.string#nt:r-char and http://eel.is/c++draft/lex.charset#1, we have a lot of flexibility regarding which characters to use.
> > > > At first I thought you were hinting me to use non-ascii characters for d-char-sequence. However, when I looked closely d-char-sequence is not as flexible as r-chars that you referenced: http://eel.is/c++draft/lex.string#nt:d-char and http://eel.is/c++draft/lex.charset#2
> > > > 
> > > > Here're my thoughts:
> > > > 1. I was afraid that there could be a possibility of a function name contain either quote (`"`) or a backslash (`\`) despite it seemed impossible.
> > > > 2. At first I used `Lexer::Stringify` to make it bullet-proof. I didn't like it, neither did you (reviewers).
> > > > 3. I replaced `Lexer::Stringify` with raw-string-literal (d-char-sequence was chosen as acronym of current function name).
> > > > 4. Current `EFLPM` looks weird and cryptic. And we are limited to [lex.charset.basic] with exceptions (space and earlier chars are not allowed). I think, using a raw-string-literal without d-char-sequence would be enough, because problem could be caused only by two consecutive characters `)"`, neither of which can appear in a function name.
> > > Sorry about that; you are absolutely right that `d-char-sequence` is (much) more constrained than `r-char-sequence`.
> > > 
> > > For `__FUNCSIG__`, the generated text can include arbitrary text. For example, given this declaration:
> > >   void f(decltype(sizeof("()"))*)
> > > the macro value is:
> > >   void __cdecl f(decltype(sizeof ("()")) *)
> > > which does contain `)"`.
> > > 
> > > I think we should do more to avoid any possibility of the resulting string being unparseable because the resulting diagnostics will be completely inscrutable. The best way to do that would be to avoid trying to produce a parseable string literal in the first place. Based on that and given that `Sema::ActOnStringLiteral()` immediately uses `StringLiteralParser` to produce an expanded string literal, I think we would be better off moving this expansion there such that these macros can be expanded to already cooked string literals that don't need to be parsed at all. This will mean having to modify the `StringLiteralParser` constructor to accept a `Decl*` pointer that will be passed `getCurLocalScopeDecl()` in `Sema::ActOnStringLiteral()`. `StringLiteralParser` can then assert if it ever encounters one of these function local predefined tokens when it hasn't been provided a corresponding `Decl*` to use with them.
> > I didn't think about that scenario but in that case the original idea to use Lexer::Stringify seems like a cleaner solution to me.
> > Adding that kind of logic to `LiteralSupport` creates a lot of coupling for not much gain that I can see.
> > 
> > 
> > In that light, `Lexer::Stringify` seems like the least worse option.
> Yeah, that might be the simpler solution. That matches how `__FILE__`, `__BASE_FILE__`, and `__FILE_NAME__` are handled in `Preprocessor::ExpandBuiltinMacro()`.
Thanks for pointing out good counter-example.

> In that light, Lexer::Stringify seems like the least worse option.
Unfortunately, yes.

In regards other possible implementations, I commented previously here: https://reviews.llvm.org/D153914#4500807

> to modify the `StringLiteralParser` constructor to accept a `Decl*`
This requires dependency (and linkage) of Lex with AST.

I am going to add your counter-examples to test, and change implementation back to `Lexer::Stringify`


================
Comment at: clang/test/Sema/ms_predefined_expr.cpp:156
+    static_assert(__FUNCTION__ ""_len == 14); // expected-warning{{string literal concatenation of predefined identifier '__FUNCTION__' is a Microsoft extension}}
 }
----------------
tahonermann wrote:
> Here is another case to test. MSVC accepts this.
>   void test_static_assert() {
>     static_assert(true, __FUNCTION__);
>   }
Thanks for finding this! I had to do some adjustments to support it. In addition I changed diagnostics message because it no longer made sense, and removed `assert` about "single predefined identifiers shall be handled by ActOnPredefinedExpr". Other code around references of `err_expected_string_literal` do not seem to require support of this MS ext.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153914/new/

https://reviews.llvm.org/D153914



More information about the cfe-commits mailing list