[PATCH] D130416: [Clang] Add support for Unicode identifiers (UAX31) in C2x mode.

Corentin Jabot via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Jul 28 12:52:10 PDT 2022


cor3ntin added inline comments.


================
Comment at: clang/lib/Lex/Lexer.cpp:1490-1492
     static const llvm::sys::UnicodeCharSet XIDStartChars(XIDStartRanges);
     // '_' doesn't have the XID_Start property but is allowed in C++.
     return C == '_' || XIDStartChars.contains(C);
----------------
tahonermann wrote:
> Perhaps this should permit `$` when `LangOpts.DollarIdents` is true. However, my attempts to produce a test case that illustrates such a need failed. I don't know why; I guess `isAllowedInitiallyIDChar()` isn't called for all identifiers? Or there is special handling of `LangOpts.DollarIdents` somewhere? I wonder if that might imply an issue elsewhere; such as a missing call to `isAllowedInitiallyIDChar()` somewhere?
> 
> I did verify that '$' (U+003F) is not in `XID_Start`: https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=[:XID_Start=Yes:]
Thanks for the review. I think checking for underscore here is actually the issue, it's most likely never true.
`isAllowedInitiallyIDChar` is never called on ascii codepoints as the fast path uses `isAsciiIdentifierStart` - which is where `$` is checked for.

(`LexIdentifierContinue` handles  ascii characters too and correctly deal with `$` )


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130416/new/

https://reviews.llvm.org/D130416



More information about the cfe-commits mailing list