[clang] [clang-format] Support of TableGen identifiers beginning with a number. (PR #78571)
Hirofumi Nakamura via cfe-commits
cfe-commits at lists.llvm.org
Thu Jan 18 08:26:56 PST 2024
================
@@ -804,6 +806,46 @@ void FormatTokenLexer::handleTableGenMultilineString() {
FirstLineText, MultiLineString->OriginalColumn, Style.TabWidth, Encoding);
}
+void FormatTokenLexer::handleTableGenNumericLikeIdentifier() {
+ FormatToken *Tok = Tokens.back();
+ // TableGen identifiers can begin with digits. Such tokens are lexed as
+ // numeric_constant now.
+ if (Tok->isNot(tok::numeric_constant))
+ return;
+ StringRef Text = Tok->TokenText;
+ // Identifiers cannot begin with + or -.
+ if (Text.size() < 1 || Text[0] == '+' || Text[0] == '-')
+ return;
+ // The following check is based on llvm::TGLexer::LexToken.
+ if (isdigit(Text[0])) {
+ size_t I = 0;
+ char NextChar = (char)0;
+ // Identifiers in TalbleGen may begin with digits. Skip to first non-digit.
+ do {
+ NextChar = Text[I++];
+ } while (I < Text.size() && isdigit(NextChar));
+ // All the characters are digits.
+ if (I >= Text.size())
+ return;
+ // Base character. But it does not check the first 0 and that the base is
+ // the second character.
----------------
hnakamura5 wrote:
Yes for the both question. This is about TableGen compiler's lexer.
As you wonder, this comment may be not precise enough. Later I will fix it.
For example,
`0x1234x` is regarded as integer because the lexer assumes it is a integer at the point it have got `0x1` part. This is an syntax error example written in the unittest.
I want to note here by this comment is,
`1x1234x` is also regarded as integer (and syntax error). This behavior comes from the lexer does not check the character before 'x' is 0 or other number.
(FYI, `1y1234x ` is a valid identifier. Such a ambiguity is only when the first non-digit character is 'x' or 'b'. )
https://github.com/llvm/llvm-project/pull/78571
More information about the cfe-commits
mailing list