[clang] [clang-format] Handle C++ keywords in other languages better (PR #132941)
via cfe-commits
cfe-commits at lists.llvm.org
Wed Apr 9 08:01:02 PDT 2025
https://github.com/sstwcw updated https://github.com/llvm/llvm-project/pull/132941
From 560f9e4a981b67482e32a60c4b373dc18d242dc3 Mon Sep 17 00:00:00 2001
From: sstwcw <su3e8a96kzlver at posteo.net>
Date: Tue, 25 Mar 2025 14:10:08 +0000
Subject: [PATCH] [clang-format] Handle C++ keywords in other languages better
There is some code that makes sure C++ keywords that are ordinary
identifiers in other languages are not treated as keywords. Right now,
the token kind is set to identifier and the identifier info is cleared.
The clearing is probably there so that the code for recognizing C++
structures does not match them by mistake when formatting a language
that does not have those structures. However, we did not find a case
where the language allows such a token sequence and the code parses it
as a C++ structure without first checking the language setting. On the
other hand, some code checks whether the identifier info field is null
in places where an identifier and a keyword should be treated the same
way, for example the name of a function in JavaScript. This patch
removes the lines that clear the identifier info, so that a C++ keyword
is treated the same way as an identifier in those places.
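The effect described above can be sketched in isolation. The following is a minimal model, not clang-format's actual code: `Kind`, `IdentInfo`, `Token`, `remapKeyword`, and `looksLikeName` are hypothetical stand-ins invented for illustration. It assumes only what the message states, namely that some checks test whether the identifier info is null instead of testing the token kind.

```cpp
#include <string>

// Hypothetical stand-ins; these are NOT the real clang token classes.
enum class Kind { identifier, kw_union, l_paren };

struct IdentInfo {
  std::string Name;
};

struct Token {
  Kind K;
  const IdentInfo *Info; // non-null for identifiers and keywords
};

// Models the remapping step for a language where `union` is a plain name.
// ClearInfo = true mimics the old behavior, false the patched behavior.
Token remapKeyword(Token T, bool ClearInfo) {
  if (T.K == Kind::kw_union) {
    T.K = Kind::identifier;
    if (ClearInfo)
      T.Info = nullptr;
  }
  return T;
}

// A check of the kind the message describes: it accepts identifiers and
// keywords alike by testing the info pointer rather than the token kind.
bool looksLikeName(const Token &T) { return T.Info != nullptr; }
```

In this model, a remapped `union` token passes `looksLikeName` only when the info pointer is left in place, which would be consistent with the formatting differences shown in the examples.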
JavaScript
New
```JavaScript
async function
union(
myparamnameiswaytooloooong) {
}
```
Old
```JavaScript
async function
union(
myparamnameiswaytooloooong) {
}
```
Java
New
```Java
enum union { ABC, CDE }
```
Old
```Java
enum
union { ABC, CDE }
```
---
clang/lib/Format/FormatTokenLexer.cpp | 3 ---
clang/unittests/Format/FormatTestJS.cpp | 10 ++++++++++
clang/unittests/Format/FormatTestJava.cpp | 2 ++
3 files changed, 12 insertions(+), 3 deletions(-)
diff --git a/clang/lib/Format/FormatTokenLexer.cpp b/clang/lib/Format/FormatTokenLexer.cpp
index eed54a11684b5..014b10b206d90 100644
--- a/clang/lib/Format/FormatTokenLexer.cpp
+++ b/clang/lib/Format/FormatTokenLexer.cpp
@@ -1306,15 +1306,12 @@ FormatToken *FormatTokenLexer::getNextToken() {
FormatTok->isOneOf(tok::kw_struct, tok::kw_union, tok::kw_delete,
tok::kw_operator)) {
FormatTok->Tok.setKind(tok::identifier);
- FormatTok->Tok.setIdentifierInfo(nullptr);
} else if (Style.isJavaScript() &&
FormatTok->isOneOf(tok::kw_struct, tok::kw_union,
tok::kw_operator)) {
FormatTok->Tok.setKind(tok::identifier);
- FormatTok->Tok.setIdentifierInfo(nullptr);
} else if (Style.isTableGen() && !Keywords.isTableGenKeyword(*FormatTok)) {
FormatTok->Tok.setKind(tok::identifier);
- FormatTok->Tok.setIdentifierInfo(nullptr);
}
} else if (FormatTok->is(tok::greatergreater)) {
FormatTok->Tok.setKind(tok::greater);
diff --git a/clang/unittests/Format/FormatTestJS.cpp b/clang/unittests/Format/FormatTestJS.cpp
index 78c9f887a159b..6fedf1e2c0079 100644
--- a/clang/unittests/Format/FormatTestJS.cpp
+++ b/clang/unittests/Format/FormatTestJS.cpp
@@ -834,6 +834,11 @@ TEST_F(FormatTestJS, AsyncFunctions) {
"}",
"async function hello(myparamnameiswaytooloooong) {}",
getGoogleJSStyleWithColumns(10));
+ verifyFormat("async function\n"
+ "union(\n"
+ " myparamnameiswaytooloooong) {\n"
+ "}",
+ getGoogleJSStyleWithColumns(10));
verifyFormat("class C {\n"
" async hello(\n"
" myparamnameiswaytooloooong) {\n"
@@ -1369,6 +1374,7 @@ TEST_F(FormatTestJS, WrapRespectsAutomaticSemicolonInsertion) {
getGoogleJSStyleWithColumns(10));
verifyFormat("await theReckoning;", getGoogleJSStyleWithColumns(10));
verifyFormat("some['a']['b']", getGoogleJSStyleWithColumns(10));
+ verifyFormat("union['a']['b']", getGoogleJSStyleWithColumns(10));
verifyFormat("x = (a['a']\n"
" ['b']);",
getGoogleJSStyleWithColumns(10));
@@ -2500,6 +2506,10 @@ TEST_F(FormatTestJS, NonNullAssertionOperator) {
TEST_F(FormatTestJS, CppKeywords) {
// Make sure we don't mess stuff up because of C++ keywords.
verifyFormat("return operator && (aa);");
+ verifyFormat("enum operator {\n"
+ " A = 1,\n"
+ " B\n"
+ "}");
// .. or QT ones.
verifyFormat("const slots: Slot[];");
// use the "!" assertion operator to validate that clang-format understands
diff --git a/clang/unittests/Format/FormatTestJava.cpp b/clang/unittests/Format/FormatTestJava.cpp
index 33998bc7ff858..e01c1d6d7e684 100644
--- a/clang/unittests/Format/FormatTestJava.cpp
+++ b/clang/unittests/Format/FormatTestJava.cpp
@@ -158,6 +158,8 @@ TEST_F(FormatTestJava, AnonymousClasses) {
TEST_F(FormatTestJava, EnumDeclarations) {
verifyFormat("enum SomeThing { ABC, CDE }");
+ // A C++ keyword should not mess things up.
+ verifyFormat("enum union { ABC, CDE }");
verifyFormat("enum SomeThing {\n"
" ABC,\n"
" CDE,\n"