[clang] [clang][ASTMatcher] Add `matchesString` for `StringLiteral` which matches literals on given `RegExp` (PR #102152)

Wed Nov 13 09:36:15 PST 2024

================
@@ -2503,6 +2503,28 @@ TEST_P(ASTMatchersTest, IsDelegatingConstructor) {
       cxxConstructorDecl(isDelegatingConstructor(), parameterCountIs(1))));
 }
 
+TEST_P(ASTMatchersTest, MatchesString) {
+  StatementMatcher Literal = stringLiteral(matchesString("foo.*"));
+  EXPECT_TRUE(matches("const char* a = \"foo\";", Literal));
+  EXPECT_TRUE(matches("const char* b = \"foobar\";", Literal));
+  EXPECT_TRUE(matches("const char* b = \"fo\"\"obar\";", Literal));
+  EXPECT_TRUE(notMatches("const char* c = \"bar\";", Literal));
+  // test embedded nulls
+  StatementMatcher Literal2 = stringLiteral(matchesString("bar"));
+  EXPECT_TRUE(matches("const char* b = \"foo\\0bar\";", Literal2));
+  EXPECT_TRUE(notMatches("const char* b = \"foo\\0b\\0ar\";", Literal2));
+}
+
+TEST(MatchesString, MatchesStringPrefixed) {
+  StatementMatcher Literal = stringLiteral(matchesString("foo.*"));
+  EXPECT_TRUE(matchesConditionally("const char16_t* a = u\"foo\";", Literal,
+                                   true, {"-std=c++11"}));
+  EXPECT_TRUE(matchesConditionally("const char32_t* a = U\"foo\";", Literal,
+                                   true, {"-std=c++11"}));
+  EXPECT_TRUE(matchesConditionally("const wchar_t* a = L\"foo\";", Literal,
+                                   true, {"-std=c++11"}));
----------------
AaronBallman wrote:

Sorry, this PR dropped off my radar entirely!

I think that for right now, we should restrict matching so that we allow `char` and `char8_t` literals but prohibit matching on `char16_t` and `char32_t` literals. But @cor3ntin brings up a good point about numeric escape sequences. We should probably have test coverage for something like matching `"foo"` against `"\x66\x6f\x6f"` to ensure we match. Or do we expect that not to match, because we want `matchesString()` to match on the syntactic form of the characters in the source rather than on their semantic form?
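
To make the syntactic-versus-semantic distinction concrete, here is a minimal standalone sketch. It does not use any Clang API: `regexMatches`, `semanticForm`, and `syntacticForm` are hypothetical helpers introduced only for illustration, and `std::regex` stands in for whatever regex engine `matchesString()` ends up using, so this is not a statement about the matcher's actual behavior.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper (not part of Clang): true if Pattern is found
// anywhere in Text, using std::regex as a stand-in regex engine.
bool regexMatches(const std::string &Text, const std::string &Pattern) {
  return std::regex_search(Text, std::regex(Pattern));
}

// Semantic form: the compiler has already translated the hex escapes,
// so the stored bytes are 'f', 'o', 'o'.
std::string semanticForm() { return "\x66\x6f\x6f"; }

// Syntactic form: the spelling as it appears in the source file,
// backslashes and all (a raw string keeps the escapes untranslated).
std::string syntacticForm() { return R"(\x66\x6f\x6f)"; }
```

Under these assumptions, `regexMatches(semanticForm(), "foo.*")` is true while `regexMatches(syntacticForm(), "foo.*")` is false, which is the difference a new test case would need to pin down one way or the other.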

https://github.com/llvm/llvm-project/pull/102152