[PATCH] D112914: Misleading identifier detection
Richard Smith - zygoloid via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Mon Nov 1 15:44:52 PDT 2021
rsmith accepted this revision.
rsmith added a comment.
This revision is now accepted and ready to land.
Looks good as far as it goes (though I've not checked your classification functions are correct).
We should have some detection for RTL characters in string literals and comments too, at least when there is no subsequent character with strong LTR directionality. Neither `"` nor `*/` has strong directionality, so both `א"` and `א*/` can potentially leave us in an RTL state. For example:
// This is "א" then + then "ג".
"א" + "ג"
// This is `a` then `/*א*/` then `*` then `-` then `/*ג*/` then `b`, aka "a * - b" not "a - * b".
a /*א*/ * - /*ג*/ b
================
Comment at: clang-tools-extra/clang-tidy/misc/MisleadingIdentifier.cpp:2
+//===--- MisleadingIdentifier.cpp - clang-tidy
+//--------------------------------===//
+//
----------------
Please fix
================
Comment at: clang-tools-extra/clang-tidy/misc/MisleadingIdentifier.cpp:19
+
+// see https://www.unicode.org/Public/14.0.0/ucd/extracted/DerivedBidiClass.txt
+static bool isUnassignedAL(llvm::UTF32 CP) {
----------------
(throughout).
================
Comment at: clang-tools-extra/clang-tidy/misc/MisleadingIdentifier.cpp:134-135
+ llvm::strictConversion);
+ if (Result != llvm::conversionOK)
+ break;
+ if (isUnassignedAL(CodePoint) || isUnassignedR(CodePoint) || isR(CodePoint))
----------------
As in the other patch, we could recover here by skipping until we find a valid UTF-8 lead byte. It's not important here since invalid UTF-8 won't be accepted in an identifier anyway, but might be important if we start checking strings and comments for RTL characters too.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D112914/new/
https://reviews.llvm.org/D112914
More information about the cfe-commits
mailing list