[PATCH] D112914: Misleading identifier detection

Richard Smith - zygoloid via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Nov 1 15:44:52 PDT 2021


rsmith accepted this revision.
rsmith added a comment.
This revision is now accepted and ready to land.

Looks good as far as it goes (though I've not checked your classification functions are correct).

We should have some detection for RTL characters in string literals and comments too, at least when there is no subsequent character with strong LTR directionality. Neither `"` nor `*/` has strong directionality, so both `א"` and `א*/` can potentially leave us in an RTL state. For example:

  // This is "א" then + then "ג".
  "א" + "ג"
  
  // This is `a` then `/*א*/` then `*` then `-` then `/*ג*/` then `b`, aka "a * - b" not "a - * b".
  a /*א*/ * - /*ג*/ b



================
Comment at: clang-tools-extra/clang-tidy/misc/MisleadingIdentifier.cpp:2
+//===--- MisleadingIdentifier.cpp - clang-tidy
+//--------------------------------===//
+//
----------------
Please fix


================
Comment at: clang-tools-extra/clang-tidy/misc/MisleadingIdentifier.cpp:19
+
+// see https://www.unicode.org/Public/14.0.0/ucd/extracted/DerivedBidiClass.txt
+static bool isUnassignedAL(llvm::UTF32 CP) {
----------------
(throughout).


================
Comment at: clang-tools-extra/clang-tidy/misc/MisleadingIdentifier.cpp:134-135
+        llvm::strictConversion);
+    if (Result != llvm::conversionOK)
+      break;
+    if (isUnassignedAL(CodePoint) || isUnassignedR(CodePoint) || isR(CodePoint))
----------------
As in the other patch, we could recover here by skipping until we find a valid UTF-8 lead byte. It's not important here since invalid UTF-8 won't be accepted in an identifier anyway, but might be important if we start checking strings and comments for RTL characters too.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112914/new/

https://reviews.llvm.org/D112914



More information about the cfe-commits mailing list