[PATCH] D112916: Confusable identifiers detection

Corentin Jabot via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Feb 11 07:05:34 PST 2022


cor3ntin added a comment.

@aaron.ballman Thanks for the ping.

One one hand, I agree with you, on the other hand, this tries to stick to TR39, and I think we should stick with that. It might be worth checking with the Unicode consortium what they think of i/l as confusable.

I think this is a rather good approach as a clang-tidy plugin (and we can gain experience from it to iterate).

I'm however rather concern that we are not following the spec in that there is no NFD decomposition happening (and I am aware this increases the workload a lot for the author).
But as it stands, composed characters may not be confusable even if the decomposed form would be, and the prescribed algorithm actually do 2 passed of decomposition, which is certainly necessary (I would need to think more about it)
The reasonable approach is certainly to stick to the algorithm to the letter, unless we have good motivation not to.



================
Comment at: clang-tools-extra/clang-tidy/misc/Homoglyph.cpp:35
+ * described in http://www.unicode.org/reports/tr39/#def-skeleton
+ */
+std::string Homoglyph::skeleton(StringRef Name) {
----------------
This does not seem to do NFD decomposition, so at the very least it is not consistent with TR39
Identifiers in C++ are mandated to be NFC.

This means that in the presence of composed characters ( for example digraphs ), the algorithms will simply not work.


================
Comment at: clang-tools-extra/clang-tidy/misc/Homoglyph.cpp:38
+  std::string SName = Name.str();
+  std::string Skeleton;
+  Skeleton.reserve(1 + Name.size());
----------------
It's probably more efficient to keep Skeleton as a sequence of char32_t, to avoid multiple round of conversions


================
Comment at: clang-tools-extra/clang-tidy/misc/Homoglyph.cpp:82
+
+void Homoglyph::check(const ast_matchers::MatchFinder::MatchResult &Result) {
+  if (const auto *ND = Result.Nodes.getNodeAs<NamedDecl>("nameddecl")) {
----------------
I think i like this idea of only complaining if there similar-looking identifiers


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112916/new/

https://reviews.llvm.org/D112916



More information about the cfe-commits mailing list