[PATCH] D112913: Misleading bidirectional detection

Tue Jan 11 09:23:49 PST 2022

serge-sans-paille added a comment.

In D112913#3233699 <https://reviews.llvm.org/D112913#3233699>, @upsuper wrote:

> I'd like to clarify that what I think is correct now is the algorithm to detect unclosed explicit formatting scopes in a given string.

Thanks for confirming. This check only detects unterminated bidi sequence within comments and string literals. Its scope limits to that aspect.

> I haven't been following very closely with the whole spoofing issue, so I can't say that there is no other ways to construct a spoof that this algorithm is not designed to detect.

Agreed. FYI we already have a check for RTL characters ending identfiers, and a pending one for confusable identifiers

> As you have found, `RLM`, and `ALM` can be used to confuse code reader, but they are not much different than a string with other strong RTL characters inside, and I don't quite see how that can be linted without hurting potentially legitimate code. Maybe if the compiler supports treating `LRM` as whitespace (I'm not sure whether Clang does), a lint may be added to ask wrapping any string with outermost strong characters being RTL in the form of `{LRM}"string"{LRM}` so that the RTL characters don't affect outside. Other than that, I don't think there is anyway to lint against such a confusion.

I agree that allowing `LRM` is a good step forward, and that's part of the official recommendation, but orthogonal to that review.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112913/new/

https://reviews.llvm.org/D112913