[PATCH] D124726: Suggest typoed directives in preprocessor conditionals

Aaron Ballman via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed May 11 04:59:21 PDT 2022


aaron.ballman added inline comments.


================
Comment at: clang/test/Preprocessor/suggest-typoed-directive.c:10
+// expected-warning at +11 {{'#elfidef' directive not found, did you mean '#elifdef'?}}
+// expected-warning at +11 {{'#elfindef' directive not found, did you mean '#elifdef'?}}
+// expected-warning at +11 {{'#elsi' directive not found, did you mean '#else'?}}
----------------
ken-matsui wrote:
> aaron.ballman wrote:
> > erichkeane wrote:
> > > aaron.ballman wrote:
> > > > ken-matsui wrote:
> > > > > aaron.ballman wrote:
> > > > > > ken-matsui wrote:
> > > > > > > aaron.ballman wrote:
> > > > > > > > ken-matsui wrote:
> > > > > > > > > aaron.ballman wrote:
> > > > > > > > > > It's interesting that this one suggested `#elifdef` instead of `#elifndef` -- is there anything that can be done for that?
> > > > > > > > > > 
> > > > > > > > > > Also, one somewhat interesting question is whether we want to recommend `#elifdef` and `#elifndef` outside of C2x/C++2b mode. Those directives only exist in the latest language standard, but Clang supports them as a conforming extension in all language modes. Given that this diagnostic is about typos, I think I'm okay suggesting the directives even in older language modes. That's as likely to be a correct suggestion as not, IMO.
> > > > > > > > > > It's interesting that this one suggested `#elifdef` instead of `#elifndef` -- is there anything that can be done for that?
> > > > > > > > > 
> > > > > > > > > I found I have to use `std::min_element` instead of `std::max_element` because we are finding the nearest (most minimum distance) string. Meanwhile, `#elfindef` has 2 distance with both `#elifdef` and `#elifndef`. After replacing `std::max_element` with `std::min_element`, I could suggest `#elifndef` from `#elfinndef`.
> > > > > > > > > 
> > > > > > > > > > Also, one somewhat interesting question is whether we want to recommend `#elifdef` and `#elifndef` outside of C2x/C++2b mode. Those directives only exist in the latest language standard, but Clang supports them as a conforming extension in all language modes. Given that this diagnostic is about typos, I think I'm okay suggesting the directives even in older language modes. That's as likely to be a correct suggestion as not, IMO.
> > > > > > > > > 
> > > > > > > > > I agree with you because Clang implements those directives, and the suggested code will also be compilable. I personally think only not compilable suggestions should be avoided. (Or, we might place additional info for outside of C2x/C++2b mode like `this is a C2x/C++2b feature but compilable on Clang`?)
> > > > > > > > > 
> > > > > > > > > ---
> > > > > > > > > 
> > > > > > > > > According to the algorithm of `std::min_element`, we only get an iterator of the first element even if there is another element that has the same distance. So, `#elfindef` only suggests `#elifdef` in accordance with the order of `Candidates`, and I don't think it is beautiful to depend on the order of candidates. I would say that we can suggest all the same distance like the following, but I'm not sure this is preferable:
> > > > > > > > > 
> > > > > > > > > ```
> > > > > > > > > #elfindef // diag: unknown directive, did you mean #elifdef or #elifndef?
> > > > > > > > > ```
> > > > > > > > > 
> > > > > > > > > I agree with you because Clang implements those directives, and the suggested code will also be compilable. I personally think only not compilable suggestions should be avoided. (Or, we might place additional info for outside of C2x/C++2b mode like this is a C2x/C++2b feature but compilable on Clang?)
> > > > > > > > 
> > > > > > > > I may be changing my mind on this a bit. I now see we don't issue an extension warning when using `#elifdef` or `#elifndef` in older language modes. That means suggesting those will be correct but only for Clang, and the user won't have any other diagnostics to tell them about the portability issue. And those particular macros are reasonably likely to be used in a header where the user is trying to aim for portability. So I'm starting to think we should only suggest those two in C2x mode (and we should probably add a portability warning for #elifdef and #elifndef, so I filed: https://github.com/llvm/llvm-project/issues/55306)
> > > > > > > > 
> > > > > > > > > I would say that we can suggest all the same distance like the following, but I'm not sure this is preferable:
> > > > > > > > 
> > > > > > > > The way we typically handle this is to attach FixIt hints to a note instead of to the diagnostic. This way, automatic fixes aren't applied (because there are multiple choices to pick from) but the user is still able to apply whichever fix they want in an IDE or other tool. It might be worth trying that approach (e.g., if there's only one candidate, attach it to the warning, and if there are two or more, emit a warning without a "did you mean" in it and use a new note for the fixit. e.g.,
> > > > > > > > ```
> > > > > > > > #elfindef // expected-warning {{invalid preprocessing directive}} \
> > > > > > > >              expected-note {{did you mean '#elifdef'?}} \
> > > > > > > >              expected-note {{did you mean '#elifndef'?}}
> > > > > > > > ```
> > > > > > > > WDYT?
> > > > > > > > I may be changing my mind on this a bit. I now see we don't issue an extension warning when using `#elifdef` or `#elifndef` in older language modes. That means suggesting those will be correct but only for Clang, and the user won't have any other diagnostics to tell them about the portability issue. And those particular macros are reasonably likely to be used in a header where the user is trying to aim for portability. So I'm starting to think we should only suggest those two in C2x mode (and we should probably add a portability warning for #elifdef and #elifndef, so I filed: https://github.com/llvm/llvm-project/issues/55306)
> > > > > > > >
> > > > > > > 
> > > > > > > Certainly, it would be less confusing to the user to avoid suggestions regarding those two.
> > > > > > > I'm going to fix my code to avoid suggesting them in not C2x mode.
> > > > > > > 
> > > > > > > Thank you for submitting the issue, I also would like to work on it.
> > > > > > > 
> > > > > > > > The way we typically handle this is to attach FixIt hints to a note instead of to the diagnostic. This way, automatic fixes aren't applied (because there are multiple choices to pick from) but the user is still able to apply whichever fix they want in an IDE or other tool. It might be worth trying that approach (e.g., if there's only one candidate, attach it to the warning, and if there are two or more, emit a warning without a "did you mean" in it and use a new note for the fixit. e.g.,
> > > > > > > > ```
> > > > > > > > #elfindef // expected-warning {{invalid preprocessing directive}} \
> > > > > > > >              expected-note {{did you mean '#elifdef'?}} \
> > > > > > > >              expected-note {{did you mean '#elifndef'?}}
> > > > > > > > ```
> > > > > > > > WDYT?
> > > > > > > 
> > > > > > > This is really cool, but don't you care about the redundancy of `did you mean` in terms of the llvm team culture?
> > > > > > > If not, I will implement notes like the above.
> > > > > > > Certainly, it would be less confusing to the user to avoid suggestions regarding those two. I'm going to fix my code to avoid suggesting them in not C2x mode.
> > > > > > 
> > > > > > +1, thank you!
> > > > > > 
> > > > > > > This is really cool, but don't you care about the redundancy of did you mean in terms of the llvm team culture? If not, I will implement notes like the above.
> > > > > > 
> > > > > > I would care if the list were potentially unbounded (like, say, with identifiers in general), but because we know this list will only have a max of two entries on it in this case, it seems reasonable to me. I double-checked with @erichkeane to see if he thought it would be an issue, and he agreed that it being a fixed list makes it pretty reasonable.
> > > > > > 
> > > > > > However, that discussion did raise a question -- why are there two suggestions? elifdef requires a swap + delete and elifndef requires just a swap, so we would have thought that it would have been the only option in the list.
> > > > > With the implementation of Lev distances used in llvm, I could simply suggest `#elifdef` from `#elfidef` and `#elifndef` from `#elfindef`.
> > > > > 
> > > > > So, in this situation, do you think that we still need to add two notes for conflicted distances?
> > > > No, let's skip the two note behavior. If we find ourselves with multiple suggestions, we'll just leave off the "did you mean?" part of the diagnostic entirely.
> > > Might I suggest we just emit the 'first' suggestion?  This isn't really perfect, but I would think that our users put very little thought into it when we suggest the wrong thing.  
> > Notionally that makes sense, but the situation I'm worried about is when users pass `-fixit` to the compiler -- then it automatically does whatever, but we have no idea whether that whatever is reasonable or not when we have more than one option.
> I could suggest `#if` from `#id`, but I had to change `#elfindef` to `#elfinndef` to suggest `#elifndef`.
> After setting the `AllowReplacements` option to `true`, `#elfindef` suggested `#elifdef` (I think it might have the same distance with `#elifndef`).
Thanks! I'm coming around more to the idea that we'll never get the "perfect" set of suggestions, and so whatever suggestions we get are quite likely good enough. However, I think it would be helpful to also add the tests where the behavior seems a bit strange, like `#elfindef`, just to show that we know the behavior is what it is and it doesn't do bad things like crash or whatnot.


================
Comment at: llvm/lib/Support/StringRef.cpp:102
+// Find a similar string in `Candidates`.
+Optional<std::string> StringRef::find_similar_str(const std::vector<StringRef> &Candidates, size_t Dist) const {
+  // We need to check if `rng` has the exact case-insensitive string because the Levenshtein distance match does not
----------------
I am less convinced that we want this as a general interface on `StringRef`. I think callers need to decide whether something is similar enough or not. For example, case sensitivity may not matter for this case but it might matter to another case. Others may wish to limit the max edit_distance for each of the candidates for performance reasons, others may not. We already saw there were questions about whether to allow replacements or not. That sort of thing.

I think this functionality should be moved to PPDirectives.cpp as a static helper function that's specific to this use. Also, because this returns one of the candidates passed in, and those are a `StringRef`, I think this function should return a `StringRef` -- this removes allocation costs. I'd also drop the `Dist` parameter as the function is no longer intended to be a general purpose one.


================
Comment at: llvm/lib/Support/StringRef.cpp:118-135
+  std::vector<std::pair<std::string, size_t>> Cand;
+  for (StringRef C : Candidates) {
+    size_t CurDist = edit_distance(C, true);
+    if (CurDist <= MaxDist) {
+      Cand.emplace_back(C, CurDist);
+    }
+  }
----------------
I think we can skip the vector entirely by keeping track of the min edit distance we've seen thus far and which candidate that corresponds to when looping over the candidates.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124726/new/

https://reviews.llvm.org/D124726



More information about the cfe-commits mailing list