[PATCH] D156312: [analyzer] Upstream BitwiseShiftChecker

DonĂ¡t Nagy via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Aug 10 04:57:10 PDT 2023


donat.nagy marked 4 inline comments as done.
donat.nagy added a comment.

Thanks for the suggestions, I answered those where I had something to say and I'll implement the rest of them.

In D156312#4576120 <https://reviews.llvm.org/D156312#4576120>, @steakhal wrote:

> Have you considered checking the shift operands for taint?
> I wonder if we could warn if we cannot prove the correct use of the shift operator when either of the operands is tainted.

It would be straightforward to add taint handling; but I'd prefer to leave it for a follow-up commit. Some experimentation would be needed to see whether it produces useful results (do real-world projects use shifts on tainted numbers? do we encounter false positives?).



================
Comment at: clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp:53
+class BitwiseShiftValidator {
+  // Primary mutable state:
+  CheckerContext &Ctx;
----------------
steakhal wrote:
> Where does this comment corresponds to?
The two data members (`Ctx` and `State`) that are directly below the comment.

I tried to split the list of data members into three groups (primary mutable state, secondary mutable state, `const`  members), but now I reconsider this I feel that these header comments are actually superfluous.

I'll return to a variant of the earlier layout where I put `NonNegFlags`/`UpperBound` to the end and comment that they are only used for note tag creation; but don't emphasize the (immediately visible) `const`ness / non-`const`-ness of other members with comments.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp:57-59
+  // Secondary mutable state, used only for note tag creation:
+  enum { NonNegLeft = 1, NonNegRight = 2 };
+  unsigned NonNegFlags = 0;
----------------
steakhal wrote:
> How about using the same enum type for these two? It would signify that these are used together.
> Also, by only looking at the possible values, it's not apparent that these are bitflag values. A suffix might help the reader.
Using the enum type is a good idea, however I'd probably put the `Flag` suffix into the name of the type:
```
enum SignednessFlags : unsigned { NonNegLeft = 1, NonNegRight = 2 };
SignednessFlags NonNegOperands = 0;
```


================
Comment at: clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp:60-63
+  // If there is an upper bound, it is <= the size of a primitive type
+  // (expressed in bits), so this is a very safe sentinel value:
+  enum { NoUpperBound = 999 };
+  unsigned UpperBound = NoUpperBound;
----------------
steakhal wrote:
> Have you considered using `std::optional`?
> Given that the dimension of this variable is "bits", I think it would be great to have that in its name.
I considered using `std::optional`, but using this "old-school" solution ensures that `NoUpperBound` is comparable to and larger than the "real" upper bound values, which lets me avoid some ugly boolean logic. I'll update the comment and probably also the variable name.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp:106-109
+  if (auto BRP = checkOvershift()) {
+    Ctx.emitReport(std::move(BRP));
+    return;
+  }
----------------
steakhal wrote:
> I'd prefer `BR` or `Report`. Since we know already that we are in the path-sensitive domain, `P` has less value there.
> I'd apply this concept everywhere in the patch.
The `P` stands for the "Ptr" in the name of the type alias `BugReportPtr`; but your comment clearly highlights that this variable name is confusing ;).

I'll probably avoid the `auto` and use `BugReportPtr Bug = ...` to remind the reader that this is a pointer (that can be null) and if it's non-null, then we found a bug.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp:190-191
+  std::string Msg =
+      llvm::formatv("The result of {0} shift is undefined because the right "
+                    "operand{1} is not smaller than {2}, the capacity of '{3}'",
+                    isLeftShift() ? "left" : "right", RightOpStr, LHSBitWidth,
----------------
steakhal wrote:
> 
This is an important distinction, I use "not smaller than {2}" as a shorter synonym of "larger than or equal to {2}". I can choose some other wording if you feel that this is not clear enough.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp:263
+
+  const SVal Right = Ctx.getSVal(Op->getRHS());
+
----------------
steakhal wrote:
> I'd move this closer to the first "use" place.
Good point, previously it was used by the calculation that is now performed by `assumeRequirement`.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp:315
+    default:
+      llvm_unreachable("this checker does not use other comparison operators");
+  }
----------------
steakhal wrote:
> 
I'd say that the current wording is correct because the the checker "use"s the comparison operators (in its implementation) instead of "accept"ing them from outside; but I can replace the message with e.g. "this //function// does not accept other comparison operators" if you'd prefer that.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp:301-302
+                      pluralSuffix(MaximalAllowedShift));
+    R->addNote(LeftNote, PathDiagnosticLocation{LHS, Ctx.getSourceManager(),
+                                                Ctx.getLocationContext()});
+    Ctx.emitReport(std::move(R));
----------------
NoQ wrote:
> donat.nagy wrote:
> > donat.nagy wrote:
> > > donat.nagy wrote:
> > > > gamesh411 wrote:
> > > > > donat.nagy wrote:
> > > > > > NoQ wrote:
> > > > > > > Can we just append this to the warning? The `addNote()` is useful for notes that need to be placed in code outside of the execution path, but if it's always next to the warning, it probably doesn't make sense to display it separately.
> > > > > > I didn't append this to the warning because I felt that the warning text is already too long. (By the way, an earlier internal variant of this checker produced two separate notes next to the warning message.)
> > > > > > 
> > > > > > I tweaked the messages of this checker before initiating this review, but I feel that there's still some room for improvements. (E.g. in this particular case perhaps we could omit some of the details that are not immediately useful and could be manually calculated from other parts of the message...) 
> > > > > Just a thought on simplifying the diagnostics a bit:
> > > > > 
> > > > > Warning: "Right operand is negative in left shift"
> > > > > Note: "The result of left shift is undefined because the right operand is negative"
> > > > > Shortened: "Undefined left shift due to negative right operand"
> > > > > 
> > > > > Warning: "Left shift by '34' overflows the capacity of 'int'"
> > > > > Note: "The result of left shift is undefined because the right operand '34' is not smaller than 32, the capacity of 'int'"
> > > > > Shortened: "Undefined left shift: '34' exceeds 'int' capacity (32 bits)"
> > > > > 
> > > > > The more complex notes are a bit sketchy, as relevant information can be lost, and the following solution is probably a bit too much, but could prove to be an inspiration:
> > > > > 
> > > > > Warning: "Left shift of '1024' overflows the capacity of 'int'"
> > > > > CXX Note: "Left shift of '1024' is undefined because 'int' can hold only 32 bits (including the sign bit), so some bits overflow"
> > > > > CXX Note: "The value '1024' is represented by 11 bits, allowing at most 21 bits for bitshift"
> > > > > C Note: "Left shift of '1024' is undefined because 'int' can hold only 31 bits (excluding the sign bit), so some bits overflow"
> > > > > C Note: "The value '1024' is represented by 11 bits, allowing at most 20 bits for bitshift"
> > > > > 
> > > > > Shortened:
> > > > > CXX Warning: "Undefined left shift: '1024' (11 bits) exceeds 'int' capacity (32 bits, including sign)"
> > > > > C Warning: "Undefined left shift: '1024' (11 bits) exceeds 'int' capacity (31 bits, excluding sign)"
> > > > Clarification about the `Msg`/`ShortMsg` distinction:
> > > > I'm intentionally separating the short warning messages and the longer note messages because `createBugReport()` enforces the convention that it will always emit a warning and a note at the bug location.
> > > > 
> > > > According to the comments in the source code, the intention is that the note contains all the relevant information, while the warning is a brief summary that can be displayed in situations where the notes wouldn't fit the UI.
> > > > 
> > > > IIUC many checkers ignore this distinction and emit the same (often long and cumbersome) message both as a note and as a warning (`createBugReport()` has a variant which takes only one message parameter and passes it to both locations), but I tried to follow it because I think it's ugly when the same message is repeated twice and there is some sense in providing both a brief summary and a full description that doesn't use potentially-ambiguous abbreviations to save space.
> > > > 
> > > > Of course I could also accept a community decision that this "brief warning / full note" distinction is deprecated and will be eliminated (because e.g. front-end utilities are not prepared to handle it), but in that case I'd strongly suggest a redesign where we eliminate the redundantly printed 'note' message. (There is no reason to say the same thing twice! There is no reason to say the same thing twice!)
> > > > 
> > > > On the other hand, in addition to this `Msg`/`ShortMsg` distinction, this part of the code also adds the extra `LeftNote` (as a remnant from an earlier internal version of this checker), and that's that's what I'd like to merge into `Msg` (following NoQ's suggestion and keeping it distinct from the `ShortMsg`).
> > > Among notes, my only planned change is that instead of 
> > > 
> > > > Warning: "Left shift of '1024' overflows the capacity of 'int'"
> > > > CXX Note: "Left shift of '1024' is undefined because 'int' can hold only 32 bits (including the sign bit), so some bits overflow"
> > > > CXX Note: "The value '1024' is represented by 11 bits, allowing at most 21 bits for bitshift"
> > > > C Note: "Left shift of '1024' is undefined because 'int' can hold only 31 bits (excluding the sign bit), so some bits overflow"
> > > > C Note: "The value '1024' is represented by 11 bits, allowing at most 20 bits for bitshift"
> > > 
> > > I'll implement something like
> > > 
> > > > Warning: "Left shift of '1024' overflows the capacity of 'int'"
> > > > CXX Note: "Left shift of '1024' (represented by 11 bits) is undefined because 'int' can hold only 32 bits (including the sign bit), so some bits overflow"
> > > > C Note: "Left shift of '1024' (represented by 11 bits) is undefined because 'int' can hold only 31 bits (excluding the sign bit), so some bits overflow"
> > Correction: now I'm leaning towards just discarding the secondary note, because the message examples listed in the previous comment are just the variants where the right operand is unknown. In the more detailed message template "The shift '{0} << {1}' is undefined {2}, so {3} bit{4} overflow{5}" there is no place to insert the "represented by {} bits" message.
> > 
> There's nothing wrong with long, multi-sentence diagnostic messages!
> 
> Unlike the compiler proper, we aren't typically used from the command line, so we aren't trying to fit into 80 columns. So we start our warnings with a capital letter and expect them to be, at the very least, complete sentences.
Even if the message length is not limited, I think it's better to delete the additional note line (instead of merging it with the "regular" note line) because it's just //too much information// and the gist of the issue is lost behind the flood of redundant numbers.

However, this is just a subjective decision and I'm open to extending the message if you feel that something important is missing from it.


================
Comment at: clang/test/Analysis/bitwise-shift-common.c:108
+  *p += 1 << a;
+  // expected-note at -1 {{Assuming right operand of bit shift is non-negative but less than 32}}
+  *p += 1 << (a + 32);
----------------
steakhal wrote:
> Hmm, does this "but" actually mean "and" here?
> Anyway, both using "but" and "and" there looks awkward.
> It might be only me though.
Yes, this sentence fragment is like "not too small but not too large", where "but" is equivalent to "and" and both are a bit awkward.


================
Comment at: clang/test/Analysis/bitwise-shift-sanity-checks.c:20-22
+// RUN:    -Wno-shift-count-negative -Wno-shift-negative-value \
+// RUN:    -Wno-shift-count-overflow -Wno-shift-overflow \
+// RUN:    -Wno-shift-sign-overflow
----------------
steakhal wrote:
> How about simplifying these lines to `-Wno-shift`? Have you considered it?
Unfortunately there is no such shortcut: when I tried to execute trunk clang with it, I got "//warning: unknown warning option '-Wno-shift'//".



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156312/new/

https://reviews.llvm.org/D156312



More information about the cfe-commits mailing list