[PATCH] D156312: [analyzer] Upstream BitwiseShiftChecker

DonĂ¡t Nagy via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Aug 14 08:55:13 PDT 2023


donat.nagy marked 13 inline comments as done.
donat.nagy added a comment.

I handled the inline comments, I'll check the example code and edit the release notes tomorrow.



================
Comment at: clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp:272
+  if (const auto ConcreteRight = Right.getAs<nonloc::ConcreteInt>()) {
+    const std::int64_t RHS = ConcreteRight->getValue().getExtValue();
+    assert(RHS > MaximalAllowedShift);
----------------
steakhal wrote:
> `getExtValue()`, I believe, asserts that it "fits", thus the concrete int is at most 64 bits wide. Does it work for `_BigInt`s? (or whatever we have like that :D)
This final check is executed when we already know that 0 <= ConcreteRight < 128 //(or the bit width of the largest integer type)//. I added a comment that mentions this and replaced the type `std::int64_t` with a plain `unsigned`.

I also updated the testcase `large_right` to ensure that later changes don't introduce crashes related to absurdly oversized right arguments.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp:335-339
+    auto R =
+        std::make_unique<PathSensitiveBugReport>(BT, ShortMsg, Msg, ErrNode);
+    bugreporter::trackExpressionValue(ErrNode, Op->getLHS(), *R);
+    bugreporter::trackExpressionValue(ErrNode, Op->getRHS(), *R);
+    return R;
----------------
steakhal wrote:
> AHA! Now I got you. You also call this `R`.
> Just use `R` at the callsite as well. Job done.
This is a code fragment that survived my refactoring efforts without significant changes... until now :)

Now that you highlighted it, I'm renaming `R` to `BR` just to be a contrarian :)


================
Comment at: clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp:355-357
+    if (!BTPtr)
+      BTPtr = std::make_unique<BugType>(this, "Bitwise shift",
+                                        "Suspicious operation");
----------------
steakhal wrote:
> How about eagerly inline initializing this field? And avoiding `mutable` as well.
This pattern systematically appears in all checkers that I've seen because the constructor of `BugType` queries and stores the name of the checker, which is only initialized when the checker (`*this`) is registered. This dependency means that the `BugType` cannot be initialized in the constructor of the `Checker`.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp:53
+class BitwiseShiftValidator {
+  // Primary mutable state:
+  CheckerContext &Ctx;
----------------
steakhal wrote:
> donat.nagy wrote:
> > steakhal wrote:
> > > Where does this comment corresponds to?
> > The two data members (`Ctx` and `State`) that are directly below the comment.
> > 
> > I tried to split the list of data members into three groups (primary mutable state, secondary mutable state, `const`  members), but now I reconsider this I feel that these header comments are actually superfluous.
> > 
> > I'll return to a variant of the earlier layout where I put `NonNegFlags`/`UpperBound` to the end and comment that they are only used for note tag creation; but don't emphasize the (immediately visible) `const`ness / non-`const`-ness of other members with comments.
> One way to emphasize this is by choosing names like `FoldedState`, or anything else that has something to do with "motion" or "change".
> Given that if we have `ProgramStateRefs` in helper classes they usually remain still. Think of BugReportVisitors for example. Consequently,it's important to raise awareness when it's not like that. A different name would achieve this.
My intuition is that the variable name "State" already implies that it will be regularly changed and updated as we are going forward; but you're right that in this project we've a tradition of storing snapshots of "old" states in various helper classes.

(By the way, I'm a bit worried when I read code that stores and passes around random stale snapshots of the state. Of course we cannot avoid the branching nature of the exploded graph, but other than that it'd be good to store the State in language constructs where you cannot accidentally lose information by building onto an older state variable.) 


================
Comment at: clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp:60-63
+  // If there is an upper bound, it is <= the size of a primitive type
+  // (expressed in bits), so this is a very safe sentinel value:
+  enum { NoUpperBound = 999 };
+  unsigned UpperBound = NoUpperBound;
----------------
steakhal wrote:
> donat.nagy wrote:
> > steakhal wrote:
> > > Have you considered using `std::optional`?
> > > Given that the dimension of this variable is "bits", I think it would be great to have that in its name.
> > I considered using `std::optional`, but using this "old-school" solution ensures that `NoUpperBound` is comparable to and larger than the "real" upper bound values, which lets me avoid some ugly boolean logic. I'll update the comment and probably also the variable name.
> What "ugly boolean logic" are you referring to?
> I thought one can always use `value_or(999)` anyway. To me, that means that the option approach is a refinement of yours. But depending on the situation, it might be possible to avoid the extrema value 999 altogether.
I'm prejudiced against `optional` in C++ because I feel that it's clunky and doesn't fit the rest of the language. (Note that I like `Maybe` in Haskell and don't have strong feelings about "using a variable that may be None" in Python; so this is specific to C++.) 

However, in this particular case I decided to switch to `std::optional<unsigned>`, because it lets me write `!UpperBoundBitCount` instead of the longer (but perhaps more readable) `UpperBoundBitCount != NoUpperBound`.

The "ugly boolean logic" that I spoke about is `(!UpperBoundBitCount || Limit < UpperBoundBitCount.value())`; however I think this is still better than `Limit < UpperBoundBitCount.value_or(999)` which falls to the ground between two chairs as it introduces both the type `optional` and the arbitrary placeholder number.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp:106-109
+  if (auto BRP = checkOvershift()) {
+    Ctx.emitReport(std::move(BRP));
+    return;
+  }
----------------
steakhal wrote:
> donat.nagy wrote:
> > steakhal wrote:
> > > I'd prefer `BR` or `Report`. Since we know already that we are in the path-sensitive domain, `P` has less value there.
> > > I'd apply this concept everywhere in the patch.
> > The `P` stands for the "Ptr" in the name of the type alias `BugReportPtr`; but your comment clearly highlights that this variable name is confusing ;).
> > 
> > I'll probably avoid the `auto` and use `BugReportPtr Bug = ...` to remind the reader that this is a pointer (that can be null) and if it's non-null, then we found a bug.
> Oh I see. To me, it's fairly obvious now that bugreports are just smart-pointers.
> So I implicitly looked for some other reason. But just look at other checker, and call it the same way how they do, for consistency. That's all I wanted.
> There are two/three common options I believe: `B` or `BR`, and maybe `Report`.
Ok, I switched to the variable name "BR". I didn't have a strong preference for using "Bug", I just thought that your request is "the current name is bad, use anything else" and not "please use one of the traditional names".


================
Comment at: clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp:218
+                                       Side == OperandSide::Left ? "Left" : "Right",
+                                       isLeftShift() ? "left" : "right")
+                             .str();
----------------
steakhal wrote:
> donat.nagy wrote:
> > steakhal wrote:
> > > The `isLeftShift() ? "left" : "right"` expression appears 4 times in total.
> > > I'd also recommend to not distinguish the capitalized "left" and "right" strings, and just post-process the messages to ensure that the first character is capitalized inside the `createBugReport()`.
> > > 
> > > Also note that you can reuse the same placeholder, so you would only need to pass the `side()` only once.
> > > ```lang=C++
> > > static void capitalizeBeginning(std::string &Str) {
> > >   if (Str.empty())
> > >     return;
> > >   Str[0] = llvm::toUpper(Str[0]);
> > > }
> > > 
> > > StringRef side() const {
> > >   return isLeftShift() ? "left" : "right";
> > > }
> > > ```
> > I introduced a method for `isLeftShift() ? "left" : "right"` (which is indeed repeated several times), but I didn't define `capitalizeBeginning` because that would've been used only once.
> > 
> > Your remark that [with the capitalization function] "you can reuse the same placeholder, so you would only need to pass the side() only once" seems to be incorrect, because `Side == OperandSide::Left ? "Left" : "Right"` (which operand are we talking about?) and `isLeftShift() ? "left" : "right"` (what kind of shift operator are we talking about?) are two independent values.
> Makes sense. It was so tempting though.
Yes, trivial string formatting can introduce a surprising amount of complications...


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156312/new/

https://reviews.llvm.org/D156312



More information about the cfe-commits mailing list