[PATCH] D156312: [analyzer] Upstream BitwiseShiftChecker

DonĂ¡t Nagy via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Aug 9 04:39:57 PDT 2023


donat.nagy planned changes to this revision.
donat.nagy marked 5 inline comments as done.
donat.nagy added a comment.

I'll soon upload a refactored version.



================
Comment at: clang/docs/analyzer/checkers.rst:44-50
+Moreover, if the pedantic mode is activated by
+``-analyzer-config core.BitwiseShift:Pedantic=true``, then this checker also
+reports situations where the _left_ operand of a shift operator is negative or
+overflow occurs during the right shift of a signed value. (Most compilers
+handle these predictably, but the C standard and the C++ standards before C++20
+say that they're undefined behavior. In the C++20 standard these constructs are
+well-defined, so activating pedantic mode in C++20 has no effect.)
----------------
steakhal wrote:
> I believe shifting out the "sign bit" is UB because the representation of the signed integer type was not defined.
> Prior C++20 (?), it was allowed to represent signed integer types as `two's-complement` (virtually every platform uses this), `one's complement`, `sign-magnitude`. [[ https://en.wikipedia.org/wiki/Signed_number_representations | Wiki ]].
> And it turns out, all these three representations behave differently with negative values.
> Check out [[ https://youtu.be/JhUxIVf1qok?t=2089 | JF's talk ]] from 2018 about this.
> 
> I'd need to think about the relevance of C++20 in this context, but here is what I think of now:
> My idea is that the observed behavior is not influenced by the "standard", rather by the time we released C++20, they realized that no actual platform uses weird signed integer representations.
> Given this observation, I'd argue that we shouldn't have this flag at all.
Thanks for the background information! I knew the rough outlines of the situation, but your links provided interesting details.

I agree that the non-pedantic mode is the "sane mode" that reflects the real behavior of the hardware; but the standards (with the exception of C++20) explicitly declare that signed shift overflow and shifting negative values is UB and there may be secondary requirements like [[ https://wiki.sei.cmu.edu/confluence/display/c/INT32-C.+Ensure+that+operations+on+signed+integers+do+not+result+in+overflow | SEI-CERT rule INT32-C  ]] that demand compliance to the standards.

I don't think that I'd spend time on implementing pedantic mode because there are more important things to do, but as we already have this feature, I think we should release it as an off-by-default option.



================
Comment at: clang/docs/analyzer/checkers.rst:81
+
+Ensure the shift operands are in proper range before shifting.
+
----------------
steakhal wrote:
> We should exactly elaborate on what "proper" means here.
What would you suggest? I could try to write a slightly longer suggestion, but I don't want to repeat the same conditions that I listed above the code example.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp:142
+                                              BinaryOperator::Opcode Comparison,
+                                              long long Limit) {
+  SValBuilder &SVB = Ctx.getSValBuilder();
----------------
steakhal wrote:
> One does not frequently see the `long long` type.
> Do you have a specific reason why `int` or `unsigned` wouldn't be good?
I inherited this type choice from old code and didn't think much about it. My guess is that `LimitVal` needs to be a signed integer (because e.g. `-1 < 0u` would be evaluated to false) and it's a `long long` because that's guaranteed to represent all `unsigned` values. (However, the actual values of `Limit` are very small.)

I'll probably use `int` for `Limit` and `LimitVal` with some comments to clarify the reason for this type choice.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp:152-156
+    if (!StTrue) {
+      // We detected undefined behavior (the caller will report it).
+      State = StFalse;
+      return false;
+    }
----------------
steakhal wrote:
> Generally, I don't favor mutating member functions.
> It makes it harder to track how we ended up with a given State.
> 
I also agree that mutating member functions are often problematic and may make the code more difficult to understand.

However, in this particular case I think it's important to ensure that we're always using the most recent state, and that's difficult to guarantee in the "pass everything as arguments and return values" style. Moreover, I do some parallel bookkeeping because I want to summarize the state changes in a single note tag, and that would make the "naive functional" solution even more convoluted.

In a functional language like Haskell I'd use computations in a [[ https://en.wikibooks.org/wiki/Haskell/Understanding_monads/State | State monad ]] to implement this logic; my C++ code is a direct translation of that solution. (As a secondary utility, my class also provides read-only access to some context like a Reader monad, because functions that take 5-8 arguments are also very difficult to understand. This way, the relevant arguments are highlighted as actual arguments, while the shared context is described by the `const` data members.)


================
Comment at: clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp:301-302
+                      pluralSuffix(MaximalAllowedShift));
+    R->addNote(LeftNote, PathDiagnosticLocation{LHS, Ctx.getSourceManager(),
+                                                Ctx.getLocationContext()});
+    Ctx.emitReport(std::move(R));
----------------
donat.nagy wrote:
> gamesh411 wrote:
> > donat.nagy wrote:
> > > NoQ wrote:
> > > > Can we just append this to the warning? The `addNote()` is useful for notes that need to be placed in code outside of the execution path, but if it's always next to the warning, it probably doesn't make sense to display it separately.
> > > I didn't append this to the warning because I felt that the warning text is already too long. (By the way, an earlier internal variant of this checker produced two separate notes next to the warning message.)
> > > 
> > > I tweaked the messages of this checker before initiating this review, but I feel that there's still some room for improvements. (E.g. in this particular case perhaps we could omit some of the details that are not immediately useful and could be manually calculated from other parts of the message...) 
> > Just a thought on simplifying the diagnostics a bit:
> > 
> > Warning: "Right operand is negative in left shift"
> > Note: "The result of left shift is undefined because the right operand is negative"
> > Shortened: "Undefined left shift due to negative right operand"
> > 
> > Warning: "Left shift by '34' overflows the capacity of 'int'"
> > Note: "The result of left shift is undefined because the right operand '34' is not smaller than 32, the capacity of 'int'"
> > Shortened: "Undefined left shift: '34' exceeds 'int' capacity (32 bits)"
> > 
> > The more complex notes are a bit sketchy, as relevant information can be lost, and the following solution is probably a bit too much, but could prove to be an inspiration:
> > 
> > Warning: "Left shift of '1024' overflows the capacity of 'int'"
> > CXX Note: "Left shift of '1024' is undefined because 'int' can hold only 32 bits (including the sign bit), so some bits overflow"
> > CXX Note: "The value '1024' is represented by 11 bits, allowing at most 21 bits for bitshift"
> > C Note: "Left shift of '1024' is undefined because 'int' can hold only 31 bits (excluding the sign bit), so some bits overflow"
> > C Note: "The value '1024' is represented by 11 bits, allowing at most 20 bits for bitshift"
> > 
> > Shortened:
> > CXX Warning: "Undefined left shift: '1024' (11 bits) exceeds 'int' capacity (32 bits, including sign)"
> > C Warning: "Undefined left shift: '1024' (11 bits) exceeds 'int' capacity (31 bits, excluding sign)"
> Clarification about the `Msg`/`ShortMsg` distinction:
> I'm intentionally separating the short warning messages and the longer note messages because `createBugReport()` enforces the convention that it will always emit a warning and a note at the bug location.
> 
> According to the comments in the source code, the intention is that the note contains all the relevant information, while the warning is a brief summary that can be displayed in situations where the notes wouldn't fit the UI.
> 
> IIUC many checkers ignore this distinction and emit the same (often long and cumbersome) message both as a note and as a warning (`createBugReport()` has a variant which takes only one message parameter and passes it to both locations), but I tried to follow it because I think it's ugly when the same message is repeated twice and there is some sense in providing both a brief summary and a full description that doesn't use potentially-ambiguous abbreviations to save space.
> 
> Of course I could also accept a community decision that this "brief warning / full note" distinction is deprecated and will be eliminated (because e.g. front-end utilities are not prepared to handle it), but in that case I'd strongly suggest a redesign where we eliminate the redundantly printed 'note' message. (There is no reason to say the same thing twice! There is no reason to say the same thing twice!)
> 
> On the other hand, in addition to this `Msg`/`ShortMsg` distinction, this part of the code also adds the extra `LeftNote` (as a remnant from an earlier internal version of this checker), and that's that's what I'd like to merge into `Msg` (following NoQ's suggestion and keeping it distinct from the `ShortMsg`).
Among notes, my only planned change is that instead of 

> Warning: "Left shift of '1024' overflows the capacity of 'int'"
> CXX Note: "Left shift of '1024' is undefined because 'int' can hold only 32 bits (including the sign bit), so some bits overflow"
> CXX Note: "The value '1024' is represented by 11 bits, allowing at most 21 bits for bitshift"
> C Note: "Left shift of '1024' is undefined because 'int' can hold only 31 bits (excluding the sign bit), so some bits overflow"
> C Note: "The value '1024' is represented by 11 bits, allowing at most 20 bits for bitshift"

I'll implement something like

> Warning: "Left shift of '1024' overflows the capacity of 'int'"
> CXX Note: "Left shift of '1024' (represented by 11 bits) is undefined because 'int' can hold only 32 bits (including the sign bit), so some bits overflow"
> C Note: "Left shift of '1024' (represented by 11 bits) is undefined because 'int' can hold only 31 bits (excluding the sign bit), so some bits overflow"


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156312/new/

https://reviews.llvm.org/D156312



More information about the cfe-commits mailing list