[clang] [clang] [Static analyzer]: add initial support for builtin overflow (PR #102602)

Fri Aug 9 06:39:07 PDT 2024

================
@@ -50,6 +101,44 @@ class BuiltinFunctionChecker : public Checker<eval::Call> {
 
 } // namespace
 
+void BuiltinFunctionChecker::HandleOverflowBuiltin(const CallEvent &Call,
+                                                   CheckerContext &C,
+                                                   BinaryOperator::Opcode Op,
+                                                   QualType ResultType) const {
+  // All __builtin_*_overflow functions take 3 argumets.
+  assert(Call.getNumArgs() == 3);
+
+  ProgramStateRef State = C.getState();
+  SValBuilder &SVB = C.getSValBuilder();
+  const Expr *CE = Call.getOriginExpr();
+
+  SVal Arg1 = Call.getArgSVal(0);
+  SVal Arg2 = Call.getArgSVal(1);
+
+  SVal RetVal = SVB.evalBinOp(State, Op, Arg1, Arg2, ResultType);
+
+  // TODO: Handle overflows with values that known to overflow. Like INT_MAX + 1
+  // should not produce state for non-overflow case and threat it as
----------------
NagyDonat wrote:

I thought about this for some time and checked the implementation of `evalBinOp` (more precisely `SValBuilder:evalBinOpNN`, the case that's relevant here), but unfortunately I don't see a clear solution.

You are right that `SVB` _should_ take care of this, but unfortunately that does not happen currently.

Currently the analyzer (and `SValBuilder` in particular) has its own rigid opinion about overflow handling: it allows overflow on all numeric operations (including signed ones, where it's undefined behavior according to the standard) and models it without splitting the state, so e.g. if `x` and `y` are `signed char` values within the range `50 <= x, y <= 100`, then it calculates that the possible values of `x+y` are $[-128, -56] \cup [100, 127]$.

It would be good to enhance this behavior by tagging the ranges within the `RangeSet` with "did overflow happen?" bits that would let us perform a state split. This is theoretically doable (I don't see anything that would block this), but would be a complex refactoring effort (we'd need to pass along this "did we overflow?" bit through all the spaghetti code).

Without this, you can implement a workaround that calls `SValBuilder::getMinValue()` and `SValBuilder::getMaxValue()` to get the minimal and maximal values of the operands (as `llvm::APSInt` values, that is, **a**rbitrary **p**recision **s**igned **int**egers) and then uses these to determine which of the two branches (overflow, no-overflow) are possible.

This workaround solution would be better than the current "always act as if both branches were possible", but it is not sufficient for deducing the set of possible results on the no-overflow branch. (That is, if `x` and `y` are `signed char` values within the range `50 <= x, y <= 100`, then this workaround would create two branches: one where `__builtin_add_overflow` returns true, and and one where it returns `false` and stores a symbol with range $[-128, -56] \cup [100, 127]$ within the result pointer; while the correct assumption would be that the result has range $[100, 127]$ on this no-overflow branch.)

https://github.com/llvm/llvm-project/pull/102602