[clang] [analyzer] Model overflow builtins (PR #102602)

Mon Sep 2 03:41:38 PDT 2024

================
@@ -50,6 +118,75 @@ class BuiltinFunctionChecker : public Checker<eval::Call> {
 
 } // namespace
 
+std::pair<bool, bool>
+BuiltinFunctionChecker::checkOverflow(CheckerContext &C, SVal RetVal,
+                                      QualType Res) const {
+  ProgramStateRef State = C.getState();
+  SValBuilder &SVB = C.getSValBuilder();
+  ASTContext &ACtx = C.getASTContext();
+
+  // Calling a builtin with a non-integer type result produces compiler error.
+  assert(Res->isIntegerType());
+
+  unsigned BitWidth = ACtx.getIntWidth(Res);
+  auto MinVal =
+      llvm::APSInt::getMinValue(BitWidth, Res->isUnsignedIntegerType());
+  auto MaxVal =
+      llvm::APSInt::getMaxValue(BitWidth, Res->isUnsignedIntegerType());
+
+  SVal IsLeMax =
+      SVB.evalBinOp(State, BO_LE, RetVal, nonloc::ConcreteInt(MaxVal), Res);
+  SVal IsGeMin =
+      SVB.evalBinOp(State, BO_GE, RetVal, nonloc::ConcreteInt(MinVal), Res);
+
+  auto [MayNotOverflow, MayOverflow] =
+      State->assume(IsLeMax.castAs<DefinedOrUnknownSVal>());
+  auto [MayNotUnderflow, MayUnderflow] =
+      State->assume(IsGeMin.castAs<DefinedOrUnknownSVal>());
+
+  return {MayOverflow || MayUnderflow, MayNotOverflow && MayNotUnderflow};
+}
+
+void BuiltinFunctionChecker::handleOverflowBuiltin(const CallEvent &Call,
+                                                   CheckerContext &C,
+                                                   BinaryOperator::Opcode Op,
+                                                   QualType ResultType) const {
+  // Calling a builtin with an incorrect argument count produces compiler error.
+  assert(Call.getNumArgs() == 3);
+
+  ProgramStateRef State = C.getState();
+  SValBuilder &SVB = C.getSValBuilder();
+  const Expr *CE = Call.getOriginExpr();
+
+  SVal Arg1 = Call.getArgSVal(0);
+  SVal Arg2 = Call.getArgSVal(1);
+
+  SVal RetValMax = SVB.evalBinOp(State, Op, Arg1, Arg2,
+                                 getSufficientTypeForOverflowOp(C, ResultType));
+  SVal RetVal = SVB.evalBinOp(State, Op, Arg1, Arg2, ResultType);
+
+  auto [Overflow, NotOverflow] = checkOverflow(C, RetValMax, ResultType);
+  if (NotOverflow) {
+    ProgramStateRef StateNoOverflow =
+        State->BindExpr(CE, C.getLocationContext(), SVB.makeTruthVal(false));
+
+    if (auto L = Call.getArgSVal(2).getAs<Loc>()) {
+      StateNoOverflow =
+          StateNoOverflow->bindLoc(*L, RetVal, C.getLocationContext());
+
+      // Propagate taint if any of the argumets were tainted
+      if (isTainted(State, Arg1) || isTainted(State, Arg2))
+        StateNoOverflow = addTaint(StateNoOverflow, *L);
+    }
+
+    C.addTransition(StateNoOverflow);
----------------
NagyDonat wrote:

Thanks for the ping, but I think that in this concrete case `trackExpressionValue` is probably OK.

I mostly advocated against its use when we were talking about ambitious plans that would've required extending the current code and/or using it in new, unusual situations. The code of `trackExpressionValue` is ugly and convoluted, but it is working (in the sense of "it's working, don't touch it").


By the way, if you want to explain the origin of _symbolic_ values, then calling `markInteresting()` on the relevant symbols is another option that may be sufficient for your goals. It is less general than `trackExpressionValue()` (e.g. it doesn't track concrete values, it doesn't track control dependencies etc.) but it doesn't apply random heuristics which are not relevant for you.

https://github.com/llvm/llvm-project/pull/102602