[PATCH] D103096: [analyzer] Implement cast for ranges of symbolic integers

Gabor Marton via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Apr 19 08:06:24 PDT 2022


martong added a subscriber: vabridgers.
martong added a comment.

First of all, thanks Denys for working on this, nice work! Here are my concerns and remarks.

I think this fixed set of types in `NominalTypeList` is too rigid. I don't like the fact that we have to iterate over all four types whenever we set a new constraint or when we try to infer. Also, I am thinking about downstream hardware architectures, where there might be integers with different bit-widths (@vabridgers). Also, at some point people will pursue us to support integers with arbitrary bitwidth (see _ExtInt <https://blog.llvm.org/2020/04/the-new-clang-extint-feature-provides.html>)

Thus, I am proposing an alternative approach. We should have a SymExpr -> Set of SymExpr mapping in the State that represents the relation of symbols that are connected via some cast operations (see REGISTER_MAP_WITH_PROGRAMSTATE). Let's call this mapping as `CastMap`. The key should be the root symbol, i.e the symbol that is being declared first before all cast operations.

E.g.  Let's have

  int16 a = 128;

then we have a constraint [128,128] stored for `$a`. Then

  if ((int8)a < 0)

creates a new symbol `$a2` (SymbolCast) that has a new constraint [-128,-128] assigned to it. And we also keep track in the State, that `$a` and `$a2` refers the same root symbol (`a`). We now have in the CastMap $a -> [$a2].

Now, let's say we have

  if ((_ExtInt(7))a > 64)

then we can dig up the existing contraints from CastMap to check for the State's validity and we can update all the constraints of $a and $a2 as needed. Also, CastMap is updated: $a -> [$a2, $a3].



================
Comment at: clang/lib/StaticAnalyzer/Checkers/ExprInspectionChecker.cpp:421-426
+  // Unwrap symbolic expression to skip argument casts on function call.
+  // This is useful when there is no way for overloading function in C
+  // but we need to pass different types of arguments and
+  // implicit cast occures.
+  Sym = Sym->ignoreCasts();
+
----------------
Does it really matter? I mean, why do we need this change?


================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:2212
+
+bool ConstraintAssignor::assignSymExprToRangeSet(const SymExpr *Sym,
+                                                 RangeSet Constraint) {
----------------
Could you please rebase? The "simplification" code part had been merged already to llvm/main and it is not part of this change.


================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:2227-2251
+  llvm::SmallSet<EquivalenceClass, 4> SimplifiedClasses;
+  // Iterate over all equivalence classes and try to simplify them.
+  ClassMembersTy Members = State->get<ClassMembers>();
+  for (std::pair<EquivalenceClass, SymbolSet> ClassToSymbolSet : Members) {
+    EquivalenceClass Class = ClassToSymbolSet.first;
+    State = EquivalenceClass::simplify(Builder, RangeFactory, State, Class);
+    if (!State)
----------------
I think this hunk should remain in `assignSymExprToConst`. Why did you move it?


================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:2280-2290
+/// Return a symbol which is the best canidate to save it in the constraint
+/// map. We should correct symbol because in case of truncation cast we can
+/// only reason about truncated bytes but not the whole value. E.g. (char)(int
+/// x), we can store constraints for the first lower byte but we still don't
+/// know the root value. Also in case of promotion or converion we should
+/// store the root value instead of cast symbol, because we can always get
+/// a correct range using `castTo` metho. And we are not intrested in any
----------------
I think, we should definitely store the constraints as they appear in the analyzed code. This way we mix the infer logic into the constraint setting, which is bad.
I mean, we should simply store the constraint directly for the symbol as it is. And then only in `VisitSymbolCast` should we infer the proper value from the stored constraint (if we can).

(Of course, if we have related symbols (casts of the original symbol) then their constraints must be updated.)


================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:2281
+/// Return a symbol which is the best canidate to save it in the constraint
+/// map. We should correct symbol because in case of truncation cast we can
+/// only reason about truncated bytes but not the whole value. E.g. (char)(int
----------------



================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:2284
+/// x), we can store constraints for the first lower byte but we still don't
+/// know the root value. Also in case of promotion or converion we should
+/// store the root value instead of cast symbol, because we can always get
----------------



================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:2285
+/// know the root value. Also in case of promotion or converion we should
+/// store the root value instead of cast symbol, because we can always get
+/// a correct range using `castTo` metho. And we are not intrested in any
----------------



================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:2286
+/// store the root value instead of cast symbol, because we can always get
+/// a correct range using `castTo` metho. And we are not intrested in any
+/// constraints of cast symbol but the root symbol in `if` expression
----------------



================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:2322
+  if (IsTruncated) {
+    // Trancation occurred. High bits lost. We can't reason about ranges of
+    // the original(root) operand in this case, so we should not add it to the
----------------



================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:2346-2350
+/// FIXME: Update bigger casts. We only can reason about ranges of smaller
+/// types, because it would be too complicated to update, say, the entire
+/// `int` range if you only have knowledge that its lowest byte has been
+/// changed. So we don't touch bigger casts and they may be potentially
+/// invalid. For future, for:
----------------
Instead of a noop we should be more conservative in this case. We should invalidate (remove) the constraints of all the symbols that have more bits than the currently set symbol. However, we might be clever in cases of special values (e.g `0` or in case of the `true` rangeSet {[MIN, -1], [1, MAX]}).


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103096/new/

https://reviews.llvm.org/D103096



More information about the cfe-commits mailing list