[PATCH] D103096: [analyzer] Implement cast for ranges of symbolic integers

Denys Petrov via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Apr 22 12:40:16 PDT 2022


ASDenysPetrov marked 10 inline comments as done.
ASDenysPetrov added a comment.

Thank you for the review @martong! Your work is not less harder then mine. I'll rework and update the revision ASAP.



================
Comment at: clang/lib/StaticAnalyzer/Checkers/ExprInspectionChecker.cpp:421-426
+  // Unwrap symbolic expression to skip argument casts on function call.
+  // This is useful when there is no way for overloading function in C
+  // but we need to pass different types of arguments and
+  // implicit cast occures.
+  Sym = Sym->ignoreCasts();
+
----------------
martong wrote:
> Does it really matter? I mean, why do we need this change?
I investigated. This changes is not obligatory now. I'll remove it.


================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:2227-2251
+  llvm::SmallSet<EquivalenceClass, 4> SimplifiedClasses;
+  // Iterate over all equivalence classes and try to simplify them.
+  ClassMembersTy Members = State->get<ClassMembers>();
+  for (std::pair<EquivalenceClass, SymbolSet> ClassToSymbolSet : Members) {
+    EquivalenceClass Class = ClassToSymbolSet.first;
+    State = EquivalenceClass::simplify(Builder, RangeFactory, State, Class);
+    if (!State)
----------------
martong wrote:
> I think this hunk should remain in `assignSymExprToConst`. Why did you move it?
I'll remove. It's unrelated one.


================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:2280-2290
+/// Return a symbol which is the best canidate to save it in the constraint
+/// map. We should correct symbol because in case of truncation cast we can
+/// only reason about truncated bytes but not the whole value. E.g. (char)(int
+/// x), we can store constraints for the first lower byte but we still don't
+/// know the root value. Also in case of promotion or converion we should
+/// store the root value instead of cast symbol, because we can always get
+/// a correct range using `castTo` metho. And we are not intrested in any
----------------
martong wrote:
> I think, we should definitely store the constraints as they appear in the analyzed code. This way we mix the infer logic into the constraint setting, which is bad.
> I mean, we should simply store the constraint directly for the symbol as it is. And then only in `VisitSymbolCast` should we infer the proper value from the stored constraint (if we can).
> 
> (Of course, if we have related symbols (casts of the original symbol) then their constraints must be updated.)
I see what you mean. I thought about this. Here what I've faced with.
# Let's say you meet `(wchar_t)x > 0`, which you store like a pair {(wchar_t)x, [1,32767]}.
# Then you meet `(short)x < 0`, where you have to answer whether it's `true` or `false`.
# So would be your next step? Brute forcing all the constraint pairs to find any `x`-related symbol? Obviously, O(n) complexity for every single integral symbol is inefficient.
What I propose is to "canonize" arbitrary types to a general form where this form could be a part of key along with `x` and we could get a constraint with a classic map complexity. So that:
# You meet `(wchar_t)x > 0`, which you convert `wchar_t` to `int16` and store like a pair {(int16)x, [1,32767]}.
# Then you meet `(short)x < 0`, where you convert `short` to `int16` and get a constraint.
That's why I've introduced `NominalTypeList`.
But now I admited your concern about arbitrary size of integers and will redesign my solution.


================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:2346-2350
+/// FIXME: Update bigger casts. We only can reason about ranges of smaller
+/// types, because it would be too complicated to update, say, the entire
+/// `int` range if you only have knowledge that its lowest byte has been
+/// changed. So we don't touch bigger casts and they may be potentially
+/// invalid. For future, for:
----------------
martong wrote:
> Instead of a noop we should be more conservative in this case. We should invalidate (remove) the constraints of all the symbols that have more bits than the currently set symbol. However, we might be clever in cases of special values (e.g `0` or in case of the `true` rangeSet {[MIN, -1], [1, MAX]}).
No, it's incorrect. Consider next:
```
int x;
if(x > 1000000 || x < 100000) 
  return;
// x (100'000, 1000'000) 
if((int8)x != 42) 
  return;
// x (100'000, 1000'000) && (int8)x (42, 42) 
```
We can't just remove or invalidate `x (100'000, 1000'000)` because this range will still stay //true//.
Strictly speaking `x` range should be updated with values 100394, 102442, 108586, ...,, 960554 and any other value within the range which has its lowest byte equals to 42.
We can't just update the `RangeSet` with such a big amount of values due to performance issues. So we just assume it as less accurate.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103096/new/

https://reviews.llvm.org/D103096



More information about the cfe-commits mailing list