[PATCH] D103440: [WIP][analyzer] Introduce range-based reasoning for addition operator

Valeriy Savchenko via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Jun 16 02:36:01 PDT 2021


vsavchenko added inline comments.


================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:1401-1402
+
+    // FIXME: This case in particular is resulting in failed assertion.
+    Range First = Range(Tmin, Max);
+    Range Second = Range(Min, Tmax);
----------------
vsavchenko wrote:
> manas wrote:
> > I changed the logic from using getCrucialPoints (which I mentioned in the e-mail thread) to simple checks to determine overflows using signedness. But the failed assertions again popped up. And I was unable to pin-point the issue here. Can someone help me?
> > 
> > Although, I am thinking of revising the above logic by using bitwise methods to detect overflows.
> This is actually one place that I won't expect an assertion failure.
> Can we get a bit more detail on it?  Is it again `From > To` (which will indeed be weird) or something different?
> 
> NOTE: Asking for help (either here or on StackOverflow) is a lot like filing a bug report, try to put as much information as possible, try to structure this information so it is easy to follow.  It's also good to tell people what you tried, instead of saying that you tried something and it didn't work.
OK, I downloaded your patch and ran the debugger.

It complains about different bit-width for ranges that the analyzer tries to intersect.  What I checked next: what are those ranges, so I took `begin()` from one of them and checked what are `From` and `To` there.
Here is one of them:
```
(const llvm::APSInt) $8 = {
  llvm::APInt = {
    U = {
      VAL = 22
      pVal = 0x0000000000000016
    }
    BitWidth = 4022311424
  }
  IsUnsigned = true
}
```
Woah, this `BitWidth` seems ridiculous!  What does it tell us?  It definitely didn't get there as part of any reasonable logic, right?  So, what can it be instead?  Only one answer here - garbage.  We have some sort of memory issue with integers that we use as part of our ranges!  So, let's see what type of information `Range` class actually stores:
```
class Range {
public:
  Range(const llvm::APSInt &From, const llvm::APSInt &To) : Impl(&From, &To) {
    assert(From <= To);
  }

...

private:
  std::pair<const llvm::APSInt *, const llvm::APSInt *> Impl;
};
```

What we have here is a pair of pointers, and you have:
```
  llvm::APSInt Min = LHS.From() + RHS.From();
  llvm::APSInt Max = LHS.To() + RHS.To();
  llvm::APSInt Tmin = ValueFactory.getMinValue(ResultType);
  llvm::APSInt Tmax = ValueFactory.getMaxValue(ResultType);
```
These are ALL stack variables. So, pointer to those point to adequate data ONLY while we are inside of our functions.  When we call another function, these pointers point into some data from some other function in the call stack, it doesn't point to anything even remotely resembling `llvm::APSInt` and we get `BitWidth = 4022311424`.

OK, we figured out the problem, what about a solution?  If you look at other implementations, you can notice a couple of interesting things.
When you do:
```
llvm::APSInt Tmin = ValueFactory.getMinValue(ResultType);
```
other code does something like:
```
const llvm::APSInt &Tmin = ValueFactory.getMinValue(ResultType);
```
Or when you do:
```
Range Second = Range(Min, Tmax);
```
The sibling code does:
```
return {RangeFactory, ValueFactory.getValue(Min), ValueFactory.getValue(Max)};
```

It looks like `ValueFactory` is the key here!  It actually manages the lifetime of those integers for you and, whenever you ask it about values, it gives you `llvm::APSInt &` (note the reference part) that will have the correct lifetime.

I hope this gives you some insight into how you can debug things like this on your own and how you can reason about what you see.

Another piece of advice is to look around, other `VisitBinaryOperator` methods have all the information you actually needed.  If you don't understand why we need `ValueFactory`, - experiment, ask us!  It's bad to just ignore it.


================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:1405-1406
+    RangeSet ResultRangeSet = RangeFactory.getRangeSet(First);
+    RangeSet ResultRangeSet2 = RangeFactory.add(ResultRangeSet, Second);
+    return ResultRangeSet2;
+  }
----------------
Why not just `return RangeFactory.add(ResultRangeSet, Second)`?

NOTE: variables with integers in their names is a big no-no.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103440/new/

https://reviews.llvm.org/D103440



More information about the cfe-commits mailing list