[PATCH] D35110: [Analyzer] Constraint Manager Negates Difference

Fri May 25 14:53:06 PDT 2018

NoQ added inline comments.

================
Comment at: lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:265-276
+      const llvm::APSInt &from = i->From(), &to = i->To();
+      const llvm::APSInt &newFrom = (to.isMinSignedValue() ?
+                                     BV.getMaxValue(to) :
+                                     (to.isMaxSignedValue() ?
+                                      BV.getMinValue(to) :
+                                      BV.getValue(- to)));
+      const llvm::APSInt &newTo = (from.isMinSignedValue() ?
----------------
baloghadamsoftware wrote:
> NoQ wrote:
> > baloghadamsoftware wrote:
> > > NoQ wrote:
> > > > Hmm, wait a minute, is this actually correct?
> > > > 
> > > > For the range [-2³¹, -2³¹ + 1] over a 32-bit integer, the negated range will be [-2³¹, -2³¹] U [2³¹ - 1, 2³¹ - 1].
> > > > 
> > > > So there must be a place in the code where we take one range and add two ranges.
> > > The two ends of the range of the type usually stands for +/- infinity. If we add the minimum of the type when negating a negative range, then we lose the whole point of this transformation.
> > > 
> > > Example: If `A - B < 0`, then the range of `A - B` is `[-2³¹, -1]`, If we negate this, and keep the `-2³¹` range end, then we get `[-2³¹, -2³¹]U[1, 2³¹-1]`. However, this does not mean `B - A > 0`. If we make assumption about this, we get two states instead of one, in the true state `A - B`'s range is `[1, 2³¹-1]` and in the false state it is `[-2³¹, -2³¹]`. This is surely not what we want.
> > Well, we can't turn math into something we want, it is what it is.
> > 
> > Iterator-related symbols are all planned to be within range [-2²⁹, -2²⁹], right? So if we subtract one such symbol from another, it's going to be in range [-2³⁰, 2³⁰]. Can we currently infer that? Or maybe we should make the iterator checker to enforce that separately? Because this range doesn't include -2³¹, so we won't have any problems with doing negation correctly.
> > 
> > So as usual i propose to get this code mathematically correct and then see if we can ensure correct behavior by enforcing reasonable constraints on our symbols.
> I agree that the code should do mathematically correct things, but what I argue here is what math here means. Computer science is based on math, but it is somewhat different because of finite ranges and overflows. So I initially regarded the minimal and maximal values as infinity. Maybe that is not correct. However, I am sure that negating `-2³¹` should never be `-2³¹`. This is mathematically incorrect, and renders the whole calculation useless, since the union of a positive range and a negative range is unsuitable for any reasoning. I see two options here:
> 
> 1. Remove the extension when negating a range which ends at the maximal value of the type. So the negated range begins at the minimal value plus one. However, cut the range which begins at the minimal value of the type by one. So the negated range ends at the maximal value, as in the current version in the patch.
> 
> 2. Remove the extension as in 1. and disable the whole negation if we have the range begins at the minimal value.
> 
> Iterator checkers are of course not affected because of the max/4 constraints.
> However, I am sure that negating `-2³¹` should never be `-2³¹`. This is mathematically incorrect, and renders the whole calculation useless, since the union of a positive range and a negative range is unsuitable for any reasoning.

Well, that's how computers already work. And that's how all sorts of abstract algebra work as well, so this is totally mathematically correct. We promise to support the [[ https://en.wikipedia.org/wiki/Two's_complement | two's complement ]] semantics in the analyzer when it comes to signed integer overflows. Even though it's technically UB, most implementations follow this semantics and a lot of real-world applications explicitly rely on that. Also we cannot simply drop values from our constraint ranges in the general case because the values we drop might be the only valid values, and the assumption that at least one non-dropped value can definitely be taken is generally incorrect. Finding cornercases like this one is one of the strong sides of any static analysis: it is in fact our job to make the user aware of it if he doesn't understand overflow rules. If it cannot be said that the variable on a certain path is non-negative because it might as well be -2³¹, we should totally explore this possibility. If for a certain checker it brings no benefit because such value would be unlikely in certain circumstances, that checker is free to cut off the respective paths, but the core should perform operations precisely. I don't think we have much room for personal preferences here.

https://reviews.llvm.org/D35110