[cfe-dev] PR16833 - Analyzer hangs trying to analyze large case ranges

Thu Sep 4 19:22:08 PDT 2014

Looks like Aleksei's taken some of these ideas and run with them in http://reviews.llvm.org/D5102. Aleksei, I haven't had time to look at your implementation yet, but if you haven't seen it yet here's some of the issues Zach and I were discussing.

Jordan

On Aug 21, 2014, at 20:13 , Jordan Rose <jordan_rose at apple.com> wrote:

> Hm, okay. Here's some basic structure for what happens now:
> 
> 1. SimpleConstraintManager::canReasonAbout decides whether or not a symbolic expression is even worth processing.
> 2. SimpleConstraintManager::assumeAux (the one that takes a NonLoc) does some preprocessing and canonicalizing so that it can identify the parts of a constraint.
> 3. SimpleConstraintManager::assumeSymRel makes sure all the numbers involved are compatible, and then dispatches to the various RangeConstraintManager methods
> 4. RangeConstraintManager::assumeSym* handle the six possible relational operators and the minute tricky differences between them, applying the new constraints as intersections with the existing range sets.
> 
> If you really want to put this into the constraint manager, I think I would still do this the "sneaky way", by canonicalizing your "LOW < $sym < HIGH" into "$sym - LOW < HIGH - low" in SimpleConstraintManager::assumeAux. That achieves maximal code reuse. If you weren't to do it this way, I think you'd have to introduce a new assumeInRange method that could be overridden; this would probably be easier to get right but I'm not sure it would actually add value.
> 
> As a side note, I would consider writing your disjunction with & rather than &&. Because of &&'s short-circuiting, you'd never actually see that expression show up in assume() otherwise, and handling & would make it possible to handle similar cases in user code in the future.
> 
> Please continue to e-mail with any questions you may have!
> Jordan
> 
> 
> On Aug 18, 2014, at 7:41 , Zach Davis <zdavkeos at gmail.com> wrote:
> 
>> Let me see if I understand what you are saying:
>> 
>> We are able to symbolically evaluate (correctly) the "x >= lower && x
>> <= upper" expression with SVal's. However, we can't 'remember' this
>> result because we don't construct a Constraint on the expression, and
>> even if we did the existing RangeConstraint manager doesn't support
>> it.
>> 
>> I feel like the RangeConstraint manager is really the best way to fix
>> this, even if it's not easy.  It doesn't seem to be a huge issue
>> anyway, if someone _really_ wants to analyse a file with such an
>> expression, they can use the patch i supplied in the mean time.
>> 
>> 
>> Unfortunately, looking through the Constraint manager(s) code, I have
>> no idea where to start.
>> 
>> Zach
>> 
>> On Tue, Aug 12, 2014 at 9:38 PM, Jordan Rose <jordan_rose at apple.com> wrote:
>>> That FIXME, and indeed the code to support case ranges, actually predates
>>> RangeConstraintManager, which implements most of what the FIXME is referring
>>> to. (That is, the technical/historical reason we couldn't do it symbolically
>>> is "it wasn't implemented yet".) So your change is in fact on the right
>>> track.
>>> 
>>> However, what you've done is construct the expression "$x >= LOWER && $x <=
>>> UPPER". This is correct, but it's not a form that RangeConstraintManager
>>> will actually handle right now:
>>> 
>>> void clang_analyzer_eval(int);
>>> void test(int i) {
>>> switch (i) {
>>> case 1 ... 5:
>>> return;
>>> }
>>> clang_analyzer_eval(x == 2); // should print FALSE, but will print UNKNOWN
>>> with your patch.
>>> }
>>> 
>>> (clang_analyzer_eval is a special function used to probe the analyzer state,
>>> handled by the debug.ExprInspection checker.)
>>> 
>>> The easiest way to fix this would be to use assumeDual on each expression
>>> individually. Then you end up with three cases: $x < LOWER, $x is in range,
>>> and $x > UPPER. Unfortunately, you'd then have to restructure the whole
>>> function, which currently assumes that you end up with exactly one state
>>> that can end up in the "default" branch. That wouldn't be too hard.
>>> 
>>> The harder way to solve this is to actually teach [Range]ConstraintManager
>>> to recognize constraints like this, and do the right thing. The
>>> implementation is already set up to handle disjoint sets of ranges, so you
>>> could actually do this without changing everything. It's probably more messy
>>> than you'd like, though.
>>> 
>>> The sneaky way to do this is to realize that "$x >= LOWER && $x <= UPPER" is
>>> actually equivalent to "$x - (LOWER - MIN) <= UPPER - (LOWER - MIN)", where
>>> MIN is the minimum value for the integer type, for integers with wraparound
>>> semantics. (RangeConstraintManager actually takes advantage of this in
>>> reverse to translate "$x - OFFSET <= VALUE" back into a pair of range
>>> disjunctions.) The only trouble with doing something like this is that
>>> wraparound is Hard™ and you/we will want to be very careful that this
>>> actually handles everything correctly, for all the integer types we care
>>> about. A definite benefit of this is that we're back to only having one
>>> state, which prevents potential exponential explosion as analysis continues.
>>> 
>>> What do you think we should do here?
>>> Jordan
>>> 
>>> 
>>> On Aug 12, 2014, at 12:39 , Zach Davis <zdavkeos at gmail.com> wrote:
>>> 
>>> All-
>>> 
>>> I was looking over 16833 and had a few questions.
>>> 
>>> The issue is the analyzer tries to concretize all the values in a case
>>> statement with a range value:
>>> 
>>>   switch (a) {
>>>       case 1 ... 0xfffff: // hang's trying each value in the range
>>>          return 1;
>>>       default:
>>>          return 0;
>>>   }
>>> 
>>> Is there any reason we can't symbolically evaluate 1 <= a <= 0xfffff,
>>> rather than trying each 'a == v for v in 1..0xfffff'? I've attached a
>>> patch that attempts to do this and it seems to work (it no longer
>>> hangs anyway).  Is there a technical/historical reason for doing it
>>> the current way?
>>> 
>>> The FIXME above the block mentions using '"ranges" for NonLVals'. I'm
>>> not sure what this means, or if any work has been done toward this.
>>> If not, would there be any value in pursing a patch like this in the
>>> meantime to resolve the performance issue?
>>> 
>>> Thanks,
>>> Zach
>>> <16833_hang_on_large_case_range.patch>_______________________________________________
>>> cfe-dev mailing list
>>> cfe-dev at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>>> 
>>> 
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20140904/f489e29e/attachment.html>