[cfe-dev] PR16833 - Analyzer hangs trying to analyze

Wed Aug 13 02:35:39 PDT 2014

Sorry, my fault. I have investigated a problem deeper to write a PR and 
I found that I hit the same problem as the thread starter. The problem 
was a case:
case OPC_J ... OPC_JAL:
where
     OPC_J = (0x02 << 26),
     OPC_JAL = (0x03 << 26),

Sorry for a wrong information.

> This sounds like a bug.  We bound the amount of work the analyzer does, and for it to run out of memory in this way seems like we possibly have abandonment of memory or something.
>
> Can you file an LLVM PR with a preprocessed file that reproduces the issue?
>
>> On Aug 12, 2014, at 11:57 PM, Aleksei Sidorin <a.sidorin at samsung.com> wrote:
>>
>> Hello. I think the problem is a bit more complicated and is related to every case statement with a big amount of cases, not range cases only. Here is a sample file:
>> https://android.googlesource.com/platform/external/qemu/+/master/target-mips/translate.c
>> Analyzer can consume over 70 Gb of RAM while analyzing it and it is not enough - it crashes due to lack of memory.
>> It seems like the problem is an exponential growth of number of ExplodedNodes generated for every 'switch' but I'm not sure (I didn't investigate it yet).
>>
>>> That FIXME, and indeed the code to support case ranges, actually predates RangeConstraintManager, which implements most of what the FIXME is referring to. (That is, the technical/historical reason we couldn't do it symbolically is "it wasn't implemented yet".) So your change is in fact on the right track.
>>> T
>>> However, what you've done is construct the expression "$x >= LOWER && $x <= UPPER". This is correct, but it's not a form that RangeConstraintManager will actually handle right now:
>>>
>>> void clang_analyzer_eval(int);
>>> void test(int i) {
>>>     switch (i) {
>>>     case 1 ... 5:
>>>         return;
>>>     }
>>>     clang_analyzer_eval(x == 2); // should print FALSE, but will print UNKNOWN with your patch.
>>> }
>>>
>>> (clang_analyzer_eval is a special function used to probe the analyzer state, handled by the debug.ExprInspection checker.)
>>>
>>> The easiest way to fix this would be to use assumeDual on each expression individually. Then you end up with three cases: $x < LOWER, $x is in range, and $x > UPPER. Unfortunately, you'd then have to restructure the whole function, which currently assumes that you end up with exactly one state that can end up in the "default" branch. That wouldn't be too hard.
>>>
>>> The harder way to solve this is to actually teach [Range]ConstraintManager to recognize constraints like this, and do the right thing. The implementation is already set up to handle disjoint sets of ranges, so you could actually do this without changing everything. It's probably more messy than you'd like, though.
>>>
>>> The sneaky way to do this is to realize that "$x >= LOWER && $x <= UPPER" is actually equivalent to "$x - (LOWER - MIN) <= UPPER - (LOWER - MIN)", where MIN is the minimum value for the integer type, for integers with wraparound semantics. (RangeConstraintManager actually takes advantage of this in reverse to translate "$x - OFFSET <= VALUE" back into a pair of range disjunctions.) The only trouble with doing something like this is that wraparound is Hard? and you/we will want to be very careful that this actually handles everything correctly, for all the integer types we care about. A definite benefit of this is that we're back to only having one state, which prevents potential exponential explosion as analysis continues.
>>>
>>> What do you think we should do here?
>>> Jordan
>>>
>>>
>>>> On Aug 12, 2014, at 12:39 , Zach Davis <zdavkeos at gmail.com> wrote:
>>>>
>>>> All-
>>>>
>>>> I was looking over 16833 and had a few questions.
>>>>
>>>> The issue is the analyzer tries to concretize all the values in a case
>>>> statement with a range value:
>>>>
>>>>     switch (a) {
>>>>         case 1 ... 0xfffff: // hang's trying each value in the range
>>>>            return 1;
>>>>         default:
>>>>            return 0;
>>>>     }
>>>>
>>>> Is there any reason we can't symbolically evaluate 1 <= a <= 0xfffff,
>>>> rather than trying each 'a == v for v in 1..0xfffff'? I've attached a
>>>> patch that attempts to do this and it seems to work (it no longer
>>>> hangs anyway).  Is there a technical/historical reason for doing it
>>>> the current way?
>>>>
>>>> The FIXME above the block mentions using '"ranges" for NonLVals'. I'm
>>>> not sure what this means, or if any work has been done toward this.
>>>> If not, would there be any value in pursing a patch like this in the
>>>> meantime to resolve the performance issue?
>>>>
>>>> Thanks,
>>>> Zach
>>>> <16833_hang_on_large_case_range.patch>_______________________________________________
>>>> cfe-dev mailing list
>>>> cfe-dev at cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>>
>> -- 
>> Best regards,
>> Aleksei Sidorin
>> Software Engineer,
>> IMSWL-IMCG, SRR, Samsung Electronics
>>
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev

-- 
Best regards,
Aleksei Sidorin
Software Engineer,
IMSWL-IMCG, SRR, Samsung Electronics