[cfe-dev] [analyzer] exploration strategies and paths
George Karpenkov via cfe-dev
cfe-dev at lists.llvm.org
Wed Jan 31 21:59:51 PST 2018
On sqlite there are 8 new reports and one removed report.
I think it’s natural that the number is high. In many cases we are saving 3x of the analyzer budget,
which is then spent finding new bugs.
> On Jan 31, 2018, at 9:52 PM, Anna Zaks <ganna at apple.com> wrote:
>
>
>
>> On Jan 31, 2018, at 9:41 PM, George Karpenkov <ekarpenkov at apple.com> wrote:
>>
>>
>>
>>> On Jan 31, 2018, at 8:46 PM, Anna Zaks <ganna at apple.com> wrote:
>>>
>>>
>>>> On Jan 31, 2018, at 5:00 PM, George Karpenkov <ekarpenkov at apple.com> wrote:
>>>>
>>>> The list did not like a posting with many images,
>>>> so I have posted an evaluation to Phabricator: https://reviews.llvm.org/M1
>>>>
>>>> The text part was:
>>>>
>>>> After fixing a few bugs, another evaluation of the approach shows considerably better results.
>>>>
>>>> On openssl:
>>>>
>>>> 9 reports added
>>>> 1 report removed
>>>> On postgresql:
>>>>
>>>> 377 reports added
>>>> 43 reports removed
>>>> On sqlite3 + a few other misc files:
>>>>
>>>> 239 reports added
>>> This is a lot of additional reports! Are there actually that many bugs in that codebase that need to be fixed?
>>
>> Out of 239 or in general? The 239 reports mostly come from non-sqlite files.
>
> Can you provide data just for sqlite?
>
> For postgresql the number of additional reports also seems very high.
>
>> In general, for most practical purposes C codebases provide an infinite supply of bugs :P
>>
>> On a more serious note, I don’t think this is very surprising.
>> The previous emails in this chain have shown that for loops, the analyzer has a very high chance of entering
>> a degenerate behavior where the longest path through the loop will be evaluated first,
>> and a large chunk of the analyzer budget is then spent on going around the loop in circles.
>>
>> Under the new exploration strategy, paths which increase coverage are explored first, and then we can actually find
>> bugs with the budget which is no longer spent going in circles.
>>
>>> I think we need to manually evaluate these reports.
>>
>> Yes, I’ve looked through quite a few of them; the false positive ratio actually seems to be getting lower,
>> since the probability of a report being a false positive grows with path length,
>> and path lengths are getting much shorter.
>>
>>> Also, we have to make sure uniquing works. Do all regression tests pass?
>>
>> Yes.
>>>> 1 report removed
>>>> Note on histograms (here and below)
>>>>
>>>> -> Histograms only show the ratio for the same bugs (compared using issue hash);
>>>> that is, if the histogram says “decrease by a factor of three”, it means the new approach finds the *same* bug
>>>> with a path one third the length of the original
>>>> -> Histograms omit data points where the path length has remained the same
>>>> (as otherwise they completely dominate the histogram)
>>>> -> Relative histograms are provided as both ratio and logarithm of the ratio.
>>>> Logarithms of the ratio are convenient as they are symmetric in case changes balance out
>>>> (e.g. log(1/2) = -log(2/1))
>>>>
>>>>> On Jan 30, 2018, at 4:23 PM, George Karpenkov <ekarpenkov at apple.com> wrote:
>>>>>
>>>>> Preliminary evaluation of a patch which prefers to first explore nodes associated with statements that were not seen before:
>>>>>
>>>>> On openssl:
>>>>>
>>>>> - Adds four reports
>>>>> - Removes four reports
>>>>> - Path lengths before: 317, 75, 75, 72, 70, 58, 50, 50, 44, 36, 23, 23, 21, 21, 20, 20, 19, 19, 19, 19, 18, 18, 18, 16, 15, 15, 15, 14, 13, 13, 12, 11, 11, 9, 7, 7, 6, 4
>>>>> - Path lengths after: 72, 60, 59, 53, 53, 52, 46, 38, 37, 30, 29, 28, 23, 21, 20, 19, 19, 19, 19, 19, 18, 16, 15, 15, 15, 15, 13, 13, 12, 12, 11, 9, 8, 7, 7, 7, 6, 4
>>>>>
>>>>> The quality of the added reports seems higher, mainly due to the fact that report length is shorter.
>>>>>
>>>>> On postgresql:
>>>>>
>>>>> - Added 80 reports
>>>>> - Removed 154 reports
>>>>> -> Out of those, 72 are reports on the yacc/bison autogenerated files, so whatever the cause is, good thing they are gone
>>>>> - The overall number of reports is 1188
>>>>> - Path lengths are lower overall, but not as dramatically
>>>>> - For many reports, I am quite confused as to why they got removed
>>>>>
>>>>> On sqlite:
>>>>>
>>>>> - 7 inserted, 7 removed
>>>>>
>>>>>> On Jan 30, 2018, at 1:10 PM, Artem Dergachev <noqnoqneo at gmail.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 30/01/2018 12:40 PM, Gábor Horváth via cfe-dev wrote:
>>>>>>> Hi George, Artem,
>>>>>>>
>>>>>>> I am glad that you are looking into this problem!
>>>>>>>
>>>>>>> On 30 January 2018 at 01:12, George Karpenkov via cfe-dev <cfe-dev at lists.llvm.org> wrote:
>>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> I have recently been investigating bug reports with very long analyzer
>>>>>>> paths (more than a few hundred nodes).
>>>>>>> In many such cases the path is long for no good reason: namely,
>>>>>>> the analyzer goes around a loop three times before
>>>>>>> going further.
>>>>>>> The issue is surprisingly common, and it was exacerbated with a
>>>>>>> recent bump of analyzer thresholds.
>>>>>>>
>>>>>>> The problem is reproduced on the following file:
>>>>>>>
>>>>>>> ```
>>>>>>> extern int coin();
>>>>>>>
>>>>>>> int foo() {
>>>>>>>   int *x = 0;
>>>>>>>   while (coin()) {
>>>>>>>     if (coin())
>>>>>>>       return *x;
>>>>>>>   }
>>>>>>>   return 0;
>>>>>>> }
>>>>>>>
>>>>>>> void bar() {
>>>>>>>   while (coin())
>>>>>>>     if (coin())
>>>>>>>       foo();
>>>>>>> }
>>>>>>> ```
>>>>>>>
>>>>>>> While the shortest path to the error does not loop around, the
>>>>>>> current version of the analyzer
>>>>>>> will go around the loop three times before going further
>>>>>>> (and we are quite fortunate that the unrolling limit for loops is
>>>>>>> three; otherwise it would keep going
>>>>>>> until that limit is reached).
>>>>>>>
>>>>>>> Multiple issues were discovered during the investigation.
>>>>>>>
>>>>>>> 1. Analyzer queue does not have a concept of priority, and
>>>>>>> performs a simple DFS by default.
>>>>>>> Thus if the successor of the if-branch under the loop in “bar"
>>>>>>> containing the desired destination is generated second,
>>>>>>> it will never be evaluated until the loop exploration limit is
>>>>>>> exhausted.
>>>>>>>
>>>>>>> 2. The previous issue slows down the exploration, but is not
>>>>>>> enough to get a pathological behavior of ultra-long paths.
>>>>>>> The second problem is a combination of:
>>>>>>> a) The block counter is not part of a node's identity, so a node A
>>>>>>> with a small block counter can be merged into a node B with a
>>>>>>> large block counter,
>>>>>>> and the resulting node keeps the block counter associated with B.
>>>>>>>
>>>>>>>
>>>>>>> Sorry for the questions, just wanted to clarify some things. You mean ExplodedNodes? By merge, you mean the same thing as "caching-out"?
>>>>>>
>>>>>> Yeah, George noticed that if we construct the same ExplodedNode on two different paths that have different block counts, we'd cache-out on the latter path, while the worklist element of the first path would still possess the original block count.
>>>>>>
>>>>>> Which happens a lot when we're evaluating foo() conservatively in this example.
>>>>>>
>>>>>> This isn’t directly related to our problem though, as I noticed in http://lists.llvm.org/pipermail/cfe-dev/2018-January/056719.html.
>>>>>>
>>>>>>
>>>>>>> b) The issue in (a) is triggered due to our heuristic to abandon
>>>>>>> the function’s exploration and switch to conservative evaluation
>>>>>>> if we are already *inside* the function and the block limit has
>>>>>>> been reached.
>>>>>>>
>>>>>>> Issue (1) combined with (2-b) causes the problematic behavior: the
>>>>>>> issue is discovered on the longest path first,
>>>>>>> and by the time the shortest path gets to “bar”, the block limit
>>>>>>> is already reached, and the switch to conservative evaluation is
>>>>>>> performed.
>>>>>>>
>>>>>>> Thus there are two mitigation strategies currently being evaluated:
>>>>>>>
>>>>>>> i) Remove the heuristic in (2-b)
>>>>>>> ii) Use a priority queue to hold nodes which should be explored;
>>>>>>> prefer nodes which give new source code coverage over others
>>>>>>> (or alternatively prefer nodes with least depth of loop stack)
>>>>>>>
>>>>>>> Artem and I have evaluated option (i), and the results were
>>>>>>> surprisingly good: some reports disappear, and slightly more
>>>>>>> reports appear.
>>>>>>> The quality of the new reports seems to be slightly better, and I
>>>>>>> am still trying to figure out exact reasons.
>>>>>>> I suspect merges resulting from heuristic (2-b) cause us to lose
>>>>>>> some actually valid reports.
>>>>>>>
>>>>>>>
>>>>>>> I also find the results surprising. If you have more information about the reasons please do not forget to follow up this thread. We are curious :)
>>>>>>>
>>>>>>>
>>>>>>> Option (ii) has not been evaluated fully yet, but current
>>>>>>> experiments show slightly more reports (5-10%), and a radical
>>>>>>> decline in report lengths
>>>>>>> (e.g. from 400+ to <100 for largest reports)
>>>>>>>
>>>>>>> Are there any thoughts on the matter?
>>>>>>>
>>>>>>> Personally I think we should do both (i) and (ii), even if they
>>>>>>> would shake up the results.
>>>>>>> - The original idea behind heuristic (2-b) was to be able to produce
>>>>>>> a report even if we are out of budget, but since it actually
>>>>>>> results in fewer reports,
>>>>>>> I think the data does not validate the approach.
>>>>>>>
>>>>>>> - Option (ii) is AFAIK how most similar engines work, and should
>>>>>>> get us much larger coverage (and shorter paths) for the same node
>>>>>>> budget,
>>>>>>> even at the cost of log(N) overhead of the priority queue.
>>>>>>> Moreover, not having the priority queue will bite us later if we
>>>>>>> ever decide to further
>>>>>>> increase the analyzer budget or to increase the unroll limit.
>>>>>>>
>>>>>>>
>>>>>>> I wonder what the performance implications will be. But I also like the idea of having a priority queue. If we find that we get more and better reports
>>>>>>> but also have worse performance, we can also consider reducing the analysis budget slightly.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Gábor
>>>>>>>
>>>>>>>
>>>>>>> George
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> cfe-dev mailing list
>>>>>>> cfe-dev at lists.llvm.org
>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>