[cfe-dev] Process in UncheckedReturn

Wed Jun 8 20:06:28 PDT 2011

On Jun 8, 2011, at 7:08 PM, 章磊 wrote:

> Hi Ted,
> 
> 在 2011年6月9日 上午7:38，Ted Kremenek <kremenek at apple.com>写道：
> On Jun 6, 2011, at 8:14 PM, 章磊 wrote:
> 
>> Hi Ted,
>> 
>> Thank you very much for your reply and sorry for misunderstanding the word "aggregate".
>> 
>> I thought that we can do aggregation only when a path is fully analyzed, so when you mention the aggregation in checkDeadSymbols, i didn't get what you mean to do.
>> 
>> I think this counting strategy is great, but i have a little problem: 
>> 
>> IMO, it's a trade-off between counting statistics along individual paths and counting statistics in your way. When there is no path explosion, they give the same counts. But when the path explodes, they works different.
>> 
>> In the first way, we will fully analyze part of all the paths and we can give the accurate results of those paths. But we will leave out some paths because of the step limit and the analyzer should work in a particular way.
>> In the second way, we do not fully analyze a path, we will call checkEndPath while we are not meet the actual path end, so we can not guarantee the states we are counting are totally right(the aggregation in checkDeadSymbols is ok, since all the dead symbols will not be checked; but in checkEndPath some symbols may be checked after this "step limited end path"). And could this results be the "representative evidence of behavioral trends" ?
>> 
>> If the second way is ok, and since we should not force the analyzer to exploring the state space in a particular way, i will do it in the second way.
> 
> Hi Lei,
> 
> Let me focus around this particular comment:
> 
>>> In the first way, we will fully analyze part of all the paths and we can give the accurate results of those paths. 
> 
> I think this is a misconception.  We never will fully analyze part of all the paths in a manner where each count in our statistics can be considered to have come from a single path.  Consider:
> 
> 
>   void foo() {
>     int x = bar();
>     if (x > 0)
>      x++;
>     else
>      x--;
>    // never use 'x' afterwards
>    ...
>  }
> 
> The exact operations here are not important.  What I want to stress is that, regardless of path exploration, we will merge the two paths after the if..else... because 'x' is not used afterwards.  When a variable is no longer live, the analysis engine prunes that variable binding, and any associated symbolic constraints, from the state.  In the example above, this means that the GRState along the two paths (one for the true branch, the other for the false branch) will be the same.  At this point, the paths are *merged* in the ExplodedGraph, because analyzing either of them from that point on would produce the same analysis results.  This is a key optimization in the analyzer engine to reduce the exponential growth from path exploration.
> 
> My point is that "counting paths" just doesn't make sense.  Paths get merged all the time, or we may come up with other ways to cut out paths from the ExplodedGraph that don't need to be explicitly represented or explicitly explored.  Definitely this impacts how we count statistics, but I'm suggesting we pick a scheme that seems less sensitive to how we enumerate paths, and more on the behavior we observed from those paths.
>  
>  
> Thank you very much to expound this and sorry for taking this long time to get your idea.

No need to apologize.  This is something very worth discussing.

BTW, my opinion is just my opinion.  You are free to experiment with this any way you wish.

>  
> The motivation behind my suggesting for how to aggregate statistics was simple: forgo any notion that we are enumerating paths, and instead accumulate distinct observations where behavior was consistent or different.  Here I actually try and marginalize out the effects of path explosion or merging, and instead focus on the behavior at particular program points.
> 
>> In the second way, we do not fully analyze a path, we will call checkEndPath while we are not meet the actual path end, so we can not guarantee the states we are counting are totally right(the aggregation in checkDeadSymbols is ok, since all the dead symbols will not be checked; but in checkEndPath some symbols may be checked after this "step limited end path").
> 
> I don't fully understand this comment.  All I am suggesting is that aggregate statistics when a symbol no longer is being tracked.  When a symbol "dies" early, we do that aggregation in checkDeadSymbols.  If a symbol lives to the end of a path, when do it in checkEndPath.
> 
> Can you clarify, precisely, what you mean by 'but in checkEndPath some symbols may be checked after this "step limited end path"'?  checkEndPath will never be called when we are step limited.  It is only called when we actually hit the end of a function.  When we are step limited, we simply stop analyzing a path.
>  
> Sorry, my bad, I mistaken the checkEndAnalysis as checkEndPath.
>  
> 
> Cheers,
> Ted
> 
> I still corcern about that how the second way really works, so i will take this way and do some test.

Absolutely.  With these kinds of statistical tricks, experimentation is really key to finding what works best.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20110608/673ba06a/attachment.html>