[cfe-dev] Process in UncheckedReturn

Thu Sep 22 01:42:13 PDT 2011

Hi Ted,

Sorry for this late reply.

I modified the UncheckedReturnChecker as you suggested, and test it on some
small code base(larbin, cgicc, skia, vcg...). It gives some bugs when i
simply set the unchecked ratio to 75%. It seems OK but still has some
problems can not fix right now.

1. Because lack of taint analysis in the engine, we can not pass some
checkedreturn state through SVals. Here are some examples:

bool b = f() == 0;

In this example, we can not pass the checked state(f() was checked) to decl
b right now.

int a = f();
if (-a) {
  ...
}
return;

In this example, decl "a" has a SVal bind to it. I use this SVal as a symbol
and record the checked state in the form DenseMap<symbol, bool> . When our
engine visit the unaryoperator "-a", it will remove the binding form SVal to
the decl "a" because there is no further use of decl "a". So later when the
engine visit the branch condition, there is no binding which i need to get
the SVal as a symbol.

Right now, i leave these cases behind and they will make some miscount.

2. Bug report and count strategy need to be improved.

What you think about these?

在 2011年6月9日 上午11:06，Ted Kremenek <kremenek at apple.com>写道：

>
> On Jun 8, 2011, at 7:08 PM, 章磊 wrote:
>
> Hi Ted,
>
> 在 2011年6月9日 上午7:38，Ted Kremenek <kremenek at apple.com>写道：
>
>>   On Jun 6, 2011, at 8:14 PM, 章磊 wrote:
>>
>> Hi Ted,
>>
>> Thank you very much for your reply and sorry for misunderstanding the word
>> "aggregate".
>>
>> I thought that we can do aggregation only when a path is fully analyzed,
>> so when you mention the aggregation in checkDeadSymbols, i didn't get what
>> you mean to do.
>>
>> I think this counting strategy is great, but i have a little problem:
>>
>> IMO, it's a trade-off between counting statistics along individual paths
>> and counting statistics in your way. When there is no path explosion, they
>> give the same counts. But when the path explodes, they works different.
>>
>> In the first way, we will fully analyze part of all the paths and we can
>> give the accurate results of those paths. But we will leave out some paths
>> because of the step limit and the analyzer should work in a particular way.
>> In the second way, we do not fully analyze a path, we will call
>> checkEndPath while we are not meet the actual path end, so we can not
>> guarantee the states we are counting are totally right(the aggregation in
>> checkDeadSymbols is ok, since all the dead symbols will not be checked; but
>> in checkEndPath some symbols may be checked after this "step limited end
>> path"). And could this results be the "representative evidence of
>> behavioral trends" ?
>>
>> If the second way is ok, and since we should not force the analyzer to exploring
>> the state space in a particular way, i will do it in the second way.
>>
>>
>> Hi Lei,
>>
>> Let me focus around this particular comment:
>>
>>   In the first way, we will fully analyze part of all the paths and we
>> can give the accurate results of those paths.
>>
>>
>> I think this is a misconception.  We never will fully analyze part of all
>> the paths in a manner where each count in our statistics can be considered
>> to have come from a single path.  Consider:
>>
>>
>>   void foo() {
>>     int x = bar();
>>     if (x > 0)
>>      x++;
>>     else
>>      x--;
>>    // never use 'x' afterwards
>>    ...
>>  }
>>
>> The exact operations here are not important.  What I want to stress is
>> that, regardless of path exploration, we will merge the two paths after the
>> if..else... because 'x' is not used afterwards.  When a variable is no
>> longer live, the analysis engine prunes that variable binding, and any
>> associated symbolic constraints, from the state.  In the example above, this
>> means that the GRState along the two paths (one for the true branch, the
>> other for the false branch) will be the same.  At this point, the paths are
>> *merged* in the ExplodedGraph, because analyzing either of them from that
>> point on would produce the same analysis results.  This is a key
>> optimization in the analyzer engine to reduce the exponential growth from
>> path exploration.
>>
>> My point is that "counting paths" just doesn't make sense.  Paths get
>> merged all the time, or we may come up with other ways to cut out paths from
>> the ExplodedGraph that don't need to be explicitly represented or explicitly
>> explored.  Definitely this impacts how we count statistics, but I'm
>> suggesting we pick a scheme that seems less sensitive to how we enumerate
>> paths, and more on the behavior we observed from those paths.
>>
>>
>
> Thank you very much to expound this and sorry for taking this long time to
> get your idea.
>
>
> No need to apologize.  This is something very worth discussing.
>
> BTW, my opinion is just my opinion.  You are free to experiment with this
> any way you wish.
>
>
>
>
>>  The motivation behind my suggesting for how to aggregate statistics was
>> simple: forgo any notion that we are enumerating paths, and instead
>> accumulate distinct observations where behavior was consistent or different.
>>  Here I actually try and marginalize out the effects of path explosion or
>> merging, and instead focus on the behavior at particular program points.
>>
>>  In the second way, we do not fully analyze a path, we will call
>> checkEndPath while we are not meet the actual path end, so we can not
>> guarantee the states we are counting are totally right(the aggregation in
>> checkDeadSymbols is ok, since all the dead symbols will not be checked; but
>> in checkEndPath some symbols may be checked after this "step limited end
>> path").
>>
>>
>> I don't fully understand this comment.  All I am suggesting is that
>> aggregate statistics when a symbol no longer is being tracked.  When a
>> symbol "dies" early, we do that aggregation in checkDeadSymbols.  If a
>> symbol lives to the end of a path, when do it in checkEndPath.
>>
>> Can you clarify, precisely, what you mean by 'but in checkEndPath some
>> symbols may be checked after this "step limited end path"'?  checkEndPath
>> will never be called when we are step limited.  It is only called when we
>> actually hit the end of a function.  When we are step limited, we simply
>> stop analyzing a path.
>>
>
> Sorry, my bad, I mistaken the checkEndAnalysis as checkEndPath.
>
>
>>
>> Cheers,
>> Ted
>>
>
> I still corcern about that how the second way really works, so i will take
> this way and do some test.
>
>
> Absolutely.  With these kinds of statistical tricks, experimentation is
> really key to finding what works best.
>

-- 
Best regards!

Lei Zhang
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20110922/027d7f07/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: UncheckedReturn.patch
Type: application/octet-stream
Size: 20665 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20110922/027d7f07/attachment.obj>