[cfe-commits] [Patch]LocationContext Support for CheckedReturn

Mon Apr 18 02:01:44 PDT 2011

Hi Ted,

Sorry for this late reply, i focused on something else these days.

I think it's great to keep a statistical checker working fine in IPA mode,
so if the statistics for a checker can store in the checker instance, i'm
glad to make it happen.

IMO, statistical checker should use (1) "global" data. When i started this
job, as you say the lifetime of a Checker object was bound to the lifetime
of an ExprEngine. But my checker want to gather "global" data, so i creat a
global DenseMap to store them.

Later, after your reply to my patch, i thought maybe it should be done in
two steps, first step in the LocationContexts, second step in some structure
has the lifetime of whole translation unit analysis.

Now, since the Checkers managed by CheckerManager, cam we store the "global"
data rather than the "local" data in the Checker instance?

Then we can focus on how to correctly gather the statistics in specific
checker.

What i am afraid most is to miscount the statistics, this will make our
statistical checker meaningless.

here are two examples, the first one:

int f1();

void f2() {
  f1();
}

int main() {
  f2();
  f2();
}

So there are two f2() calls in main. When we inlining the first f2(), f1()'s
unchecked return count +1; but when we inlining the second f2(), f1()'s
unchecked return count should not change.
The unchecked return count should depends on how many times the programmer
miswrite the code, not how many times a function was called. So it likes a
"textual" problem.

Then comes the second example:

int f1();
int f2(int arg);
int f3(int arg);

int f4(int arg) {
  int i = f1();
  if (arg) {
    f2(i);
    return 0;
  }

  if (i) {
    f3(i);
    return 1;
  }

  return 2;
}

There are several paths in the Example, in some paths i is checked, while
not in others. How do we count the statistics?
IMO, unless in all the paths i was checked, we do the checked count +1;
otherwise the unchecked count +1;

After some more thought, i think statistical checkers(unchecked return) need
not to deactivate IPA. Instead, we can do something like:

When enter a function body in the first time, we do the unchecked return
analysis and update the statistic.
When enter a function body what we have analysised before, we deactivate the
unchecked return checker.

What you say about all this?

在 2011年4月15日 下午2:17，Ted Kremenek <kremenek at apple.com>写道：

> Hi Lei,
>
> Just to follow up, I know this conversation has been a bit roundabout.  I'm
> trying to get a clear design for these kind of checkers, and I'm gradually
> understanding the requirements here.
>
> I'm a bit nervous about caching data in the checker itself, since we've
> recently been changing the lifetime of a Checker object.  Previously the
> lifetime of a Checker object was bound to the lifetime of an ExprEngine, but
> now Checkers are managed by CheckerManager, and can conceptually be used by
> multiple ExprEngines.
>
> Keeping data in LocationContexts is a both hit and miss because they were
> intended to model context-sensitivity.  I can completely buy the argument
> that checkers may wish to record information in LocationContexts that are,
> well, tied to a given location context.  Your statistical checker, however,
> will be making assumptions about the LocationContext that doesn't seem true
> in all cases.
>
> I'm also a bit skeptical that all statistical checks would run in a strict
> "textual" (or one-function-at-a-time, no inling) versus the other extreme of
> whole program IPA.  It seems like different statistical checkers will want
> the freedom to collect, and possibly merge, different kinds of statistics,
> possibly from different LocationContexts.  That said, how statistics are
> collected is strictly a policy decision of the individual checker, but I'd
> be much happier that a statistical checker was fine with working in IPA mode
> as well; it just was smart about what statistics to collect.
>
> In a nutshell, I see three kinds of data the checker may wish to collect:
>
> 1) "global" data, that spans the entirety of all code analyzed by a Checker
> instance
>
> 2) "location context" specific data, that spans the analysis of a given
> *call* to a function
>
> 3) per-path/per-state data, which is stored in the GDM of a GRState
>
> I think your statistical checker will likely one to use (1) or (2).  You
> outlined the restriction of using (2) and assuming that a single
> LocationContext would be made for each analyzed function.  I think that's a
> fairly brittle restriction that's likely to bite us later.  It seems to me
> that if a Checker wants to keep statistics for just a given function
> (regardless of LocationContexts), then that data isn't context-sensitive,
> and should be stored in the Checker instance.  Essentially, the Checker
> object would have a DenseMap mapping from FunctionDecls to a statistics map.
>  If a Checker saw that it already had statistics for a given FunctionDecl,
> it could either collect/merge additional statistics, or decide not to
> collect any additional statistics.  This would allow a statistical checker
> to work fine in the case of multiple LocationContexts per FunctionDecl.
>
> What do you think?
>
> Ted
>
> On Apr 13, 2011, at 7:39 PM, Ted Kremenek wrote:
>
> Hi Lei,
>
> I think there are multiple possible approaches here.
>
> First, I think it's reasonable for us to add a "Generic Data Map"
> mechanism, similar to what we have in GRState, for keeping checker-specific
> data in LocationContexts.  It needs to be a general mechanism, however, so
> that the LocationContext API doesn't have anything relating to a specific
> checker; just a generic way to hold data.  That data also needs to be
> deallocated once the LocationContext object is freed.
>
> That said, a simpler mechanism may just be to keep a DenseMap in the
> statistical checker from LocationContexts -> statistics, and then notify a
> checker when the LocationContext object gets destroyed.  Actually, we can
> possibly implement the Generic Data Map mechanism in this way.
>
> Concerning "inlining", what you call "textual" analysis versus WPA mode
> reduces down to us having a "policy" object that dictates when we inline and
> when we don't.  With a policy object, we could change when and how IPA is
> done, and reuse the same analysis engine.  Different checkers could then
> articulate that they require a specific IPA policy in place (in the case of
> this checker, no IPA).  This idea is only half-formed, but I think there are
> multiple places in the analyzer engine where various "policies" could be
> used to help guide how analysis is done (e.g., how are loops handled, etc.).
>
> If your checker truly won't work in IPA, then it is probably okay to just
> use a DenseMap in the Checker itself, and just invalidate that DenseMap when
> we start analyzing a function.  That seems like the least complicated
> solution, although it prohibits a single instance of the Checker from being
> used simultaneously by multiple ExprEngines.  After thinking about this
> some, I think this is a reasonable approach, just as long as you understand
> the restrictions here and that the statistics really will only work when
> analyzing a single function at a time.
>
> On Apr 11, 2011, at 11:10 PM, 章磊 wrote:
>
> Hi Ted,
>
> I think statistical checker is some different from the checkers we have.
>
> IMO, when doing statistical check, the static analyzer should not work in
> the "whole-program-analysis" mode, but in the "textual" statistic mode.
>
> In this mode, we treat every top level functiondecl (also the inline
> function) has a body as a cell. Then we do symbolic execution in every cell
> to check CheckedReturn state for calls (i.e, a call to foo()), and textually
> count a function decl's (i.e, foo()) CheckedReturn state from all the
> occurrences (calls).
>
> So we need a new action to activate this "textual" mode, and in this mode
> the inline IPA should be deactivated (i think the function summary IPA also
> works). Then every cell will have only one LocationContext, right?
>
> Here is a example to make it clearer.
>
> int f1();
>
> int main() {
>   f2();
>   f3();
> }
>
> void f2() {
>   f1();
> }
>
> void f3() {
>   f2();
>   if (f1())
>     return;
> }
>
> There are two "textual" calls to f1(), the one in f3() is checked and the
> one in f2() is unchecked. But if we do the check in WPA Mode(IMO, as viewed
> from "execute"), there are 3 calls to f1(), one of them is checked and the
> other two are not.
>
> We should do it in the "textual" way, right?
>
> What you say about this?
>
> 2011/4/12 Ted Kremenek <kremenek at apple.com>
>
>> Lei,
>>
>> Just to make this a little clearer, suppose we had:
>>
>> unsigned factorial(unsigned n) {
>>  if (n == 0)
>>   return 1;
>>  return n * factorial(n - 1);
>> }
>>
>> Currently when we analyze this function, we create only one
>> LocationContext.  However, one way to do IPA is to due inlining of function
>> calls.  In this case, we might create a second LocationContext during the
>> first recursive call to factorial.  If we wanted to further inline, we might
>> create a second LocationContext, etc., each one capturing the "context" of
>> the recursive call and simulating an abstract stack frame (obviously
>> bounding the amount of LocationContexts we create).
>>
>> On Apr 11, 2011, at 10:53 AM, Ted Kremenek wrote:
>>
>> > I thought about this some more, and using the LocationContext might be a
>> reasonable place to put checker-specific data that represents "global"
>> information (i.e not specific to a GRState) but also limited in scope (i.e.
>> the data is limited to the scope of analyzing a given *call* to a function).
>>  Once we support analysis inlining, it will be possible for multiple
>> LocationContext objects to be around at the once for the same function.  Are
>> the statistics you are interested in specific to a given LocationContext?
>> >
>> > On Apr 11, 2011, at 10:19 AM, Ted Kremenek <kremenek at apple.com> wrote:
>> >
>> >> Hi Lei,
>> >>
>> >> LocationContext should not contain any checker-specific data.  It is
>> only intended to model context-sensitivity (i.e., it simulates an abstract
>> stack frame, or "location" in an abstract call chain).
>> >>
>> >> What are you trying to do?
>> >>
>> >> Ted
>> >>
>> >> On Apr 10, 2011, at 8:26 PM, 章磊 wrote:
>> >>
>> >>> Hi Clang,
>> >>>
>> >>> This patch add a DenseMap to record the CheckedReturn count for a
>> functiondecl(fielddecl, vardecl as function pointer) in LocationContext.
>> >>>
>> >>> ImmutableMap seems not alright here, is DenseMap ok?
>> >>>
>> >>> And how to make it checker-specific?
>> >>>
>> >>> This patch is preparation for statistical UncheckedRenturn checker.
>> >>> I'll appreciate it if there are any advice about this patch.
>> >>>
>> >>> --
>> >>> Best regards!
>> >>>
>> >>> Lei Zhang
>> >>> <CRResultMap.patch>_______________________________________________
>> >>> cfe-commits mailing list
>> >>> cfe-commits at cs.uiuc.edu
>> >>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
>> >>
>> >>
>> >> _______________________________________________
>> >> cfe-commits mailing list
>> >> cfe-commits at cs.uiuc.edu
>> >> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
>>
>>
>
>
> --
> Best regards!
>
> Lei Zhang
>
>
>
>

-- 
Best regards!

Lei Zhang
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20110418/278c8615/attachment.html>