[cfe-dev] Proposal for statistic UncheckedReturnValue checker

Sun Oct 24 22:15:19 PDT 2010

On Oct 24, 2010, at 7:37 PM, 章磊 wrote:

> Hi clang,
> 
> Now in UncheckedReturnValue checker we check whether some functions drop privileges successfully. It's the most critical case of UncheckedReturnValue. But it seems that UncheckedReturnValue checker should check many other cases. For example, the return values of "malloc", "pthread_mutex_lock".

Hi Lei,

We may wish to extend this check to accept a list of functions whose values need to be checked (and how they should be checked).  This could possibly be provided by the user, or a something we just '#include'.  I disagree that checking the return value of 'malloc' is an invariant that all codebases care about, so some fine-grain control from the user might be necessary in order to make this useful.

> 
> And there may be another aspect that UncheckedReturnValue checker should check. In user defined function, there are some that their return value should always be checked, but we can't figure out whether a return value should be checked without user specification. So here is a proposal for statistic UncheckedReturnValue checker: when most(e.g. 85%) return values of a function are checked, this checker flags the unchecked return values as defects.
> 
> For the second part, IMO we may need some thing like a simple symbol table to store how may return values of a function are checked and also the reference loc(is there any structure i can use, or i should store this in GDM? ). And then do the check and bug report in VisitEndAnalysis.
> 
> How does these sounds?

I'm a huge fan of these kinds of statistical checks (for background, it was a big part of my thesis work in graduate school).  I don't think the approach, however, is going to work without accumulating data across multiple functions (and most likely, multiple files) since your sample size will be way too small to draw statistically significant conclusions.

For example, without a sample size of at least 7 you won't even be able to flag anything, as with 5/6 ≈ 0.83, and 6/7 ≈ 0.857.  The means in order to get a frequency of greater than 0.85 (but less than 1) you will need to see the return value checked at least 7 times within a function.  Except for very large functions you're simply not going to get enough data.  Right now 'VisitEndAnalysis' is only called when a GRExprEngine instance is done analyzing a single function; to properly do this check, you'd probably need to look at the functions analyzed across an entire codebase.

In general, I also think sample size needs to be taken into account to accurately take into account the statistical confidence.  While 7 observations is probably good enough for most cases, it might not always be, and in other cases it might be way more data than you need to flag something as a warning (i.e., in the cases that "coincidence" isn't very likely at all).  By taking into account sample size (either using traditional statistical or Bayesian techniques), one can actually compute an accurate confidence measure of how likely the deviation is truly a bug.

Accurately reasoning about sample size, however, is mostly a matter of refinement.  The immediate problem is collecting the data.  Right now we have no good apparatus for doing whole-program checks, or even checks that aggregate results from analyzing all the functions in a translation unit.  Both are things we want to be able to do, but it requires extending the current checking infrastructure.

Ted