[cfe-dev] Inter procedural analysis across translation unit in clang static analyzer

Karthik Bhat blitz.opensource at gmail.com
Fri Feb 1 04:58:22 PST 2013


Hi Anna,

Thanks for the clarification.

For IPA across translation unit, to start off for my project i do not
require full fledged functionality but few basic ones such as divByZero
Checker, memory leak check, null deference.

Regarding summary based approach i was planning to take
the following approach -

1) Run LibTooling and collect information for all functions across
translation unit. The Summary can have things like -

Struct Summary
{

  char* MangledFnName;
  bool doesReturnZeroConst;  //Summary to check null deref and div by zero
  bool doesReturnMallocedMemory;  //Check for potential memory leak
  bool doesFreeParam(int i);  //Check for free of param i
  bool doesDivideParam(int i); // To check if div by zero

  .....
  .....
}

Maintain a Hash for MangledFnName and it's Summary and dump to a file after
each translation unit analysis. ( Summary )

2) Run Static Analyzer normally. If at any point we are not able to inline
a call load Summary file and get corresponding function summary using
mangledName as key.
Do the required state changes as per the summary of the function and report
issues if found.

For example if we have -
 k = fun() and summary of fun say doesReturnZeroConst then assign
k ZeroConst and continue analysis to detect divide by zero.

I would like to get few clarifications from experts before diving into the
coding part -

1) Does the above approach (i.e. running 1st pass to collect summary for
all translation units and dumping to a file and later do our normal static
analysis and use summary were ever we are not able to inline.)
 seem feasible or are there any design issues i'm overlooking?

2) Can i extract the required information(i.e. the rough  summary mentioned
above) from current LibTooling infrastructure? Or should these be collected
these in some other manner?

3) The above example(div by zero) is a bit simple but is this kind of
summary enough to support memory leak detection, resource leak and
null deference?


Thanks
Karthik




On Wed, Jan 23, 2013 at 8:10 AM, Anna Zaks <ganna at apple.com> wrote:

>
> On Jan 17, 2013, at 11:31 AM, Karthik Bhat <blitz.opensource at gmail.com>
> wrote:
>
> Thanks Anna for the prompt replies. Will go through those documents.
>
> Sure will contribute the checkers after refining it a bit. It is a simple
> SocketStream checker (checks socket open/close issues etc)  inspired by
> SimpleStream checker from Building a Checker in 24 Hours video
>
> I had one more query currently we terminate analysis of a path as soon as
> we detect  a critical bug. (for e.g.) in the code -
>
> int one =1;
> int zero = 0;
>
> int k = one/zero;
>
> // Some other bugs
>
> In the above code we are generating a sink node as soon as we see a
> divbyzero and do not analyse other bugs along the path.
> Since static analysis of large code usually takes a lot of time is it not
> a good idea to report as much bugs as possible on a particular path?
>
> Good point.
>
> For e.g in the above code instead of creating a sink node for divide by
> zero can we assign value for k as unknown and continue simulation?
>
> I tried the same by modifying evalBinOp function and creating a transition
> node instead of sink node  -
>
>   if ((op == BO_Div || op == BO_Rem ||op == BO_DivAssign ||op ==
> BO_RemAssign) && rhs.isZeroConstant())  // in case of div by zero assign
> result as unknown and continue execution
>  return UnknownVal();
>
> so that if we have a divide operation with rhs as 0 assign the result as
> unknown and continue simulation to detect other bugs on the path.
>
>
> There are several reasons why this approach might not work well in general.
>
> For example, if you encounter a null pointer dereference(or div by zero),
> report it, and continue the execution, chances are that the same pointer
> would get used again and again. So you would end up reporting multiple
> instances of the same issue. Even if you came up with ways to unique those
> reports, analyzing after a "severe error" could lead to other issues being
> reported. How would one model the state after the error? Recovering from
> errors without loosing state/causing side effects might not always be as
> easy as in this case. (And even here we introduce some inconsistency into
> the analyzer.)
>
> Another argument is that what is optimal for performance depends on the
> workflow in which one is using the analyzer. For example, it might be
> preferable to have a faster turnaround when issues are found rather than
> running the tool until as much as possible is covered. For example, if you
> are analyzing a small project regularly and expect a very fast turn around.
>
> Anna.
>
> Thanks
>
>
> On Fri, Jan 18, 2013 at 12:29 AM, Anna Zaks <ganna at apple.com> wrote:
>
>>
>> On Jan 17, 2013, at 5:33 AM, Karthik Bhat <blitz.opensource at gmail.com>
>> wrote:
>>
>> Thanks Anna.. Can you suggest me some place were i can find document for
>> static analyzer core module. I found many documents for checkers and how to
>> write checkers, but couldn't find document explaining functioning of static
>> analyzer core.
>>
>>
>> Unfortunately, there is not much documentation about the analyzer core
>> yet (code is our documentation). The only additional documentation I know
>> of is in clang/docs/analyzer; specifically, we have the IPA.txt there,
>> which describes how we approach cross function analyzes. You could also
>> search the archives from this list (in particular, emails from Ted
>> Kremenek).
>>
>> The analyzer utilizes the idea of path-sensitive dataflow analysis, which
>> can be tackled with different specific techniques, but they all boil down
>> to trying to compute a set of reachable program states. Our LLVM Dev
>> meeting talk Building a Checker in 24 Hours gives a very high level
>> overview of how it works (http://llvm.org/devmtg/2012-11/). Here are
>> some relevant academic papers, but there are many more papers in the area,
>> and the analyzer is inspired by many of them:
>>   A System and Language for Building System-Specific, Static Analyses
>> (Hallem et al)
>>   Precise interprocedural dataflow analysis via graph reachability (Reps
>> et al)
>>
>> As I had mentioned, cross translation unit analyzes is a huge project;
>> for example, it would most likely take more than a year to complete.
>> However, it can be split into subtasks. There are also many other
>> directions for improving the analyzer core.
>>
>> Please, feel free to ask questions.
>> Anna.
>>
>> Any help appreciated.
>>
>> Thanks
>>
>> On Tue, Jan 15, 2013 at 12:43 AM, Anna Zaks <ganna at apple.com> wrote:
>>
>>>
>>> On Jan 14, 2013, at 1:52 AM, Karthik Bhat <blitz.opensource at gmail.com>
>>> wrote:
>>>
>>> > Hi All,
>>> >
>>> > I was going through clang project and found static analyzer to be a
>>> quite useful tool. I would like to work and contribute on the same. I went
>>> through the code and developed few basic checkers(Socket stream checker
>>> etc) to start with.
>>>
>>
>> It would be great if you plan to contribute those checkers back!
>>
>>  >
>>> > I had a doubt which i wanted to clarify from the community.
>>> >
>>> > If i'm not wrong Clang static tool currently supports only one
>>> translation unit at a time and so inter procedural analysis across
>>> translation unit is not supported.
>>>
>>> That is correct.
>>>
>>> > Is there any plan to support the same in clang static analyzer?
>>>
>>> This is something we would definitely like to address as it is one of
>>> the main missing pieces. I am not sure when we are going to address it.
>>>
>>> > What kind of infrastructure would be required in static analyzer core
>>> to support this feature?
>>>
>>> We have not designed this in detail yet. However, this is going to be a
>>> lot of work. We would probably go with summary based approach, where one
>>> constructs summaries for the analyzed functions; the summaries are then
>>> used when modeling the calls.
>>>
>>> > Will it require detailed understanding of clang front end(AST etc)?
>>> >
>>>
>>> This project would require understanding the analyzer very well.
>>>
>>> > Thanks
>>> > Karthik
>>> >
>>> >
>>> >
>>> > _______________________________________________
>>> > cfe-dev mailing list
>>> > cfe-dev at cs.uiuc.edu
>>> > http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>>>
>>>
>>
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20130201/243c0cb6/attachment.html>


More information about the cfe-dev mailing list