[cfe-dev] [analyzer] Whole Program Analysis - Full Call Graph

Tue Dec 1 08:29:02 PST 2015

On 12/01/2015 03:23 PM, Aleksei Sidorin via cfe-dev wrote:
> Hello Phil,
>
> I met this problem while implementing inter-unit analysis
> proof-of-concept for CSA. I'll describe my solution here. As I
> understand, this problem cannot be solved inside CSA, so I used
> multi-step approach.
>
> First, for all translation units we should collect a list of 'required'
> and 'exported' functions. To build a list of required declarations, we
> iterate over AST to find CallExprs that use callee decl without body. To
> build a list of exported declarations we just dump all the functions
> that are visible externally.
>
> I used a simple text signature of function as
> mangled_name at target_triple_arch for dumping. I used additional arch mark
> to distinguish between functions in multi-arch builds.
>
> Note that some function may be defined in a headers. One should not
> merge them because they may have different bodies due to macro
> expansion. These calls are local and the approach below should solve
> this problem.
>
> After this, we have two lists. List of required functions is just a list
> of signatures; list of external functions is a list of entries looking
> like a map: function_signature::file_name at target_triple_arch. Final
> export map contains items from exported functions whose key was listed
> in required functions. Resulting mapping is an inter-unit call graph.

You might also want to track linker calls and then model static/dynamic 
linking when matching symbols to improve precision and avoid collisions.

> You may also need to use C++ mangling-style for C if you build multiple
> projects that may contain functions with the same name. However, if you
> need to do this, you may need more complicated approach.
>
> You can take a look at the code. This code also dumps local calls but
> marks as local explicitly so it may be used to build a whole program
> call graph. You can find our github repo at
> https://github.com/haoNoQ/clang/tree/summary-ipa-draft. See
> tools/clang-func-mapping
> (https://github.com/haoNoQ/clang/blob/summary-ipa-draft/tools/clang-func-mapping/ClangFnMapGen.cpp)
> and tools/scan-build/xtu-analyze.py
> (https://github.com/haoNoQ/clang/blob/summary-ipa-draft/tools/scan-build/xtu-analyze.py)
> for some code.
>
> Hope it will help.
>
>> Hello cfe-dev,
>>
>> I am interested in researching possible static analyzer schemes on whole
>> program states. Initially I would like to build a whole program call
>> graph. I plan to develop program wide checkers for calls to a
>> proprietary SDK. To begin with these checkers would rely on simple call
>> graph/AST based analysis rather than ‘full’ static analysis. These
>> checkers would include items like simple flow analysis of SDK calls that
>> should occur in a particular order (e.g. open()/read()/write()/close()).
>> I would be very grateful if anyone can offer any guidance as to how to
>> achieve this.
>>
>> My initial thoughts are to use the current AST call graph feature within
>> clang. I would create a checker that builds and serialises the call
>> graph to a file for each source file. I plan to add this serialisation
>> to the clang CallGraph class along the lines of the current AST
>> serialisation. The available serialised call graph files would
>> subsequently be read and a merged, creating a full program graph  (or
>> partial program graph if the whole program has not been serialised).
>>
>> I have a couple of questions regarding development style:
>>
>> a) Would the clang CallGraph class be the best place for the
>> serialisation, or would it better to place it in the checker(s)?
>>
>> b) For the read and merge process, should this be an external utility or
>> use the regular clang executable?
>>
>> Any feedback most welcome. Thank you for your time.
>