[cfe-dev] [analyzer] Summary IPA thoughts

Wed Oct 21 17:42:38 PDT 2015

> On Oct 21, 2015, at 4:50 AM, hao/NoQ via cfe-dev <cfe-dev at lists.llvm.org> wrote:
> 
> Now, the design we implemented as a quick proof-of-concept is roughly
> as follows:
>  (a) Analyze the function without any context, as top-level;
>  (b) Save its ExplodedGraph, which *is* *the* summary of the function;
>  (c) When modeling the call, take the final states in leaf nodes of
> its graph, and
>    (c.i) Rename the symbolic values inside them according to the context;
>    (c.ii) Merge the renamed range constraints into the current state;
> (c.iii) Merge the GDM into the current state, with inevitable help
> from checkers;
>    (c.iv) Throw bug reports if a bug was revealed while applying the summary;
>    (c.v) Merge the renamed store bindings into the current state,
> including invalidation markers.

One consequence of analyzing the function at the top level is that the summary produced makes no assumptions about context. This means the summary is always applicable (i.e., the precondition holds) but also that applying it may yield many more paths than are actually feasible. That is, viewing the the summary as an implication, it looks like:

    true => PostState1 OR PostState2 OR PostState3 OR … OR PostStateN

If I understand your approach correctly, you mitigate this to some extent by keeping extra range information about dead symbols in the PostState so you can check for contradictions in the actualized PostState.

Did you consider the “functional” (in the sense of Sharir and Pnueli 81) approach that discovers and maintains tables of input-output summaries (i.e., pre-states and post-states) for functions? 

This would allow for more precise summaries. With this approach, you would apply the post-state at a call site when it finds a pre-state that yields a hit in the table — if not it would essentially inline the function and then populate the table with the pre- and post-state from running the inlined functions. 

This results in summaries that, when viewed as an implication, look like:

    (Prestate1 => PostState1_1 OR Postate1_2 OR ... OR PostState1_N)
	AND
    (Prestate2 => PostState2_1 OR Postate2_2 OR ... OR PostState2_M)
	AND   
   …

The precondition used to compute these summaries can be generalized in a variety of ways to increase the chance of a hit — with the most-general precondition being ‘true’.

One nice thing about this approach is that it can be viewed as a performance optimization for inlining, where summaries “cache” the (approximate) result of inlining a function. With this view, you can trade off the memory requirements of summaries in this cache with running time from avoiding inlining. (On the other hand, because the pre-states for these summaries are informed by analysis state at call sites, this approach is harder to use for cross-translation-unit summarization.)

Devin