[cfe-dev] [analyzer] Summary IPA thoughts

Wed Oct 21 06:41:46 PDT 2015

Awesome!

So this implementation doesn't give the performance gain you were expecting, but what kind of performance gain does it give?
Do you get roughly the same or better quality of analysis?  I would measure this in terms of number of bugs from before vs. number of bugs after, but maybe there is a better metric for quality of analysis.
How is memory usage affected?  Based on your design, I suspect that you are using more memory, as you aren't throwing out engines and graphs after each function, but I don't know that for sure.

Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project

-----Original Message-----
From: cfe-dev [mailto:cfe-dev-bounces at lists.llvm.org] On Behalf Of hao/NoQ via cfe-dev
Sent: Wednesday, October 21, 2015 6:50 AM
To: cfe-dev at lists.llvm.org
Subject: [cfe-dev] [analyzer] Summary IPA thoughts

 Dear All,

We down there (mostly me and Alexey Sidorin, whom you might have also seen here in the lists), after some digging into the analyzer code, have coded a working proof-of-concept draft of summary-based inter-procedural analysis implementation for the static analyzer. The code is still too rough and dirty to publish as a review, however, as tl;dr of the current status, we reached the point where:

  (good.1) The implementation doesn't crash, on quite large test codebases, eg. Android;
  (good.2) It also doesn't seem to cause any noticable increase in false positive rate;
  (good.3) Path-sensitive bug reports are constructed more or less correctly.

Additionally, the implementation is not really extremely complicated or hard-wired together, so many things can be re-discussed and re-thought.

The bad parts, still many, but hopefully some of them would get sorted out in time:

  (bad.1) Doesn't yet offer as much performance gain, in bugs-per-second, as we initially expected;
  (bad.2) Requires significant support on checker side, unlike inlining, and most official checkers were not yet updated, instead we were tweaking our own checkers, yet the method seems universal enough;
  (bad.3) Could use some refactoring and also needs rebase on the latest clang, cause we diverged quite a bit, before we publish;
  (bad.4) Certain features are not yet supported:
    - Recursion is not yet modeled correctly;
    - Taint analysis is not yet supported;
    - Only default implementations of Store Manager and Constraint Manager are supported;
    - Objective-C is not yet supported (unfortunately, none of us understands it well enough).

Now, the design we implemented as a quick proof-of-concept is roughly as follows:
  (a) Analyze the function without any context, as top-level;
  (b) Save its ExplodedGraph, which *is* *the* summary of the function;
  (c) When modeling the call, take the final states in leaf nodes of its graph, and
    (c.i) Rename the symbolic values inside them according to the context;
    (c.ii) Merge the renamed range constraints into the current state;
(c.iii) Merge the GDM into the current state, with inevitable help from checkers;
    (c.iv) Throw bug reports if a bug was revealed while applying the summary;
    (c.v) Merge the renamed store bindings into the current state, including invalidation markers.

Step (a): We conduct a normal symbolic execution, as usual. When we encounter a call, we pause current analysis until we collect the summary of the callee function. If we encounter recursion, we go for conservative evaluation. If normal analysis exceeds node limit, we discard summary and evaluate the function conservatively.
Additionally, we add a few program state traits: most importantly, when SymbolReaper destroys a symbol, we need to keep its range in a separate trait, because it's a valuable piece of info on function branch reachibility conditions, which we want to have reflected in the final node.

Step (b): We simply don't delete the ExprEngine object, so that all entities, such as States and SVals, had no stale pointers to various managers and contexts inside them. Storing the whole graph, rather than only leaf states, has a single purpose of reconstructing the path-sensitive bug report; it's not useful for anything else. Of course, it still offers a lot of room for memory optimizations (clean unnecesary nodes, unnecessary program state traits, and that sort of stuff, or even delete the graph and re-construct it only if we need to actually throw the report).

Step (c): In ExplodedGraph of the caller, we create a node with a special program point called CallSummaryPreApply. Out of this node, we create new nodes for each reachable branch of the summary, that is, for every leaf node of the summary graph which is reachable in current context. Reachability is determined on step (c.ii). After a few technical intermediate transitions, we reach another node, program point of which is CallSummaryPostApply.

Step (c.i): So far, we did well without complete alpha-renaming support. Instead, all we needed was to recursively scan through the structure of the SVal and replace context-dependent stuff stored deeply inside them with actual stuff, eg. SymbolRegionValue of a formal parameter with actual argument value. Hence, to avoid confusion with full-featured alpha-renaming, we called this "actualization"; it'd take extra work to see if we can actually generalize it to create a good full-featured alpha-renaming mechanism (replace arbitrary values with arbitrary values inside the whole states). This procedure is one of the most complicated parts of our work, as it needs to support the whole SVal hierarchy, and its implementation is also already quite isolated. For SymbolMetadata, we ask checkers to perform actualization through a callback, which seems reasonable. For SymbolConjured, we fake the block counts in order to avoid accidental collisions with existing symbols. Also, actualization isn't really a separate step in the process; it's performed regularly for all symbolic values on all steps while applying a summary.

Step (c.ii): Actualize constraint keys, intersect "new" and "old"
ranges, throw away the branch if intersection is empty, otherwise constrain the actual symbol with the intersected range. As we mentioned on step (a), this requires a separate pass through reaped symbols which are stored separately.

Step (c.iii): Call the checker callback, "eval::SummaryApply", in which checkers receive the final state in callee and the current state in caller, and figure out how to move significant stuff from the former to the latter. Checkers may add transitions or split states on this phase. Most of the time, they do three things: - Carry information that changed in the caller to the callee; - Check stuff in which they were uncertain during callee analysis, maybe they'll be sure now that they have a context (delayed checks); - Carry delayed checks through to the callee if they could not be resolved anyway.
Note that delayed checks would need to be manually stored in the state so that checkers could note them during apply. So overally, if the checker has ~5 or so GDM traits, implementing a summary-apply procedure may be pretty complicated.

Step (c.iv): Path-sensitive bug reports are constructed by frankensteining the report path from different pieces of different exploded graphs, be it a report that originates from the callee (and hence thrown on step (c.iii) from eval::SummaryApply), or an intermediate call that exited before the bug (which may be interesting for the bug reporter visitors). The only trouble we currently have is the "Assuming..." pieces, which may be redundant, as no assumption was made, but rather the context determined the branch.

Step (c.v): Actualize the binding keys and values, overwrite existing values. Invalidation, as in writing some SymbolConjured to some memory space, is handled similarly. before all bindings. We do this last, because it would affect actualization (eg. SymbolRegionValue should actualize itself to the value of the region before the call, not after the call; quite a sensitive point here, but seems to work this way).

So our current plan, in case nobody minds, would be:
  (1) Implement an SValVisitor, which may be useful for other applications as well, as a separate patch;
  (2) Try to implement a full-featured alpha-renaming support, which we almost have, on top of SValVisitor, probably add some useful features to the constraint manager, as a separate patch;
  (3) Try to port the rest of the summary code, with heavy discussions on fixing existing problems, some of which were listed above;
  (4) Dunno, maybe something else shows up, probably more ways to split the patch into smaller pieces.

We'd be extremely greatful for [and promise to listen carefully to] any feedback!

Thanks in advance, Artem [and Alexey].
_______________________________________________
cfe-dev mailing list
cfe-dev at lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev