[cfe-dev] [analyzer] Summary IPA thoughts

Thu Mar 31 04:37:14 PDT 2016

Hi!

On 21 October 2015 at 13:50, hao/NoQ via cfe-dev <cfe-dev at lists.llvm.org>
wrote:

>  Dear All,
>
> We down there (mostly me and Alexey Sidorin, whom you might have also
> seen here in the lists), after some digging into the analyzer code,
> have coded a working proof-of-concept draft of summary-based
> inter-procedural analysis implementation for the static analyzer. The
> code is still too rough and dirty to publish as a review, however, as
> tl;dr of the current status, we reached the point where:
>
>   (good.1) The implementation doesn't crash, on quite large test
> codebases, eg. Android;
>   (good.2) It also doesn't seem to cause any noticable increase in
> false positive rate;
>   (good.3) Path-sensitive bug reports are constructed more or less
> correctly.
>
> Additionally, the implementation is not really extremely complicated
> or hard-wired together, so many things can be re-discussed and
> re-thought.
>
> The bad parts, still many, but hopefully some of them would get sorted
> out in time:
>
>   (bad.1) Doesn't yet offer as much performance gain, in
> bugs-per-second, as we initially expected;
>   (bad.2) Requires significant support on checker side, unlike
> inlining, and most official checkers were not yet updated, instead we
> were tweaking our own checkers, yet the method seems universal enough;
>   (bad.3) Could use some refactoring and also needs rebase on the
> latest clang, cause we diverged quite a bit, before we publish;
>   (bad.4) Certain features are not yet supported:
>     - Recursion is not yet modeled correctly;
>     - Taint analysis is not yet supported;
>     - Only default implementations of Store Manager and Constraint
> Manager are supported;
>     - Objective-C is not yet supported (unfortunately, none of us
> understands it well enough).
>
> Now, the design we implemented as a quick proof-of-concept is roughly
> as follows:
>   (a) Analyze the function without any context, as top-level;
>   (b) Save its ExplodedGraph, which *is* *the* summary of the function;
>   (c) When modeling the call, take the final states in leaf nodes of
> its graph, and
>     (c.i) Rename the symbolic values inside them according to the context;
>     (c.ii) Merge the renamed range constraints into the current state;
> (c.iii) Merge the GDM into the current state, with inevitable help
> from checkers;
>     (c.iv) Throw bug reports if a bug was revealed while applying the
> summary;
>     (c.v) Merge the renamed store bindings into the current state,
> including invalidation markers.
>
> Step (a): We conduct a normal symbolic execution, as usual. When we
> encounter a call, we pause current analysis until we collect the
> summary of the callee function. If we encounter recursion, we go for
> conservative evaluation. If normal analysis exceeds node limit, we
> discard summary and evaluate the function conservatively.
> Additionally, we add a few program state traits: most importantly,
> when SymbolReaper destroys a symbol, we need to keep its range in a
> separate trait, because it's a valuable piece of info on function
> branch reachibility conditions, which we want to have reflected in the
> final node.
>
> Step (b): We simply don't delete the ExprEngine object, so that all
> entities, such as States and SVals, had no stale pointers to various
> managers and contexts inside them. Storing the whole graph, rather
> than only leaf states, has a single purpose of reconstructing the
> path-sensitive bug report; it's not useful for anything else. Of
> course, it still offers a lot of room for memory optimizations (clean
> unnecesary nodes, unnecessary program state traits, and that sort of
> stuff, or even delete the graph and re-construct it only if we need to
> actually throw the report).
>
> Step (c): In ExplodedGraph of the caller, we create a node with a
> special program point called CallSummaryPreApply. Out of this node, we
> create new nodes for each reachable branch of the summary, that is,
> for every leaf node of the summary graph which is reachable in current
> context. Reachability is determined on step (c.ii). After a few
> technical intermediate transitions, we reach another node, program
> point of which is CallSummaryPostApply.
>
> Step (c.i): So far, we did well without complete alpha-renaming
> support. Instead, all we needed was to recursively scan through the
> structure of the SVal and replace context-dependent stuff stored
> deeply inside them with actual stuff, eg. SymbolRegionValue of a
> formal parameter with actual argument value. Hence, to avoid confusion
> with full-featured alpha-renaming, we called this "actualization";
> it'd take extra work to see if we can actually generalize it to create
> a good full-featured alpha-renaming mechanism (replace arbitrary
> values with arbitrary values inside the whole states). This procedure
> is one of the most complicated parts of our work, as it needs to
> support the whole SVal hierarchy, and its implementation is also
> already quite isolated. For SymbolMetadata, we ask checkers to perform
> actualization through a callback, which seems reasonable. For
> SymbolConjured, we fake the block counts in order to avoid accidental
> collisions with existing symbols. Also, actualization isn't really a
> separate step in the process; it's performed regularly for all
> symbolic values on all steps while applying a summary.
>

I was thinking about the actualization process that you are describing
here, and I think there might be some issues with it. As far as I remember,
when you analyze a function without context the analyzer assumes that none
of the arguments are aliased. So only those paths will be available in the
ExpoldedGraph that does not imply aliasing. This way you can both loose
coverage or take impossible branches. What do you think?

>
> Step (c.ii): Actualize constraint keys, intersect "new" and "old"
> ranges, throw away the branch if intersection is empty, otherwise
> constrain the actual symbol with the intersected range. As we
> mentioned on step (a), this requires a separate pass through reaped
> symbols which are stored separately.
>
> Step (c.iii): Call the checker callback, "eval::SummaryApply", in
> which checkers receive the final state in callee and the current state
> in caller, and figure out how to move significant stuff from the
> former to the latter. Checkers may add transitions or split states on
> this phase. Most of the time, they do three things: - Carry
> information that changed in the caller to the callee; - Check stuff in
> which they were uncertain during callee analysis, maybe they'll be
> sure now that they have a context (delayed checks); - Carry delayed
> checks through to the callee if they could not be resolved anyway.
> Note that delayed checks would need to be manually stored in the state
> so that checkers could note them during apply. So overally, if the
> checker has ~5 or so GDM traits, implementing a summary-apply
> procedure may be pretty complicated.
>
> Step (c.iv): Path-sensitive bug reports are constructed by
> frankensteining the report path from different pieces of different
> exploded graphs, be it a report that originates from the callee (and
> hence thrown on step (c.iii) from eval::SummaryApply), or an
> intermediate call that exited before the bug (which may be interesting
> for the bug reporter visitors). The only trouble we currently have is
> the "Assuming..." pieces, which may be redundant, as no assumption was
> made, but rather the context determined the branch.
>
> Step (c.v): Actualize the binding keys and values, overwrite existing
> values. Invalidation, as in writing some SymbolConjured to some memory
> space, is handled similarly. before all bindings. We do this last,
> because it would affect actualization (eg. SymbolRegionValue should
> actualize itself to the value of the region before the call, not after
> the call; quite a sensitive point here, but seems to work this way).
>
> So our current plan, in case nobody minds, would be:
>   (1) Implement an SValVisitor, which may be useful for other
> applications as well, as a separate patch;
>   (2) Try to implement a full-featured alpha-renaming support, which
> we almost have, on top of SValVisitor, probably add some useful
> features to the constraint manager, as a separate patch;
>   (3) Try to port the rest of the summary code, with heavy discussions
> on fixing existing problems, some of which were listed above;
>   (4) Dunno, maybe something else shows up, probably more ways to
> split the patch into smaller pieces.
>
> We'd be extremely greatful for [and promise to listen carefully to]
> any feedback!
>
> Thanks in advance, Artem [and Alexey].
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20160331/1f89a57c/attachment.html>