[cfe-dev] cfe-dev Digest, Vol 47, Issue 60

Peter Lawrence peterl95124 at sbcglobal.net
Fri May 20 09:57:39 PDT 2011


Ted,
          very interesting...

When I worked at Sun Microsystems, I invented a generalization of the
traditional iterative-dataflow-analysis algorithm that might provide a
useful way to select within the memory-versus-accuracy tradeoff space.

-Peter Lawrence.



On May 19, 2011, at 12:28 PM, cfe-dev-request at cs.uiuc.edu wrote:

>
> Hi Lei,
>
> Running out of memory is a real problem that needs a general  
> solution.  I have been mulling this over myself, and have a few  
> ideas that match with #2 and #3.
>
> Before I get into those, I think it is worth pondering the fact
> that unless we force the path exploration to combine paths, there
> will always be cases where we don't exhaustively explore all
> paths.  For example, consider the kind of dataflow analysis done in
> many compiler passes: paths are always merged at confluence points,
> guaranteeing convergence and bounding memory usage.  The
> path-sensitive analysis doesn't force this kind of convergence
> (although paths do converge when their states are identical), so if
> you are interested in forcing convergence I think we'll need
> something else in the analyzer that isn't there right now.
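>
> To make the contrast concrete, here is a minimal sketch of that
> merge-at-confluence style in C++.  The Block/Facts types and the
> gen/kill transfer function are illustrative stand-ins, not anything
> in Clang:
>
>   // Classic iterative dataflow: one IN/OUT fact set per block,
>   // merged (set union) at confluence points, iterated to a fixpoint.
>   #include <deque>
>   #include <map>
>   #include <set>
>   #include <vector>
>
>   using Facts = std::set<int>;
>
>   struct Block {
>     int id;                    // ids double as vector indices here
>     std::vector<int> succs;
>     Facts gen, kill;
>   };
>
>   // Join = union: all predecessors collapse into one fact set,
>   // which bounds memory but erases path-specific information.
>   static Facts join(const Facts &a, const Facts &b) {
>     Facts out = a;
>     out.insert(b.begin(), b.end());
>     return out;
>   }
>
>   static Facts transfer(const Block &b, Facts in) {
>     for (int f : b.kill) in.erase(f);
>     in.insert(b.gen.begin(), b.gen.end());
>     return in;
>   }
>
>   void solve(std::vector<Block> &cfg) {
>     std::map<int, Facts> in, out;
>     std::deque<int> worklist;
>     for (const Block &b : cfg) worklist.push_back(b.id);
>     while (!worklist.empty()) {
>       int id = worklist.front();
>       worklist.pop_front();
>       Facts newOut = transfer(cfg[id], in[id]);
>       if (newOut == out[id]) continue;   // local fixpoint reached
>       out[id] = newOut;
>       for (int s : cfg[id].succs) {      // propagate to successors
>         Facts merged = join(in[s], newOut);
>         if (merged != in[s]) {
>           in[s] = merged;
>           worklist.push_back(s);
>         }
>       }
>     }
>   }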
>
> Let's talk about #2.  Essentially, the worklist algorithm (which
> currently is either BFS or DFS) is all about prioritizing what part
> of the state space gets explored first.  My goal is that we
> eventually have other intelligent algorithms besides BFS and DFS
> that allow us to exploit even more information.  Thus, switching
> between BFS and DFS to solve this problem isn't the answer.
> Switching between those should help if, say, you want to bound the
> amount of work done by the analyzer (which we do) and want some
> types of paths prioritized over others.  This may help you find
> different kinds of bugs more effectively.  For "all paths" checkers,
> like yours, BFS will tend to cover paths that end sooner, but it
> won't explore paths that require going deeper into the state space
> (like DFS does).  This will directly impact the results you get
> from your checker, but those are just tradeoffs in the results.  It
> doesn't actually solve the scalability problem.  If the number of
> paths to explore is too costly to enumerate, we are going to run
> into memory problems regardless.  Put another way, DFS and BFS are
> just enumerating paths in a different order.  The cost to enumerate
> all of them is the same.
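>
> Put in code, the two strategies are the same loop with a different
> insertion end; a sketch, with Node as a stand-in for the analyzer's
> ExplodedNode rather than the real class:
>
>   #include <deque>
>   #include <set>
>   #include <vector>
>
>   struct Node { std::vector<Node*> succs; };
>
>   void explore(Node *root, bool dfs) {
>     std::deque<Node*> worklist{root};
>     std::set<Node*> seen{root};  // paths converge on identical states
>     while (!worklist.empty()) {
>       Node *n = worklist.front();
>       worklist.pop_front();
>       // ... evaluate n, run checkers ...
>       for (Node *succ : n->succs) {
>         if (!seen.insert(succ).second)
>           continue;                   // already enumerated
>         if (dfs)
>           worklist.push_front(succ);  // stack: deep paths first
>         else
>           worklist.push_back(succ);   // queue: shallow paths first
>       }
>     }
>   }
>
> Either way, every reachable node is visited exactly once; only the
> order differs, so the total enumeration cost is identical.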
>
> To tackle memory scalability, the direction I have been gradually
> pushing the analyzer is to have the ability to reclaim
> ExplodedNodes (and states) as the graph grows larger.  It used to
> be that we would just generate GRStates, many of which would just
> get thrown away and not included in ExplodedNodes, and not reclaim
> that memory until we reclaimed all the memory for an analysis.  Now
> GRStates are reference counted, so we can reclaim some of the
> memory more eagerly.  To take this further, we can possibly prune
> out parts of the ExplodedGraph as it grows if we can deem those
> parts "uninteresting."  I think this is very doable, but it's a lot
> trickier than it sounds, and it requires consulting which
> ExplodedNodes are currently referenced by BugReports.  If we can
> identify paths that are neither interesting nor deemed critical by
> a checker, then we can potentially throw away those nodes.  Note
> that when we do this, we are throwing away analysis state.  If we
> throw away a path, only to explore another path that would have
> merged with the discarded path, we are going to end up analyzing
> the same path again.  That's an inherent time-space tradeoff.
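>
> A sketch of that reclamation idea, with State, Node, and Report as
> hypothetical stand-ins for GRState, ExplodedNode, and BugReport:
>
>   #include <algorithm>
>   #include <memory>
>   #include <set>
>   #include <vector>
>
>   struct State { /* bindings, constraints, checker state */ };
>
>   struct Node {
>     std::shared_ptr<State> state;  // refcount frees states eagerly
>     std::vector<std::shared_ptr<Node>> preds;
>   };
>
>   struct Report { std::shared_ptr<Node> errorNode; };  // pins a path
>
>   // Drop frontier nodes that no checker or report has marked
>   // interesting; the refcount reclaims their states immediately.
>   // Re-exploring a pruned path later costs time; that is the
>   // time-space tradeoff described above.
>   void prune(std::vector<std::shared_ptr<Node>> &frontier,
>              const std::set<const Node *> &interesting) {
>     frontier.erase(
>         std::remove_if(frontier.begin(), frontier.end(),
>                        [&](const std::shared_ptr<Node> &n) {
>                          return !interesting.count(n.get());
>                        }),
>         frontier.end());
>   }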
>
> So, I think the solution to your problem requires a combination of  
> several approaches:
>
> a) Make path analysis more scalable in general by more eagerly  
> reclaiming memory when the ExplodedGraph starts to get large.  This  
> is essentially a "garbage collection" of uninteresting paths.
>
> b) Because (a) doesn't completely "solve" the path explosion  
> problem (i.e., it doesn't guarantee that all paths are explored),  
> if exploring all paths matters to you we will need to devise a  
> strategy where we can (optionally) merge paths.  This requires  
> being able to define a merge operation for everything in GRState,  
> including checker-specific state.  It will then be up to some  
> policy to decide when the merge takes place, as merging too  
> frequently will cause us to discard too much information (i.e., it  
> basically reduces to a flow-sensitive analysis).
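>
> As a concrete illustration of (b), here is a minimal sketch of a
> merge for one piece of checker-specific state; the lattice and the
> tracked-symbol map are hypothetical, not an existing checker:
>
>   #include <map>
>   #include <string>
>
>   // Unknown is the lattice top: the value paths disagree about.
>   enum class Val { Unchecked, Checked, Unknown };
>
>   static Val joinVal(Val a, Val b) {
>     return a == b ? a : Val::Unknown;
>   }
>
>   using CheckerState = std::map<std::string, Val>;  // symbol -> value
>
>   // Merging two paths' states widens every disagreement to Unknown;
>   // merge too often and this degenerates into the flow-sensitive
>   // analysis described above.
>   CheckerState merge(const CheckerState &a, const CheckerState &b) {
>     CheckerState out;
>     for (const auto &kv : a) {
>       auto it = b.find(kv.first);
>       out[kv.first] =
>           (it == b.end()) ? Val::Unknown : joinVal(kv.second, it->second);
>     }
>     for (const auto &kv : b)
>       if (!a.count(kv.first))
>         out[kv.first] = Val::Unknown;  // tracked on only one path
>     return out;
>   }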
>
> Ted
