[cfe-dev] Path explosion Problem

Thu May 19 12:27:55 PDT 2011

On May 19, 2011, at 2:06 AM, 章磊 wrote:

> So here's my problem, if we want to gather path-sensitive statistical infomation, we probably need to analyze all the paths. But the upper problem didn't allow us to do so.
> 
> IMO, there may be several ways overcome this:
> Increase my computer's memory...but i think it may not solve the problem.
> Change the worklist Algorithm form BFS to DFS, and after a path was analyzed, release the memory generated in current path analyze. Is this feasible or useful?
> Or is there any other way to compromise?
> ps:  We should not let clang crashed even if the memory exhausted, right?
> 
> -- 
> Best regards!
> 
> Lei Zhang

Hi Lei,

Running out of memory is a real problem that needs a general solution.  I have been mulling this over myself, and have a few ideas that match with #2 and #3.

Before I get into those, I think it is worth pondering the fact that unless we force the path exploration to combine paths, there will always be cases where we don't exhaustively explore all paths.  For example, consider the kind of dataflow analysis done in many compiler analyses: paths are always merged at confluence points, guaranteeing convergence and bounding memory usage.  The path-sensitive analysis doesn't force this kind of convergence (although paths converge when the states are the same), so if you are interested in forcing convergence I think we'll need something else in the analyzer that isn't there right now.

Let's talk about #2.  Essentially, the wordlist algorithm (which currently is either BFS or DFS) is all about prioritizing what part of the state space gets explored first. My goal is that we eventually have other intelligent algorithms besides BFS and DFS, that allow us to exploit even more information.  Thus, switching between BFS and DFS to solve this problem isn't the answer.  Switching between those should help if, say, you want to bound the amount of work done by the analyzer (which we do) and want to prioritize that some types of paths are prioritized over others.  This may help you find different kinds of bugs more effectively.  For "all paths" checkers, like yours, BFS will tend to cover paths that end sooner, but it won't explore paths that require going deeper into the state space (like DFS does).  This will directly impact the results you get from your checker, but those are just tradeoffs in the results.  It doesn't actually solve the scalability problem.  If the number of paths to explore are too costly to enumerate, we are going to run into memory problems regardless.  Put another way, DFS and BFS are just enumerating paths in a different order.  The cost to enumerate all of them is the same.

To tackle memory scalability, the direction I have been gradually pushing the analyzer is to have the ability to gradually reclaim ExplodedNodes  (and states) as the graph grows larger.  It use to be that we would just generate GRStates, many which would just get thrown away and not included in ExplodedNodes, and not reclaim that memory until we reclaimed all the memory for an analysis.  Now GRStates are reference counted, so we can reclaim some of the memory more eagerly.  To take this further, we can possibly prune out parts of the ExplodedGraph as it grows if we can deem those parts "uninteresting."  I think this is very doable, but it's a lot trickier than it sounds, and it requires consulting which ExplodedNodes are currently referenced by BugReports.  If we can identify paths that are essentially uninteresting, nor are deemed critical by a checker, then we can potentially throw away those nodes.  Note that when we do this, we are throwing away analysis state.  If we throw away a path, only to explore another path that would have merged with the discarded path, we are going to end up analyzing the same path again.  That's an inherit time-space tradeoff.

So, I think the solution to your problem requires a combination of several approaches:

a) Make path analysis more scalable in general by more eagerly reclaiming memory when the ExplodedGraph starts to get large.  This is essentially a "garbage collection" of uninteresting paths.

b) Because (a) doesn't completely "solve" the path explosion problem (i.e., it doesn't guarantee that all paths are explored), if exploring all paths matters to you we will need to devise a strategy where we can (optionally) merge paths.  This requires being able to define a merge operation for everything in GRState, including checker-specific state.  It will then be up to some policy to decide when the merge takes place, as merging too frequently will cause us to discard too much information (i.e., it basically reduces to a flow-sensitive analysis).

Ted
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20110519/e443d0ef/attachment.html>