<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div><div>On Nov 22, 2011, at 10:11 PM, Ted Kremenek wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><span class="Apple-style-span" style="border-collapse: separate; font-family: Helvetica; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; font-size: medium; "><div>Stepping back, I think it's worth re-evaluating the overall algorithm. Here's an abstraction that I think will help.</div><div><br></div><div>Essentially we are building a virtual function callgraph. Once we have the callgraph, we can determine if any of the constructors/destructors can call a virtual function. Here's a simple decoupling of the algorithm:</div><div><br></div><div>(1) Do a DFS worklist algorithm to build the callgraph. This is basically the walk you are doing now, but you don't have to do any error reporting. You also don't need to keep track of the current call stack; all you care about is the current function you are analyzing, and the functions it calls. The worklist just consists of enqueueing FunctionDecls, dequeing them and walking their bodies looking for calls. When you see a call, you add the called function to the set of functions called by that function. Of course we only want to restrict ourselves to calls on the 'this' object.</div><div><br></div><div>(2) Once you have the (sparse) callgraph, you can iterate over all the constructors and destructors (you can cache a list of them on the side while doing step 1). For each one, do a BFS over the callgraph, looking for virtual functions. The BFS can be done as a worklist. The advantage of using a BFS is you just report the shortest call chain from the constructor/destructor to the virtual function. For each constructor/destructor, only report calling a specific virtual function once.</div><div><br></div><div>(2a) As a refinement to (2), you can cache when certain functions *never* call a virtual function, thus pruning the search. This is an optional step to improve performance.</div><div><br></div><div>(2b) Error reporting comes down to just reporting a path through the callgraph. Thus error reporting just becomes centralized in one place. No code duplication.</div><div><br></div><div>The nice thing about this approach is you get a really nice separation of concerns. You decouple to problem of finding the call graph (which conceptually could be extended to incorporate path-sensitive information, etc.) from the process of finding a bug (a path through the callgraph from constructor to virtual function). You also decouple the string generation from the logic of finding bugs.</div><div><br></div><div>I hope my comments are not discouraging. I think you are on the right track. I think taking a step back and looking at how the algorithm could be restructured would be really beneficial to cleaning up the code and also potentially making the checker more powerful in the long run. I also think you should pay a bit more attention to the accesses to containers. While doing redundant hash table lookups on a error reporting case is not so bad, having redundant lookups in the fast path of the checker really adds unnecessary sluggishness to your code.</div><div><br></div><div>I hope this helps.</div></span></blockquote></div><br><div>Thinking about this more, I think this strategy, while generally, is more complicated then you need for a first go around.</div><div><br></div><div>Here is a much simpler approach, which basically matches what you are already doing:</div><div><br></div><div>(1) Starting from each constructor/destructor, do a DFS, using a worklist. The worklist just has a chain of FunctionDecls.</div><div><br></div><div> (a) Define 'Enqueue' to enqueue a method to the worklist. This..</div><div><br></div><div> (i) adds the method to the stack (push_back())</div><div> (ii) adds the method to a DenseMap that records two states: "PreVisit" and "PostVisit". Enqueue marks them as PreVisit. </div><div><br></div><div> (b) When processing a method from the worklist:</div><div><br></div><div> (i) dequeue the item from the end of the stack, but don't remove it yet from the stack</div><div> </div><div> IF the function is in the PreVisit state:</div><div><br></div><div> (ii) walk its body:</div><div> - If we find a call to a virtual function, emit a warning by simply passing the complete stack + the virtual function called to our error reporting routine.</div><div> - If we find a call to a non-virtual function, add it to the worklist IFF it is not already in the PreVisit or PostVisit state.</div><div> (iii) mark the decl as being in the PostVisit state</div><div><br></div><div> (iv) continue processing the worklist</div><div><br></div><div> IF the function is in the PostVisit state:</div><div><br></div><div> (v) pop it off the worklist.</div><div><br></div><div>The value of the PreVisit versus PostVisit is that we always keep a full call chain stack available.</div><div><br></div><div>This should provide a nice decoupling between the search and the error reporting, while completely unrolling the recursion. As a further optimization, we can potential recording as part of step (v) whether or not a function is never transitively calls a virtual method, and thus doesn't need to be considered as part of any further search.</div><div><br></div><div>I think this should be a vast simplification of your existing logic, and also is much simpler than the algorithm I proposed.</div><div><br></div><div>What do you think?</div></body></html>