[cfe-dev] Interprocedural data flow analysis

Sat Aug 23 10:07:13 PDT 2008

Hi!

After a relatively involved bug hunt over the course of the last week,
I've had the silly idea of looking into static analysis to guard against
the same mistake in the future ;)

The problems I'm looking to detect involve inter-procedural analysis,
since that's what has been biting us. Specifically, I need to look for
accesses to certain data structures and make sure those accesses are  
properly bracketed by calls to functions (only a small set of known  
functions, which makes it easier to detect - doesn't need to be fully  
generic for the problem at hand). It is made worse by the fact that the
data structure is passed around quite a lot and that the name is
reused all over the place.
Effectively I guess I need to perform some kind of "alias" analysis.  
I'd like to do it at the source level, though in theory I suppose I  
could compile the code and perform those checks on LLVM bitcode/IR,  
too. For the tool to be most helpful, it should be able to report  
source locations, though.
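
A minimal sketch of the pattern described above, in C. All names here
(struct table, table_begin, table_end, read_value, violations) are
hypothetical and not taken from the actual code base. The runtime check
in read_value only stands in for the property a static checker would
have to prove: every access is reached with the bracket open, even when
the structure arrives under a different name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical guarded structure; invented for illustration only. */
struct table {
    int open;    /* nonzero while inside a begin/end bracket */
    int value;
};

/* Counts accesses made outside a bracket (dynamic stand-in for a warning). */
static int violations = 0;

/* The small, known set of functions that must bracket every access. */
void table_begin(struct table *t) { assert(t != NULL); t->open = 1; }
void table_end(struct table *t)   { assert(t != NULL); t->open = 0; }

/* The access happens here, through whatever alias the caller passed in. */
int read_value(const struct table *alias) {
    if (!alias->open)
        violations++;    /* what a static checker should flag */
    return alias->value;
}

/* Correct: the access in the callee is bracketed by the caller. */
int good_caller(struct table *t) {
    table_begin(t);
    int v = read_value(t);
    table_end(t);
    return v;
}

/* Incorrect: same structure under another name, no bracket at all. */
int bad_caller(struct table *t) {
    struct table *other_name = t;
    return read_value(other_name);
}
```

Looking at read_value alone, the access is neither right nor wrong;
whether it is bracketed depends on the caller, which is why a purely
per-function analysis cannot decide it.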

In order to get a better feeling for what's in clang today, I  
performed some experiments. Here's a very simple example I've been  
playing around with:

int first(int param1) {
	int first1 = param1;
	return param1 + 1;
}

int main(int argc, char *argv[]) {
	int main1;
	int main2 = main1;
	main1 = first(main2);
	return 0;
}

Running it with:
	clang -warn-dead-stores -warn-uninit-values -cfg-dump main.c
emits the warnings (three in total) in two separate blocks, and does  
not warn about an uninitialized value for param1 in first(). The order  
and format of the output suggest that each function is looked at
separately right after it has been parsed. Is that a correct
observation? So far I don't know the code well enough
to verify it myself (note to self: spend more time in the debugger :))

Further, I've not been able to work out whether there is
existing code for transfer functions that performs any chasing of
CallExprs. LiveVariables does not seem to have special code to  
"follow" calls, although I see a bunch of other Visitxxx methods.

Would it be as "simple" as doing something like what LiveVariables
does, but implementing VisitCallExpr and hooking that up? It seems
like it, but since I'm not familiar enough with it yet, I thought I'd  
ask first before programming myself into a corner...

Given the above observation about when the analysis is performed,
it seems to me that I at least need to postpone my analysis until
everything is parsed. Does that make sense? Or is the order of the
output purely coincidental in that it simply visits the blocks in that  
order and prints out diagnostics associated with the subgraph?
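
One way to picture why such an analysis would have to wait for every
function body: a rough sketch, in C, of a summary-based fixed point
over a call graph. The struct func_info, its fields and
compute_summaries are hypothetical and have nothing to do with clang's
actual data structures or APIs. The idea is only that a call-site
transfer function (a VisitCallExpr-style hook, say) could consult a
per-callee summary once all summaries have been computed.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CALLEES 4

/* Hypothetical per-function facts gathered after parsing everything. */
struct func_info {
    const char *name;
    bool touches_directly;       /* body accesses the guarded data itself */
    int callees[MAX_CALLEES];    /* indices of the functions it calls */
    int num_callees;
    bool may_touch;              /* computed summary */
};

/*
 * Iterate to a fixed point: a function may touch the guarded data if it
 * does so directly or if it calls a function that may. Summaries only
 * ever flip from false to true, so the loop terminates, also for
 * recursive call graphs.
 */
void compute_summaries(struct func_info *funcs, int n) {
    for (int i = 0; i < n; i++)
        funcs[i].may_touch = funcs[i].touches_directly;

    bool changed = true;
    while (changed) {
        changed = false;
        for (int i = 0; i < n; i++) {
            if (funcs[i].may_touch)
                continue;
            for (int c = 0; c < funcs[i].num_callees; c++) {
                int callee = funcs[i].callees[c];
                assert(callee >= 0 && callee < n);
                if (funcs[callee].may_touch) {
                    funcs[i].may_touch = true;
                    changed = true;
                    break;
                }
            }
        }
    }
}
```

A caller's summary depends on callees that may be defined later in the
file or in another translation unit, so under this scheme nothing
final can be reported for a function at the moment it is parsed.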

Thanks,
-k

P.S.: This is with the latest clang/llvm trunk versions.
-- 
Kay Röpke
http://classdump.org/