[cfe-dev] Two pass analysis framework: AST merging approach

Wed May 4 23:27:50 PDT 2016

 > Can you explain what the "two pass analysis" does?
 > Ex: Does it loop through each of the TU second time and
 > “inline” every call from other TUs? In which order are the other
 > TUs loaded? In which order the call sites are processed? Do you
 > repeat until no change? Did you measure coverage in some way?
 > Did you perform path-sensitive checks? (The time of analysis of 4X
 > seems much lower than what I would expect, given that we now
 > explore much deeper paths and the analyzer has exponential
 > running time.)

On the first pass we produce a bunch of -emit-ast dumps. On the second 
pass we analyze each translation unit old-style, but whenever we find 
an inter-unit CallEvent during path-sensitive analysis, we import the 
section of the AST dump containing the function body and all dependent 
sections, and inline the call. The inlined call may later trigger more 
imports if it contains further inter-unit calls that we want to model.
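
To make the control flow above concrete, here is a toy sketch in plain 
C++. It is not the actual implementation and uses no Clang APIs: 
FunctionBody, DumpIndex and analyzeTU are hypothetical names, a "body" is 
reduced to its list of callees, and the index stands in for whatever 
lookup the first-pass dumps provide.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy model: a function body is just the list of its callees.
struct FunctionBody {
  std::string tu;                    // translation unit that defines the body
  std::vector<std::string> callees;  // functions called from the body
};

// Stand-in for pass 1: every definition found in the per-TU AST dumps.
using DumpIndex = std::map<std::string, FunctionBody>;

// Stand-in for pass 2: analyze one TU and import foreign bodies on demand.
// Returns the names of the functions whose bodies had to be imported.
std::set<std::string> analyzeTU(const std::string &tu, const DumpIndex &index) {
  std::set<std::string> imported;
  std::set<std::string> visited;
  std::vector<std::string> worklist;

  // Start from every function defined in the TU being analyzed.
  for (const auto &entry : index)
    if (entry.second.tu == tu)
      worklist.push_back(entry.first);

  while (!worklist.empty()) {
    std::string fn = worklist.back();
    worklist.pop_back();
    if (!visited.insert(fn).second)
      continue;  // already handled
    auto it = index.find(fn);
    if (it == index.end())
      continue;  // no body in any dump: the call cannot be inlined
    if (it->second.tu != tu)
      imported.insert(fn);  // inter-unit call: import the body
    // An inlined body may itself contain more inter-unit calls.
    for (const auto &callee : it->second.callees)
      worklist.push_back(callee);
  }
  return imported;
}
```

The point of the sketch is only that imports are driven by the calls 
actually reached from the unit under analysis, so definitions that are 
never reached from it are never loaded.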

Yeah, benchmarking is a bit more difficult than that. I think Alexey has 
some complicated numbers. I guess the slowdown is only 4x on 
path-sensitive checks simply because too many paths get dropped when the 
-analyzer-config max-nodes= limit is hit. The most practical measurement 
would probably be to increase the limits until the number of reports 
stops growing. It's also possible to count the number of limit drops, 
the number of exploded nodes constructed, and the number of bug reports 
with and without unification; we did some of this but not all.

________

My best idea on reducing AST loads is to relax the "typedness" 
requirements on the SVal hierarchy. For example, if an inter-unit function references an 
inter-unit static global variable, this variable can probably be 
represented as some kind of "untyped VarRegion" (let's call this class 
"XTUVarRegion", and inherit it from SubRegion directly, rather than from 
TypedValueRegion), and then its type (which may be a complicated class 
or template-instantiation declaration) doesn't need to be imported. The 
XTUVarRegion should still be uniquely determined by the variable - we 
need to know that two different functions imported from that translation 
unit refer to the same variable. I'm not sure, but maybe some MemSpace 
magic could be employed to control invalidation; for instance, we could 
use separate memory spaces for different translation units.
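
A rough illustration of the XTUVarRegion idea, as a standalone toy rather 
than a patch against the analyzer: the classes in namespace toy only 
mimic the names of the real region classes, the hierarchy is simplified, 
and keying a region by (translation unit, variable name) is an assumption 
made here to show the uniqueness requirement.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

namespace toy {

// Simplified stand-ins for the analyzer's region hierarchy.
struct MemRegion {
  virtual ~MemRegion() = default;
};
struct SubRegion : MemRegion {};
struct TypedValueRegion : SubRegion {
  std::string typeName;  // a typed region needs the variable's type
};

// Untyped region for a variable defined in another translation unit.
// It derives from SubRegion directly, so no type has to be imported.
class XTUVarRegion : public SubRegion {
public:
  XTUVarRegion(std::string tu, std::string name)
      : tu_(std::move(tu)), name_(std::move(name)) {}
  const std::string &tu() const { return tu_; }
  const std::string &name() const { return name_; }

private:
  std::string tu_;
  std::string name_;
};

// Interns regions so that the same variable always maps to the same object.
class RegionManager {
public:
  const XTUVarRegion *getXTUVarRegion(const std::string &tu,
                                      const std::string &name) {
    std::pair<std::string, std::string> key(tu, name);
    auto it = regions_.find(key);
    if (it == regions_.end()) {
      std::unique_ptr<XTUVarRegion> region(new XTUVarRegion(tu, name));
      it = regions_.emplace(key, std::move(region)).first;
    }
    return it->second.get();
  }

private:
  std::map<std::pair<std::string, std::string>,
           std::unique_ptr<XTUVarRegion>> regions_;
};

}  // namespace toy
```

With this, two functions imported from the same translation unit that 
both refer to its static variable get the same region pointer, while a 
static variable with the same name in another unit gets a different one.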