[cfe-dev] [analyzer] Summary IPA thoughts

Tue Nov 17 08:41:59 PST 2015

Skype call minutes - part. Anna, Devin, Alexey, and me:

- The idea of context-insensitive summaries (vs. context-sensitive 
summaries), is still controversal and non-mainstream in academic works, 
and causes too much intrusion into the checker API. While 
context-sensitive approach to the summaries (through abstraction of an 
inlined analysis graph, and re-application in another context, with a 
lattice of contexts by order of known-ness and heuristics to determine 
if existing known-ness is fine enough or needs to be improved through 
constructing a summary for a more specific context) requires a lot of 
work, which we would probably be unable to perform instantly, but the 
idea is clearly worth nurturing.

- In particular, there is still uncertainty regarding heap shape 
renaming correctness and invalidation support in the context-insensitive 
approach we propose.

- Alpha-renaming is worth it regardless of summary, and we're preparing 
the necessary patches, and there are still minor technical issues to be 
discussed. It can be implemented as a combination of 
constraint-solver-side equality checks for symbols, and 
whole-state-renaming passes, whichever turns out to be more performant.

- All of the above would be simplified greatly with implementation of a 
"Smart" Data Map, as opposed to the current Generic Data Map: such map 
would automate certain operations that currently require checker-side 
support. In fact, otherwise alpha-renaming itself, when implemented with 
whole-state-renaming passes, would require checker-side support for 
renaming GDM structures. The core needs to have access to checker 
structures in the program state. Designing a declarative language for 
coding typestate-analysis checkers would probably be a part of this project.

- Incremental development of any approved approaches "under a flag" is 
welcome, i guess that's the way we're going to proceed.

I hope i didn't forget anything important.

=========

A few thoughts that came up later:

Hmm, maybe a "smart" data map would allow completely avoiding 
checker-side support even in context-insensitive approach? Another idea 
worth nurturing, i guess - after all, that's what we originally tried.

Probably, implement both things - a smart data map (for maximum 
simplicity) and intrusive checker callbacks (for maximum flexibility)? 
Something like AST Matchers vs. AST Visitors - first is concise, second 
is omnipotent.

Regarding the context-insensitive heap shape, i think i managed to 
partially understand the question. The question was that if we only have 
heap shape in caller before the call, and no heap shape in summary 
before the call (since the original state in the summary is empty), then 
how do we match these two in order to obtain the final heap shape. Well, 
in fact, the original heap shape, that corresponds to the original empty 
state at start of the summary, is represented by the very structure of 
SymbolicRegion-SymbolRegionValue pairs: every SymbolRegionValue, which 
by definition (as we discussed here 
http://lists.llvm.org/pipermail/cfe-dev/2015-November/045976.html) 
represents the value of the region at the beginning of the analysis, and 
keeps representing it even in the final state. Hence, SymbolRegionValue, 
as a value that does not change during analysis, even if actual value 
stored in the region changes, forever captures the original heap shape. 
Similarly, SymbolDerived captures the heap shape after an invalidation. 
Which means that in the final state, actually we have the whole history 
of heap shape changes. The only problem, described with a minimal 
example in 
http://lists.llvm.org/pipermail/cfe-dev/2015-October/045704.html , seems 
trivial rather than blocking to me; timestamps seem to be more than 
enough to handle it.