[PATCH] D27710: [analyzer] Prohibit ExplodedGraph's edges duplicating

Wed Dec 21 08:13:42 PST 2016

ilya-palachev added a comment.

In https://reviews.llvm.org/D27710#628069, @zaks.anna wrote:

> Are there any negative effects from having the duplicates in edges?

Yes, you're right; in the current implementation there seem to be no negative effects from this, since duplicate edges are quite rare and add very little memory overhead (just two additional pointers for each redundant edge).

> When does this occur? It's a bit suspicious since we have a FromN, and a path is split at that node, resulting in successors that are the same. Is it possible that whoever split the state did not encode enough interesting info?

Yes, this does occur (for one of our checkers on Android).
This patch is really more of a request for comments on what to do in the situation described below.

Consider a checker that can emit warnings on multiple dead symbols (one warning for each buggy symbol). When the checker encounters such a situation, it first emits a warning for the 1st symbol, then for the 2nd one. For the 2nd and subsequent symbols, CheckerContext::addTransition returns `nullptr'. That means the requested node already exists, because the State hasn't changed. So for each warning starting from the 2nd, an additional edge is added to the ExplodedGraph. This is easy to see in a DOT dump of the graph.
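To make the scenario concrete, here is a minimal self-contained sketch. It is a toy model, not the real analyzer code: ToyNode, ToyGraph, and countSuccessorEdges are hypothetical names, and plain integers stand in for ProgramPoint and ProgramState. It only mimics the behaviour described above (the transition returns a null pointer when the node already exists, yet an edge is still recorded) and shows what an edge-deduplication check would change.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Toy stand-in for an exploded-graph node (NOT the real clang::ento class).
struct ToyNode {
  int Point; // stands in for ProgramPoint
  int State; // stands in for ProgramState
  std::vector<ToyNode *> Preds, Succs;
};

class ToyGraph {
  // Nodes are uniqued by their (Point, State) pair.
  std::map<std::pair<int, int>, std::unique_ptr<ToyNode>> Nodes;

public:
  // Returns the unique node for (Point, State); IsNew reports whether it
  // had to be created.
  ToyNode *getNode(int Point, int State, bool &IsNew) {
    std::unique_ptr<ToyNode> &Slot = Nodes[{Point, State}];
    IsNew = !Slot;
    if (IsNew)
      Slot.reset(new ToyNode{Point, State, {}, {}});
    return Slot.get();
  }

  // Links Pred -> Succ. With Dedup set, an already existing edge is skipped.
  static void addEdge(ToyNode *Pred, ToyNode *Succ, bool Dedup) {
    assert(Pred && Succ);
    if (Dedup && std::find(Pred->Succs.begin(), Pred->Succs.end(), Succ) !=
                     Pred->Succs.end())
      return;
    Pred->Succs.push_back(Succ);
    Succ->Preds.push_back(Pred);
  }

  // Mimics the behaviour described above: the edge is recorded either way,
  // and nullptr is returned when the node already existed.
  ToyNode *addTransition(ToyNode *Pred, int Point, int State, bool Dedup) {
    bool IsNew = false;
    ToyNode *N = getNode(Point, State, IsNew);
    addEdge(Pred, N, Dedup);
    return IsNew ? N : nullptr;
  }
};

// Emits NumWarnings "warnings" from one node, all with an unchanged state,
// and reports how many outgoing edges that node ends up with.
std::size_t countSuccessorEdges(int NumWarnings, bool Dedup) {
  ToyGraph G;
  bool IsNew = false;
  ToyNode *From = G.getNode(/*Point=*/0, /*State=*/0, IsNew);
  for (int I = 0; I < NumWarnings; ++I)
    G.addTransition(From, /*Point=*/1, /*State=*/0, Dedup);
  return From->Succs.size();
}
```

In this toy model, three warnings without deduplication leave three parallel edges between the same pair of nodes, while the deduplicating variant keeps a single edge.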

Yes, this problem can be resolved with checker tags for nodes, but in that case we'll need to create a new tag for each such warning (because ProgramPoint and ProgramState are equal for all of them).

Moreover, such cases can even lead to non-determinism in warnings.

The checker stores a set (or map, list, ...) of buggy symbols (or regions, ...) in the GDM. When the checkDeadSymbols callback fires, this set is iterated, and the checker tries to emit a warning for each element. But the order in which symbols are stored in the set depends on their order as pointers, which is controlled by the allocator. The allocator is non-deterministic by design, so the region for symbol A can be greater (as a pointer) than the region for symbol B in one analysis run and smaller in another.

That's why in such cases different warnings can be emitted from run to run. The patch under discussion doesn't address this issue, but I'd like to at least discuss the situation.
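The ordering problem can be illustrated with another toy sketch. One possible mitigation, which is not part of this patch and is only an assumption for illustration, is to sort the buggy regions by some stable identifier before emitting warnings. ToyRegion, its ID field, addressOrder, and warningOrder are hypothetical names, not analyzer API.

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for a region. ID is assumed to be assigned in a stable,
// allocation-independent way (e.g. in creation order).
struct ToyRegion {
  unsigned ID;
  std::string Name;
};

// Iteration order of a pointer-keyed std::set follows the addresses,
// so it depends on where the allocator happened to place each object.
std::vector<std::string>
addressOrder(const std::set<const ToyRegion *> &Buggy) {
  std::vector<std::string> Names;
  for (const ToyRegion *R : Buggy)
    Names.push_back(R->Name);
  return Names;
}

// Sorting by the stable ID first makes the order independent of addresses.
std::vector<std::string>
warningOrder(const std::set<const ToyRegion *> &Buggy) {
  std::vector<const ToyRegion *> Sorted(Buggy.begin(), Buggy.end());
  std::sort(Sorted.begin(), Sorted.end(),
            [](const ToyRegion *A, const ToyRegion *B) {
              return A->ID < B->ID;
            });
  std::vector<std::string> Names;
  for (const ToyRegion *R : Sorted)
    Names.push_back(R->Name);
  return Names;
}
```

If two runs place the same regions at different relative addresses, addressOrder can differ between them, while warningOrder stays the same as long as the IDs are stable.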

Repository:
  rL LLVM

https://reviews.llvm.org/D27710