[PATCH] D53701: [Analyzer] Record and process comparison of symbols instead of iterator positions in interator checkers

Artem Dergachev via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Nov 2 15:29:00 PDT 2018


NoQ added a comment.

Ok, so what this code does is, eg., for a call of `i1.operator==(i2)` that returns a symbol `$x` of type `bool`, it conserves the sacred checker-specific knowledge that `$x` is in fact equal to `$offset(i1) == $offset(i2)`, where `$offset(i)` is the offset symbol within `IteratorPosition` to which iterator `i` is mapped.

This looks like a renaming problem to me: what we really want to do is //rename// (i.e. update representation in the `SVal` hierarchy of) `$x` to `$offset(i1) == $offset(i2)`. And for now the single plausible approach to renaming problems in the Static Analyzer is to avoid them: //give the value a correct name from the start// so that you didn't have to rename it later.

What this means is that instead of waiting until `checkPostCall` to obtain `$x` and then trying to rename it into `$offset(i1) == $offset(i2)`, we should `evalCall` the comparison operator to return `$offset(i1) == $offset(i2)`. So that the symbol with the wrong name (i.e., `$x`) didn't appear in the program state in the first place.

The good thing about this approach is that it does not require any extra tracking at all - no comparison maps, no `evalAssume()`, nothing. The value is simply correct from the start.

As a side effect, you will have a chance (though still not forced) to avoid redundant invalidation of Store during evaluation of the operator call. This is the correct behavior for at least STL containers and probably for all containers ever created by mankind. Though of course you never know. I.e., what if the code under analysis measures complexity of vector sort procedure and increments a global counter every time two iterators are compared within that procedure? But at least for STL/boost/LLVM/WebKit containers this is probably the right thing to do.

***

Now, of course, `evalCall()` would suppress inlining. During `evalCall()` we currently do not know whether the function will be inlined or evaluated conservatively if none of the checkers evaluate it, but we can easily provide such information in `evalCall()`, so this is not a problem.

The problem with inlining is that we got names for iterator offsets wrong from the start, because we conjured them out of thin air and they are conflicting with the actual representation of offsets within the implementation of the container. Which brings us back to a renaming problem: the problem of renaming `$offset(i1) == $offset(i2)` into the actual return value `$x` that was computed via inlining. Moreover, this new renaming problem is ill-formed because renaming non-atomic symbols doesn't make any sense - we should instead rename `$offset(i1)` and `$offset(i2)` separately. Still, the problem is indeed, as you already noticed, solved easily when `$x` is concrete: we only need to assume that `$offset(i1) == $offset(i2)` or `$offset(i1) != $offset(i2)` depending on the concrete value of `$x`.

And if `$x` is symbolic, we could still //introduce a state split// here: on one branch both `$x` and `$offset(i1) == $offset(i2)` are false, on the other branch they both are true, and no additional tracking is ever be necessary. I believe that such state split is not invalid: it pretty much corresponds to the "eagerly assume" behavior, just for iterators. The question here is how much are we sure that both branches are possible. Even if neither inlining nor our custom iterator model managed to refute any of these two branches, one of the paths may still be infeasible. But the amount of error here is not larger than eagerly-assume, and for eagerly-assume it isn't that bad, so we could try. Of course, the alternative to state split is assuming things about `($offset(i1) == $offset(i2)) == $x`, but our constraint solver will not be able to handle such constraints, which is the very reason why we have problems with renaming (well, at least some of them; renaming temporary regions in C++ is slightly more complicated than that (understatement intended)). In fact, i think the reasoning behind having eager assumptions for numbers is exactly that: they wanted to make constraints simpler.

***

But still, i want to step back and ask the question that i really really want answered here: //if container methods are inlinable, do we actually want to track iterator positions manually?// Maybe just rely on inlining entirely when possible? Like, both for modeling and for bug-finding. Or only rely on `evalCall()`? Essentially, if inlining is not reliable enough for relying on it entirely (so we have to maintain a parallel checker-specific model of iterators and have these two models exchange information), why do we think that it is reliable enough for the purpose of evaluating iterator comparisons?

This question is in fact asked and answered, one way or another, intentionally or accidentally, with different levels of urgency, in every checker that tries to model effects of function calls. The most famous example of conscious approach to this problem (overstatement intended) so far is `RetainCountChecker` that has thousands of lines of code devoted solely to figuring out whether a function should be evaluated by the checker or inlined. Once the checker decides to rely on inlining while *also* modeling its own specific effects - inlining starts conflicting with modeling and the hell breaks loose.

So i believe it is very important to ask ourselves here: do we really want to //mix// our own symbolic model of iterators with the implicit model of iterators as plain structures in the Store that is automatically managed via inlining? Because if only we decide to either trust inlining entirely or not trust it at all, things become *much* easier. I expect that the amount of effort to keep these two models consistent with each other will explode very quickly as we add more features to the checker.

Now, the interesting part about keeping these two models consistent in this checker is that it is entirely a "constraint-like" problem: our effort consists entirely of adding new information about immutable entities (symbols) to the state. The state doesn't mutate - it is only being clarified. And when the state is actually mutated (eg., when we are modeling `operator++()`), no effort to maintain consistency is needed. This is very different from the problem we're having in `RetainCountChecker`, where it is a "store-like" problem, i.e. function calls we're trying to model are actively mutating the state that might have already been mutated within the inlined call, i.e. they're competing for the same data. And i believe that the difference is entirely in implementation details of these two checkers: there isn't that much difference in kinds of bugs they find, so i guess it's an interesting food for thought.

***

So, to summarize:

1. The eager state split solution is not all that bad here, and is also much easier to implement than the delayed state split you're trying to implement in this patch. Generally, any information produced by inlining is most likely pretty accurate and should, in a perfect world, be incorporated into the checker's state, given that the checker already decided to track some sort of internal information.
2. Still, i want to give you a heads up that the idea of maintaining an implicit (via checker traits and `evalCall()`) and an explicit (via Store and inlining) models of iterators simultaneously may result in very quick explosion in complexity. We already went into this direction because we didn't think about it this way, but i think it's never too late to reconsider.



================
Comment at: lib/StaticAnalyzer/Checkers/IteratorChecker.cpp:2066
            "Symbol comparison must be a comparison");
     return assumeNoOverflow(NewState, cast<SymIntExpr>(CompSym)->getLHS(), 2);
   }
----------------
P.S. What was the idea here? Like, `CompSym` was just computed via `BO_EQ` and has type of a condition, i.e. `bool` because we are in C++. Is this code trying to say that the result of the comparison is bounded by `true/2`?


Repository:
  rC Clang

https://reviews.llvm.org/D53701





More information about the cfe-commits mailing list