[cfe-dev] [analyzer] Zombie symbols finally getting buried!
Artem Dergachev via cfe-dev
cfe-dev at lists.llvm.org
Thu Nov 1 18:06:35 PDT 2018
Just wanted to give a heads up that i'm finally planning to land the
ancient patch https://reviews.llvm.org/D18860 which makes sure that dead
symbols are no longer accessed from "properly written" checkers
(explained below) and that the respective leak warnings are issued.
This is an API breaking change, so downstream checkers are likely to be
affected, so i guess i should explain how to update the checkers,
because it might be hard to figure this out from the patch, as the
original problem is quite complicated. The patch is already updating the
existing checkers accordingly, so you can use it as an example.
The good news is, the API is simplified and the amount of boilerplate is
reduced and now there are less ways of doing the same thing and/or
shooting yourself in the foot :)
Of course there may be unexpected things happening in your checker that
make these advice not applicable, but in a nutshell, here's what changed
and how checkers should be updated:
** SymReaper.isDead(Sym) is now hard-wired to mean !SymReaper.isLive(Sym) **
This is intuitive and not really API-breaking. But i guess it's
important to realize that these two no longer mean different things. In
particular, there can no longer be symbols that are neither Dead nor
Live, and there can no longer be symbols that are both Dead and Live at
the same time (the latter are explained in
https://reviews.llvm.org/D18860#1284817).
** SymbolReaper::maybeDead() is removed **
It is no longer necessary to mark symbols that the checker is no longer
interested in as "maybe dead" in checkLiveSymbols(). If your code
actively marks symbols as dead with the help of this method, just remove
that code. Marking live symbols as live is, of course, still as
necessary as it used to be. Use SymbolReaper::isDead() when you
previously used SymbolReaper::imaybeDead()'s return value to figure out
whether the symbol is dead or not.
** SymbolReaper's dead set iterators (dead_begin(), dead_end()) are
removed. **
This is the most important part, i guess. It is no longer possible to
iterate through dead symbols like we usually did, eg.:
REGISTER_SET_WITH_PROGRAMSTATE(Trait, SymbolRef);
...
for (auto I = SymReaper.dead_begin(), E = SymReaper.dead_end(); I
!= E; ++I) {
SymbolRef Sym = *I;
if (State->contains<Trait>(Sym)) {
reportLeak(Sym);
}
}
This used to be a common idiom, but it never made sense in the first
place because the number of dying symbols is potentially infinite and
it's impossible to enumerate all of them.
Additionally, calling isLive() or maybeDead() within the outer loop over
Sym might have invalidated iterator I, causing a crash, as explained in
https://reviews.llvm.org/D18860#1284817 - but i doubt anybody ever wrote
such code.
The proposed approach, that is equivalent to the old approach up to
zombie symbols and is as such more correct, would be to re-order the
loops "inside out":
for (auto Sym : State->get<Trait>()) {
if (SymReaper.isDead(Sym)) {
reportLeak(Sym);
}
}
The performance of the new idiom is slightly different from what it used
to be. It's faster when the checker tracks few symbols (with the
noticeable example of checker not tracking anything at all, which is
most checkers on most of the code), but it's slower when the checker
tracks a lot of symbols. In practice, performance gains and losses
turned out to be marginal on all real-world code i've tested this patch
so far.
Usage of this new idiom is what i meant by "properly written" checkers
in the beginning. If the checker always cleans up its traits in
checkDeadSymbols() or keeps symbols alive in checkLiveSymbols() for as
long as it needs to, it should no longer ever encounter dead symbols,
and neither would the rest of the Static Analyzer.
** SymbolReaper::hasDeadSymbols() is removed **
It is impossible to figure out if there are dead symbols. So any code of
hasDeadSymbols() should be optimized out into "true", with subsequent
dead code elimination. I doubt it ever meant anything in the first place
because the checkDeadSymbols() callback will anyway be skipped entirely
if there are no dead symbols. And also hasDeadSymbols() never had any
well-defined meaning in checkLiveSymbols() because dead symbols are in
the middle of being computed.
** The checkDeadSymbols callback is never skipped **
It used to have been skipped when there were no dead symbols, i.e., when
SymReaper.hasDeadSymbols() yielded false. Now it cannot yield false (it
doesn't even exist, as explained above), so the callback is never
skipped. This is presumed to be the main reason why doesn't this change
give us performance gains.
While it sounds like a safe change, it did actually cause a slight
change in behavior in MallocChecker (https://reviews.llvm.org/D54013)
and in NullabilityChecker (https://reviews.llvm.org/D54017). Which means
that if your checker does something more than state cleanup or leak
report in checkDeadSymbols(), especially if it does things
unconditionally regardless of whether any symbols are dead, you might
need to audit the code to see if it behaves as desired and doesn't rely
on the callback not firing under certain circumstances (which most
likely indicates a bug anyway).
** I think that's it. **
Thanks to everybody who discussed and pushed this patch forward :)
More information about the cfe-dev
mailing list