[cfe-dev] [analyzer] Zombie symbols finally getting buried!

Thu Nov 1 18:06:35 PDT 2018

Just wanted to give a heads up that i'm finally planning to land the 
ancient patch https://reviews.llvm.org/D18860 which makes sure that dead 
symbols are no longer accessed from "properly written" checkers 
(explained below) and that the respective leak warnings are issued.

This is an API breaking change, so downstream checkers are likely to be 
affected, so i guess i should explain how to update the checkers, 
because it might be hard to figure this out from the patch, as the 
original problem is quite complicated. The patch is already updating the 
existing checkers accordingly, so you can use it as an example.

The good news is, the API is simplified and the amount of boilerplate is 
reduced and now there are less ways of doing the same thing and/or 
shooting yourself in the foot :)

Of course there may be unexpected things happening in your checker that 
make these advice not applicable, but in a nutshell, here's what changed 
and how checkers should be updated:

** SymReaper.isDead(Sym) is now hard-wired to mean !SymReaper.isLive(Sym) **

This is intuitive and not really API-breaking. But i guess it's 
important to realize that these two no longer mean different things. In 
particular, there can no longer be symbols that are neither Dead nor 
Live, and there can no longer be symbols that are both Dead and Live at 
the same time (the latter are explained in 
https://reviews.llvm.org/D18860#1284817).

** SymbolReaper::maybeDead() is removed **

It is no longer necessary to mark symbols that the checker is no longer 
interested in as "maybe dead" in checkLiveSymbols(). If your code 
actively marks symbols as dead with the help of this method, just remove 
that code. Marking live symbols as live is, of course, still as 
necessary as it used to be. Use SymbolReaper::isDead() when you 
previously used SymbolReaper::imaybeDead()'s return value to figure out 
whether the symbol is dead or not.

** SymbolReaper's dead set iterators (dead_begin(), dead_end()) are 
removed. **

This is the most important part, i guess. It is no longer possible to 
iterate through dead symbols like we usually did, eg.:

     REGISTER_SET_WITH_PROGRAMSTATE(Trait, SymbolRef);
     ...
     for (auto I = SymReaper.dead_begin(), E = SymReaper.dead_end(); I 
!= E; ++I) {
       SymbolRef Sym = *I;
       if (State->contains<Trait>(Sym)) {
         reportLeak(Sym);
       }
     }

This used to be a common idiom, but it never made sense in the first 
place because the number of dying symbols is potentially infinite and 
it's impossible to enumerate all of them.

Additionally, calling isLive() or maybeDead() within the outer loop over 
Sym might have invalidated iterator I, causing a crash, as explained in 
https://reviews.llvm.org/D18860#1284817 - but i doubt anybody ever wrote 
such code.

The proposed approach, that is equivalent to the old approach up to 
zombie symbols and is as such more correct, would be to re-order the 
loops "inside out":

     for (auto Sym : State->get<Trait>()) {
       if (SymReaper.isDead(Sym)) {
         reportLeak(Sym);
       }
     }

The performance of the new idiom is slightly different from what it used 
to be. It's faster when the checker tracks few symbols (with the 
noticeable example of checker not tracking anything at all, which is 
most checkers on most of the code), but it's slower when the checker 
tracks a lot of symbols. In practice, performance gains and losses 
turned out to be marginal on all real-world code i've tested this patch 
so far.

Usage of this new idiom is what i meant by "properly written" checkers 
in the beginning. If the checker always cleans up its traits in 
checkDeadSymbols() or keeps symbols alive in checkLiveSymbols() for as 
long as it needs to, it should no longer ever encounter dead symbols, 
and neither would the rest of the Static Analyzer.

** SymbolReaper::hasDeadSymbols() is removed **

It is impossible to figure out if there are dead symbols. So any code of 
hasDeadSymbols() should be optimized out into "true", with subsequent 
dead code elimination. I doubt it ever meant anything in the first place 
because the checkDeadSymbols() callback will anyway be skipped entirely 
if there are no dead symbols. And also hasDeadSymbols() never had any 
well-defined meaning in checkLiveSymbols() because dead symbols are in 
the middle of being computed.

** The checkDeadSymbols callback is never skipped **

It used to have been skipped when there were no dead symbols, i.e., when 
SymReaper.hasDeadSymbols() yielded false. Now it cannot yield false (it 
doesn't even exist, as explained above), so the callback is never 
skipped. This is presumed to be the main reason why doesn't this change 
give us performance gains.

While it sounds like a safe change, it did actually cause a slight 
change in behavior in MallocChecker (https://reviews.llvm.org/D54013) 
and in NullabilityChecker (https://reviews.llvm.org/D54017). Which means 
that if your checker does something more than state cleanup or leak 
report in checkDeadSymbols(), especially if it does things 
unconditionally regardless of whether any symbols are dead, you might 
need to audit the code to see if it behaves as desired and doesn't rely 
on the callback not firing under certain circumstances (which most 
likely indicates a bug anyway).

** I think that's it. **

Thanks to everybody who discussed and pushed this patch forward :)