[cfe-dev] CFRefCount Problem #4: Hybrid GC

Sun Aug 28 14:37:08 PDT 2011

[Background: Currently we're trying to eliminate the concept of "transfer functions" from the static analyzer, which has gradually been supplanted by the Checker interface. The only implementation of the transfer functions, CFRefCount, has two roles: general-purpose function and method evaluator, and retain-count checking for the Cocoa and CoreFoundation frameworks. The former is being moved to ExprEngine*, the latter to a regular checker called RetainReleaseChecker. Of course, there are always design issues; I'm trying to make sure they get cleared up on-list one by one.]

Almost done! This hasn't been nearly as bad as I'd feared.

The last major issue with CFRefCount is the option to compile Objective-C code as "optionally garbage-collected", or "Hybrid GC" mode. For "GC required" and "non-GC" modes, we just look in the LangOptions and all is good. But for hybrid GC, we have to analyze the code both ways, because it may end up running in either a GC or a non-GC environment. Since hybrid GC is the problem, the rest of this e-mail assumes we're analyzing in hybrid-GC mode.

Currently, the way analyses are run on each function/method looks like this:

- non-path-sensitive checks
- path-sensitive, GC off
- path-sensitive, GC on

In order to get RetainReleaseChecker to switch back and forth between GC and non-GC modes (which alternate, because there may be many functions/methods), CFRefCount is being (ab)used as a non-const "beginAnalysis" callback.

As discussed off-list, we're not really trying to optimize for the hybrid-GC case--analyzing a hybrid-GC codebase is basically the same as analyzing two different codebases, because the frameworks behave differently. Still, there's not one clear solution to make RetainReleaseChecker unprivileged. I came up with three different high-level approaches to the problem:

1. One hybrid path-sensitive run
- Advantage: the frontend doesn't care about the GC mode.

1A) Have two checkers that share code (like NSErrorChecker and CFErrorChecker), but use different GDM tags.
- Disadvantage: higher peak memory usage due to less state unification.
- Disadvantage (?): fatal errors in one checker's model will stop analysis of the other model. But if you have fatal errors, you'll have to re-run anyway.
- Disadvantage: doesn't make any allowances for /other/ checkers that might care if we're running in GC mode.
- With arbitrary checkers this could cause a state explosion due to a Cartesian product, but RetainReleaseChecker rarely bifurcates the state, if ever; 1*1 is still 1.

1B) Bifurcate the state on demand with a GCEnabled tag. After the tag is set, just follow that mode.
- This /is/ essentially doing the analysis twice, but might cost us some memory.
- Possibly accessible to other checkers, especially if the bifurcation happens in ExprEngine(ObjC) and not RetainReleaseChecker.

2. Two runs, with a proper beginAnalysis callback that includes the GC mode.
- ...but who actually knows about the GC mode? ExprEngine? CheckerManager? AnalysisContext?
- do path-insensitive checks also need to run twice?

2A) One checker that alters its behavior
- Essentially what happens now.
- Advantage (?): RetainReleaseChecker's SummaryLog could also benefit from a beginAnalysis callback.

2B) Two checkers that share code, but only one is enabled each run
- Can't think of any advantages over 2A, really.

3. Two runs over the entire /translation unit/, since a checker's lifetime is at least as long as a translation unit.
- Advantage: no resetting between runs
- Disadvantage: bug reports will be out of order. This is pretty bad.
- Still have the "who knows about the GC mode" problem.
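
To make option 1A a bit more concrete, here is a toy sketch of "two checkers that share code but use different GDM tags". Nothing in it is real analyzer code: State, RetainCountSketch, onRetainMessage and count are all names made up for illustration, and the "GDM" is just a std::map keyed by tag address. It also assumes, as a simplification, that a retain message changes the tracked count only in non-GC mode.

```cpp
#include <cassert>
#include <map>

// Toy stand-in for a program state with a generic data map (GDM).
// Keys are tag addresses; values are a single tracked retain count.
struct State {
  std::map<const void *, int> GDM;
};

// Shared implementation; each instantiation gets its own tag address,
// so the two models never read or write each other's data.
template <bool GCEnabled> struct RetainCountSketch {
  static const void *tag() {
    static int Tag; // distinct object per template instantiation
    return &Tag;
  }

  // Models one 'retain' message sent to the tracked object.
  static State onRetainMessage(State St) {
    int &Count = St.GDM[tag()]; // creates the entry (at 0) if absent
    if (!GCEnabled)
      ++Count; // simplification: under GC the message leaves the count alone
    assert(Count >= 0 && "count never goes negative in this sketch");
    return St;
  }

  static int count(const State &St) {
    auto It = St.GDM.find(tag());
    return It == St.GDM.end() ? 0 : It->second;
  }
};

using NonGCChecker = RetainCountSketch<false>;
using GCChecker = RetainCountSketch<true>;
```

Running both checkers over the same state leaves two independent entries in the map, which is where the "less state unification" cost comes from: every state carries both models' data.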

Personally, I think 1A is the cleanest solution, but not very extensible. 1B is what I'd actually go with: it knocks consideration of the GC mode down to the ExprEngine, it reuses the mechanisms we already have to deal with "two runs", and it's accessible to any checker that cares. I'm thinking a special ExprEngine::assumeObjCGC() or ProgramState::assumeObjCGC(), which returns a pair of states for "GC off" and "GC on" (if feasible). It's a little heavy-handed, but it minimizes the impact on the rest of the analyzer, which shouldn't have to think about GC.
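
A rough sketch of what I mean by assumeObjCGC(), using toy types rather than the real ProgramState. GCTag, State and StateRef are invented for the example, and the free function stands in for whichever class would end up owning it. The idea is only this: if the tag is unset, hand back two states; once it is set, only the matching side is feasible and the other is null.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Toy stand-ins; none of these are the analyzer's real types.
enum class GCTag { Unset, Off, On };

struct State {
  GCTag GC = GCTag::Unset;
};
using StateRef = std::shared_ptr<const State>;

// Hypothetical assumeObjCGC(): returns {GC off, GC on}.
// A side that is infeasible for the incoming state is null.
std::pair<StateRef, StateRef> assumeObjCGC(const StateRef &St) {
  assert(St && "need a valid state to bifurcate");
  switch (St->GC) {
  case GCTag::Off:
    return {St, nullptr}; // already committed to non-GC
  case GCTag::On:
    return {nullptr, St}; // already committed to GC
  case GCTag::Unset:
    break;
  }
  // First query on this path: split into one state per mode.
  auto Off = std::make_shared<State>(*St);
  Off->GC = GCTag::Off;
  auto On = std::make_shared<State>(*St);
  On->GC = GCTag::On;
  return {Off, On};
}
```

A checker that cares would call this at the first GC-sensitive point and continue down whichever states come back non-null; checkers that never ask never cause a split.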

On the other hand, maybe it's /too/ heavy compared to what we have now. A beginAnalysis callback would be fairly simple to implement, and the GC mode isn't too hard to thread through ExprEngine instead of CFRefCount.
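
The beginAnalysis route (roughly 2A) would look something like the sketch below. Again, this is a toy: GCMode, Checker, RetainReleaseSketch and runHybridGC are made-up names, not the existing Checker interface, and the driver stands in for whoever ends up knowing about the GC mode.

```cpp
#include <cassert>
#include <vector>

enum class GCMode { Off, On }; // hypothetical

// Hypothetical checker interface with a mode-aware callback.
struct Checker {
  virtual ~Checker() = default;
  virtual void beginAnalysis(GCMode Mode) = 0;
};

// One checker that alters its behavior per run.
struct RetainReleaseSketch : Checker {
  GCMode Mode = GCMode::Off;
  unsigned Runs = 0;
  void beginAnalysis(GCMode M) override {
    Mode = M; // reset per-run state here (e.g. summaries, logs)
    ++Runs;
  }
};

// Hypothetical driver for one function/method in hybrid-GC mode:
// announce the mode to every checker, then do the path-sensitive run.
void runHybridGC(const std::vector<Checker *> &Checkers) {
  for (GCMode M : {GCMode::Off, GCMode::On}) {
    for (Checker *C : Checkers) {
      assert(C && "null checker");
      C->beginAnalysis(M);
    }
    // ... path-sensitive analysis of the current function would go here ...
  }
}
```

This keeps the checker unprivileged in the sense that it only hears about the mode through the callback, but the checker itself stays mutable between runs.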

Not sure which way to go on this one.

Jordy

P.S. Everything would get simpler if we stopped distinguishing between "hybrid, non-GC" and "non-GC", and "hybrid, GC on" and "GC only", because then we could just change the LangOptions before they got to the checkers. But those are nice distinctions to make, and I think they're worth keeping.