Unavailable method checker

Tue Dec 3 15:24:01 PST 2013

On Nov 6, 2013, at 7:18 AM, Tarka T <tarka.t.otter at gmail.com> wrote:

> OK, I think the current patch covers everything then. Let me know if you see any more issues.
> 
> Regards,
> Richard

Hi Richard,

First, thank you so much for working on this.  I’m sorry I haven’t had a chance to comment sooner.  My comments are a mixture of nitpicky (which are minor, but probably worth addressing) as well as a few high-level comments.  Overall I think this is a great foundation for further work on what I think could be really valuable checker.

My comments will be roughly chronologically thought UnavailableChecker.cpp, with a few broader comments interlaced.  My first high-level comment is that this is a very algorithmic checker; it’s doing something quite interesting.  It would be great to see more liberal use of comments explaining the motivation behind various pieces.  It allows someone to better understand what the checker is doing, but also allows one (including myself) to better understand if the checker is doing quite what was intended.  I’ll provide specific comments throughout the file on where I think more information is needed.

> #include "ClangSACheckers.h"
> #include "clang/AST/Attr.h"
> #include "clang/AST/DeclObjC.h"
> #include "clang/Basic/TargetInfo.h"
> #include "clang/StaticAnalyzer/Core/Checker.h"
> #include "clang/StaticAnalyzer/Core/CheckerManager.h"
> #include "clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h"
> #include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"
> #include "clang/StaticAnalyzer/Core/BugReporter/BugType.h"
> #include "llvm/ADT/SmallString.h”
> 

To start off, this file is missing the standard LLVM boilerplate, which mentions the name of the file and what its purpose is.  In this case, a brief description of the purpose of the checker seems warranted.

> using namespace clang;
> using namespace ento;
> 
> namespace {
>   class UnavailableMethodChecker
>       : public Checker< check::PreCall,
>                         eval::Assume,
>                         check::Event<ImplicitNullDerefEvent> > {
>     mutable OwningPtr<BugType> BT;
>   public:
>     void checkPreCall(const CallEvent &Call, CheckerContext &Ctx) const;
>     ProgramStateRef evalAssume(ProgramStateRef State, SVal Cond,
>                                bool Assumption) const;
>     void printState(raw_ostream &Out, ProgramStateRef State,
>                     const char *NL, const char *Sep) const;
>     void checkEvent(ImplicitNullDerefEvent Event) const;

These are standard API hooks in the Checker API, but it would be great to have some doxygen comments on these.

> 
>   private:

These next methods are internal to the checker, and they really would be well-served with some doxygen comments.  When reviewing the checker, I had to reverse engineer every piece.  Having some high-level comments here that explain what these methods do would be really helpful.

>     void displayError(const ProgramStateRef State, VersionTuple Introduced,
>                       const NamedDecl *D, ExplodedNode *N,
>                       BugReporter *BR) const;
>     const Decl *weakDeclForSymbol(SymbolRef Sym) const;
>     const Decl *weakDeclForRegion(const MemRegion *MR) const;

I’d prefer these to be named “getWeakDeclForSymbol” instead of “weakDeclForSymbol”.  This follows the LLVM naming conventions where functions should be verb phrases:

  "Function names should be verb phrases (as they represent actions), and command-like function should be imperative. The name should be camel case, and start with a lower case letter (e.g. openFile() or isFoo()).”

For more information, see:

  http://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly

In “weakDeclForSymbol” there is no verb; “for” is a preposition.  I know this comment may seem a bit silly, and while I respect your style of names we should aim to keep them consistent with the rest of the codebase.

>     VersionTuple deploymentTarget(ProgramStateRef State) const;

I don’t think this method is even needed.  More comments below.

>     VersionTuple versionIntroduced(const Decl *D) const;
>     VersionTuple versionCheckForSymbol(SymbolRef S, bool Assumption) const;

Same comment as above: “getVersionIntroduced”, etc.  These also really need doxygen comments.

>     VersionTuple getCheckedForVersion(ProgramStateRef State) const;
>     ProgramStateRef setCheckedForVersion(ProgramStateRef State,
>                                          VersionTuple V) const;
>   };
> }
> 
> REGISTER_TRAIT_WITH_PROGRAMSTATE(CheckedVersionMajor, unsigned)
> REGISTER_TRAIT_WITH_PROGRAMSTATE(CheckedVersionMinor, unsigned)
> REGISTER_TRAIT_WITH_PROGRAMSTATE(CheckedVersionSubminor, unsigned)

Having three maps is far less efficient than having one map for VersionTuple.  It’s not just the size of the individual pieces of data; its the metadata to represent the map itself.  All of these seemed to be changed together in lockstep, it is is far more efficient to both retrieve and set data from one map instead of 3.  To understand the efficiency, each map is represented by a functional AVL binary tree.  Having three trees requires balancing and querying three trees, as well as the memory associated with three trees.

> 
> void UnavailableMethodChecker::checkPreCall(const CallEvent &Call,
>                                             CheckerContext &Ctx) const {
>   // Bail early if there is no deployment target.
>   ProgramStateRef State = Ctx.getState();
>   VersionTuple TargetMinVersion = deploymentTarget(State);
>   if (TargetMinVersion.empty())
>     return;

Since this check ultimately only relies on ASTContext, I think this check can be hoisted into registerUnavailableMethodChecker().  That way the checker doesn’t even get run if this information is not available.  It then allows you to remove ‘deploymentTarget()’ entirely, and not call it.

> 
>   if (!(isa<SimpleCall>(Call) || isa<ObjCMethodCall>(Call)))
>     return;
> 
>   if (isa<ObjCMethodCall>(Call)) {
>     const ObjCMethodCall &ObjCCall = cast<ObjCMethodCall>(Call);

This should be a dyn_cast, not an isa plus a cast.  For example:

  if (const ObjCMethodCal *ObjCCall = dyn_cast<ObjCMethodCall>(&Call))

> 
>     // ObjC methods with no type information about the receiver cannot be
>     // reasoned about accurately.
>     const ObjCMessageExpr *ME = ObjCCall.getOriginExpr();
>     if (!ME->getReceiverInterface())
>       return;
> 
>     // Array and dictionary subscripts can be ignored, they are implemented
>     // for lower deployment targets by the runtime.
>     if (ObjCCall.getMessageKind() == OCM_Subscript)
>       return;
>     
>     // If this method is calling self of super with the same selector, then we
>     // can ignore any version checking.
>     if (ObjCCall.isReceiverSelfOrSuper()) {
>       const StackFrameContext *SFC = Ctx.getStackFrame();
>       if (const ObjCMethodDecl *D = dyn_cast<ObjCMethodDecl>(SFC->getDecl())) {
>         if (D->getSelector() == ME->getSelector())
>           return;
>       }
>     }
>   }
> 
>   const Decl *D = Call.getDecl();
>   if (!D)
>     return;
> 
>   VersionTuple Introduced = versionIntroduced(D);
> 
>   // If we have an ObjC method with no introduced version, check the class's or
>   // the protocol's introduced version.
>   const NamedDecl *VersionDecl = cast<NamedDecl>(D);
>   if (Introduced.empty()) {
>     if (const ObjCMethodDecl *MD = dyn_cast<ObjCMethodDecl>(D)) {
>       const DeclContext *DC = MD->getDeclContext();
>       if (const ObjCProtocolDecl *PD = dyn_cast<ObjCProtocolDecl>(DC)) {
>         Introduced = versionIntroduced(PD);
>         VersionDecl = PD;
>       } else {
>         const ObjCInterfaceDecl *ID = MD->getClassInterface();
>         Introduced = versionIntroduced(ID);
>         VersionDecl = ID;
>       }
>     }
>   }

This looks like this can be hoisted directly into “versionIntroduced” (or rather, “getVersionIntroduced”).  That way getVersionIntroduced() computes the truth as a self-encapsulated information, regardless of whether it consults the map or has to compute some of the information lazily.  Also, it doesn’t look like all of the callers to getVersionIntroduced() compute this information consistently; hoisting it into getVersionIntroduced() would solve that problem.

> 
>   if (TargetMinVersion >= Introduced)
>     return;
> 
>   // We have a method being called that is available later than the minimum
>   // deployment target, so we need to check for any availability checks.
>   VersionTuple CheckedVersion = getCheckedForVersion(State);
>   if (CheckedVersion >= Introduced)
>     return;
> 
>   if (ExplodedNode *N = Ctx.addTransition(State)) {
>     displayError(State, Introduced, VersionDecl, N, &Ctx.getBugReporter());
>   }

This is good, but maybe we should only emit a warning once along a path.  I’m concerned about a cascade of warnings.  For example, say I had 10 function calls, one after the other, that were used unsafely, but would be safely guarded with a single check.  We really don’t need 10 warnings; we only need one.  If we stopped tracking version information along a path, we could perhaps only emit the warning once?  One way to do that would be to pump up the checked version to something really high and store that in the state.  A call to ‘addTransition()’ doesn’t stop the path; the analyzer will keep exploring a path.

> }
> 
> ProgramStateRef UnavailableMethodChecker::evalAssume(ProgramStateRef State,
>                                                      SVal Cond,
>                                                      bool Assumption) const {
>   // Bail early if there is no deployment target set.
>   VersionTuple TargetMinVersion = deploymentTarget(State);
>   if (TargetMinVersion.empty())
>     return State;

I believe this can be removed if we do this check during checker registration.

> 
>   SymbolRef S = Cond.getAsSymbol();
>   if (!S)
>     return State;
> 
>   // Walk through the symbol's children looking for a version check.
>   VersionTuple V = versionCheckForSymbol(S, Assumption);
>   return setCheckedForVersion(State, V);
> }
> 
> void UnavailableMethodChecker::checkEvent(ImplicitNullDerefEvent Event) const {
>   if (Event.IsLoad)
>     return;
> 
>   // Bail early if there is no deployment target.
>   ProgramStateRef State = Event.SinkNode->getState();
>   VersionTuple TargetMinVersion = deploymentTarget(State);
>   if (TargetMinVersion.empty())
>     return;

I believe this can be removed if we do this check during checker registration.

> 
>   const MemRegion *MR = Event.Location.getAsRegion();
>   if (!MR)
>     return;
> 
>   // If this is a weak function, it is not OK to dereference if availability
>   // has not been checked for.
>   if (const Decl *D = weakDeclForRegion(MR)) {
>     VersionTuple Introduced = versionIntroduced(D);
>     VersionTuple Checked = getCheckedForVersion(State);
>     if (Introduced > Checked)
>       // TODO: get the highlight range for the event.
>       displayError(State, Introduced, cast<NamedDecl>(D),
>                    Event.SinkNode, Event.BR);
>   }
> }

Looks good.

> 
> void UnavailableMethodChecker::printState(raw_ostream &Out,
>                                           ProgramStateRef State,
>                                           const char *NL,
>                                           const char *Sep) const {
>   VersionTuple CheckedVersion = getCheckedForVersion(State);
>   if (!CheckedVersion.empty()) {
>     Out << Sep;
>     Out << "Checked availability: " << CheckedVersion.getAsString();
>   }
> }
> 
> void UnavailableMethodChecker::displayError(const ProgramStateRef State,
>                                             VersionTuple Introduced,
>                                             const NamedDecl *D,
>                                             ExplodedNode *N,
>                                             BugReporter *BR) const {
>   if (!BT)
>     BT.reset(new BugType("API Unavailable", "API Misuse"));
> 
>   SmallString<128> sbuf;
>   llvm::raw_svector_ostream os(sbuf);
>   switch (D->getKind()) {
>   case Decl::Function:
>     os << "Calling function";
>     break;
> 
>   case Decl::ObjCMethod:
>     os << "Calling method";
>     break;
> 
>   case Decl::ObjCProtocol:
>     os << "Calling method belonging to protocol ";
>     os << "'" << *D << "’";

Minor, just include ‘*D’ on the previous line:

  os << “Calling method belonging to protocol '“ << *D << “‘";

You have plenty of space.

>     break;
> 
>   case Decl::ObjCInterface:
>     os << "Calling method belonging to class ";
>     os << "'" << *D << "'";
>     break;

Same comment.

> 
>   default:
>     llvm_unreachable("Unknown Decl kind.");
>   }
> 
>   ASTContext &Ctx = State->getStateManager().getContext();
>   StringRef TargetPlatform = Ctx.getTargetInfo().getPlatformName();
>   StringRef PrettyPlatformName =
>     AvailabilityAttr::getPrettyPlatformName(TargetPlatform);
>   if (PrettyPlatformName.empty())
>     PrettyPlatformName = TargetPlatform;
> 
>   os << " introduced in " << PrettyPlatformName << " " << Introduced;
>   os << " (deployment target is " << deploymentTarget(State) << ")";
>   BugReport *R = new BugReport(*BT, os.str(), N);
>   BR->emitReport(R);
> }

Looks good.

> 
> VersionTuple
> UnavailableMethodChecker::deploymentTarget(ProgramStateRef State) const {
>   ProgramStateManager &PSM = State->getStateManager();
>   return PSM.getContext().getTargetInfo().getPlatformMinVersion();
> }

I believe that this method can just be removed.  We can query the deployment target at checker construction time, and save the VersionTuple with the checker object itself.

> 
> VersionTuple UnavailableMethodChecker::versionIntroduced(const Decl *D) const {
>   if (D)

This null check is unnecessary.  All calls to versionIntroduced() are dominated by a null check.  While this is defensive programming, it makes it looks like the contact of the function is weaker than it really is.

>     if (const AvailabilityAttr *Availability = D->getAttr<AvailabilityAttr>())
>       return Availability->getIntroduced();
> 
>   return VersionTuple();
> }
> 
> const Decl *UnavailableMethodChecker::weakDeclForSymbol(SymbolRef Sym) const {
>   const SymbolExtent *SE = dyn_cast<SymbolExtent>(Sym);
>   if (!SE)
>     return NULL;
> 
>   return weakDeclForRegion(SE->getRegion());

Minor, but this can be written more concisely as:

  if (const SymbolExtent *SE = dyn_cast<SymbolExtent>(Sym))
   return getWeakDeclForRegion(SE->getRegion());
  return NULL;

That’s 3 lines instead of 5.  Again it’s really minor; nothing really wrong with it.

> }
> 
> const Decl
> *UnavailableMethodChecker::weakDeclForRegion(const MemRegion *R) const {
>   if (!isa<FunctionTextRegion>(R))
>     return NULL;
> 
>   const FunctionTextRegion *FTR = cast<FunctionTextRegion>(R);
>   const FunctionDecl *D = cast<FunctionDecl>(FTR->getDecl());
>   return D->isWeak() ? D : NULL;
> }

Looks good.

> 
> VersionTuple
> UnavailableMethodChecker::versionCheckForSymbol(SymbolRef Sym,
>                                                 bool Assumption) const {
>   // Iterate over the expressions symbols looking for version checks.
>   for (SymExpr::symbol_iterator I = Sym->symbol_begin(), E = Sym->symbol_end();
>        I != E; ++I) {

The follow logic is basically “pattern matching” over symbolic expressions.  I sometimes think it is helpful to hard a brief comment over such logic to illustrate the idiom being matched, so it is clear what is covered and what isn’t.  It also serves as the inspiration for test cases.

>     if (const SymIntExpr *SIExpr = dyn_cast<SymIntExpr>(*I)) {
>       // A symbol is being compared to an int value. The Assumption value
>       // is adjusted so that it is checking the symbol exists.
>       BinaryOperatorKind op = SIExpr->getOpcode();
>       if (op != BO_EQ && op != BO_NE)
>           break;
> 
>       bool rhs = SIExpr->getRHS() != 0;

Please follow LLVM naming conventions consistently when it comes to variable names.  In this case, please keep ‘rhs’ as ‘RHS’:

  "Variable names should be nouns (as they represent state). The name should be camel case, and start with an upper case letter (e.g. Leader or Boats)."

I know the analyzer doesn’t consistently follow this style; we’re trying to gradually fix that.

>       if ((op == BO_EQ && rhs == false) || (op == BO_NE && rhs == true))
>         Assumption = !Assumption;

Minor:

  rhs == false can be: !RHS
  rhs == true can be: RHS

> 
>       continue;
>     }
> 
>     // We are only interested in checks for existence.

This comment is a bit too terse.  “existence” of what?  It’s missing the subject of the sentence.

>     if (!Assumption)
>       break;
> 

Please also include a comment here.  Why is the SymbolExtent being checked here?

>     if (const SymbolExtent *SE = dyn_cast<SymbolExtent>(*I)) {
>       if (const Decl *D = weakDeclForSymbol(SE))
>         // We have a weakly linked function, return its introduced version.
>         return versionIntroduced(D);
>     } else if (const SymbolConjured *SConj = dyn_cast<SymbolConjured>(*I)) {
>       const ObjCMessageExpr *ME = dyn_cast<ObjCMessageExpr>(SConj->getStmt());
>       if (!ME)
>         continue;
> 
>       // This is an Objective-C message, first get the class availability.
>       // If we have no class here, we cannot reason about the version checking,
>       // so give up.
>       const ObjCInterfaceDecl *CD = ME->getReceiverInterface();
>       if (!CD)
>         break;
> 
>       VersionTuple ClassVersion = versionIntroduced(CD);
> 
>       // If this message is checking for availability, return the max of the
>       // class version and the checked for method version.
>       Selector MsgSel = ME->getSelector();
>       if (MsgSel.getNumArgs() > 1)
>         break;

Minor, but in general hoist the cheap checks above the expensive ones.  Checking if the selector has more than 1 argument is cheaper than getting the version information from the map.  This might not matter from a performance perspective, depending on the frequency of which branch succeeds/fails.  In this case, ClassVersion isn’t even used until after a bunch of other checks.  You can possibly make this more lazy.  In this case it probably doesn’t make a difference, so if you prefer to leave it this way that’s fine.

> 
>       StringRef SelStr = MsgSel.getNameForSlot(0);
>       if (SelStr == "respondsToSelector" ||
>           SelStr == "instancesRespondToSelector") {
>         // The argument could be a selector expression or a SEL variable.
>         // TODO: make this work for SEL variables.
>         if (const ObjCSelectorExpr *SExp =
>             dyn_cast<ObjCSelectorExpr>(ME->getArg(0))) {
>           Selector ArgSel = SExp->getSelector();
>           const ObjCMethodDecl *D = CD->lookupInstanceMethod(ArgSel);
>           if (!D)
>             D = CD->lookupClassMethod(ArgSel);
> 
>           if (D)
>             return std::max(ClassVersion, versionIntroduced(D));
>         }
>       } else if (SelStr == "class") {
>         return ClassVersion;
>       }
>     }
>   }
> 
>   return VersionTuple();
> }
> 
> VersionTuple
> UnavailableMethodChecker::getCheckedForVersion(ProgramStateRef State) const {
>   unsigned major = State->get<CheckedVersionMajor>();
>   unsigned minor = State->get<CheckedVersionMinor>();
>   unsigned subminor = State->get<CheckedVersionSubminor>();
>   return VersionTuple(major, minor, sub minor);

Please change this method to do a single lookup for the VersionTuple, instead of three lookups.

> }
> 
> ProgramStateRef
> UnavailableMethodChecker::setCheckedForVersion(ProgramStateRef State,
>                                                VersionTuple V) const {
>   VersionTuple CheckedVersion = getCheckedForVersion(State);
>   if (CheckedVersion >= V)
>     return State;
> 
>   State = State->set<CheckedVersionMajor>(V.getMajor());
>   State = State->set<CheckedVersionMinor>(V.getMinor());
>   State = State->set<CheckedVersionSubminor>(V.getSubminor());
>   return State;

Please change this method to set the value for one set, instead of 3.

> }
> 
> void ento::registerUnavailableMethodChecker(CheckerManager &mgr) {
>   mgr.registerChecker<UnavailableMethodChecker>();

As mentioned earlier, you can put some checking for deployment information right here.  If the context doesn’t support running the checker, just don’t run it at all.

> }

For the test cases, they look like a good start.  One thing I’d like to see is a test case that shows two unsafe calls one after the other.  For example, you currently have:

> - (void)doVersion30InstanceStuff {
>   [super doVersion30InstanceStuff]; // no warning
>   [self doVersion40InstanceStuff];  // expected-warning {{Calling method introduced in OS X 40.0 (deployment target is 10.5.0)}}
> }

Try adding a third call here, maybe just repeating the second line:

> - (void)doVersion30InstanceStuff_B {
>   [super doVersion30InstanceStuff]; // no warning
>   [self doVersion40InstanceStuff];  // expected-warning {{Calling method introduced in OS X 40.0 (deployment target is 10.5.0)}}
>   [self doVersion40InstanceStuff];  // expected-warning {{Calling method introduced in OS X 40.0 (deployment target is 10.5.0)}}
> }

Right now I’d expect to see two warnings.  It would be great to get them down to one (just the first one).

Overall this is looking really great.  Thanks so much for working on this!

Cheers,
Ted
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20131203/d5815d97/attachment.html>