[PATCH] Extend PointerSubChecker to detect more bugs

Thu Jul 11 17:39:57 PDT 2013

On Jul 10, 2013, at 12:02 , Matthew Dempsky <matthew at dempsky.org> wrote:

> On Wed, Jul 10, 2013 at 10:51 AM, Jordan Rose <jordan_rose at apple.com> wrote:
> Hi, Matthew. Thanks for working on this -- these are definitely useful checks! One concern I have, though, is that plenty of people assume that comparisons between different objects are not just undefined or unspecified but actually consistent.
> 
> Well, they can assume consistent pointer comparisons just like they can assume signed integer overflow wrapping, but that's not what the language specification says. :)
> 
> It might be nice to have a separate checker flag for the comparison checks, though the implementation can still be shared in one Checker class. You can see how this works in CStringChecker or IvarInvalidationChecker.
> 
> Okay, I'll look into that.

Yeah, it's cheap to add. People get annoyed at all our core checks, and if they're not really core we'll eventually want to be able to turn them off.

>> -  // When doing pointer subtraction, if the two pointers do not point to the
>> -  // same memory chunk, emit a warning.
>> -  if (B->getOpcode() != BO_Sub)
>> +  // When doing pointer subtraction or relational comparisons, if the two
>> +  // pointers do not point to the same memory chunk, emit a warning.
>> +  if (B->getOpcode() != BO_Sub &&
>> +      B->getOpcode() != BO_LT && B->getOpcode() != BO_GT &&
>> +      B->getOpcode() != BO_LE && B->getOpcode() != BO_GE)
>>     return;
> 
> You can simplify this with B->isRelationalOp().
> 
> Thanks, will do.
>> +    DefinedOrUnknownSVal NullityMatched = SVB.evalEQ(state,
>> +      SVB.evalEQ(state, *DLV, Null), SVB.evalEQ(state, *DRV, Null));
> 
> It would be wonderful if this worked, and indeed it will appear to work for your test cases involving a literal null. However, in the general case evalEQ just constructs an expression "a == b", and can't evaluate that on its own. The constraint manager's what actually does the work.
> 
> Yeah.  I found writing the constraint as ((lhs == null) && (rhs == null)) || ((lhs != null) && (rhs != null)) worked better with the constraint manager, but just looked uglier.  I was going to look into if evalEQ like this could be improved, but in a pinch rewriting it would work too.
> 
> A better way to do this would be to use ProgramState::isNull on each value, and checking if one value is constrained to true while the other is false. However, it looks like isNull is actually going to give you the wrong answer for non-symbolic regions, so you'd have to fix that first. Barring that, you should assume() each condition individually and compare the results.
> 
> I originally assume()'d each variable separately, but I couldn't figure out how to later add an addTransition() to represent that we now know (!lhs == !rhs), because I ended up with two separate program states: one for (lhs && rhs) and one for (!lhs && !rhs).
> 
> I suppose it's not the end of the world to lost that information though.

I think it's all right, especially because people probably wouldn't expect that to be the information the analyzer picks up from a comparison. (Especially because "a < b" can't ever be true if one of the arguments is a null pointer.)

>> +    ProgramStateRef StMatch = state->assume(NullityMatched, true);
>> +    if (!StMatch) {
>> +      if (ExplodedNode *N = C.addTransition()) {
>> +        BugReport *R;
>> +        if (B->getOpcode() == BO_Sub) {
>> +          if (!BT_nullsub)
>> +            BT_nullsub.reset(new BuiltinBug("Pointer subtraction", 
>> +                                "Subtraction between a null and non-null pointer "
>> +                                "may cause incorrect result."));
>> +          R = new BugReport(*BT_nullsub, BT_nullsub->getDescription(), N);
>> +        } else {
>> +          if (!BT_nullcmp)
>> +            BT_nullcmp.reset(new BuiltinBug("Pointer comparison", 
>> +                                "Comparison between a null and non-null pointer "
>> +                                "may cause incorrect result."));
>> +          R = new BugReport(*BT_nullcmp, BT_nullcmp->getDescription(), N);
>> +        }
>> +        R->addRange(B->getSourceRange());
>> +        C.emitReport(R);
>> +      }
>> +      return;
>> +    }
> 
>> +    if (B->getOpcode() == BO_Sub || !C.getLangOpts().CPlusPlus) {
>> +      C.addTransition(StMatch);
> 
> Since you have more work to do, that's actually not a good idea! Each addTransition splits the state space, and we don't actually want to do that. I suggest just updating your 'state' variable, so that the final addTransition operates on the right state.
> 
> Basically, you want to make sure we don't actually bifurcate the path as a result of these checks, and at the same time you want to make sure that each report happens after at least one addTransition (so they don't look they're the previous statement's fault).
> 
> Sorry, not sure I follow you here.  There aren't any other non-sink transitions that get added by PointerSubChecker, so how does this bifurcate the path?  Can you sketch out how you think this should be done instead and/or point me to another checker to look at for comparison?

Whoops, I thought the base region checks generated a new node, but of course they don't. Carry on.

(...although now there might not be a state to do this with.)

> You also are checking for C++ in determining whether or not to record the assumption, but not in whether or not to warn. As tempting as it is, I don't think we can warn in C++ mode for something that's merely unspecified.
> 
> If clang analyzer's warning conventions are to only warn about undefined behavior and never to warn about merely unspecified behavior, I'm happy to abide that.
> 
> My rationale was just we should warn because the result has no specified value in C++, so users are likely to be caught off guard if/when compilers start exploiting that.  E.g., it's conforming for a C++ compiler to optimize this code
> 
>     int a, b;
>     return (&a < &b) || (&a == &b) || (&a > &b);
> 
> to merely "return false", but I think users would still be pretty shocked if this happened and find it tedious to track down themselves.

I think the warning should be consistent with the assumption, but maybe that just means we shouldn't bother with the assumption. It's not a mechanism people would normally use to communicate something being null, and it could lead to false positives later on with a very confusing path. ("Region assumed null here at this comparison....huh?")

Jordan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20130711/58f2eba1/attachment.html>