r179736 - [analyzer] Tweak getDerefExpr more to track DeclRefExprs to references.

Wed Apr 17 19:23:19 PDT 2013

On Wed, Apr 17, 2013 at 7:07 PM, Jordan Rose <jordan_rose at apple.com> wrote:
>
> On Apr 17, 2013, at 19:01 , David Blaikie <dblaikie at gmail.com> wrote:
>
> On Wed, Apr 17, 2013 at 6:43 PM, Jordan Rose <jordan_rose at apple.com> wrote:
>
>
> On Apr 17, 2013, at 18:35 , David Blaikie <dblaikie at gmail.com> wrote:
>
> On Wed, Apr 17, 2013 at 6:14 PM, Jordan Rose <jordan_rose at apple.com> wrote:
>
>
> On Apr 17, 2013, at 17:15 , Anna Zaks <ganna at apple.com> wrote:
>
> +  // This is not valid C++; dynamic_cast with a reference type will throw
> an
> +  // exception if the pointer does not match the expected type. However,
> our
> +  // implementation of dynamic_cast will pass through a null pointer...or a
> +  // "null reference"! So this branch is actually possible.
> +  if (&val == 0) { //expected-note {{Assuming pointer value is null}}
>
>
> I know this is just a test, but the comment is bogus. Instead it's something
> like this:
>
> // This is not valid C++; if 'ptr' were null, creating 'ref' would be
> illegal.
> // However, this is not checked at runtime, so this branch is actually
> possible.
>
>
> Shouldn't the SA just be diagnosing the initialization of the
> reference 'val' as problematic, rather than the later use? The code is
> UB because of that reference initialization.
>
>
> Yes and no. We actually tried forbidding the creation of null lvalues for a
> while, but it turned out that not only does that seem to be legal,
>
>
> The specifics are a bit tricky here, but my understand is the
> dereference itself is not UB, but initializing the reference is. (at
> least that's how UBSan has been implemented & I'll take Richard's word
> for it - though I think the wording got better/more clear here in
> C++11, too)
>
>
> Sorry, yes, I didn't finish the thought. We rely on that by banning it
> whenever a reference crosses a function boundary.
>
>
> The next problem here is that we don't actually find out that the pointer is
> null until the if-case.  At this point the reference has already been
> defined, and whatever path we've taken to get to the purported null
> dereference might have relied on using the reference. Even if we realized
> there was a problem, it could be difficult to report it in a sensible way.
>
> What we could do is check the reference at initialization time in
> UndefinedAssignmentChecker, like we do for return values
> (ReturnUndefChecker) and function parameters (CallAndMessageChecker). At
> this point, if we can't prove a pointer is null (to warn about it), we'd
> just assume it's non-null.
>
> We've discussed this before, and I think we decided against it because
> people do make this mistake, and strictly following the C++ standard here
> would cause us not to warn about it at all. (Most compilers do not insert
> runtime checks for this undefined behavior.) The ideal case would be to make
> the assumption at bind time, and then show that the branch is dead, but the
> analyzer doesn't currently have a good way to assert that something is true
> on all paths, rather than just some path...which is critical for dead-code
> checking.
>
>
> Really? That seems strange. There's no constant condition evaluation?
> (even things as simple as "if (false)" - which this should be
> equivalent to)
>
>
> Just to illustrate, the issue is not that there's no constant condition
> evaluation, but that the analyzer evaluates things based on the path it's
> taken so far. So in this case, you reach the if-statement twice, but the
> value is known both times.
>
> int x = random() ? 0 : 1;
> if (x) {
> magic();
> }
>
>
> Conversely, in this case you reach the if-statement twice, but the branch
> you take is always the same:
>
> int x = random() ? 1 : 2;
> if (x) {
> magic();
> }
>
>
> And finally, in this case two out of three times the branch is the same, but
> the third time it isn't:
>
> int x = random() ? 0 : random() ? 1 : 2;
> if (x) {
> magic();
> }

But what about code that looks more like:

int x = random() ? 0 : 1;

if (false) {
  magic();
}

If it's still purely path-sensitive (you never look at the fact that
magic is unreachable regardless of path) then that makes sense - the
current approach just doesn't show the unreachability. You could
demonstrate that nothing path-specific was used to evaluate the
condition (which should be true in the reference case - but more work:
comparing the address of a reference to null should yield a constant
result, though it did involve the path-sensitive reference (but not
its value, really) in some irrelevant way).

> This problem is hardly intractable, but we just haven't done it yet. There
> are added difficulties with inlined functions, which we don't always
> re-analyze as top-level.
>
> Jordan