[cfe-commits] [PATCH] Checking for sometimes-uninitialized variables

Ted Kremenek kremenek at apple.com
Thu May 24 14:33:40 PDT 2012


Hi Richard,

I very much like this work.  I have some nit picky comments, but I'd like to first focus on the high-level ones.

First, I'm wondering if this should just be rolled into -Wuninitialized.  If the false positive rate is low (or none), that would seem to be the right thing to do.  Otherwise, most users aren't going to benefit from this warning unless they know about it.  It seems like the whole point of this patch is to find cases where something is not unconditionally uninitialized but we have much higher confidence in reporting an issue.  This patch doesn't try to resolve "maybe uninitialized" issues that could be resolved if we had better path-sensitivity.  Thoughts? 

Second, I honestly found the description of the algorithm in the comments of getUninitUse() to be (at first) fairly unintuitive.  Since the algorithm isn't so obvious, actually having an example in the comments would be great.  After I read the code several times, I figured it out, but it also required me working through an example on paper.  After that, the comments made sense to me.

Ok, more comments inline.

On May 17, 2012, at 1:01 PM, Richard Smith <richard at metafoo.co.uk> wrote:

> Hi,
> 
> The attached patch adds a new classification of uninitialized variables, between 'always used uninitialized' and 'may be used uninitialized', which (unlike 'may be used uninitialized') has no false positives (after making one conservative assumption), and produces more warnings than 'always used uninitialized'.
> 
> The core idea is to assume that for each condition which was explicitly written in the source code, and isn't constant, there exists a live code path which will cause it to be true and a live code path which will cause it to be false. If every such code path definitely leads to an uninitialized variable use, we classify the use as being 'sometimes uninitialized' rather than 'may be uninitialized'.
> 
> I've run this across a large body of code, and found mostly bugs, but also two flavors of "false positive":
> 
> * Dead conditions
> 
> int n = ...;
> int x;
> if (n < 0) { x = 0; }
> else if (n >= 0) { x = 1; }
> return x;

I think the CFG should just be made smarter to prune out the unreachable branch here.  Then dataflow analyses like these just get smarter.  We essentially already do this for switch statements.

> 
> - and -
> 
> int x;
> for (int n = 0; n < size; ++n) { // Redundant "n < size" test.
>  // This is somehow guaranteed to happen for some n.
>  if (f(n) == k) {
>    x = n;
>    break;
>  }
> }
> return x;

This is a common source of false positives with the static analyzer, but I'd argue that the code should just initialize 'x' to be safe.  I can't look at this in isolation and immediately tell if this is safe, because it requires me making assumptions about size, n, which aren't documented in the code.

> 
> * Value dependence within function calls
> 
> bool hasFoo = false;
> int foo;
> if (complex_condition) {
>   hasFoo = true;
>   foo = something;
> }
> f(hasFoo, foo);
> 
> Here, f doesn't use its second argument if its first argument is false. However, an lvalue-to-rvalue conversion on an uninitialized object has undefined behavior (in C++ at least), so it doesn't seem unreasonable to complain about this.

Indeed.  I'd consider this poor form.

> 
> Is this appropriate for including in trunk clang?

It is.  One thing that I really like about it is that it doesn't increasing the running time of -Wuninitialized unless we find some issues.  That's great.

> It seems to provide a better tradeoff between false positives and false negatives than either of the existing warning options for us. If so, should these warnings be part of -Wuninitialized, or should they be kept separate?

I think we should try making them part of -Wuninitialized, or at least try to do so.  That's the only way we'll properly bake it (with users).

More comments in the patch itself:

> +
> +  UninitUse getUninitUse(const Expr *ex, const VarDecl *vd, Value v) {
> +    UninitUse Use(ex, isAlwaysUninit(v));
> +
> +    assert(isUninitialized(v));
> +    if (Use.getKind() == UninitUse::Always)
> +      return Use;
> +
> +    // If an edge which leads unconditionally to this use did not initialize
> +    // the variable, we can say something stronger than 'may be uninitialized':
> +    // we can say 'either it's used uninitialized or you have dead code'.
> +    //
> +    // We are almost looking for a post-dominator frontier for the use, except
> +    // that we do not want to skip past loops (whose non-termination might be
> +    // correlated with the initialization condition), and we do not want to
> +    // walk past blocks which initialize the variable. This makes the analysis
> +    // much simpler: we just look for simple acyclic paths leading to the use.
> +    llvm::SmallVector<unsigned, 32> Queue;

Why not just use a 'const CFGBlock*' instead of unsigned?  That would avoid the later lookup.

> +    llvm::SmallVector<unsigned, 32> SuccsVisited(cfg.getNumBlockIDs(), 0);
> +    Queue.push_back(block->getBlockID());
> +    // Specify that we've already visited all successors of the starting block.
> +    // This has the dual purpose of ensuring we never add it to the queue, and
> +    // of marking it as not being a candidate element of the frontier.
> +    SuccsVisited[block->getBlockID()] = block->succ_size();
> +    while (!Queue.empty()) {
> +      const CFGBlock *B = cfg.begin()[Queue.back()];

if Queue used 'const CFGBlock *' directly, this would just be:

   const CFGBlock *B = cfg.begin();

> +      Queue.pop_back();
> +      for (CFGBlock::const_pred_iterator I = B->pred_begin(), E = B->pred_end();
> +           I != E; ++I) {

I'd prefer:

  const CFGBlock *Pred = *I;

and keep using it.

> +        if (vals.getValue(*I, B, vd) == Initialized)
> +          // This block initializes the variable.
> +          continue;
> +
> +        unsigned PredID = (*I)->getBlockID();
> +        if (++SuccsVisited[PredID] == (*I)->succ_size())

I'd prefer:

   if (++SuccsVisited(Pred->getBlockID()] == Pred->succ_size())

and ditch PredID entirely.

> +          // All paths from this block lead to the use and don't initialize the
> +          // variable.
> +          Queue.push_back(PredID);

and then use

  Queue.push_back(Pred);

> +      }
> +    }
> +
> +    // Scan the frontier, looking for blocks where the variable was
> +    // uninitialized.
> +    for (CFG::const_iterator BI = cfg.begin(), BE = cfg.end(); BI != BE; ++BI) {
> +      const CFGBlock *Block = *BI;
> +      unsigned BlockID = Block->getBlockID();
> +      if (SuccsVisited[BlockID] && SuccsVisited[BlockID] < Block->succ_size() &&
> +          Block->getTerminator()) {
> +        // This block is in the (sort-of) post-dominator frontier for the use.
> +        // If we have an edge from here to a post-dominator block, and the
> +        // variable is uninitialized on that edge, we have found a bug.
> +        for (CFGBlock::const_succ_iterator I = Block->succ_begin(),
> +             E = Block->succ_end(); I != E; ++I) {
> +          if (*I && SuccsVisited[(*I)->getBlockID()] >= (*I)->succ_size() &&
> +              vals.getValue(Block, *I, vd) == Uninitialized) {

I personally would prefer 'Succ' instead of repeatedly doing '*I', but *I is shorter.

> +            // Switch cases are a special case: report the label to the caller
> +            // as the 'terminator', not the switch statement itself. Suppress
> +            // situations where no label matched: we can't be sure that's
> +            // possible.
> +            if (isa<SwitchStmt>(Block->getTerminator())) {
> +              const Stmt *Label = (*I)->getLabel();
> +              if (!Label || !isa<SwitchCase>(Label))
> +                // Might not be possible.
> +                continue;
> +              UninitUse::Branch Branch;
> +              Branch.Terminator = Label;
> +              Branch.Output = 0; // Ignored.
> +              Use.addUninitBranch(Branch);
> +            } else {
> +              UninitUse::Branch Branch;
> +              Branch.Terminator = Block->getTerminator();
> +              Branch.Output = I - Block->succ_begin();
> +              Use.addUninitBranch(Branch);
> +            }

This code calls Block->getTerminator() three times (starting from above).  Seems worth pulling out into a local variable.

> +          }
> +        }
> +      }
> +    }
> +
> +    return Use;
> +  }

Many of the other changes in the patch seem like just nice cleanups, and not core to this work.  I'd prefer they went in separately so that the key algorithmic change is easy to spot in a single (small) patch, but I don't want to waste your time with having to break it down if it is too complicated.

Fantastic work!




More information about the cfe-commits mailing list