[PATCH] D132236: [analyzer] Fix liveness of LazyCompoundVals

Artem Dergachev via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Aug 24 15:57:59 PDT 2022


NoQ added a comment.

Nice catch but I don't think this is the right solution.

Symbol liveness corresponds to a concrete runtime property of the program: can the program obtain this value by executing further runtime instructions? The program cannot obtain a pointer to variable `n`, or even a pointer to a region `*n`, just by looking at the data in a struct that was copied from region `*n` previously. Therefore neither `n` nor `*n` should not be marked live just because they are mentioned in a base region of a lazy compound value. In particular, memory leaks should be reported against `n` if the last place it's mentioned is a base region of a lazy compound value, as the program cannot possibly free that memory. You can most likely write a MallocChecker test for that, which will become a false negative after your patch.

For the same reason, for any region `R`, if `reg<R>` or `derived<R, $x>` is live, does not mean `R` is live. //The program cannot guess a memory address by only looking at a value loaded from that address.//

If you look at how a simpler example works:

  void clang_analyzer_warnIfReached();
  
  void foo(int **n) {
    int *n2 = *n;
    if (!**n) {
      if (*n2) {
        clang_analyzer_warnIfReached();
      }
    }
  }

You will notice that it does not try to keep VarRegion `n` or even the symbol `reg_$0<int ** n>` alive. It is *sufficient* to keep `reg_$1<int * Element{SymRegion{reg_$0<int ** n>},0 S64b,int *}>` alive in order to keep the test passing, and it does match the actual definition of live symbol (`reg_$0` can no longer be retrieved by the program but `reg_$1` can).

The original code tries to do the same for lazy compound values. I suspect that you'll find the actual bug when you descend into `getInterestingValues()`.

What looks fishy about `getInterestingValues()` is that it assumes that the amount of interesting values is finite. This sounds incredibly wrong to me. If a lazy compound value contains any pointer symbol `$p`, then all values in the following infinite series are interesting:

  $p,  *$p,  **$p,  ***$p,  ...

So I think the correct solution would be to maintain a list of explicitly-live compound values. Just like we already maintain a list of explicitly live symbols - `TheLiving`, and explicitly live regions - `RegionRoots`; neither of these lists enumerates all live symbols or regions as there are infinitely many of them, but each of these lists enumerates the ones that we've learned to be definitely live, and they're sufficient to determine liveness of any other symbol through recursive descent into the identity of that symbol. So with a list of live compound value regions, we can ask a question like "is this symbol stored in any of the known-live compound values?" to determine that symbol's liveness. This will keep the stored symbol alive without making the lazy compound value's base region alive.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132236/new/

https://reviews.llvm.org/D132236



More information about the cfe-commits mailing list