[PATCH] D127874: [analyzer] Reimplement UnreachableCodeChecker using worklists

Balázs Benics via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Jun 15 09:36:56 PDT 2022


steakhal created this revision.
steakhal added reviewers: NoQ, martong, balazske, ASDenysPetrov, Szelethus, frederic-tingaud-sonarsource.
Herald added subscribers: manas, dkrupp, donat.nagy, mikhail.ramalho, a.sidorin, rnkovacs, szepet, baloghadamsoftware, xazax.hun.
Herald added a project: All.
steakhal requested review of this revision.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

This checker suffers from many false-positives.
I decided to have a look at them and I found multiple areas in which
we could improve.

1. We should ignore `try` statements for now, until #55621 <https://github.com/llvm/llvm-project/issues/55621> is resolved by fixing the CFG generation for try-catch and other exception-related constructs.
2. If some checker placed a sink node into some block, the successors of the sink block was marked unreachable previously. This leads to an unpleasant situation from the user's point of view since a different bugreport gets correlated with the FPs caused by the sink node of that. This effectively means that for each unconditional sink, we will also have (preferable) an unreachable code report as well. Now, we ignore unreachable nodes if the predecessor is reachable, but none of its successors are.
3. The `sweep-back` part cannot be implemented correctly via a DFS visitation. It needs to be done in a breath-search manner, using a worklist. I'll express later why.
4. Imagine a completely detached CFG block segment, e.g.:

  void test10_chained_unreachable(void) {
    goto end;
  a:
    goto b;
  b:
    goto c;
  c:
    goto a;
  end:;
  }

This produces this CFG:

  #6(entry)      #2(goto a;)
   |              |        ^
  #5(goto end;)   |         \
   |             #4(goto b;) |
  #1(end:)        |          |
   |             #3(goto c;) |
  #0(exit)         \________/

Previously, the checker only reported `B2` as dead, since that was the
first block which it encountered, thus starting the `swipe-back` from
it. IMO picking artificially that block makes no sense. We should
either mark all or none in this case. Why would be 'only' the `B2` dead?

---

The previous algorithm:

1. Walk each visited `ExplodedNode`, find which corresponds to basic-block enter events and mark the entered `CFG` block as reachable by adding it to the corresponding set.
2. Walk all the CFG blocks starting with the artificial `CFG` exit node (`ID 0`). Do a backward DFS to find the start of each unreachable CFG block chain, to only report unreachable diagnostics to the very first unreachable block. I refer to this step as the `sweep-back` part. We do this by inserting the uninteresting CFG block IDs into the reachables set.
3. Iterate over the CFG blocks, and if the block is not present in the reachables set, then it's a candidate for diagnostic. We do some trivial checks to filter out common FPs, and emit the report otherwise.

Why do we need to do the `sweep-back` part in BFS order?
Here is some code demonstrating this:

  int foo(int x) {
    if (x != 0) return -1;
    int v1 = 100 / x; // Div by zero
  
    int clamped;
    if (v1 < 0)
      clamped = 0;
    else if (v1 > 100)
      clamped = 100;
    else if (v1 % 2 == 0)
      clamped = 0;
    else if (v1 % 2 == -1)
      clamped = -1;
    else
      clamped = v1;
  
    return clamped;
  }

The resulting CFG looks like this:

            #13(entry)
             |
            #12
           /   \
          #10   \
         /   \  |
        #8    #9|
       /  \   | |
      #6   #7 | |
     /  \   | | |
    #4   #5 | | |
   /  \   | | / |
  #2   B3 | |/  |
   \    | |//   |
    \   |///    |
     \  |//     |
      \ |/      /
       #1     #11
        \    /
         #0(exit)

Initially, `B13`, `B12`, `B10`, `B11`, `B0` are reachable, thus the rest are
unreachable.
The `swipe-back` phase starts from `B0` and marks `B1`, `B2`, `B4`, `B6`
reachable until it reaches the first unreachable block (`B8`), whose parent
is reachable. After this, it backtracks and checks if `B3` should be
refuted or not. It finds that the single predecessor block of `B3` is
reachable, thus we should report the `B3` as unreachable.
I leave the rest for the reader, but in the end, we end up having `B3`, `B5`,
`B8`, `B9` unreachable, thus 4 reports were generated for this example
previously.

If we were doing the `swipe-back` phase in BFS order, we would have only
`B8` and `B9` blocks as unreachable - as we should.

However, I'm suppressing these since they only exist because some
checker (in this case `DivByZeroChecker`) sank all paths reaching `B8` and
`B9`. Consequently, after fixing the div-by-zero bug, they would become
reachable again.

---

Unfortunately, even with this new implementation, the number of
false-positives is still too high to move this out of alpha.
I've checked a couple of those reports, and I could not find any obvious
patterns. They certainly are, but I'd need to reduce some cases and have
a deeper look - which I haven't done.
IMO even this is a good increment which is worth considering to land.

My measurements did not show any crashes, or runtime regressions.
Regarding the reports:

- In both the baseline, and in this version: 13868
- Disappeared: 4560 (32.88%)
- New: 190 (1.37%)

So, this implementation would not flood the users with new reports,
'only' remove a couple of annoying ones.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D127874

Files:
  clang/include/clang/Analysis/AnalysisDeclContext.h
  clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExplodedGraph.h
  clang/lib/StaticAnalyzer/Checkers/UnreachableCodeChecker.cpp
  clang/test/Analysis/unreachable-code-exceptions.cpp
  clang/test/Analysis/unreachable-code-path.c
  clang/test/Analysis/unreachable-code-path.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D127874.437211.patch
Type: text/x-patch
Size: 21888 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20220615/850ac1d3/attachment-0001.bin>


More information about the cfe-commits mailing list