[PATCH] D103605: [analyzer] Introduce a new interface for tracking

Thu Jun 3 21:46:54 PDT 2021

NoQ added a comment.

Ok. Oof. Whoa. That's amazing.

Let me re-iterate to make sure that my understanding of this patchset is correct.

**Chapter 1.** //"A is correlated with B (ρ = 0.56), given C, assuming D and under E conditions"//

- Two new entities are introduced: trackers and handlers.
- Trackers are attached to bug reports. A single bug report may have multiple trackers.
  - Why not have only one tracker per report?
- Handlers are attached to trackers. Each tracker can have multiple handlers with a priority system.
  - Do you envision it growing into a dependency system later?
- A subclass of visitors known as "tracking visitors" are connected to trackers. One tracker may have multiple visitors attached to it.
- Trackers are a tiny, almost uncustomizable class. It contains is a method called `track()` that is invoked when attention of handlers is required.
- Each tracker comes with a default set of handlers.
- The only way to customize a tracker is to add extra handlers to it.
- Handlers implement a `handle()` method invoked by the tracker every time their attention is required in `track()`.
- Handlers may return a note from their `handle()` method. If a handler returns a note, this note is attached to the bug report and lower-priority handlers are not invoked at all.
- There are two major classes of handlers: expression handlers and store handlers. Expression handlers pay attention to Environment entries, store handlers pay attention to Region Store bindings.

**Chapter 2.** //"For immediate release: Scientists find potential link between A and B"//

- Bug visitors are still responsible for discovering connections across exploded nodes. On the contrary, `track()`/`handle()` workflow is instantaneous.
  - It sounds like this whole patchset is orthogonal to reimplementing value tracking visitors with note tags, as long as we can pass a reference to the appropriate tracker into the tag(?)
- Tracking starts when a checker creates a tracker and invokes `track()` on it, which instantly causes the respective `handle()` to be invoked.
- `handle()` may attach visitors. These visitors may in turn call `track()` as they reach a different exploded node.
- The termination condition is as usual: no further visitors attached.
  - It sounds like this patchset is orthogonal to the problem of detecting the tracking termination point(?) It does make it easier to implement customized reactions to such termination though.

**Chapter 3.** //"A causes B all the time. What will this means for Obama?"//

- `trackExpressionValue()` now boils down to creation of a single tracker and an initial invocation of `track()` on the expression which in turn causes expression handlers to immediately react.
- Most of the old body of `trackExpressionValue()` is transformed into a default expression handler. So it's still part of an immediate reaction, just a bit more generalized.
- Most of the visitors that previously composed `trackExpressionValue()` are still part of the process; they're attached by the default expression handler.
- However, instead of adding notes directly, they simply invoke `track()`. All note-attaching work is moved into additional default handlers.

**Chapter 4.** //"I'm wearing this to ward off A!"//

- Basic usage in checkers is basically unchanged. They still simply call `trackExpressionValue()` and it just works.
- Advanced usage - which is the whole point of this patchset - probably implies adding custom handlers to the tracker created by `trackExpressionValue()`.
- Let's consider an example. Suppose you're implementing `evalCall` for `llvm::dyn_cast`. You could have your expression handler detect a modeled `dyn_cast` call and implement its `handle()` method to emit a checker-specific note and start tracking the argument.
- With this implementation such note will only be emitted if the value is tracked back to `dyn_cast`. Not every modeled `dyn_cast` will receive a note but only the one that, say, caused a null dereference on its return value when it failed.
  - Now, how does this interact with note tags? The idea behind note tags is so that visitors didn't need to reverse-engineer the analysis.
  - It sounds like the handler is a step back to the visitor world: it doesn't know how exactly was `dyn_cast` modeled during analysis, so it has to reverse-engineer what happened.
  - For example, in order to find out whether `dyn_cast` succeeded or failed, it has to explore the program state to find out which values are passed into the cast and returned from the cast.
  - Whereas a note tag would have this information accessible directly.
    - Hence a suggestion. Maybe instead of having a system of handlers, allow visitors to //ask// the tracker whether `track()` was invoked at this node on a certain expression or region?
    - I.e., polling instead of callbacks.
    - This would naturally integrate with note tags.
    - This does not solve the priority problem. We could try to solve it separately by, say, enforcing at most one note per exploded node, and then implementing a similar priority system over visitors.
    - Or am I missing something about your solution that makes connection to note tags a non-issue?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103605/new/

https://reviews.llvm.org/D103605