[PATCH] D103605: [analyzer] Introduce a new interface for tracking

Fri Jun 4 06:22:20 PDT 2021

martong added a comment.

In D103605#2798171 <https://reviews.llvm.org/D103605#2798171>, @NoQ wrote:

> Ok. Oof. Whoa. That's amazing.
>
> Let me re-iterate to make sure that my understanding of this patchset is correct.
>
> **Chapter 1.** //"A is correlated with B (ρ = 0.56), given C, assuming D and under E conditions"//
>
> - Two new entities are introduced: trackers and handlers.
> - Trackers are attached to bug reports. A single bug report may have multiple trackers.
>   - Why not have only one tracker per report?
> - Handlers are attached to trackers. Each tracker can have multiple handlers with a priority system.
>   - Do you envision it growing into a dependency system later?
> - A subclass of visitors known as "tracking visitors" are connected to trackers. One tracker may have multiple visitors attached to it.
> - Trackers are a tiny, almost uncustomizable class. It contains is a method called `track()` that is invoked when attention of handlers is required.
> - Each tracker comes with a default set of handlers.
> - The only way to customize a tracker is to add extra handlers to it.
> - Handlers implement a `handle()` method invoked by the tracker every time their attention is required in `track()`.
> - Handlers may return a note from their `handle()` method. If a handler returns a note, this note is attached to the bug report and lower-priority handlers are not invoked at all.
> - There are two major classes of handlers: expression handlers and store handlers. Expression handlers pay attention to Environment entries, store handlers pay attention to Region Store bindings.
>
> **Chapter 2.** //"For immediate release: Scientists find potential link between A and B"//
>
> - Bug visitors are still responsible for discovering connections across exploded nodes. On the contrary, `track()`/`handle()` workflow is instantaneous.
>   - It sounds like this whole patchset is orthogonal to reimplementing value tracking visitors with note tags, as long as we can pass a reference to the appropriate tracker into the tag(?)
> - Tracking starts when a checker creates a tracker and invokes `track()` on it, which instantly causes the respective `handle()` to be invoked.
> - `handle()` may attach visitors. These visitors may in turn call `track()` as they reach a different exploded node.
> - The termination condition is as usual: no further visitors attached.
>   - It sounds like this patchset is orthogonal to the problem of detecting the tracking termination point(?) It does make it easier to implement customized reactions to such termination though.
>
> **Chapter 3.** //"A causes B all the time. What will this means for Obama?"//
>
> - `trackExpressionValue()` now boils down to creation of a single tracker and an initial invocation of `track()` on the expression which in turn causes expression handlers to immediately react.
> - Most of the old body of `trackExpressionValue()` is transformed into a default expression handler. So it's still part of an immediate reaction, just a bit more generalized.
> - Most of the visitors that previously composed `trackExpressionValue()` are still part of the process; they're attached by the default expression handler.
> - However, instead of adding notes directly, they simply invoke `track()`. All note-attaching work is moved into additional default handlers.
>
> **Chapter 4.** //"I'm wearing this to ward off A!"//
>
> - Basic usage in checkers is basically unchanged. They still simply call `trackExpressionValue()` and it just works.
> - Advanced usage - which is the whole point of this patchset - probably implies adding custom handlers to the tracker created by `trackExpressionValue()`.
> - Let's consider an example. Suppose you're implementing `evalCall` for `llvm::dyn_cast`. You could have your expression handler detect a modeled `dyn_cast` call and implement its `handle()` method to emit a checker-specific note and start tracking the argument.
> - With this implementation such note will only be emitted if the value is tracked back to `dyn_cast`. Not every modeled `dyn_cast` will receive a note but only the one that, say, caused a null dereference on its return value when it failed.
>   - Now, how does this interact with note tags? The idea behind note tags is so that visitors didn't need to reverse-engineer the analysis.
>   - It sounds like the handler is a step back to the visitor world: it doesn't know how exactly was `dyn_cast` modeled during analysis, so it has to reverse-engineer what happened.
>   - For example, in order to find out whether `dyn_cast` succeeded or failed, it has to explore the program state to find out which values are passed into the cast and returned from the cast.
>   - Whereas a note tag would have this information accessible directly.
>     - Hence a suggestion. Maybe instead of having a system of handlers, allow visitors to //ask// the tracker whether `track()` was invoked at this node on a certain expression or region?
>     - I.e., polling instead of callbacks.
>     - This would naturally integrate with note tags.
>     - This does not solve the priority problem. We could try to solve it separately by, say, enforcing at most one note per exploded node, and then implementing a similar priority system over visitors.
>     - Or am I missing something about your solution that makes connection to note tags a non-issue?

Now this sounds like a plan. Valeriy, this is something that I missed for such a massive refactor...

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103605/new/

https://reviews.llvm.org/D103605