[PATCH] D103605: [analyzer] Introduce a new interface for tracking

Valeriy Savchenko via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Jun 4 03:51:30 PDT 2021


vsavchenko added a comment.

In D103605#2798171 <https://reviews.llvm.org/D103605#2798171>, @NoQ wrote:

> Ok. Oof. Whoa. That's amazing.
>
> Let me re-iterate to make sure that my understanding of this patchset is correct.
>
> **Chapter 1.** //"A is correlated with B (ρ = 0.56), given C, assuming D and under E conditions"//
>
> - Two new entities are introduced: trackers and handlers.

Correct.

> - Trackers are attached to bug reports. A single bug report may have multiple trackers.
>   - Why not have only one tracker per report?

That's a good question, we can actually even make it a part of `BugReport` for that matter.  I can though think of one exotic case, where you wait till you encounter certain weird condition, where you need to spawn a very specific customized tracker only starting from that point.

> - Handlers are attached to trackers. Each tracker can have multiple handlers with a priority system.
>   - Do you envision it growing into a dependency system later?

Yes, I think in order to correctly split some parts of the logic that already exists in `trackExpressionValue`

> - A subclass of visitors known as "tracking visitors" are connected to trackers. One tracker may have multiple visitors attached to it.

Correct.

> - Trackers are a tiny, almost uncustomizable class. It contains is a method called `track()` that is invoked when attention of handlers is required.

Correct.  If you really want to, you can change the default behavior of those track functions.

> - Each tracker comes with a default set of handlers.

Correct.  This is how we migrate existing functionality into the new interface.

> - The only way to customize a tracker is to add extra handlers to it.

It's not the only one, you can subclass it and add completely different logic.  But the main way for the majority of use cases is indeed through handlers.

> - Handlers implement a `handle()` method invoked by the tracker every time their attention is required in `track()`.

Correct.

> - Handlers may return a note from their `handle()` method. If a handler returns a note, this note is attached to the bug report and lower-priority handlers are not invoked at all.

Not entirely correct, this is correct for store handlers, these are intended to customize messages for events of symbolic value being stored into the region.
Expression handlers return `Tracker::Result` that simply tells if some visitor was added or not (backward compatibility with `trackExpressionValue`) and if we should prevent all other handlers from running here.

> - There are two major classes of handlers: expression handlers and store handlers. Expression handlers pay attention to Environment entries, store handlers pay attention to Region Store bindings.

I would say that expression handler is a way to externally extend `trackExpressionValue` functionality.  Add new supported type of expression that we can track or add checker-specific visitor recursively.

> **Chapter 2.** //"For immediate release: Scientists find potential link between A and B"//
>
> - Bug visitors are still responsible for discovering connections across exploded nodes. On the contrary, `track()`/`handle()` workflow is instantaneous.

It is tricky because `track` can add visitors and they can call it back from `VisitNode`, and so on to infinity.  This is a pattern that we had between `trackExpressionValue` and `FindLastBRVisitor` (and `ReturnVisitor`) and every checker can create their own recursive tracking.

> - It sounds like this whole patchset is orthogonal to reimplementing value tracking visitors with note tags, as long as we can pass a reference to the appropriate tracker into the tag(?)

Correct, it just makes existing back-slicing tracking easier to customize.

> - Tracking starts when a checker creates a tracker and invokes `track()` on it, which instantly causes the respective `handle()` to be invoked.
> - `handle()` may attach visitors. These visitors may in turn call `track()` as they reach a different exploded node.

Exactly.

> - The termination condition is as usual: no further visitors attached.
>   - It sounds like this patchset is orthogonal to the problem of detecting the tracking termination point(?) It does make it easier to implement customized reactions to such termination though.

Actually, tracker dies when this happens.  But hey, can you provide a use case when the checker really needs to know when this happens?

> **Chapter 3.** //"A causes B all the time. What will this means for Obama?"//
>
> - `trackExpressionValue()` now boils down to creation of a single tracker and an initial invocation of `track()` on the expression which in turn causes expression handlers to immediately react.

Correct.

> - Most of the old body of `trackExpressionValue()` is transformed into a default expression handler. So it's still part of an immediate reaction, just a bit more generalized.

Correct.

> - Most of the visitors that previously composed `trackExpressionValue()` are still part of the process; they're attached by the default expression handler.

Correction: "**All** of the visitors..."

> - However, instead of adding notes directly, they simply invoke `track()`. All note-attaching work is moved into additional default handlers.

No, they call `track` where they used to call `trackExpressionValue` or add `FindLastStoreBRVisitor`.
Right now, only stores received their special customization and can produce notes based on tracker customization.

> **Chapter 4.** //"I'm wearing this to ward off A!"//
>
> - Basic usage in checkers is basically unchanged. They still simply call `trackExpressionValue()` and it just works.

Correct

> - Advanced usage - which is the whole point of this patchset - probably implies adding custom handlers to the tracker created by `trackExpressionValue()`.

Correct

> - Let's consider an example. Suppose you're implementing `evalCall` for `llvm::dyn_cast`. You could have your expression handler detect a modeled `dyn_cast` call and implement its `handle()` method to emit a checker-specific note and start tracking the argument.

So, in here it would look like this: If I want to support `dyn_cast` for tracking, I will add extra `ExpressionHandler` that calls `track` on the expression inside of `llvm::dyn_cast`.  Additionally, if you want to change the note's message for situations like `a = dyn_cast<Ty>(b)`, you can add a `StoreHandler` that will produce a note like "a is casted from b".

> - With this implementation such note will only be emitted if the value is tracked back to `dyn_cast`. Not every modeled `dyn_cast` will receive a note but only the one that, say, caused a null dereference on its return value when it failed.

Correct.

> - Now, how does this interact with note tags? The idea behind note tags is so that visitors didn't need to reverse-engineer the analysis.

I mean, it's just an orthogonal mechanism.

> - It sounds like the handler is a step back to the visitor world: it doesn't know how exactly was `dyn_cast` modeled during analysis, so it has to reverse-engineer what happened.

Correct, but during the analysis, checker could've modeled a zillion of different `dyn_casts`, and we don't want to have notes on all of them.

> - For example, in order to find out whether `dyn_cast` succeeded or failed, it has to explore the program state to find out which values are passed into the cast and returned from the cast.
> - Whereas a note tag would have this information accessible directly.

Maybe we can access the note tag from the handler and kill two birds with one stone?

> - Hence a suggestion. Maybe instead of having a system of handlers, allow visitors to //ask// the tracker whether `track()` was invoked at this node on a certain expression or region?

There are still situations when you don't have tags, but want to hook into the tracking process, so you can add your custom feature.  Let's say you modeled a system call `foo(a, b)` and you want to track `a` and `b`.  Or more generally if you want to help tracking when it's stuck.
Additionally, handlers help to tear existing `trackExpressionValue` implementation apart.

> - I.e., polling instead of callbacks.
> - This would naturally integrate with note tags.

My counter-suggestion is to create a flag on `NoteTag` (maybe `Prunable` already can fit into that, I'm not very familiar with this interface) that means "this tag should appear only when part of the tracking trace", and make a handler-visitor pair that will produce notes with messages in those tags.
IMO, it will prevent a ton of checkers repeating very similar code: make visitor that will check if some expression is tracked and generate a note out of tag

> - This does not solve the priority problem. We could try to solve it separately by, say, enforcing at most one note per exploded node, and then implementing a similar priority system over visitors.
> - Or am I missing something about your solution that makes connection to note tags a non-issue?

My understanding is that tags don't rule out tracking.  The other way around is also true.  My take on this - let's make tracking a bit more flexible, it's not gonna hurt anyone.
Another aspect that I think is very important here is the ability to make this change incrementally and to extract some parts of `trackExpressionValue` and `FindLastStoreBRVisitor` back into checkers to make it at least a bit easier for the reader.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103605/new/

https://reviews.llvm.org/D103605



More information about the cfe-commits mailing list