[PATCH] D103605: [analyzer] Introduce a new interface for tracking

Fri Jun 4 13:10:39 PDT 2021

NoQ added a comment.

In D103605#2798581 <https://reviews.llvm.org/D103605#2798581>, @vsavchenko wrote:

> In D103605#2798171 <https://reviews.llvm.org/D103605#2798171>, @NoQ wrote:
>
>> - Trackers are attached to bug reports. A single bug report may have multiple trackers.
>>   - Why not have only one tracker per report?
>
> That's a good question, we can actually even make it a part of `BugReport` for that matter.  I can though think of one exotic case, where you wait till you encounter certain weird condition, where you need to spawn a very specific customized tracker only starting from that point.

Ok, let's take a simpler example:

  int foo(int D) {
    int A = 1;
    int B = 1;
    int C = B - A;
    return D / C; // division by zero
  }

And suppose we want to track `A` and `B` so that to explain why we think that `C` is zero. What are pros and cons of:
(1) tracking `A` and `B` within the same tracker that already helped us reach `C`?
(2) terminating the current tracker as it reaches `C` and spawning two separate trackers for `A` and `B`?

>> - The termination condition is as usual: no further visitors attached.
>>   - It sounds like this patchset is orthogonal to the problem of detecting the tracking termination point(?) It does make it easier to implement customized reactions to such termination though.
>
> Actually, tracker dies when this happens.  But hey, can you provide a use case when the checker really needs to know when this happens?

Inlined defensive check suppressions are a major example. Basically null dereference warnings are suppressed based on properties of the program point at which the tracker terminates. If we know when this happens, we can refactor inlined defensive check suppressions into a checker specific handler.

>> - However, instead of adding notes directly, they simply invoke `track()`. All note-attaching work is moved into additional default handlers.
>
> No, they call `track` where they used to call `trackExpressionValue` or add `FindLastStoreBRVisitor`.
> Right now, only stores received their special customization and can produce notes based on tracker customization.

What's the ultimate plan though, do we want to fully move all notes into handlers?

>> - Let's consider an example. Suppose you're implementing `evalCall` for `llvm::dyn_cast`. You could have your expression handler detect a modeled `dyn_cast` call and implement its `handle()` method to emit a checker-specific note and start tracking the argument.
>
> So, in here it would look like this: If I want to support `dyn_cast` for tracking, I will add extra `ExpressionHandler` that calls `track` on the expression inside of `llvm::dyn_cast`.  Additionally, if you want to change the note's message for situations like `a = dyn_cast<Ty>(b)`, you can add a `StoreHandler` that will produce a note like "a is casted from b".
>
>> - With this implementation such note will only be emitted if the value is tracked back to `dyn_cast`. Not every modeled `dyn_cast` will receive a note but only the one that, say, caused a null dereference on its return value when it failed.
>
> Correct.
>
>> - Now, how does this interact with note tags? The idea behind note tags is so that visitors didn't need to reverse-engineer the analysis.
>
> I mean, it's just an orthogonal mechanism.
>
>> - It sounds like the handler is a step back to the visitor world: it doesn't know how exactly was `dyn_cast` modeled during analysis, so it has to reverse-engineer what happened.
>
> Correct, but during the analysis, checker could've modeled a zillion of different `dyn_casts`, and we don't want to have notes on all of them.
>
>> - For example, in order to find out whether `dyn_cast` succeeded or failed, it has to explore the program state to find out which values are passed into the cast and returned from the cast.
>> - Whereas a note tag would have this information accessible directly.
>
> Maybe we can access the note tag from the handler and kill two birds with one stone?

If we can, one way or another, I believe we absolutely should kill two birds with one stone. We shouldn't have two mutually exclusive solutions for two aspects of the same problem. In order to produce good notes, it's necessary to *both* know what happened during analysis *and* know whether the value is tracked.

>> - Hence a suggestion. Maybe instead of having a system of handlers, allow visitors to //ask// the tracker whether `track()` was invoked at this node on a certain expression or region?
>
> There are still situations when you don't have tags, but want to hook into the tracking process, so you can add your custom feature.  Let's say you modeled a system call `foo(a, b)` and you want to track `a` and `b`.  Or more generally if you want to help tracking when it's stuck.

Yes, so like in my `a = dyn_cast<...>(b)` example the proposed solution would be like:

  if (isTracked(a)) {
    emitNote();
    track(b);
  }

Such code can be put into either a visitor or a note tag depending on what you have. It is entirely equivalent to the current prevalent idiom

  if (isInteresting(a)) {
    emitNote();
    track(b);
  }

except it's asking the right question, with your facility allowing to answer it. This approach is very easy to understand for checker developers, basically adds no extra code (doesn't require people to inherit classes).

> Additionally, handlers help to tear existing `trackExpressionValue` implementation apart.
>
>> - I.e., polling instead of callbacks.
>> - This would naturally integrate with note tags.
>
> My counter-suggestion is to create a flag on `NoteTag` (maybe `Prunable` already can fit into that, I'm not very familiar with this interface) that means "this tag should appear only when part of the tracking trace", and make a handler-visitor pair that will produce notes with messages in those tags.

I think this works too. I like the idea that like note tags are for //execution paths//, we could make "track tags" for //value paths// (or, well, "value tracks"? - we should really use this concept more often, it's very effective; this patchset separates these two notions of a path that we were previously confusing).

It sounds a bit more restrictive on what kinds of conditions are allowed. For example, if we have `auto [a, b] = foo(c)`, we may decide to track `c` either when both `a` and `b` are tracked, or when only one of them is tracked; I think both approaches could be reasonable.

In particular, a simple flag (a-la `Prunable`) is not enough. There needs to be a way to communicate *which* value in the state is of interest. The data structure describing such conditions functionally may potentially grow pretty large and this may be a reason to go imperative but I don't think we'll actually run into this problem soon.

> IMO, it will prevent a ton of checkers repeating very similar code: make visitor that will check if some expression is tracked and generate a note out of tag

Yes, absolutely, any solution must eliminate the need for such visitors.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103605/new/

https://reviews.llvm.org/D103605