[PATCH] D58367: [analyzer] NFC: Improve upon the concept of BugReporterVisitor.

Artem Dergachev via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Feb 20 18:21:49 PST 2019


NoQ added a comment.

Thx!~

In D58368#1404747 <https://reviews.llvm.org/D58368#1404747>, @Charusso wrote:

> This is a cool approach, but it is difficult to use this API in other checkers. If you do not out-chain D58367 <https://reviews.llvm.org/D58367> I would like to see something like that here:
>
>   SmallString<64> Str;
>   llvm::raw_svector_ostream OS(Str);
>   OS << "Deallocating object passed through parameter '" << PVD->getName()
>      << '\'';
>  
>   C.addNote(C.getState()->set<ReleasedParameter>(true), OS.str());
>


The idea to //allow// passing a raw std::string instead of a lambda that generates a string sounds like a great improvement, i'll do this.

I don't think i want to //disallow// lambdas entirely, because

- Some dynamic logic is definitely often necessary - eg., the MallocChecker example above. Essentially, in most cases we don't know what message should we produce (or even whether we should produce it at all) until the bug report is actually emitted. This specific checker is very lucky in that regard: every transition produced by this checker on the execution path does deserve a message, and this message doesn't depend on the bug report in any manner, but only on the event that caused it to appear.
- I want to make the API flexible enough to avoid performance problems. Generating the string is usually cheap, but transitions are made much more often than warnings are issued (simply because most code doesn't cause us to produce a bug), so producing the message every time we make a transition is a much higher multiplier for the cost of generating the string than producing the message every time we make a transition-that-leads-to-a-bug.

----

Adding a comfy wrapper around `addTransition` that constructs the tag automatically is also a great idea, i'll do this, but i'd like to delay this a little bit because, well, `addTransition` API is generally a large source of pain for checker developers and could use a rework in order to make it more understandable.

In particular, the implementation you propose does a lot more than adding a note: it also adds, well, a transition. Namely, the reader would most likely expect the following code

  State1 = State->set<Trait1>(Value1);
  C.addNote(State1, "Message 1");
  State2 = State1->set<Trait2>(Value2);
  C.addNote(State2, "Message 2");

to make two updates to the state and display a note for each of them:

  /-------------\
  | Predecessor |
  | State       |
  \-------------/
        ||
        \/
  /-------------\
  | Node1       |
  | State1      |
  | "Message 1" |
  \-------------/
        ||
        \/
  /-------------\
  | Node2       |
  | State2      |
  | "Message 2" |
  \-------------/

The actual behavior, however, would be to split the execution path in two parallel paths, analyze them separately later, and mark each of them with the respective message:

           /-------------\
           | Predecessor |
           | State       |
           \-------------/
             ||       ||
             \/       \/
  /-------------\   /-------------\
  | Node1       |   | Node2       |
  | State1      |   | State2      |
  | "Message 1" |   | "Message 2" |
  \-------------/   \-------------/

We already have this problem with `generateNonFatalErrorNode()`: even though it explicitly states that it will generate a node, it's very unobvious that generating two error nodes not only allows you to report two bugs, but also causes an accidental state split. Additionally, such accidental state splits are hard to notice because most of the time it doesn't cause any visible differences apart from a 2x slowdown of the remaining analysis on that execution path. In case of `addNote()` it should be a bit better because you'd be able to notice that one of the state updates isn't taking place, leading to false positives, but in case of `generateNonFatalErrorNode()` there are often no state updates being made (which is bad on its own, but for a separate reason).

I do have one idea on how to make this less confusing, but it's a bigger piece of work.

In D58367#1404722 <https://reviews.llvm.org/D58367#1404722>, @Charusso wrote:

> we do not care that much about the core `BugReporterVisitors` as they do their job enough well.


If only that was so :)

In D58367#1404722 <https://reviews.llvm.org/D58367#1404722>, @Charusso wrote:

> So with that, I would out-chain this patch from the MIG-patches because it is a lot more serious problem-set. I also would like to ask you to take it more serious, as all your mentioned benefits will rely on this simple and cool approach, so it has to be flexible as possible.


I've been thinking for about a month about this until now, and i'm actually very surprised that i see no obvious flexibility issues here. Stateful visitors (eg., the ones that only highlight the *last* interesting event) can be easily implemented by keeping via lambdas as a private state data in the bug reporter. Mutually recursive systems of multiple visitors that add each other dynamically during the visit (such as the `trackExpressionValue` that's infamous for that exact reason) should be (and most likely could be) untangled anyway, and once they're untangled (eg., by keeping just one instance of the visitor while dynamically updating its tracking information), the flexibility issue disappears; i'm almost happy that it would no longer be possible to entangle the code that way. Dunno, this is weird - usually i quickly come up with examples of why something wouldn't work and decide to implement it anyway, but this time i'm surprisingly secure about implementing it.

In D58367#1404722 <https://reviews.llvm.org/D58367#1404722>, @Charusso wrote:

> I am not into the internal stuff that much, so I cannot argue about the lack of `CheckerContext` functions, but I think this is a huge problem and breaks the flexibility a lot.


The issue here is that the checker doesn't always have an ability to mutate the graph by adding nodes. Now it becomes more of a problem because the checker may want to emit notes more often than it wants to generate nodes. This problem can be easily mitigated by passing a different sort of context (not `CheckerContext` but something else) into these callbacks that will store lambdas in a different manner. It slightly sucks that we'd have to come up with a duplicate API for that, but that's not something we need to solve immediately - i believe that we can always add these later.

Another way to describe this limitation is that "lambda visitors" are only capable of adding notes to nodes that were produced by the checker itself. Old-style visitors, on the other hand, can drop notes in arbitrary places - only once per node, but it's still something. Lambdas, of course, also produce at most one message per node, but for lambdas it's not a problem because the checker can make as many nodes as it wants. Whereas an arbitrary visitor cannot retroactively provide itself with more nodes if they weren't constructed artificially ahead of time. I think this is not a big flexibility issue on its own because it should be possible to make lambdas from multiple sources cooperate with each other by sharing a certain amount of arbitrary state information within the BugReport object.

In D58367#1404722 <https://reviews.llvm.org/D58367#1404722>, @Charusso wrote:

> `getNoteTag()`: you could differentiate a setter with the true getter what you only use in `TagVisitor::VisitNode()` (where you truly hook your `Callback` to get extra information later on):
>
>   const NoteTag *setNoteTag(StringRef Message) {
>     return Eng.getNoteTags().setNoteTag(Message);
>   }
>   


Mmm, i don't think i understand this part. Like, `getNoteTag()` is not really a getter. It's a factory method on an allocator that causes it to produce an object. What is the semantics of a setter that would correspond to it?

In D58367#1404722 <https://reviews.llvm.org/D58367#1404722>, @Charusso wrote:

> - Documentation: if we are about the rewrite the entire bug-reporting business, it worth some long doxygen comments.


Yup, totally.

Like, we're not really rewriting anything yet, just trying out a new approach. I'll definitely not jump into rewriting everything until the new approach gets tested a lot and turns out to actually work.

One important thing to document is the lifetime of these lambdas (equal to ExprEngine's lifetime, which is a fairly popular lifetime to have in the Analyzer). Captures would need to live that long and can probably be cleaned up at top-level `checkBeginFunction()` if they are memory-managed manually.


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D58367/new/

https://reviews.llvm.org/D58367





More information about the cfe-commits mailing list