[clang] [NFC][analyzer] Remove StmtNodeBuilder (PR #181431)
DonĂ¡t Nagy via cfe-commits
cfe-commits at lists.llvm.org
Sun Feb 15 13:58:34 PST 2026
NagyDonat wrote:
> I'm gonna be honest with you that my approach was so far to find something similar to what I want and copy paste it. Then carefully check the exploded graph to see if it's correct.
After surveying code that uses `NodeBuilder`s my impression is that probably everyone followed this approach (or at least the first part, I wouldn't bet on everyone "carefully check"ing. There are several code fragments that are completely non-functional (e.g. add a node to a set, then immediately remove it), but as they do not cause any problems, they preserved and copied around.
This approach has the advantage that it actually gets things done (instead of digging deep in rabbit holes and unearthing the skeleton armies buried in our codebase), but I fear that it doesn't cover the "rare" cases.
In fact, even before I read this comment, I had the feeling that these part of the engine are "fragile" in the sense that they have lots of quirks that are usually irrelevant, but can differ from the "natural" behavior in some rare corner case. (E.g. adding a node to a set then removing it is usually a no-op, but if the node was somehow already present in the set, then it gets removed, which may be an error.)
My intuition is that anything that uses `NodeBuilder`s has a chance to behave incorrectly on exceptional input:
- encountering a sink node from a checker (on callbacks where that is rare),
- encountering a "node already exists" situation (cache out, node creation returns `nullptr`),
- encountering a posteriorly overconstrained state (probably the most exceptional case).
This is my primary motivation for cleaning up these parts of the codebase. Our data model is full of rare-but-can-happen-anywhere exceptional things (e.g. Undefined/Unknown values, sinks, caching out, posteriorly overconstrained states, reaching various budgets and complexity limits etc.) so I feel that it is infeasible to cover all corner cases experimentally with tests -- and instead I'm trying to ground the correctness of the code in theoretical understanding, invariants and readability. (Of course, we should _also_ have tests in addition to theory, but I feel that it is not enough to test the project as a black box full of skeletons.)
> Objectively, the engineers were really smart and had so many abstraction layers over time, and only few of them made long term sense.
I feel that code written around 2010-12 (IIUC the early days of the analyzer) often contains overeager pretentious abstractions and generalizations that are challenging the readers ("are you as smart as me?") but provide no practical benefits. I don't know whether these are products of hubris ("I'm SO smart"), written with well-meaning naivety ("generic solutions will help development in the future") or just ruins of an ancient age -- maybe they _had_ some practical benefits in the past which was lost during some refactoring.
I also admit that there were some old abstractions that appeared to be useless at first, but since then I realized that they have some relevant (and sometimes even important) role.
In fact, since I wrote my complaints about the overcomplicated management of the `Frontier` (in `NodeBuilder`s), I realized that this is probably relevant _during the evaluation of checker callbacks_ (when `CheckerContext::addTransition` calls can and do perform arbitrarily complex transition sequences and it is important to continue with the _frontier_ after the checker callback).
My current plan is that I would move this `Frontier` management to `CheckerContext` -- because it is irrelevant and confusing in the dozens of other throwaway `NodeBuilder`s that are never used for checker execution.
> I don't mind and I actually want to encourage anyone simplifying things. One way is to try to get rid of something and if it was easy, then chances are that the "thing" wasn't really needed after all.
I like where this is going.
I hope that I will be able to meaningfully simplify the codebase :smile:
However it is likely that I won't do this immediately. During the next days I'll try to make `switch` statements trigger the `BranchCondition` callback (I have an unpublished almost-ready commit which should work correctly now that `SwitchNodeBuilder`s are `NodeBuilder`s) and then merge optin.core.UnconditionalVAArg (https://github.com/llvm/llvm-project/pull/175602, which would work correctly once `switch` triggers the `BranchCondition` callback). After these are done, I'll switch to working on `security.ArrayBound` and generalizing its logic for use in other checkers (like e.g. `ReturnPtrRange`), and I will work on this `NodeBuilder` removal only after that (or within "breaks" when I'm blocked on the bounds checking front).
https://github.com/llvm/llvm-project/pull/181431
More information about the cfe-commits
mailing list