[cfe-dev] [RFC] A dataflow analysis framework for Clang AST

Wed Oct 27 15:30:07 PDT 2021

One more question. You say that you have a generic machinery for things 
like heap and constraint solving. You also say that you plan your 
machinery to be fast enough to be used in compiler warnings. Do you 
think you can achieve both *simultaneously*? Or is your plan to make it 
highly modular so that it was possible to opt out of expensive modeling 
when a specific warning doesn't need it?

On 10/19/21 10:47 AM, Dmitri Gribenko wrote:
> Hi Gábor,
>
> On Mon, Oct 18, 2021 at 9:04 PM Gábor Horváth <xazax.hun at gmail.com> wrote:
>>> This is an open question. Already, users can model the effects of particular function calls. They would add a case for such calls in their transfer function, where they would update the environment accordingly. The full API for manipulating the environment is available to the user. But, we expect there's more room for supporting this kind of domain-specific semantics, either a simpler API or, likely, a richer one.
>> One of the key learnings for the static analyzer was that we want a strict separation between checks that issue warnings and "checks" that model behavior.
> Yes, I'm aware :) I agree that the right direction here would be to
> separate the warning and modelling plugins, but probably we should
> take it one step further. Running multiple warnings in a shared at the
> same time seems to be the root of the problem. So I think that maybe
> each warning should configure the modeling (and unsoundness
> assumptions in the core) according to its own needs.
>
>> The store is likely to change very little in each of the transitions. The static analyzer tries to leverage this fact by using immutable data structures, so we can share the common parts between states. This can help reduce memory consumption. And the analyzer favors significantly smaller memory use to slightly better performance because this lets users leverage parallelism across TUs better with the same amount of memory. I was wondering whether you considered using immutable data structures (like llvm/ADT/ImmutableMap.h) or is this just a first version?
> It is just the first version, we haven't looked at optimization
> opportunities deeply. However, when we evaluate the transfer functions
> within a basic block, we keep mutating the same environment object
> while applying transfer functions. We only keep one copy of the
> environment per basic block, as we must do that to check fixpoint
> convergence.
>
>> Also, how will this model handle expressions that evaluate to rvalues? Do you have locations for rvalues?
> We have storage locations for rvalues. It is correct for xvalues, but
> it is wrong for prvalues.
>
>> I am perfectly comfortable with unsound behavior by default, there is a whole movement behind this approach: Soundiness Home Page :)
> I'm also OK with it, but I think that different analyses should feel
> free to choose the unsoundness they want, and avoid unsoundness that
> hurts them.
>
>>> The environment is of course subject to the join operation. To join distinct Value objects, we have a built-in rule for booleans (we build a new Value that is constrained to be a disjunction of the values being joined), and we allow the user-defined checker to handle joining values of domain-specific types. Expanding the built-in rules to cover more types is desirable, but would likely create a hard dependency on an SMT (not just a SAT) solver.
>> When a pointer point to two different locations in two different branches do you join them to a new location and add constraints just like you do for booleans?
> That's what we would do if we would use an SMT solver.
>
> Currently we drop such a pointer from the environment in the join
> operation, which in dataflow terms means that we assign the bottom
> element, which is not correct, we should be using top, but we don't
> have that implemented in our model for pointer values.
>
>> Agreed. I suspect the proper solution would involve many non-trivial steps including escape analysis.
> Yes. And I would very much want to see what ideas from separation
> logic could be transferred into a dataflow setting.
>
> Dmitri
>