[PATCH] D130306: [clang][dataflow] Analyze calls to in-TU functions

Sam Estep via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Jul 22 07:51:02 PDT 2022


samestep added a comment.

In D130306#3670259 <https://reviews.llvm.org/D130306#3670259>, @xazax.hun wrote:

> There are many ways to introduce context sensitivity into the framework; this patch seems to take the "inline substitution" approach, the same approach the Clang Static Analyzer takes. While this approach is relatively easy to implement and has great precision, it also has some scalability problems. Did you also consider a summary-based approach? In general, I believe the inline substitution approach results in an easier-to-use interface for the users of the framework, but I am a bit concerned about the scalability problems.

Good point, thanks! Yes, we considered a summary-based approach, but we decided not to use it because (as you mention) it would be much more difficult to implement, especially for callees with nontrivial CFGs, whose summaries would involve a nontrivial flow condition rather than just `Value`s in the `Environment`.

Could you elaborate on which specific scalability problems you are concerned about? The main one that comes to mind for me is unpredictable cost, since arbitrary callee bodies may happen to be present in the translation unit. This particular patch doesn't address that concern, but we definitely plan to: I'm guessing that will take the form of giving the analysis an allowlist of symbols whose bodies it is allowed to analyze, and treating any symbol not on that list as if its body were not available in the TU.
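
To make that concrete, I imagine the knob ending up as something roughly like this (purely illustrative; neither the struct nor the field name exists in this patch):

    // Hypothetical sketch of an inlining allowlist option; names are made up.
    #include "llvm/ADT/StringSet.h"

    struct InliningPolicy {
      // Callees whose bodies the analysis may inline. Any symbol not in this
      // set is treated as if its body were not available in the TU.
      llvm::StringSet<> InlinableCallees;
    };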

To clarify, the main reason we want context-sensitive analysis is to let us simplify our definitions of some models, such as the `optional` checker model. The goal is to provide the analysis with a mock implementation of an `optional` type and then use context-sensitive analysis (probably just one or two layers deep) to model the constructors and methods.
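
For concreteness, here is a minimal sketch of the kind of mock I have in mind (the names and members are illustrative only, not the model we would actually land):

    // Illustrative stand-in for std::optional: the "has value" state is a
    // plain bool member with trivial accessor bodies, so a context-sensitive
    // analysis one or two layers deep can model it without hand-written
    // transfer functions for each method.
    template <typename T>
    class MockOptional {
    public:
      MockOptional() : HasValue(false) {}
      explicit MockOptional(const T &V) : HasValue(true), Val(V) {}

      bool has_value() const { return HasValue; }
      const T &value() const { return Val; }

    private:
      bool HasValue;
      T Val;
    };

With that in place, the checker could learn the result of `has_value()` from the body itself rather than needing a dedicated transfer function for each member.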

> Some other related questions:
>
> - Why call the noop analysis? As far as I understand, this would only update the environment but not the lattice of the current analysis, i.e., if the analysis is computing some information like liveness, that information would not be context sensitive. Am I missing something?

The alternative here would be to run the callee through the same analysis that is already being used for the caller? I agree that would be nicer and would serve a broader category of use cases. I didn't do that in this patch for a couple of reasons:

1. In the short term, it would require threading more information through to the builtin transfer function, whereas this patch was meant to be just a minimum viable product.
2. In the longer term, we probably don't need it for the specific goals mentioned above (just modeling simple fields of mock classes).

However, if you have a suggestion for a way to construct an instance of the outer analysis here, that would definitely be useful.

> - Why limit the call depth to 1? The patch mentions recursive functions. In case of the Clang Static Analyzer, the call depth is 4. I think if we go with the inline substitution approach, we want this parameter to be tunable, because different analyses might have different sweet spots for the call stack depth.

There's no particular reason for this limit; we plan to support greater call-stack depth soon. The maximum depth would probably make sense as a field in the `TransferOptions` struct.
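
Concretely, I'm imagining something along these lines (the field name and default value are hypothetical, and the struct's existing members are omitted):

    // Sketch only: a hypothetical depth knob added to TransferOptions.
    struct TransferOptions {
      // Maximum depth of callee bodies to analyze context-sensitively;
      // 1 matches the behavior of this patch, 0 would disable inlining.
      unsigned MaxCallStackDepth = 1;
    };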

> - The CSA also has other tunables, e.g., small functions are always inlined and large functions are never inlined.

See my response earlier about an allowlist for symbols to inline.

> - Currently, it looks like the framework assumes that functions which cannot be inlined do not do anything. This is an unsound assumption, and I wonder if we should change that before we try to settle on the best values for the tunable parameters.

Agreed, this assumption is unsound. However, the framework already makes many other unsound assumptions in a similar spirit, so this one doesn't immediately strike me as one that needs to change. I'll defer to people with more context.

> - The current code might do a bit too much work. E.g., consider:
>
>   while (...) {
>     inlinableCall();
>   }
>
> As far as I understand, the current approach would start the analysis of `inlinableCall` from scratch in each iteration. I wonder if we actually want to preserve state between iterations, so we do not always reevaluate the call from scratch. Currently, it might not be a big deal since the fixed-point iteration part is disabled, but this could become a performance problem in the future, unless I'm missing something.

Agreed, this is somewhat wasteful. Note that not everything is thrown away, because the same `DataflowAnalysisContext` is reused when analyzing the callee. Still, as you mention, we would like to handle this in a smarter way in the future. For now, though, while we're just building up functionality behind a feature flag, we don't plan to worry much about this particular performance concern.

> - I think the environment currently is not really up to context-sensitive evaluation. Consider a recursive function:
>
>   void f(int a) {
>     ++a;
>     if (a < 10)
>         f(a);
>   }
>
> Here, when the recursive call is evaluated, it is ambiguous what `a` refers to. Is it the argument of the caller or the callee? To solve this ambiguity, the calling context needs to be part of the keys in the environment.

Yes, I mentioned this in the patch description and in a comment in the `pushCall` implementation: we plan to address it by modifying the set of fields in the `DataflowAnalysisContext` class. I plan to do that in my next patch, once this one lands.
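
As a rough illustration of the direction (not the actual design), one way to make the keys unambiguous is to pair each declaration with the chain of call sites that led to the current evaluation:

    // Purely illustrative: distinguish the caller's `a` from the callee's `a`
    // by keying per-declaration state on both the declaration and a "call
    // string" of the call sites currently being inlined.
    #include <utility>

    #include "llvm/ADT/SmallVector.h"

    namespace clang {
    class CallExpr;
    class ValueDecl;
    } // namespace clang

    // The call sites on the analysis stack, outermost first.
    using CallString = llvm::SmallVector<const clang::CallExpr *, 4>;

    // A context-sensitive key for per-declaration environment state.
    using ContextKey = std::pair<const clang::ValueDecl *, CallString>;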


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130306/new/

https://reviews.llvm.org/D130306


