[PATCH] D130306: [clang][dataflow] Analyze calls to in-TU functions

Fri Jul 22 14:34:35 PDT 2022

xazax.hun added a comment.

In D130306#3671852 <https://reviews.llvm.org/D130306#3671852>, @samestep wrote:

> In D130306#3670259 <https://reviews.llvm.org/D130306#3670259>, @xazax.hun wrote:
>
>> 
>
> Yes, we considered a summary-based approach, but we decided not to use it because (as you mentioned) it would be much more difficult to implement, especially for callees with nontrivial CFGs, which would result in a nontrivial flow condition instead of just `Value`s in the `Environment`.

Thanks! I think this is a major design decision, and I'd love to see a discussion about the alternatives considered before jumping to an implementation. Specifically, I'd like to know if summaries are out of scope or something you would consider in the future. Knowing this is useful because this can influence how the code should be reviewed. E.g., if you plan to have multiple ways to do context sensitive analysis in the future, we should make sure that the current code will not lock us in in the inline substitution approach. If summaries are not planned at all, this is not a concern.

> Could you elaborate on what specific scalability problems you are concerned about?

The Clang Static Analyzer is using this approach and it was a long way to iron out all the corner cases where the performance was bad. It has many cuts including maximum number of visits for a basic block, maximum call stack depth, not inlining functions after a certain size threshold and so on. Basically, it took some time to get the right performance and precision. But overall, the inline substitution approach will never scale to large call stacks/long calling contexts. On the other hand, a summary-based approach can potentially find bugs across a really large number of function calls with reasonable costs. Mainly because the same function is not reanalyzed for every context.

> To clarify, the main reason we want context-sensitive analysis is to allow us to simplify our definitions of some models, such as the `optional` checker model. The goal is to provide the analysis a mock implementation of an `optional` type, and then use context-sensitive analysis (probably just one or two layers deep) to model the constructors and methods.

Thanks, this is also useful context!

>> Some other related questions:
>>
>> - Why call noop analysis?
>
> The alternative in this case would be to use the same analysis for the callee that is already being used for the caller? I agree that would be nicer to serve a broader category of use cases. I didn't do that in this patch for a couple reasons:
>
> 1. In the short term, that would require threading more information through to the builtin transfer function, while this patch was meant to just be a minimum viable product.
> 2. In the longer term, we probably don't need that for our specific goals (just modeling simple fields of mock classes) mentioned above.

I see. This was not clear to me from the description of the patch notes. It seems to me that the goal here is to simplify the modeling of certain types, not general context-sensitive analysis. I reviewed this patch with the wrong idea in mind. If that is the goal, the current approach makes sense. But I think the comments should make clear what the intended use case and the limitations of the current approach is.

>> - Currently, it looks like the framework assumes functions that cannot be inlined are not doing anything. This is an unsound assumption, and I wonder if we should change that before we try to settle on the best values for the tunable parameters.
>
> Agreed, this assumption is unsound. However, the framework already makes many other unsound assumptions in a similar spirit, so this one doesn't immediately strike me as one that needs to change. I'll defer to people with more context.

The main reason I wanted to call this out because it increasingly seems to be whenever a decision needs to be made, the framework is getting less and less sound. Basically, many design decisions here are very similar to what the Clang Static Analyzer is doing. Basically, since Clang already has an analysis framework for unsound analysis, I wanted to avoid you reinventing the wheel and doing something similar to CSA instead of providing something new for the use cases that cannot be covered by the CSA.

>> - I think the environment currently is not really up to context-sensitive evaluation. Consider a recursive function:
>
> Yes, I mentioned this in the patch description and in a comment in the `pushCall` implementation: we plan to do this by modifying the set of fields in the `DataflowAnalysisContext` class. I plan to do this in my next patch, once this one is landed.

Oh, my bad! I somehow blinked and totally skipped that comment.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130306/new/

https://reviews.llvm.org/D130306