[PATCH] D130306: [clang][dataflow] Analyze calls to in-TU functions

Mon Jul 25 08:06:26 PDT 2022

samestep added a comment.

In D130306#3673120 <https://reviews.llvm.org/D130306#3673120>, @xazax.hun wrote:

> Thanks! I think this is a major design decision, and I'd love to see a discussion about the alternatives considered before jumping to an implementation. Specifically, I'd like to know if summaries are out of scope or something you would consider in the future. Knowing this is useful because this can influence how the code should be reviewed. E.g., if you plan to have multiple ways to do context sensitive analysis in the future, we should make sure that the current code will not lock us in in the inline substitution approach. If summaries are not planned at all, this is not a concern.

This is a good question. I shared with you our design doc, which may help clarify somewhat; please let me know if you have further concerns.

> The Clang Static Analyzer is using this approach and it was a long way to iron out all the corner cases where the performance was bad. It has many cuts including maximum number of visits for a basic block, maximum call stack depth, not inlining functions after a certain size threshold and so on. Basically, it took some time to get the right performance and precision. But overall, the inline substitution approach will never scale to large call stacks/long calling contexts. On the other hand, a summary-based approach can potentially find bugs across a really large number of function calls with reasonable costs. Mainly because the same function is not reanalyzed for every context.

That makes a lot of sense. From what you're saying, it sounds like we'll avoid that in our plan by keeping contexts small due to only context-sensitively analyzing simple models that we write ourselves.

> I see. This was not clear to me from the description of the patch notes. It seems to me that the goal here is to simplify the modeling of certain types, not general context-sensitive analysis. I reviewed this patch with the wrong idea in mind. If that is the goal, the current approach makes sense. But I think the comments should make clear what the intended use case and the limitations of the current approach is.

Fair point. Hopefully the intended use case will become more clear in the code itself as I follow up with further patches; if not then I can modify the comments to clarify that. Are there any specific things you'd like to see written comments in this patch itself before landing?

> The main reason I wanted to call this out because it increasingly seems to be whenever a decision needs to be made, the framework is getting less and less sound. Basically, many design decisions here are very similar to what the Clang Static Analyzer is doing.  Since Clang already has an analysis framework for unsound analysis, I wanted to avoid you reinventing the wheel and doing something similar to CSA instead of providing something new for the use cases that cannot be covered by the CSA.

This makes sense; in this case, though, the unsoundness is already present (this patch does nothing to change the way we analyze calls to functions for which we can't see the body), so if anything, unsoundness is reduced here. I'll let @ymandel respond to this in more detail, though.

> Oh, my bad! I somehow blinked and totally skipped that comment.

No worries! This patch is a bit large so it's easy to miss. Thanks for taking the time to review in such detail!

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130306/new/

https://reviews.llvm.org/D130306