[PATCH] D82561: [analyzer][CrossTU] Lower CTUImportThreshold default value

Fri Jun 26 02:41:06 PDT 2020

xazax.hun added a comment.

In D82561#2116194 <https://reviews.llvm.org/D82561#2116194>, @gamesh411 wrote:

> In D82561#2116091 <https://reviews.llvm.org/D82561#2116091>, @balazske wrote:
>
> > That means perform a get CTU definition if the TU to be imported (where the function comes from) is small? Otherwise it does not matter how small the function is, it can result in importing of large amount of code. Determining parameters (like "smallness") of the TU is probably not simple.
>
>
> Measuring the smallness of a function is currently not trivial if the function is in another TU. But theoretically, the creation process of the CTU index has access to the number of declarations in a function, which could be used as a metric for complexity.

The number of declarations is not a good proxy for complexity. 
We talk about 3 calculations here:

1. Loading the TU from disk (which might involve parsing in case the analysis is on demand)
2. Merging the AST sub"trees"
3. Actual inline analysis

1-2 Is cached and the complexity of 2 is largely determined by the number of dependencies in a declaration. I suspect that the number of declarations in a function is not a good estimate of the number of unmerged dependencies (classes and so on).

For 3, we have the CFG node size as a complexity estimate.

The question is how the processing time is split between all these phases.

> Another concern is memory usage. Currently I am working on a solution to approximate the memory usage of declarations by measuring the total memory usage, and the number of declarations inside a TU. Then I can come up with average values for memory used per declaration for projects I test. I could measure multiple projects per language and come up with some statistics that way, but I'm sure that I am not alone with the feeling that there should be a better metric for memory usage other than this roundabout method. All would be well if there were a way to get `sizeof(astunit)`, but I am not aware of any.

Yeah, loading a big TU into the memory just for a single small function might not be the best to do. However, when we dramatically reducing the CTU threshold, the analysis will be more unstable in the sense that adding a new function call will change the order in which we analyze functions. That will change the order of CTU imports. That will change the set of TUs that will be imported. It might not be a problem as the analyzer already can exhibit this behavior: a small code change can make unrelated diagnostics appear or disappear. But a small CTU threshold will likely make this phenomenon worse.

Just to be clear I am on board with this patch, I was merely brainstorming for future improvements :)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82561/new/

https://reviews.llvm.org/D82561