[cfe-dev] [analyzer][tooling] Analyzer architecture

Wed Apr 29 02:56:28 PDT 2020

It's worth thinking carefully about your design goals for this system,
particularly how much you value:
 - predictability (isolation and debugging)
 - efficiency (e.g. in terms of total CPU usage)
 - scalability (often in tension with efficiency)

We've had good experience with a mapreduce approach to cross-TU analysis
(dead-code analysis, etc.).
The idea is that your analysis is composed of pure functions that each run
on a single TU.
e.g. for inline-function, this would be:
 1. [Prepare] analyze the TU containing the target function. This is a
function (input spec, TU AST) -> function AST
 2. [Map] analyze every TU to find occurrences and compute edits. This is a
function (TU AST, function AST) -> [(file, edit)]
 3. [Reduce] group by file and reconcile edits. This is a function (file,
[edit]) -> edit
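
As a rough illustration of the three steps, here is a minimal Python sketch.
Everything in it is an assumption made for illustration: TUs are toy dicts
rather than Clang ASTs, and `Edit`, `prepare`, `map_tu`, `reduce_file` and
`run` are hypothetical names, not part of any Tooling API. Reduce returns a
deduplicated, conflict-checked list of edits as a stand-in for the single
reconciled edit.

```python
from collections import defaultdict
from typing import NamedTuple


class Edit(NamedTuple):
    offset: int  # where the replaced range starts in the file
    length: int  # how many characters are replaced
    text: str    # replacement text


def prepare(spec, tu):
    """[Prepare] (input spec, TU AST) -> function AST."""
    return tu["functions"][spec]


def map_tu(tu, function):
    """[Map] (TU AST, function AST) -> [(file, edit)]."""
    return [
        (call["file"], Edit(call["offset"], call["length"], function["body"]))
        for call in tu["calls"]
        if call["callee"] == function["name"]
    ]


def reduce_file(file, edits):
    """[Reduce] (file, [edit]) -> reconciled edits for that file."""
    # The same header edit is typically produced by every TU that includes it.
    unique = sorted(set(edits))
    for prev, cur in zip(unique, unique[1:]):
        if prev.offset + prev.length > cur.offset:
            raise ValueError(f"conflicting edits in {file}")
    return unique


def run(spec, target_tu, all_tus):
    function = prepare(spec, target_tu)
    by_file = defaultdict(list)
    for tu in all_tus:  # each Map call is independent, so this loop can be sharded
        for file, edit in map_tu(tu, function):
            by_file[file].append(edit)
    return {file: reduce_file(file, edits) for file, edits in by_file.items()}
```

Because each step only depends on its arguments, a bad result can be
reproduced by re-running one step on one TU.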

It trades off a bit of efficiency to be highly predictable (pure functions
are easy to test, intermediate states can be saved for analysis, bugs are
easily localizable to TUs) and scalable.
It does require your intermediate data to be serializable, but distributing
over a network server does too. Having the "framework part" not be too
opinionated about the form of this data gives some useful flexibility.
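
To make the serialization point concrete, here is a small Python sketch under
stated assumptions: `run_step` and `count_calls` are hypothetical names, and
JSON is just one possible encoding chosen by the step itself. The "framework"
function only moves bytes around and saves them, so it never needs to know
what the intermediate data looks like.

```python
import json
from pathlib import Path


def run_step(step, input_blobs, out_path):
    """Framework part: run a pure bytes -> bytes step and persist its output.

    The payloads are opaque here; only the step knows their format.
    """
    blob = step(*input_blobs)
    Path(out_path).write_bytes(blob)  # saved intermediate state, handy for debugging
    return blob


def count_calls(tu_blob):
    """Example analysis step: decodes its own input and encodes its own output."""
    tu = json.loads(tu_blob)
    counts = {}
    for callee in tu["calls"]:
        counts[callee] = counts.get(callee, 0) + 1
    return json.dumps(counts, sort_keys=True).encode("utf-8")
```

A saved blob can later be fed to the next step directly, without re-running
the step that produced it.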

Compared to this, your ASTServer seems to sacrifice scalability and
predictability for efficiency, if I'm understanding it correctly. It's worth
carefully considering whether this is the right tradeoff: it only makes
sense if your analyses are often slow enough to be worth squeezing this
efficiency out of, but fast enough that they don't need to be seriously
distributed.

The Tooling libraries have fair support for Map steps, but none for Reduce
and nothing very useful for stringing steps together. It's possible to bolt
this stuff on but I regret that we haven't added it.

On Wed, Apr 29, 2020 at 10:15 AM Endre Fülöp <Endre.Fulop at sigmatechnology.se>
wrote:

> Hi!
>
> In order to not overburden the previous discussion about Analyzer and
> Tooling, I would like to ask your opinions on a related but slightly
> orthogonal matter.
>
> Gabor and I had a brainstorming session about the issues CTU analysis and
> compilation command handling (previous topic) brought up recently.
>
> Note that these points are to be regarded as cursory expeditions into the
> hypothetical (at best).
>
>
>
> The train of thought regarding CTU analysis had the following outline:
>
>    - We need a tool that gets a `FunctionDecl` (the function which we
>    would like to inline) and returns the AST of its TU.
>       - the fitting abstraction level of the result seems to be the TU
>       level
>       - `externalDefMapping.txt` is just an implementation detail;
>       we don't actually need it.
>    - Let's call this tool `ASTServer`.
>    - ASTServer has some resemblance to `clangd`.
>       - Works on the whole project
>       - Uses compilation DB
>       - Persists already parsed ASTs in its memory (up to a limit)
>          - (Cache eviction strategies? LRU?)
>       - The AST would be returned over a socket in a serialized form
>       (ASTReader/Writer).
>       - This could also work over the network, promoting distribution
>    - We need another tool: `clang-analyzer`!!!
>       - Actually we should have done this earlier
>       - Utilizes clang for analysis purposes
>       - Handles communication with `ASTServer`
>          - Caches ASTs from the server
>       - An external orchestrator tool (CodeChecker) would launch
>       ASTServer and then call the clang-analyzer tool for each TU, thus
>       conducting the analysis.
>
> The reasoning behind the separation:
>
> The analyzer is a complex subsystem of Clang. The valid concern of the
> clang binary growing out of proportion, and the increasing need for
> tooling dependencies surfacing due to CTU analysis, indicate the need for
> reorganizing facilities.
>
> The point is further backed by the argument that complex interprocess
> communication functionality (over sockets in our example) is even less
> desirable inside the clang binary than binary size bloat.
>
> Also, the complexity of the whole solution could be distributed, and
> concerns of build system management and build configuration formats
> could be separated from the analyzer itself, while still allowing a wide
> variety of build-system vs analysis cooperation schemes to be implemented.
>
>
>
> Again, the scope of these ideas is not trivial to assess, and would
> probably require a considerable amount of effort, but I hope an open
> discussion would outline a solution that benefits the structure of the
> whole project.
>
>
>
> Cheers,
>
> Endre
>
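
On the cache eviction question in the quoted outline (LRU?), a minimal Python
sketch of an LRU cache for parsed ASTs follows. It is only an illustration
under assumptions: `ASTCache` and the `parse` callback are hypothetical, and
the limit is a number of entries, whereas a real server would more likely
bound memory use.

```python
from collections import OrderedDict


class ASTCache:
    """Keeps at most `capacity` parsed ASTs, evicting the least recently used."""

    def __init__(self, capacity, parse):
        self._capacity = capacity
        self._parse = parse  # hypothetical callback: TU path -> parsed AST
        self._entries = OrderedDict()

    def get(self, tu_path):
        if tu_path in self._entries:
            # Cache hit: mark this TU as most recently used.
            self._entries.move_to_end(tu_path)
            return self._entries[tu_path]
        ast = self._parse(tu_path)
        self._entries[tu_path] = ast
        if len(self._entries) > self._capacity:
            # Drop the entry that has gone unused the longest.
            self._entries.popitem(last=False)
        return ast
```

Note that evicted TUs get reparsed on the next request, so how fast a lookup
is depends on what was requested before it.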