[cfe-dev] [analyzer][tooling] Architectural questions about Clang and ClangTooling

Artem Dergachev via cfe-dev cfe-dev at lists.llvm.org
Tue Apr 28 10:38:38 PDT 2020

On 4/28/20 6:23 PM, David Blaikie wrote:
> On Tue, Apr 28, 2020 at 3:09 AM Artem Dergachev via cfe-dev 
> <cfe-dev at lists.llvm.org> wrote:
>     Hey!
>     1. I'm glad that we're finally trying to avoid dumping PCH-s on disk!
>     2. As far as I understand, dependencies are mostly about Clang binary
>     size. I don't know for sure but that's what I had to consider when
>     I was adding libASTMatchers into the Clang binary a few years ago.
>     3. I strongly disagree that JSON compilation database is "just right
>     for this purpose". I don't mind having explicit improved support for
>     it but I would definitely prefer not to hardcode it as the only possible
>     option. Compilation databases are very limited and we cannot drop
>     projects or entire build systems simply because they can't be
>     represented accurately via a compilation database. So I believe that
>     this is not the right solution for CTU in particular. Instead, an
>     external tool like scan-build should be guiding CTU analysis and
>     coordinating the work of different Clang instances so as to abstract
>     Clang away from the build system.
> What functionality do you picture the scan-build-like tool having that 
> couldn't be supported if that tool instead built a compilation 
> database & the CTU/CSA was powered by the database? (that would 
> separate concerns: build command discovery from execution, and make 
> scan-build-like tool more general purpose, rather than specific only 
> to the CSA)

Here are a few examples (please let me know if I'm unaware of the latest 
developments in the area of compilation databases!)

- Suppose the project uses precompiled headers. In order to analyze a 
file that includes a pch, we need to first rebuild the pch with the 
clang that's used for analysis, and only then try to analyze the file. 
This introduces a notion of dependency between compilation database 
entries; unless entries are ordered in their original compilation order 
and we're analyzing with -j1, race conditions will inevitably cause us 
to occasionally fail to find the pch. I didn't try to figure out what 
happens when modules are used, but I suspect it's worse. But if analysis 
is conducted alongside compilation and the build system waits for the 
analysis to finish like it waits for compilation to finish before 
compiling dependent translation units, race conditions are eliminated. 
This is how scan-build currently works: it substitutes the compiler with 
a fake compiler that invokes both the original compiler and clang for 
analysis. Of course, cross-translation-unit analysis won't be conducted 
in parallel with compilation; it's multi-pass by design. The problem is 
the same though: it should compile pch files first but there's no notion 
of "compile this first" in an unstructured compilation database.

- Suppose the project builds the same translation unit multiple times, 
say with different flags, say for different architectures. When we're 
trying to look up such a file in the compilation database, how do we figure 
out which instance to take? If we are to ever solve this problem, we 
have to introduce a notion of a "shipped binary" (an ultimate linking 
target) in the compilation database and perform cross-translation-unit 
analysis of one shipped binary at a time.

- There is a variety of hacks that people can introduce in their 
projects if they add arbitrary scripts to their build system. For 
instance, they can mutate contents of an autogenerated header in the 
middle of the build. We can always say "Well, you shouldn't do that", 
but people will do that anyway. This makes me believe that no purely 
declarative compilation database format will ever be able to handle such 
Turing-complete hacks and there's no other way to integrate analysis 
into the build perfectly other than by letting the build system guide the 
analysis.

I'm also all for separation of concerns and I don't think any of this is 
specific to our static analysis.
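
To make the first two examples concrete, here is a rough Python sketch of what a flat JSON compilation database leaves implicit. The project layout, file names and flags below are made up for illustration, and the helper functions are hypothetical; only the "directory"/"file"/"command" fields and the -include-pch, -x c++-header and -target flags are real. It finds entries that consume a pch produced by another entry, and files that appear more than once.

```python
import json
import shlex
from collections import defaultdict

# Hypothetical compilation database; paths and flags are invented for this sketch.
# Note that the pch consumer (main.cpp) is listed BEFORE the pch producer (pch.h):
# nothing in the format forces entries to appear in build order.
COMPILE_COMMANDS = json.loads("""
[
  {"directory": "/proj", "file": "main.cpp",
   "command": "clang++ -include-pch pch.h.pch -c main.cpp -o main.o"},
  {"directory": "/proj", "file": "pch.h",
   "command": "clang++ -x c++-header pch.h -o pch.h.pch"},
  {"directory": "/proj", "file": "util.cpp",
   "command": "clang++ -target x86_64-linux-gnu -c util.cpp -o util.x86.o"},
  {"directory": "/proj", "file": "util.cpp",
   "command": "clang++ -target aarch64-linux-gnu -c util.cpp -o util.arm.o"}
]
""")


def output_of(entry):
    """Return the path following -o, or None if the command has no -o."""
    args = shlex.split(entry["command"])
    if "-o" in args and args.index("-o") + 1 < len(args):
        return args[args.index("-o") + 1]
    return None


def pch_inputs(entry):
    """Return every path passed via -include-pch in this entry's command."""
    args = shlex.split(entry["command"])
    return [args[i + 1] for i, a in enumerate(args[:-1]) if a == "-include-pch"]


def implicit_dependencies(db):
    """List (consumer index, producer index) pairs the database never states."""
    producers = {output_of(e): i for i, e in enumerate(db) if output_of(e)}
    return [(i, producers[p])
            for i, e in enumerate(db)
            for p in pch_inputs(e) if p in producers]


def ambiguous_files(db):
    """Map each file with more than one entry to the indices of its entries."""
    by_file = defaultdict(list)
    for i, e in enumerate(db):
        by_file[e["file"]].append(i)
    return {f: idxs for f, idxs in by_file.items() if len(idxs) > 1}


if __name__ == "__main__":
    print(implicit_dependencies(COMPILE_COMMANDS))
    print(ambiguous_files(COMPILE_COMMANDS))
```

A tool can recover this much by parsing flags, but only for the flags it happens to know about. The ordering and the choice between the two util.cpp entries still have to come from somewhere outside the database, which is roughly my point.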

>     On 4/28/20 11:31 AM, Endre Fülöp via cfe-dev wrote:
>     >
>     > Hi!
>     >
>     > Question:
>     >
>     > Why is the dependency on ClangTooling ill-advised inside ClangSA
>     > (also meaning the Clang binary) itself?
>     >
>     > Context:
>     >
>     > Currently I am working on an alternative way to import external TU
>     > AST-s during analysis ( https://reviews.llvm.org/D75665 ).
>     >
>     > In order to produce AST-s, I use a compilation database to extract
>     > the necessary flags, and finally use ClangTool::buildAST.
>     >
>     > I am aware that I have other options for this as well (like manually
>     > coding the compdb handling for my specific case for the first step,
>     > and maybe even dumping ASTs as pch-s into an in-memory buffer), but
>     > still consuming JSONCompilationDatabase is just too convenient. I
>     > would not want to introduce another format when compilation database
>     > is just right for this purpose.
>     >
>     > Elaboration:
>     >
>     > While I understand that introducing dependencies has its downsides,
>     > not being able to reuse code from Tooling is also not ideal.
>     >
>     > I would very much like to be enlightened by someone more familiar
>     > with the architectural decisions already made as to why this is the
>     > case, and optionally how I could proceed with my efforts so that I
>     > can come up with the most fitting solution, i.e. not a hack.
>     >
>     > Thanks,
>     >
>     > Endre Fülöp
>     >
>     >
>     > _______________________________________________
>     > cfe-dev mailing list
>     > cfe-dev at lists.llvm.org
>     > https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
