[cfe-dev] [analyzer][tooling] Architectural questions about Clang and ClangTooling
David Blaikie via cfe-dev
cfe-dev at lists.llvm.org
Tue Apr 28 14:53:23 PDT 2020
(+Sam, who works on clang tooling)
On Tue, Apr 28, 2020 at 10:38 AM Artem Dergachev <noqnoqneo at gmail.com>
wrote:
> On 4/28/20 6:23 PM, David Blaikie wrote:
> > On Tue, Apr 28, 2020 at 3:09 AM Artem Dergachev via cfe-dev
> > <cfe-dev at lists.llvm.org> wrote:
> >
> > Hey!
> >
> > 1. I'm glad that we're finally trying to avoid dumping PCHs on disk!
> >
> > 2. As far as I understand, dependencies are mostly about Clang binary
> > size. I don't know for sure, but that's what I had to consider when I was
> > adding libASTMatchers into the Clang binary a few years ago.
> >
> > 3. I strongly disagree that the JSON compilation database is "just right
> > for this purpose". I don't mind having explicit improved support for it,
> > but I would definitely prefer not to hardcode it as the only possible
> > option. Compilation databases are very limited, and we cannot drop
> > projects or entire build systems simply because they can't be represented
> > accurately via a compilation database. So I believe that this is not the
> > right solution for CTU in particular. Instead, an external tool like
> > scan-build should guide CTU analysis and coordinate the work of different
> > Clang instances so as to abstract Clang away from the build system.
> >
> >
> > What functionality do you picture the scan-build-like tool having that
> > couldn't be supported if that tool instead built a compilation
> > database & the CTU/CSA was powered by the database? (that would
> > separate concerns - build command discovery from execution - and make
> > the scan-build-like tool more general-purpose, rather than specific only
> > to the CSA)
>
> Here are a few examples (please let me know if I'm unaware of the latest
> developments in the area of compilation databases!)
>
> - Suppose the project uses precompiled headers. In order to analyze a
> file that includes a PCH, we first need to rebuild the PCH with the
> clang that's used for analysis, and only then try to analyze the file.
> This introduces a notion of dependency between compilation database
> entries; unless entries are ordered in their original compilation order
> and we're analyzing with -j1, race conditions will inevitably cause us
> to occasionally fail to find the PCH. I didn't try to figure out what
> happens when modules are used, but I suspect it's worse.
Google certainly uses clang tooling, with a custom compilation database, on
a build that uses explicit modules - I believe the way that's done is to
ignore/strip the modules-related flags so the clang tooling uses a
non-modules compilation. But I could be wrong there. You could do some
analysis to discover the inputs/outputs - or reuse the existing outputs
from the original build.
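Concretely, that stripping can be done with an ArgumentsAdjuster - a minimal
sketch, where the exact set of flags to drop is my assumption rather than an
exhaustive list:

  // Sketch: drop modules-related flags so the tool parses the TU without
  // depending on prebuilt module artifacts from the original build.
  #include "clang/Tooling/ArgumentsAdjusters.h"
  #include "clang/Tooling/Tooling.h"

  using namespace clang::tooling;

  static CommandLineArguments
  stripModulesFlags(const CommandLineArguments &Args, llvm::StringRef) {
    CommandLineArguments Adjusted;
    for (const std::string &Arg : Args) {
      llvm::StringRef A(Arg);
      if (A == "-fmodules" || A == "-fimplicit-modules" ||
          A.startswith("-fmodule-file=") ||
          A.startswith("-fmodule-map-file="))
        continue; // modules-related flag: drop it
      Adjusted.push_back(Arg);
    }
    return Adjusted;
  }

  // Usage on a ClangTool instance:
  //   Tool.appendArgumentsAdjuster(stripModulesFlags);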
> But if analysis
> is conducted alongside compilation and the build system waits for the
> analysis to finish like it waits for compilation to finish before
> compiling dependent translation units, race conditions are eliminated.
> This is how scan-build currently works: it substitutes the compiler with
> a fake compiler that both invokes the original compiler and clang for
> analysis. Of course, cross-translation-unit analysis won't be conducted
> in parallel with compilation; it's multi-pass by design. The problem is
> the same though: it should compile PCH files first, but there's no notion
> of "compile this first" in an unstructured compilation database.
>
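(For readers who haven't looked inside scan-build: the interposition Artem
describes is conceptually just a wrapper that runs the real compiler and
then the analyzer on the same arguments. A minimal sketch - the paths are
placeholders, and real scan-build does considerably more:

  // Sketch of the "fake compiler": real compilation first, analysis second.
  #include <cstdlib>
  #include <string>

  int main(int argc, char **argv) {
    const std::string RealCC = "/usr/bin/cc";       // compiler being wrapped
    const std::string Analyzer = "clang --analyze"; // analysis invocation
    std::string Args;
    for (int I = 1; I < argc; ++I)
      Args += " " + std::string(argv[I]);
    int RC = std::system((RealCC + Args).c_str());  // real build step first
    if (RC == 0)
      std::system((Analyzer + Args).c_str());       // then static analysis
    return RC; // the build only ever sees the real compiler's exit status
  }

The build system's own dependency ordering then serializes PCH creation
before the TUs that consume it, which is exactly the property a flat
compilation database lacks.)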
> - Suppose the project builds the same translation unit multiple times,
> say with different flags, say for different architectures. When we're
> trying to look up such a file in the compilation database, how do we figure
> out which instance to take? If we are ever to solve this problem, we
> have to introduce a notion of a "shipped binary" (an ultimate linking
> target) in the compilation database and perform cross-translation-unit
> analysis of one shipped binary at a time.
>
I believe in that case the compilation database would include both
compilations of the file - and presumably the static analyzer would want to
build all of them (or scan-build would have to have some logic for filtering
them out/deciding which one is the interesting one - the same sort of
filtering would have to be done on the compilation database).
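FWIW the tooling API already surfaces that ambiguity: getCompileCommands
returns every recorded compilation of a file, so the caller has to choose.
A rough sketch, with file names as placeholders:

  // Sketch: one source file may map to several CompileCommands (e.g. one
  // per target architecture); a tool must build all of them or filter.
  #include "clang/Tooling/JSONCompilationDatabase.h"

  using namespace clang::tooling;

  void analyzeAllVariants() {
    std::string Err;
    std::unique_ptr<JSONCompilationDatabase> DB =
        JSONCompilationDatabase::loadFromFile(
            "compile_commands.json", Err, JSONCommandLineSyntax::AutoDetect);
    if (!DB)
      return; // Err describes the failure

    // Every recorded compilation of foo.cpp, one per configuration.
    for (const CompileCommand &CC : DB->getCompileCommands("foo.cpp")) {
      (void)CC; // CC.Directory, CC.CommandLine, CC.Output - analyze or skip
    }
  }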
> - There is a variety of hacks that people can introduce in their
> projects if they add arbitrary scripts to their build system. For
> instance, they can mutate the contents of an autogenerated header in the
> middle of the build. We can always say "Well, you shouldn't do that",
> but people will do that anyway. This makes me believe that no purely
> declarative compilation database format will ever be able to handle such
> Turing-complete hacks, and that there's no way to integrate analysis into
> the build perfectly other than by letting the build system guide the
> analysis.
>
Yep - with a mutating build, where you need to observe the state before/after
such mutations, there's not much you could do about it. (& how would CTU SA
work in that sort of case? Would you have to run the whole build multiple
times?)
Clang tools are essentially static analysis - so it seems weird that we have
two different approaches to static analysis discovery/lookup in the Clang
project, but it's not wholly unacceptable; there are potentially different
goals, etc.
> I'm also all for separation of concerns and I don't think any of this is
> specific to our static analysis.
>
> > On 4/28/20 11:31 AM, Endre Fülöp via cfe-dev wrote:
> > >
> > > Hi!
> > >
> > > Question:
> > >
> > > Why is the dependency on ClangTooling ill-advised inside ClangSA (also
> > > meaning the Clang binary) itself?
> > >
> > > Context:
> > >
> > > Currently I am working on an alternative way to import external TU
> > > ASTs during analysis (https://reviews.llvm.org/D75665).
> > >
> > > In order to produce ASTs, I use a compilation database to extract the
> > > necessary flags, and finally use ClangTool::buildAST.
> > >
> > > I am aware that I have other options for this as well (like manually
> > > coding the compdb handling for my specific case for the first step, and
> > > maybe even dumping ASTs as PCHs into an in-memory buffer), but still,
> > > consuming JSONCompilationDatabase is just too convenient. I would not
> > > want to introduce another format when the compilation database is just
> > > right for this purpose.
> > >
> > > Elaboration:
> > >
> > > While I understand that introducing dependencies has its downsides,
> > > not being able to reuse code from Tooling is also not ideal.
> > >
> > > I would very much like to be enlightened by someone more familiar with
> > > the architectural decisions already made as to why this is the case,
> > > and optionally how I could proceed with my efforts so that I can come
> > > up with the most fitting solution, i.e. not a hack.
> > >
> > > Thanks,
> > >
> > > Endre Fülöp
> > >
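(Coming back to the top of the thread: the ClangTool-based AST building
Endre describes would look roughly like the sketch below - my illustration
of the approach, not the code from D75665:

  // Sketch: load a JSON compilation database and build in-memory ASTs for
  // the external TUs, as ClangTool supports out of the box.
  #include "clang/Frontend/ASTUnit.h"
  #include "clang/Tooling/JSONCompilationDatabase.h"
  #include "clang/Tooling/Tooling.h"

  using namespace clang::tooling;

  std::vector<std::unique_ptr<clang::ASTUnit>>
  buildExternalASTs(llvm::StringRef CompDBPath,
                    const std::vector<std::string> &Files) {
    std::string Err;
    auto DB = JSONCompilationDatabase::loadFromFile(
        CompDBPath, Err, JSONCommandLineSyntax::AutoDetect);
    std::vector<std::unique_ptr<clang::ASTUnit>> ASTs;
    if (!DB)
      return ASTs; // Err holds the reason for the failure

    ClangTool Tool(*DB, Files);
    Tool.buildASTs(ASTs); // one ASTUnit per requested translation unit
    return ASTs;
  }

buildASTs parses each TU with the flags recorded in the database - which is
exactly why the entry validity and ordering issues discussed above matter.)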