[cfe-dev] [analyzer][tooling] Architectural questions about Clang and ClangTooling
David Blaikie via cfe-dev
cfe-dev at lists.llvm.org
Tue Apr 28 14:53:23 PDT 2020
(+Sam, who works on clang tooling)
On Tue, Apr 28, 2020 at 10:38 AM Artem Dergachev <noqnoqneo at gmail.com>
wrote:
> On 4/28/20 6:23 PM, David Blaikie wrote:
> > On Tue, Apr 28, 2020 at 3:09 AM Artem Dergachev via cfe-dev
> > <cfe-dev at lists.llvm.org> wrote:
> >
> > Hey!
> >
> > 1. I'm glad that we're finally trying to avoid dumping PCHs on disk!
> >
> > 2. As far as I understand, dependencies are mostly about Clang binary
> > size. I don't know for sure, but that's what I had to consider when I was
> > adding libASTMatchers into the Clang binary a few years ago.
> >
> > 3. I strongly disagree that the JSON compilation database is "just right
> > for this purpose". I don't mind having explicit improved support for it,
> > but I would definitely prefer not to hardcode it as the only possible
> > option. Compilation databases are very limited, and we cannot drop
> > projects or entire build systems simply because they can't be represented
> > accurately via a compilation database. So I believe that this is not the
> > right solution for CTU in particular. Instead, an external tool like
> > scan-build should guide CTU analysis and coordinate the work of different
> > Clang instances so as to abstract Clang away from the build system.
> >
> >
> > What functionality do you picture the scan-build-like tool having that
> > couldn't be supported if that tool instead built a compilation
> > database & the CTU/CSA was powered by the database? (that would
> > separate concerns - build command discovery from execution - and make
> > the scan-build-like tool more general-purpose, rather than specific only
> > to the CSA)
>
> Here are a few examples (please let me know if I'm unaware of the latest
> developments in the area of compilation databases!)
>
> - Suppose the project uses precompiled headers. In order to analyze a
> file that includes a PCH, we first need to rebuild the PCH with the
> clang that's used for analysis, and only then try to analyze the file.
> This introduces a notion of dependency between compilation database
> entries; unless entries are ordered in their original compilation order
> and we're analyzing with -j1, race conditions will inevitably cause us
> to occasionally fail to find the PCH. I didn't try to figure out what
> happens when modules are used, but I suspect it's worse.
Google certainly uses clang tooling, with a custom compilation database, on
a build that uses explicit modules - I believe the way that's done is to
ignore/strip the modules-related flags so the clang tooling uses a
non-modules compilation. But I could be wrong there. You could do some
analysis to discover the inputs/outputs - or reuse the existing outputs
from the original build.
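Concretely, that stripping can be done with an ArgumentsAdjuster - a minimal
sketch, where the exact set of flags to drop is my assumption rather than an
exhaustive list:

  // Sketch: drop modules-related flags so the tool parses the TU without
  // depending on prebuilt module artifacts from the original build.
  #include "clang/Tooling/ArgumentsAdjusters.h"
  #include "clang/Tooling/Tooling.h"

  using namespace clang::tooling;

  static CommandLineArguments
  stripModulesFlags(const CommandLineArguments &Args, llvm::StringRef) {
    CommandLineArguments Adjusted;
    for (const std::string &Arg : Args) {
      llvm::StringRef A(Arg);
      if (A == "-fmodules" || A == "-fimplicit-modules" ||
          A.startswith("-fmodule-file=") ||
          A.startswith("-fmodule-map-file="))
        continue; // modules-related flag: drop it
      Adjusted.push_back(Arg);
    }
    return Adjusted;
  }

  // Usage on a ClangTool instance:
  //   Tool.appendArgumentsAdjuster(stripModulesFlags);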
> But if analysis
> is conducted alongside compilation and the build system waits for the
> analysis to finish like it waits for compilation to finish before
> compiling dependent translation units, race conditions are eliminated.
> This is how scan-build currently works: it substitutes the compiler with
> a fake compiler that both invokes the original compiler and clang for
> analysis. Of course, cross-translation-unit analysis won't be conducted
> in parallel with compilation; it's multi-pass by design. The problem is
> the same though: it should compile PCH files first, but there's no notion
> of "compile this first" in an unstructured compilation database.
>
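(For readers who haven't looked inside scan-build: the interposition Artem
describes is conceptually just a wrapper that runs the real compiler and
then the analyzer on the same arguments. A minimal sketch - the paths are
placeholders, and real scan-build does considerably more:

  // Sketch of the "fake compiler": real compilation first, analysis second.
  #include <cstdlib>
  #include <string>

  int main(int argc, char **argv) {
    const std::string RealCC = "/usr/bin/cc";       // compiler being wrapped
    const std::string Analyzer = "clang --analyze"; // analysis invocation
    std::string Args;
    for (int I = 1; I < argc; ++I)
      Args += " " + std::string(argv[I]);
    int RC = std::system((RealCC + Args).c_str());  // real build step first
    if (RC == 0)
      std::system((Analyzer + Args).c_str());       // then static analysis
    return RC; // the build only ever sees the real compiler's exit status
  }

The build system's own dependency ordering then serializes PCH creation
before the TUs that consume it, which is exactly the property a flat
compilation database lacks.)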
> - Suppose the project builds the same translation unit multiple times,
> say with different flags, say for different architectures. When we're
> trying to look up such a file in the compilation database, how do we figure
> out which instance to take? If we are ever to solve this problem, we
> have to introduce a notion of a "shipped binary" (an ultimate linking
> target) in the compilation database and perform cross-translation-unit
> analysis of one shipped binary at a time.
>
I believe in that case the compilation database would include both
compilations of the file - and presumably the static analyzer would want to
build all of them (or scan-build would have to have some logic for filtering
them out/deciding which one is the interesting one - the same sort of
filtering would have to be done on the compilation database).
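FWIW the tooling API already surfaces that ambiguity: getCompileCommands
returns every recorded compilation of a file, so the caller has to choose.
A rough sketch, with file names as placeholders:

  // Sketch: one source file may map to several CompileCommands (e.g. one
  // per target architecture); a tool must build all of them or filter.
  #include "clang/Tooling/JSONCompilationDatabase.h"

  using namespace clang::tooling;

  void analyzeAllVariants() {
    std::string Err;
    std::unique_ptr<JSONCompilationDatabase> DB =
        JSONCompilationDatabase::loadFromFile(
            "compile_commands.json", Err, JSONCommandLineSyntax::AutoDetect);
    if (!DB)
      return; // Err describes the failure

    // Every recorded compilation of foo.cpp, one per configuration.
    for (const CompileCommand &CC : DB->getCompileCommands("foo.cpp")) {
      (void)CC; // CC.Directory, CC.CommandLine, CC.Output - analyze or skip
    }
  }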
> - There is a variety of hacks that people can introduce in their
> projects if they add arbitrary scripts to their build system. For
> instance, they can mutate the contents of an autogenerated header in the
> middle of the build. We can always say "Well, you shouldn't do that",
> but people will do that anyway. This makes me believe that no purely
> declarative compilation database format will ever be able to handle such
> Turing-complete hacks, and that there's no way to integrate analysis into
> the build perfectly other than by letting the build system guide the
> analysis.
>
Yep - with a mutating build, where you need to observe the state before/after
such mutations, there's not much you could do about it. (& how would CTU SA
work in that sort of case? Would you have to run the whole build multiple
times?)
Clang tools are essentially static analysis - so it seems weird that we have
two different approaches to static analysis discovery/lookup in the Clang
project, but it's not wholly unacceptable; there are potentially different
goals, etc.
> I'm also all for separation of concerns and I don't think any of this is
> specific to our static analysis.
>
> > On 4/28/20 11:31 AM, Endre Fülöp via cfe-dev wrote:
> > >
> > > Hi!
> > >
> > > Question:
> > >
> > > Why is the dependency on ClangTooling ill-advised inside ClangSA (also
> > > meaning the Clang binary) itself?
> > >
> > > Context:
> > >
> > > Currently I am working on an alternative way to import external TU
> > > ASTs during analysis (https://reviews.llvm.org/D75665).
> > >
> > > In order to produce ASTs, I use a compilation database to extract the
> > > necessary flags, and finally use ClangTool::buildAST.
> > >
> > > I am aware that I have other options for this as well (like manually
> > > coding the compdb handling for my specific case for the first step, and
> > > maybe even dumping ASTs as PCHs into an in-memory buffer), but still,
> > > consuming JSONCompilationDatabase is just too convenient. I would not
> > > want to introduce another format when the compilation database is just
> > > right for this purpose.
> > >
> > > Elaboration:
> > >
> > > While I understand that introducing dependencies has its downsides,
> > > not being able to reuse code from Tooling is also not ideal.
> > >
> > > I would very much like to be enlightened by someone more familiar with
> > > the architectural decisions already made as to why this is the case,
> > > and optionally how I could proceed with my efforts so that I can come
> > > up with the most fitting solution, i.e. not a hack.
> > >
> > > Thanks,
> > >
> > > Endre Fülöp
> > >
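(Coming back to the top of the thread: the ClangTool-based AST building
Endre describes would look roughly like the sketch below - my illustration
of the approach, not the code from D75665:

  // Sketch: load a JSON compilation database and build in-memory ASTs for
  // the external TUs, as ClangTool supports out of the box.
  #include "clang/Frontend/ASTUnit.h"
  #include "clang/Tooling/JSONCompilationDatabase.h"
  #include "clang/Tooling/Tooling.h"

  using namespace clang::tooling;

  std::vector<std::unique_ptr<clang::ASTUnit>>
  buildExternalASTs(llvm::StringRef CompDBPath,
                    const std::vector<std::string> &Files) {
    std::string Err;
    auto DB = JSONCompilationDatabase::loadFromFile(
        CompDBPath, Err, JSONCommandLineSyntax::AutoDetect);
    std::vector<std::unique_ptr<clang::ASTUnit>> ASTs;
    if (!DB)
      return ASTs; // Err holds the reason for the failure

    ClangTool Tool(*DB, Files);
    Tool.buildASTs(ASTs); // one ASTUnit per requested translation unit
    return ASTs;
  }

buildASTs parses each TU with the flags recorded in the database - which is
exactly why the entry validity and ordering issues discussed above matter.)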