[cfe-dev] [analyzer][tooling] Architectural questions about Clang and ClangTooling

David Blaikie via cfe-dev cfe-dev at lists.llvm.org
Wed Apr 29 14:04:15 PDT 2020


On Tue, Apr 28, 2020 at 3:43 PM Sam McCall <sammccall at google.com> wrote:

> On Tue, Apr 28, 2020 at 11:53 PM David Blaikie <dblaikie at gmail.com> wrote:
>
>> (+Sam, who works on clang tooling)
>>
>> On Tue, Apr 28, 2020 at 10:38 AM Artem Dergachev <noqnoqneo at gmail.com>
>> wrote:
>>
>>> On 4/28/20 6:23 PM, David Blaikie wrote:
>>> > On Tue, Apr 28, 2020 at 3:09 AM Artem Dergachev via cfe-dev
>>> > <cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>> wrote:
>>> >
>>> >     Hey!
>>> >
>>> >     1. I'm glad that we're finally trying to avoid dumping PCHs on
>>> >     disk!
>>> >
>>> >     2. As far as I understand, dependencies are mostly about Clang
>>> binary
>>> >     size. I don't know for sure but that's what I had to consider when
>>> >     I was
>>> >     adding libASTMatchers into the Clang binary a few years ago.
>>> >
>>> >     3. I strongly disagree that JSON compilation database is "just
>>> >     right for
>>> >     this purpose". I don't mind having explicit improved support for
>>> >     it but
>>> >     I would definitely prefer not to hardcode it as the only possible
>>> >     option. Compilation databases are very limited and we cannot drop
>>> >     projects or entire build systems simply because they can't be
>>> >     represented accurately via a compilation database. So I believe
>>> that
>>> >     this is not the right solution for CTU in particular. Instead, an
>>> >     external tool like scan-build should be guiding CTU analysis and
>>> >     coordinating the work of different Clang instances, so as to
>>> >     abstract Clang away from the build system.
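[For context, a JSON compilation database is just a flat array of independent entries, one per compiler invocation. The sketch below (hypothetical paths and flags, using only Python's standard json module) shows the shape of the format; note that nothing in it expresses ordering or dependencies between entries, which is the limitation under discussion.]

```python
import json

# A minimal compile_commands.json: a flat array of independent entries.
# Paths and flags are hypothetical, for illustration only. Note that the
# format carries no ordering or dependency information between entries.
compdb_text = """
[
  {"directory": "/build", "file": "/src/a.cpp",
   "command": "clang++ -std=c++17 -c /src/a.cpp -o a.o"},
  {"directory": "/build", "file": "/src/b.cpp",
   "command": "clang++ -std=c++17 -c /src/b.cpp -o b.o"}
]
"""

entries = json.loads(compdb_text)
for entry in entries:
    print(entry["file"], "->", entry["command"])
```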
>>> >
>>> >
>>> > What functionality do you picture the scan-build-like tool having that
>>> > couldn't be supported if that tool instead built a compilation
>>> > database & the CTU/CSA was powered by the database? (that would
> separate concerns: build command discovery from execution, and make
> the scan-build-like tool more general purpose, rather than specific only
>>> > to the CSA)
>>>
>>> Here are a few examples (please let me know if I'm unaware of the latest
>>> developments in the area of compilation databases!)
>>>
>>> - Suppose the project uses precompiled headers. In order to analyze a
>>> file that includes a pch, we need to first rebuild the pch with the
>>> clang that's used for analysis, and only then try to analyze the file.
>>> This introduces a notion of dependency between compilation database
>>> entries; unless entries are ordered in their original compilation order
>>> and we're analyzing with -j1, race conditions will inevitably cause us
>>> to occasionally fail to find the pch. I didn't try to figure out what
>>> happens when modules are used, but I suspect it's worse.
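[One way a driving tool could paper over this, sketched below, is a two-phase replay of the database: run PCH-producing commands first, then everything else. This assumes PCH builds are recognizable by `-emit-pch` on the command line, which is a heuristic, not real dependency information, so it illustrates the workaround rather than a robust fix.]

```python
import shlex

# Hypothetical two-phase replay: run PCH-producing commands before the
# commands that consume them. Assumes PCH builds are recognizable by
# "-emit-pch" on the command line; this is a heuristic, not real
# dependency information carried by the database itself.
def phase_order(entries):
    pch_builds, others = [], []
    for entry in entries:
        argv = shlex.split(entry["command"])
        (pch_builds if "-emit-pch" in argv else others).append(entry)
    # Phase one (PCH builds) must complete before phase two starts.
    return pch_builds + others

entries = [
    {"file": "use.cpp",
     "command": "clang++ -include-pch hdr.pch -c use.cpp"},
    {"file": "hdr.h",
     "command": "clang++ -x c++-header hdr.h -emit-pch -o hdr.pch"},
]
ordered = [entry["file"] for entry in phase_order(entries)]
print(ordered)  # the PCH build comes first
```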
>>
>>
>> Google certainly uses clang tooling, with a custom compilation database
>> on a build that uses explicit modules - I believe the way that's done is to
>> ignore/strip the modules-related flags so the clang tooling uses
>> non-modules related compilation.
>>
> Yeah. In fact we put the build system into a mode where it generates
> compile commands for a non-modules build; I'm not sure whether it's always
> easy to do this given a modular compile command.
>

Ah, fair enough *nod*


> I suspect to make c++20 modular builds work with tools (and we must),
> we'll need an extension or peer to compile_commands.json that build systems
> can use to define the modules in a project. That will at least give tools
> what they need to maintain a module cache.
>

Yeah - if we end up going with the (I forget the current name of the
prototype idea GCC has) sort of build oracle, where the compiler interacts
over a socket to request that certain prerequisite modules be built
(trivially, that could invoke the compiler directly, or it could tell a
build system like ninja to queue that work, knowing that the current
process will be idle until the work is provided - so less resource
starvation than Clang's current implicit modules), then a compiler shim
like scan-build could also insert itself as an oracle proxy to intercept
those module build requests and record them, along with the dependency
information (this module build request came from the compilation of this
other file, etc.).
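[A recording proxy of that kind could, in the simplest case, just sit between the compiler and the real oracle, logging each module-build request together with the requesting TU before forwarding it. This is a minimal in-process sketch; the real protocol (GCC's prototype speaks it over a socket/pipe) is abstracted into a plain callback, and the names here are hypothetical.]

```python
# Minimal sketch of an "oracle proxy": it forwards module-build requests
# to the real oracle and records (requesting TU, requested module) pairs,
# which is exactly the dependency information a scan-build-like tool
# would want to capture. The wire protocol is abstracted as a callback.
class RecordingOracle:
    def __init__(self, real_oracle):
        self.real_oracle = real_oracle
        self.dependency_log = []  # (requesting_tu, requested_module)

    def request_module(self, requesting_tu, module_name):
        self.dependency_log.append((requesting_tu, module_name))
        return self.real_oracle(module_name)

# Hypothetical stand-in for the build system's side of the protocol:
# maps a module name to the path of its built module interface.
def fake_build_system_oracle(module_name):
    return f"/cache/{module_name}.pcm"

oracle = RecordingOracle(fake_build_system_oracle)
oracle.request_module("main.cpp", "std.io")
oracle.request_module("util.cpp", "std.io")
print(oracle.dependency_log)
```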


>
>
>> But I could be wrong there. You could do some analysis to see any
>> inputs/outputs - or reuse the existing outputs from the original build.
>>
> The problem with reusing existing outputs (at least serialized ASTs like
> PCH or modules) is that there's no stable format, so you need to
> version-lock your tool to your compiler (which must be clang!) to ensure
> compatibility.
> I think this is what Artem was alluding to by "with the clang that's used
> for analysis".
>

Oh, right!


> But if analysis
>>> is conducted alongside compilation and the build system waits for the
>>> analysis to finish like it waits for compilation to finish before
>>> compiling dependent translation units, race conditions are eliminated.
>>> This is how scan-build currently works: it substitutes the compiler with
>>> a fake compiler that both invokes the original compiler and clang for
>>> analysis. Of course, cross-translation-unit analysis won't be conducted
>>> in parallel with compilation; it's multi-pass by design. The problem is
>>> the same though: it should compile pch files first but there's no notion
>>> of "compile this first" in an unstructured compilation database.
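[The interposition described above can be sketched as a tiny wrapper: the build system is pointed at the wrapper as "the compiler", and the wrapper runs the real compile first, then the analysis, so the build's own dependency ordering serializes both. The path below is hypothetical and the real scan-build shims are more involved; `run` is injectable so the sketch can be exercised without a compiler installed.]

```python
import subprocess

# Sketch of a scan-build-style compiler shim: run the real compile first
# and, only if it succeeds, an analysis pass over the same arguments.
# Because the build system waits on this process before compiling
# dependent TUs, the build's ordering serializes the analysis too.
REAL_COMPILER = "/usr/bin/clang++"  # hypothetical path

def fake_compiler(argv, run=subprocess.run):
    if run([REAL_COMPILER, *argv]).returncode != 0:
        return 1  # don't analyze code that didn't build
    run([REAL_COMPILER, "--analyze", *argv])
    return 0      # analysis findings don't fail the build
```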
>>>
>> FWIW our internal database plugin (not compile_commands.json) prepares
> the inputs for a target when its compile command is queried. This works as
> long as your build system doesn't delete files or give them different
> content based on what you're building.
>
> - Suppose the project builds the same translation unit multiple times,
>>> say with different flags, say for different architectures. When we're
>>> trying to look such a file up in the compilation database, how do we
>>> figure out which instance to take? If we are ever to solve this problem, we
>>> have to introduce a notion of a "shipped binary" (an ultimate linking
>>> target) in the compilation database and perform cross-translation-unit
>>> analysis of one shipped binary at a time.
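[If the database did carry such a notion, selecting a coherent set of TUs would be a simple filter. The sketch below assumes a hypothetical "target" field added to each entry - plain compile_commands.json has no such field today, which is the point being made.]

```python
# Sketch: group compilation database entries by a hypothetical "target"
# field (the shipped binary they link into), so a CTU analysis can run
# over one coherent set of TUs at a time. The field does not exist in
# the real format; this only illustrates why one would be needed.
def entries_for_target(entries, target):
    return [e for e in entries if e.get("target") == target]

entries = [
    {"file": "tu.cpp", "command": "clang++ -arch x86_64 -c tu.cpp",
     "target": "app-x86_64"},
    {"file": "tu.cpp", "command": "clang++ -arch arm64 -c tu.cpp",
     "target": "app-arm64"},
]
selected = entries_for_target(entries, "app-arm64")
print([e["command"] for e in selected])
```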
>>>
>>
>> I believe in that case the compilation database would include both
>> compilations of the file - and presumably for the static analyzer, it would
>> want to build all of them (or scan-build would have to have some logic for
>> filtering them out/deciding which one is the interesting one - same sort of
>> thing would have to be done on the compilation database)
>>
>>
>>> - There is a variety of hacks that people can introduce in their
>>> projects if they add arbitrary scripts to their build system. For
>>> instance, they can mutate contents of an autogenerated header in the
>>> middle of the build. We can always say "Well, you shouldn't do that",
>>> but people will do that anyway. This makes me believe that no purely
>>> declarative compilation database format will ever be able to handle such
>>> Turing-complete hacks, and that the only way to integrate analysis into
>>> the build perfectly is to let the build system guide the analysis.
>>>
>>
>> Yep, a mutating build where you need to observe the state before/after
>> such mutations, etc.; not much you could do about it. (& how would CTU SA
>> work in that sort of case? You have to run the whole build multiple times?)
>>
>> Clang tools are essentially static analysis - so it seems weird that we
>> have two different approaches to static analysis discovery/lookup in the
>> Clang project, but not wholly unacceptable, potentially different goals,
>> etc.
>>
>>
>>> I'm also all for separation of concerns and I don't think any of this is
>>> specific to our static analysis.
>>>
>>> >     On 4/28/20 11:31 AM, Endre Fülöp via cfe-dev wrote:
>>> >     >
>>> >     > Hi!
>>> >     >
>>> >     > Question:
>>> >     >
>>> >     > Why is the dependency on ClangTooling ill-advised inside ClangSA
>>> >     (also
>>> >     > meaning the Clang binary) itself?
>>> >     >
>>> >     > Context:
>>> >     >
>>> >     > Currently I am working on an alternative way to import external
>>> >     > TU ASTs during analysis ( https://reviews.llvm.org/D75665 ).
>>> >     >
>>> >     > In order to produce ASTs, I use a compilation database to
>>> >     extract the
>>> >     > necessary flags, and finally use ClangTool::buildAST.
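[The lookup-then-extract-flags step can be sketched without any Clang libraries at all; the function below is a simplified stand-in for what JSONCompilationDatabase::getCompileCommands does (hypothetical paths; real entries also come in the "arguments" array form and have more quoting corner cases than shlex handles).]

```python
import json
import shlex

# Simplified stand-in for looking a file up in a JSON compilation
# database and recovering its argument vector. Real code must also
# handle the "arguments" array form and driver quoting corner cases.
def compile_args_for(compdb_json, filename):
    for entry in json.loads(compdb_json):
        if entry["file"] == filename:
            if "arguments" in entry:
                return entry["arguments"]
            return shlex.split(entry["command"])
    return None

compdb = """
[{"directory": "/build",
  "file": "/src/external_tu.cpp",
  "command": "clang++ -std=c++17 -I/src/include -c /src/external_tu.cpp"}]
"""
args = compile_args_for(compdb, "/src/external_tu.cpp")
print(args)
```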
>>> >     >
>>> >     > I am aware that I have other options for this as well (like
>>> >     manually
>>> >     > coding the compdb handling for my specific case for the
>>> >     >
>>> >     > first step, and maybe even dumping ASTs as PCHs into an
>>> >     > in-memory
>>> >     > buffer), but still consuming JSONCompilationDatabase
>>> >     >
>>> >     > is just too convenient. I would not want to introduce another
>>> >     format
>>> >     > when the compilation database is just right for this purpose.
>>> >     >
>>> >     > Elaboration:
>>> >     >
>>> >     > While I understand that introducing dependencies has its
>>> >     > downsides, not being able to reuse code from Tooling is also
>>> >     > not ideal.
>>> >     >
>>> >     > I would very much like to be enlightened by someone more
>>> >     > familiar with the architectural decisions already made about
>>> >     > why this is the case,
>>> >     >
>>> >     > and optionally how I could proceed with my efforts so that I can
>>> >     come
>>> >     > up with the most fitting solution, i.e. not a hack.
>>> >     >
>>> >     > Thanks,
>>> >     >
>>> >     > Endre Fülöp
>>> >     >
>>> >     >
>>> >     > _______________________________________________
>>> >     > cfe-dev mailing list
>>> >     > cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>
>>> >     > https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>> >