<div dir="ltr"><div dir="ltr">(+Sam, who works on clang tooling)<br><br>On Tue, Apr 28, 2020 at 10:38 AM Artem Dergachev <<a href="mailto:noqnoqneo@gmail.com">noqnoqneo@gmail.com</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 4/28/20 6:23 PM, David Blaikie wrote:<br>

> On Tue, Apr 28, 2020 at 3:09 AM Artem Dergachev via cfe-dev <br>

> <<a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a>> wrote:<br>

><br>

>     Hey!<br>

><br>

>     1. I'm glad that we're finally trying to avoid dumping PCH-s on disk!<br>

><br>

>     2. As far as I understand, dependencies are mostly about Clang binary<br>

>     size. I don't know for sure but that's what I had to consider when<br>

>     I was<br>

>     adding libASTMatchers into the Clang binary a few years ago.<br>

><br>

>     3. I strongly disagree that JSON compilation database is "just<br>

>     right for<br>

>     this purpose". I don't mind having explicit improved support for<br>

>     it but<br>

>     I would definitely prefer not to hardcode it as the only possible<br>

>     option. Compilation databases are very limited and we cannot drop<br>

>     projects or entire build systems simply because they can't be<br>

>     represented accurately via a compilation database. So I believe that<br>

>     this is not the right solution for CTU in particular. Instead, an<br>

>     external tool like scan-build should be guiding CTU analysis and<br>

>     coordinate the work of different Clang instances so as to abstract<br>

>     Clang away from the build system.<br>

><br>

><br>

> What functionality do you picture the scan-build-like tool having that <br>

> couldn't be supported if that tool instead built a compilation <br>

> database & the CTU/CSA was powered by the database? (that would <br>

> separate concerns: build command discovery from execution, and make <br>

> scan-build-like tool more general purpose, rather than specific only <br>

> to the CSA)<br>

<br>

Here are a few examples (please let me know if I'm unaware of the latest <br>

developments in the area of compilation databases!)<br>

<br>

- Suppose the project uses precompiled headers. In order to analyze a <br>

file that includes a pch, we need to first rebuild the pch with the <br>

clang that's used for analysis, and only then try to analyze the file. <br>

This introduces a notion of dependency between compilation database <br>

entries; unless entries are ordered in their original compilation order <br>

and we're analyzing with -j1, race conditions will inevitably cause us <br>

to occasionally fail to find the pch. I didn't try to figure out what <br>

happens when modules are used, but I suspect it's worse. </blockquote><div><br>Google certainly uses clang tooling, with a custom compilation database on a build that uses explicit modules - I believe the way that's done is to ignore/strip the modules-related flags so the clang tooling uses non-modules related compilation. But I could be wrong there. You could do some analysis to see any inputs/outputs - or reuse the existing outputs from the original build.<br> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">But if analysis <br>

is conducted alongside compilation and the build system waits for the <br>

analysis to finish like it waits for compilation to finish before <br>

compiling dependent translation units, race conditions are eliminated. <br>

This is how scan-build currently works: it substitutes the compiler with <br>

a fake compiler that both invokes the original compiler and clang for <br>

analysis. Of course, cross-translation-unit analysis won't be conducted <br>

in parallel with compilation; it's multi-pass by design. The problem is <br>

the same though: it should compile pch files first but there's no notion <br>

of "compile this first" in an unstructured compilation database.<br>
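<br>
A minimal sketch of what a "compile this first" ordering could look like if it were layered on top of a JSON compilation database. It is only an illustration, not how scan-build or CTU analysis works today. It assumes that PCH builds are spelled with "-x c++-header" (or "-x c-header") plus "-o", that consumers name the PCH explicitly with "-include-pch", and that paths are spelled identically in both places. The helper names (entry_args, pch_output, pch_inputs, order_entries) are hypothetical.<br>

```python
import shlex
from graphlib import TopologicalSorter  # Python 3.9+


def entry_args(entry):
    """Return the argument list of one compilation database entry."""
    if "arguments" in entry:
        return list(entry["arguments"])
    return shlex.split(entry["command"])


def pch_output(args):
    """Return the PCH file this invocation writes, or None (heuristic)."""
    # Assumption: a PCH build passes "-x c++-header" or "-x c-header" and "-o".
    makes_header = any(
        flag == "-x" and value in ("c++-header", "c-header")
        for flag, value in zip(args, args[1:])
    )
    if makes_header and "-o" in args[:-1]:
        return args[args.index("-o") + 1]
    return None


def pch_inputs(args):
    """Return the PCH files this invocation reads through -include-pch."""
    return [value for flag, value in zip(args, args[1:]) if flag == "-include-pch"]


def order_entries(entries):
    """Order entries so that every PCH is built before its consumers."""
    parsed = [entry_args(entry) for entry in entries]

    # Map each PCH path to the index of the entry that produces it.
    producer = {}
    for index, args in enumerate(parsed):
        out = pch_output(args)
        if out is not None:
            producer[out] = index

    # graph[i] holds the entries that must run before entry i.
    graph = {index: set() for index in range(len(entries))}
    for index, args in enumerate(parsed):
        for pch in pch_inputs(args):
            if pch in producer and producer[pch] != index:
                graph[index].add(producer[pch])

    return [entries[i] for i in TopologicalSorter(graph).static_order()]
```

Even with such an ordering, a parallel runner would still have to wait for each producer to finish before starting its consumers, and this heuristic does not cover a PCH that is picked up implicitly or mutated during the build.<br>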

<br>

- Suppose the project builds the same translation unit multiple times, <br>

say with different flags, say for different architectures. When we're <br>

trying to look up such a file in the compilation database, how do we figure <br>

out which instance to take? If we are to ever solve this problem, we <br>

have to introduce a notion of a "shipped binary" (an ultimate linking <br>

target) in the compilation database and perform cross-translation-unit <br>

analysis of one shipped binary at a time.<br></blockquote><div><br>I believe in that case the compilation database would include both compilations of the file - and presumably for the static analyzer, it would want to build all of them (or scan-build would have to have some logic for filtering them out/deciding which one is the interesting one - same sort of thing would have to be done on the compilation database)<br> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">- There is a variety of hacks that people can introduce in their <br>

projects if they add arbitrary scripts to their build system. For <br>

instance, they can mutate contents of an autogenerated header in the <br>

middle of the build. We can always say "Well, you shouldn't do that", <br>

but people will do that anyway. This makes me believe that no purely <br>

declarative compilation database format will ever be able to handle such <br>

Turing-complete hacks and there's no other way to integrate analysis <br>

into build perfectly other than by letting the build system guide the <br>

analysis.<br></blockquote><div><br>Yep, a mutating build where you need to observe the state before/after such mutations, etc, not much you could do about it. (& how would CTU SA work in that sort of case? You have to run the whole build multiple times?)<br><br>Clang tools are essentially static analysis - so it seems weird that we have two different approaches to static analysis discovery/lookup in the Clang project, but not wholly unacceptable - potentially different goals, etc.<br> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">I'm also all for separation of concerns and I don't think any of this is <br>

specific to our static analysis.<br>
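<br>
To make the earlier point about a translation unit that is built several times concrete, here is a small sketch that lists the source files a JSON compilation database cannot resolve to a single command line. It assumes the standard "directory", "file" and "command"/"arguments" fields. The function names (command_key, ambiguous_files) are hypothetical, and choosing which instance to analyze is deliberately left open.<br>

```python
import os
import shlex
from collections import defaultdict


def command_key(entry):
    """Return the entry's command line as a hashable tuple of arguments."""
    if "arguments" in entry:
        return tuple(entry["arguments"])
    return tuple(shlex.split(entry["command"]))


def ambiguous_files(entries):
    """Map each source path to its distinct command lines, keeping only
    the files that were compiled in more than one way."""
    by_file = defaultdict(set)
    for entry in entries:
        # "file" may be relative to "directory", so normalize it first.
        path = os.path.normpath(os.path.join(entry["directory"], entry["file"]))
        by_file[path].add(command_key(entry))
    return {path: sorted(cmds) for path, cmds in by_file.items() if len(cmds) > 1}
```

Any file reported here needs an extra piece of information, such as the link target it belongs to, before a lookup by file name alone can be unambiguous.<br>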

<br>

>     On 4/28/20 11:31 AM, Endre Fülöp via cfe-dev wrote:<br>

>     ><br>

>     > Hi!<br>

>     ><br>

>     > Question:<br>

>     ><br>

>     > Why is the dependency on ClangTooling ill-advised inside ClangSA<br>

>     (also<br>

>     > meaning the Clang binary) itself ?<br>

>     ><br>

>     > Context:<br>

>     ><br>

>     > Currently I am working on an alternative way to import external TU<br>

>     > AST-s during analysis ( <a href="https://reviews.llvm.org/D75665" rel="noreferrer" target="_blank">https://reviews.llvm.org/D75665</a> ).<br>

>     ><br>

>     > In order to produce AST-s, I use a compilation database to<br>

>     extract the<br>

>     > necessary flags, and finally use ClangTool::buildAST.<br>

>     ><br>

>     > I am aware that I have other options for this as well (like<br>

>     manually<br>

>     > coding the compdb handling for my specific case for the<br>

>     ><br>

>     > first step, and maybe even dumping ASTs as pch-s into an in-memory<br>

>     > buffer), but still consuming JSONCompilationDatabase<br>

>     ><br>

>     > is just too convenient. I would not want to introduce another<br>

>     format<br>

>     > when compilation database is just right for this purpose.<br>

>     ><br>

>     > Elaboration:<br>

>     ><br>

>     > I understand that introducing dependencies has its downsides,<br>

>     > but not being able to reuse code from Tooling is also not ideal.<br>

>     ><br>

>     > I would very much like to be enlightened by someone more<br>

>     familiar with<br>

>     > the architectural decisions already made as to why this is the case,<br>

>     ><br>

>     > and optionally how I could proceed with my efforts so that I can<br>

>     come<br>

>     > up with the most fitting solution i.e. not a hack.<br>

>     ><br>

>     > Thanks,<br>

>     ><br>

>     > Endre Fülöp<br>

>     ><br>

>     ><br>

>     > _______________________________________________<br>

>     > cfe-dev mailing list<br>

>     > <a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a> <mailto:<a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a>><br>

>     > <a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev" rel="noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev</a><br>

><br>

><br>

<br>

</blockquote></div></div>