<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:Wingdings;
panose-1:5 0 0 0 0 0 0 0 0 0;}
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p.msonormal0, li.msonormal0, div.msonormal0
{mso-style-name:msonormal;
mso-margin-top-alt:auto;
margin-right:0cm;
mso-margin-bottom-alt:auto;
margin-left:0cm;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
span.EmailStyle18
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri",sans-serif;
mso-fareast-language:EN-US;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
{page:WordSection1;}
/* List Definitions */
@list l0
{mso-list-id:549805655;
mso-list-template-ids:41728406;}
@list l0:level1
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:36.0pt;
mso-level-number-position:left;
text-indent:-18.0pt;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
@list l0:level2
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:72.0pt;
mso-level-number-position:left;
text-indent:-18.0pt;
mso-ansi-font-size:10.0pt;
font-family:"Courier New";
mso-bidi-font-family:"Times New Roman";}
@list l0:level3
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:108.0pt;
mso-level-number-position:left;
text-indent:-18.0pt;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
@list l0:level4
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:144.0pt;
mso-level-number-position:left;
text-indent:-18.0pt;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
@list l0:level5
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:180.0pt;
mso-level-number-position:left;
text-indent:-18.0pt;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
@list l0:level6
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:216.0pt;
mso-level-number-position:left;
text-indent:-18.0pt;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
@list l0:level7
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:252.0pt;
mso-level-number-position:left;
text-indent:-18.0pt;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
@list l0:level8
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:288.0pt;
mso-level-number-position:left;
text-indent:-18.0pt;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
@list l0:level9
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:324.0pt;
mso-level-number-position:left;
text-indent:-18.0pt;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
ol
{margin-bottom:0cm;}
ul
{margin-bottom:0cm;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="HU" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:EN-US">Hi,<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:EN-US">Thanks for the insight. It could definitely open new horizons if we could distribute AST manipulation in a map-reduce fashion.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:EN-US">I would just like to point out, that on IR level, the workflow introduced is definitely worthwhile to consider, because the edits<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:EN-US">in that case is just inlining IR instructions at callsites. While there may arise some difficulties there (I am just making assumptions as<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:EN-US">I am no expert in IR or CodeGen), (almost-) context-insensitivity helps a lot. In case of ASTs however one function node can contain references to<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:EN-US">other nodes all around the TU. It is not trivial to inline a function, ASTImporter library does exactly this, and It works by importing the function<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:EN-US">itself and all of its dependent declaration and types. This lead to the consideration, that TUDecls should be the atomic units of CTU.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:EN-US">I am still not fully aware of the implications of these arguments on the architecture you mentioned, but I will be investigating, and any discussion is welcome on my part.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:EN-US">Cheers,<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:EN-US">Endre<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><b><span lang="EN-US">From:</span></b><span lang="EN-US"> Sam McCall <sammccall@google.com>
<br>
<b>Sent:</b> 2020. április 29., szerda 11:56<br>
<b>To:</b> Endre Fülöp <Endre.Fulop@sigmatechnology.se><br>
<b>Cc:</b> cfe-dev@lists.llvm.org; richard@metafoo.co.uk; David Blaikie <dblaikie@gmail.com>; Artem Dergachev <noqnoqneo@gmail.com>; Gábor Márton <gabor.marton@ericsson.com><br>
<b>Subject:</b> Re: [analyzer][tooling] Analyzer architecture<o:p></o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal">It's worth carefully thinking about your design goals for this system. Particularly how much you value:<o:p></o:p></p>
<div>
<p class="MsoNormal"> - predictability (isolation and debugging)<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"> - efficiency (e.g. in terms of total CPU usage)<o:p></o:p></p>
<div>
<p class="MsoNormal"> - scalability (often in tension with efficiency)<o:p></o:p></p>
</div>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<div>
<p class="MsoNormal">We've had some good experience with a mapreduce approach for cross-TU analysis, for dead-code analysis etc.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">The idea is your analysis is composed of pure functions that run on a single TU.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">e.g. for inline-function, this would be:<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"> 1. [Prepare] analyze the TU containing the target function, this is a function (input spec, TU AST) -> function AST<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"> 2. [Map] analyze every TU to find occurrences and compute edits, this is a function (TU AST, function AST) -> [(file, edit)]<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"> 3. [Reduce] group by file and reconcile edits, this is a function (file, [edit]) -> edit<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">It trades off a bit of efficiency to be highly predictable (pure functions are easy to test, intermediate states can be saved for analysis, bugs are easily localizable to TUs) and scalable.<o:p></o:p></p>
</div>
</div>
<div>
<p class="MsoNormal">It does require your intermediate data to be serializable, but distributing over a network server does too. Having the "framework part" not be too opinionated about the form of this data gives some useful flexibility.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Compared to this, your ASTServer seems to sacrifice scalability and predictability for efficiency if I'm understanding it correctly, it's worth carefully considering whether this is the right tradeoff (e.g. it only makes sense if your analyses
are often slow enough to be worth squeezing this efficiency out of, but fast enough that they don't need to be seriously distributed).<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">The Tooling libraries have fair support for Map steps, but none for Reduce and nothing very useful for stringing steps together. It's possible to bolt this stuff on but I regret that we haven't added it.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div>
<p class="MsoNormal">On Wed, Apr 29, 2020 at 10:15 AM Endre Fülöp <<a href="mailto:Endre.Fulop@sigmatechnology.se">Endre.Fulop@sigmatechnology.se</a>> wrote:<o:p></o:p></p>
</div>
<blockquote style="border:none;border-left:solid #CCCCCC 1.0pt;padding:0cm 0cm 0cm 6.0pt;margin-left:4.8pt;margin-right:0cm">
<div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">Hi!</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">In order to not overburden the previous discussion about Analyzer and Tooling, I would like to ask you opinions on a related but slightly orthogonal matter.</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">Gabor and I had a brainstorming session about the issues CTU analysis and compilation command handling (previous topic) brought up recently.</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">Note that these points are to be regarded as cursory expeditions into the hypothetical (at best).</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">The train of thought regarding CTU analysis had the following outline:</span><o:p></o:p></p>
<ul type="disc">
<li class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level1 lfo1">
<span lang="EN-US">We need a tool that gets a `FunctionDecl` (the function which we would like to inline) and returns with an AST to its TU.</span><o:p></o:p></li></ul>
<ul type="disc">
<ul type="circle">
<li class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level2 lfo1">
<span lang="EN-US">the fitting abstraction level of the result seems to be the TU level</span><o:p></o:p></li><li class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level2 lfo1">
<span lang="EN-US">`externalDefMapping.txt` is just an implementation detail, actually we don't need that.</span><o:p></o:p></li></ul>
</ul>
<ul type="disc">
<li class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level1 lfo1">
<span lang="EN-US">Let's call this tool `<b>ASTServer</b>`.</span><o:p></o:p></li><li class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level1 lfo1">
<span lang="EN-US">ASTServer has some resemblance to `clangd`.</span><o:p></o:p></li></ul>
<ul type="disc">
<ul type="circle">
<li class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level2 lfo1">
<span lang="EN-US">Works on the whole project</span><o:p></o:p></li><li class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level2 lfo1">
<span lang="EN-US">Uses compilation DB</span><o:p></o:p></li><li class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level2 lfo1">
<span lang="EN-US">Persists already parsed ASTs in its memory (up to a limit)</span><o:p></o:p></li></ul>
</ul>
<ul type="disc">
<ul type="circle">
<ul type="square">
<li class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level3 lfo1">
<span lang="EN-US">(Cache eviction strategies? LRU?)</span><o:p></o:p></li></ul>
</ul>
</ul>
<ul type="disc">
<li class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level1 lfo1">
<span lang="EN-US">The AST would be returned on a socket and in a serialized form (ASTReader/Writer).</span><o:p></o:p></li></ul>
<ul type="disc">
<ul type="circle">
<li class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level2 lfo1">
<span lang="EN-US">could also work over the network, promoting distribution</span><o:p></o:p></li></ul>
</ul>
<ul type="disc">
<li class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level1 lfo1">
<span lang="EN-US">We need another tool: `<b>clang-analyzer</b>` !!!</span><o:p></o:p></li></ul>
<ul type="disc">
<ul type="circle">
<li class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level2 lfo1">
<span lang="EN-US">Actually we should have done this earlier</span><o:p></o:p></li><li class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level2 lfo1">
<span lang="EN-US">Utilizes clang for analysis purposes</span><o:p></o:p></li><li class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level2 lfo1">
<span lang="EN-US">Handles comm with `ASTServer`</span><o:p></o:p></li></ul>
</ul>
<ul type="disc">
<ul type="circle">
<ul type="square">
<li class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level3 lfo1">
<span lang="EN-US">Caches ASTs from the server</span><o:p></o:p></li></ul>
</ul>
</ul>
<ul type="disc">
<li class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level1 lfo1">
<span lang="EN-US">external orchestrator tool CodeChecker tool would launch ASTServer and then would call clang-analyzer tool for each TU, thus conducting the analysis.</span><o:p></o:p></li></ul>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">The reasoning behind the separation:</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">The analyzer is a complex subsystem of Clang. The valid concern of clang binary growing out of proportion, and the increasing need for</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">tooling dependencies surfacing due to CTU analysis indicate the need reorganizing facilities.</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">The point is further backed by the argument that a complex functionality of interprocess communication (over sockets in our example)</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">is even less desirable inside the clang binary than binary size bloat.</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">Also the complexity of the whole solution could be distributed, and concerns of build system management, build configuration formats</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">can be separated from the analyzer itself (but allows for a wide variety of build-system vs analysis cooperation schemes to be implemented).</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">Again, the scope of these ideas is not trivial to assess, and would probably require a considerable amount of effort,</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">but I hope an open discussion would outline a solution that benefits the structure of the whole project.</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">Cheers,</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">Endre</span><o:p></o:p></p>
</div>
</div>
</blockquote>
</div>
</div>
</body>
</html>