<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:#0563C1;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:#954F72;
text-decoration:underline;}
p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
{mso-style-priority:34;
margin-top:0cm;
margin-right:0cm;
margin-bottom:0cm;
margin-left:36.0pt;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
span.EmailStyle17
{mso-style-type:personal-compose;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri",sans-serif;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
{page:WordSection1;}
/* List Definitions */
@list l0
{mso-list-id:1027944799;
mso-list-type:hybrid;
mso-list-template-ids:796272240 67698705 67698713 67698715 67698703 67698713 67698715 67698703 67698713 67698715;}
@list l0:level1
{mso-level-text:"%1\)";
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;}
@list l0:level2
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;}
@list l0:level3
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
@list l0:level4
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;}
@list l0:level5
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;}
@list l0:level6
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
@list l0:level7
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;}
@list l0:level8
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;}
@list l0:level9
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
ol
{margin-bottom:0cm;}
ul
{margin-bottom:0cm;}
--></style>
</head>
<body lang="EN-US" link="#0563C1" vlink="#954F72">
<div class="WordSection1">
<p class="MsoNormal">Hi,<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">I have a question about the assumptions behind, and the correct usage of, AST serialization for C and C++ sources.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">I have done the following:<o:p></o:p></p>
<p class="MsoListParagraph" style="text-indent:-18.0pt;mso-list:l0 level1 lfo1"><![if !supportLists]><span style="mso-list:Ignore">1)<span style="font:7.0pt &quot;Times New Roman&quot;">
</span></span><![endif]>I have implemented a ClangTool that builds ASTs from compilation databases.<o:p></o:p></p>
<p class="MsoListParagraph" style="text-indent:-18.0pt;mso-list:l0 level1 lfo1"><![if !supportLists]><span style="mso-list:Ignore">2)<span style="font:7.0pt &quot;Times New Roman&quot;">
</span></span><![endif]>I have dumped the contents of the ASTs in both textual and binary formats.<o:p></o:p></p>
<p class="MsoListParagraph" style="text-indent:-18.0pt;mso-list:l0 level1 lfo1"><![if !supportLists]><span style="mso-list:Ignore">3)<span style="font:7.0pt &quot;Times New Roman&quot;">
</span></span><![endif]>Then I read the serialized binary back in and dumped it again in both formats.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">What I have noticed is that the dumps of the different generations differ in size (by up to an order of magnitude). The textual dumps also differ.<o:p></o:p></p>
<p class="MsoNormal">I would have expected a serialization and deserialization round trip to produce an AST identical to the original.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Maybe I have done it the wrong way. The following outline gives the gist of the method I used:<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">void textual_dump_to_file(const ASTUnit& unit, StringRef file_path) {<o:p></o:p></p>
<p class="MsoNormal"> using namespace llvm::sys::fs;<o:p></o:p></p>
<p class="MsoNormal"> using namespace llvm::sys::path;<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"> // mkdir -p<o:p></o:p></p>
<p class="MsoNormal"> create_directories(parent_path(file_path));<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
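<p class="MsoNormal"> std::error_code EC;<o:p></o:p></p>
<p class="MsoNormal"> llvm::raw_fd_ostream out {file_path, EC};<o:p></o:p></p>
<p class="MsoNormal"> if (EC) { llvm::errs() &lt;&lt; "cannot open " &lt;&lt; file_path &lt;&lt; ": " &lt;&lt; EC.message() &lt;&lt; "\n"; return; }<o:p></o:p></p>
<p class="MsoNormal"> unit.getASTContext().getTranslationUnitDecl()->dump(out, /*deserialize*/ true);<o:p></o:p></p>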
<p class="MsoNormal">}<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">void experiment_with_unit(CompilerInstance& CI, ASTUnit& Unit, StringRef MethodPrefix, StringRef SourcePath) {<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"> using namespace llvm::sys::fs;<o:p></o:p></p>
<p class="MsoNormal"> using namespace llvm::sys::path;<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"> IntrusiveRefCntPtr&lt;DiagnosticOptions&gt; DiagOpts = new DiagnosticOptions();<o:p></o:p></p>
<p class="MsoNormal"> TextDiagnosticPrinter *DiagClient = new TextDiagnosticPrinter(llvm::errs(), &amp;*DiagOpts);<o:p></o:p></p>
<p class="MsoNormal"> IntrusiveRefCntPtr&lt;DiagnosticIDs&gt; DiagID(new DiagnosticIDs());<o:p></o:p></p>
<p class="MsoNormal"> IntrusiveRefCntPtr&lt;DiagnosticsEngine&gt; Diags(<o:p></o:p></p>
<p class="MsoNormal"> new DiagnosticsEngine(DiagID, &amp;*DiagOpts, DiagClient));<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"> llvm::SmallString&lt;256&gt; TextDumpPath{MethodPrefix};<o:p></o:p></p>
<p class="MsoNormal"> TextDumpPath.append(SourcePath);<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"> llvm::SmallString&lt;256&gt; BinaryDumpPath {TextDumpPath};<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"> replace_extension(TextDumpPath, ".txt1");<o:p></o:p></p>
<p class="MsoNormal"> replace_extension(BinaryDumpPath, ".bin1");<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"> if (Unit.Save(BinaryDumpPath)) return; // Save() returns true on error<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal"> textual_dump_to_file(Unit, TextDumpPath);<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"> auto Dump1Loaded = ASTUnit::LoadFromASTFile(<o:p></o:p></p>
<p class="MsoNormal"> std::string(BinaryDumpPath), CI.getPCHContainerOperations()->getRawReader(),<o:p></o:p></p>
<p class="MsoNormal"> ASTUnit::LoadEverything, Diags, CI.getFileSystemOpts());<o:p></o:p></p>
<p class="MsoNormal"> if (!Dump1Loaded) return; // LoadFromASTFile returns null on failure<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"> replace_extension(TextDumpPath, ".txt2");<o:p></o:p></p>
<p class="MsoNormal"> replace_extension(BinaryDumpPath, ".bin2");<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"> Dump1Loaded->Save(BinaryDumpPath);<o:p></o:p></p>
<p class="MsoNormal"> textual_dump_to_file(*Dump1Loaded, TextDumpPath);<o:p></o:p></p>
<p class="MsoNormal">}<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">The files with extensions txt1 and txt2 differ, as do bin1 and bin2.<o:p></o:p></p>
<p class="MsoNormal">I would think that if there were a problem with the reproducibility of the AST, it would affect modules and the analyzer as well.<o:p></o:p></p>
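<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">To narrow down where the txt1 and txt2 dumps first diverge, a small standalone helper along the following lines could be used. This is only a sketch: first_differing_line is a hypothetical name, it uses only the C++17 standard library rather than any Clang API, and it treats a file that cannot be opened as empty.<o:p></o:p></p>

```cpp
#include <cstddef>
#include <fstream>
#include <optional>
#include <string>

// Hypothetical helper (not part of Clang): returns the 1-based number of the
// first line at which the two text files differ, or std::nullopt if their
// contents are identical. If one file ends before the other, the first
// missing line counts as the difference. A file that cannot be opened is
// treated as empty.
std::optional<std::size_t> first_differing_line(const std::string &path_a,
                                                const std::string &path_b) {
  std::ifstream a(path_a), b(path_b);
  std::string line_a, line_b;
  for (std::size_t n = 1;; ++n) {
    const bool has_a = static_cast<bool>(std::getline(a, line_a));
    const bool has_b = static_cast<bool>(std::getline(b, line_b));
    if (!has_a && !has_b)
      return std::nullopt; // both files ended together: no difference found
    if (has_a != has_b || line_a != line_b)
      return n; // one file ended early, or the lines differ
  }
}
```

<p class="MsoNormal">Calling it on the two textual dumps of one translation unit (the .txt1 and .txt2 files written above) would give the first line at which they diverge, which should make it easier to see which declarations are affected.<o:p></o:p></p>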
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Any thoughts on this?<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Thanks<o:p></o:p></p>
</div>
</body>
</html>