[cfe-dev] AST Serialization

Endre Fülöp via cfe-dev cfe-dev at lists.llvm.org
Thu Feb 20 01:35:21 PST 2020


I have a question about the assumptions behind, and the correct usage of, AST serialization for C and C++ sources.

I have done the following:

1) I implemented a ClangTool that builds ASTs from a compilation database.

2) I dumped the contents of the ASTs in both textual and binary formats.

3) I then read the serialized binary back in and dumped it again in both formats.

What I noticed is that the dumps of the different generations differ in size (by up to an order of magnitude). The textual dumps differ as well.
I would have expected a serialization/deserialization round trip to produce an AST identical to the original.
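One source of textual differences that is expected regardless of serialization: the AST dumper prints each node's in-memory address (e.g. `FunctionDecl 0x55d1c2a3b4c8 <...>`), so two dumps of the same AST taken in different processes, or from a freshly loaded ASTUnit, will not match byte for byte. A minimal sketch, assuming only the C++17 standard library, that masks such hex tokens before comparing two dump files; the helper names are hypothetical and not part of Clang:

```cpp
#include <fstream>
#include <optional>
#include <regex>
#include <sstream>
#include <string>

// Replace every hexadecimal token such as "0x55d1c2a3b4c8" with "0xADDR".
// This also masks any other "0x..." text in the dump, which is acceptable
// for a rough comparison.
std::string mask_addresses(const std::string &dump) {
  static const std::regex Addr("0x[0-9a-fA-F]+");
  return std::regex_replace(dump, Addr, "0xADDR");
}

// Read a whole file; std::nullopt if it cannot be opened.
std::optional<std::string> read_file(const std::string &path) {
  std::ifstream in(path, std::ios::binary);
  if (!in)
    return std::nullopt;
  std::ostringstream ss;
  ss << in.rdbuf();
  return ss.str();
}

// True if both files can be read and are identical after masking addresses.
bool dumps_equal_modulo_addresses(const std::string &path1,
                                  const std::string &path2) {
  auto a = read_file(path1);
  auto b = read_file(path2);
  return a && b && mask_addresses(*a) == mask_addresses(*b);
}
```

Running this on a .txt1/.txt2 pair separates the address noise from real structural differences; whatever still differs after masking (for example nodes present in only one generation) is the part worth investigating.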

Maybe I have done it the wrong way. The following outline gives the gist of the method I used:

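void textual_dump_to_file(const ASTUnit &Unit, StringRef FilePath) {
  using namespace llvm::sys::fs;
  using namespace llvm::sys::path;

  // mkdir -p on the parent directory of the output file
  StringRef Dir = parent_path(FilePath);
  if (!Dir.empty() && create_directories(Dir))
    return;

  std::error_code EC;
  llvm::raw_fd_ostream Out{FilePath, EC};
  if (EC)
    return;
  // Deserialize=true makes the dumper also walk declarations that have not
  // been deserialized yet (relevant for an ASTUnit loaded from a file).
  Unit.getASTContext().getTranslationUnitDecl()->dump(Out, /*Deserialize=*/true);
}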

void experiment_with_unit(CompilerInstance &CI, ASTUnit &Unit,
                          StringRef MethodPrefix, StringRef SourcePath) {
  using namespace llvm::sys::fs;
  using namespace llvm::sys::path;

  IntrusiveRefCntPtr<DiagnosticOptions> DiagOpts = new DiagnosticOptions();
  // The DiagnosticsEngine below takes ownership of this printer.
  auto *DiagClient = new TextDiagnosticPrinter(llvm::errs(), &*DiagOpts);
  IntrusiveRefCntPtr<DiagnosticIDs> DiagID(new DiagnosticIDs());
  IntrusiveRefCntPtr<DiagnosticsEngine> Diags(
      new DiagnosticsEngine(DiagID, &*DiagOpts, DiagClient));

  llvm::SmallString<256> TextDumpPath{MethodPrefix};
  llvm::SmallString<256> BinaryDumpPath{TextDumpPath};

  // Generation 1: the AST built from source.
  replace_extension(TextDumpPath, ".txt1");
  replace_extension(BinaryDumpPath, ".bin1");

  textual_dump_to_file(Unit, TextDumpPath);
  if (Unit.Save(BinaryDumpPath)) // Save() returns true on failure
    return;

  // Generation 2: the AST read back from the .bin1 file.
  auto Dump1Loaded = ASTUnit::LoadFromASTFile(
      std::string(BinaryDumpPath), CI.getPCHContainerOperations()->getRawReader(),
      ASTUnit::LoadEverything, Diags, CI.getFileSystemOpts());
  if (!Dump1Loaded)
    return;

  replace_extension(TextDumpPath, ".txt2");
  replace_extension(BinaryDumpPath, ".bin2");

  textual_dump_to_file(*Dump1Loaded, TextDumpPath);
  Dump1Loaded->Save(BinaryDumpPath);
}

The .txt1 and .txt2 files differ, and so do .bin1 and .bin2.
I would think that if the AST did not survive a serialization round trip unchanged, this would also affect modules and the analyzer.
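To go beyond "the binary files differ", a small standalone helper can report both file sizes and the offset of the first differing byte. A minimal sketch, assuming only the C++17 standard library; `first_mismatch`, `compare_binary` and `BinaryDiff` are hypothetical names, not Clang APIs:

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <optional>
#include <string>
#include <vector>

// Offset of the first byte where a and b differ; std::nullopt if identical.
// If one buffer is a prefix of the other, the offset is the shorter length.
std::optional<std::size_t> first_mismatch(const std::vector<char> &a,
                                          const std::vector<char> &b) {
  auto [ia, ib] = std::mismatch(a.begin(), a.end(), b.begin(), b.end());
  if (ia == a.end() && ib == b.end())
    return std::nullopt;
  return static_cast<std::size_t>(ia - a.begin());
}

// Read a whole file as raw bytes; std::nullopt if it cannot be opened.
std::optional<std::vector<char>> read_bytes(const std::string &path) {
  std::ifstream in(path, std::ios::binary);
  if (!in)
    return std::nullopt;
  return std::vector<char>(std::istreambuf_iterator<char>(in),
                           std::istreambuf_iterator<char>());
}

struct BinaryDiff {
  std::size_t size1 = 0;
  std::size_t size2 = 0;
  std::optional<std::size_t> mismatch_offset; // empty if the files are identical
};

// Compare two serialized AST files (e.g. .bin1 and .bin2) byte by byte.
std::optional<BinaryDiff> compare_binary(const std::string &path1,
                                         const std::string &path2) {
  auto a = read_bytes(path1);
  auto b = read_bytes(path2);
  if (!a || !b)
    return std::nullopt;
  return BinaryDiff{a->size(), b->size(), first_mismatch(*a, *b)};
}
```

The size ratio quantifies the "up to an order of magnitude" growth, and whether the first mismatch is near the start of the file or deep inside it gives a first hint about where to look.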

Any thoughts on this?

