[PATCH] D60910: [WIP] Dumping the AST to JSON

Sat Apr 20 09:06:51 PDT 2019

aaron.ballman marked 2 inline comments as done.
aaron.ballman added a comment.

In D60910#1473441 <https://reviews.llvm.org/D60910#1473441>, @riccibruno wrote:

> A few comments/questions:
>
> 1. How stable is the format going to be, and how much state is going to be exposed ?

I don't expect it to be particularly stable (I'm not suggesting we conform to some particular schema) and for my own needs, I'm happy enough if new Clang releases break us due to new, deleted, or renamed information. I need to expose as much state as required for our research team to import the data and reason about the AST; I'd guess it's going to be roughly the same as the textual dumper, but different in spots.

> 2. I am wondering if it would be possible to separate the logic about how to visit a given AST node, from the logic about what to do with the node's state. For example it seems to me that `JSONNodeDumper::VisitVarDecl` and `TextNodeDumper::VisitVarDecl` are somewhat similar. You could instead imagine having a generic way to visit each node, which would then inform `JSONNodeDumper` or `TextNodeDumper` about the various bits of state in the node. With the proposed implementation essentially all of the logic for what is in each node has to be duplicated. This is not an objection about the current patch, just something I am wondering about.

This was something I had been pondering when we were doing the great AST dump refactoring earlier this year, but I didn't think we could get it to be sufficiently general to suit various purposes. On the one hand, there's definitely a ton of overlap between any two "dump the AST in this format" approaches, but on the other hand, different approaches have different requirements that may change what data is displayed. For instance, the textual dump will likely write out more information than the JSON dump because the textual dump is purely for human readability (so having extra information near the node may be critical) whereas the JSON dump is for machine readability (so having extra information may or may not be prohibitive, but hooking up disparate data within the file may be trivial).

================
Comment at: include/clang/AST/JSONNodeDumper.h:164
+  std::string createAccessSpecifier(AccessSpecifier AS);
+
+  void writePreviousDeclImpl(...) {}
----------------
riccibruno wrote:
> @lebedev.ri might want to comment on the use of `llvm::json`. I think that there has been issues in the recent past where it was unexpectedly slow (although it might not matter for this use case).
I'm currently only using the JSON functionality for wholly-contained "leaf" nodes, so I don't think there will be a strong performance concern here from using it. However, there's no strong requirement that I continue to use it (I just thought it was a bit cleaner), so I could also replace it with the streaming functionality I use elsewhere.

================
Comment at: lib/AST/ASTDumper.cpp:231
+
+  if (ADOF_JSON == Format) {
+    JSONDumper P(OS, SM, Ctx.getPrintingPolicy());
----------------
riccibruno wrote:
> (Ah, Yoda conditionals ! (I am not saying to change it))
> 
> Do you plan to do the same modification to the others `dump` methods ?
Yes, I do (though probably not all of them, like `dumpColor()`)

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60910/new/

https://reviews.llvm.org/D60910