[cfe-dev] RFC: abstract serialization

Wed Sep 18 15:50:26 PDT 2019

Swift’s AST is largely self-contained, but it occasionally needs to 
refer to entities from Clang’s AST.  Up until now, we’ve only needed 
to embed the occasional `clang::Decl*`, but we’ve recently found a 
reason why it’d be useful to embed a `clang::Type*`.  That creates a 
problem for us, because while we know how to serialize a reference to an 
external Clang declaration (or at least a subset of them), we don’t 
have a way to serialize a reference to an external Clang *type*.  Now, 
obviously we could reproduce the structure of that Clang type in our 
serialization and deserialization code, but the reason we want to use 
Clang’s AST in the first place is that C types can have a surprising 
amount of structure; for example, function types can have calling 
conventions, `regparm` attributes, ARC parameter conventions, and all 
sorts of other things that have been added over the years by various 
extensions.  Including all of that structure, across the entire AST, 
would be a significant ongoing maintenance burden.  Therefore, we’d 
rather find some way to take advantage of Clang’s own serialization 
logic.

At the same time, Clang has a longstanding problem with debugging dumps.  
We have several different debugging-dump formats, and they’re all 
pretty much destined to be incomplete because anybody augmenting the AST 
has to remember to include the new information in all the dumping code.  
Exhaustiveness checking lets us verify that we haven’t forgotten an 
entire node class, but it doesn’t tell us whether we’ve forgotten a 
field of that class.  We only have one piece of code that *has* to get 
that information right, and that’s the serialization logic.

I’d like to propose solving both of these problems in one pass by 
introducing a new level of abstraction into the serializer and 
deserializer.  The basic idea is that we’d write the node-specific 
serialization and deserialization code as if it were generating and 
consuming some simple JSON-like structured format; it would be templated 
to make calls against some abstract physical serialization layer.

That is, for code today that looks like this:

```
void ASTTypeWriter::VisitVariableArrayType(const VariableArrayType *T) {
   VisitArrayType(T);
   Record.AddSourceLocation(T->getLBracketLoc());
   Record.AddSourceLocation(T->getRBracketLoc());
   Record.AddStmt(T->getSizeExpr());
   Code = TYPE_VARIABLE_ARRAY;
}
```

We’d instead write something more like:

```
template <class Serializer>
void AbstractTypeWriter<Serializer>::VisitVariableArrayType(
    const VariableArrayType *T) {
   VisitArrayType(T);
   S.addSourceLocation(TYPE_VARIABLE_ARRAY_LBRACKET_LOC,
                       T->getLBracketLoc());
   S.addSourceLocation(TYPE_VARIABLE_ARRAY_RBRACKET_LOC,
                       T->getRBracketLoc());
   S.addStmt(TYPE_VARIABLE_ARRAY_SIZE_EXPR, T->getSizeExpr());
   S.setNodeKind(TYPE_VARIABLE_ARRAY);
}
```
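To make the pattern concrete outside of Clang, here is a minimal, self-contained sketch. Everything in it is hypothetical: `ToyVariableArrayType` stands in for `clang::VariableArrayType` (source locations are plain integers, the size expression is just a string, and the `VisitArrayType` call is omitted), and the two serializers are toy physical layers. The only point is that one templated visitor can feed both a flat, key-less record and a keyed debugging dump.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical field keys and node kind; names loosely mirror the RFC.
enum FieldKey { LBRACKET_LOC, RBRACKET_LOC, SIZE_EXPR };
enum NodeKind { TYPE_VARIABLE_ARRAY = 1 };

// Stand-in for clang::VariableArrayType.
struct ToyVariableArrayType {
  unsigned LBracketLoc, RBracketLoc;
  std::string SizeExpr;
};

// Node-specific logic, written once against any Serializer.
template <class Serializer> struct AbstractTypeWriter {
  Serializer &S;
  void visitVariableArrayType(const ToyVariableArrayType &T) {
    S.addSourceLocation(LBRACKET_LOC, T.LBracketLoc);
    S.addSourceLocation(RBRACKET_LOC, T.RBracketLoc);
    S.addStmt(SIZE_EXPR, T.SizeExpr);
    S.setNodeKind(TYPE_VARIABLE_ARRAY);
  }
};

// Physical layer 1: a flat record that ignores the keys, like today's path.
struct FlatRecordSerializer {
  std::vector<std::uint64_t> Record;
  std::vector<std::string> Stmts;
  unsigned Code = 0;
  void addSourceLocation(FieldKey, unsigned Loc) { Record.push_back(Loc); }
  void addStmt(FieldKey, const std::string &E) { Stmts.push_back(E); }
  void setNodeKind(NodeKind K) { Code = K; }
};

// Physical layer 2: a debugging dump that prints each key with its value.
struct DumpSerializer {
  std::string Out;
  static const char *name(FieldKey K) {
    switch (K) {
    case LBRACKET_LOC: return "lbracket_loc";
    case RBRACKET_LOC: return "rbracket_loc";
    case SIZE_EXPR: return "size_expr";
    }
    return "?";
  }
  void addSourceLocation(FieldKey K, unsigned Loc) {
    Out += std::string(name(K)) + "=" + std::to_string(Loc) + " ";
  }
  void addStmt(FieldKey K, const std::string &E) {
    Out += std::string(name(K)) + "=" + E + " ";
  }
  void setNodeKind(NodeKind K) {
    Out += "kind=" + std::to_string(static_cast<int>(K));
  }
};
```

Swapping the physical layer changes only the template argument (`AbstractTypeWriter<FlatRecordSerializer>` vs. `AbstractTypeWriter<DumpSerializer>`); the per-node code is shared, so a field added there shows up in both outputs.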

And the `Serializer` type would be expected to implement a dozen or so 
of these `addFoo` methods, one for each kind of entry: bool, int, 
string, begin/end array, begin/end substructure, SourceLocation, types, 
sub-statements, declaration references, and maybe some cases I’m 
forgetting.
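As an illustration of the structural part of that interface, here is a minimal sketch of a JSON-like dumping serializer covering bool, int, string, arrays, and substructures. It is hypothetical: the method names, the string-valued keys, and the convention of passing `nullptr` as the key for array elements are assumptions for illustration, strings are not escaped, and a real serializer would also need the Clang-specific entries (source locations, types, statements, declaration references).

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical JSON-like physical serializer; keys are plain strings here.
class JSONDumpSerializer {
  std::string Out;
  std::vector<bool> NeedComma; // one entry per open array/object

  // Emit a separating comma if needed, then the key (if any).
  void key(const char *K) {
    if (!NeedComma.empty()) {
      if (NeedComma.back())
        Out += ",";
      NeedComma.back() = true;
    }
    if (K) {
      Out += "\"";
      Out += K;
      Out += "\":";
    }
  }

public:
  void addBool(const char *K, bool V) { key(K); Out += V ? "true" : "false"; }
  void addInt(const char *K, std::int64_t V) { key(K); Out += std::to_string(V); }
  void addString(const char *K, const std::string &V) {
    key(K);
    Out += "\"" + V + "\""; // no escaping: sketch only
  }
  void beginStruct(const char *K) { key(K); Out += "{"; NeedComma.push_back(false); }
  void endStruct() { NeedComma.pop_back(); Out += "}"; }
  void beginArray(const char *K) { key(K); Out += "["; NeedComma.push_back(false); }
  void endArray() { NeedComma.pop_back(); Out += "]"; }
  const std::string &str() const { return Out; }
};
```

A dumper like this keeps the keys and nesting, whereas the main serialization path could implement the same methods by appending to a flat record and ignoring the keys.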

On the deserialization side, we would promise to make deserialization 
calls in the same order that we make serialization calls so that we can 
continue to use a flat representation in our main serialization path.
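A minimal sketch of that ordering contract, using hypothetical `FlatWriter`/`FlatReader` types and a toy `ToyArrayType` node: the flat format drops the keys entirely, so the reader is correct only because `readArrayType` makes its calls in exactly the order `writeArrayType` made them. The `assert` merely catches reading past the end in debug builds; it is not real failure handling.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical field keys; the flat format never stores them.
enum FieldKey { LBRACKET_LOC, RBRACKET_LOC, ELEMENT_COUNT };

struct ToyArrayType {
  unsigned LBracketLoc, RBracketLoc;
  std::uint64_t Count;
};

// Writer: appends values to a flat record, dropping the keys.
struct FlatWriter {
  std::vector<std::uint64_t> Record;
  void addSourceLocation(FieldKey, unsigned L) { Record.push_back(L); }
  void addInt(FieldKey, std::uint64_t V) { Record.push_back(V); }
};

// Reader: consumes the record strictly in order; the keys are ignored.
struct FlatReader {
  const std::vector<std::uint64_t> &Record;
  std::size_t Idx = 0;
  explicit FlatReader(const std::vector<std::uint64_t> &R) : Record(R) {}
  std::uint64_t next() {
    assert(Idx < Record.size()); // debug-only over-read check
    return Record[Idx++];
  }
  unsigned readSourceLocation(FieldKey) { return static_cast<unsigned>(next()); }
  std::uint64_t readInt(FieldKey) { return next(); }
};

template <class Writer> void writeArrayType(Writer &S, const ToyArrayType &T) {
  S.addSourceLocation(LBRACKET_LOC, T.LBracketLoc);
  S.addSourceLocation(RBRACKET_LOC, T.RBracketLoc);
  S.addInt(ELEMENT_COUNT, T.Count);
}

// Must make the same calls, in the same order, as writeArrayType.
template <class Reader> ToyArrayType readArrayType(Reader &S) {
  ToyArrayType T;
  T.LBracketLoc = S.readSourceLocation(LBRACKET_LOC);
  T.RBracketLoc = S.readSourceLocation(RBRACKET_LOC);
  T.Count = S.readInt(ELEMENT_COUNT);
  return T;
}
```

Because the keys are still passed on the reading side, a different `Reader` (say, one parsing a keyed textual format) could use them to look fields up, while the flat reader simply discards them.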

The current deserialization code does not actually check for failure in 
deserializing components, and I would probably continue that for now.

I haven’t thought very carefully about what the key arguments (like 
`TYPE_VARIABLE_ARRAY_LBRACKET_LOC` above) would be.  They could be 
strings, but an enum might allow clever metaprograms.  Maybe some of 
this could be tblgen’ed.

Thoughts?

John.