[cfe-dev] RFC: abstract serialization
John McCall via cfe-dev
cfe-dev at lists.llvm.org
Wed Sep 18 15:50:26 PDT 2019
Swift’s AST is largely self-contained, but it occasionally needs to
refer to entities from Clang’s AST. Up until now, we’ve only needed
to embed the occasional `clang::Decl*`, but we’ve recently found a
reason why it’d be useful to embed a `clang::Type*`. That creates a
problem for us, because while we know how to serialize a reference to an
external Clang declaration (or at least a subset of them), we don’t
have a way to serialize a reference to an external Clang *type*. Now,
obviously we could reproduce the structure of that Clang type in our
serialization and deserialization code, but the reason we want to use
Clang’s AST in the first place is that C types can have a surprising
amount of structure; for example, function types can have calling
conventions, `regparm` attributes, ARC parameter conventions, and all
sorts of other things that have been added over the years by various
extensions. Including all of that structure, across the entire AST,
would be a significant ongoing maintenance burden. Therefore, we’d
rather find some way to take advantage of Clang’s own serialization
logic.
At the same time, Clang has a longstanding problem with debugging dumps.
We have several different debugging-dump formats, and they’re all
pretty much destined to be incomplete because anybody augmenting the AST
has to remember to include the new information in all the dumping code.
Exhaustiveness checking lets us verify that we haven’t forgotten an
entire node class, but it doesn’t tell us whether we’ve forgotten a
field of that class. We only have one piece of code that *has* to get
that information right, and that’s the serialization logic.
I’d like to propose solving both of these problems in one pass by
introducing a new level of abstraction into the serializer and
deserializer. The basic idea is that we’d write the node-specific
serialization and deserialization code as if it were generating and
consuming some simple JSON-like structured format; it would be templated
to make calls against some abstract physical serialization layer.
That is, for code today that looks like this:
```
void ASTTypeWriter::VisitVariableArrayType(const VariableArrayType *T) {
  VisitArrayType(T);
  // Fields are appended positionally to a flat record; nothing names them.
  Record.AddSourceLocation(T->getLBracketLoc());
  Record.AddSourceLocation(T->getRBracketLoc());
  Record.AddStmt(T->getSizeExpr());
  Code = TYPE_VARIABLE_ARRAY;
}
```
We’d instead write something more like:
```
template <class Serializer>
void AbstractTypeWriter<Serializer>::VisitVariableArrayType(
    const VariableArrayType *T) {
  VisitArrayType(T);
  // Each field is tagged with a key so a structured serializer can name it.
  S.addSourceLocation(TYPE_VARIABLE_ARRAY_LBRACKET_LOC, T->getLBracketLoc());
  S.addSourceLocation(TYPE_VARIABLE_ARRAY_RBRACKET_LOC, T->getRBracketLoc());
  S.addStmt(TYPE_VARIABLE_ARRAY_SIZE_EXPR, T->getSizeExpr());
  S.setNodeKind(TYPE_VARIABLE_ARRAY);
}
```
And the `Serializer` type would be expected to implement a dozen or so
of these `addFoo` methods: bool, int, string, begin/end array, begin/end
substructure, SourceLocation, types, sub-statements, declaration
references, maybe some cases I’m forgetting.
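To make the split concrete, here is a minimal, self-contained sketch under stated assumptions: `MockVariableArrayType`, `FieldKey`, `RecordSerializer`, and `DumpSerializer` are hypothetical stand-ins (not Clang APIs), and source locations and statements are reduced to plain integers. The node-specific function is written once; one physical layer ignores the keys and produces a flat record, while the other uses them to produce a named debugging dump.

```
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical field keys, standing in for the TYPE_VARIABLE_ARRAY_* tags.
enum class FieldKey { LBracketLoc, RBracketLoc, SizeExpr };

const char *keyName(FieldKey K) {
  switch (K) {
  case FieldKey::LBracketLoc: return "lbracket_loc";
  case FieldKey::RBracketLoc: return "rbracket_loc";
  case FieldKey::SizeExpr: return "size_expr";
  }
  return "?";
}

// Hypothetical stand-in for clang::VariableArrayType; locations and the
// size expression are plain integers purely for illustration.
struct MockVariableArrayType {
  unsigned LBracketLoc, RBracketLoc;
  unsigned SizeExprID;
};

// Node-specific logic, written once against any Serializer.
template <class Serializer>
void writeVariableArrayType(Serializer &S, const MockVariableArrayType &T) {
  S.addSourceLocation(FieldKey::LBracketLoc, T.LBracketLoc);
  S.addSourceLocation(FieldKey::RBracketLoc, T.RBracketLoc);
  S.addStmt(FieldKey::SizeExpr, T.SizeExprID);
  S.setNodeKind("TYPE_VARIABLE_ARRAY");
}

// Flat physical layer: ignores keys and appends values in call order.
struct RecordSerializer {
  std::vector<std::uint64_t> Record;
  std::string Kind;
  void addSourceLocation(FieldKey, unsigned Loc) { Record.push_back(Loc); }
  void addStmt(FieldKey, unsigned StmtID) { Record.push_back(StmtID); }
  void setNodeKind(const char *K) { Kind = K; }
};

// Debug-dump physical layer: prints every field by its key name.
struct DumpSerializer {
  std::ostringstream OS;
  void addSourceLocation(FieldKey K, unsigned Loc) {
    OS << keyName(K) << ": loc " << Loc << "\n";
  }
  void addStmt(FieldKey K, unsigned StmtID) {
    OS << keyName(K) << ": stmt #" << StmtID << "\n";
  }
  void setNodeKind(const char *K) { OS << "kind: " << K << "\n"; }
};
```

Calling `writeVariableArrayType` with a `RecordSerializer` yields the record `{10, 20, 7}` for `{10, 20, 7}` input, and the same call with a `DumpSerializer` lists all three fields by name, so a field added to the writer cannot be silently missing from the dump.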
On the deserialization side, we would promise to make deserialization
calls in the same order that we make serialization calls so that we can
continue to use a flat representation in our main serialization path.
The current deserialization code does not actually check for failure in
deserializing components, and I would probably keep it that way for now.
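A rough sketch of the ordering contract, assuming the same kind of hypothetical mock types as a stand-in for Clang's AST (none of these names are real Clang APIs): the reader ignores keys just as the flat writer does, so correctness depends entirely on issuing reads in the same order as the writes. Mirroring the existing code, it reports no failure; the only guard is a debug-mode `assert` on overrun.

```
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical field keys and node type, as in a flat-record setting.
enum class FieldKey { LBracketLoc, RBracketLoc, SizeExpr };

struct MockVariableArrayType {
  unsigned LBracketLoc = 0, RBracketLoc = 0;
  unsigned SizeExprID = 0;
};

// Flat writer: keys are ignored, values appended in call order.
struct RecordSerializer {
  std::vector<std::uint64_t> Record;
  void addSourceLocation(FieldKey, unsigned V) { Record.push_back(V); }
  void addStmt(FieldKey, unsigned V) { Record.push_back(V); }
};

// Flat reader: keys are ignored too; values come back in call order.
struct RecordDeserializer {
  const std::vector<std::uint64_t> &Record;
  std::size_t Idx = 0;
  explicit RecordDeserializer(const std::vector<std::uint64_t> &R)
      : Record(R) {}
  std::uint64_t next() {
    assert(Idx < Record.size() && "read past end of record");
    return Record[Idx++];
  }
  unsigned readSourceLocation(FieldKey) { return static_cast<unsigned>(next()); }
  unsigned readStmt(FieldKey) { return static_cast<unsigned>(next()); }
};

template <class Serializer>
void writeVariableArrayType(Serializer &S, const MockVariableArrayType &T) {
  S.addSourceLocation(FieldKey::LBracketLoc, T.LBracketLoc);
  S.addSourceLocation(FieldKey::RBracketLoc, T.RBracketLoc);
  S.addStmt(FieldKey::SizeExpr, T.SizeExprID);
}

// Must issue reads in exactly the order writeVariableArrayType writes.
template <class Deserializer>
MockVariableArrayType readVariableArrayType(Deserializer &D) {
  MockVariableArrayType T;
  T.LBracketLoc = D.readSourceLocation(FieldKey::LBracketLoc);
  T.RBracketLoc = D.readSourceLocation(FieldKey::RBracketLoc);
  T.SizeExprID = D.readStmt(FieldKey::SizeExpr);
  return T;
}
```

Because the reader also receives the keys, a checking deserializer could later compare them against a recorded key sequence without changing the node-specific code.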
I haven’t thought very carefully about what these attribute-key arguments
(like `TYPE_VARIABLE_ARRAY_LBRACKET_LOC` above) would be. They could be
strings, but an enum might allow clever metaprograms. Maybe some of this
could be tblgen’ed.
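One hedged illustration of the enum option: generating the key enum and its printable names from a single list keeps them in sync by construction. An X-macro stands in for TableGen here only to keep the sketch self-contained; `VARIABLE_ARRAY_FIELDS`, `FieldNames`, and `fieldName` are hypothetical names, and only the `TYPE_VARIABLE_ARRAY_*` tags come from the example above.

```
#include <cassert>
#include <cstring>

// Single source of truth for one node's field keys (hypothetical).
// A TableGen backend could emit equivalent tables from a .td description.
#define VARIABLE_ARRAY_FIELDS(X)                                               \
  X(TYPE_VARIABLE_ARRAY_LBRACKET_LOC, "lbracket_loc")                          \
  X(TYPE_VARIABLE_ARRAY_RBRACKET_LOC, "rbracket_loc")                          \
  X(TYPE_VARIABLE_ARRAY_SIZE_EXPR, "size_expr")

// Expansion 1: the key enum used by the abstract writer and reader.
enum FieldKey {
#define AS_ENUM(Name, Str) Name,
  VARIABLE_ARRAY_FIELDS(AS_ENUM)
#undef AS_ENUM
  NUM_FIELD_KEYS
};

// Expansion 2: printable names for a dumping serializer.
constexpr const char *FieldNames[] = {
#define AS_NAME(Name, Str) Str,
    VARIABLE_ARRAY_FIELDS(AS_NAME)
#undef AS_NAME
};

// Compile-time check that the two generated tables agree in size.
static_assert(sizeof(FieldNames) / sizeof(FieldNames[0]) == NUM_FIELD_KEYS,
              "name table out of sync with FieldKey");

const char *fieldName(FieldKey K) { return FieldNames[K]; }
```

With string keys the same names would be available directly, but an enum lets tables like these (or per-node field counts) be checked and indexed at compile time.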
Thoughts?
John.