[cfe-dev] RFC: abstract serialization

Aaron Ballman via cfe-dev cfe-dev at lists.llvm.org
Thu Sep 19 06:20:03 PDT 2019


On Wed, Sep 18, 2019 at 6:50 PM John McCall via cfe-dev
<cfe-dev at lists.llvm.org> wrote:
>
> Swift’s AST is largely self-contained, but it occasionally needs to refer to entities from Clang’s AST. Up until now, we’ve only needed to embed the occasional clang::Decl*, but we’ve recently found a reason why it’d be useful to embed a clang::Type*. That creates a problem for us, because while we know how to serialize a reference to an external Clang declaration (or at least a subset of them), we don’t have a way to serialize a reference to an external Clang type. Now, obviously we could reproduce the structure of that Clang type in our serialization and deserialization code, but the reason we want to use Clang’s AST in the first place is that C types can have a surprising amount of structure; for example, function types can have calling conventions, regparm attributes, ARC parameter conventions, and all sorts of other things that have been added over the years by various extensions. Including all of that structure, across the entire AST, would be a significant ongoing maintenance burden. Therefore, we’d rather find some way to take advantage of Clang’s own serialization logic.
>
> At the same time, Clang has a longstanding problem with debugging dumps. We have several different debugging-dump formats, and they’re all pretty much destined to be incomplete because anybody augmenting the AST has to remember to include the new information in all the dumping code. Exhaustiveness checking lets us verify that we haven’t forgotten an entire node class, but it doesn’t tell us whether we’ve forgotten a field of that class. We only have one piece of code that has to get that information right, and that’s the serialization logic.
>
> I’d like to propose solving both of these problems in one pass by introducing a new level of abstraction into the serializer and deserializer. The basic idea is that we’d write the node-specific serialization and deserialization code as if it were generating and consuming some simple JSON-like structured format; it would be templated to make calls against some abstract physical serialization layer.
>
> That is, for code today that looks like this:
>
> void ASTTypeWriter::VisitVariableArrayType(const VariableArrayType *T) {
>   VisitArrayType(T);
>   Record.AddSourceLocation(T->getLBracketLoc());
>   Record.AddSourceLocation(T->getRBracketLoc());
>   Record.AddStmt(T->getSizeExpr());
>   Code = TYPE_VARIABLE_ARRAY;
> }
>
> We’d instead write something more like:
>
> template <class Serializer>
> void AbstractTypeWriter<Serializer>::VisitVariableArrayType(const VariableArrayType *T) {
>   VisitArrayType(T);
>   S.addSourceLocation(TYPE_VARIABLE_ARRAY_LBRACKET_LOC, T->getLBracketLoc());
>   S.addSourceLocation(TYPE_VARIABLE_ARRAY_RBRACKET_LOC, T->getRBracketLoc());
>   S.addStmt(TYPE_VARIABLE_ARRAY_SIZE_EXPR, T->getSizeExpr());
>   S.setNodeKind(TYPE_VARIABLE_ARRAY);
> }
>
> And the Serializer type would be expected to implement a dozen or so of these addFoo methods: bool, int, string, begin/end array, begin/end substructure, SourceLocation, types, sub-statements, declaration references, maybe some cases I’m forgetting.
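
A minimal standalone sketch of the idea, assuming a cut-down stand-in for VariableArrayType (MiniVariableArrayType, hypothetical, not Clang's class) and two hypothetical physical layers, RecordSerializer and DumpSerializer. The node-specific code is written once as a template; one layer ignores the keys and produces a flat record, the other uses the keys to produce a debugging dump. This does not use any real Clang API.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical field keys and node kind; the RFC leaves their exact form open.
enum FieldKey {
  TYPE_VARIABLE_ARRAY_LBRACKET_LOC,
  TYPE_VARIABLE_ARRAY_RBRACKET_LOC,
  TYPE_VARIABLE_ARRAY_SIZE_EXPR
};
enum NodeKind { TYPE_VARIABLE_ARRAY = 7 };

// Stand-ins for clang::SourceLocation and a VariableArrayType node.
struct SourceLocation { uint32_t Raw; };
struct MiniVariableArrayType {
  SourceLocation LBracket, RBracket;
  uint32_t SizeExprID; // stand-in for a Stmt* reference
};

// Physical layer 1: a flat record, like today's bitcode path; keys are ignored.
struct RecordSerializer {
  std::vector<uint64_t> Record;
  uint64_t Kind = 0;
  void addSourceLocation(FieldKey, SourceLocation L) { Record.push_back(L.Raw); }
  void addStmt(FieldKey, uint32_t ID) { Record.push_back(ID); }
  void setNodeKind(NodeKind K) {
    assert(Kind == 0 && "node kind set twice");
    Kind = K;
  }
};

// Physical layer 2: a keyed debugging dump driven by the very same calls.
struct DumpSerializer {
  std::ostringstream OS;
  static const char *name(FieldKey K) {
    switch (K) {
    case TYPE_VARIABLE_ARRAY_LBRACKET_LOC: return "lbracket";
    case TYPE_VARIABLE_ARRAY_RBRACKET_LOC: return "rbracket";
    case TYPE_VARIABLE_ARRAY_SIZE_EXPR: return "size";
    }
    return "?"; // unreachable for valid keys
  }
  void addSourceLocation(FieldKey K, SourceLocation L) {
    OS << name(K) << "=" << L.Raw << " ";
  }
  void addStmt(FieldKey K, uint32_t ID) { OS << name(K) << "=stmt#" << ID << " "; }
  void setNodeKind(NodeKind K) { OS << "kind=" << K; }
};

// Node-specific logic, written once against the abstract interface.
template <class Serializer>
void writeVariableArrayType(Serializer &S, const MiniVariableArrayType &T) {
  S.addSourceLocation(TYPE_VARIABLE_ARRAY_LBRACKET_LOC, T.LBracket);
  S.addSourceLocation(TYPE_VARIABLE_ARRAY_RBRACKET_LOC, T.RBracket);
  S.addStmt(TYPE_VARIABLE_ARRAY_SIZE_EXPR, T.SizeExprID);
  S.setNodeKind(TYPE_VARIABLE_ARRAY);
}
```

For a node built as MiniVariableArrayType{{10}, {14}, 3}, the record layer ends up holding {10, 14, 3}, and the dump layer produces "lbracket=10 rbracket=14 size=stmt#3 kind=7". Because both layers see exactly the same calls, a field added to the writer cannot be silently missing from the dump.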
>
> On the deserialization side, we would promise to make deserialization calls in the same order that we make serialization calls so that we can continue to use a flat representation in our main serialization path.
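
A sketch of the reading side under the same assumptions (MiniVariableArrayType, RecordDeserializer, and readVariableArrayType are hypothetical names). Because the flat layer discards keys, correctness depends entirely on the reader issuing its calls in the same order the writer issued its adds. Like the current deserializer, it does not report failure; it only has a debug-build bounds check.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// The same hypothetical keys the writer passes; a flat reader ignores them.
enum FieldKey {
  TYPE_VARIABLE_ARRAY_LBRACKET_LOC,
  TYPE_VARIABLE_ARRAY_RBRACKET_LOC,
  TYPE_VARIABLE_ARRAY_SIZE_EXPR
};

struct SourceLocation { uint32_t Raw; };
struct MiniVariableArrayType {
  SourceLocation LBracket, RBracket;
  uint32_t SizeExprID; // stand-in for a Stmt* reference
};

// Flat physical layer: hands values back strictly in the order they were written.
struct RecordDeserializer {
  const std::vector<uint64_t> &Record;
  std::size_t Idx = 0;
  explicit RecordDeserializer(const std::vector<uint64_t> &R) : Record(R) {}
  uint64_t next() {
    // No error reporting, mirroring today's code; only a debug-build check.
    assert(Idx < Record.size() && "read past end of record");
    return Record[Idx++];
  }
  SourceLocation readSourceLocation(FieldKey) {
    return {static_cast<uint32_t>(next())};
  }
  uint32_t readStmt(FieldKey) { return static_cast<uint32_t>(next()); }
};

// Reads must happen in exactly the order the writer made its add calls.
template <class Deserializer>
MiniVariableArrayType readVariableArrayType(Deserializer &D) {
  MiniVariableArrayType T;
  T.LBracket = D.readSourceLocation(TYPE_VARIABLE_ARRAY_LBRACKET_LOC);
  T.RBracket = D.readSourceLocation(TYPE_VARIABLE_ARRAY_RBRACKET_LOC);
  T.SizeExprID = D.readStmt(TYPE_VARIABLE_ARRAY_SIZE_EXPR);
  return T;
}
```

Each read is a separate statement on purpose: passing several reads as arguments to one function call would leave their evaluation order unspecified in C++, which would break the ordering promise.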
>
> The current deserialization code does not actually check for failure in deserializing components, and I would probably continue that for now.
>
> I haven’t thought very carefully about what these key arguments (the first argument to each addFoo call) would be. They could be strings, but an enum might allow clever metaprogramming. Maybe some of this could be tblgen’ed.
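
One way the enum option could pay off, sketched with an X-macro table (the .def-file idiom Clang already uses elsewhere) as a stand-in for whatever tblgen might emit. The table name VARIABLE_ARRAY_FIELDS, the dump strings, and the helper functions are hypothetical. A single list produces both the enum and a parallel name table, and a static_assert keeps them in sync.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical single source of truth: KEY(EnumName, "dump name").
#define VARIABLE_ARRAY_FIELDS(KEY)                          \
  KEY(TYPE_VARIABLE_ARRAY_LBRACKET_LOC, "lbracket_loc")     \
  KEY(TYPE_VARIABLE_ARRAY_RBRACKET_LOC, "rbracket_loc")     \
  KEY(TYPE_VARIABLE_ARRAY_SIZE_EXPR, "size_expr")

// Expansion 1: the enum passed to the abstract writer and reader.
enum FieldKey {
#define AS_ENUM(Name, Str) Name,
  VARIABLE_ARRAY_FIELDS(AS_ENUM)
#undef AS_ENUM
  NUM_FIELD_KEYS
};

// Expansion 2: names for a JSON-like or textual dumper.
constexpr const char *FieldKeyNames[] = {
#define AS_NAME(Name, Str) Str,
    VARIABLE_ARRAY_FIELDS(AS_NAME)
#undef AS_NAME
};

// Both expansions come from one list, so this only fails if the table is edited by hand.
static_assert(sizeof(FieldKeyNames) / sizeof(FieldKeyNames[0]) == NUM_FIELD_KEYS,
              "every field key needs a name");

inline const char *getFieldKeyName(FieldKey K) {
  assert(K < NUM_FIELD_KEYS && "invalid field key");
  return FieldKeyNames[K];
}

// Reverse lookup, e.g. for a keyed textual reader; returns NUM_FIELD_KEYS if unknown.
inline FieldKey findFieldKey(const char *Str) {
  for (int I = 0; I < NUM_FIELD_KEYS; ++I)
    if (std::strcmp(FieldKeyNames[I], Str) == 0)
      return static_cast<FieldKey>(I);
  return NUM_FIELD_KEYS;
}
```

Strings would be simpler to get started with, but an enum generated this way gives cheap integer keys to the flat serializer while still giving dumpers readable names.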
>
> Thoughts?

I think the idea has a lot of merit and is definitely worth exploring.
Thank you for bringing the idea up! My only concern: if the plan is to
implement AST dumping through this interface, be aware that both the
default and JSON dumpers have some odd quirks that may make it
difficult to get identical output through another interface.

~Aaron

>
> John.
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev


