[cfe-dev] RFC: abstract serialization
John McCall via cfe-dev
cfe-dev at lists.llvm.org
Thu Sep 19 10:29:34 PDT 2019
On 19 Sep 2019, at 9:20, Aaron Ballman wrote:
> On Wed, Sep 18, 2019 at 6:50 PM John McCall via cfe-dev
> <cfe-dev at lists.llvm.org> wrote:
>>
>> Swift’s AST is largely self-contained, but it occasionally needs to
>> refer to entities from Clang’s AST. Up until now, we’ve only
>> needed to embed the occasional clang::Decl*, but we’ve recently
>> found a reason why it’d be useful to embed a clang::Type*. That
>> creates a problem for us, because while we know how to serialize a
>> reference to an external Clang declaration (or at least a subset of
>> them), we don’t have a way to serialize a reference to an external
>> Clang type. Now, obviously we could reproduce the structure of that
>> Clang type in our serialization and deserialization code, but the
>> reason we want to use Clang’s AST in the first place is that C
>> types can have a surprising amount of structure; for example,
>> function types can have calling conventions, regparm attributes, ARC
>> parameter conventions, and all sorts of other things that have been
>> added over the years by various extensions. Including all of that
>> structure, across the entire AST, would be a significant ongoing
>> maintenance burden. Therefore, we’d rather find some way to take
>> advantage of Clang’s own serialization logic.
>>
>> At the same time, Clang has a longstanding problem with debugging
>> dumps. We have several different debugging-dump formats, and
>> they’re all pretty much destined to be incomplete because anybody
>> augmenting the AST has to remember to include the new information in
>> all the dumping code. Exhaustiveness checking lets us verify that we
>> haven’t forgotten an entire node class, but it doesn’t tell us
>> whether we’ve forgotten a field of that class. We only have one
>> piece of code that has to get that information right, and that’s
>> the serialization logic.
>>
>> I’d like to propose solving both of these problems in one pass by
>> introducing a new level of abstraction into the serializer and
>> deserializer. The basic idea is that we’d write the node-specific
>> serialization and deserialization code as if it were generating and
>> consuming some simple JSON-like structured format; it would be
>> templated to make calls against some abstract physical serialization
>> layer.
>>
>> That is, for code today that looks like this:
>>
>>     void ASTTypeWriter::VisitVariableArrayType(const VariableArrayType *T) {
>>       VisitArrayType(T);
>>       Record.AddSourceLocation(T->getLBracketLoc());
>>       Record.AddSourceLocation(T->getRBracketLoc());
>>       Record.AddStmt(T->getSizeExpr());
>>       Code = TYPE_VARIABLE_ARRAY;
>>     }
>>
>> We’d instead write something more like:
>>
>>     void AbstractTypeWriter<Serializer>::VisitVariableArrayType(
>>         const VariableArrayType *T) {
>>       VisitArrayType(T);
>>       S.addSourceLocation(TYPE_VARIABLE_ARRAY_LBRACKET_LOC,
>>                           T->getLBracketLoc());
>>       S.addSourceLocation(TYPE_VARIABLE_ARRAY_RBRACKET_LOC,
>>                           T->getRBracketLoc());
>>       S.addStmt(TYPE_VARIABLE_ARRAY_SIZE_EXPR, T->getSizeExpr());
>>       S.setNodeKind(TYPE_VARIABLE_ARRAY);
>>     }
>>
>> And the Serializer type would be expected to implement a dozen or so
>> of these addFoo methods: bool, int, string, begin/end array,
>> begin/end substructure, SourceLocation, types, sub-statements,
>> declaration references, maybe some cases I’m forgetting.
>>
>> On the deserialization side, we would promise to make deserialization
>> calls in the same order that we make serialization calls so that we
>> can continue to use a flat representation in our main serialization
>> path.
>>
>> The current deserialization code does not actually check for failure
>> in deserializing components, and I would probably continue that for
>> now.
>>
>> I haven’t thought very carefully about what these attribute
>> arguments (the field identifiers such as
>> TYPE_VARIABLE_ARRAY_SIZE_EXPR) would be. They could be strings, but
>> an enum might allow clever metaprograms. Maybe some of this could be
>> tblgen’ed.
>>
>> Thoughts?
>
> I think the idea has a lot of merit and is definitely worth exploring.
> Thank you for bringing the idea up! The only concern I have is if the
> plan is to implement AST dumping through this interface, you should be
> aware that both the default and JSON dumpers have some odd quirks that
> may make it difficult to get identical output through another
> interface.
Thank you. Do people consider the existing dumper output stable?
I certainly wouldn’t.
John.
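
To make the quoted proposal concrete, here is a minimal sketch of the
layering, assuming nothing about how it would actually land in Clang. It is
not Clang code: ToyArrayType, Field, FlatWriter, DumpWriter, FlatReader and
the helper functions are all hypothetical stand-ins invented for this
illustration. The node-specific code is written once as a template; a flat
backend ignores the field identifiers and relies purely on call order, while
a dump backend uses the same calls to print labeled fields.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for an AST node; not a real Clang class.
struct ToyArrayType {
  std::uint32_t LBracketLoc;
  std::uint32_t RBracketLoc;
  std::string SizeExpr;
};

// Hypothetical field identifiers (the "attribute arguments").
enum class Field { LBracketLoc, RBracketLoc, SizeExpr };

inline const char *fieldName(Field F) {
  switch (F) {
  case Field::LBracketLoc: return "lbracket_loc";
  case Field::RBracketLoc: return "rbracket_loc";
  case Field::SizeExpr:    return "size_expr";
  }
  return "unknown";
}

// Flat backend: drops the field identifier and appends values in call order.
struct FlatWriter {
  std::vector<std::string> Record;
  void addInt(Field, std::uint32_t V) { Record.push_back(std::to_string(V)); }
  void addString(Field, const std::string &V) { Record.push_back(V); }
};

// Dump backend: the same calls produce labeled, human-readable output.
struct DumpWriter {
  std::ostringstream OS;
  void addInt(Field F, std::uint32_t V) { OS << fieldName(F) << "=" << V << ";"; }
  void addString(Field F, const std::string &V) {
    OS << fieldName(F) << "=" << V << ";";
  }
};

// Node-specific serialization, written once against an abstract Serializer.
template <class Serializer>
void writeToyArrayType(Serializer &S, const ToyArrayType &T) {
  S.addInt(Field::LBracketLoc, T.LBracketLoc);
  S.addInt(Field::RBracketLoc, T.RBracketLoc);
  S.addString(Field::SizeExpr, T.SizeExpr);
}

// Flat reader: returns values in the order they were written.
struct FlatReader {
  const std::vector<std::string> &Record;
  std::size_t Next = 0;
  explicit FlatReader(const std::vector<std::string> &R) : Record(R) {}
  std::uint32_t readInt(Field) {
    return static_cast<std::uint32_t>(std::stoul(Record[Next++]));
  }
  std::string readString(Field) { return Record[Next++]; }
};

// Node-specific deserialization: makes its calls in the same order as the
// writer, which is what lets the flat format omit field identifiers.
template <class Deserializer>
ToyArrayType readToyArrayType(Deserializer &D) {
  ToyArrayType T;
  T.LBracketLoc = D.readInt(Field::LBracketLoc);
  T.RBracketLoc = D.readInt(Field::RBracketLoc);
  T.SizeExpr = D.readString(Field::SizeExpr);
  return T;
}

// Convenience helpers for trying the two backends.
std::string dumpToyArrayType(const ToyArrayType &T) {
  DumpWriter W;
  writeToyArrayType(W, T);
  return W.OS.str();
}

ToyArrayType roundTripToyArrayType(const ToyArrayType &T) {
  FlatWriter W;
  writeToyArrayType(W, T);
  FlatReader R(W.Record);
  return readToyArrayType(R);
}
```

Calling dumpToyArrayType on a node yields one labeled entry per field, and
roundTripToyArrayType returns a node equal to its input. Like the current
deserializer, this sketch does no error checking on read. The point of the
design is that a field added to writeToyArrayType shows up in every backend
at once, so a dumper built this way cannot silently miss it.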