[cfe-dev] RFC: abstract serialization

Thu Sep 19 10:29:34 PDT 2019

On 19 Sep 2019, at 9:20, Aaron Ballman wrote:
> On Wed, Sep 18, 2019 at 6:50 PM John McCall via cfe-dev
> <cfe-dev at lists.llvm.org> wrote:
>>
>> Swift’s AST is largely self-contained, but it occasionally needs to 
>> refer to entities from Clang’s AST. Up until now, we’ve only 
>> needed to embed the occasional clang::Decl*, but we’ve recently 
>> found a reason why it’d be useful to embed a clang::Type*. That 
>> creates a problem for us, because while we know how to serialize a 
>> reference to an external Clang declaration (or at least a subset of 
>> them), we don’t have a way to serialize a reference to an external 
>> Clang type. Now, obviously we could reproduce the structure of that 
>> Clang type in our serialization and deserialization code, but the 
>> reason we want to use Clang’s AST in the first place is that C 
>> types can have a surprising amount of structure; for example, 
>> function types can have calling conventions, regparm attributes, ARC 
>> parameter conventions, and all sorts of other things that have been 
>> added over the years by various extensions. Including all of that 
>> structure, across the entire AST, would be a significant ongoing 
>> maintenance burden. Therefore, we’d rather find some way to take 
>> advantage of Clang’s own serialization logic.
>>
>> At the same time, Clang has a longstanding problem with debugging 
>> dumps. We have several different debugging-dump formats, and 
>> they’re all pretty much destined to be incomplete because anybody 
>> augmenting the AST has to remember to include the new information in 
>> all the dumping code. Exhaustiveness checking lets us verify that we 
>> haven’t forgotten an entire node class, but it doesn’t tell us 
>> whether we’ve forgotten a field of that class. We only have one 
>> piece of code that has to get that information right, and that’s 
>> the serialization logic.
>>
>> I’d like to propose solving both of these problems in one pass by 
>> introducing a new level of abstraction into the serializer and 
>> deserializer. The basic idea is that we’d write the node-specific 
>> serialization and deserialization code as if it were generating and 
>> consuming some simple JSON-like structured format; it would be 
>> templated to make calls against some abstract physical serialization 
>> layer.
>>
>> That is, for code today that looks like this:
>>
>> void ASTTypeWriter::VisitVariableArrayType(const VariableArrayType *T) {
>>   VisitArrayType(T);
>>   Record.AddSourceLocation(T->getLBracketLoc());
>>   Record.AddSourceLocation(T->getRBracketLoc());
>>   Record.AddStmt(T->getSizeExpr());
>>   Code = TYPE_VARIABLE_ARRAY;
>> }
>>
>> We’d instead write something more like:
>>
>> template <class Serializer>
>> void AbstractTypeWriter<Serializer>::VisitVariableArrayType(
>>     const VariableArrayType *T) {
>>   VisitArrayType(T);
>>   S.addSourceLocation(TYPE_VARIABLE_ARRAY_LBRACKET_LOC, T->getLBracketLoc());
>>   S.addSourceLocation(TYPE_VARIABLE_ARRAY_RBRACKET_LOC, T->getRBracketLoc());
>>   S.addStmt(TYPE_VARIABLE_ARRAY_SIZE_EXPR, T->getSizeExpr());
>>   S.setNodeKind(TYPE_VARIABLE_ARRAY);
>> }
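A minimal, self-contained sketch of the templated writer above. It assumes toy stand-ins for the Clang types: source locations and the size expression are plain integers, and VisitArrayType is omitted. FlatRecordSerializer and DumpSerializer are hypothetical backends showing how one piece of node-specific code could feed both a flat record (as today's writer does) and a keyed debugging dump.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical node kinds and field keys; real ones would come from Clang.
enum NodeKind { TYPE_VARIABLE_ARRAY };
enum FieldKey {
  TYPE_VARIABLE_ARRAY_LBRACKET_LOC,
  TYPE_VARIABLE_ARRAY_RBRACKET_LOC,
  TYPE_VARIABLE_ARRAY_SIZE_EXPR
};

// Toy node: locations are offsets, the size expression is an opaque ID.
struct VariableArrayType {
  unsigned LBracket, RBracket, SizeExprID;
  unsigned getLBracketLoc() const { return LBracket; }
  unsigned getRBracketLoc() const { return RBracket; }
  unsigned getSizeExpr() const { return SizeExprID; }
};

// Node-specific logic, written once against any Serializer.
template <class Serializer>
struct AbstractTypeWriter {
  Serializer &S;
  void VisitVariableArrayType(const VariableArrayType *T) {
    S.addSourceLocation(TYPE_VARIABLE_ARRAY_LBRACKET_LOC, T->getLBracketLoc());
    S.addSourceLocation(TYPE_VARIABLE_ARRAY_RBRACKET_LOC, T->getRBracketLoc());
    S.addStmt(TYPE_VARIABLE_ARRAY_SIZE_EXPR, T->getSizeExpr());
    S.setNodeKind(TYPE_VARIABLE_ARRAY);
  }
};

// Backend 1: drops the keys and appends values to a flat record.
struct FlatRecordSerializer {
  std::vector<uint64_t> Record;
  int Code = -1; // stays -1 until setNodeKind is called
  void addSourceLocation(FieldKey, unsigned Loc) { Record.push_back(Loc); }
  void addStmt(FieldKey, unsigned StmtID) { Record.push_back(StmtID); }
  void setNodeKind(NodeKind K) { Code = K; }
};

// Backend 2: keeps the keys and produces a "key=value" debugging dump.
struct DumpSerializer {
  std::string Out;
  void addSourceLocation(FieldKey K, unsigned Loc) { field(K, Loc); }
  void addStmt(FieldKey K, unsigned StmtID) { field(K, StmtID); }
  void setNodeKind(NodeKind K) {
    Out += "kind=" + std::to_string(static_cast<int>(K));
  }
  void field(FieldKey K, unsigned V) {
    Out += std::to_string(static_cast<int>(K)) + "=" + std::to_string(V) + " ";
  }
};
```

For a node with brackets at 10 and 20 and size-expression ID 7, the flat instantiation should leave Record holding {10, 20, 7}, while the dump instantiation of the very same visitor yields "0=10 1=20 2=7 kind=0". A new field added to the visitor reaches both outputs automatically.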
>>
>> And the Serializer type would be expected to implement a dozen or so 
>> of these addFoo methods: bool, int, string, begin/end array, 
>> begin/end substructure, SourceLocation, types, sub-statements, 
>> declaration references, maybe some cases I’m forgetting.
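The list of addFoo operations can be illustrated with a hypothetical JSON-like backend. The method names below (addBool, addInt, addString, beginArray/endArray, beginSubstructure/endSubstructure) are assumptions extrapolated from that list, not an existing Clang API; keys are plain strings here, and a null key marks an array element or the top-level object.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical JSON-like Serializer covering the scalar and nesting operations.
class JSONLikeSerializer {
public:
  std::string Out;
  void addBool(const char *Key, bool V) { emitKey(Key); Out += V ? "true" : "false"; }
  void addInt(const char *Key, int64_t V) { emitKey(Key); Out += std::to_string(V); }
  void addString(const char *Key, const std::string &V) {
    emitKey(Key);
    Out += '"' + V + '"';
  }
  void beginArray(const char *Key) { emitKey(Key); Out += '['; NeedComma = false; }
  void endArray() { Out += ']'; NeedComma = true; }
  void beginSubstructure(const char *Key) { emitKey(Key); Out += '{'; NeedComma = false; }
  void endSubstructure() { Out += '}'; NeedComma = true; }

private:
  bool NeedComma = false;
  // Writes a separating comma if needed, then "key": unless Key is null.
  void emitKey(const char *Key) {
    if (NeedComma) Out += ',';
    NeedComma = true;
    if (Key) { Out += '"'; Out += Key; Out += "\":"; }
  }
};
```

Writing a top-level substructure that holds a string, a bool, and a two-element array produces {"cc":"stdcall","variadic":false,"params":[1,2]}. A flat backend could implement the same methods, for example by recording an element count for arrays instead of brackets.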
>>
>> On the deserialization side, we would promise to make deserialization 
>> calls in the same order that we make serialization calls so that we 
>> can continue to use a flat representation in our main serialization 
>> path.
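A sketch of the read side under the same-order promise, assuming a flat record of integers and hypothetical names (FlatRecordReader, readSourceLocation, readStmt, readVariableArrayType). Because the flat record carries no keys, the reader recovers each field only by consuming elements in exactly the order the writer produced them; like the current code, it does no failure checking.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical keys; a flat reader ignores them, but passing them keeps
// the reader symmetric with the writer (and useful for keyed backends).
enum FieldKey { LBRACKET_LOC, RBRACKET_LOC, SIZE_EXPR };

// Each read consumes the next element of the record (no bounds checking).
struct FlatRecordReader {
  explicit FlatRecordReader(const std::vector<uint64_t> &R) : Record(R) {}
  const std::vector<uint64_t> &Record;
  std::size_t Idx = 0;
  uint64_t readSourceLocation(FieldKey) { return Record[Idx++]; }
  uint64_t readStmt(FieldKey) { return Record[Idx++]; }
};

// Toy result type holding the fields in writer order.
struct VariableArrayTypeData {
  uint64_t LBracketLoc, RBracketLoc, SizeExprID;
};

// Mirrors the writer: same fields, same order, one statement per read.
template <class Deserializer>
VariableArrayTypeData readVariableArrayType(Deserializer &D) {
  VariableArrayTypeData Result;
  Result.LBracketLoc = D.readSourceLocation(LBRACKET_LOC);
  Result.RBracketLoc = D.readSourceLocation(RBRACKET_LOC);
  Result.SizeExprID = D.readStmt(SIZE_EXPR);
  return Result;
}
```

Each read is a separate statement on purpose: if the three reads were passed as arguments to a single function call, C++ would leave their evaluation order unspecified and the fields could be consumed out of order.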
>>
>> The current deserialization code does not actually check for failure 
>> in deserializing components, and I would probably continue that for 
>> now.
>>
>> I haven’t thought very carefully about what these field-key arguments 
>> (TYPE_VARIABLE_ARRAY_LBRACKET_LOC and friends) would be. They could be strings, but an enum might allow 
>> clever metaprograms. Maybe some of this could be tblgen’ed.
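One way to keep the metaprogramming benefits of enum keys while still producing readable dumps, sketched under assumptions: an X-macro stands in for what tblgen might generate, and the key spellings and the IsSourceLocationKey trait are hypothetical. A single list defines the enum and a name table for dumpers, and the enum can then drive compile-time traits.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical field list; tblgen could generate the equivalent.
#define FIELD_KEYS(X)                                  \
  X(TYPE_VARIABLE_ARRAY_LBRACKET_LOC, "lbracketLoc")   \
  X(TYPE_VARIABLE_ARRAY_RBRACKET_LOC, "rbracketLoc")   \
  X(TYPE_VARIABLE_ARRAY_SIZE_EXPR, "sizeExpr")

enum FieldKey {
#define AS_ENUMERATOR(Name, Spelling) Name,
  FIELD_KEYS(AS_ENUMERATOR)
#undef AS_ENUMERATOR
  NUM_FIELD_KEYS
};

// A dumper can map the enum back to a readable spelling.
inline const char *getFieldKeyName(FieldKey K) {
  static const char *const Names[] = {
#define AS_SPELLING(Name, Spelling) Spelling,
      FIELD_KEYS(AS_SPELLING)
#undef AS_SPELLING
  };
  return K < NUM_FIELD_KEYS ? Names[K] : "<unknown>";
}

// A compile-time trait keyed on the enum: which fields are source locations.
template <FieldKey K> struct IsSourceLocationKey {
  static constexpr bool value = false;
};
template <> struct IsSourceLocationKey<TYPE_VARIABLE_ARRAY_LBRACKET_LOC> {
  static constexpr bool value = true;
};
template <> struct IsSourceLocationKey<TYPE_VARIABLE_ARRAY_RBRACKET_LOC> {
  static constexpr bool value = true;
};
```

Here getFieldKeyName(TYPE_VARIABLE_ARRAY_SIZE_EXPR) returns "sizeExpr", and IsSourceLocationKey<TYPE_VARIABLE_ARRAY_LBRACKET_LOC>::value is true at compile time; a string key could not easily select a template specialization this way.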
>>
>> Thoughts?
>
> I think the idea has a lot of merit and is definitely worth exploring.
> Thank you for bringing the idea up! The only concern I have is if the
> plan is to implement AST dumping through this interface, you should be
> aware that both the default and JSON dumpers have some odd quirks that
> may make it difficult to get identical output through another
> interface.

Thank you.  Do people consider the existing dumper output stable?
I certainly wouldn’t.

John.