[cfe-dev] facebook clang plugins
Sean Silva
chisophugis at gmail.com
Tue Jun 24 10:29:04 PDT 2014
On Tue, Jun 24, 2014 at 10:47 AM, Mathieu Baudet <mathieubaudet at fb.com>
wrote:
> Thanks for the feedback, Sean, Manuel. I had not thought about the
> ASTMatchers before, but this sounds interesting (see comments below,
> though).
>
> On my side, here are a few thoughts on extending the “ASTExporter”
> plugin to make it useful upstream. Let me stress that this is all highly
> speculative, and that I am not promising anything :)
>
> 0) With some tweaks, it seems feasible to code an ASCII-art tree-like
> output for the ASTExporter, that is close enough to the current ASTDumper.
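A minimal sketch of such a tree-like text dump follows. It assumes a hypothetical Node type (a kind name plus children) that stands in for the exporter's data. The |- and `- connectors imitate the look of clang's -ast-dump output. This is an illustration, not code taken from ASTDumper.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical exported node: a kind name plus its children.
struct Node {
  std::string kind;
  std::vector<Node> children;
};

// Append one line per node. A child that has later siblings gets "|-", the
// last child gets "`-", and the vertical bar continues below non-last children.
void dumpTree(const Node &n, const std::string &prefix, bool isLast,
              bool isRoot, std::string &out) {
  if (isRoot)
    out += n.kind + "\n";
  else
    out += prefix + (isLast ? "`-" : "|-") + n.kind + "\n";
  const std::string childPrefix =
      isRoot ? std::string() : prefix + (isLast ? "  " : "| ");
  for (std::size_t i = 0; i < n.children.size(); ++i)
    dumpTree(n.children[i], childPrefix, i + 1 == n.children.size(), false,
             out);
}

// Convenience overload: dump a whole tree into a string.
std::string dumpTree(const Node &root) {
  std::string out;
  dumpTree(root, "", true, true, out);
  return out;
}
```

For a FunctionDecl with a ParmVarDecl and a CompoundStmt holding a ReturnStmt, this produces four lines. The ReturnStmt line is indented under the `-CompoundStmt line.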
>
> 1) Similarly, it should be easy to make the ASTExporter emit binary
> outputs instead of Json (e.g. in this format:
> http://mjambon.com/biniou-format.txt which ocaml understands)
>
If we have to choose a binary format, I think we should stick with the LLVM
bitcode binary format <http://llvm.org/docs/BitCodeFormat.html>.
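The LLVM bitcode format encodes many integers as variable bit rate (VBR) fields. A VBR-n value is split into (n-1)-bit chunks, lowest chunk first. The high bit of each chunk is set when more chunks follow. The sketch below shows only that chunking rule. It makes the simplifying assumption that chunks are returned as separate integers, rather than packed into a bit stream as LLVM's own bitstream writer does. The names vbrChunks and vbrDecode are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Split `value` into VBR-n chunks, lowest-order chunk first. Each chunk holds
// n-1 payload bits; its high bit is set when more chunks follow.
std::vector<std::uint32_t> vbrChunks(std::uint64_t value, unsigned n) {
  assert(n >= 2 && n <= 32);
  const std::uint64_t payloadMask = (std::uint64_t{1} << (n - 1)) - 1;
  const std::uint32_t continueBit = std::uint32_t{1} << (n - 1);
  std::vector<std::uint32_t> chunks;
  do {
    std::uint32_t chunk = static_cast<std::uint32_t>(value & payloadMask);
    value >>= (n - 1);
    if (value != 0) chunk |= continueBit;  // more chunks follow
    chunks.push_back(chunk);
  } while (value != 0);
  return chunks;
}

// Reassemble the value by concatenating the payload bits of each chunk.
std::uint64_t vbrDecode(const std::vector<std::uint32_t> &chunks, unsigned n) {
  assert(n >= 2 && n <= 32);
  const std::uint32_t payloadMask = (std::uint32_t{1} << (n - 1)) - 1;
  std::uint64_t value = 0;
  unsigned shift = 0;
  for (std::uint32_t chunk : chunks) {
    value |= static_cast<std::uint64_t>(chunk & payloadMask) << shift;
    shift += n - 1;
  }
  return value;
}
```

For example, 27 in VBR-4 becomes the chunks 1011 and 0011, which are 11 and 3 as integers. Small values fit in a single chunk, which is the case the encoding optimizes for.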
>
> 2) Then, one could write a dual “ASTImporter”, together with a binary
> parsing library, so as to provide a full alternative solution for AST
> (de)serialization.
>
> At this stage, the resulting code would be about the same size, but
> arguably more modular than the current binary (de)serialization (since it
> would support Json and possibly text outputs).
> The binary and Json (de)serialization could be tested alone (write then
> read) and also by interoperating with Ocaml (if we choose the format above).
> To be fair, the whole thing could also be a little less efficient (in
> size and time) because of the use of a uniform format.
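The write-then-read testing mentioned above can be sketched with a toy exporter/importer pair. The Node type and the parenthesized text format here are hypothetical stand-ins, not Json, biniou, or bitcode. The point is only the round-trip check, which compares the re-imported tree with the original.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical node type standing in for an exported AST node.
struct Node {
  std::string kind;
  std::vector<Node> children;
  bool operator==(const Node &o) const {
    return kind == o.kind && children == o.children;
  }
};

// Toy exporter: writes a node as "(Kind child1 child2 ...)".
void exportNode(const Node &n, std::string &out) {
  out += '(';
  out += n.kind;
  for (const Node &c : n.children) {
    out += ' ';
    exportNode(c, out);
  }
  out += ')';
}

// Toy importer for the same format; throws on malformed input.
Node importNode(const std::string &in, std::size_t &pos) {
  if (pos >= in.size() || in[pos] != '(')
    throw std::runtime_error("expected '('");
  ++pos;
  Node n;
  while (pos < in.size() && std::isalnum(static_cast<unsigned char>(in[pos])))
    n.kind += in[pos++];
  while (pos < in.size() && in[pos] == ' ') {
    ++pos;
    n.children.push_back(importNode(in, pos));
  }
  if (pos >= in.size() || in[pos] != ')')
    throw std::runtime_error("expected ')'");
  ++pos;
  return n;
}

// Round-trip test: export, re-import, and require an identical tree with no
// trailing input left over.
bool roundTrips(const Node &original) {
  std::string text;
  exportNode(original, text);
  std::size_t pos = 0;
  Node copy = importNode(text, pos);
  return pos == text.size() && copy == original;
}
```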
>
> 3) With more work, the schema (currently “atd” annotations) could be
> used to generate an in-memory representation of the AST in terms of
> tree-like plain data (“PODS”). The ASTExporter and ASTImporter classes,
> plus appropriate generated modules, would take care of the translation to
> and from this “protocol” representation.
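A minimal sketch of the translation between a node and a schema-generated plain-data mirror follows. It assumes a hypothetical VarDecl class with private state and accessors, which is not clang's VarDecl. It also assumes a hypothetical generated VarDeclPod struct that has only public fields.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Hypothetical "rich" node (not clang's VarDecl): state is private and
// reached through accessors, as in a hand-written AST class.
class VarDecl {
public:
  VarDecl(std::string name, std::string type, std::optional<long> init)
      : name_(std::move(name)), type_(std::move(type)), init_(init) {}
  const std::string &getName() const { return name_; }
  const std::string &getType() const { return type_; }
  bool hasInit() const { return init_.has_value(); }
  long getInit() const {
    assert(init_);  // precondition: hasInit()
    return *init_;
  }

private:
  std::string name_;
  std::string type_;
  std::optional<long> init_;
};

// Hypothetical schema-generated plain-data mirror: public fields only.
struct VarDeclPod {
  std::string name;
  std::string type;
  std::optional<long> init;
};

// Exporter direction: rich node -> plain data.
VarDeclPod toPod(const VarDecl &d) {
  return {d.getName(), d.getType(),
          d.hasInit() ? std::optional<long>(d.getInit()) : std::nullopt};
}

// Importer direction: plain data -> rich node.
VarDecl fromPod(const VarDeclPod &p) { return VarDecl(p.name, p.type, p.init); }
```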
>
> At this stage, it should be rather easy to meta-generate visitors and
> perhaps matchers on these PODS. However, be aware that one would not
> visit/match the original AST, but a copy of it, with a different style of
> data structures. To me, this observation will always hold if we try to
> generate visitors or matchers directly (i.e. without first generating an
> in-memory copy of the AST) from a language-agnostic schema.
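The following sketch makes the "copy, not the original AST" point concrete. It defines visitors over hypothetical generated plain-data types, using std::variant and std::visit as one possible encoding. The visitors only ever see the plain-data copy, never the objects it was exported from.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <variant>

// Hypothetical generated plain-data node types for a tiny expression language.
struct IntLit {
  long value;
};
struct Add;
using Expr = std::variant<IntLit, std::shared_ptr<Add>>;
struct Add {
  Expr lhs;
  Expr rhs;
};

// Helper for building trees.
Expr add(Expr lhs, Expr rhs) {
  return std::make_shared<Add>(Add{std::move(lhs), std::move(rhs)});
}

// Visitors of the kind a generator could stamp out: one overload per node type.
struct Evaluator {
  long operator()(const IntLit &n) const { return n.value; }
  long operator()(const std::shared_ptr<Add> &n) const {
    return std::visit(*this, n->lhs) + std::visit(*this, n->rhs);
  }
};

struct NodeCounter {
  int operator()(const IntLit &) const { return 1; }
  int operator()(const std::shared_ptr<Add> &n) const {
    return 1 + std::visit(*this, n->lhs) + std::visit(*this, n->rhs);
  }
};

long evaluate(const Expr &e) { return std::visit(Evaluator{}, e); }
```

With std::visit, adding a node type to the variant turns any visitor that lacks a matching overload into a compile error. A generator would then be forced to keep the visitors in sync.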
>
While a "protocol" representation would be revolutionary for language
bindings, I think an important first step toward bringing this in-tree is to
use the schema to directly generate C++ code that interoperates with the
current C++ data structures.
Realistically, the schema must be something that in-tree developers will
*want* to keep up to date because it saves them a lot of time and effort.
Otherwise, it will be considered as some sort of parasitic decoration on
the AST and will not be kept up to date (not through malice, but simply
through the "this doesn't affect me so I sometimes forget it's there"
effect). And a schema that one can't be confident is kept up to date is
almost worse than no schema at all.
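One way to generate C++ code that works directly on the current data structures is an X-macro style field list. The list names the accessors of an existing class, so no intermediate copy is built. The sketch below assumes a hypothetical ToyFieldDecl class. It also assumes a hand-written TOY_FIELD_DECL_FIELDS list standing in for whatever a real schema would produce.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical existing node class; the generated code does not modify it.
class ToyFieldDecl {
public:
  ToyFieldDecl(std::string name, unsigned bitWidth, bool mutableFlag)
      : name_(std::move(name)), bitWidth_(bitWidth), mutable_(mutableFlag) {}
  const std::string &getName() const { return name_; }
  unsigned getBitWidth() const { return bitWidth_; }
  bool isMutable() const { return mutable_; }

private:
  std::string name_;
  unsigned bitWidth_;
  bool mutable_;
};

// Hypothetical "schema": one entry per field, naming the accessor to call on
// the existing class. A real setup might generate this list from annotations.
#define TOY_FIELD_DECL_FIELDS(X) \
  X(name, getName)               \
  X(bitWidth, getBitWidth)       \
  X(isMutable, isMutable)

// Expansion 1: a serializer that reads the real object through its accessors.
std::string exportToyFieldDecl(const ToyFieldDecl &d) {
  std::ostringstream os;
  os << std::boolalpha;
  const char *sep = "";
#define X(key, accessor)                    \
  os << sep << #key << '=' << d.accessor(); \
  sep = " ";
  TOY_FIELD_DECL_FIELDS(X)
#undef X
  return os.str();
}

// Expansion 2: the number of fields, computed from the same list.
#define COUNT_FIELD(key, accessor) +1
constexpr int kToyFieldDeclFieldCount = 0 TOY_FIELD_DECL_FIELDS(COUNT_FIELD);
#undef COUNT_FIELD
```

Because one list drives both expansions, adding an entry updates every generated piece at once. That is the kind of payoff that would give developers a reason to keep the schema current.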
> Lastly, I wouldn’t expect a meta-generated API to match an existing
> handwritten API word for word, even if the general style can be maintained.
>
As I mentioned to Manuel, largely it would be about simplifying the
implementation. A lot of the matchers just involve stamping out some
repetitive piece of code (e.g. for each AST node type, or for particular
methods on nodes, etc.).
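The "stamping out" of per-node-type code can be sketched with the same X-macro trick. A single list of node kinds expands into three things: an enum, one trivial isXxx matcher per kind, and a kind-name table. The kind list, Node type, and function names below are hypothetical and are not the actual ASTMatchers implementation.

```cpp
#include <cassert>
#include <string>

// Hypothetical list of node kinds, written once.
#define NODE_KINDS(X) \
  X(IfStmt)           \
  X(ForStmt)          \
  X(ReturnStmt)

// Expansion 1: an enum with one enumerator per kind.
enum class Kind {
#define X(name) name,
  NODE_KINDS(X)
#undef X
};

// Hypothetical node carrying only its kind.
struct Node {
  Kind kind;
};

// Expansion 2: one trivial matcher per kind (isIfStmt, isForStmt, ...).
#define X(name) \
  inline bool is##name(const Node &n) { return n.kind == Kind::name; }
NODE_KINDS(X)
#undef X

// Expansion 3: a kind-to-name table, e.g. for diagnostics or dumping.
inline const char *kindName(Kind k) {
  switch (k) {
#define X(name)    \
  case Kind::name: \
    return #name;
    NODE_KINDS(X)
#undef X
  }
  return "<unknown>";
}
```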
However, an alternative matching paradigm would also be great. In
particular, I think that a "protocol" representation would be really
beneficial for pushing interaction with the AST to the next level.
Currently, the matchers are basically "grep-like", in that they go through
the entire AST looking for matching nodes. A "protocol" representation with
a proper schema would significantly simplify e.g. indexing the AST and
other common techniques for speeding up searches.
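A sketch of the difference between grep-like matching and an index follows. It assumes a hypothetical Node type keyed by a kind string. findByKind walks the whole tree on every query. KindIndex does one traversal up front and then answers each query with a hash lookup. Both return nodes in the same pre-order, so their results can be compared directly.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical AST node: a kind string plus children.
struct Node {
  std::string kind;
  std::vector<Node> children;
};

// Grep-like search: every query walks the entire tree.
void findByKind(const Node &n, const std::string &kind,
                std::vector<const Node *> &out) {
  if (n.kind == kind) out.push_back(&n);
  for (const Node &c : n.children) findByKind(c, kind, out);
}

// Index built with a single pre-order traversal; each later query is a hash
// lookup. The indexed tree must outlive the index and must not be modified.
class KindIndex {
public:
  explicit KindIndex(const Node &root) { add(root); }

  const std::vector<const Node *> &lookup(const std::string &kind) const {
    static const std::vector<const Node *> empty;
    auto it = byKind_.find(kind);
    return it == byKind_.end() ? empty : it->second;
  }

private:
  void add(const Node &n) {
    byKind_[n.kind].push_back(&n);
    for (const Node &c : n.children) add(c);
  }

  std::unordered_map<std::string, std::vector<const Node *>> byKind_;
};
```

The trade-off is the usual one: the index costs memory and must be rebuilt whenever the tree changes. In exchange, repeated queries no longer pay for a full traversal.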
-- Sean Silva
> — Mathieu
>
> On Jun 21, 2014, at 4:16 PM, Sean Silva <chisophugis at gmail.com> wrote:
>
>
>
>
> On Sat, Jun 21, 2014 at 3:52 AM, Manuel Klimek <klimek at google.com> wrote:
>
>> On Sat, Jun 21, 2014 at 2:33 AM, Sean Silva <chisophugis at gmail.com>
>> wrote:
>>
>>> I'd just like to say that even if OCaml tools parsing JSON is out of
>>> scope as Nico suggests, the work you have done to "schematize" the Clang
>>> AST could be the start of something really useful for upstream. Currently,
>>> I can think of at least two places that would benefit greatly from having
>>> such a schema as a "single point of truth": Serialization and ASTMatchers.
>>>
>>> Being able to auto-generate those two from a schema (maybe in the form
>>> of annotations in the header files) plus a relatively small amount of
>>> generator code could eliminate thousands of lines of code.
>>>
>>> If RecursiveASTVisitor (~2500 lines) and TreeTransform (~10k lines)
>>> could also be generated from the "single point of truth" with relatively
>>> little code, then that would be a tremendous savings.
>>>
>>> I think your OCaml tool would be quite easy to write with Clang
>>> annotated as such, but you could let the Clang developers maintain the
>>> annotations for you ;)
>>>
>>> This might also pave the way for a more "data-driven" approach to the
>>> DynamicASTMatchers, which could significantly reduce the binary size (which
>>> is enormous, and IIRC is mostly due to the fact that it just
>>> pre-instantiates all the static templates). The same approach might work
>>> for the "static" ASTMatchers too, letting the compiler essentially
>>> constant-fold all the indirections (which will largely be member pointers I
>>> imagine). This might also improve compile time (which is an issue; see
>>> http://llvm.org/bugs/show_bug.cgi?id=20061
>>> ).
>>>
>>
>> Generating code for the AST matchers is something we would like, but I
>> think it's orthogonal to the general design of the matchers (we still want
>> the functional composition), so I'm not sure how it would help with the
>> things you mention (apart from getting rid of the manually written
>> matchers, which would still be a big win).
>>
>
> I was talking about a possible simplification of the implementation, not
> the API it exposes to users.
>
>
>>
>> One of the big problems would be how we auto-generate the documentation
>> for the AST matchers, though. I agree that it would be better to have the
>> documentation (and the examples) on the nodes, but that'd be a lot of work.
>>
>>
> The documentation that you guys have produced for the matchers is quite
> good and could largely be reused/shared/migrated/adapted to the relevant
> part of the AST itself. (As you said, it would be a lot of work though).
>
>
> -- Sean Silva
>
>
>>
>>>
>>> -- Sean Silva
>>>
>>>
>>> On Thu, Jun 19, 2014 at 10:30 AM, Mathieu Baudet <mathieubaudet at fb.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am looking for feedback on the possibility of contributing some of
>>>> the clang plugins used at Facebook back to clang.
>>>>
>>>> We just made available a first subset of plugins here:
>>>> https://github.com/facebook/facebook-clang-plugins
>>>>
>>>> The plugins fall into two groups:
>>>> 1) Clang analyzer checkers for iOS;
>>>> 2) A clang frontend plugin to export the internal AST of clang in an
>>>> Ocaml-friendly Json. This plugin comes with Ocaml libraries for testing,
>>>> parsing, and visiting the AST.
>>>>
>>>> Except for the naming conventions, which are not uniform yet, and the
>>>> need to update the referenced version of clang, the code should be in a
>>>> relatively good state. In particular, everything has been tested quite
>>>> extensively at scale.
>>>>
>>>> Thanks!
>>>> —
>>>> Mathieu Baudet
>>>> _______________________________________________
>>>> cfe-dev mailing list
>>>> cfe-dev at cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>>>>
>>>
>>>
>>>
>>>
>>
>
>