[cfe-dev] RFC: Extend clang-format to support more/all C-like languages

Fri Nov 1 11:12:46 PDT 2013

On Nov 1, 2013, at 10:15 AM, Daniel Jasper <djasper at google.com> wrote:

> The context-free parser implemented in clang-format is capable of (or can easily be extended to) understand the structure of basically all C-like languages. A basic definition of C-like languages is the “Influenced” section of C’s Wikipedia page [1].
> 
> At most, this includes: AMPL, AWK, csh, C++, C--, C#, Objective-C, BitC, D, Go, Rust, Java, JavaScript, Limbo, LPC, Perl, PHP, Pike, Processing, Seed7, Verilog. Today, clang-format already supports C, C++, Objective-C and Objective-C++. Starting from there, it seems almost trivial to extend support to JavaScript and Java which only contain a small number of additional syntactical constructs and can be tokenized with Clang’s lexer. Eventually, we also imagine (and would love to see patches for) formatting C#, D, Go, Rust, and PHP based on their similar syntax and active usage. Maybe there are others the community would be interested in seeing?
> 
> The benefit of using the same format tool for all these jobs is that many of clang-format’s advanced formatting algorithms (e.g. TeX-like analysis of the entire solution space) are immediately available to the other languages.
> 
> Concrete proposal:
> 
> Start with JavaScript. Syntactically, it is very close to C++ and there are already different efforts going on to combine JavaScript and LLVM. Add an additional LanguageStandard (this flag already supports C++03 and C++11) in clang-format’s configuration and gate JavaScript-specific formatting decisions (e.g. indentation of JavaScript’s namespace-equivalent) on it.
> 
> To make clang-format more useful to the LLVM project itself, support for the tblgen language seems another worthy goal that can be achieved in the same way as JavaScript.
> 
> Thoughts? Comments? Concerns with this direction?

This sounds pretty interesting to me.  The only concern I would have is that we don't want Clang's lexer to have to become a "Grand unified lexer for all languages".  The token rules of languages in these families are related, but different.  Would it be better to have clang-format support using multiple different lexers?

-Chris
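
To make the "multiple different lexers" question concrete, here is a rough sketch of what a per-language lexer interface could look like. It is not taken from clang-format or Clang: `Language`, `Lexer`, `CppLexer`, `JsLexer`, `splitTokens` and `makeLexer` are all hypothetical names invented for illustration, and the tokenizer is deliberately tiny (identifiers, numbers and a few operators only). The one point it tries to show is that the token rules differ between related languages: JavaScript has a `===` operator, so each language plugs its own operator table in behind a shared interface instead of one lexer learning every language.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical language selector; not clang-format's actual configuration type.
enum class Language { Cpp, JavaScript };

// Hypothetical interface: the formatter only ever sees a token stream.
class Lexer {
public:
  virtual ~Lexer() = default;
  virtual std::vector<std::string> tokenize(const std::string &text) const = 0;
};

// Shared helper. `operators` lists multi-character operators, longest first;
// any other punctuation character becomes a one-character token.
static std::vector<std::string>
splitTokens(const std::string &text, const std::vector<std::string> &operators) {
  std::vector<std::string> tokens;
  std::size_t i = 0;
  while (i < text.size()) {
    unsigned char c = static_cast<unsigned char>(text[i]);
    if (std::isspace(c)) {
      ++i;
      continue;
    }
    if (std::isalnum(c) || c == '_') {
      // Identifier or number: consume the whole run of word characters.
      std::size_t start = i;
      while (i < text.size() &&
             (std::isalnum(static_cast<unsigned char>(text[i])) ||
              text[i] == '_'))
        ++i;
      tokens.push_back(text.substr(start, i - start));
      continue;
    }
    bool matched = false;
    for (const std::string &op : operators) {
      assert(!op.empty() && "an empty operator would never advance");
      if (text.compare(i, op.size(), op) == 0) {
        tokens.push_back(op);
        i += op.size();
        matched = true;
        break;
      }
    }
    if (!matched) {
      tokens.push_back(std::string(1, text[i]));
      ++i;
    }
  }
  return tokens;
}

// C++ token rules (a small illustrative subset).
class CppLexer : public Lexer {
public:
  std::vector<std::string> tokenize(const std::string &text) const override {
    return splitTokens(text, {"==", "!=", "->", "::"});
  }
};

// JavaScript token rules: adds `===` and `!==`, which C++ does not have.
class JsLexer : public Lexer {
public:
  std::vector<std::string> tokenize(const std::string &text) const override {
    return splitTokens(text, {"===", "!==", "==", "!="});
  }
};

// The formatter would pick a lexer from the configured language.
std::unique_ptr<Lexer> makeLexer(Language lang) {
  if (lang == Language::JavaScript)
    return std::make_unique<JsLexer>();
  return std::make_unique<CppLexer>();
}
```

With this sketch, `makeLexer(Language::JavaScript)->tokenize("a === b")` yields the three tokens `a`, `===`, `b`, while the C++ lexer splits the same text into `a`, `==`, `=`, `b`. Whether a table-driven split like this is enough, or whether each language needs a fully separate lexer, is exactly the open question above.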
