[cfe-dev] RFC: Extend clang-format to support more/all C-like languages

Sean Silva silvas at purdue.edu
Sat Nov 2 09:26:42 PDT 2013


On Fri, Nov 1, 2013 at 1:15 PM, Daniel Jasper <djasper at google.com> wrote:

> The context-free parser implemented in clang-format is capable of
> understanding (or can easily be extended to understand) the structure of
> basically all C-like languages. A basic definition of C-like languages is
> the “Influenced” section of C’s Wikipedia page [1].
>
> At most, this includes: AMPL, AWK, csh, C++, C--, C#, Objective-C, BitC,
> D, Go, Rust, Java, JavaScript, Limbo, LPC, Perl, PHP, Pike, Processing,
> Seed7, Verilog. Today, clang-format already supports C, C++, Objective-C
> and Objective-C++. Starting from there, it seems almost trivial to extend
> support to JavaScript and Java which only contain a small number of
> additional syntactical constructs and can be tokenized with Clang’s lexer.
> Eventually, we also imagine (and would love to see patches for) formatting
> C#, D, Go, Rust, and PHP based on their similar syntax and active usage.
> Maybe there are others the community would be interested in seeing?
>
> The benefit of using the same format tool for all these jobs is that many
> of clang-format’s advanced formatting algorithms (e.g. TeX-like analysis of
> the entire solution space) are immediately available to the other languages.
>
> Concrete proposal:
>
> Start with JavaScript. Syntactically, it is very close to C++ and there
> are already different efforts going on to combine JavaScript and LLVM. Add
> an additional LanguageStandard (this flag already supports C++03 and C++11)
> in clang-format’s configuration and gate JavaScript-specific formatting
> decisions (e.g. indentation of JavaScript’s namespace-equivalent on it).
>

JavaScript's "namespace equivalent" is just an anonymous function, so I'm
not sure how you intend to detect this lexically.
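
To make the concern concrete, here is a minimal sketch (TypeScript syntax;
every name in it is made up for illustration) of the idiom next to an
ordinary callback. Both function expressions begin with the same token
sequence, so telling them apart takes more context than the opening tokens:

```typescript
// Hypothetical module using the "anonymous function as namespace" idiom.
const myNamespace = (function () {
  // Private to the enclosing function scope.
  let counter = 0;

  function increment(): number {
    counter += 1;
    return counter;
  }

  // Only what is returned here is visible to the outside.
  return { increment };
})();

// An ordinary callback whose function expression starts with the same tokens.
const doubled = [1, 2, 3].map(function (x: number): number {
  return x * 2;
});
```

A formatter could perhaps key on the surrounding parentheses and the
immediate call at the end, but that is a heuristic, not something the
language marks explicitly.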


>
> To make clang-format more useful to the LLVM project itself, support for
> the tblgen language seems another worthy goal that can be achieved in the
> same way as JavaScript.
>

I've been really wanting something like clang-format for tablegen. By happy
coincidence, AFAIK the only situation where tablegen is lexically different
from C++ (excluding keyword differences) is a special form of string
literal that contains only C++ code. The delimiters for that literal are
`[{` and `}]`, which Clang's lexer will still lex and which can therefore be
recognized; clang-format could then recursively format the C++ code inside.
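
As a rough sketch of the recognition step (this is not clang-format code;
`findCodeLiterals` and `CodeLiteral` are hypothetical names, and I am
assuming a literal ends at the first `}]` after its `[{`), locating the
embedded C++ spans could look like this:

```typescript
interface CodeLiteral {
  start: number; // index of the opening "[{"
  end: number;   // index just past the closing "}]"
  body: string;  // text between the delimiters
}

// Naive scan: pairs each "[{" with the next "}]" that follows it.
// It does not skip ordinary string literals or comments, so a real
// implementation would have to work on tokens rather than raw text.
function findCodeLiterals(source: string): CodeLiteral[] {
  const literals: CodeLiteral[] = [];
  let pos = 0;
  while (true) {
    const open = source.indexOf("[{", pos);
    if (open === -1) break;
    const close = source.indexOf("}]", open + 2);
    if (close === -1) break; // unterminated literal: leave the rest untouched
    literals.push({
      start: open,
      end: close + 2,
      body: source.slice(open + 2, close),
    });
    pos = close + 2;
  }
  return literals;
}
```

Each `body` could then be handed to the C++ formatter and spliced back
between the delimiters.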


>
> Thoughts? Comments? Concerns with this direction?
>

If preliminary experiments show that it is feasible to use Clang's lexer
on lexically different languages, then I think this probably makes sense.
It may just be easier (not to mention more correct) to be able to plug in
different lexers, though.

-- Sean Silva


>
> If this is in line with LLVM’s/Clang’s roadmap, we'll start working on
> the few features missing for formatting JavaScript and we should have
> rudimentary support towards the end of the year.
>
> [1] http://en.wikipedia.org/wiki/C_(programming_language)