<html><head><meta http-equiv="Content-Type" content="text/html charset=windows-1252"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><br><div><div>On Nov 1, 2013, at 10:15 AM, Daniel Jasper &lt;<a href="mailto:djasper@google.com">djasper@google.com</a>&gt; wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div dir="ltr"><div style="font-family:arial,sans-serif;font-size:13px">The context-free parser implemented in clang-format can understand (or can easily be extended to understand) the structure of basically all C-like languages. A basic definition of C-like languages is the list in the “Influenced” section of C’s Wikipedia page [1].</div>
<div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px">At most, this includes: AMPL, AWK, csh, C++, C--, C#, Objective-C, BitC, D, Go, Rust, Java, JavaScript, Limbo, LPC, Perl, PHP, Pike, Processing, Seed7, Verilog. Today, clang-format already supports C, C++, Objective-C and Objective-C++. Starting from there, it seems almost trivial to extend support to JavaScript and Java, which contain only a small number of additional syntactic constructs and can be tokenized with Clang’s lexer. Eventually, we also imagine (and would love to see patches for) formatting C#, D, Go, Rust, and PHP, based on their similar syntax and active usage. Maybe there are others the community would be interested in seeing?</div>
<div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px">The benefit of using the same formatting tool for all these jobs is that many of clang-format’s advanced formatting algorithms (e.g. TeX-like analysis of the entire solution space) are immediately available to the other languages.</div>
<div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px">Concrete proposal:</div><div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px">
Start with JavaScript. Syntactically, it is very close to C++, and there are already several ongoing efforts to combine JavaScript and LLVM. Add an additional LanguageStandard (this flag already supports C++03 and C++11) to clang-format’s configuration and gate JavaScript-specific formatting decisions (e.g. indentation of JavaScript’s namespace equivalent) on it.</div>
<div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px">To make clang-format more useful to the LLVM project itself, support for the tblgen language seems like another worthy goal, one that can be achieved in the same way as JavaScript.</div>
<div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px">Thoughts? Comments? Concerns with this direction?</div></div></blockquote><br></div><div>This sounds pretty interesting to me. The only concern I would have is that we don't want Clang's lexer to have to become a "Grand unified lexer for all languages". The token rules of languages in these families are related, but different. Would it be better to have clang-format support using multiple different lexers?</div><div><br></div><div>-Chris</div><br></body></html>