[cfe-dev] RFC: Extend clang-format to support more/all C-like languages

Joshua Cranmer 🐧 Pidgeot18 at gmail.com
Sat Nov 2 10:42:21 PDT 2013


On 11/1/2013 12:15 PM, Daniel Jasper wrote:
> Start with JavaScript. Syntactically, it is very close to C++ and 
> there are already different efforts going on to combine JavaScript and 
> LLVM. Add an additional LanguageStandard (this flag already supports 
> C++03 and C++11) in clang-format’s configuration and gate
> JavaScript-specific formatting decisions (e.g. indentation of
> JavaScript’s namespace-equivalent) on it.

JavaScript has some extremely different syntax that is likely to play
havoc with a straight C/C++ lexer. It also depends on exactly which
variant of JavaScript you want to support: ES5? ES6? Mozilla's JS
extensions? E4X as well?

1. Regular expressions. I don't recall the exact rule off the top of my
head, but I believe it boils down to "/ starts a regular expression if
you're expecting an operand and is a division operator if you're not";
you'll need to do at least enough parsing to distinguish those two cases.
2. Array comprehensions (ES6/Mozilla JS 1.8.5 enhancements):
[x for (x in obj)], [x for each (x in obj)], [x for (x of obj)]. The
middle form is not in ES6; it's actually a holdover from E4X that sticks
around because it was introduced well before the for-of statement was
and found relatively widespread use within Mozilla, which made the JS
people keep it around when we killed E4X. I also don't recall whether
the generator form (without enclosing brackets) is in ES6.
3. Generators: function*() { yield y; yield* x; }. I don't even know of
any recommended style guides for the star variant, as I only retrofitted
my code to use them two or three days ago.
4. Object literals, which can mix getters, plain properties, and
function-valued properties:
var x = {
   get y() { return z; },
   x: 13,
   q: function () { return this.x; }
};
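5. Automatic semicolon insertion (the worst misfeature in JS, IMHO).
Newlines can become semicolons in the right circumstances, e.g. a line
break right after return, so a formatter can't freely add or remove
line breaks.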
6. According to the ES6 draft, \u escapes are allowed in IdentifierNames,
just like in Java.
7. Some other operators are in play. === and !== have already been
brought up; there are also >>> and >>>= (like Java's operators), and
the array spread operator (...).
8. Some ES6 features I haven't played with that may or may not have made
it into some libraries yet: template strings and, IIRC, a =>-like
notation for functions (arrow functions).

Regular expression literals will definitely need a different lexing path
than C/C++. Beyond that, and excluding template strings and some E4X
literals (the former too new to be widely supported, the latter already
ripped out of the only major engine that supported it), I think JS is
close enough to reuse a lot of the lexing capabilities of C-family
languages. Just be forewarned that the shallow parsing needed for JS is
likely to be rather different from that done for C/C++, even if the
token streams look more or less similar.
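To sketch what that different lexing path might involve, here is a toy tokenizer written in JavaScript itself. It is a hypothetical illustration, not clang-format code: tokenize, regexAllowed, scanRegex and the token shapes are made up for this example. It approximates the "expecting an operand" rule from item 1 by looking only at the previous token, and uses maximal munch (longest match first) for JS-only punctuators such as ===, !==, >>>= and the spread operator. It ignores \u escapes in identifiers, template strings and E4X.

```javascript
// Toy JavaScript tokenizer (hypothetical sketch, not clang-format code).
// Not handled: \u escapes in identifiers, template strings, E4X, and numeric
// forms beyond plain decimals.

// Punctuators ordered longest first, so the first match is the longest one
// (maximal munch): ">>>=" wins over ">>>", which wins over ">>" and ">".
const PUNCTUATORS = [
  ">>>=",
  "===", "!==", ">>>", "<<=", ">>=", "...",
  "==", "!=", "<=", ">=", "&&", "||", "++", "--", "+=", "-=", "*=", "/=",
  "%=", "&=", "|=", "^=", "<<", ">>", "=>",
  "{", "}", "(", ")", "[", "]", ";", ",", "<", ">", "+", "-", "*", "/",
  "%", "&", "|", "^", "!", "~", "?", ":", "=", ".",
];

// Keywords after which an operand (and so a regex literal) may follow.
const OPERAND_KEYWORDS = new Set([
  "return", "typeof", "instanceof", "in", "of", "new", "delete", "void",
  "throw", "case", "do", "else", "yield",
]);

// Heuristic: "/" starts a regex unless the previous token can end an operand.
function regexAllowed(prev) {
  if (prev === undefined) return true; // start of input
  if (prev.type === "punct") return ![")", "]", "}"].includes(prev.value);
  if (prev.type === "ident") return OPERAND_KEYWORDS.has(prev.value);
  return false; // numbers, strings and regexes end an operand
}

// Returns the index just past a regex literal that starts at src[start] === "/".
function scanRegex(src, start) {
  let inClass = false; // an unescaped "/" inside [...] does not end the literal
  for (let i = start + 1; i < src.length; i++) {
    const c = src[i];
    if (c === "\n") break;
    if (c === "\\") { i++; continue; } // skip the escaped character
    if (c === "[") inClass = true;
    else if (c === "]") inClass = false;
    else if (c === "/" && !inClass) {
      let end = i + 1;
      while (end < src.length && /[A-Za-z]/.test(src[end])) end++; // flags
      return end;
    }
  }
  throw new SyntaxError("unterminated regular expression at " + start);
}

function tokenize(src) {
  const tokens = [];
  let i = 0;
  while (i < src.length) {
    const rest = src.slice(i);
    let m;
    if ((m = /^\s+/.exec(rest))) { i += m[0].length; continue; }
    if (rest.startsWith("//")) { // line comment: skip to end of line
      const nl = src.indexOf("\n", i);
      i = nl < 0 ? src.length : nl;
      continue;
    }
    if (rest.startsWith("/*")) { // block comment
      const end = src.indexOf("*/", i + 2);
      if (end < 0) throw new SyntaxError("unterminated comment at " + i);
      i = end + 2;
      continue;
    }
    if (src[i] === "/" && regexAllowed(tokens[tokens.length - 1])) {
      const end = scanRegex(src, i);
      tokens.push({ type: "regex", value: src.slice(i, end) });
      i = end;
      continue;
    }
    let type;
    if ((m = /^[A-Za-z_$][\w$]*/.exec(rest))) type = "ident";
    else if ((m = /^\d+(?:\.\d+)?/.exec(rest))) type = "number";
    else if ((m = /^(["'])(?:\\.|(?!\1)[^\\\n])*\1/.exec(rest))) type = "string";
    else {
      const p = PUNCTUATORS.find((q) => rest.startsWith(q));
      if (p === undefined) throw new SyntaxError("unexpected character at " + i);
      m = [p];
      type = "punct";
    }
    tokens.push({ type, value: m[0] });
    i += m[0].length;
  }
  return tokens;
}

// "/" is division after an identifier, but starts a regex after "=".
console.log(tokenize("r = a / b / c").map((t) => t.type).join(" "));
// ident punct ident punct ident punct ident
console.log(tokenize("r = /a+b/g.test(s)")[2].value); // /a+b/g
```

Looking only at the previous token is an approximation, not the real rule: in if (x) /re/.test(s) the ) closes a condition rather than an operand, and } can end either a block or an object literal, so this sketch misreads both cases. Getting those right needs the kind of shallow parsing mentioned above.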

-- 
Joshua Cranmer
Thunderbird and DXR developer
Source code archæologist