[cfe-dev] RFC: Extend clang-format to support more/all C-like languages
Joshua Cranmer 🐧
Pidgeot18 at gmail.com
Sat Nov 2 10:42:21 PDT 2013
On 11/1/2013 12:15 PM, Daniel Jasper wrote:
> Start with JavaScript. Syntactically, it is very close to C++ and
> there are already different efforts going on to combine JavaScript and
> LLVM. Add an additional LanguageStandard (this flag already supports
> C++03 and C++11) in clang-format’s configuration and gate JavaScript
> specific formatting decisions (e.g. indentation of JavaScript’s
> namespace-equivalent on it).
JavaScript has some extremely different syntax that is likely to play
havoc with a straight C/C++ lexer. It also depends on exactly which
variant of JavaScript you want to support--ES5? ES6? Mozilla's JS
extensions? Support E4X as well?
1. Regular expressions. I don't recall off the top of my head, but I
believe it boils down to "/ starts a regular expression if you're
expecting an operand and is a division operator if you're not"--you'll
need to do at least enough parsing to distinguish those two cases.
2. Array comprehensions (ES 6/Mozilla JS 1.8.5 enhancements): [x for (x
in obj)], [x for each (x in obj)], [x for (x of obj)]. The middle is not
in ES 6 (it's actually a holdover from E4X that sticks around because it
was introduced well before the for-of statement was, and found
relatively widespread use in Mozilla which made the JS people keep it
around when we killed E4X), and I don't recall if the generator form
(without enclosing brackets) is in ES 6 or not.
3. Generators: function*() { yield y; yield* x; }. I don't even know
recommended style guides for the star-variant, as I only just
retrofitted my code to have them two or three days ago.
4. Object literals:
    var x = {
      get y() { return z; },
      x: 13,
      q: function () { return this.x; }
    };
5. Semicolon insertion. Newlines can become semicolons in the right
circumstances (the worst misfeature in JS, IMHO).
6. Reading the ES6 draft, it supports \u in IdentifierNames just like
Java does.
7. Some other operators are in play. === and !== have been brought up;
there are also >>> and >>>= (like Java's operators), and ... (array
spread operator).
8. Some ES6 features I haven't played with that may or may not have been
added to some libraries yet: template strings, and there's also a
=>-like notation for functions IIRC.
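To illustrate why item 5 matters to a formatter: the two functions below differ only in a line break after `return`, yet they behave differently, so a tool that moves line breaks around has to know where a newline can act as a semicolon. This is a minimal sketch; the function names are made up for illustration.

```javascript
// Hypothetical names, for illustration only.

// A line break directly after `return` ends the statement, so the
// expression on the following line is never evaluated.
function withNewline() {
  return
    42;
}

// Without the line break, the expression is the return value.
function withoutNewline() {
  return 42;
}

console.log(withNewline());    // undefined
console.log(withoutNewline()); // 42
```

Joining or splitting lines at such a point changes the program's meaning, not just its layout.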
Regular expression literal support will definitely need different lexing
paths than C/C++, although (excluding template strings and some E4X
literals--the former of which is too new to be widely supported and the
latter of which has already been ripped out from the only major engine
that supported it) I think it is otherwise close enough to reuse a lot of the
lexing capabilities of C-family languages. Just be forewarned that the
shallow parsing that needs to be done for JS is likely to be rather
different from that done for C/C++, even if their lexing streams look
more or less similar.
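As a rough sketch of the "/ starts a regular expression if you're expecting an operand" rule from item 1, the helper below guesses from the previous token alone. The function name, the keyword list, and the idea of passing tokens as plain strings are all assumptions made for illustration; this is not how clang-format or any particular engine does it. It also gets some cases wrong, for example a `)` that closes an `if (...)` condition, a `}` that closes a block rather than an object literal, or a postfix `++`, which is why some real parsing is needed.

```javascript
// Hypothetical helper: decide whether a "/" begins a regular expression
// literal, looking only at the previous significant token (a string, or
// null at the start of input).

// Keywords after which an operand is expected (an illustrative list).
const KEYWORDS_BEFORE_OPERAND = new Set([
  "return", "typeof", "instanceof", "in", "of", "new",
  "delete", "void", "throw", "case", "do", "else", "yield",
]);

// Closing brackets usually end an operand.
const CLOSERS = new Set([")", "]", "}"]);

function slashStartsRegex(prevToken) {
  // At the start of input we are expecting an operand.
  if (prevToken === null) return true;
  // Keywords such as `return` are followed by an operand.
  if (KEYWORDS_BEFORE_OPERAND.has(prevToken)) return true;
  // A closing bracket, identifier, number, or string literal ends an
  // operand, so a following "/" is treated as division.
  if (CLOSERS.has(prevToken)) return false;
  if (/^[A-Za-z_$0-9]/.test(prevToken)) return false;
  if (/^["'`]/.test(prevToken)) return false;
  // Operators and opening punctuation leave us expecting an operand.
  return true;
}

console.log(slashStartsRegex("="));      // true:  x = /ab+c/
console.log(slashStartsRegex("x"));      // false: x / 2
console.log(slashStartsRegex("return")); // true:  return /ab+c/.test(s)
```

A one-token lookbehind like this is about the cheapest approach possible; anything more reliable has to track enough context to tell statement positions from expression positions.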
--
Joshua Cranmer
Thunderbird and DXR developer
Source code archæologist